Server, heal thyself
April 27, 2001 11:55 AM PDT
By Stephen Shankland
Staff Writer, CNET News
IBM has embarked on a new multibillion-dollar effort called eLiza to build computer systems that can fix themselves while problems are in the early stages.
The effort is an attempt to bring some of the self-healing abilities of living creatures to the brittle world of computers, where component failures can bring down larger systems and ripple across a network to other computers as well.
“Just like the human body, when you sweat, it evaporates and cools you down,” John Patrick, vice president of Internet technologies at IBM, said in an interview about the program. “And when you’re cold, you shiver and that warms you up. When you cut your finger, you bleed and that heals the wound.
“Just like that, we’re intending to invest in a broad range of software that will allow infrastructure to be self-managing and self-healing.”
Analysts see IBM’s effort putting the company at the front of an as-yet unproven market. “IBM’s self-healing systems will definitely put pressure on other manufacturers to follow,” said ARS Market Intelligence analyst Steve Greenberg. “But it is going to be interesting to see when this hits the market, or if it does at all.”
“It’s a long-term project,” added Illuminata analyst Jonathan Eunice. “You’ll see parts of it rolling out this year and next year. But if you’re waiting for full delivery, you’ll be waiting five years, eight years, 10 years. It’s not a product; it’s a vision statement.”
But there are some differences between IBM’s plan and actual biological systems. IBM essentially is patching today’s computing technology, adding another layer on top of a very complicated system rather than employing radically different designs. Biological systems can be far more adaptable: the human brain, which in some ways resembles a computer, can sometimes keep functions such as speech working despite serious damage.
IBM’s Greg Burke will lead the multiyear effort, reporting to Irving Wladawsky-Berger–the man who led IBM’s effort to embrace the Internet six years ago and the Linux operating system two years ago. Wladawsky-Berger will unveil eLiza at an analyst meeting Friday.
The effort will take place at five IBM research labs, the company said. It will consume a quarter of the company’s server research funds.
The effort will consolidate several smaller programs under way within various groups at Big Blue. Hundreds of people will work on eLiza, Patrick said, spreading changes to all of IBM’s server lines, its storage products and software packages such as DB2, WebSphere and Tivoli.
With eLiza, computers would monitor everything from patterns in a power supply’s electricity consumption to how many people are using a Web site, Patrick said. When the behavior of an element of the computing system starts showing the first indications of distress, automatic services would fire up backup systems, order replacement parts or take other measures to ensure that people using the system don’t notice problems.
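The article does not say how eLiza would recognize “the first indications of distress.” One common approach is to compare each new reading against a rolling baseline and flag readings that fall far outside it. The Python sketch below assumes that approach; the class DistressMonitor, the function respond, and the window, threshold and min_sigma parameters are hypothetical names for illustration, not IBM software.

```python
from collections import deque
from statistics import mean, stdev


class DistressMonitor:
    """Hypothetical early-warning check for one metric, such as a power supply's draw in watts."""

    def __init__(self, window=20, threshold=3.0, min_sigma=0.01):
        self.readings = deque(maxlen=window)  # rolling baseline of recent normal readings
        self.threshold = threshold            # standard deviations from the baseline that count as distress
        self.min_sigma = min_sigma            # floor so a perfectly flat baseline still tolerates tiny noise

    def observe(self, value):
        """Record one reading and return True if it looks like an early sign of trouble."""
        if len(self.readings) >= 2:
            mu = mean(self.readings)
            sigma = max(stdev(self.readings), self.min_sigma)
            if abs(value - mu) > self.threshold * sigma:
                return True  # outlier: keep it out of the baseline
        self.readings.append(value)
        return False


def respond(component, actions):
    """Run each remedial action (fail over, order a part, ...) for the distressed component."""
    return [action(component) for action in actions]


monitor = DistressMonitor()
for watts in [200, 201, 199, 200, 202, 198, 200, 201]:
    monitor.observe(watts)
if monitor.observe(260):
    print(respond("psu-1", [lambda c: f"fail over {c}", lambda c: f"order part for {c}"]))
```

Because flagged readings are kept out of the baseline, a lasting shift in behavior keeps triggering warnings; a real system would also need a policy for deciding when a new level has become normal.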
One element of eLiza will be Project Oceano, a prototype consisting of a cluster of Linux servers that can share jobs among themselves, with servers added to or removed from the mix as necessary. The system can even install operating systems and stored data without human intervention.
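The article does not explain how Oceano divides work, so the following Python sketch assumes a simple least-loaded policy: each new job goes to the server running the fewest jobs, and when a server leaves the pool its jobs are handed to the survivors. ServerPool and its methods are hypothetical.

```python
class ServerPool:
    """Hypothetical pool of servers that share jobs and can grow or shrink."""

    def __init__(self):
        self.jobs = {}  # server name -> list of job ids it is running

    def add_server(self, name):
        self.jobs.setdefault(name, [])

    def submit(self, job):
        if not self.jobs:
            raise RuntimeError("no servers in the pool")
        # Least-loaded placement; ties go to the server that joined earliest.
        target = min(self.jobs, key=lambda s: len(self.jobs[s]))
        self.jobs[target].append(job)
        return target

    def remove_server(self, name):
        # Pull the server out first so none of its jobs land back on it.
        orphaned = self.jobs.pop(name)
        for job in orphaned:
            self.submit(job)
        return orphaned


pool = ServerPool()
for name in ["a", "b"]:
    pool.add_server(name)
for job in ["j1", "j2", "j3"]:
    pool.submit(job)
pool.add_server("c")      # a new server joins and picks up the next job
pool.submit("j4")
pool.remove_server("a")   # a's jobs j1 and j3 move to b and c
print(pool.jobs)          # {'b': ['j2', 'j1'], 'c': ['j4', 'j3']}
```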
Oceano probably will arrive as a product later this year, Eunice said.
“A lot of what you’ll be seeing initially will be the evolution of IBM technology from the zSeries mainframe,” Eunice said. For example, “processor sparing,” in which spare CPUs can take over automatically when one stops working, will spread from mainframes to other server lines, he said.
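Processor sparing, as Eunice describes it, can be modeled in a few lines. This Python sketch is a toy model, not mainframe firmware: the class ProcessorSet, the CPU names and the workload label are hypothetical, and on real machines sparing is handled by hardware and firmware rather than application code.

```python
class ProcessorSet:
    """Toy model of processor sparing: an idle spare CPU takes over a failed CPU's work."""

    def __init__(self, active, spares):
        self.assignments = {cpu: None for cpu in active}  # active CPU -> workload it runs
        self.spares = list(spares)                         # idle CPUs held in reserve

    def assign(self, cpu, workload):
        self.assignments[cpu] = workload

    def fail(self, cpu):
        """Retire a failed CPU; return the spare that replaced it, or None if none are left."""
        workload = self.assignments.pop(cpu)
        if not self.spares:
            return None  # no spare: capacity shrinks and the workload must be rescheduled
        spare = self.spares.pop(0)
        self.assignments[spare] = workload  # the spare resumes the failed CPU's work
        return spare


cpus = ProcessorSet(active=["cpu0", "cpu1"], spares=["spare0"])
cpus.assign("cpu1", "payroll batch")
print(cpus.fail("cpu1"))   # spare0
print(cpus.assignments)    # {'cpu0': None, 'spare0': 'payroll batch'}
```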
IBM also has been working for more than a year on a feature called software rejuvenation for Windows servers. Unfortunately, servers using Windows must be restarted periodically because of problems such as memory “leaks”–when computing processes claim memory but don’t return it when done.
Consequently, IBM’s Windows servers can automatically restart themselves periodically, and IBM has been working to make the feature more sophisticated, predicting when restarts are needed so the server is available as much as possible.
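The article says only that IBM wants to predict when restarts are needed. One plausible way, assumed here rather than taken from IBM, is to fit a straight line to recent free-memory measurements, estimate when a slow leak will push free memory below a safe floor, and schedule the restart a little earlier. The function names, hourly sampling and megabyte units in this Python sketch are hypothetical.

```python
def predict_exhaustion(samples, floor):
    """Least-squares fit of free memory over time.

    samples: (hour, free_mb) pairs from one server.
    floor: free memory in MB below which the server is considered at risk.
    Returns the estimated hour when free memory reaches the floor, or None
    if memory is not trending downward.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        return None
    slope = cov / var  # MB per hour; negative means memory is leaking
    if slope >= 0:
        return None
    intercept = mean_m - slope * mean_t
    return (floor - intercept) / slope


def schedule_restart(samples, floor, margin_hours):
    """Plan a restart margin_hours before the predicted exhaustion point."""
    eta = predict_exhaustion(samples, floor)
    return None if eta is None else eta - margin_hours


readings = [(0, 1000), (1, 990), (2, 980), (3, 970)]  # losing 10 MB per hour
print(schedule_restart(readings, floor=500, margin_hours=2))  # 48.0
```

Restarting on a forecast rather than a fixed timer means a server that leaks slowly is restarted rarely, while one that leaks quickly is restarted sooner.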
Also tying into eLiza is Blue Gene, an upcoming IBM supercomputer devoted to figuring out how proteins, the gigantic biochemical molecules built according to instructions in genes, “fold” into their complex shapes. Blue Gene will have so many CPUs that the computer will have to be able to assess when they start or stop working and adjust accordingly.
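A common way for a large machine to notice processors that stop working is a heartbeat: each CPU reports in periodically, and any CPU silent for longer than a timeout is treated as failed and left out of the work plan. The article does not say Blue Gene uses this exact scheme; HeartbeatTracker, partition_work and the timeout value in this Python sketch are hypothetical.

```python
class HeartbeatTracker:
    """Hypothetical liveness tracker: a CPU counts as working if it reported recently."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence before a CPU is presumed dead
        self.last_seen = {}     # cpu id -> time of its most recent heartbeat

    def heartbeat(self, cpu, now):
        # Times are passed in explicitly so the logic is easy to test.
        self.last_seen[cpu] = now

    def alive(self, now):
        return sorted(cpu for cpu, t in self.last_seen.items() if now - t <= self.timeout)


def partition_work(tasks, cpus):
    """Spread tasks round-robin across the CPUs that are currently alive."""
    if not cpus:
        raise RuntimeError("no working CPUs")
    plan = {cpu: [] for cpu in cpus}
    for i, task in enumerate(tasks):
        plan[cpus[i % len(cpus)]].append(task)
    return plan


tracker = HeartbeatTracker(timeout=5)
for cpu in ["c0", "c1", "c2"]:
    tracker.heartbeat(cpu, now=0)
tracker.heartbeat("c0", now=8)  # c1 goes quiet after time 0
tracker.heartbeat("c2", now=9)
working = tracker.alive(now=10)
print(working)  # ['c0', 'c2']
print(partition_work(["t0", "t1", "t2"], working))  # {'c0': ['t0', 't2'], 'c2': ['t1']}
```

A processor that recovers simply resumes sending heartbeats and reappears in the next plan, which covers both the “start” and “stop” cases.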
IBM also is working on lower-level technology, Patrick said. The company already has begun selling memory systems that can keep working even when memory chips fail completely.
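The article does not describe how IBM’s memory survives a dead chip. The underlying idea, spreading data so that no single chip is essential, can be illustrated with simple XOR parity in the Python sketch below. This is a deliberately simplified stand-in: real fault-tolerant memory uses stronger error-correcting codes, and write_word and read_word are hypothetical helpers.

```python
from functools import reduce
from operator import xor


def write_word(data_bytes):
    """Store one byte per data chip plus an XOR parity byte on one extra chip."""
    return list(data_bytes) + [reduce(xor, data_bytes, 0)]


def read_word(chips, failed=None):
    """Return the data bytes, rebuilding the byte of one completely failed chip if needed."""
    chips = list(chips)
    if failed is not None:
        # XOR of every chip (parity included) is zero, so the missing byte
        # equals the XOR of all the surviving chips.
        survivors = [b for i, b in enumerate(chips) if i != failed]
        chips[failed] = reduce(xor, survivors, 0)
    return chips[:-1]  # drop the parity byte


stored = write_word([0x12, 0x34, 0x56, 0x78])
stored[2] = 0x00  # chip 2 dies and returns garbage
print([hex(b) for b in read_word(stored, failed=2)])  # ['0x12', '0x34', '0x56', '0x78']
```

Parity alone can rebuild a chip only when the system already knows which chip has failed.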
All these fixes may seem complicated, but IBM thrives on complexity. Much of the revenue of its large services division comes from helping customers handle onerous chores such as adding new computers to older networks or running customers’ systems at IBM for a fee.
There is some risk in letting servers manage themselves automatically, because a system with enough power to heal itself also has enough power to damage itself, Eunice said. But companies with large numbers of servers don’t have the luxury of enough people to control all of their servers and computing infrastructure, he said.
“Automatic actions have some risks, but the honest-to-God truth is we have no option. If you have 100 servers, it’s no trouble. When you have 4,000 or 40,000, you have incredible trouble (managing) by human intervention,” Eunice said.
IBM’s goal, shared by competitors such as Sun Microsystems, Hewlett-Packard and EMC, is to reduce the difficulties of administering the large servers at the heart of Web operations and corporate networks. There simply aren’t enough knowledgeable administrators to go around, particularly as people grow accustomed to guaranteed access to the Internet and as more and more operations depend on it, Patrick said.
Although all big computing companies are working on increased reliability, IBM has decades of research behind it and expects to spend about $400 million a year on the problem, Eunice said. “The mainstream competition like Dell, EMC, Microsoft and Intel just doesn’t have the resources to compare here,” he said.
A very small number of computer experts are able to diagnose thorny problems in the most complicated combinations of computing hardware.
“We’re trying to capture that knowledge and automate the process,” Patrick said. “We can see a real crisis ahead as the expectations go up and the transactions go up.”
IBM didn’t base the project’s name on Eliza, a storied pre-PC program that performed pseudo-psychoanalysis on its users, but on another biological system. The name is a reference to IBM’s Deep Blue chess-playing machine, which Wladawsky-Berger said had the intelligence of a lizard: not very smart by some measures, but not bad for a computer.
The project is ambitious in scope, but IBM has a bigger footprint in the computing industry than any of its competitors.
“I definitely think IBM’s the right company to try to attempt to get this kind of technology,” Greenberg said. “It’s a huge, huge project.”