Recent systemic failures in many different domains, such as the BP Deepwater Horizon oil spill (2010) and the subprime market crisis (2008-09), have reminded us once again of the fragility of complex systems. Most recent catastrophic accidents are systemic failures. Union Carbide’s Bhopal Gas Tragedy in 1984, in which some 5000 died and about 100,000 were seriously injured by the accidental release of methyl isocyanate, was a systemic failure. Another important example is the Piper Alpha disaster in 1988, in which an offshore oil platform operated by Occidental Petroleum in the North Sea, U.K., exploded, killing 167 and resulting in about $2 billion in losses.
The Challenger (1986) and Columbia (2003) space shuttle disasters, the Schering-Plough inhaler recall (1999), the Northeast electrical power blackout (2003), the spread of SARS (2003), and the Johnson & Johnson multi-drug recall (2010) are all examples of systemic failures. Examples of financial systemic failures include the collapses of Enron (2001) and WorldCom (2002) and the Madoff Ponzi scheme (2008). The collapse of the News of the World newspaper organization (2011) is an example of systemic failure from the media domain.
Postmortem investigations of many disasters have shown that systemic failures rarely result from the failure of a single component or person. Even though the senior management of a company typically tries to pin the blame on some unanticipated equipment failure, operator error, or a rogue trader, that is rarely the case for major disasters. For instance, Union Carbide initially claimed that the Bhopal Gas Tragedy was caused by a disgruntled employee who had sabotaged the equipment. Enron management initially blamed Andrew Fastow, Enron’s CFO, as the sole culprit. But, again and again, investigations have shown that major disasters result from several layers of failures, ranging from low-level personnel to senior management to regulatory agencies.
Systemic failures typically occur due to fragility in complex systems. Modern technological advances are creating an increasing number of complex engineered systems. Designing such systems, together with control mechanisms that can ensure their safe operation over their life cycles, is extremely challenging. Complex systems have a very large number of interconnected components with non-linear interactions that can lead to “emergent” behavior (i.e., the behavior of the whole is more than the sum of its parts) that can be difficult to anticipate and control. Moreover, these systems are not isolated: they interact with humans and the physical environment, and in particular, human decision making and the associated errors are part of the feedback process in these systems. The cumulative effect of non-linearity, interconnectedness, and interactions with humans and the environment makes these systems-of-systems fragile and very susceptible to systemic failures.
Chemical engineers might study the BP oil spill report, and finance experts the subprime crisis report, but rarely does anyone compare failures across different domains to study their commonalities and differences. Although the failures listed above occurred in very different domains and facilities, were triggered by different events, and involved different materials, certain common underlying patterns drive these systemic failures. Understanding these patterns is essential if we are to avoid such disasters in the future.