June 27, 2013

National Safety Month II

Posted by: Dave Ryman

For our second National Safety Month post we have a Q & A with Nancy Leveson, author of Engineering a Safer World. Click here for a free download.

What brought you to the field of system safety engineering?

Right after I got my Ph.D. in 1980 and started my first university position, I received a phone call from a large defense/aerospace company asking if I would help them with safety problems they were having on a complex, software-intensive torpedo system. I told them I didn’t know anything about safety engineering. They said they had some money to pay me. I told them I still didn’t know anything about safety but I would try to help them if they could not find anyone else more appropriate. That was the start of this adventure. I was fortunate to be working with some of the finest system safety engineers in the world, and I probably got a lot more out of the experience than they did. One of the things I learned was that traditional safety engineering techniques did not work on the software-intensive systems we were starting to build at the time. I also learned the difference between safety and reliability, which engineers confuse all the time—reliability and safety are not only different system qualities, but sometimes they even conflict. I’ve been working on system safety since that time, both in research and on industrial projects and accident investigations.

In the book, you say “system engineering and system safety engineering techniques have not kept up with the rapid pace of technological innovation.” Why is that?

We are still using the same engineering analysis and design techniques that were devised for safety at least 50 years ago. Engineering has changed a great deal since that time. Before the advent of computers, electro-mechanical systems were simple enough that engineers could basically exhaustively test them and remove most system design errors before the systems were used. That left component failures as the primary problem in operations. The obvious solution was to make the components highly reliable using redundancy, overdesign (safety margins), and other fault tolerance techniques. Analysis examined the probability of random failure of components and came up with system reliability calculations. These assumptions are no longer true. Computers allow us to build systems that are so complex that exhaustive testing is impossible and design errors persist into operational use. Accidents today often result from the interaction of system components (including software and human operators) that have not “failed.” In the case of software, almost all software-related accidents have resulted from requirements flaws, not implementation errors. The software satisfies its requirements but the engineers did not adequately consider all possible cases and environmental conditions. The traditional techniques also focus only on the technical system and ignore the social environment in which the systems are operated and thus omit causal factors related to management and safety culture.
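As a rough illustration of the traditional kind of calculation described above, here is a minimal sketch that assumes components fail randomly and independently of one another. The function names and the numbers are hypothetical, chosen only for illustration; they do not come from the book.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Every component must work, so the individual reliabilities multiply."""
    return prod(component_reliabilities)


def redundant_reliability(component_reliability, copies):
    """At least one of `copies` identical units must work.

    Assumes the copies fail independently of each other.
    """
    return 1.0 - (1.0 - component_reliability) ** copies


# Hypothetical two-component system, with and without a backup
# for the weaker component.
single = series_reliability([0.99, 0.95])
with_backup = series_reliability([0.99, redundant_reliability(0.95, 2)])
print(f"no redundancy: {single:.4f}, with backup: {with_backup:.4f}")
```

Under these assumptions the backup raises the computed reliability, but the calculation only covers components that fail. It says nothing about a system in which every component behaves as specified and the interaction among them is still unsafe.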

What are the problems with old safety engineering techniques when used today?

Traditional safety engineering techniques assume that accidents are caused by component failures and that is therefore what they consider. They do not consider accidents caused by system design errors or the interaction of system components that have not individually failed. In addition, the traditional analysis techniques require a completed design to be effective. In complex systems today, 70-90% of the safety-critical design decisions are made in the early concept development stage. The cost of making changes (“rework”) late in the development process, when hazard analysis is feasible with the old techniques, is enormous, and the changes are often impossible to make. The best that can be done late in the process is to add more redundancy, which is very expensive and has no impact on accidents caused by component interactions. In fact, redundancy may even exacerbate the problems when these problems stem from excessive system complexity.

Engineering a Safer World defines a new accident causality model based on systems thinking that treats safety as a control problem rather than a failure problem. This new causality model, called STAMP, includes traditional component failure accidents but also considers the new factors of importance today. The book introduces more powerful, proactive hazard analysis techniques that can be used early in the design and system engineering process to guide the engineers in creating safer designs and integrating system safety into the system engineering process. It also describes more inclusive accident analysis techniques that illuminate more causal factors than the traditional “blame it all on operator error” approach. In addition to analysis, the book describes how to design computer-intensive systems to be safer and to reduce human error. Finally, it covers how to operate systems in a safer way and how to manage the development and operation of safety-critical systems.

How have the safety techniques recommended in your book impacted those that have implemented them?

The book was published in January 2012. In April of that year, a meeting was held at MIT for users or those interested in learning more. That meeting attracted 250 attendees from 19 countries and from just about every type of safety-critical industry. A year later, another meeting on STAMP and the book was started in Germany, sponsored by two German universities (Stuttgart and Braunschweig). At these meetings, people in space, aviation, defense, healthcare, nuclear power, automobiles, medical devices, road safety, home robots, railroads, gas and oil production, and other applications reported successful use of the techniques and better results than they had obtained with traditional safety engineering approaches on their systems. Comparative studies (done mostly in industry) have shown that our new systems approach is not only more effective, but it also appears to be much cheaper in terms of requiring fewer resources. Although I expected the techniques to be more powerful, the lower cost surprised even me. In several of these comparative studies, where accidents already had occurred in the systems (unbeknownst to the analysts), only our new techniques, and not the traditional ones, identified them. Although it has been out only a little over a year, the book is already being translated into Chinese and Japanese by publishers in these countries.
