Abstract: With such a forthright question, this would seem like a relatively easy thing to do. But that is far from the case. A basic Google search on ‘definitions of Root Cause Analysis’ turns up as many varying definitions of RCA as the number of links we open. That is why I personally feel the term ‘RCA’ is useless in the marketplace. In this blog I will discuss the definition I use for RCA as a career analyst/investigator, and explain why it makes sense to me. You decide if it makes sense in your facility.
RCA is so ill-defined that no matter what people use to solve problems at their facilities (e.g., troubleshooting, brainstorming, problem-solving or scribbling on a bar napkin), they will call it ‘RCA’. As a result, in the minds of leadership, all ‘RCA’ approaches are often viewed as equal.
Let me prove my point. I did a quick Google search on RCA and found three varying definitions from credible sources:
In science and engineering, root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. https://en.wikipedia.org/wiki/Root_cause_analysis
Root cause analysis (RCA) is defined as a collective term that describes a wide range of approaches, tools, and techniques used to uncover causes of problems. Some RCA approaches are geared more toward identifying true root causes than others, some are more general problem-solving techniques, and others simply offer support for the core activity of root cause analysis. https://asq.org/quality-resources/root-cause-analysis
Root cause analysis (RCA) is a systematic process for identifying “root causes” of problems or events and an approach for responding to them. RCA is based on the basic idea that effective management requires more than merely “putting out fires” for problems that develop, but finding a way to prevent them. https://des.wa.gov/services/risk-management/about-risk-management/enterprise-risk-management/root-cause-analysis
From these definitions, we see how wide the range of interpretations is: some are very generic, and others are more specific. But even more noteworthy, we see how ‘RCA’ can be legitimately misinterpreted, to the RCA analyst’s advantage.
For instance, in these definitions we hear words like:
All of these terms can be interpreted to be compliant with these definitions.
What if we had an unexpected outage due to a pump failure? Our ‘RCA’ team concluded ‘the’ root cause was a fatigued bearing (physical root or PR). Their corrective action was to replace the bearing using another manufacturer or a different type of bearing. Would that be a compliant ‘RCA’ given these definitions? Perhaps.
What if in the same case, we drill a little deeper and find that the fatigue is due to an original misalignment issue (human root or HR)? Would that be classified as a compliant ‘RCA’ given these definitions? Perhaps.
Drilling a little deeper, we strive to understand why someone would align the way they did (latent root or LR). We find in our evidence-based search
In each of these cases, would they be compliant with the above definitions? Likely so because the definitions are vague enough to be left to interpretation. We have to be realistic that there is no perfect definition, and all will be left to some level of misinterpretation. The best we can do is minimize the risk of misinterpretation, while maintaining the effectiveness of the RCA.
I will propose a different definition for RCA that we use at Reliability Center, Inc. (RCI):
"The establishing of logically complete, evidence-based, tightly coupled chains of factors from the least acceptable consequences to the deepest significant underlying causes."
Even when I look at this definition, it looks too complex and it uses intimidating engineering jargon. So, I simplified it. I broke it down into five simple components for us each to remember:
Let’s look to Figure 1 to dissect this definition. Consider this logic tree as a graphical reconstruction of an undesirable outcome of some kind. (Full disclosure: it is just an abbreviated view for example’s sake, and is not as linear in true practice. Here is a link to a full video case if interested: https://www.youtube.com/watch?v=1vnsUxofIUg&t=519s.)
The ‘Event’ is the undesirable outcome. The ‘Modes’ are the facts accumulated from the scene that need to be explained. The ‘Hypotheses’ (H) reflect the exploration of logic to explain the Modes (the facts).
When we drill down past the Modes, we are exploring the physical nature of the failure or the ‘failure physics’. We lead this exploration with the question ‘How Can?’
Figure 1. Basic Logic Tree Representation
This brings us to our 1st RCA definition component:
This is the difference between asking ‘How Can’ and ‘Why’.
Think about this using a detective metaphor: if we ask ourselves ‘How a crime occurs’ versus ‘Why a crime occurs’, wouldn’t the answers be different? The use of ‘How Can’ to explore the physics of failure is appropriate, because the physical sciences tend to have a more finite range of possibilities (e.g., how can fatigue occur). We are not seeking just one linear answer; we are seeking all the possibilities that could have occurred. This is because failure is not linear, and normally multiple failure pathways converge at some point in time to cause a bad outcome.
As RCA analysts we are continually, visually recreating the events that occurred in our minds. This is just like the flashbacks we see on crime shows like CSI, where they play out a hypothesis. As RCA analysts, we are doing the same thing, like rolling back a video recording of the failure in short increments of time.
Notice, when we arrive at the decision maker (HR), our question shifts from ‘How Can’ to ‘Why’. I am not interested in ‘How Could’ someone make a decision, as the potential answers are infinite. I am interested in why, at that time, their decision seemed appropriate. That is very specific reasoning. At this point in the logic tree, the decision reasoning point, we switch from using deductive logic to inductive logic.
Validating all hypotheses using sound evidence as opposed to hearsay.
How likely is it for a lawyer to win their case in court when their primary evidence is hearsay? How often do we see RCAs presented to us (or leadership) that are full of assumptions and hearsay? To me, if we do not have sound evidence to back up our hypotheses, IT IS NOT AN RCA! This is a critical element that is often missed, because most of us are time-pressured to complete our RCAs. What takes the most time when doing an RCA? Collecting the evidence! Therefore, when that time pressure is applied, we are forced to take shortcuts, usually in the form of not properly validating our hypotheses.
In an effective RCA, each hypothesis has a verification log entry that includes a:
This is basically a ‘chain-of-custody’ approach applied to an RCA.
The utilization and expression of linear/non-linear, cause-and-effect logic.
In Figure 1, when reconstructing logic, level-to-level represents a cause-and-effect relationship. That is what ‘tightly coupled’ means; we can directly correlate logic from one level to another. So, with evidence, I can link the deficient management system (LR) to their direct influence on the decision-maker’s decision (HR). Then I can link the consequences of the decision to their physical, observable impacts (PR)…or to the physics of failure.
This ‘coupling’ is different from categorical RCA approaches like fishbone diagrams (see Figure 2) that use cause categories to explore. Within these categories, we are encouraged to brainstorm what causes could have contributed to the overall undesirable outcome. Using this approach does not reflect direct cause-and-effect relationships.
Figure 2. Conventional 6M Fishbone Diagram Expression
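To make the contrast with a categorical approach concrete, here is a minimal Python sketch of the logic tree idea from Figure 1. It is only an illustration, not the PROACT® software or RCI's actual method. The class name, field names, and the `verified_chains` function are hypothetical, and so are the evidence strings and the second (unverified) hypothesis; only the pump example and the Event, Mode, PR, HR and LR levels come from the text above. The sketch keeps a chain only if every link in it is backed by evidence, which is one simple way to express 'evidence-based' and 'tightly coupled' together.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One block in the logic tree (hypothetical structure for illustration)."""
    label: str
    level: str                      # "Event", "Mode", "PR", "HR" or "LR"
    evidence: Optional[str] = None  # None means the hypothesis was never verified
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a lower-level cause and return it so calls can be chained."""
        self.children.append(child)
        return child


def verified_chains(node: Node, path: Tuple[Node, ...] = ()) -> Iterator[List[str]]:
    """Yield each top-to-bottom chain in which every node has evidence."""
    if node.evidence is None:
        return  # an unverified link breaks the chain (hearsay is not enough)
    path = path + (node,)
    if not node.children:
        yield [f"{n.level}: {n.label}" for n in path]
        return
    for child in node.children:
        yield from verified_chains(child, path)


# The pump example from the article; evidence strings are invented placeholders.
event = Node("unexpected outage", "Event", evidence="production log")
mode = event.add(Node("pump failure", "Mode", evidence="failed pump on scene"))
bearing = mode.add(Node("fatigued bearing", "PR", evidence="lab report"))
align = bearing.add(Node("misalignment", "HR", evidence="alignment records"))
align.add(Node("deficient management system", "LR", evidence="procedure review"))
# A second hypothesis that nobody validated, so it is dropped from the results.
mode.add(Node("bearing overload", "PR", evidence=None))

chains = list(verified_chains(event))
for chain in chains:
    print(" -> ".join(chain))
```

A fishbone diagram would store the same labels under cause categories, with no parent-to-child links to walk, so there would be no chain to verify level by level.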
The threshold or trigger which initiates an RCA in an organization.
This will vary from company to company based on different drivers such as regulatory requirements and internal KPIs. However, such triggers are usually reactive in nature, as the thresholds involve serious consequences like significant losses, injuries/deaths, and regulatory violations.
I would encourage analysts to lower the RCA triggers to include proactive opportunities, such as:
The point at which drilling down in an RCA ceases to be value-added.
This is a question we often hear in training: ‘When do we stop drilling down?’ It is a fair and legitimate question because one can take an analysis back to Adam and Eve if they want, but at what point is it non-value added?
My simple rule of thumb is that when the corrective actions involve going outside the fence (the boundaries of the facilities) we may not have control of the fix. We can control fixing systems in our organizations, but we can’t control socio-technical factors like changing regulations and laws. That doesn’t mean they shouldn't be addressed, but it does mean that we can hand off those corrective actions to people that can effectively address them. We want to focus on implementing fixes that we can control.
There are many definitions of RCA in the marketplace and many of them have different purposes. For instance, regulatory definitions drive compliance. Vendors’ definitions often have a commercial variant related to their proprietary approach. Plus, there are even more definitions from researchers and academics, which often reflect theory versus practice. So, it’s a buyer beware market where ‘RCA’ is concerned. In the end, use what works best for you!
Our recommendation is that the focal point of every RCA effort be its EFFECTIVENESS. Unfortunately, just being RCA compliant does not mean your RCA effort is effective. Those in the field know what I’m talking about.
We at RCI utilize this particular RCA definition, because we feel it represents the effectiveness of a holistic RCA system. It is not related to our commercial products alone, as it can be applied to any form of ‘RCA’.
Collectively, as RCA professionals, we must unite to defeat this paradigm:
‘We NEVER seem to have the time and budget to do RCA right, but we ALWAYS seem to have the time and budget to do RCA again’!
About the Author
Robert (Bob) J. Latino is CEO of Reliability Center, Inc., a company that helps teams and companies do RCAs with excellence. Bob has been facilitating RCA and FMEA analyses with his clientele around the world for over 35 years and has taught over 10,000 students in the PROACT® methodology.
Bob is co-author of numerous articles, has led seminars and workshops on FMEA, Opportunity Analysis and RCA, and is co-designer of the award-winning PROACT® Investigation Management Software solution. He has authored or co-authored six (6) books related to RCA and Reliability in both manufacturing and healthcare and is a frequent speaker on the topic at domestic and international trade conferences.
Bob has applied the PROACT® methodology to a diverse set of problems and industries, including a published paper in the field of Counter Terrorism entitled, "The Application of PROACT® RCA to Terrorism/Counter Terrorism Related Events."