When I give speeches around the world, I often poll my audiences about how they define ‘RCA’. The fact is, I get as many answers as people I ask!
Are there definitions out there? Absolutely; there are hundreds of them. But you don't need to read about all of them. Here's what you need to know:
Chapter 1: What is Root Cause Analysis (RCA)?
Chapter 2: What are the Ways to do RCAs?
Chapter 3: How to Convert a Shallow Cause Analysis into a Root Cause Analysis
Chapter 4: How to Make Your RCAs More Effective
Chapter 5: How RCAs Contribute to the Bottom-Line
On the surface this seems quite simple, but unfortunately it is quite complex. Honestly…the term ‘RCA’ (Root Cause Analysis) is quite vague, misleading and easily misinterpreted by those who are not immersed in its use. It is a useless and counter-productive term because there is no universally accepted, standard definition. Therefore, any process or tool someone is using to solve a problem is likely to be labelled ‘RCA’. It could be troubleshooting, brainstorming, or a more structured problem-solving approach such as 5-Whys, fishbone diagrams, causal factor trees or logic trees.
When I train or present around the world, I often poll my audiences about how they define ‘RCA’. The fact is, I get as many answers as I have people to ask. Are there definitions out there? Absolutely! There are hundreds of them. Various regulatory agencies have their own definitions, as do corporations and companies. However, when definitions differ between agencies, corporations and industries, it is hard to measure the effectiveness of ‘RCA’ across the board, because everyone considers whatever they are doing to be ‘RCA’.
Essentially the term ‘RCA’ is a noun these days. The different brands of RCA on the market are merely the adjectives describing different RCA approaches and providers. The brand then becomes the ‘uniqueness’.
Our management (or organizational) systems are the rules and guidelines under which our organizations operate, much like the laws of the land that govern how our countries operate. Since these are created and maintained by humans, they are not flawless. They can be insufficient, inadequate, wrong or even non-existent (for situations unforeseen). We refer to these management system flaws as Latent Root Causes.
Examples include an inadequate torque-wrench calibration system, an accountability system that was weak and rarely enforced, and less-than-adequate training. Latent root causes are always there, lying dormant, waiting to be activated by a human.
With flawed management systems, we feed incomplete and/or inaccurate information to people who must process the information to make their decisions. Ultimately, this will likely result in an inappropriate decision for the situation at hand! We refer to these ‘decisions’ or ‘choices’ as Human Root Causes.
When humans make an inappropriate decision, it is expressed in one of two ways: 1) errors of commission or 2) errors of omission. This means we took an action that was inappropriate (error of commission) or we should have taken an appropriate action and didn't (error of omission).
Examples of these two situations are endless, but an error of commission may be that we closed a valve in a manufacturing operation that we should have left open. An error of omission may be that an ER nurse failed to triage a patient properly and, as a result, the patient died waiting for care in the waiting room.
When humans make decision errors, those errors often result in observable consequences. Up to this point, the error chain has not been observable because it exists only in the mind of the decision-maker. Only after the decision is acted on do the consequences become observable. We refer to these consequences as Physical Root Causes.
Examples of these observable consequences are the tangible failure roots, like a fatigued component, an inhaled substance, a corroded pipe, etc.
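To make these three levels concrete, here is a minimal, illustrative sketch (in Python) of how a single error chain, from latent root to human root to physical root, could be captured as data. The class names, fields and the pump example are my own hypothetical choices for illustration and are not part of any particular RCA methodology or product.

```python
# Hypothetical sketch: one error chain, traced from a management system flaw
# (latent root) through a decision error (human root) to an observable
# consequence (physical root). Names are illustrative only.
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    COMMISSION = "took an inappropriate action"
    OMISSION = "failed to take an appropriate action"


@dataclass
class LatentRoot:
    system_flaw: str        # the flawed management system lying dormant


@dataclass
class HumanRoot:
    decision: str           # the decision or choice the flawed system fed
    error_type: ErrorType


@dataclass
class PhysicalRoot:
    observation: str        # the tangible, observable consequence


@dataclass
class ErrorChain:
    latent: LatentRoot
    human: HumanRoot
    physical: PhysicalRoot


chain = ErrorChain(
    latent=LatentRoot("Torque-wrench calibration system was inadequate"),
    human=HumanRoot("Mechanic over-torqued the bolts", ErrorType.COMMISSION),
    physical=PhysicalRoot("Fatigue crack in the pump shaft"),
)
print(f"{chain.latent.system_flaw} -> {chain.human.decision} -> {chain.physical.observation}")
```

Even a toy structure like this makes the point: every physical root should trace back through a human decision to a management system flaw.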
A common RCA myth among managers is that all RCA methods are the same, when in fact they are NOT. What different managers believe to be an effective RCA method varies enormously. Many RCA techniques place little to no emphasis on establishing all the possible ways a problem could occur, while others expand the user's overall effectiveness with pre-built logic templates intended to put all the possibilities of occurrence on the table for discovery. Verification of each possibility can also range from someone simply saying it happened (weak verification) to reconstruction and testing of each possibility (strong verification).
RCA methods can be shallow or they can be robust; it depends on what management wants to accomplish. Here are some analytical approaches that are often grouped into the category of RCA: troubleshooting, brainstorming, and brainstorming paired with a structured tool.
When something ‘fails’ in our workplaces, do faces come to mind of the people we immediately turn to, to make everything all right? These are our ‘heroes’ who get us back to normalcy quickly, and they get a fair amount of recognition for doing so. However, these individuals are being given positive recognition for being great responders. They are rarely doing any analysis to understand root causes, but they are great at implementing temporary solutions. A progressive management team, under these conditions, would be asking, “Why is this person getting so much practice at responding?” That is where the meat is. This approach is normally attractive because it is quick, inexpensive and demonstrates immediate action.
We can all relate to this analytical technique. When something bad happens, we tend to put very smart people in a room to listen to a summary of what we know happened (the bad outcome). Oftentimes this description embeds hearsay as fact, and the group accepts the hearsay as it moves on to solutions. Many bright people throw out disconnected ideas about what they think happened, and then they move on to action plans. Usually this approach focuses on speed to demonstrate activity, and as a result it is weak on evidence and on analysis down to root causes. Chances are this team will be meeting again, because the failure will recur. This approach takes a little longer than troubleshooting because we are not dealing with an individual but with several individuals, so it requires more give-and-take in discussion.
This approach is very common, and it is essentially brainstorming plus a structured analytical tool such as 5-Whys or a fishbone diagram. The ‘tool’ provides a degree of discipline because it has a series of steps and a structure to follow. This is certainly more progressive than troubleshooting and brainstorming and, when applied properly, could be effective. However, in my 35+ years in this business, I find that this approach is not always applied properly. People are usually under pressure to hurry up, so the time needed to collect evidence is sacrificed. We run with hearsay and assumptions, treat them as facts, and develop and implement recommendations accordingly. Dick Swanson (Owner, Performance Management Initiatives, Inc.) says, “The irony of this association is rooted in the fact that the 5-Why approach was developed by Toyota as a tool for assembly floor supervisors to keep production moving, and not as a tool to identify deep, underlying causes of complex events”.
I will add another potential form of RCA, used by many in industry, which is ‘Trial and Error’. I do not list it with the others because I really don’t consider it an analytical process. This approach just supports the paradigm of ‘if it ain’t broke, don’t fix it!’ This is more akin to applying a crisis maintenance strategy; there really is no analytics going on, we just fix things when they break.
The intent of “true” Root Cause Analysis is to mitigate or eliminate the possibility of recurrence. For this to take place the methodology used must have a problem definition that is accurate and factual. Possible ways the problem can occur must be identified, and each possibility must be verified as true (did happen) or false (did not happen) using sound evidence (not hearsay).
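To illustrate that discipline, here is a minimal, hypothetical sketch: a problem statement, a list of possible causes, and an evidence-backed verdict on each one. The pump scenario, hypotheses and evidence items are invented; the point is only that nothing is accepted or ruled out without recorded evidence.

```python
# Hypothetical sketch of "identify every possibility, then verify each with
# evidence (not hearsay)". All scenario details are invented for illustration.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    description: str                      # one possible way the problem could have occurred
    evidence: List[str] = field(default_factory=list)
    verified_true: Optional[bool] = None  # stays None until evidence supports a verdict

    def verify(self, evidence_item: str, is_true: bool) -> None:
        """Record a piece of evidence and the verdict it supports."""
        self.evidence.append(evidence_item)
        self.verified_true = is_true


problem = "Pump P-101 seal failed within three months of installation"
hypotheses = [
    Hypothesis("Seal was installed incorrectly"),
    Hypothesis("Wrong seal material for the service"),
    Hypothesis("Pump was operated outside its design envelope"),
]

# Each possibility must be proven or disproven with evidence, not hearsay.
hypotheses[0].verify("Installation records and torque readings reviewed", is_true=False)
hypotheses[1].verify("Metallurgical lab report on the failed seal faces", is_true=True)
hypotheses[2].verify("Process historian data for flow and discharge pressure", is_true=False)

print(problem)
for h in hypotheses:
    status = {True: "TRUE", False: "NOT TRUE", None: "UNVERIFIED"}[h.verified_true]
    print(f"  {status:10s} {h.description} ({len(h.evidence)} evidence item(s))")
```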
To look at RCA agnostically, getting away from brands and labels, let’s briefly explore what core steps constitute a valid RCA. I suggest the following:
Time pressure has a huge impact on the effectiveness of an RCA. When anyone is time-pressured to do anything, they will often take short-cuts. In the RCA world, the short-cuts are likely to be taken in the most time-consuming part of an analysis/investigation, which is the data collection phase. So when we take short-cuts on gathering our evidence, we increase the risk of recurrence because we are not operating on facts, just hearsay and assumptions.
In summary, the faster we do an analysis due to time pressures, the more likely we are to do the same analysis again. This is because we are likely to be weak on evidence and our focus is on solutions and not analysis. When RCA is done properly, it does take more time to conduct an effective analysis. However, we should not have to do it again if we did it right the first time!
Many managers underestimate the amount of support needed for a successful RCA system. The paradigm of “Send the candidates to RCA training and they will solve problems” rarely works.
The RCA infrastructure is often not well thought out and when practitioners encounter obstacles, they are not able to complete their RCA successfully. This usually results in abandonment of the practitioner’s internal drive to execute the process correctly.
There are common barriers encountered by newly trained analysts. The student/analyst:
Now that we understand where a failure comes from and how the error chain grows, how can we make our RCA processes more effective? Why do we often seem to be doing RCA on the same events over and over again? Are we not learning from the past? Is it that our RCAs just aren't that good?
Having been an RCA practitioner now for over 35 years working in various industry sectors, my observation is that we have a difficult time looking in the mirror and accepting that we could be part of the problem!
Many organizations seem content with their RCA processes, when their analyses pass some kind of regulatory audit. Passing such an audit or survey means the regulators are off their backs...for now.
However, that is not the true measure of RCA effectiveness and it is misleading. RCA effectiveness should be measured based on quantifiable and meaningful bottom-line metrics that correlate to corporate dashboards or KPIs.
The key to RCA effectiveness is facing the truth, and unfortunately, we are not very good at accepting the truth when it involves our taking an introspective look at our potential contribution to the bad outcome.
The ‘truth’ is embedded in the management systems we spoke about earlier. Oftentimes we focus on the decision-makers and then levy discipline for making a poor decision. However, RCA is not about ‘who’ made the poor decision. We are more interested in why the person felt his or her decision was appropriate at the time.
When we get into the decision-maker's head and understand their reasoning for their decisions, most of the time their rationales are perfectly logical. Their decisions are most often well-intended. More importantly, others would likely make the same decision given the same information and under the same conditions.
Delving this deep brings us right back to the flawed management systems that provided these people such information. These systems are supposed to be in place to help our people make better decisions, so when they are flawed, they are at risk of not performing as intended.
Think about it: if we choose to ignore these deeper issues (because it is easier and more comfortable to do so), then the ‘seeds’ of failure are still planted in our systems. This just means they will be activated by someone else at a later time, and the patient in the hospital, or the operation in the plant, will be at risk once again.
For RCAs to be truly effective, we have to look in the mirror and face the possibility that we could have unintentionally contributed to the bad outcome…that is the only way we will make progress. This type of openness and non-punitive environment is a key principle of a High Reliability Organization (HRO).
Remember, “We NEVER seem to have the time and budget to do things right, but we ALWAYS seem to have the time and budget to do them again!”
At this point our RCA is complete, and now we have to develop, sell and implement our solutions. Remember, RCA is a ‘system’ and not a task. This is yet another critical link in the RCA chain, because if we can't sell the need for our recommendations, all the investigative and analytical work we did was a waste of time (plus we would be less driven to do a great job next time).
As analysts, we have to ask ourselves ‘What is our definition of success?’ for our analysis. Compliance should NOT be the definition of success for an RCA.
Conducting such an assessment on an annual basis will allow us to measure our progress. Such an assessment will identify which sections we are strong in, as well as where we could use improvement. In our weaker areas, we can take corrective actions to shore up the RCA system and help our analysts be the best they can be.
Management can increase the success rate of RCAs by making sure the infrastructure is in place to:
In order for an RCA to be successful, there has to be some type of bottom-line improvement. Something has to get better as a result of your RCA; what is that? Simply clicking a checkbox from a list indicating your RCA is complete is not a measure of success (or shouldn't be). That just means the determination of causes may be complete, but we still have nothing to show for it on the bottom line.
Most RCAs tend to drop off a cliff at this point because there is a lack of accountability for the recommendations. Each recommendation should have a due date and a person assigned to ensure it is completed. Each recommendation should also have a cost/benefit calculation attached to it to measure ROI. This will greatly aid in selling the recommendation to the finance people.
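As a sketch of what that accountability might look like, the snippet below assumes a simple in-house tracking record with an owner, a due date and a cost/benefit estimate per recommendation. The names, dates and dollar figures are invented, and the ROI formula shown (benefit minus cost, divided by cost) is one common convention rather than a prescribed standard.

```python
# Hypothetical recommendation-tracking record: owner, due date and ROI per item.
from dataclasses import dataclass
from datetime import date


@dataclass
class Recommendation:
    action: str
    owner: str             # person accountable for completion
    due: date              # due date
    est_cost: float        # estimated cost to implement
    est_benefit: float     # estimated annual benefit (savings or avoided losses)
    completed: bool = False

    @property
    def roi_pct(self) -> float:
        """Simple return on investment: (benefit - cost) / cost, as a percentage."""
        return (self.est_benefit - self.est_cost) / self.est_cost * 100


recs = [
    Recommendation("Add torque-wrench calibration to the PM program",
                   owner="J. Smith", due=date(2025, 9, 30),
                   est_cost=5_000, est_benefit=60_000),
    Recommendation("Revise the triage protocol and retrain ER staff",
                   owner="A. Jones", due=date(2025, 11, 15),
                   est_cost=20_000, est_benefit=150_000),
]

for r in recs:
    print(f"{r.owner:10s} due {r.due}  ROI {r.roi_pct:5.0f}%  {r.action}")
```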
To complete our loop with RCA being viewed as a ‘system’, closure will be that a measurable, demonstrable benefit has been realized. This means that we have to have tracking mechanisms in place to measure the effectiveness of each recommendation and for the RCA overall.
Rounding out our RCA system, these are some tasks that we should be concerned about when it comes to measuring effectiveness:
As part of our RCA management support systems, Leadership should tell us what their expectations are for the RCA initiative. Oftentimes this is correlated to the corporate dashboards and/or KPIs. We should be able to demonstrate that our RCAs are narrowing the gaps in such corporate metrics.
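‘Narrowing the gap’ can be demonstrated with very simple arithmetic. The sketch below uses an invented availability KPI and invented numbers purely to show the calculation.

```python
# Hypothetical illustration: how far did the RCA effort close the gap to a KPI target?
kpi_target = 98.0    # e.g., target plant availability (%) - invented value
before_rca = 93.5    # baseline before the RCA recommendations were implemented
after_rca = 96.2     # value after implementation

gap_before = kpi_target - before_rca
gap_after = kpi_target - after_rca
gap_closed_pct = (gap_before - gap_after) / gap_before * 100

print(f"Gap to target narrowed from {gap_before:.1f} to {gap_after:.1f} points "
      f"({gap_closed_pct:.0f}% of the gap closed).")
```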
Without such oversight, if someone is not doing their task and there is no negative consequence, they likely never will. They have other priorities and this one may be low, especially when no one is checking to see if the task was done.
As mentioned earlier, one of the greatest benefits of an effective RCA system is the creation of a living and growing knowledge management system. This would be a database of RCA experience, or ‘corporate memory’. It would prevent people from having to do the same RCAs over and over again just because they did not know one had been done in the past. Imagine the cost of re-work when we have to do RCAs over and over again. Who is calculating that number in an organization?
I can assure you, as a CEO myself, that if I see such initiatives saving my company millions of dollars per year, I will continue to invest in them. As an FYI, our documented average ROI for our case study database is over 600% (as published in our books). That will raise the eyebrows of any finance person.
If we are not sharing these successes with the people who participated in our analyses, we should be. They will see that they were part of something successful, and they will be motivated to continue to help in the future as well.
In the end, an analysis is only as good as the analyst!
About the Author
Robert (Bob) J. Latino is CEO of Reliability Center, Inc., a company that helps teams and companies do RCAs with excellence. Bob has been facilitating RCA and FMEA analyses with his clientele around the world for over 35 years and has taught over 10,000 students in the PROACT® methodology.
Bob is co-author of numerous articles and has led seminars and workshops on FMEA, Opportunity Analysis and RCA, and is co-designer of the award-winning PROACT® Investigation Management Software solution. He has authored or co-authored six books related to RCA and Reliability in both manufacturing and healthcare and is a frequent speaker on the topic at domestic and international trade conferences.
Bob has applied the PROACT® methodology to a diverse set of problems and industries, including a published paper in the field of Counter Terrorism entitled, "The Application of PROACT® RCA to Terrorism/Counter Terrorism Related Events."