How 250+ Safety Projects Turned RCA Into Control Restoration
A case-study article on how incident investigation becomes useful when RCA produces restored controls, named owners, and field verification.

Key takeaways
- RCA becomes useful only when it restores a failed or missing control, not when it produces a longer cause list.
- In more than 250 cultural-transformation projects supported by Andreza Araujo's team, the recurring weakness is not investigation effort but weak conversion into field decisions.
- The case pattern shows why corrective actions need owners who control budget, engineering, staffing, or supervision, not only an EHS coordinator who tracks due dates.
- James Reason's latent-failure logic helps keep investigations focused on design, management, and supervision conditions rather than the last person in the sequence.
- A 30 to 60 day verification loop is the difference between action closure and risk reduction.
- The strongest post-incident review asks which control must be restored today, which control must be redesigned, and which decision allowed the weakness to stay hidden.
Incident investigation can look disciplined while leaving the operation almost unchanged. The team builds a timeline, interviews witnesses, names causes, opens actions, and files the report. Thirty days later, the same weak control is still present on night shift, the same contractor still receives unclear instructions, and the same supervisor still has no authority to stop the condition that made the event credible.
This case-study article uses a pattern observed across more than 250 cultural-transformation projects supported by Andreza Araujo's team. The proprietary lesson is blunt: RCA becomes safety work only when it restores controls. Without that conversion, investigation becomes a well-written explanation of yesterday's exposure.
The common market advice says organizations need better root-cause analysis methods. Sometimes they do. The bigger gap is downstream. Many companies already know enough to act, but the action is written too softly, assigned too low, verified too early, or disconnected from the control that actually failed.
Initial scenario: RCA was complete, but exposure stayed alive
In many organizations, the incident file is cleaner than the worksite. That was the recurring starting point in the transformation projects behind this case pattern. Reports contained causal trees, photos, witness statements, corrective-action tables, and meeting minutes, yet field observations still found missing isolation checks, incomplete handover, weak permit review, and supervisors managing risk through memory.
The failure was not laziness. It was a governance problem. EHS teams were asked to coordinate investigations, but operations, engineering, maintenance, procurement, and senior leaders controlled the decisions that would actually restore the failed barriers. When every action landed in the same tracker, the action list grew while control authority stayed blurred.
James Reason's work on organizational accidents gives this pattern a technical anchor. Accidents emerge when active failures meet latent conditions in design, supervision, maintenance, planning, and management systems. An RCA that names only the visible act can satisfy the report while leaving the latent condition intact.
Andreza Araujo's Portuguese title Sorte ou Capacidade, translated as Luck or Capability, reinforces the same point from a safety-culture perspective. When an organization treats an event as bad luck or individual failure, it misses the systemic layers that either protected people or allowed exposure to reach them.
The decision: stop treating closure as the finish line
The decision that changed the investigation rhythm was simple in wording and hard in practice. The team stopped asking whether the action was closed and started asking whether the control was restored, redesigned, or replaced.
That shift changed the language of the review. A weak action said, "retrain operators." A stronger action asked which control allowed the hazardous task to depend on memory, attention, or informal supervision. A weak action said, "review procedure." A stronger action asked whether the procedure matched the actual task, especially during maintenance, cleaning, contractor work, or production recovery.
The distinction matters because incident investigation often rewards visible activity. Training sessions, toolbox talks, procedure revisions, and new signatures are easy to count. Control restoration is harder because it requires field evidence, named authority, and a test of whether the corrected condition survives real work.
In Safety Culture Diagnosis: Learn how to do your own, Andreza Araujo treats diagnosis as a comparison between declared systems and lived practice. The same diagnostic discipline belongs inside RCA. If the organization cannot show how the work changed, it has not finished learning from the event.
Execution: build the control-restoration loop
The execution model had 5 parts. First, every significant investigation had to identify the failed or missing control in plain language. The team could still use causal methods, but the final review had to answer which barrier was absent, weak, bypassed, misunderstood, unavailable, or unsupported by leadership decisions.
Second, action ownership moved to the role that controlled the barrier. Engineering owned design corrections. Maintenance owned safeguard reliability. Operations owned task execution and supervision. Procurement owned contractor and supplier requirements. Senior leadership owned budget, shutdown time, risk acceptance, and escalation when the required control competed with production pressure.
Third, each action needed evidence criteria before closure. A completed purchase order, attendance list, or procedure revision was not enough for high-consequence exposure. The evidence had to show that the control existed in the field, that exposed workers understood it, and that supervisors knew what to do when it was unavailable.
Fourth, serious events and SIF-potential near misses received a temporary-control review before permanent solutions were finished. That prevented the organization from keeping the same exposure open for weeks while the formal investigation moved through approvals.
Fifth, the team added a 30 to 60 day verification point. The timing mattered because many actions look effective immediately after an event, when attention is high. The real test happens after schedules tighten, contractors rotate, maintenance backlog returns, and the site stops behaving as if an investigation is still watching.
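For teams that keep actions in a tracker, the five parts above can be sketched as a small record plus two checks. This is an illustration only, not the tooling used in the projects described here. The names `ControlAction`, `open_issues`, and `verification_window` are hypothetical, and counting the 30 to 60 days from the closure date is an assumption, since the article does not fix the starting point.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class ControlAction:
    failed_control: str                       # part 1: the control, in plain language
    owner_role: str                           # part 2: role that controls the barrier
    evidence_required: set[str]               # part 3: criteria agreed before closure
    evidence_seen: set[str] = field(default_factory=set)
    sif_potential: bool = False               # serious event or SIF-potential near miss
    temporary_control: Optional[str] = None   # part 4: interim protection, if any


def open_issues(action: ControlAction) -> list[str]:
    """Return the reasons this action is not yet ready for closure."""
    issues = []
    if not action.failed_control.strip():
        issues.append("failed or missing control not named")
    if not action.owner_role.strip():
        issues.append("no owner with authority over the control")
    missing = action.evidence_required - action.evidence_seen
    if missing:
        issues.append("evidence missing: " + ", ".join(sorted(missing)))
        # Exposure is still open, so high-potential events need interim protection.
        if action.sif_potential and not action.temporary_control:
            issues.append("exposure still open with no temporary control")
    return issues


def verification_window(closed_on: date) -> tuple[date, date]:
    """Part 5: earliest and latest dates for the field verification.

    Assumption: the 30 to 60 days are counted from the closure date.
    """
    return closed_on + timedelta(days=30), closed_on + timedelta(days=60)


action = ControlAction(
    failed_control="Isolation point not verified before maintenance",
    owner_role="Maintenance",
    evidence_required={
        "control present in field",
        "workers understand it",
        "supervisors know escalation",
    },
    evidence_seen={"control present in field"},
    sif_potential=True,
)
print(open_issues(action))
print(verification_window(date(2024, 3, 1)))
```

The point of the sketch is the order of the checks: an action with missing field evidence stays open no matter how many documents are attached, and a high-potential exposure with no interim protection is flagged separately.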
Measured result: the investigation changed what leaders could see
The measurable result in this case pattern was not a claim that every company reduced incidents by the same percentage. That would be false precision. The anchored fact is that Andreza Araujo's work has reached 250+ companies and 30+ countries, and the repeated gain in these projects was visibility: leaders could see which incidents had restored controls and which incidents had only closed actions.
That visibility changed review meetings. Instead of scanning overdue items, leaders could ask whether a serious-risk barrier was verified, who owned the decision, which temporary control protected the site, and what evidence proved the exposure had changed. A report that once ended with a corrective-action table became a governance conversation.
The table below shows how the case pattern changed the investigation output.
| Investigation output | Paperwork model | Control-restoration model |
|---|---|---|
| Cause statement | Explains the event | Names the weak or missing control |
| Action owner | Often EHS or the direct supervisor | The role with authority over the control |
| Closure evidence | Training record, signature, or revised file | Field verification and supervisor escalation test |
| Leadership review | Checks overdue actions | Challenges unresolved serious-risk exposure |
| Learning value | Preserves the report | Changes the work condition that made harm credible |
Generalizable lesson one: the failed control is more useful than the root cause label
Root cause labels can become too abstract. "Lack of training," "poor communication," and "human error" do not tell a plant manager what must be restored today. A failed-control statement is more operational because it links the event to a barrier that should have interrupted the path to harm.
For example, after an energy-isolation event, the useful question is not only why the worker made a mistake. The investigation should ask whether the isolation point was identifiable, whether the verification step was physically possible, whether supervision accepted shortcuts, and whether the permit system forced a real check before work began.
This is why barrier analysis before RCA is often more useful than adding another cause category. It keeps the team close to the exposure and prevents the report from drifting into generic behavioral language.
Generalizable lesson two: action ownership must match decision authority
Weak investigations assign high-consequence actions to people who can only remind others. That creates a polite failure pattern. The EHS coordinator follows up, the supervisor promises attention, the due date moves, and the original decision power remains untouched.
The control-restoration model forces a different question. Who can actually change the condition? If the answer is engineering, maintenance, operations, procurement, or senior leadership, the action should sit there. EHS can support method, verification, and challenge, but EHS cannot own every control that other functions create or weaken.
This ownership discipline also protects supervisors. A frontline leader should not carry responsibility for a control that requires capital, staffing, equipment redesign, or supplier enforcement outside that leader's authority. Responsibility without authority produces blame in a more administrative form.
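One way to make the ownership question testable is to compare each action's assigned owner with the function that holds authority over that kind of decision. The sketch below is a hedged illustration: the `AUTHORITY` mapping paraphrases the ownership split described in the execution model, while the decision-type labels, the `misassigned` helper, and the sample actions are hypothetical.

```python
# Hypothetical mapping from decision type to the function with authority over it.
AUTHORITY = {
    "design correction": "Engineering",
    "safeguard reliability": "Maintenance",
    "task execution and supervision": "Operations",
    "contractor and supplier requirements": "Procurement",
    "budget, shutdown time, risk acceptance": "Senior leadership",
}


def misassigned(actions):
    """Return (description, assigned owner, expected owner) for mismatched actions."""
    findings = []
    for description, decision_type, owner in actions:
        expected = AUTHORITY.get(decision_type)
        if expected is None:
            # Unknown decision types are surfaced rather than silently accepted.
            findings.append((description, owner, "unmapped decision type"))
        elif owner != expected:
            findings.append((description, owner, expected))
    return findings


# Illustrative sample data, not taken from any real investigation.
actions = [
    ("Redesign guard interlock", "design correction", "EHS"),
    ("Restore relief valve inspection", "safeguard reliability", "Maintenance"),
    ("Fund shutdown for isolation upgrade",
     "budget, shutdown time, risk acceptance", "Frontline supervisor"),
]

for description, owner, expected in misassigned(actions):
    print(f"{description}: assigned to {owner}, authority sits with {expected}")
```

A check like this does not replace judgment, but it shows quickly how many high-consequence actions sit with someone who can only remind others.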
Generalizable lesson three: verification must happen after attention fades
Immediately after an incident, everyone behaves better. Leaders visit the area, supervisors brief the crews, and workers know the topic is visible. If verification happens only during that period, the organization may mistake temporary attention for permanent control.
A 30 to 60 day verification point gives the action a harder test. The reviewer should observe the task, ask exposed workers how the control works, check whether supervisors know the escalation rule, and confirm that maintenance or inspection evidence matches the claim made in the investigation report.
This connects directly to testing corrective-action effectiveness in 30 days. The investigation is not finished when the action owner uploads evidence. It is finished when field conditions show that risk has been reduced.
What to apply in your next investigation
Start with the next serious incident or SIF-potential near miss. Before the RCA meeting ends, require one sentence that names the failed or missing control. Then require one owner whose authority matches the control, one temporary protection if exposure remains open, and one verification method that will be checked after normal work pressure returns.
Do not rebuild the entire investigation program at once. Select the 10 most consequential open actions from the last quarter and review them against 4 questions: which control did this action restore, who owns the decision, what field evidence proves it works, and what escalation happens if the control is missing again?
If the answer is unclear, the action is not mature enough for closure. It may be administratively complete, but it is not yet a control decision. That is the difference between an RCA file and a safer operation.
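The four-question review can be run as a simple screen over an exported action list. The sketch below assumes each action is a dictionary with a hypothetical numeric `consequence` score and one free-text field per question. The field names and the sample records are illustrative, not a prescribed format.

```python
# One field per review question (field names are hypothetical).
QUESTIONS = (
    "control_restored",   # which control did this action restore?
    "decision_owner",     # who owns the decision?
    "field_evidence",     # what field evidence proves it works?
    "escalation_rule",    # what happens if the control is missing again?
)


def unanswered(action):
    """Return the questions that have no usable answer for this action."""
    return [q for q in QUESTIONS if not str(action.get(q) or "").strip()]


def review(actions, limit=10):
    """Screen the `limit` most consequential actions; return those not ready to close."""
    ranked = sorted(actions, key=lambda a: a["consequence"], reverse=True)[:limit]
    not_ready = {}
    for action in ranked:
        gaps = unanswered(action)
        if gaps:
            not_ready[action["id"]] = gaps
    return not_ready


# Illustrative sample data.
open_actions = [
    {"id": "A-101", "consequence": 5,
     "control_restored": "Energy isolation verification",
     "decision_owner": "Maintenance manager",
     "field_evidence": "", "escalation_rule": None},
    {"id": "A-102", "consequence": 3,
     "control_restored": "Permit review before hot work",
     "decision_owner": "Operations manager",
     "field_evidence": "Night-shift observation",
     "escalation_rule": "Stop work and call area manager"},
]

print(review(open_actions))
```

An empty or missing answer is treated the same way as an unclear one: the action stays out of closure until someone with authority fills the gap.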
Every incident file that closes without restored controls teaches the organization to document harm without changing the conditions that made harm possible.
Andreza Araujo's team helps organizations connect incident investigation, safety culture diagnosis, and control governance so post-incident learning reaches the worksite. To discuss how this applies to your operation, contact Andreza Araujo's team.
About the author
Andreza Araújo
Safety Culture Expert | Senior EHS Executive
Andreza Araújo is a safety culture expert and senior EHS executive with more than 25 years of experience in environment, health and safety. She is a Civil Engineer and Occupational Safety Engineer from Unicamp, holds a Master's degree in Environmental Diplomacy from the University of Geneva, and completed sustainability studies at IMD Switzerland. Andreza has served in Global Head of EHS roles in Fortune 500 environments, leading cultural transformation programs across multinational operations. She has represented Brazil as a speaker at the United Nations in Paris and has spoken at the International Labour Organization in Turin. She is the author of more than 16 books on safety culture in Portuguese, Spanish, English and German. Her work has earned more than 10 EHS awards, including two recognitions from Indra Nooyi, former PepsiCo CEO.
- Civil & Safety Engineer (Unicamp)
- M.A. Environmental Diplomacy (University of Geneva)
- Sustainability Cert (IMD Switzerland)
- People Management & Coaching (Ohio University)
- UN Paris speaker representative for Brazil
- ILO Turin speaker
- LinkedIn Top Voice
- Indra Nooyi PepsiCo CEO recognition (2x)