Control Assurance: Audits vs Checks vs Field Evidence
Control assurance needs more than audit scores, because leaders need current proof that critical controls still work where real work happens.

Key takeaways
- Separate audit scores, control checks and field evidence because each one answers a different safety decision.
- Use audit scores for management-system discipline, not as proof that fatal-risk barriers are working today.
- Prioritize control checks when leaders need current evidence that critical controls are present, functional and owned.
- Add field evidence to expose work as performed, especially when dashboards are green but reporting is quiet.
- Request an Andreza Araújo safety-culture diagnostic when executive confidence depends on proving control, not protecting a score.
Control assurance is the difference between a safety dashboard that looks calm and a safety system that can prove its barriers still work. Audit scores, control checks and field evidence all have value, but they answer different questions. Treating them as interchangeable is how leaders end up with green reports while risk is quietly accumulating in the work.
The central thesis is simple enough to test in any plant, logistics site or construction project. Audit scores tell leaders whether the management system has evidence. Control checks tell them whether a defined barrier exists and is working today. Field evidence tells them whether the work, as performed by real people under real pressure, matches the story in the procedure, the training file and the monthly dashboard.
Across 25+ years leading EHS in multinational environments, Andreza Araújo has seen that executives often ask for more metrics when they actually need better proof. In her Portuguese title *Diagnóstico de Cultura de Segurança* (Safety Culture Diagnosis), she argues that quantity is not a synonym for quality and commitment in health and safety. That position matters here because a metric pack can grow every month while the organization still has no reliable answer to a more serious question: which critical controls would fail if production pressure increased tomorrow?
Evaluation Criteria For Safety Decisions
The right metric method depends on the decision being made. A board that is deciding whether to fund machine guarding upgrades needs a different signal than an EHS manager who is testing whether permit-to-work quality improved after supervisor coaching. Because the decision changes, the proof standard also changes.
Use five criteria before choosing the method. First, decide whether the question is about compliance, control effectiveness, behavior under pressure, trend movement or executive accountability. Second, define the time horizon, since an annual audit and a weekly field verification cannot carry the same signal. Third, test whether the method can reveal weak signals before injury data appears. Fourth, decide who can challenge the result. Fifth, identify whether the number can be gamed, especially when targets influence bonuses, reputation or promotion.
This is where metric hygiene becomes a governance issue rather than a data-cleaning exercise. If definitions change by site, if missing observations are quietly excluded, or if green status depends on self-reported closure, the dashboard may reward the easiest story instead of the safest work.
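As a minimal illustration of how quietly excluding missing observations can change the story, the sketch below computes a control-check pass rate two ways. The function name, its parameters and the sample counts are hypothetical, chosen only to show the arithmetic.

```python
def pass_rates(planned, completed, passed):
    """Return (reported_rate, verified_rate) for a set of control checks.

    reported_rate ignores planned checks that were never performed.
    verified_rate counts every planned check, so a missing check is
    treated as "not verified" instead of disappearing from the total.
    """
    if planned <= 0 or not 0 <= passed <= completed <= planned:
        raise ValueError("expected 0 <= passed <= completed <= planned, planned > 0")
    reported = passed / completed if completed else 0.0
    verified = passed / planned
    return reported, verified


# Hypothetical month: 20 checks planned, 12 performed, 11 passed.
reported, verified = pass_rates(planned=20, completed=12, passed=11)
print(f"{reported:.0%} vs {verified:.0%}")  # 92% vs 55%
```

Both numbers are arithmetically correct; the governance decision is which denominator the dashboard is allowed to use, and whether that choice is the same at every site.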
Option 1: Audit Scores
Audit scores are useful when leaders need a structured view of management-system discipline. ISO 45001:2018 specifies monitoring, measurement, analysis, performance evaluation and internal audit as part of the occupational health and safety management system, which means audit evidence belongs in the assurance architecture. The weakness begins when a high score is treated as proof that the field is controlled.
An audit score usually measures whether evidence exists, whether procedures are documented, whether training records are complete and whether responsible people can explain the system. Those are legitimate questions. They protect against improvisation, undocumented decisions and the slow erosion of accountability. They also create a common language across sites, which is helpful when a regional EHS director needs comparability across countries.
The trap is that audit scores are often periodic, prepared and heavily mediated by documentation. A site can score well because the right binder exists, because the interviewees are rehearsed, or because the audit samples miss the high-risk work that happens at night, during maintenance or under contractor pressure. As Andreza Araújo often emphasizes in cultural diagnosis work, the official version of safety and the operated version of safety can diverge sharply, especially when the organization has learned to protect the score.
Choose audit scores when the decision is about system maturity, certification readiness, governance coverage or consistency between sites. Do not choose them as the primary proof for serious injury and fatality prevention unless they are paired with real-time verification of the barriers that would prevent the event.
Option 2: Control Checks
Control checks are stronger than audit scores when the question is whether a named barrier is present, functional and owned. A control check does not ask whether the procedure exists. It asks whether the interlock works, whether the isolation was tested, whether the rescue kit is complete, whether the scaffold tag matches the actual condition, or whether the supervisor has authority to stop work when the control is missing.
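One way to make "present, functional and owned" explicit is to record each condition separately, so a check cannot pass on two out of three. The sketch below is an illustration only; the class and field names are assumptions, not a standard schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlCheck:
    """One verification of a named barrier (hypothetical record layout)."""
    control: str
    present: bool          # the barrier physically exists at the task
    functional: bool       # it was tested and worked during the check
    owner: Optional[str]   # named person accountable for the control

    def passed(self) -> bool:
        # All three conditions must hold; a missing owner fails the check.
        return self.present and self.functional and bool(self.owner)


check = ControlCheck("Machine interlock", present=True, functional=True, owner=None)
print(check.passed())  # False: the barrier works, but nobody owns it
```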
This method fits high-risk work because it moves from paperwork to barrier condition. It also improves executive conversations. A plant manager can debate whether a 94 percent audit score is good enough, but it is much harder to normalize a failed control check on a fatal-risk task. The discussion becomes concrete, which is why control verification belongs beside serious injury and fatality (SIF) prevention rather than buried in a generic compliance dashboard.
There is still a failure mode. Control checks become theater when they turn into tick-box routines, especially when the verifier is rushed, lacks technical competence, or feels pressure to avoid stopping production. The article on safety KPIs, bonuses and control checks explains why incentives can distort the signal when the organization rewards green status more than honest escalation.
Choose control checks when the decision involves fatal-risk exposure, contractor work, permit-to-work quality, barrier restoration after incidents, or capital prioritization. The method is weaker for culture diagnosis by itself because it can tell leaders what failed, although it may not explain why people accepted the failure.
Option 3: Field Evidence
Field evidence is the most revealing option when leaders need to know whether work as done matches work as imagined. It includes structured observations, worker interviews, photos of actual conditions, pre-task conversations, near-miss narratives, supervisor debriefs and evidence from deviations that did not become injuries. The point is not to collect more stories, but to test whether the system behaves as designed when real constraints appear.
This is the option most leaders underestimate because it is less tidy than an audit score. It requires judgment, calibration and time in the operation. It can also expose uncomfortable contradictions. The procedure may say that a task requires two people, while the night shift routinely performs it with one. The dashboard may show closed corrective actions, while the same crew keeps building informal workarounds because the original fix slowed the job without reducing risk.
Field evidence is especially valuable for weak signals. The board may see no injuries, but the field may already be showing equipment bypasses, repeated delays in closing high-risk actions, unexplained silence in near-miss reporting, or supervisors who avoid escalation because escalation is interpreted as poor performance. That is why weak-signal metrics for boards should include evidence from the worksite, not only aggregate injury rates.
Choose field evidence when the decision is about cultural truth, operational drift, leadership behavior, procedure usability or the credibility of reported metrics. Its weakness is comparability, since qualitative evidence must be coded carefully if leaders want to compare one site with another. In practice, the solution is not to discard field evidence, but to standardize the sampling method and keep the evidence close to the decision.
Decision Matrix
The comparison below is a practical way to decide which option should lead the assurance process. Most mature organizations need all three, although one method should be primary for each decision.
| Decision need | Audit scores | Control checks | Field evidence |
|---|---|---|---|
| Certification or system discipline | Strong | Supporting | Supporting |
| Fatal-risk barrier confidence | Weak alone | Strong | Strong when sampled near work |
| Executive dashboard credibility | Supporting | Strong for critical controls | Strong for weak signals |
| Culture diagnosis | Weak alone | Supporting | Strong |
| Cross-site comparability | Strong | Moderate if definitions are stable | Moderate if coded consistently |
| Resistance to gaming | Moderate to weak | Moderate | Strong when triangulated |
The matrix creates a practical rule. Audit scores are the best entry point for governance, control checks are the best entry point for high-risk reliability, and field evidence is the best entry point for truth. When leaders confuse those roles, they ask a weak instrument to answer a strong question.
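For teams that keep this matrix in a planning tool, the lookup can be sketched as a small table in code. The ratings below are copied from the matrix; the function name and the rule of reading any rating that begins with "Strong" as a candidate to lead are assumptions made for this illustration.

```python
METHODS = ("Audit scores", "Control checks", "Field evidence")

# Ratings transcribed from the decision matrix, in METHODS order.
MATRIX = {
    "Certification or system discipline": ("Strong", "Supporting", "Supporting"),
    "Fatal-risk barrier confidence": ("Weak alone", "Strong", "Strong when sampled near work"),
    "Executive dashboard credibility": ("Supporting", "Strong for critical controls", "Strong for weak signals"),
    "Culture diagnosis": ("Weak alone", "Supporting", "Strong"),
    "Cross-site comparability": ("Strong", "Moderate if definitions are stable", "Moderate if coded consistently"),
    "Resistance to gaming": ("Moderate to weak", "Moderate", "Strong when triangulated"),
}


def strong_methods(decision_need):
    """Return the methods rated 'Strong' (with or without a condition)."""
    ratings = MATRIX[decision_need]
    return [m for m, r in zip(METHODS, ratings) if r.startswith("Strong")]


print(strong_methods("Culture diagnosis"))              # ['Field evidence']
print(strong_methods("Fatal-risk barrier confidence"))  # ['Control checks', 'Field evidence']
```

Where two methods come back as strong, the conditions attached to the rating (for example "when sampled near work") still have to be read by a person; the lookup only narrows the choice.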
Recommendations By Context
For a board or C-level team, the best monthly view is not a single safety score. It is a three-layer assurance pack that separates system discipline, critical-control status and field truth. The board should see the audit trend, but it should also see failed control checks on fatal-risk work and a short narrative of the weak signals that do not yet appear in injury statistics.
For an EHS manager, control checks should carry more weight than audit scores when resources are limited. A low audit score may reveal system disorder, but a failed control on energized work, confined space, working at height or vehicle-pedestrian separation tells the EHS manager where loss potential is alive today. The article on SIF rate, TRIR and precursor indicators expands this point because severe-risk metrics need a different logic from general recordability.
For a site manager, field evidence should be used to challenge comfort. If the site has low injury rates, high audit scores and little reporting, that silence should not be celebrated. In more than 250 cultural transformation projects supported by Andreza Araújo's team, one recurring pattern is that organizations often find their most important risk information after they stop treating bad news as an embarrassment.
For a regional EHS analyst, statistical tools still matter. A signal that appears in one field visit may be anecdotal, while a signal repeated across sites deserves escalation. That is why statistical process control (SPC), run charts and heat maps should sit beside the assurance model. Trend tools show movement, but field and control evidence explain what the movement means.
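A run chart can separate an anecdote from a sustained shift with very little machinery. The sketch below looks for a long run of consecutive points on one side of the median. The threshold of six points is a commonly used run-chart convention and is treated here as an adjustable assumption; the weekly counts are invented for illustration.

```python
from statistics import median


def longest_run_one_side(values):
    """Longest run of consecutive points strictly above or below the median.

    Points equal to the median are skipped: they neither extend nor
    break a run.
    """
    centre = median(values)
    longest = current = 0
    last_side = 0
    for v in values:
        if v == centre:
            continue
        side = 1 if v > centre else -1
        current = current + 1 if side == last_side else 1
        last_side = side
        longest = max(longest, current)
    return longest


def shift_signal(values, min_run=6):
    """True when the series contains a run of at least min_run points."""
    return longest_run_one_side(values) >= min_run


# Hypothetical weekly counts of failed control checks at one site.
weekly_failures = [2, 3, 2, 3, 2, 3, 5, 6, 5, 6, 5, 6]
print(shift_signal(weekly_failures))  # True: six weeks below, then six above the median
```

A positive signal only says that something moved. Whether it reflects rising risk or healthier reporting is a question for control and field evidence.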
Board Questions That Expose Weak Assurance
A board does not need to become a technical audit team, but it does need better questions. The first question is whether the organization can name its critical controls for fatal-risk scenarios and prove their condition with current evidence. If the answer is a procedure list, the assurance model is still immature.
The second question is whether red signals are increasing because risk is getting worse or because reporting is becoming healthier. Andreza Araújo's critique in *Muito Além do Zero* (Far Beyond Zero) is useful here because chasing the absence of accidents can protect the number instead of protecting life. A company that punishes red signals will eventually get fewer red signals, but that does not mean it got safer.
The third question is whether the same evidence would survive an unannounced visit to the field. This is where many executive dashboards fail. They are built from data that has already passed through filters, summaries and status meetings, which can remove the hesitation, conflict and uncertainty that made the original field signal valuable.
The fourth question is who owns the decision after assurance fails. If a failed control check creates a dashboard note but no resource decision, no design correction and no operating authority, the organization has recorded risk rather than controlled it. That is a governance failure, not a reporting gap.
Where To Start In The Next 30 Days
Start by choosing one fatal-risk scenario and mapping the three proof layers. Do not begin with every site and every metric. Select one scenario such as machine intervention, loading dock traffic, confined-space entry, work at height or energized maintenance. List the audit evidence, the critical controls and the field evidence that would prove whether the work is controlled.
During the first week, clean the definitions. Decide what counts as a control check, what counts as field evidence, who can verify the item and what makes the evidence unacceptable. During the second week, sample the work without warning the team to prepare a performance. During the third week, compare the official score with the control condition and field reality. During the fourth week, present the contradictions to leadership as decisions, not as observations.
The most important output is not a prettier dashboard. It is a sharper conversation about risk. If the audit score is green but field evidence shows shortcuts, leaders should fund supervision, redesign or workload correction. If control checks are failing repeatedly, leaders should stop treating the problem as awareness and identify the constraint that makes the control hard to maintain. If field evidence is silent, leaders should investigate whether people believe it is safe to tell the truth.
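The three if-then rules above can be written down as a small decision helper so that contradictions reach leadership as decisions. This is a sketch under stated assumptions: the record fields, the function name and the threshold for "repeatedly" are hypothetical and should be replaced by the definitions agreed in week one.

```python
from dataclasses import dataclass


@dataclass
class ScenarioEvidence:
    """Evidence gathered for one fatal-risk scenario (hypothetical fields)."""
    audit_green: bool
    shortcuts_observed: bool
    failed_control_checks: int
    field_reports: int


def leadership_decisions(ev, repeat_threshold=2):
    """Translate contradictions between the proof layers into decisions."""
    decisions = []
    if ev.audit_green and ev.shortcuts_observed:
        decisions.append("Fund supervision, redesign or workload correction")
    if ev.failed_control_checks >= repeat_threshold:
        decisions.append("Identify the constraint that makes the control hard to maintain")
    if ev.field_reports == 0:
        decisions.append("Investigate whether people believe it is safe to tell the truth")
    return decisions


evidence = ScenarioEvidence(audit_green=True, shortcuts_observed=True,
                            failed_control_checks=3, field_reports=0)
for decision in leadership_decisions(evidence):
    print("-", decision)
```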
Control assurance works when the organization stops asking one metric to carry every meaning. Audit scores protect system discipline, control checks protect barrier reliability, and field evidence protects truth. The executive task is to keep those meanings separate long enough for the right decision to become visible.
About the author
Andreza Araújo
Safety Culture Expert | Senior EHS Executive
Andreza Araújo is a safety culture expert and senior EHS executive with more than 25 years of experience in environment, health and safety. She is a Civil Engineer and Occupational Safety Engineer from Unicamp, holds a Master's degree in Environmental Diplomacy from the University of Geneva, and completed sustainability studies at IMD Switzerland. Andreza has served in Global Head of EHS roles in Fortune 500 environments, leading cultural transformation programs across multinational operations. She has represented Brazil as a speaker at the United Nations in Paris and has spoken at the International Labour Organization in Turin. She is the author of more than 16 books on safety culture in Portuguese, Spanish, English and German. Her work has earned more than 10 EHS awards, including two recognitions from Indra Nooyi, former PepsiCo CEO.
- Civil & Safety Engineer (Unicamp)
- M.A. Environmental Diplomacy (University of Geneva)
- Sustainability Cert (IMD Switzerland)
- People Management & Coaching (Ohio University)
- UN Paris speaker representative for Brazil
- ILO Turin speaker
- LinkedIn Top Voice
- Indra Nooyi PepsiCo CEO recognition (2x)