Observation Quality: 6 Blind Spots in Safety Metrics

Observation quality exposes whether safety conversations are reducing risk or merely producing comfortable dashboards for leaders.

By Andreza Araújo | May 24, 2026 | 7 min read | Updated May 31, 2026

Key takeaways

01. Audit observation quality by reading the decision changed by the observation, not only the number of cards submitted by each team.
02. Separate hazard spotting from risk reduction, because a dashboard can look active while exposure remains untouched in routine work.
03. Test supervisor coaching quality through evidence, escalation, and worker voice, since weak dialogue converts observation into inspection theater.
04. Connect observation quality to SIF potential, corrective-action closure, and control verification before executives treat volume as safety maturity.
05. Use Andreza Araujo's safety culture work to turn observation metrics into field learning, practical accountability, and stronger prevention.

Observation quality is the difference between a safety metric that helps leaders see work as it is and a metric that rewards people for filling out another form. Many companies celebrate observation volume because it is visible, easy to graph, and politically comfortable, even when leaders have not tested the pattern with statistical process control (SPC). The harder question is whether the observation changed a decision before someone was hurt. Observation quality should also be defined in a safety metric dictionary before volume pressure turns it into a cosmetic indicator.

Observation quality also depends on knowing when observation is the wrong lead tool, which is why the comparison of safety coaching, toolbox talks, and behavioral observation matters before leaders interpret the metric. A related active care and observation quality case shows why changed conditions matter more than card volume.

That question matters because safety observations often become a substitute for risk management. A plant can collect thousands of cards and still miss the exposure that will produce the next serious injury, especially when observers are trained to notice housekeeping, PPE, and simple behaviors while high-energy tasks stay outside the conversation.

Across 25+ years leading EHS in multinational operations, Andreza Araujo has observed that the strongest safety cultures do not ask only how many conversations happened. They ask what the conversation revealed, which control was challenged, and whether the supervisor had the authority to remove the exposure. That is why observation quality belongs in the same dashboard as SIF potential, DART rate, corrective-action closure, and control verification.

In *Safety Culture: From Theory to Practice*, Araujo argues that culture appears in routine choices, not in slogans. Observation metrics should therefore reveal routine choices. If they only prove that people are busy with cards, they may hide the same operational silence that makes safety metrics look too clean, including presenteeism at work when attendance masks reduced readiness.

1. Observation volume is not observation quality

The first blind spot appears when leaders treat volume as proof of maturity. A rising observation count may mean workers trust the process, although it may also mean the company created a quota that people satisfy with harmless entries. The number alone cannot separate learning from paperwork.

A quality observation contains enough detail for a decision. It names the task, energy source, exposure, existing control, gap, conversation, and next action. If a record says only "unsafe act observed" or "employee reminded," it does not help the EHS manager understand what changed in the field.

The practical test is simple. Pull twenty observations from the last month and ask whether a supervisor could act on them without interviewing the observer again. If most records require reconstruction, the metric is counting activity rather than usable safety intelligence.
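The twenty-record test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed tool: the seven fields mirror the elements a quality observation should name, while the dictionary keys, the `is_actionable` and `audit_sample` helpers, and the reading of "most" as more than half are all hypothetical choices.

```python
import random

# Seven elements a decision-ready record should name (taken from the article).
# The key names themselves are hypothetical.
REQUIRED_FIELDS = (
    "task", "energy_source", "exposure", "existing_control",
    "gap", "conversation", "next_action",
)


def is_actionable(record):
    """True when every required field holds non-blank text."""
    return all(str(record.get(field) or "").strip() for field in REQUIRED_FIELDS)


def audit_sample(records, sample_size=20, seed=None):
    """Draw a random sample and report the share a supervisor could act on."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    actionable = sum(is_actionable(r) for r in sample)
    share = actionable / len(sample) if sample else 0.0
    return {
        "sampled": len(sample),
        "actionable": actionable,
        "share": share,
        # Assumption: "most records require reconstruction" means over half.
        "counting_activity": bool(sample) and share < 0.5,
    }
```

A completeness check like this only flags empty fields. Whether the text inside them is specific enough to act on still needs a human reader, which is why the sample stays small.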

This is where many behavior-based programs lose credibility. The existing article on behavioral observation failures explains the broader BBS trap. Observation quality narrows the issue to the dashboard, because a weak metric can keep the program alive long after the field has stopped learning from it.

2. Low-risk observations can bury SIF exposure

The second blind spot is severity blindness. Teams often report easy findings because they are socially safe to mention. A missing glove, an unmarked walkway, or a cable on the floor deserves attention, but it should not crowd out suspended loads, line breaking, confined space entry, or energized work.

Heinrich's and Frank Bird's accident pyramids still help leaders think about precursor events, but they become dangerous when every low-consequence observation receives the same weight. The real question is not whether the event is frequent. The question is whether the exposure contains enough energy to change a life.

A strong observation-quality score therefore includes SIF potential. It asks whether the observer recognized fatal energy, whether the control was verified, and whether escalation happened fast enough. Without that layer, the dashboard can improve while serious risk remains untouched.

Executives should read this beside leading indicators that TRIR will never show. Observation quality is not a replacement for those indicators. It is one of the filters that tells leaders whether the organization is seeing the right work.

3. Generic comments destroy decision value

The third blind spot is language poverty. Comments such as "be careful," "use PPE," or "follow procedure" look harmless, yet they erase the operational detail that a manager needs. They also push blame toward the worker because the record says nothing about design, workload, supervision, or missing resources.

James Reason's work on active and latent failures is useful here because it reminds leaders that visible behavior is rarely the whole story. When an observation describes only the person's act, it leaves out the conditions in which that act made sense. Araujo's Portuguese title *A Ilusão da Conformidade*, or *The Illusion of Compliance*, makes the same point through a culture lens.

Quality improves when the form forces context. A good record might say that the operator bypassed the marked pedestrian route because the only available path was blocked by staging material for maintenance. That sentence gives leaders a design and planning problem, not a lecture topic.

The metric should reward specificity. If observers learn that detailed context counts more than card volume, the conversation changes because people start looking for the condition behind the behavior.

4. Supervisor coaching can turn into inspection theater

The fourth blind spot sits in the conversation itself. Observation is supposed to create dialogue, but many supervisors conduct it as a small inspection. They identify the deviation, issue a reminder, and move on before the worker can explain pressure, confusion, or missing support.

Andreza Araujo's *Vamos a Hablar?* methodology treats behavioral observation as a conversation about risk, not a hunt for fault. That distinction matters because the worker often holds the missing context. If the supervisor does not ask, the observation becomes a performance of control rather than a source of learning.

Quality metrics should therefore include coaching evidence. Did the supervisor ask what made the behavior likely? Did the conversation identify a barrier that was absent or weak? Did the worker help define the action? These questions show whether observation is building capability or only reinforcing hierarchy.

A useful dashboard can sample conversation notes and classify them as inspection, reminder, coaching, or risk redesign. This connects directly to safety conversations that change behavior, since the script matters less than the supervisor's ability to hear operational truth.
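One way to put that sampling on a dashboard is a plain tally of reviewer-assigned labels. The sketch below assumes a person has already read each sampled note and assigned one of the four categories; the function name and the output layout are hypothetical.

```python
from collections import Counter

# The four categories named in the article.
CATEGORIES = ("inspection", "reminder", "coaching", "risk redesign")


def conversation_mix(labels):
    """Return the share of sampled notes in each category.

    `labels` is a list with one reviewer-assigned category per sampled note.
    Unknown labels raise ValueError so typos do not silently skew the mix.
    """
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(labels)
    return {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}
```

Keeping the classification manual is deliberate in this sketch. The judgment about whether a supervisor listened belongs to a reviewer, and the code only makes the resulting mix visible.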

5. Closure without verification creates false confidence

The fifth blind spot appears after the observation is closed. Many systems allow closure when an action owner writes "completed," although no one checks whether the exposure was removed. The dashboard then reports progress, while the task continues under the same weak control.

Observation quality needs a verification layer. If the action was to repair a guard, change a traffic route, revise a permit step, or remove a stored-energy exposure, someone must verify the control in the field. A photo, a supervisor sign-off, or a repeated observation trend can support closure, but the evidence must match the risk.
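The idea that evidence must match the risk can be expressed as a closure rule. The following is a sketch only: the evidence types come from the paragraph above, but the two risk levels, the mapping between them, and the `can_close` helper with its dictionary keys are illustrative assumptions that each organization would need to set for itself.

```python
# Evidence types mentioned in the article, plus direct field verification.
# Which evidence is enough for which risk level is an illustrative assumption.
ACCEPTED_EVIDENCE = {
    "low": {
        "photo", "supervisor_sign_off",
        "repeat_observation_trend", "field_verification",
    },
    "sif_potential": {"field_verification"},
}


def can_close(action):
    """Allow closure only when the recorded evidence matches the risk level.

    `action` is a dict with hypothetical keys: "status", "risk_level",
    and "evidence" (a collection of evidence type strings).
    """
    if action.get("status") != "completed":
        return False
    accepted = ACCEPTED_EVIDENCE.get(action.get("risk_level"), set())
    return bool(accepted & set(action.get("evidence", ())))
```

Under this rule a status update alone never closes an action, and a photo is not enough when the exposure carries SIF potential.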

During Araujo's tenure at PepsiCo South America, where the accident ratio fell 50% in six months under a 180-day plan, one lesson was that prevention requires disciplined follow-through. A metric that closes actions quickly but never tests field change rewards administrative speed over risk reduction.

This is why observation quality should be read with action effectiveness. Fast closure is useful only when it proves that the hazard was reduced and that the same exposure is less likely to appear again.

6. The metric can punish honesty

The sixth blind spot is incentive design. When leaders praise only clean areas, perfect compliance, and high card volume, people learn which observations are welcome. Hard findings disappear because they create work, expose leadership gaps, or make a department look difficult.

That pattern is especially visible when supervisors are ranked publicly by observation count or by the number of overdue actions. Ranking may create movement, although it can also teach teams to choose low-friction observations that close quickly. The metric then measures political safety, not operational risk.

In more than 250 cultural-transformation projects supported by Andreza Araujo's team, the same pattern appears across sectors. People report what leadership makes safe to report. If managers punish difficult observations through tone, delay, or career consequences, the dashboard becomes a filtered version of the work.

Executives need a counter-metric. Track the percentage of observations that identify system conditions, the share with SIF potential, the number escalated beyond the supervisor, and the quality of closure evidence. Those measures make honesty visible.
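These four counter-metrics are simple to compute once each observation carries the right flags. The sketch below assumes a hypothetical record layout with boolean flags for system condition, SIF potential, and escalation, plus a reviewer-assigned closure evidence score from 0 to 2; none of these field names come from a real system.

```python
def counter_metrics(observations):
    """Summarise the four counter-metrics over a list of observation dicts.

    Hypothetical keys on each dict: "system_condition" (bool),
    "sif_potential" (bool), "escalated" (bool), and
    "closure_evidence_score" (0-2, or None while the action is open).
    """
    total = len(observations)
    if total == 0:
        return None  # nothing to report rather than a misleading zero
    scores = [
        o["closure_evidence_score"] for o in observations
        if o.get("closure_evidence_score") is not None
    ]
    return {
        "pct_system_conditions": 100 * sum(bool(o.get("system_condition")) for o in observations) / total,
        "pct_sif_potential": 100 * sum(bool(o.get("sif_potential")) for o in observations) / total,
        "escalated_count": sum(bool(o.get("escalated")) for o in observations),
        "avg_closure_evidence": sum(scores) / len(scores) if scores else None,
    }
```

Read next to observation volume, a falling share of system conditions or SIF-potential findings can suggest that hard observations are being filtered out, although the numbers alone cannot say why.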

Observation-quality scorecard for EHS managers

A practical scorecard does not need to be complex. It needs to make weak records visible enough for coaching and strong records visible enough for learning. The table below gives EHS managers a starting point for auditing a monthly sample.

Dimension	Weak signal	Quality evidence
Risk relevance	Only low-consequence findings	SIF potential and energy source identified
Context	Generic comment about behavior	Task, pressure, control, and condition described
Dialogue	Reminder delivered to worker	Worker voice captured and considered
Action	Retraining or awareness only	Control repair, design change, or supervision change
Verification	Closed by status update	Field evidence confirms exposure reduction

The scorecard should be used for calibration, not punishment. If supervisors fear the audit, they will write safer-looking records. If they see it as coaching, the organization gets better evidence and better conversations.
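For calibration sessions, it can help to see which scorecard dimension is weakest across the monthly sample rather than which supervisor scored lowest. The sketch below assumes reviewers have marked each sampled record as showing quality evidence (True) or a weak signal (False) on the five dimensions in the table; the key names and the `weakest_dimensions` helper are hypothetical.

```python
# The five scorecard dimensions from the table.
DIMENSIONS = ("risk_relevance", "context", "dialogue", "action", "verification")


def weakest_dimensions(reviews):
    """Rank dimensions by how often reviewers marked them weak.

    `reviews` is a list of dicts mapping each dimension to True when the
    record showed quality evidence and False when it showed a weak signal.
    A missing dimension is treated as weak (an assumption).
    """
    if not reviews:
        return []
    weak_rate = {
        d: sum(not r.get(d, False) for r in reviews) / len(reviews)
        for d in DIMENSIONS
    }
    # Highest weak rate first; ties keep the table's order because sort is stable.
    return sorted(weak_rate.items(), key=lambda item: item[1], reverse=True)
```

Because the output is aggregated by dimension and carries no names, it points the conversation at what to coach, which fits the calibration purpose better than a ranking of people.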

What leaders should change next

Start by separating activity from value. Keep observation volume on the dashboard if it helps monitor participation, but never let it stand alone. Add quality sampling, SIF potential, context richness, coaching evidence, and closure verification so leaders can see whether the process changes risk.

The EHS manager should review a small sample every month with supervisors, not as an office audit but as field calibration. The best question is not who wrote the best card. The best question is which observation helped us see a risk we were about to miss.

For organizations that want to build this discipline into culture, *Safety Culture Diagnosis* offers a practical way to test whether routines, perceptions, and leadership behaviors are aligned. Observation quality is one of those routines, because it reveals whether the company wants reports that look good or truth that prevents harm.

Andreza Araujo's work helps leaders connect safety culture, behavioral dialogue, and executive decision-making without reducing prevention to a spreadsheet. Observation quality is a useful place to begin because it forces every level of leadership to ask whether the metric is telling the truth.

Frequently asked questions

What is observation quality in safety metrics?

Observation quality measures whether a safety observation identifies meaningful exposure, triggers a useful conversation, assigns a credible action, and verifies that risk actually changed.

How is observation quality different from observation volume?

Volume counts how many observations were submitted. Quality checks whether those observations were specific, risk-based, discussed with the worker, escalated when needed, and closed with evidence.

Which indicators show poor observation quality?

Repeated low-risk findings, generic comments, no SIF potential rating, no supervisor feedback, late closure, and actions that say only "retrain the worker" all point to weak observation quality.

Should observation quality be reported to executives?

Yes, but it should be summarized as a decision indicator. Executives need to see exposure themes, control weakness, escalation quality, and verified risk reduction rather than raw card counts.

Where should an EHS manager start?

Start with a sample of recent observations, score them against risk relevance and action quality, then coach supervisors on the conversations that convert weak signals into prevention.

About the author

Andreza Araújo

Safety Culture Expert | Senior EHS Executive

Andreza Araújo is a safety culture expert and senior EHS executive with more than 25 years of experience in environment, health and safety. She is a Civil Engineer and Occupational Safety Engineer from Unicamp, holds a Master's degree in Environmental Diplomacy from the University of Geneva, and completed sustainability studies at IMD Switzerland. Andreza has served in Global Head of EHS roles in Fortune 500 environments, leading cultural transformation programs across multinational operations. She has represented Brazil as a speaker at the United Nations in Paris and has spoken at the International Labour Organization in Turin. She is the author of more than 16 books on safety culture in Portuguese, Spanish, English and German. Her work has earned more than 10 EHS awards, including two recognitions from Indra Nooyi, former PepsiCo CEO.

Civil & Safety Engineer (Unicamp)
M.A. Environmental Diplomacy (University of Geneva)
Sustainability Cert (IMD Switzerland)
People Management & Coaching (Ohio University)
UN Paris speaker representative for Brazil
ILO Turin speaker
LinkedIn Top Voice
Indra Nooyi PepsiCo CEO recognition (2x)
