Andon Alert System
This topic is part of the SG Systems Global regulatory & operations guide library.
Andon Alert System: real-time abnormality signaling that triggers response, containment, and learning.
Updated Jan 2026 • andon alerts, escalation workflow, line support calls, machine states, audit trails • Cross-industry
An Andon alert system is the operational “nervous system” of a plant: it makes abnormalities visible in real time and forces a defined response path—who shows up, how fast, what must be contained, and how the event becomes learnable data instead of shift folklore. Classic Andon is a light tower and a pull cord; modern Andon is a signal + workflow + evidence chain tied to machine state monitoring, real-time shop floor execution, and governed exception handling.
If your “Andon” is just a dashboard tile, you don’t have Andon—you have a report. Real Andon changes behavior: it shortens time-to-respond, prevents hidden rework and silent quality escapes, and creates a consistent record of what broke and how the team recovered.
“An Andon alert that doesn’t trigger action is noise. An Andon alert that triggers the wrong action is risk.”
- What an Andon alert system really is
- Triggers: manual pulls, automated signals, and quality gates
- Alert lifecycle: raise → acknowledge → contain → resolve → close
- Escalation design: ownership, severity, and time-based routing
- Architecture patterns: edge, SCADA, brokers, and MES integration
- Integrity & defensibility: audit trails, edits, and “no silent closure”
- Governance: RBAC, SoD, change control, and validation
- KPIs that prove Andon is working
- Copy/paste drill and vendor demo script
- Pitfalls: how Andon gets ignored (or becomes dangerous)
- Cross-industry examples
- Extended FAQ
1) What an Andon alert system really is
Andon sits at the intersection of Lean discipline and execution control. Conceptually it supports jidoka (autonomation): stop and signal when abnormality occurs, then recover with learning—not improvisation.
In practice, Andon is a controlled mechanism for three outcomes:
- Visibility: the right people know about the abnormality immediately (no “we found out at shift change”).
- Containment: the process prevents silent bad continuation (especially when the MES/control layer uses execution-level enforcement or hard-gated manufacturing execution).
- Learning: the event becomes structured data that feeds improvement work (Kaizen, RCA, CAPA) instead of anecdote.
Andon = signal + response workflow + evidence. Remove any one of those and you get noise, heroics, or unverifiable “stories.”
2) Triggers: manual pulls, automated signals, and quality gates
Andon triggers should be explicit and governed. There are three primary classes:
| Trigger class | Examples | Best used when | Common failure mode |
|---|---|---|---|
| Manual (human-initiated) | Pull cord / button, HMI call, mobile call | Abnormality is visible to the operator but not reliably detectable by sensors | Overuse (fatigue) or underuse (fear/blame culture) |
| Automated (signal-initiated) | Fault state, blocked/starved, safety trip via machine states | You can justify the trigger from equipment evidence | False positives from noisy signals or poor thresholds |
| Quality gate (control-initiated) | Out-of-window check, verification failure, hold trigger | Quality protection must be immediate (containment) | Alerts not tied to a containment action; “alarm but keep running” |
In digitally controlled environments, quality-triggered Andon often ties to:
- in-process quality gates that require pass/fail outcomes
- in-line quality enforcement that blocks continuation when critical checks fail
- in-process compliance enforcement for prerequisite controls
3) Alert lifecycle: raise → acknowledge → contain → resolve → close
Andon only works when the lifecycle is unambiguous and time-bounded. A practical lifecycle looks like this:
| Stage | What it means | What the system should capture | Minimum control |
|---|---|---|---|
| Raised | Abnormality detected or called | Timestamp, asset, trigger source (manual/auto/gate), initial severity | Event creation must be reliable (no “lost alerts”) |
| Acknowledged | Someone owns the response | Who acknowledged, response start time | Ownership must be explicit (not “everyone saw it”) |
| Contained | Risk is controlled (stop, hold, segregate) | Containment action taken and by whom | Containment should be enforced where needed (hard gates) |
| Resolved | Fix applied / abnormality removed | Resolution notes, parts used, verification steps | Resolution should be verifiable (not “trust me”) |
| Closed | Record is complete; learning path assigned | Close timestamp, final classification, follow-up linkage | No silent closure; preserve audit history |
The lifecycle should map cleanly into an exception handling workflow so you can standardize what happens after the dust settles: review requirements, evidence collection, escalation, and trending.
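The stages in the table above can be sketched as a small state machine. This is an illustrative Python sketch, not any specific product's implementation; the stage names, fields, and "no silent closure" rule simply mirror the table:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Legal lifecycle transitions: raised -> acknowledged -> contained -> resolved -> closed.
TRANSITIONS = {
    "raised": {"acknowledged"},
    "acknowledged": {"contained"},
    "contained": {"resolved"},
    "resolved": {"closed"},
}

@dataclass
class AndonAlert:
    asset: str
    trigger: str                      # "manual" | "auto" | "gate"
    severity: str
    stage: str = "raised"
    history: list = field(default_factory=list)

    def advance(self, stage: str, actor: str, **details) -> None:
        """Move to the next stage, recording who acted and when."""
        if stage not in TRANSITIONS.get(self.stage, set()):
            raise ValueError(f"illegal transition: {self.stage} -> {stage}")
        if stage == "closed" and not details.get("reason"):
            raise ValueError("no silent closure: a close reason is required")
        self.history.append((stage, actor, datetime.now(timezone.utc), details))
        self.stage = stage
```

Because every `advance` appends to `history` and illegal jumps raise, the record cannot skip containment, and it cannot be closed without a documented reason.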
4) Escalation design: ownership, severity, and time-based routing
Escalation logic is where most Andon deployments fail. People obsess over lights and screens and neglect the real design question:
“If no one responds in X minutes, who gets paged next—and what changes operationally?”
Use severity in a way that triggers different actions, not just different colors:
| Severity | Definition | Expected response | Containment expectation |
|---|---|---|---|
| Info / Assist | Help needed; process can remain safe | Supervisor/line support | Containment optional (depends on risk) |
| Stop / Technical | Equipment/process abnormality impacting output | Maintenance/engineering within target minutes | Stop or controlled degraded mode only |
| Quality / Compliance risk | Potential product risk or control failure | QA/authorized role responds | Containment required (hold/segregate/block) |
| Safety | Human safety risk | Safety response protocol | Immediate stop; do not “work through it” |
Route actions via operational orchestration tools when available, e.g. updating the production dispatch board using a dispatching rules engine so the rest of the plant doesn’t keep feeding work into a constraint.
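Time-based routing can be expressed as an ordered chain per severity, where each entry says how many minutes to wait before paging the next role. The roles and windows below are illustrative assumptions, not recommendations:

```python
# Each severity maps to an ordered chain of (role, minutes before escalating
# further). The first role is paged immediately; subsequent roles are paged
# as acknowledgment windows expire. Roles and timings are illustrative.
ESCALATION = {
    "assist":  [("line_support", 10), ("supervisor", 15)],
    "stop":    [("maintenance", 5), ("engineering", 10), ("plant_manager", 15)],
    "quality": [("qa_inspector", 5), ("qa_manager", 10)],
    "safety":  [("safety_officer", 0)],
}

def roles_to_page(severity: str, minutes_unacknowledged: float) -> list:
    """Return every role that should have been paged by now."""
    chain = ESCALATION[severity]
    paged = [chain[0][0]]            # first responder, paged at time zero
    threshold = chain[0][1]
    for role, window in chain[1:]:
        if minutes_unacknowledged < threshold:
            break                    # not yet overdue for the next escalation
        paged.append(role)
        threshold += window
    return paged
```

For example, a "stop" alert unacknowledged for 7 minutes has already escalated past maintenance to engineering, but not yet to the plant manager.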
5) Architecture patterns: edge, SCADA, brokers, and MES integration
Modern Andon systems usually combine equipment evidence, messaging resilience, and workflow logic:
- Equipment evidence: state and fault signals from PLC/HMI (see PLC tag mapping for MES).
- Supervision layer: collection/normalization through SCADA and historian-adjacent tooling (optional).
- Messaging layer: event distribution through a message broker architecture (often MQTT for plant-floor-adjacent publishers).
- Application layer: alert workflow, dashboards, mobile/paging, and integrations via MES API gateway patterns.
- Context attachment: tying events to production context via MES data contextualization so an alert isn’t “Line 3 stopped,” it’s “Line 3 stopped while running SKU X on Order Y.”
Architecturally, one hard truth applies: Andon is latency-sensitive. If alerts arrive late, operators and support teams stop trusting the system. That’s how you end up with parallel channels (radio/text/phone) and the “official” system becomes an after-the-fact log. Watch for execution latency risk indicators—slow networks, overloaded brokers, or heavyweight integrations in the critical path.
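A minimal sketch of the context-attachment point: the event payload carries order and SKU context, and the topic encodes plant hierarchy so subscribers can filter. The topic scheme, site/area values, and field names are assumptions for illustration:

```python
import json
import uuid
from datetime import datetime, timezone

def build_andon_event(asset: str, trigger: str, severity: str,
                      order: str, sku: str):
    """Return a (topic, payload) pair for the messaging layer."""
    topic = f"plant1/packaging/{asset}/andon"   # hypothetical site/area/asset scheme
    payload = json.dumps({
        "id": str(uuid.uuid4()),     # unique event ID so consumers can deduplicate
        "asset": asset,
        "trigger": trigger,          # manual | auto | gate
        "severity": severity,
        "order": order,              # production context: not just "Line 3 stopped"
        "sku": sku,
        "ts": datetime.now(timezone.utc).isoformat(),
    })
    return topic, payload
```

With an MQTT client library, publishing this at QoS 1 gives at-least-once delivery; the event ID lets downstream consumers drop duplicates instead of double-counting stops.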
6) Integrity & defensibility: audit trails, edits, and “no silent closure”
Andon records influence decisions. In regulated or high-consequence environments, they can also become part of investigations or compliance narratives. That means the record must be trustworthy.
Minimum integrity controls:
- Attributable actions: who raised, acknowledged, contained, and closed.
- Immutable timestamps: no backdating and no “time travel” sequences (tie to ALCOA expectations).
- Audit trails for edits: if severity or classification changes, preserve the original with an audit trail (GxP).
- Controlled closure: closures require a minimum data set (reason, action, and follow-up decision) so alerts can’t be “dismissed to make the board look green.”
If the system allows “close with no reason” or “close by deleting the alert,” you’ve created a metric-laundering tool. That will eventually blow up in a management review or an audit.
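The "preserve the original" rule can be sketched as an append-only amendment log: edits never overwrite history, and every amendment demands a reason. The class and field names here are hypothetical:

```python
from datetime import datetime, timezone

class AuditedRecord:
    """Field edits never destroy history: every change is appended to the
    audit trail as (field, old, new, actor, reason, timestamp)."""

    def __init__(self, **fields):
        self._fields = fields
        self.audit_trail = []

    def amend(self, field: str, new_value, actor: str, reason: str) -> None:
        if not reason:
            raise ValueError("every amendment needs a documented reason")
        self.audit_trail.append({
            "field": field,
            "old": self._fields.get(field),
            "new": new_value,
            "actor": actor,
            "reason": reason,
            "ts": datetime.now(timezone.utc).isoformat(),
        })
        self._fields[field] = new_value

    def __getitem__(self, field):
        return self._fields[field]
```

Note that the old value is recorded before the field changes, so a severity downgrade is always visible alongside who made it and why.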
When Andon events indicate quality/systemic risk, route them into formal quality workflows (see quality event management) and, when warranted, CAPA with follow-up controls like CAPA effectiveness checks.
7) Governance: RBAC, SoD, change control, and validation
Andon rules are operational controls. Treat them as controlled configuration, not a “settings screen.”
Governance controls that actually matter:
- Role-based access: only authorized roles can modify alert rules, thresholds, and routing (RBAC).
- User access discipline: define who can acknowledge/close which severities (UAM).
- Segregation of duties: the people accountable for metrics shouldn’t be able to rewrite alert truth (SoD).
- Change control: rule changes follow change control with traceable approvals and rollback plans.
- Revision control: maintain versions of rules/taxonomies (see revision control).
In validated environments, Andon behavior can be considered part of the controlled execution ecosystem and may require risk-based validation (see CSV and GAMP 5). This is especially true when alerts drive holds, blocks, or governed exception paths.
Also: don’t ignore security. A system that can be spammed or spoofed becomes either useless (ignored) or disruptive (false stops). Align to MES cybersecurity controls for authentication, authorization, and event integrity.
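A minimal sketch of severity-scoped close permissions; the role names and mapping are illustrative, and a real deployment would source them from the RBAC/UAM system rather than hard-code them:

```python
# Role -> severities that role may close. Illustrative mapping: closing a
# quality alert requires QA; operators can close only assist-level calls.
CLOSE_PERMISSIONS = {
    "operator": {"assist"},
    "supervisor": {"assist", "stop"},
    "qa": {"assist", "quality"},
    "safety_officer": {"safety"},
}

def can_close(role: str, severity: str) -> bool:
    """Check whether the given role is authorized to close this severity."""
    return severity in CLOSE_PERMISSIONS.get(role, set())
```

This is also where segregation of duties bites: the role whose metrics improve when alerts disappear should not appear in the permission set for closing them.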
8) KPIs that prove Andon is working
Track KPIs that measure response discipline and learning—not vanity counts.
- Minutes from raised to acknowledged (MTTA: response ownership).
- Minutes from raised to resolved (MTTR: recovery speed).
- % of alerts that escalate without acknowledgment.
- % of alerts closed as "no issue" or caused by repeat nuisance triggers.
- Recurrence of the same alert on the same asset within 7/30 days (learning effectiveness).
- Correlation between Andon events and OEE loss and recovery discipline.
If MTTA is improving but false alert rate is rising, you’re training the plant to respond to noise. If false alert rate is low but unacknowledged rate is high, you have an ownership problem (or a routing problem). And if repeat alert recurrence never declines, you don’t have learning—you have chronic firefighting.
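Assuming each alert record carries ISO-format "raised", "acknowledged", and "resolved" timestamps plus an "escalated" flag (field names are illustrative), the first three KPIs can be computed as a sketch:

```python
from datetime import datetime
from statistics import mean

def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two ISO-8601 timestamps."""
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 60

def andon_kpis(alerts: list) -> dict:
    """MTTA, MTTR, and escalation-without-acknowledgment rate.
    Assumes at least one acknowledged and one resolved record."""
    acked = [a for a in alerts if a.get("acknowledged")]
    resolved = [a for a in alerts if a.get("resolved")]
    return {
        "mtta_min": mean(minutes_between(a["raised"], a["acknowledged"]) for a in acked),
        "mttr_min": mean(minutes_between(a["raised"], a["resolved"]) for a in resolved),
        "escalated_unacked_pct":
            100 * sum(1 for a in alerts if a.get("escalated") and not a.get("acknowledged"))
            / len(alerts),
    }
```

Trending these weekly per line and severity is what turns the KPI list above into a review artifact rather than a dashboard tile.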
9) Copy/paste drill and vendor demo script
If you want to test an Andon system seriously—internally or in a vendor demo—run drills that force the full lifecycle and expose weak governance.
Drill A — Manual Andon, real ownership, real escalation
- Trigger a manual Andon call on a line (operator-initiated).
- Verify the right roles are paged and that the alert is visible on the dispatch board.
- Do not acknowledge for the target window. Prove escalation occurs automatically to the next role.
- Acknowledge late. Confirm audit trail captures the late acknowledgment and escalation history.
Drill B — Automated trigger from machine state evidence
- Force a fault state captured through machine state monitoring (or a simulated fault tag).
- Confirm the system creates exactly one alert (no duplicates) and ties it to the correct asset.
- Clear the fault. Confirm the alert does not auto-close unless your rules explicitly allow it (and if it does, it still records the closure evidence).
Drill C — Quality-risk Andon with containment
- Simulate a failed quality gate that should trigger a Quality/Compliance severity alert.
- Verify containment behavior occurs (hold/block/segregate) consistent with in-line quality enforcement.
- Close the alert only after documenting containment and disposition path via exception handling workflow.
Drill D — Governance under stress (anti-gaming)
- Attempt to close a high-severity alert with a low-privilege account.
- Verify RBAC blocks the action.
- Re-attempt with an authorized role; require a reason and preserve the change history in an audit trail.
If a vendor can’t demonstrate these drills live, assume the system is UI-forward and control-light.
10) Pitfalls: how Andon gets ignored (or becomes dangerous)
- Alert fatigue: too many triggers, weak thresholds, and “everything is urgent.”
- No single owner: alerts seen by everyone are owned by no one.
- Escalation without consequence: escalation happens, but nothing operational changes—so people stop caring.
- Silent closures: alerts disappear without reason, destroying trust and learnability.
- Blame culture: operators stop pulling Andon because it’s punished, not supported.
- Fragile architecture: alerts drop during network hiccups because messaging isn’t resilient (no broker, no buffering).
- Misuse as a KPI weapon: optimizing for “low alert counts” drives under-reporting, not performance.
11) Cross-industry examples
- Pharma / regulated batch: quality-risk Andon events often route into quality event management and may influence CAPA and management review.
- Food processing: high-speed lines benefit from automated Andon based on blocked/starved and fault states; micro-events need filtering to avoid fatigue.
- Medical devices: Andon tied to verification failures protects traceability and reduces hidden rework; closure discipline supports defensible investigations.
- Plastics / molding: Andon signals can feed predictive maintenance by capturing recurring fault patterns with consistent timestamps and context.
- CPG / frequent changeovers: Andon can separate “planned” vs “abnormal” changeover behaviors and prevent normalization of delay.
12) Extended FAQ
Q1. What is an Andon alert system?
An Andon alert system is a real-time abnormality signaling and response workflow that makes issues visible immediately, assigns ownership, enforces containment where needed, and creates a trustworthy record for learning.
Q2. What should trigger an Andon alert?
Manual operator calls, automated equipment states/faults from machine state monitoring, and critical failures at in-process quality gates. Triggers must be governed and defensible.
Q3. How do we prevent alert fatigue?
Use thresholds, reduce duplicates, limit severities to a small set with distinct actions, and continuously prune nuisance triggers via structured review.
Q4. Do Andon events need audit trails?
If Andon is used for control, escalation, or quality decision-making, yes—especially for edits, reclassification, and closures (see audit trail (GxP) and data integrity).
Q5. How does Andon connect to MES and dispatch?
Modern Andon events are contextualized and routed through APIs/brokers, then reflected in operational tools like the production dispatch board so the plant reacts coherently, not locally.
Related Reading
• Signals & States: Machine State Monitoring | PLC Tag Mapping for MES | SCADA
• Execution & Control: Real-Time Shop Floor Execution | Event-Driven Manufacturing Execution | Execution-Level Enforcement | Hard-Gated Manufacturing Execution
• Quality & Exceptions: In-Process Quality Gates | In-Line Quality Enforcement | Exception Handling Workflow | Quality Event Management
• Architecture: Message Broker Architecture | MQTT Messaging Layer | MES API Gateway | MES Data Contextualization
• Dispatch & Response: Production Dispatch Board | Dispatching Rules Engine
• Integrity & Governance: Audit Trail (GxP) | Data Integrity | ALCOA | Role-Based Access | User Access Management | Segregation of Duties in MES | Change Control | Revision Control | CSV | GAMP 5
• Improvement: Kaizen | Root Cause Analysis (RCA) | CAPA | Management Review