Andon Alert System
This topic is part of the SG Systems Global regulatory & operations guide library.
Andon Alert System: real-time abnormality signaling that triggers response, containment, and learning.
Updated Jan 2026 • andon alerts, escalation workflow, line support calls, machine states, audit trails • Cross-industry
An Andon alert system is the operational “nervous system” of a plant: it makes abnormalities visible in real time and forces a defined response path—who shows up, how fast, what must be contained, and how the event becomes learnable data instead of shift folklore. Classic Andon is a light tower and a pull cord; modern Andon is a signal + workflow + evidence chain tied to machine state monitoring, real-time shop floor execution, and governed exception handling.
If your “Andon” is just a dashboard tile, you don’t have Andon—you have a report. Real Andon changes behavior: it shortens time-to-respond, prevents hidden rework and silent quality escapes, and creates a consistent record of what broke and how the team recovered.
“An Andon alert that doesn’t trigger action is noise. An Andon alert that triggers the wrong action is risk.”
- What an Andon alert system really is
- Triggers: manual pulls, automated signals, and quality gates
- Alert lifecycle: raise → acknowledge → contain → resolve → close
- Escalation design: ownership, severity, and time-based routing
- Architecture patterns: edge, SCADA, brokers, and MES integration
- Integrity & defensibility: audit trails, edits, and “no silent closure”
- Governance: RBAC, SoD, change control, and validation
- KPIs that prove Andon is working
- Copy/paste drill and vendor demo script
- Pitfalls: how Andon gets ignored (or becomes dangerous)
- Cross-industry examples
- Extended FAQ
1) What an Andon alert system really is
Andon sits at the intersection of Lean discipline and execution control. Conceptually it supports jidoka (autonomation): stop and signal when abnormality occurs, then recover with learning—not improvisation.
In practice, Andon is a controlled mechanism for three outcomes:
- Visibility: the right people know about the abnormality immediately (no “we found out at shift change”).
- Containment: the process prevents silent bad continuation (especially when the MES/control layer uses execution-level enforcement or hard-gated manufacturing execution).
- Learning: the event becomes structured data that feeds improvement work (Kaizen, RCA, CAPA) instead of anecdote.
Andon = signal + response workflow + evidence. Remove any one of those and you get noise, heroics, or unverifiable “stories.”
2) Triggers: manual pulls, automated signals, and quality gates
Andon triggers should be explicit and governed. There are three primary classes:
| Trigger class | Examples | Best used when | Common failure mode |
|---|---|---|---|
| Manual (human-initiated) | Pull cord / button, HMI call, mobile call | Abnormality is visible to the operator but not reliably detectable by sensors | Overuse (fatigue) or underuse (fear/blame culture) |
| Automated (signal-initiated) | Fault state, blocked/starved, safety trip via machine states | You can justify the trigger from equipment evidence | False positives from noisy signals or poor thresholds |
| Quality gate (control-initiated) | Out-of-window check, verification failure, hold trigger | Quality protection must be immediate (containment) | Alerts not tied to a containment action; “alarm but keep running” |
In digitally controlled environments, quality-triggered Andon often ties to:
- in-process quality gates that require pass/fail outcomes
- in-line quality enforcement that blocks continuation when critical checks fail
- in-process compliance enforcement for prerequisite controls
3) Alert lifecycle: raise → acknowledge → contain → resolve → close
Andon only works when the lifecycle is unambiguous and time-bounded. A practical lifecycle looks like this:
| Stage | What it means | What the system should capture | Minimum control |
|---|---|---|---|
| Raised | Abnormality detected or called | Timestamp, asset, trigger source (manual/auto/gate), initial severity | Event creation must be reliable (no “lost alerts”) |
| Acknowledged | Someone owns the response | Who acknowledged, response start time | Ownership must be explicit (not “everyone saw it”) |
| Contained | Risk is controlled (stop, hold, segregate) | Containment action taken and by whom | Containment should be enforced where needed (hard gates) |
| Resolved | Fix applied / abnormality removed | Resolution notes, parts used, verification steps | Resolution should be verifiable (not “trust me”) |
| Closed | Record is complete; learning path assigned | Close timestamp, final classification, follow-up linkage | No silent closure; preserve audit history |
The lifecycle should map cleanly into an exception handling workflow so you can standardize what happens after the dust settles: review requirements, evidence collection, escalation, and trending.
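The stages in the table above can be sketched as a small state machine. This is an illustrative Python sketch, not any specific product's implementation; the stage names, fields, and "no silent closure" rule simply mirror the table:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Legal lifecycle transitions: raised -> acknowledged -> contained -> resolved -> closed.
TRANSITIONS = {
    "raised": {"acknowledged"},
    "acknowledged": {"contained"},
    "contained": {"resolved"},
    "resolved": {"closed"},
}

@dataclass
class AndonAlert:
    asset: str
    trigger: str                      # "manual" | "auto" | "gate"
    severity: str
    stage: str = "raised"
    history: list = field(default_factory=list)

    def advance(self, stage: str, actor: str, **details) -> None:
        """Move to the next stage, recording who acted and when."""
        if stage not in TRANSITIONS.get(self.stage, set()):
            raise ValueError(f"illegal transition: {self.stage} -> {stage}")
        if stage == "closed" and not details.get("reason"):
            raise ValueError("no silent closure: a close reason is required")
        self.history.append((stage, actor, datetime.now(timezone.utc), details))
        self.stage = stage
```

Because every `advance` appends to `history` and illegal jumps raise, the record cannot skip containment, and it cannot be closed without a documented reason.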
4) Escalation design: ownership, severity, and time-based routing
Escalation logic is where most Andon deployments fail. People obsess over lights and screens and neglect the real design question:
“If no one responds in X minutes, who gets paged next—and what changes operationally?”
Use severity in a way that triggers different actions, not just different colors:
| Severity | Definition | Expected response | Containment expectation |
|---|---|---|---|
| Info / Assist | Help needed; process can remain safe | Supervisor/line support | Containment optional (depends on risk) |
| Stop / Technical | Equipment/process abnormality impacting output | Maintenance/engineering within target minutes | Stop or controlled degraded mode only |
| Quality / Compliance risk | Potential product risk or control failure | QA/authorized role responds | Containment required (hold/segregate/block) |
| Safety | Human safety risk | Safety response protocol | Immediate stop; do not “work through it” |
Route actions via operational orchestration tools when available, e.g. updating the production dispatch board using a dispatching rules engine so the rest of the plant doesn’t keep feeding work into a constraint.
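Time-based routing can be expressed as an ordered chain per severity, where each entry says how many minutes to wait before paging the next role. The roles and windows below are illustrative assumptions, not recommendations:

```python
# Each severity maps to an ordered chain of (role, minutes before escalating
# further). The first role is paged immediately; subsequent roles are paged
# as acknowledgment windows expire. Roles and timings are illustrative.
ESCALATION = {
    "assist":  [("line_support", 10), ("supervisor", 15)],
    "stop":    [("maintenance", 5), ("engineering", 10), ("plant_manager", 15)],
    "quality": [("qa_inspector", 5), ("qa_manager", 10)],
    "safety":  [("safety_officer", 0)],
}

def roles_to_page(severity: str, minutes_unacknowledged: float) -> list:
    """Return every role that should have been paged by now."""
    chain = ESCALATION[severity]
    paged = [chain[0][0]]            # first responder, paged at time zero
    threshold = chain[0][1]
    for role, window in chain[1:]:
        if minutes_unacknowledged < threshold:
            break                    # not yet overdue for the next escalation
        paged.append(role)
        threshold += window
    return paged
```

For example, a "stop" alert unacknowledged for 7 minutes has already escalated past maintenance to engineering, but not yet to the plant manager.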
5) Architecture patterns: edge, SCADA, brokers, and MES integration
Modern Andon systems usually combine equipment evidence, messaging resilience, and workflow logic:
- Equipment evidence: state and fault signals from PLC/HMI (see PLC tag mapping for MES).
- Supervision layer: collection/normalization through SCADA and historian-adjacent tooling (optional).
- Messaging layer: event distribution through a message broker architecture (often MQTT for plant-floor-adjacent publishers).
- Application layer: alert workflow, dashboards, mobile/paging, and integrations via MES API gateway patterns.
- Context attachment: tying events to production context via MES data contextualization so an alert isn’t “Line 3 stopped,” it’s “Line 3 stopped while running SKU X on Order Y.”
Architecturally, one hard truth applies: Andon is latency-sensitive. If alerts arrive late, operators and support teams stop trusting the system. That’s how you end up with parallel channels (radio/text/phone) and the “official” system becomes an after-the-fact log. Watch for execution latency risk indicators—slow networks, overloaded brokers, or heavyweight integrations in the critical path.
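A minimal sketch of the context-attachment point: the event payload carries order and SKU context, and the topic encodes plant hierarchy so subscribers can filter. The topic scheme, site/area values, and field names are assumptions for illustration:

```python
import json
import uuid
from datetime import datetime, timezone

def build_andon_event(asset: str, trigger: str, severity: str,
                      order: str, sku: str):
    """Return a (topic, payload) pair for the messaging layer."""
    topic = f"plant1/packaging/{asset}/andon"   # hypothetical site/area/asset scheme
    payload = json.dumps({
        "id": str(uuid.uuid4()),     # unique event ID so consumers can deduplicate
        "asset": asset,
        "trigger": trigger,          # manual | auto | gate
        "severity": severity,
        "order": order,              # production context: not just "Line 3 stopped"
        "sku": sku,
        "ts": datetime.now(timezone.utc).isoformat(),
    })
    return topic, payload
```

With an MQTT client library, publishing this at QoS 1 gives at-least-once delivery; the event ID lets downstream consumers drop duplicates instead of double-counting stops.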
6) Integrity & defensibility: audit trails, edits, and “no silent closure”
Andon records influence decisions. In regulated or high-consequence environments, they can also become part of investigations or compliance narratives. That means the record must be trustworthy.
Minimum integrity controls:
- Attributable actions: who raised, acknowledged, contained, and closed.
- Immutable timestamps: no backdating and no “time travel” sequences (tie to ALCOA expectations).
- Audit trails for edits: if severity or classification changes, preserve the original with an audit trail (GxP).
- Controlled closure: closures require a minimum data set (reason, action, and follow-up decision) so alerts can’t be “dismissed to make the board look green.”
If the system allows “close with no reason” or “close by deleting the alert,” you’ve created a metric-laundering tool. That will eventually blow up in a management review or an audit.
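The "preserve the original" rule can be sketched as an append-only amendment log: edits never overwrite history, and every amendment demands a reason. The class and field names here are hypothetical:

```python
from datetime import datetime, timezone

class AuditedRecord:
    """Field edits never destroy history: every change is appended to the
    audit trail as (field, old, new, actor, reason, timestamp)."""

    def __init__(self, **fields):
        self._fields = fields
        self.audit_trail = []

    def amend(self, field: str, new_value, actor: str, reason: str) -> None:
        if not reason:
            raise ValueError("every amendment needs a documented reason")
        self.audit_trail.append({
            "field": field,
            "old": self._fields.get(field),
            "new": new_value,
            "actor": actor,
            "reason": reason,
            "ts": datetime.now(timezone.utc).isoformat(),
        })
        self._fields[field] = new_value

    def __getitem__(self, field):
        return self._fields[field]
```

Note that the old value is recorded before the field changes, so a severity downgrade is always visible alongside who made it and why.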
When Andon events indicate quality/systemic risk, route them into formal quality workflows (see quality event management) and, when warranted, CAPA with follow-up controls like CAPA effectiveness checks.
7) Governance: RBAC, SoD, change control, and validation
Andon rules are operational controls. Treat them as controlled configuration, not a “settings screen.”
Governance controls that actually matter:
- Role-based access: only authorized roles can modify alert rules, thresholds, and routing (RBAC).
- User access discipline: define who can acknowledge/close which severities (UAM).
- Segregation of duties: the people accountable for metrics shouldn’t be able to rewrite alert truth (SoD).
- Change control: rule changes follow change control with traceable approvals and rollback plans.
- Revision control: maintain versions of rules/taxonomies (see revision control).
In validated environments, Andon behavior can be considered part of the controlled execution ecosystem and may require risk-based validation (see CSV and GAMP 5). This is especially true when alerts drive holds, blocks, or governed exception paths.
Also: don’t ignore security. A system that can be spammed or spoofed becomes either useless (ignored) or disruptive (false stops). Align to MES cybersecurity controls for authentication, authorization, and event integrity.
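A minimal sketch of severity-scoped close permissions; the role names and mapping are illustrative, and a real deployment would source them from the RBAC/UAM system rather than hard-code them:

```python
# Role -> severities that role may close. Illustrative mapping: closing a
# quality alert requires QA; operators can close only assist-level calls.
CLOSE_PERMISSIONS = {
    "operator": {"assist"},
    "supervisor": {"assist", "stop"},
    "qa": {"assist", "quality"},
    "safety_officer": {"safety"},
}

def can_close(role: str, severity: str) -> bool:
    """Check whether the given role is authorized to close this severity."""
    return severity in CLOSE_PERMISSIONS.get(role, set())
```

This is also where segregation of duties bites: the role whose metrics improve when alerts disappear should not appear in the permission set for closing them.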
8) KPIs that prove Andon is working
Track KPIs that measure response discipline and learning—not vanity counts.
- Minutes from raised to acknowledged (MTTA: response ownership).
- Minutes from raised to resolved (MTTR: recovery speed).
- % of alerts that escalate without acknowledgment.
- % of alerts closed as "no issue" or caused by repeat nuisance triggers.
- Recurrence of the same alert on the same asset within 7/30 days (learning effectiveness).
- Correlation between Andon events and OEE loss and recovery discipline.
If MTTA is improving but false alert rate is rising, you’re training the plant to respond to noise. If false alert rate is low but unacknowledged rate is high, you have an ownership problem (or a routing problem). And if repeat alert recurrence never declines, you don’t have learning—you have chronic firefighting.
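Assuming each alert record carries ISO-format "raised", "acknowledged", and "resolved" timestamps plus an "escalated" flag (field names are illustrative), the first three KPIs can be computed as a sketch:

```python
from datetime import datetime
from statistics import mean

def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two ISO-8601 timestamps."""
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 60

def andon_kpis(alerts: list) -> dict:
    """MTTA, MTTR, and escalation-without-acknowledgment rate.
    Assumes at least one acknowledged and one resolved record."""
    acked = [a for a in alerts if a.get("acknowledged")]
    resolved = [a for a in alerts if a.get("resolved")]
    return {
        "mtta_min": mean(minutes_between(a["raised"], a["acknowledged"]) for a in acked),
        "mttr_min": mean(minutes_between(a["raised"], a["resolved"]) for a in resolved),
        "escalated_unacked_pct":
            100 * sum(1 for a in alerts if a.get("escalated") and not a.get("acknowledged"))
            / len(alerts),
    }
```

Trending these weekly per line and severity is what turns the KPI list above into a review artifact rather than a dashboard tile.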
9) Copy/paste drill and vendor demo script
If you want to test an Andon system seriously—internally or in a vendor demo—run drills that force the full lifecycle and expose weak governance.
Drill A — Manual Andon, real ownership, real escalation
- Trigger a manual Andon call on a line (operator-initiated).
- Verify the right roles are paged and that the alert is visible on the dispatch board.
- Do not acknowledge for the target window. Prove escalation occurs automatically to the next role.
- Acknowledge late. Confirm audit trail captures the late acknowledgment and escalation history.
Drill B — Automated trigger from machine state evidence
- Force a fault state captured through machine state monitoring (or a simulated fault tag).
- Confirm the system creates exactly one alert (no duplicates) and ties it to the correct asset.
- Clear the fault. Confirm the alert does not auto-close unless your rules explicitly allow it (and if it does, it still records the closure evidence).
Drill C — Quality-risk Andon with containment
- Simulate a failed quality gate that should trigger a Quality/Compliance severity alert.
- Verify containment behavior occurs (hold/block/segregate) consistent with in-line quality enforcement.
- Close the alert only after documenting containment and disposition path via exception handling workflow.
Drill D — Governance under stress (anti-gaming)
- Attempt to close a high-severity alert with a low-privilege account.
- Verify RBAC blocks the action.
- Re-attempt with an authorized role; require a reason and preserve the change history in an audit trail.
If a vendor can’t demonstrate these drills live, assume the system is UI-forward and control-light.
10) Pitfalls: how Andon gets ignored (or becomes dangerous)
- Alert fatigue: too many triggers, weak thresholds, and “everything is urgent.”
- No single owner: alerts seen by everyone are owned by no one.
- Escalation without consequence: escalation happens, but nothing operational changes—so people stop caring.
- Silent closures: alerts disappear without reason, destroying trust and learnability.
- Blame culture: operators stop pulling Andon because it’s punished, not supported.
- Fragile architecture: alerts drop during network hiccups because messaging isn’t resilient (no broker, no buffering).
- Misuse as a KPI weapon: optimizing for “low alert counts” drives under-reporting, not performance.
11) Cross-industry examples
- Pharma / regulated batch: quality-risk Andon events often route into quality event management and may influence CAPA and management review.
- Food processing: high-speed lines benefit from automated Andon based on blocked/starved and fault states; micro-events need filtering to avoid fatigue.
- Medical devices: Andon tied to verification failures protects traceability and reduces hidden rework; closure discipline supports defensible investigations.
- Plastics / molding: Andon signals can feed predictive maintenance by capturing recurring fault patterns with consistent timestamps and context.
- CPG / frequent changeovers: Andon can separate “planned” vs “abnormal” changeover behaviors and prevent normalization of delay.
12) Extended FAQ
Q1. What is an Andon alert system?
An Andon alert system is a real-time abnormality signaling and response workflow that makes issues visible immediately, assigns ownership, enforces containment where needed, and creates a trustworthy record for learning.
Q2. What should trigger an Andon alert?
Manual operator calls, automated equipment states/faults from machine state monitoring, and critical failures at in-process quality gates. Triggers must be governed and defensible.
Q3. How do we prevent alert fatigue?
Use thresholds, reduce duplicates, limit severities to a small set with distinct actions, and continuously prune nuisance triggers via structured review.
Q4. Do Andon events need audit trails?
If Andon is used for control, escalation, or quality decision-making, yes—especially for edits, reclassification, and closures (see audit trail (GxP) and data integrity).
Q5. How does Andon connect to MES and dispatch?
Modern Andon events are contextualized and routed through APIs/brokers, then reflected in operational tools like the production dispatch board so the plant reacts coherently, not locally.
Related Reading
• Signals & States: Machine State Monitoring | PLC Tag Mapping for MES | SCADA
• Execution & Control: Real-Time Shop Floor Execution | Event-Driven Manufacturing Execution | Execution-Level Enforcement | Hard-Gated Manufacturing Execution
• Quality & Exceptions: In-Process Quality Gates | In-Line Quality Enforcement | Exception Handling Workflow | Quality Event Management
• Architecture: Message Broker Architecture | MQTT Messaging Layer | MES API Gateway | MES Data Contextualization
• Dispatch & Response: Production Dispatch Board | Dispatching Rules Engine
• Integrity & Governance: Audit Trail (GxP) | Data Integrity | ALCOA | Role-Based Access | User Access Management | Segregation of Duties in MES | Change Control | Revision Control | CSV | GAMP 5
• Improvement: Kaizen | Root Cause Analysis (RCA) | CAPA | Management Review