Wolf! – Meaningful Messages and Alarming Alarms

By Daniel Birket of Birket Engineering, Inc. Originally written Jan-2002

Synopsis

A poorly designed fault message system can desensitize operators to real problems. This document discusses control systems that cry, “Wolf!” and presents techniques to help keep messages meaningful.

Ancient Wisdom

Control systems depend on their human operators to handle problems beyond the computer’s capabilities. While a control system is usually very good at managing the details of complex equipment, it falls far short of the ability to handle every problem that may (and eventually will) occur. System designers close the gap between the expected, designed-for situations and the rest of the universe of possible situations by designing the system to yell for help with alarm messages. But yelling for help too often can lead to trouble.

Aesop’s fable of “The Shepherd’s Boy” teaches that people will eventually ignore a false alarm. Control systems that demand the operator’s attention too often or for too little reason tend to lose the operator’s attention instead. When the operator begins to treat alarm messages as a nuisance, the ability of the message to help ensure safety is impaired or lost. If slapping the [Silence Alarm] button has become a Pavlovian response to the sound of the alarm buzzer, the alarm system is no longer effective – and no one will respond when the wolf really comes.

Don’t be a Nuisance

A nuisance message is the most common way that a system cries “Wolf!” Any message that the operator feels is a waste of time is a “nuisance”. Frequent nuisance messages will quickly train the operator to slap the [Silence Alarm] button without investigating.

A message may be labeled a nuisance for several reasons:

  • False Trigger: The message appears in response to an event other than the intended trigger. For example: A “Sensor failure” message triggered by turning a subsystem on or off.
  • Hair Trigger: The message appears when the system is operating outside its nominal range, but still within its tolerance limits. For example: A “Response failure” message when the response was merely a little slower than usual.
  • Poor Trigger: The parameters that trigger the message don’t consider all pertinent conditions. For example: A “Water level too low” alarm that appears even when the water pumps are not running and the level doesn’t matter.
  • Misunderstood: Sometimes “nuisance messages” are simply poorly worded or not explained well. If the message doesn’t mean anything to the operator and doesn’t seem to affect anything, it will probably be regarded as a nuisance.
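
The first three trigger problems can be illustrated with a short sketch. This is only a minimal example under stated assumptions, not code from any real control system: the function names, the units, and every threshold value are hypothetical.

```python
def sensor_fault(signal_valid: bool, seconds_since_power_change: float,
                 settle_s: float = 5.0) -> bool:
    """Avoid a False Trigger: ignore the sensor while the subsystem
    is still settling after being turned on or off."""
    return (not signal_valid) and seconds_since_power_change >= settle_s


def response_fault(response_s: float, nominal_s: float = 2.0,
                   tolerance_s: float = 1.0) -> bool:
    """Avoid a Hair Trigger: fault only beyond the tolerance limit,
    not whenever the response is slower than nominal."""
    return response_s > nominal_s + tolerance_s


def low_water_alarm(level_cm: float, pumps_running: bool,
                    low_limit_cm: float = 30.0) -> bool:
    """Avoid a Poor Trigger: low water only matters while pumps run."""
    return pumps_running and level_cm < low_limit_cm
```

With these hypothetical limits, a response of 2.5 seconds is slower than nominal but raises no fault, and a low water level raises no alarm while the pumps are off.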

Consequences

One good rule-of-thumb for message systems is “If the system stops, it’s important to explain the problem with a message.” Sometimes this leads operators to believe the inverse too – “If the system doesn’t stop, the message isn’t important.” This can lead to trouble when the system displays a warning message, but leaves it up to the operator to deal with the problem.

This problem can be managed through the careful assignment of consequences to each message. Most messages should have a reasonable, measurable, and known impact on operation. A warning message that complains, “Train is too fast exiting slowdown brake” might not receive attention for some time. A fault message that announces, “Auto mode is disabled. Train is too fast exiting slowdown brake. Use supervisor’s key.” will get the immediate attention of a supervisor.
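
One way to picture a message with an assigned consequence is the toy sketch below. It is an illustration only, assuming a hypothetical `RideController` class, a hypothetical speed limit, and a hypothetical supervisor-key reset. A real ride control system would enforce this in safety-rated logic, not in a script.

```python
class RideController:
    """Toy controller in which a fault message carries a consequence."""

    def __init__(self, exit_speed_limit: float = 3.0):
        self.exit_speed_limit = exit_speed_limit  # hypothetical limit, m/s
        self.auto_mode_enabled = True
        self.messages = []

    def check_exit_speed(self, speed: float) -> None:
        """Disable auto mode and say so when the train exits too fast."""
        if speed > self.exit_speed_limit:
            self.auto_mode_enabled = False  # the measurable consequence
            self.messages.append(
                "Auto mode is disabled. Train is too fast exiting "
                "slowdown brake. Use supervisor's key."
            )

    def supervisor_key_reset(self) -> None:
        """Restore auto mode once a supervisor has responded."""
        self.auto_mode_enabled = True
```

Because the message and the loss of auto mode are produced together, the operator cannot dismiss one without dealing with the other.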

Wachusay?

One way to help ensure the long-term effectiveness of a messaging system is to write clear, easy-to-understand messages. Some control systems report problems with extremely short messages (perhaps due to a character limit), numeric codes alone, or even just a light. This is common and acceptable in simple systems, but more complex systems require more explanation.

Cryptic messages are harder to understand than “plain English” and may get the wrong response or no response from an operator. Consider these two messages:

  • Fault: LEL sensor 4 high
  • EMERGENCY STOP: A flammable gas leak has been detected in the Southeast corner of the facility services room in the basement. Notify security immediately at extension 7911 and start the evacuation procedure. (LEL gas detector 4)

Both are possible messages describing a signal from a lower explosive limit (LEL) gas detector, but only the second is likely to elicit a timely and appropriate reaction from the operator.

With a modern messaging system, the system designer can include all of the following information in a message:

  • When? The date and time of the fault. Advanced systems can record all messages in a searchable database for easy recall and analysis days or months later.
  • What? The complete identification of the faulted device or subsystem. The name should match the name used in the system’s manuals and drawings. Advanced systems can show a picture of the device and access the appropriate manual or drawings. Park-wide messaging systems will specify the attraction and system in addition to the device.
  • Where? The exact location of the faulted device or subsystem. Advanced systems can locate the device on an architectural drawing, map, or aerial view.
  • Why? The exact cause of the fault with as much explanation as is required. Advanced systems can display the program logic that triggered the alarm.
  • Who? The people required to respond to the fault: operations supervisor, maintenance crew, or the security department. Advanced systems can notify these people directly via email, pager, or other means.
  • How? The appropriate response and troubleshooting procedure for this problem. Advanced systems can display help files containing almost any necessary material, including manufacturer’s manuals and drawings.
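
The six questions above map naturally onto the fields of a message record. The sketch below is one possible layout, not a prescribed format: the `AlarmMessage` class, its field names, and the timestamp are hypothetical, while the message wording is taken from the gas detector example earlier in this section.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmMessage:
    when: datetime  # When? date and time of the fault
    what: str       # What? device name as used in manuals and drawings
    where: str      # Where? location of the device
    why: str        # Why? cause of the fault
    who: str        # Who? people required to respond
    how: str        # How? appropriate response

    def render(self) -> str:
        """Format the record as a single line for the operator."""
        stamp = self.when.strftime("%Y-%m-%d %H:%M:%S")
        return (f"{stamp} {self.why} Location: {self.where}. "
                f"Notify: {self.who}. Action: {self.how} ({self.what})")


msg = AlarmMessage(
    when=datetime(2002, 1, 15, 9, 30, 0),  # illustrative timestamp
    what="LEL gas detector 4",
    where="southeast corner of the facility services room, basement",
    why="A flammable gas leak has been detected.",
    who="security at extension 7911",
    how="Start the evacuation procedure.",
)
print(msg.render())
```

Keeping the answers in separate fields, instead of one free-text string, is what would let a system search stored messages by device or location later.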

News and Trivia

Messaging systems must take care to separate important news from trivial information. Some systems use the same alarm mechanism (perhaps a pop-up window) to announce all events regardless of their importance. If a system beeps and posts a message every time a minor event happens, any important message will be lost in the sea of trivial messages.

One way to avoid losing important messages in the crowd is to separate messages into classes by severity. Colorizing the severity classes is one way to help the operator distinguish the classes and react appropriately.

Some major severity classes are:

  • FAULT – A problem that prevents continued normal operation.
  • WARNING – A problem that requires operator attention, but normal operation can continue at the operator’s discretion.
  • NOTICE – Information the operator may need at the moment.
  • LOG – Information that should be recorded for possible later review.

The FAULT and WARNING classes require operator attention and should ring the buzzer and/or blink the trouble light. The NOTICE and LOG classes don’t require the operator’s attention and should do neither. LOG messages are not of interest at the time they occur and should not appear to the operator at all.

The FAULT class of message reports problems that stop equipment. For example: “RIDE STOP: Segment 2 of slowdown zone brakes failed to engage. Call maintenance at x7611 and make Ride Stop announcement”.

WARNING messages don’t affect operation, perhaps because there is no clearly appropriate response to the trouble. For example: “WARNING: Unusually high average train speed from lift exit to slowdown brake entrance. Adjust lift chain exit speed”.

NOTICE messages should only be sent when the operator clearly needs coaching. For example, if the operator presses the [Dispatch] button on a roller coaster without effect, the system might provide this help: “Close the queue gates before dispatching the train”. The alarm sound should not be used with NOTICE messages.

The system designer must take care to avoid using the lower levels unnecessarily. In particular, LOG messages should not be presented to the operator at all. For example: “2002-02-22 14:22:00 Train 2 dispatched”. These messages are only useful for reconstructing the events that led up to a problem.
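
The routing rules for the four severity classes could be sketched as follows. This is a minimal illustration using the hypothetical names `Severity` and `route`. That every message is recorded, whatever its class, is an assumption of the sketch and not a stated requirement.

```python
from enum import Enum


class Severity(Enum):
    FAULT = 1
    WARNING = 2
    NOTICE = 3
    LOG = 4


def route(severity: Severity) -> dict:
    """Return which outputs a message of this severity should reach."""
    return {
        # Only FAULT and WARNING ring the buzzer or blink the light.
        "buzzer": severity in (Severity.FAULT, Severity.WARNING),
        # LOG messages are never shown to the operator.
        "display": severity is not Severity.LOG,
        # Assumption: every message is recorded for later review.
        "record": True,
    }
```

Under these rules a NOTICE appears on the screen silently, and a LOG entry such as a train dispatch goes only to the record.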

… and Nothing but the Truth

It’s also possible for a system to lose the trust of the operator by being too helpful. Systems are typically able to detect more problems than they can accurately diagnose. A system may be able to detect that a motor is no longer running, but can’t know whether it has stopped because of lack of power, overheat, overload, relay failure, mechanical failure, or something else. If the system reports “Motor 1 overloaded” every time someone turns off the manual breaker, people will not trust the system’s diagnosis. Instead, the system should report only what it actually knows, in this case: “Motor 1 has stopped unexpectedly. Check switch, overload, overheat, & rotation”.
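
In code, reporting only what is known means building the message from the observed signals and nothing more. The sketch below assumes a hypothetical function `motor_message` with two hypothetical inputs: whether the motor was commanded on, and whether it is detected running.

```python
from typing import Optional


def motor_message(name: str, commanded_on: bool,
                  running: bool) -> Optional[str]:
    """Report the observed fact only: the motor should run but does not.

    No cause is guessed, because these inputs cannot distinguish loss
    of power, overload, overheat, or mechanical failure.
    """
    if commanded_on and not running:
        return (f"{name} has stopped unexpectedly. "
                "Check switch, overload, overheat, & rotation")
    return None  # nothing unexpected was observed
```

The function returns no message when the motor is off because it was commanded off, which also avoids a nuisance message during a normal shutdown.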

Conclusion

Large systems require well-designed message systems to help manage their complexity. Simply ringing the bell and printing a cryptic line of text for every event is not enough. Poorly thought-out messages may fail to communicate their meaning or their importance. A pattern of “nuisance” messages will erode the operator’s confidence in the system. If the attention of the operator is lost, the alarm will no longer serve its purpose of enlisting human aid – and no help will come to deal with the wolf.

Notice

This document is presented as a service to the entertainment community for informational and promotional purposes. It is not intended as engineering advice or opinion and is not guaranteed to be current, correct, or complete. Links to other web sites are not an endorsement of those sites.

You are welcome to forward this document to others interested in the safety of rides and shows or control systems in general. You’ll find similar safety articles here in the Reading Room.

Discussion

Amusement-safety and Show-control are professional, spam-free discussion groups. They are hosted at http://groups.yahoo.com by and for industry professionals. Birket Engineering, Inc. does not operate either group.