We process auto-generated incident tickets for hardware warning and critical conditions on all monitored devices, using a handful of alert rules to try to cover all the variations in behavior of monitored components in servers and network devices.
One difficulty is how to deal with monitored objects that intermittently drift above and below their thresholds (for example, power supply voltage, board battery status, and temperature sensors).
We have attempted to compromise with the alert recipients by setting insane levels of suppression in the trigger and reset conditions, but I'd welcome suggestions for a more elegant solution for hardware alerting.
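To make the flapping problem concrete, here is a minimal Python sketch of the general idea behind this kind of suppression: separate trigger and reset thresholds (hysteresis), each of which must hold for several consecutive polls before the alert changes state. This is not our actual alert configuration; the class name `HysteresisAlert`, the threshold values, and the sample readings are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HysteresisAlert:
    """Hypothetical alert state for one sensor (illustration only)."""
    trigger_above: float  # raise the alert when readings stay above this
    reset_below: float    # clear the alert only when readings stay below this
    hold_samples: int     # consecutive polls required before changing state
    active: bool = False
    _streak: int = 0

    def update(self, reading: float) -> bool:
        # Count consecutive polls that argue for flipping the current state.
        if self.active:
            wants_flip = reading < self.reset_below
        else:
            wants_flip = reading > self.trigger_above
        self._streak = self._streak + 1 if wants_flip else 0

        # Flip only after the condition has held for enough polls in a row.
        if self._streak >= self.hold_samples:
            self.active = not self.active
            self._streak = 0
        return self.active


# Made-up temperature readings that hover around a threshold of 70.
readings = [68, 71, 69, 72, 71, 73, 74, 69, 66, 64, 63, 62]
alert = HysteresisAlert(trigger_above=70, reset_below=65, hold_samples=3)
for value in readings:
    print(value, alert.update(value))
```

In this sketch the single spike to 71 does not open a ticket, and the dip to 69 after the alert fires does not reset it, because the reset threshold sits below the trigger threshold. The cost is the same one we are living with now: a real fault is reported a few polls late.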
Trigger Condition:
Reset Condition: