So I recently started setting up alerts for typical network issues, such as node down, high interface utilization, etc. For a trigger action, I have it set up to send me an email and also to put a short entry into a log file. It works perfectly when I test it, but I just saw two nodes go down, and another one is currently at 96% utilization, and none of the alerts triggered. They ARE turned on. Does anyone know why these would work in a test, but not in actual production deployment? I've double and triple checked, and as far as I can see, everything is set correctly.
Here's what I have:
General tab: Alert name, description. Enable box is checked. Checks for alert every 1 minute.
Trigger Condition: Type of property to monitor: Node. Trigger when ALL of the following apply: Node Status is equal to Down. (I have since changed ALL to ANY and added Node Status is equal to Unreachable.)
Reset Condition: Reset when trigger conditions are no longer true.
Alert Suppression: Suppress Alert when ALL of the following apply: Device is NOT equal to Router. (Device is a Custom Property, and it IS set correctly on the device that should have triggered. I only want router down messages for now.)
Time of day: Default setting (24x7)
Trigger Actions: Send email to (me), Log Alert to (log file)
Reset Actions: Log alert to (log file)
Alert Sharing: Default setting, except I changed severity from warning to critical.
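
To sanity-check the logic, the settings above can be modeled as a small Python sketch. This is only my reading of how the trigger and suppression conditions combine, not how Advanced Alert Manager actually evaluates them. The `Node` class and the function names are made up for illustration, and the exact, case-sensitive string comparison is a property of this sketch, not a claim about the product.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a monitored node."""
    name: str
    status: str  # e.g. "Up", "Down", "Unreachable"
    device: str  # value of the "Device" custom property


# Trigger when ANY of these statuses applies (the edited condition).
TRIGGER_STATUSES = {"Down", "Unreachable"}


def trigger_matches(node: Node) -> bool:
    """Trigger condition: Node Status is Down or Unreachable."""
    return node.status in TRIGGER_STATUSES


def is_suppressed(node: Node) -> bool:
    """Suppression condition: Device is NOT equal to Router."""
    return node.device != "Router"


def should_fire(node: Node) -> bool:
    """Assumed combination: fire only if triggered and not suppressed."""
    return trigger_matches(node) and not is_suppressed(node)


if __name__ == "__main__":
    for n in (
        Node("core-rtr-1", "Down", "Router"),
        Node("access-sw-1", "Down", "Switch"),
        Node("core-rtr-2", "Up", "Router"),
    ):
        print(n.name, should_fire(n))
```

Under this model, a down router with Device set to exactly Router should fire, so if the real alert stays silent, the node's actual status value and the stored custom property value are the two things worth comparing against the conditions.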
Again, when I test the alert, it sends me an email with all the correct data I specified in the Trigger Actions, and it writes a line into the log file. Since the actions themselves work, this leads me to believe that I'm not monitoring for the proper event. Basically, our provider just had a DS3 issue, and it took down two of our routers. It seems to me that this would count as the node being "down." I added "unreachable" to the alert criteria, and under the Alert Sharing tab I changed the default property for Node Name from ${SysName} to ${NodeName}.
Now I guess I just have to sit back and wait to see if the changes I made work, unless anyone else can see any glaring errors as to how I set it up.
Just FYI, the same thing is happening with the interface utilization alerts I created. The trigger condition exists, tests work fine, and the alert is turned on, but it doesn't appear to trigger when the criteria are met.
Edit: I'm using Advanced Alert Manager V2013.2.1 (We're planning an update soon)