In my last post on the subject of what I called "monitoring-spam" I talked about what can happen when you install a solution like NPM and turn on all alerts--or at least leave on the out-of-the-box alerts. I also asked how you all deal with those problems and promised that I'd cover how my team mitigates these issues.
Several of you use strategies similar to the ones I use today, including:
* Don't monitor access ports, printers, or other devices that either tend to "flap" or aren't as critical to know about immediately. Consider that a lot of users, for instance, undock and transition from wired to wireless during meetings--do you really need to know each time that happens?
* Separate logging from monitoring. I assume you log events somehow, but logging is very different from active monitoring or alerting. We log every event that happens on our network, generating hundreds of gigabytes daily (every device with power and the ability to "talk" on the network sends all of its logs to a central location), yet we actively monitor, and possibly alert on, only the trouble areas.
* Establish reasonable baselines for alerting. A server that uses all of its memory for a short period may not be as critical as one that is constantly pegged. Then again, even that may not be an issue, as certain servers (SQL, Exchange) will grab all the memory they can get.
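To make the baseline idea a bit more concrete, here is a minimal sketch in Python of a "sustained threshold" check: it stays quiet on a short spike and only fires when every one of the last few samples is over the limit. The class name, the 90% limit, and the three-sample window are all hypothetical choices for illustration; this is not how NPM or any particular product implements it.

```python
from collections import deque


class SustainedThreshold:
    """Fire only when every one of the last `window` samples exceeds `limit`."""

    def __init__(self, limit, window):
        self.limit = limit
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def update(self, value):
        """Record a sample and return True if an alert should fire."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.limit for v in self.samples)


def alerts_for(readings, limit=90.0, window=3):
    """Return one True/False alert decision per reading (illustrative helper)."""
    check = SustainedThreshold(limit, window)
    return [check.update(r) for r in readings]
```

A server like SQL or Exchange that is expected to hold on to memory would simply get a higher limit, a longer window, or no memory alert at all.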
What I'm interested in now, however, is getting a little more granular in one area, links and routing:
* Do you monitor link saturation?
* Do you monitor unicast routing tables?
* Do you monitor multicast routing tables?
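For anyone who wants to reason about the link saturation question outside of a packaged tool, here is a rough sketch of the usual arithmetic: take two readings of an interface octet counter (for example SNMP's ifInOctets or ifHCInOctets), divide the bits transferred by what the link could have carried in that interval, and express the result as a percentage. The function name and parameters are hypothetical, and it assumes the counter wrapped at most once between polls.

```python
def link_utilization(octets_prev, octets_now, interval_s, speed_bps, counter_bits=64):
    """Percent utilization of a link between two polls of an octet counter.

    octets_prev / octets_now: counter readings at the start and end of the interval
    interval_s: seconds between the two readings
    speed_bps: link speed in bits per second
    counter_bits: counter width (32 or 64), used to handle a single wrap
    """
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval_s and speed_bps must be positive")
    # Modulo arithmetic gives the right delta even if the counter wrapped once.
    delta_octets = (octets_now - octets_prev) % (2 ** counter_bits)
    return delta_octets * 8 * 100.0 / (interval_s * speed_bps)
```

Feeding these percentages into a sustained check, rather than alerting on a single busy poll, keeps one short burst from turning into more monitoring-spam.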