When discussing operational activities or event management, I am often asked how we should monitor our environment. There are several ways to do this, depending on what you’re monitoring, the resources you have available and how critical the monitored item is. Defining these elements will help you choose one or more of the following methods.
Active Monitoring:
- Ongoing interrogation of a device to determine its status.
- Resource intensive.
- Usually used proactively for critical devices or systems.
Passive Monitoring:
- The monitored device transmits events to a listening device.
- Most commonly used method.
- Requires good definition of events and instrumentation of the systems being monitored.
Reactive Monitoring:
- Requests or triggers action following an event or failure.
- Used for both exceptions and normal operations.
- Can be used to diagnose which device is causing a failure and under what conditions.
Proactive Monitoring:
- Used to detect event patterns that can indicate a system or service is about to fail.
- Used to determine the real-time status of a device or system.
- Usually used for critical components, or following the recovery of a failed device to ensure full recovery has taken place.
- Records are correlated over time to build trends or patterns. These patterns are then defined and programmed into correlation engines for future recognition.
Continuous Monitoring:
- Focuses on real-time monitoring to ensure compliance with a performance norm.
- Differs from Active Monitoring, which may not be continuous.
- Like Active Monitoring, this can be resource intensive, so it is normally reserved for critical services or components.
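To make the contrast between interrogating a device and listening for its events more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Event`, `ActiveMonitor` and `PassiveMonitor` names are hypothetical, and the probes are plain functions standing in for whatever status check a real tool would perform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    source: str   # name of the device or system (hypothetical)
    status: str   # e.g. "ok" or "failed"


class ActiveMonitor:
    """Interrogates every registered device to determine its status."""

    def __init__(self, probes: Dict[str, Callable[[], str]]) -> None:
        # Each probe is a stand-in for a real status check (assumption).
        self.probes = probes

    def poll(self) -> List[Event]:
        # One probe call per device on every poll: this is why the
        # approach is resource intensive and kept for critical devices.
        return [Event(name, probe()) for name, probe in self.probes.items()]


class PassiveMonitor:
    """Acts as the listening device: it only records what is sent to it."""

    def __init__(self) -> None:
        self.received: List[Event] = []

    def on_event(self, event: Event) -> None:
        # Only works if the monitored system is instrumented to send events.
        self.received.append(event)


if __name__ == "__main__":
    active = ActiveMonitor({"router": lambda: "ok", "db": lambda: "failed"})
    for event in active.poll():
        print("polled:", event)

    passive = PassiveMonitor()
    passive.on_event(Event("disk-array", "failed"))  # sent by the device itself
    print("received:", passive.received)
```

The active monitor pays the cost of every check itself, while the passive monitor does no work until something is transmitted, which is why it depends so heavily on well-defined events.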
Exception-Based Monitoring:
- Does not report on real time performance but detects and reports on exceptions.
- Less resource intensive and more cost effective.
- May result in lengthier outages, because it is reactive: nothing is reported until an exception has already occurred.
- Used on less critical systems or services.
- Use must be reflected in SLAs and OLAs.
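The trade-off between measuring against a performance norm all the time and reporting only on exceptions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real tool: the function names, the sample format and the 200 ms norm are all hypothetical.

```python
from typing import Iterable, List, Tuple

# (timestamp in seconds, response time in milliseconds) - hypothetical format
Sample = Tuple[int, float]


def continuous_report(samples: Iterable[Sample], norm_ms: float) -> List[Tuple[int, float, bool]]:
    """Report every sample together with whether it complies with the norm."""
    return [(ts, value, value <= norm_ms) for ts, value in samples]


def exception_report(samples: Iterable[Sample], norm_ms: float) -> List[Sample]:
    """Report only the samples that breach the norm; stay silent otherwise."""
    return [(ts, value) for ts, value in samples if value > norm_ms]


if __name__ == "__main__":
    samples = [(0, 120.0), (60, 480.0), (120, 90.0)]
    norm_ms = 200.0  # assumed performance norm for this example

    print(continuous_report(samples, norm_ms))
    print(exception_report(samples, norm_ms))
```

The exception-based report produces far less data to store and review, but it says nothing about how close the healthy samples were to the norm, which is the information a continuous view would give you.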