The Best of Service Operation, Part 4
Event Management Activities
Originally Published on November 9, 2010
In an earlier blog I was asked how to move from a reactive organization to a proactive one. My answer was through the use of Event Management, along with good design and proactive Problem Management. In this installment I would like to speak to the activities within Event Management and the role they play in our ability to deliver a consistent level of service over a stable infrastructure.
By definition, an event is any detectable or discernible occurrence that has significance for the management of the IT infrastructure or the delivery of IT services. Event Management is the process that monitors all events that occur throughout the IT infrastructure, both to confirm normal operation and to detect and evaluate the impact any deviation might have on the IT infrastructure or the delivery of IT services. Implementing this process involves the following activities.
- Event Notification: In the design stage (through engaging all stakeholders) we identify the events we want to detect for each CI, and we define and document meaningful notification data and the associated roles and responsibilities. A CI can communicate status information in one of two ways: a management tool polls the device, or the CI itself generates a notification under certain conditions. Notification types include regular operation, unusual but not exceptional operation, and an exception.
- Event Detection: Once generated, an event is detected either by an agent running on the same system or by an event management tool to which it is transmitted.
- Event Filtering: This is where first-level correlation is performed. It determines the significance of the event: whether it is informational, a warning or an exception. This correlation is done by an agent on the CI. If no action is required, the event is simply logged.
- Event Correlation: If the event is significant, an appropriate response is determined. This is done by a correlation engine, which is part of a management tool. The engine compares the event with a specific set of criteria in a prescribed order and then determines a response based on a set of predefined rules.
- Triggers: If correlation recognizes an event, some response will be required. This response is initiated by a trigger. Each trigger is designed specifically for the task it initiates. Examples include Incident triggers, Problem triggers and Change triggers.
- Response Selection: At this point in the process a number of response options are available, and they can be chosen in any combination. The event can be logged, an auto response can be initiated, or an alert can be sent to prompt some type of human intervention.
- Review Actions: Check that significant events or exceptions have been handled appropriately, and track trends or counts of event types. Reviews should not duplicate any actions already taken if an incident, problem or change has been initiated.
- Close Event: Informational events are logged and passed to other processes. Events that generate activity in other processes (incident, problem, change) are closed by those processes.
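To make the flow above more concrete, here is a minimal Python sketch of filtering, correlation, response selection and closure for a single event. It is only an illustration under stated assumptions, not a reference implementation of any particular monitoring tool: the metric name `disk_used_pct`, the threshold values, the `RULES` table and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


@dataclass
class Event:
    ci: str        # configuration item that raised the event
    metric: str    # hypothetical metric name, e.g. "disk_used_pct"
    value: float
    significance: Significance = Significance.INFORMATIONAL
    log: list = field(default_factory=list)  # responses recorded for review


# Hypothetical thresholds per metric: (warning_at, exception_at)
THRESHOLDS = {"disk_used_pct": (80.0, 95.0)}

# Hypothetical predefined rules used by the correlation engine
RULES = {
    Significance.WARNING: ["log", "alert"],
    Significance.EXCEPTION: ["log", "incident"],
}


def filter_event(event):
    """First-level correlation (agent side): classify significance."""
    no_limit = (float("inf"), float("inf"))
    warn_at, exception_at = THRESHOLDS.get(event.metric, no_limit)
    if event.value >= exception_at:
        event.significance = Significance.EXCEPTION
    elif event.value >= warn_at:
        event.significance = Significance.WARNING
    else:
        event.significance = Significance.INFORMATIONAL
    return event


def correlate(event):
    """Correlation engine: look up the responses for a significant event."""
    return RULES[event.significance]


def handle(event):
    """Run one event through filtering, correlation, response and closure."""
    event = filter_event(event)
    if event.significance is Significance.INFORMATIONAL:
        # No action required: log the event and close it here.
        event.log.append("log")
        return event.significance, ["log"], "event management"
    responses = correlate(event)
    closed_by = "event management"
    for response in responses:
        event.log.append(response)
        if response == "incident":
            # Events that trigger another process are closed by that process.
            closed_by = "incident management"
    return event.significance, responses, closed_by
```

For example, `handle(Event("srv01", "disk_used_pct", 97.0))` classifies the event as an exception, selects the `log` and `incident` responses, and reports that closure belongs to incident management. In a real deployment the filtering step would typically run in an agent on the CI and the rules would live in the management tool, rather than in one script.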