Event Management: From Reactive to Proactive

Many students have asked me how to move from the role of firefighter, resolving incidents, to a role where you can prevent them from occurring in the first place. Much of this comes down to good design and a strong proactive problem management process, but a solid event management process is also an excellent offensive weapon for preventing impacting incidents in your environment.

Event Management is the process that gives IT the ability to detect events, make sense of them and determine the appropriate control action. It is the basis for our operational monitoring and control, and it gives us a way to compare actual performance against what was designed and written into SLAs. What makes event management so useful is that we can apply it to any aspect of our environment: the delivery of a service, an individual CI, environmental conditions or software license usage.

In conjunction with the other Service Management processes, along with both passive and active monitoring tools, Event Management can indicate a change in the status of a CI, allowing an early response from the appropriate person or team. This enhances our ability to act proactively to prevent exceptions or incidents and ensure that we can deliver the desired business outcomes without interruption. Event Management also provides the foundation for automated operations, increasing effectiveness and efficiency by freeing more expensive human resources for the more complex task of finding ways to create a competitive advantage for the business.

Event management does not begin the day we go live with a new or changed service. In the Design stage of the service management lifecycle we identify the events we want to detect and define their notifications. Is it a regular operation? Is it something unusual, but not exceptional? Or could it be some type of exception? In Transition we build and test these notifications and the tools we will use to generate them, and we define roles and responsibilities. In Operation we implement the following activities:
  • Event detection and filtering
  • Determine significance: Informational, Warning or Exception
  • Correlation: Determine the response based on a set of predefined rules
  • Triggers: Mechanism used to initiate a response
  • Response selection: log the event, trigger an auto response, or raise an alert for human intervention; open an RFC, an incident or a problem
  • Review actions: confirm events were handled correctly, track trends or counts
  • Close event
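
To make the filtering, significance and response steps above a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a description of any particular monitoring tool: the Event fields, the disk_used_pct metric, the threshold values and the response names are all assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


@dataclass
class Event:
    ci: str        # configuration item that raised the event
    metric: str    # name of the monitored metric (hypothetical)
    value: float   # observed value


# Hypothetical thresholds agreed at design time: (warning, exception)
THRESHOLDS = {"disk_used_pct": (80.0, 95.0)}

# Hypothetical predefined rules mapping significance to a response
RESPONSES = {
    Significance.INFORMATIONAL: "log event",
    Significance.WARNING: "alert operations team",
    Significance.EXCEPTION: "open incident",
}


def is_relevant(event: Event) -> bool:
    """Filtering: ignore events for metrics we chose not to monitor."""
    return event.metric in THRESHOLDS


def classify(event: Event) -> Significance:
    """Significance: compare the observed value with the designed thresholds."""
    warning, exception = THRESHOLDS[event.metric]
    if event.value >= exception:
        return Significance.EXCEPTION
    if event.value >= warning:
        return Significance.WARNING
    return Significance.INFORMATIONAL


def handle(event: Event) -> Optional[str]:
    """Filter, classify, then select a response from the predefined rules."""
    if not is_relevant(event):
        return None
    return RESPONSES[classify(event)]
```

For example, handle(Event("srv01", "disk_used_pct", 85.0)) selects the warning response, so the team hears about the disk filling up before it becomes an incident. In practice the rules would be far richer, and review and closure would be tracked as well.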

Through the implementation of these activities we can begin to proactively monitor availability, reliability, capacity and overall performance, and move our organizations from a position of reaction to one of prevention.


