An “incident” is defined as an unplanned interruption to an IT service, a reduction in the quality of an IT service, or the failure of a configuration item (CI) that has not yet impacted an IT service. The purpose of incident management is to restore normal service operation as quickly as possible and to minimize the impact of incidents on IT services.
Incident Management is the process responsible for managing the lifecycle of all incidents by ensuring that standardized methods and procedures are used to record, respond to and report on all incidents. Additionally, this process should increase the visibility and communication of incidents to the business and to IT support staff, thereby allowing greater alignment of incident management with the overall IT and business strategies.
In a typical IT environment, the IT organization may be dealing with a large number of incidents, and many of these are repeatable, something that has happened before and very well may happen again in the future. In these cases it can be advantageous for organizations to create standardized incident models. Through this activity we can define predefined steps for detecting, diagnosing, repairing, recovering and restoring IT services, which allows us to use our resources effectively and increase availability (uptime).
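Recognizing that a new incident is a repeat of a known one is what lets a predefined model be applied. The following is a minimal Python sketch of that lookup. The `Incident` record, the `MODEL_REGISTRY` mapping, and the choice to key models on a (category, CI type) signature are all hypothetical assumptions for illustration. They are not taken from ITIL or from any particular service desk tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: maps the (category, CI type) signature of a known,
# recurring incident to the name of a predefined incident model.
MODEL_REGISTRY = {
    ("disk_full", "file_server"): "clear-disk-space",
    ("password_lockout", "user_account"): "unlock-account",
}


@dataclass
class Incident:
    """A hypothetical, minimal incident record."""
    incident_id: str
    category: str
    ci_type: str


def find_model(incident: Incident) -> Optional[str]:
    """Return the predefined model for a repeat incident, or None if no
    model matches and the incident must be diagnosed from scratch."""
    return MODEL_REGISTRY.get((incident.category, incident.ci_type))


if __name__ == "__main__":
    print(find_model(Incident("INC001", "disk_full", "file_server")))  # clear-disk-space
    print(find_model(Incident("INC002", "network_outage", "router")))  # None
```

An incident that matches no model falls back to normal diagnosis. If the same unmatched signature keeps appearing, that is a hint that a new model is worth writing.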
Incident models should include, at a minimum:
- The steps to be taken to handle the incident.
- The chronological order in which these steps should be taken, with any dependencies defined.
- Responsibilities: who should do what (this can include RACI components).
- Any precautions that must be observed.
- Time scales and/or thresholds for completing the steps.
- Escalation procedures, if necessary.
- Any actions needed to preserve evidence for later investigation.
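These elements map naturally onto a small data structure. The Python sketch below is one possible illustration, not a standard schema. The `Step` and `IncidentModel` classes, the example "web server down" model, and the rule of escalating once a step's time threshold is exceeded are all hypothetical. The sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive an order that respects every declared dependency.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Optional


@dataclass
class Step:
    """One step of a hypothetical incident model."""
    name: str
    responsible: str        # RACI "Responsible": who performs the step
    accountable: str        # RACI "Accountable": who owns the outcome
    max_minutes: int        # time-scale threshold for completing the step
    escalate_to: str        # who to escalate to if the threshold is breached
    depends_on: list[str] = field(default_factory=list)
    precautions: str = ""
    preserve_evidence: bool = False  # e.g. capture logs before changing anything


@dataclass
class IncidentModel:
    """A hypothetical incident model: named steps plus their dependencies."""
    name: str
    steps: dict[str, Step]

    def ordered_steps(self) -> list[Step]:
        """Return the steps in an order that satisfies every dependency."""
        # TopologicalSorter expects {node: iterable of predecessor nodes}.
        graph = {name: step.depends_on for name, step in self.steps.items()}
        return [self.steps[name] for name in TopologicalSorter(graph).static_order()]


def escalation_target(step: Step, elapsed_minutes: int) -> Optional[str]:
    """Return who to escalate to if the step has run past its threshold, else None."""
    return step.escalate_to if elapsed_minutes > step.max_minutes else None


# Hypothetical example model for a recurring "web server down" incident.
WEB_SERVER_DOWN = IncidentModel(
    name="web-server-down",
    steps={
        "capture_logs": Step(
            name="capture_logs", responsible="Service Desk",
            accountable="Incident Manager", max_minutes=10,
            escalate_to="Incident Manager", preserve_evidence=True,
        ),
        "restart_service": Step(
            name="restart_service", responsible="Server Team",
            accountable="Incident Manager", max_minutes=15,
            escalate_to="Server Team Lead", depends_on=["capture_logs"],
            precautions="Warn active users before restarting",
        ),
        "verify_service": Step(
            name="verify_service", responsible="Service Desk",
            accountable="Incident Manager", max_minutes=10,
            escalate_to="Incident Manager", depends_on=["restart_service"],
        ),
    },
)

if __name__ == "__main__":
    for step in WEB_SERVER_DOWN.ordered_steps():
        print(f"{step.name}: {step.responsible} (limit {step.max_minutes} min)")
    # 20 minutes exceeds the 15-minute threshold, so this prints "Server Team Lead".
    print(escalation_target(WEB_SERVER_DOWN.steps["restart_service"], 20))
```

Deriving the order from declared dependencies, rather than hard-coding a sequence, catches mistakes early. A circular dependency makes `TopologicalSorter` raise `graphlib.CycleError`. In this sketch, a dependency on an undefined step name would surface as a `KeyError`.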
Using these simple tools can greatly increase the IT support staff's value and ability to:
- Reduce unplanned labor and cost.
- Detect and resolve incidents more effectively and efficiently (less downtime).
- Align incident management with current business priorities.
- Identify areas of weakness in our environment by analyzing trends in historical incident records.
- Identify training opportunities for both business and IT staff.
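Trending historical incident records can start with something as simple as counting incidents per CI or per category. The Python sketch below illustrates this approach. The `IncidentRecord` fields, the sample history, and the `min_count` cut-off are illustrative assumptions. In practice, the records would come from the organization's own service desk data.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, NamedTuple


class IncidentRecord(NamedTuple):
    """A hypothetical, minimal historical incident record."""
    incident_id: str
    ci: str        # affected configuration item
    category: str


# Hypothetical sample history.
HISTORY = [
    IncidentRecord("INC001", "mail-server", "disk_full"),
    IncidentRecord("INC002", "vpn-gateway", "connectivity"),
    IncidentRecord("INC003", "mail-server", "disk_full"),
    IncidentRecord("INC004", "hr-app", "login_failure"),
    IncidentRecord("INC005", "mail-server", "service_crash"),
]


def recurring(
    records: Iterable[IncidentRecord],
    key: Callable[[IncidentRecord], Hashable],
    min_count: int = 2,
) -> list[tuple[Hashable, int]]:
    """Group records by `key` and return the groups seen at least
    `min_count` times, most frequent first."""
    counts = Counter(key(r) for r in records)
    return [(k, n) for k, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Which CIs keep failing?
    print(recurring(HISTORY, key=lambda r: r.ci))        # [('mail-server', 3)]
    # Which kinds of incident keep recurring?
    print(recurring(HISTORY, key=lambda r: r.category))  # [('disk_full', 2)]
```

Passing the grouping `key` as a function lets the same helper trend by CI, by category, or by any other field a real record might carry.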
Comments
But... "...many of these are repeatable, something that has happened before and very well may happen again in the future..." Isn't this the definition of a PROBLEM, in ITIL terms?