This content is 8 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.
Last month I started a series of preparation notes as I study for my IT Infrastructure Library (ITIL®) Foundation certification.
This post continues by looking at the fourth stage in the ITIL service lifecycle: service operation.
Service Operation
Service operation has both processes and functions:
- Processes (the “how”):
- Event Management.
- Incident Management.
- Problem Management.
- Request Fulfilment.
- Access Management.
- Functions (the “who”):
- Service Desk.
- Technical Management.
- IT Operations.
- Application Management.
Goals of service operation:
“Maintain business satisfaction and confidence in IT through effective and efficient delivery and support of agreed IT services.”
(i.e. for the services you have agreed to provide)
“Minimise the impact of service outages on day-to-day business activities.”
“Ensure that access to agreed IT services is only provided to those authorised to receive those services.”
(i.e. security)
There is a continuous balancing act/conflict between:
- Internal IT vs. external business services.
- Stability vs. speed of response.
- Cost of service vs. quality of service.
Service Desk
Not just incident/problem management but all of the processes – fulfilling requests, managing access and communicating about events.
ITIL recognises four service desk structures:
- Local service desk:
- Co-located with user community.
- Local knowledge/language.
- Expensive (multiple desks).
- Possibly less knowledge transfer and inconsistent service.
- Centralised service desk:
- All locations contact a central desk, possibly with second-line behind it and various specialist teams (e.g. for technical management, application management, request fulfilment, third party).
- Reduces operational costs; simplified contact.
- Lose local knowledge and possibly language barriers.
- Single point of failure.
- Virtual service desk:
- Looks like centralised but resources in different locations.
- May route requests to teams in different locations (e.g. server support) – match skills to requirement.
- High rollout costs; inconsistent service and reporting; hard to monitor staff.
- Knowledge exchange may be difficult with remote staff.
- Follow-the-sun service desk:
- Route calls to where people are awake, e.g. 3 desks for Asia, EMEA, Americas.
- Works well in global scenarios.
- Hand over to another location at end of shift so knowledge transfer is improved.
- Expensive to maintain, needs technology to ensure connection; language constraints.
- Staff need business understanding as well as communications skills.
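The follow-the-sun idea of routing calls to whichever desk is awake can be sketched in a few lines of Python. This is only an illustration: the desk names come from the example above, but the shift boundaries (three eight-hour windows in UTC) and the `desk_for` function are my own assumptions, not anything ITIL prescribes.

```python
from datetime import datetime, timezone

# Hypothetical shift windows in UTC: (desk, start hour inclusive, end hour exclusive).
SHIFTS = [
    ("Asia", 0, 8),
    ("EMEA", 8, 16),
    ("Americas", 16, 24),
]

def desk_for(moment: datetime) -> str:
    """Return the desk whose shift covers a timezone-aware moment."""
    hour = moment.astimezone(timezone.utc).hour
    for name, start, end in SHIFTS:
        if start <= hour < end:
            return name
    raise ValueError(f"no desk covers hour {hour}")

# A call arriving at 09:30 UTC lands on the EMEA desk under these assumed shifts.
print(desk_for(datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)))
```

A real deployment would also need an overlap at each shift boundary for the hand-over, which this sketch leaves out.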
Increasingly, self-help service-desk functionality is part of the solution (e.g. automated phone service, direction to websites, etc.).
Technical, IT Operations and Application Management
Technical Management:
“Helps plan, implement and maintain a stable technical infrastructure to support the operations business processes.”
(Managing Technical Infrastructure, Networking, etc.)
IT Operations Management:
“Defines the department, group or team of people responsible for performing the day-to-day operational activities.”
(Managing operations – control: monitoring, backups, etc. – facilities, etc.)
Application Management:
“Working together with Technical Management, ensures that the knowledge required to design, test and manage IT services is there for resources to use.”
(Managing the applications within their lifecycle.)
Together, these functions ensure that stable services can be provided to customers.
Incident and Problem Management
An incident:
“Concentrates on restoring unexpected degradation of services or disrupted services.”
An incident is an unplanned interruption or reduction in quality of service.
A problem:
“Involves root cause analysis to determine underlying causes of incidents”
Looking at a series of incidents, possibly a trend.
Terms:
- Escalation:
- Assign additional resources to meet service level targets or customer expectations.
- Types:
- Functional: transfer incident to technical team with higher expertise.
- Hierarchical: go to a senior level of management.
- Impact:
- Measure of the effect of the incident on the business process.
- Incident (Major):
- Unplanned interruption to IT service, or reduction in quality.
- Major is highest category – total service disruption.
- Resolution:
- Action used to repair root cause of incident or problem.
- Urgency:
- How long until the incident or problem impacts the business.
- Workaround:
- Reduce/eliminate the impact of the problem, before a resolution is in place.
- Known Error Database (KEDB):
- Database of known errors. Part of the Configuration Management System (CMS).
- Proactive Problem Management:
- Identify problems that will be missed otherwise, and take action before they happen.
- Problem (Major):
- The cause of one or more incidents.
- Root cause:
- The underlying or original cause of an incident or problem.
- Threat:
- Anything that might exploit a vulnerability.
Incident Management
“To restore normal service operation as quickly as possible and minimise the adverse effect on business operations, ensuring agreed levels of service quality are maintained.”
Diagnosis and getting back up to speed as quickly as possible.
The Open – In progress – Resolve – Close lifecycle breaks down into 9 steps:
- Identification. Service desk call, or monitoring leads to an incident.
- Logging. Record all incidents (with a reference number).
- Categorisation. Type of incident for effective escalation (and reporting).
- Prioritisation. Run incidents through a matrix. Impact + Urgency = Priority.
- Initial diagnosis. First line scripts, etc.
- Incident escalation. VIPs may get more attention!
- Investigation and diagnosis. Identify the error, look for events that may have triggered the incident.
- Resolution and recovery. Apply and test fix (in a controlled manner).
- Incident closure. Ensure users are satisfied, etc. Document into problem management and KEDB.
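The prioritisation step (Impact + Urgency = Priority) is easy to picture as a lookup. The sketch below assumes three levels for each of impact and urgency and a five-point priority scale where 1 is the most critical. That scale is a common illustration, but the exact matrix is something each organisation agrees for itself, so treat the numbers and the `priority` function as hypothetical.

```python
# Assumed scoring: a lower number means more severe.
LEVELS = {"high": 1, "medium": 2, "low": 3}

def priority(impact: str, urgency: str) -> int:
    """Combine impact and urgency into a priority from 1 (critical) to 5 (planning)."""
    return LEVELS[impact] + LEVELS[urgency] - 1

# Print the whole matrix, one row per impact level.
for impact in LEVELS:
    row = [priority(impact, urgency) for urgency in LEVELS]
    print(f"{impact:>6}: {row}")
```

With this scoring, a high-impact, low-urgency incident gets the same priority as a medium/medium one, which is the kind of trade-off a matrix makes explicit.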
Problem Management
“To manage the lifecycle of all problems from first identification through eventual removal. It seeks to minimise the adverse impact of incidents and problems and to proactively prevent recurrence of related errors due to incidents.”
Root cause analysis (RCA) is at the heart of problem management.
Reactive:
- Problem detection.
- Problem logging.
- Problem categorisation.
- Problem prioritisation (urgency + impact).
- Diagnosis (leading to RCA).
- Workarounds.
- Raising a “known error” (KEDB).
- Problem resolution.
- Problem closure. Major problem review to understand lessons that have been learned for the future (what went well, what didn’t?)
Proactive:
- Trend analysis – look for trends in incidents.
- Root cause analysis.
- Targeted prevention – cost benefit analysis and target areas that need most support, co-ordinated with availability and capacity management.
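As a rough illustration of trend analysis, the sketch below counts incidents per category and flags any category that recurs often enough to be worth raising as a problem. The record layout, the sample data and the threshold of three are all assumptions made for the example.

```python
from collections import Counter

def recurring_categories(incidents, threshold=3):
    """Return categories with at least `threshold` incidents, sorted by name.

    `incidents` is an iterable of (reference, category) pairs.
    """
    counts = Counter(category for _, category in incidents)
    return sorted(cat for cat, n in counts.items() if n >= threshold)

# Hypothetical incident log.
incident_log = [
    ("INC001", "email"),
    ("INC002", "vpn"),
    ("INC003", "email"),
    ("INC004", "printing"),
    ("INC005", "email"),
]
print(recurring_categories(incident_log))  # ['email']
```

The flagged categories are only candidates: root cause analysis still has to establish whether the incidents share an underlying cause.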
Event and Access Management
Event Management: How to make sure that the configuration items that are changing during rollout of a service continue to function.
“Event Management’s purpose is to manage events through their lifecycle. The lifecycle is detecting events, making sense of them and determining the appropriate control action.”
Access Management: Who gets access to a service or information necessary to continue a service.
“Access Management’s purpose is to provide the right for users to use a service, while preventing access to non-authorised users.”
Terms:
- Alert: a notification that an item needs to be changed (e.g. a threshold is met).
- Event: any change of state in a configuration item (e.g. software patching).
- Rights: who can access a service or information.
Event Management
Dealing with configuration items – assets used to deliver IT service. Also environmental conditions, software licensing, performance metrics.
- Informational events – logged.
- Warning events – e.g. meeting a threshold.
- Exception events – e.g. malware alert, CPU spike.
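One way to picture the three event types is a threshold check on a monitored metric. This is a minimal sketch, assuming a single numeric reading (CPU utilisation, say) and two made-up thresholds. Real monitoring tools classify events on far richer rules, and `classify_event` is a hypothetical name.

```python
def classify_event(value: float, warning_at: float, exception_at: float) -> str:
    """Classify a metric reading as informational, warning or exception."""
    if value >= exception_at:
        return "exception"      # outside normal operation: act on it
    if value >= warning_at:
        return "warning"        # threshold met: keep an eye on it
    return "informational"      # normal operation: just log it

# Assumed thresholds: warn at 80% CPU, raise an exception at 95%.
for reading in (42.0, 83.5, 99.0):
    print(reading, classify_event(reading, warning_at=80, exception_at=95))
```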
Access Management
- Request access.
- Verification. Verify that the person requesting access is who they say they are.
- Provide rights.
- Monitor identity status. May change status (e.g. if no longer subscribing to a service).
- Logging and tracking access. Looking for anomalies.
- Removing or restricting rights. e.g. after security breach or when no longer have permission to access the service.
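The access management activities above map quite naturally onto a small class. The sketch below is purely illustrative: the `AccessManager` class, its method names and the in-memory log are my own inventions, and verification is reduced to checking a set of known identities.

```python
class AccessManager:
    """Toy model of request, verification, rights, logging and removal."""

    def __init__(self, known_identities):
        self.known = set(known_identities)  # identities that pass verification
        self.rights = {}                    # user -> set of services
        self.log = []                       # (action, user, service) tuples

    def request(self, user, service):
        """Verify the requester, then provide rights if verification passes."""
        if user not in self.known:
            self.log.append(("denied", user, service))
            return False
        self.rights.setdefault(user, set()).add(service)
        self.log.append(("granted", user, service))
        return True

    def can_access(self, user, service):
        """Check rights and log the attempt so anomalies can be spotted later."""
        allowed = service in self.rights.get(user, set())
        self.log.append(("access" if allowed else "blocked", user, service))
        return allowed

    def revoke(self, user, service):
        """Remove a right, e.g. after a breach or a change of status."""
        self.rights.get(user, set()).discard(service)
        self.log.append(("revoked", user, service))

manager = AccessManager({"alice"})
manager.request("alice", "email")     # verified, so rights are provided
manager.request("mallory", "email")   # unknown identity, so denied
manager.revoke("alice", "email")
print(manager.log)
```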
Request Fulfilment
“Request fulfilment helps maintain user and customer satisfaction by managing the lifecycle of all service requests from users.”
Terms:
- Request Model. Documenting the activities necessary to fulfil a request, associated timescales, etc.
- Service Request. Formal request for service – open a ticket – that can be managed. Known and planned for.
Service requests to fulfil are pre-defined (in the Service Knowledge Management System – SKMS):
- Menu selection. To find the right service (e.g. request a new laptop).
- Request tracking. System to track the request as it moves through the lifecycle.
- Financial approval. May not be needed for some services if there is no fiscal impact (e.g. a password reset).
- Other approval. Compliance, regulatory impact, etc.
- Fulfilment. Service desk takes action.
- Closure. Once the service request has been completed. Log into SKMS.
Some services may be handled in an automated fashion – self-service/self-help.
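To make the idea of a pre-defined request model concrete, here is a minimal sketch in which each model records which approvals it needs and a request can only be fulfilled once they are all in place. The class and field names are hypothetical, and fulfilment and closure are collapsed into a single step for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class RequestModel:
    """Pre-defined description of a request type and the approvals it needs."""
    name: str
    needs_financial_approval: bool = False
    other_approvals: tuple = ()

@dataclass
class ServiceRequest:
    """A single tracked request raised against a model."""
    model: RequestModel
    approvals: set = field(default_factory=set)
    status: str = "open"

    def pending_approvals(self) -> set:
        required = set(self.model.other_approvals)
        if self.model.needs_financial_approval:
            required.add("financial")
        return required - self.approvals

    def fulfil(self) -> None:
        """Fulfil and close the request, refusing if approvals are outstanding."""
        if self.pending_approvals():
            raise RuntimeError("approvals outstanding")
        self.status = "closed"

# A password reset has no fiscal impact, so it can be fulfilled straight away.
reset = ServiceRequest(RequestModel("password reset"))
reset.fulfil()

# A new laptop needs financial approval first.
laptop = ServiceRequest(RequestModel("new laptop", needs_financial_approval=True))
print(laptop.pending_approvals())  # {'financial'}
```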
Wrap-up
The next post in this series will follow soon, looking at continual service improvement.
These notes were written and published prior to sitting the exam (so this post doesn’t breach any NDA). They are intended as an aid and no guarantee is given or implied as to their suitability for others hoping to pass the exam.
ITIL® is a registered trademark of AXELOS Limited.