Maintenance SLA — the complete operations guide
A maintenance SLA converts vague expectations into enforceable commitments — defining exactly how quickly operations teams must respond to and resolve equipment failures, by priority tier, with automated escalation, breach detection, and compliance reporting built in.
Used by facilities management, healthcare, manufacturing, and property operations teams
What is a maintenance SLA?
A maintenance SLA (Service Level Agreement) is a formal commitment that defines how quickly a maintenance team must respond to and resolve equipment failures — expressed as time-bound targets per priority tier, with escalation workflows that fire automatically when targets are at risk of being missed.
Without an SLA, maintenance response is governed entirely by whoever is available and whoever shouts loudest. The most persistent requestor gets the fastest service regardless of operational impact. The consequence is systematic misalignment between maintenance effort and business need — critical failures wait behind low-priority requests while operations teams lose confidence in maintenance responsiveness.
A well-implemented SLA programme, enforced through a CMMS, converts that chaos into a predictable, auditable system: every failure is classified by impact, every team member knows exactly what response and resolution time is expected, and the system automatically escalates when those commitments are at risk. Operations stakeholders know what to expect. Senior management can see performance data. Clients receive compliance reports.
What a maintenance SLA programme solves
- Critical failures waiting behind low-priority requests with no priority enforcement
- Operations teams with no visibility into when their reported fault will be resolved
- Senior management unable to assess maintenance responsiveness without manual data collection
- Maintenance contractors operating with no formal accountability for response times
- SLA breaches discovered by clients before the maintenance team knows about them
- No audit trail to defend response performance during contract disputes or regulatory reviews
- Multi-site operations with inconsistent service standards across locations
Glossary
- Response Time
- The elapsed time from work order creation to first technician attendance at the asset. Measures mobilisation speed, not resolution speed.
- Resolution Time
- The elapsed time from work order creation to asset restoration and work order closure. The 'time to fix' commitment.
- Priority Tier
- A classification level (P1 Emergency through P4 Planned) that maps to specific response and resolution time targets.
- SLA Breach
- A work order that was not completed within its SLA window. Triggers escalation, is recorded in the audit trail, and counts against compliance rate.
- Escalation Matrix
- The defined structure of who is notified, when, and what action they must take when SLA thresholds are approached or breached.
- Hold Time
- An approved waiting period (parts, access, approval) that may be excluded from the SLA resolution clock per the SLA policy.
- SLA Compliance Rate
- Percentage of work orders completed within their SLA window. Target: ≥95%. Primary governance metric for a maintenance SLA programme.
The four SLA priority tiers
Priority tier classification is the foundation of an SLA programme. Every incoming work order must be assigned to a tier — and the tier determines the response and resolution window the maintenance team is committed to. Classification must be based on operational impact, not the reporter's urgency.
P1 — Emergency
Immediate safety risk, life-critical system failure, or complete operational stoppage
Response: ≤ 1 hour
Resolution: ≤ 4 hours
The Emergency tier is reserved for failures where delay creates an unacceptable safety, regulatory, or production consequence. Typical threshold: a P1 event cannot wait a full business day without causing harm or significant economic loss. Emergency SLA windows are tight precisely because the consequences of breach are severe — this is the tier where automated escalation to senior leadership must fire before the window expires, not after.
Typical examples
- Life-support or ICU medical equipment failure
- Production line complete stoppage — zero output
- Fire suppression or life safety system fault
- Critical utility failure (primary power, water supply)
Target P1 usage: 5–10% of all work orders
P2 — Urgent
Significant operational impact with degraded but continued operation
Response: ≤ 4 hours
Resolution: ≤ 24 hours
The Urgent tier covers failures that meaningfully impair operations but can tolerate a same-day response window. HVAC failures in occupied spaces, significant production slowdowns, generator failures with mains power still available, and major network equipment faults typically sit at P2. The 24-hour resolution window allows for parts procurement in most scenarios while still committing the team to restoring service within one day.
Typical examples
- HVAC failure in occupied building during extreme weather
- Major production slowdown — partial output only
- Server room cooling loss with temperature rising
- Primary elevator failure in multi-floor facility
Target P2 usage: 15–25% of all work orders
P3 — Routine
Minor operational impact: normal operations continue without significant disruption
Response: ≤ 24 hours
Resolution: ≤ 72 hours
The Routine tier is the workhorse of a maintenance SLA — the majority of maintenance work orders should fall here. Minor equipment faults, single-room HVAC issues, non-essential fixture failures, and low-impact system anomalies are P3 events. The 72-hour resolution window allows for efficient scheduling of technician time and parts procurement without creating unnecessary urgency. High P3 volume (above 60% of work orders) is operationally healthy; it means P1 and P2 remain genuinely reserved for critical events.
Typical examples
- Single-room HVAC fault in non-critical area
- Non-essential equipment running below optimal
- Minor plumbing fault without service impact
- Lighting failure in non-critical area
Target P3 usage: 50–65% of all work orders
P4 — Planned
No operational impact: the work is deferred maintenance, an upgrade, or a non-urgent replacement that can be scheduled at a convenient time without affecting operations
Response: ≤ 5 business days
Resolution: Agreed scheduled date
The Planned tier is for maintenance work that has zero operational urgency and should be scheduled opportunistically — during planned shutdowns, low-activity periods, or batched with other planned work at the same location. P4 work orders are important to track in the SLA system because deferred items accumulate; a backlog of P4 work orders that never get scheduled represents a deteriorating asset base. The 'resolution = scheduled date' commitment means the team must agree a date within the 5-day response window, not simply leave it indefinitely.
Typical examples
- Preventive maintenance deferred from normal schedule
- Non-urgent component upgrade
- Cosmetic repairs with no service impact
- Minor configuration adjustment during next planned window
Target P4 usage: 10–20% of all work orders
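The tier targets above can be held as a small lookup table from which deadlines are derived. The following is a minimal sketch, assuming a 24/7 clock; `TierTarget`, `TIER_TARGETS`, and `deadlines` are hypothetical names chosen for illustration, not part of any particular CMMS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TierTarget:
    """Response and resolution windows for one priority tier."""
    response: timedelta
    resolution: timedelta


# Targets for the clock-driven tiers described above.
# P4 is left out on purpose: its response window is 5 business days and
# its resolution target is an agreed date, so it needs calendar logic.
TIER_TARGETS = {
    "P1": TierTarget(timedelta(hours=1), timedelta(hours=4)),
    "P2": TierTarget(timedelta(hours=4), timedelta(hours=24)),
    "P3": TierTarget(timedelta(hours=24), timedelta(hours=72)),
}


def deadlines(tier: str, created_at: datetime) -> tuple[datetime, datetime]:
    """Return (response_deadline, resolution_deadline) on a 24/7 clock."""
    target = TIER_TARGETS[tier]
    return created_at + target.response, created_at + target.resolution


# Illustrative work order created at 09:00 on a hypothetical date.
created = datetime(2024, 3, 4, 9, 0)
response_due, resolution_due = deadlines("P1", created)
print(response_due, resolution_due)
```

Both deadlines are measured from the creation time, which matches the rule that the resolution clock does not reset at first response.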
P1 inflation is the single most common SLA design failure. When operations teams use P1 to guarantee fast response — rather than because the failure genuinely warrants emergency intervention — it erodes the tier's meaning, overwhelms technician capacity, and makes genuine emergencies harder to triage. Enforce tier criteria rigorously: if a failure is not creating immediate safety risk or complete operational stoppage, it is not P1.
The SLA lifecycle — from fault report to compliance record
A CMMS-enforced SLA runs automatically through eight stages — from the moment a fault is reported to the final compliance record that feeds the operations dashboard.
Work Order Creation & Priority Assignment
A fault is reported — by a technician, operations manager, or external stakeholder — and a work order is created in the CMMS. At creation, the priority tier is assigned: either manually by the creator based on the defined criteria, or automatically by the system using asset criticality mapping. The priority tier determines which SLA targets apply to this work order. If the system has automated priority classification configured (e.g., any fault on a Critical-rated asset defaults to P1), the assignment is immediate and consistent — not dependent on the reporter's judgment. The work order moves to Open status and the SLA engine activates.
SLA Clock Activation
The moment the work order is created, two SLA timers activate simultaneously. The response timer counts up from zero — measuring time elapsed toward the response SLA deadline. The resolution timer also starts counting, running independently of the response timer. Both timers run continuously in real time. On the SLA compliance dashboard, this work order now appears in the 'Active SLAs' view with its deadline and current elapsed time visible. If 75–80% of the response window elapses without a response being logged, the pre-breach escalation fires automatically.
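The pre-breach check described above reduces to comparing the consumed fraction of a window against a threshold. This is a rough sketch only: the 0.75 threshold is the lower bound of the 75–80% range, the function names are hypothetical, and treating a fully consumed window as breached is a policy assumption.

```python
from datetime import datetime, timedelta

# Assumed threshold: the lower bound of the 75-80% range in the text.
PRE_BREACH_FRACTION = 0.75


def window_consumed(created_at: datetime, now: datetime, window: timedelta) -> float:
    """Fraction of the SLA window elapsed so far (exceeds 1.0 after the deadline)."""
    return (now - created_at) / window


def clock_status(created_at: datetime, now: datetime, window: timedelta) -> str:
    """Classify a still-running timer as 'ok', 'approaching', or 'breached'."""
    consumed = window_consumed(created_at, now, window)
    if consumed >= 1.0:
        return "breached"
    if consumed >= PRE_BREACH_FRACTION:
        return "approaching"
    return "ok"


# A P1 response window of 1 hour, checked at three points in time.
created = datetime(2024, 3, 4, 9, 0)
p1_response = timedelta(hours=1)
for minutes in (30, 45, 70):
    now = created + timedelta(minutes=minutes)
    print(minutes, clock_status(created, now, p1_response))
```

The same function serves both timers: pass the response window while no response is logged, and the resolution window until closure.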
Technician Assignment & Dispatch
The Maintenance Manager or Engineering Head assigns the work order to a technician — or an automated assignment rule routes it based on skill, location, or shift. The assigned technician receives an immediate push notification identifying the asset, fault description, priority tier, and the SLA deadline. At this stage the work order status moves to Assigned. The response clock is still running. The technician must physically attend the asset and log their response within the response SLA window. On multi-site operations, the system may also route to the nearest available technician based on location.
First Response (Response SLA Clock Stops)
When the technician arrives at the asset and updates the work order status to In Progress, the response SLA clock stops. The elapsed time from creation to this moment is recorded as the response time for this work order. The system immediately compares the recorded response time against the SLA target for the assigned priority tier: if it falls within the target, the response SLA is logged as Met; if it exceeds the target, a Response SLA Breach is recorded with the timestamp. The resolution clock continues running from creation — the clock does not reset at response. The technician begins diagnosis.
Active Investigation & Repair
The technician works through the diagnosis and repair. As the resolution SLA window depletes, the dashboard shows the remaining time. At 75% of the resolution window consumed, a pre-breach escalation notifies the supervising manager: 'Work order [ID] approaching resolution SLA deadline — confirm ETA for closure.' The manager can review current status and intervene with additional resources if the work order is unlikely to close within the SLA window. This pre-breach notification is the most operationally valuable escalation event — it creates the opportunity to prevent a breach, not just document one.
Hold Time Management (If Applicable)
If the resolution is blocked by a factor outside the maintenance team's control — parts awaited from a supplier, access not yet granted by the tenant, specialist subcontractor en route — the work order status moves to On Hold. Depending on the SLA policy, the resolution clock may pause during approved hold periods. The hold reason, start time, and end time are recorded on the work order. When the blocker is cleared (parts arrive, access granted), the status reverts to In Progress and the clock resumes. Excessive hold time (over 30% of resolution time) is tracked separately in the compliance dashboard and may indicate procurement or access process problems.
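To make the hold-time arithmetic concrete, here is a small sketch of how a net resolution time and the 30% hold check might be computed. It assumes hold periods do not overlap and that the policy permits exclusions; `net_resolution_time` and `hold_is_excessive` are illustrative names.

```python
from datetime import datetime, timedelta


def net_resolution_time(created_at, closed_at, holds, exclude_holds=True):
    """Return (net_resolution, total_hold) for one work order.

    holds is a list of (start, end) datetime pairs for approved,
    non-overlapping hold periods. When the SLA policy does not permit
    exclusions, pass exclude_holds=False and the gross time is returned.
    """
    gross = closed_at - created_at
    total_hold = sum((end - start for start, end in holds), timedelta())
    net = gross - total_hold if exclude_holds else gross
    return net, total_hold


def hold_is_excessive(gross, total_hold, limit=0.30):
    """True when hold time exceeds the given share of gross resolution time."""
    return total_hold / gross > limit


# Hypothetical work order: open 08:00 to 18:00, on hold 10:00 to 14:00 for parts.
created = datetime(2024, 3, 4, 8, 0)
closed = datetime(2024, 3, 4, 18, 0)
holds = [(datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 14, 0))]

net, total_hold = net_resolution_time(created, closed, holds)
print(net, total_hold, hold_is_excessive(closed - created, total_hold))
```

In this example the gross time is 10 hours, 4 of which were on hold, so the net resolution time is 6 hours and the 40% hold share would be flagged for review.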
Resolution & Work Order Closure
The fault is resolved and the asset is restored to service. The technician submits the work order for manager review, completing the fault description and logging labour time, materials consumed, and any findings. The manager verifies the resolution, reviews the cost log, and closes the work order. Closure is the event that stops the resolution clock. The elapsed time from creation to closure is the resolution time for this work order. The system immediately compares it against the priority tier's resolution SLA target and records the outcome: Met or Breached.
SLA Performance Recording & Escalation (If Breached)
Closure triggers the final SLA accounting: both response and resolution outcomes are permanently recorded on the work order. If either SLA was breached, the breach event — with timestamps and elapsed times — is recorded in the immutable audit trail. The work order's SLA data feeds into the compliance dashboard: compliance rates update, breach counts increment, and the work order appears in the breach log. If the resolution SLA was breached, an escalation notification fires to the Maintenance Manager or Operations Director summarising the breach: which tier, what asset, by how long the target was exceeded, and the root cause reason entered by the technician.
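The final accounting step can be sketched as a pure function over three timestamps and two targets. This is an illustration under assumptions: `SlaOutcome` and `evaluate` are hypothetical names, hold exclusions are ignored, and a time exactly equal to the target is counted as Met.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SlaOutcome:
    """Elapsed times and Met/Breached flags recorded at closure."""
    response_time: timedelta
    resolution_time: timedelta
    response_met: bool
    resolution_met: bool


def evaluate(created_at, responded_at, closed_at, response_target, resolution_target):
    """Compare both elapsed times against their targets.

    Both times are measured from creation: the resolution clock
    does not reset when the technician first responds.
    """
    response_time = responded_at - created_at
    resolution_time = closed_at - created_at
    return SlaOutcome(
        response_time=response_time,
        resolution_time=resolution_time,
        response_met=response_time <= response_target,
        resolution_met=resolution_time <= resolution_target,
    )


# Hypothetical P1 work order: 27 minutes to respond, 2h 4min to resolve.
outcome = evaluate(
    created_at=datetime(2024, 3, 4, 14, 0),
    responded_at=datetime(2024, 3, 4, 14, 27),
    closed_at=datetime(2024, 3, 4, 16, 4),
    response_target=timedelta(hours=1),
    resolution_target=timedelta(hours=4),
)
print(outcome)
```

Keeping the record frozen mirrors the idea that the outcome is written once at closure and not edited afterwards.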
How maintenance SLA operates across industries
SLA frameworks apply differently across operational environments. Here is how mature maintenance SLA programmes run in four industries — including what happens when the SLA is at risk.
Operational outcomes
- P1 response: 27 min of 60 min SLA — 33 minutes under deadline
- P1 resolution: 2h 4min of 4h SLA — full audit trail including calibration record
- On-call response protocol validated through SLA timestamps
Operational outcomes
- Response: 1h 44min of 4h SLA ✓ — hold time documented and excluded per contract terms
- Resolution (net): 3h 14min of 24h SLA ✓
- 97.2% quarterly compliance report exported automatically for client review
Operational outcomes
- Response: 11 min of 60 min SLA ✓ — 49 minutes under deadline
- Resolution: 2h 35min of 4h SLA ✓
- Three P1 events in 90 days identified — PM schedule review initiated
Operational outcomes
- 12 buildings with three differentiated SLA profiles — all auto-applied at work order creation
- Compliance segmented by property class: Class A 98.4%, Class B 95.8%, Class C 97.1%
- P2 out-of-hours breach pattern identified and escalated to contractor before next cycle
Key components of a maintenance SLA system
A CMMS-enforced SLA is built from interconnected components. Understanding these reveals how a well-configured system converts SLA policy into automatic, auditable compliance enforcement.
SLA Policy Definition
The foundational document — or system configuration — that defines response and resolution targets for each priority tier. Includes: tier criteria (what constitutes P1 vs P2 vs P3), time targets per tier, hold time exclusion rules, business hours vs 24/7 coverage scope, and escalation thresholds. The SLA policy is the source of truth for the entire SLA system; all downstream enforcement is based on it. It must be reviewed and formally agreed with all stakeholders before the system is activated.
Governs: all SLA timers, escalation thresholds, and compliance calculation
Priority Classification Engine
The rules that determine which priority tier is assigned to each work order at creation. May be entirely manual (the creator selects the tier), partially automated (asset criticality rating defaults the tier, creator can override), or fully automated (fault type and asset criticality together determine the tier without human input). Automated or partially automated classification produces more consistent tier assignment than fully manual selection — removing the variability introduced by different reporters' urgency perceptions.
Inputs: asset criticality, fault type, location. Output: SLA priority tier
Response Timer
Activates at work order creation and tracks elapsed time toward the response SLA deadline. Stops when the assigned technician updates the work order status to In Progress, the moment of first physical attendance. The system records the exact response time and immediately evaluates compliance against the tier's response target: Met if within the window, Breached if over. By default, response timers run in real time, 24/7; for maintenance teams that operate only during business hours, the timer can be configured to count business hours only.
Starts: on work order creation. Stops: on status → In Progress
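For the business-hours option, elapsed time has to skip nights and weekends. The sketch below shows one simple way to do that, assuming a Monday to Friday, 09:00 to 17:00 schedule, naive datetimes, and no public holidays; `business_elapsed` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta


def business_elapsed(start, end, opens=time(9), closes=time(17)):
    """Elapsed time between start and end, counting business hours only.

    Assumes a Monday-Friday schedule with fixed opening hours and
    ignores public holidays.
    """
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            window_start = max(start, datetime.combine(day, opens))
            window_end = min(end, datetime.combine(day, closes))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total


# Reported Friday 16:00, attended Monday 10:00.
reported = datetime(2024, 3, 1, 16, 0)
attended = datetime(2024, 3, 4, 10, 0)
print(business_elapsed(reported, attended))
```

On a 24/7 clock this interval is 66 hours; on the assumed business-hours clock it is 2 hours (one on Friday afternoon, one on Monday morning), which is why the coverage scope needs to be agreed in the SLA policy.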
Resolution Timer
Activates simultaneously with the response timer and runs until work order closure. Measures total elapsed time from fault report to asset restoration. Unlike the response timer, the resolution timer continues through technician assignment and active repair — it only stops at work order closure. The resolution time is compared against the priority tier's resolution target to determine compliance. For operations with approved hold time exclusions, the resolution clock may pause during documented hold periods.
Starts: on work order creation. Stops: on work order closure
Escalation Matrix
Defines who is notified at each escalation threshold for each priority tier. A typical matrix has three levels: pre-breach (75–80% of window consumed; notifies the supervising manager), breach (SLA window expired; notifies the Engineering Head), and extended breach (50%+ over deadline; notifies the Operations Director or client). Each escalation event specifies the recipient role, the notification content, and the required response action. The escalation matrix is configured once and fires automatically without any human decision to escalate.
Fires: at configured % thresholds. Recipients: roles, not named individuals
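One way to represent such a matrix is as an ordered list of thresholds mapped to roles. This is a sketch under assumptions: the 0.75 pre-breach threshold is one point in the range quoted in this guide, 1.5 corresponds to 50% over the deadline, and `EscalationLevel` and `due_escalations` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    name: str
    threshold: float  # fraction of the SLA window consumed
    role: str         # recipient role, never a named individual


# Assumed thresholds: 0.75 = pre-breach, 1.0 = window expired,
# 1.5 = 50% over the deadline.
MATRIX = [
    EscalationLevel("pre-breach", 0.75, "Supervising Manager"),
    EscalationLevel("breach", 1.00, "Engineering Head"),
    EscalationLevel("extended breach", 1.50, "Operations Director"),
]


def due_escalations(consumed: float, already_fired: set[str]) -> list[EscalationLevel]:
    """Return the levels whose threshold is reached and that have not fired yet."""
    return [
        level
        for level in MATRIX
        if consumed >= level.threshold and level.name not in already_fired
    ]


# 80% of the window consumed, nothing fired so far.
for level in due_escalations(0.80, set()):
    print(level.name, "->", level.role)
```

Tracking which levels have already fired keeps each notification from repeating every time the monitor re-evaluates an open work order.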
SLA Breach Detection
The real-time monitoring layer that continuously compares elapsed time against SLA deadlines for every open work order. As work orders approach their deadline, they surface at the top of the SLA dashboard — colour-coded by urgency (approaching, imminent, breached). When a work order crosses its SLA deadline without closure, the breach is recorded immediately and the breach escalation fires. Breach detection is always-on: it does not depend on a manager checking the dashboard.
Monitoring: continuous, real-time. Output: dashboard colour-coding, automated notifications
Hold Time Management
The mechanism for recording and managing approved hold periods — where the resolution clock may pause because the delay is outside the maintenance team's control. When a work order moves to On Hold status, the technician records a hold reason category. At On Hold exit (parts arrived, access granted), the timestamp is recorded. The hold duration, reason, and clock impact are visible on the work order. The compliance dashboard shows hold time as a separate metric — preventing hold time from being used to mask resolution performance issues.
Inputs: reason category, timestamps. Clock impact: per SLA policy
Multi-Site SLA Profiles
Distinct SLA configurations applied per location, building class, or contract. Each profile specifies its own response and resolution targets by priority tier, hold time rules, business hours scope, and escalation thresholds. Work orders inherit their SLA profile from the asset's registered location automatically — no manual selection required. Multi-site profiles enable a single operations team to manage portfolios under differentiated contracts, with each site enforced to its specific terms and each client receiving the compliance data relevant to their contract.
Assignment: automatic from asset location. Scope: per site, building class, or contract
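Profile inheritance can be pictured as two lookups: asset to location, then location to profile. Everything in the sketch below is hypothetical, including the asset IDs, site names, and the Class B targets; it only illustrates how a work order could pick up its targets without manual selection.

```python
from datetime import timedelta

# Hypothetical profiles: tier -> (response window, resolution window).
PROFILES = {
    "Class A": {"P1": (timedelta(hours=1), timedelta(hours=4))},
    "Class B": {"P1": (timedelta(hours=2), timedelta(hours=8))},
}

# Hypothetical asset register and site classification.
ASSET_LOCATION = {"AHU-01": "Tower One", "LIFT-07": "Riverside Court"}
LOCATION_PROFILE = {"Tower One": "Class A", "Riverside Court": "Class B"}


def targets_for(asset_id: str, tier: str) -> tuple[timedelta, timedelta]:
    """Resolve SLA targets from the asset's registered location."""
    location = ASSET_LOCATION[asset_id]
    profile = LOCATION_PROFILE[location]
    return PROFILES[profile][tier]


response, resolution = targets_for("LIFT-07", "P1")
print(response, resolution)
```

Because the lookup starts from the asset, the same fault type can carry different commitments at different sites without the reporter needing to know which contract applies.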
SLA Compliance Dashboard
The real-time operations view showing: all active work orders with their SLA status (Met, Approaching, Breached), overall compliance rate for the current period, compliance breakdown by priority tier and location, breach log with root cause summary, and trend chart over the trailing months. The dashboard is the operational nerve centre for SLA management — enabling managers to see what needs intervention now and what the programme's performance trajectory looks like.
Shows: real-time SLA status, compliance rates, breach log, trend chart
SLA Audit Trail
Every SLA-relevant event on a work order is recorded in the immutable audit trail: work order creation timestamp, priority tier assignment, response timer start, status transitions with timestamps, escalation events fired, hold periods started and ended, closure timestamp, and final SLA compliance outcome (Met or Breached with elapsed times). The SLA audit trail is always complete, always timestamped, and cannot be modified retroactively. It is the definitive record for contract disputes, regulatory reviews, and client compliance audits.
Immutable. Contains: all timestamps, escalations, hold events, compliance outcomes
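This guide does not say how immutability is enforced. One common technique, shown here purely as an assumption, is a hash-chained, append-only log in which each entry stores the hash of the previous one, so any later edit is detectable. `AuditTrail` and its methods are hypothetical names.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """SHA-256 of the entry body, serialised with stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only event log where each entry is chained to the previous hash."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, work_order_id: str, event: str, timestamp: str) -> None:
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        body = {
            "work_order_id": work_order_id,
            "event": event,
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute every hash; False means an entry was altered or reordered."""
        prev_hash = ""
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


trail = AuditTrail()
trail.append("WO-1042", "created", "2024-03-04T09:00:00")
trail.append("WO-1042", "status:In Progress", "2024-03-04T09:27:00")
print(trail.verify())
```

Note that this makes tampering evident rather than impossible; a production system would pair it with storage-level write protection and access controls.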
SLA automation and system intelligence
The operational value of a CMMS-enforced SLA comes from what it does automatically. These are the automation behaviours that enforce commitments, surface risks, and record compliance without human intervention.
SLA clock auto-activation
The response and resolution SLA clocks start automatically the moment a work order is created — no manual timer initiation required. The SLA engine applies the correct time targets based on the assigned priority tier and the location's SLA profile. Zero manual setup per work order.
Priority auto-classification
Asset criticality mapping auto-assigns priority tiers: Critical-rated assets default to P1, High-rated to P2. Removes the dependency on the reporter's subjective urgency assessment for high-stakes assets. Managers can override if context warrants, but the default is always consistent.
Pre-breach approaching alert
When 75–80% of a response or resolution SLA window is consumed without closure, an approaching-breach notification fires to the supervising manager. This is the most valuable automation in the entire SLA system: it creates an intervention window before the breach, not a notification after it.
Auto-escalation on SLA breach
When a work order crosses its SLA deadline without closure, an automatic escalation notifies the Engineering Head or Maintenance Director. The escalation specifies: work order ID, asset, priority tier, how long past the SLA deadline the work order is, and the last recorded status. No team member needs to decide whether to escalate.
Extended breach cascade
If a breached P1 or P2 work order remains open beyond a second threshold (e.g., 150% of SLA window), a second escalation fires to the Operations Director or client account manager. This cascading escalation ensures that genuinely prolonged failures surface to the highest-relevant management level.
Hold time clock pause/resume
When a work order moves to On Hold status with an approved hold reason, the resolution clock automatically pauses if the SLA policy permits hold exclusions. When the work order exits hold status, the clock resumes from where it paused. The hold duration is recorded separately and visible in the compliance dashboard.
Compliance outcome auto-recording
At work order closure, the system automatically calculates and records the response SLA outcome (Met or Breached) and resolution SLA outcome (Met or Breached), with the exact elapsed times. This happens without any manual data entry — the compliance record is created as a byproduct of normal work order closure.
Compliance rate auto-calculation
SLA compliance rate is calculated continuously as work orders close — no end-of-month manual aggregation required. The dashboard always shows the current period compliance rate, segmented by priority tier, location, technician, and work order category. Trend data accumulates automatically.
Immutable SLA performance record
Every SLA-relevant timestamp — creation, response, hold periods, closure, escalation events — is written to the immutable audit trail at the moment it occurs. The record cannot be backdated, edited, or deleted. The SLA audit trail is always complete for regulatory review, client audit, or contract dispute resolution.
Maintenance SLA best practices
The difference between an SLA programme that enforces genuine accountability and one that becomes a compliance reporting exercise is in how it is designed, classified, escalated, and reviewed.
SLA design
Ground SLA targets in operational capacity, not aspirational benchmarks
SLA targets that cannot be consistently achieved create chronic breach situations that undermine the SLA's credibility and erode team morale. Set targets based on actual demonstrated response and resolution times from historical work order data, then tighten incrementally as operational capability improves. Starting at a target you can achieve at 95%+ compliance is better than starting at an aspirational target you hit at 60%.
Define 'first response' explicitly in your SLA policy
'Response' in SLA context means physical attendance at the asset or confirmed remote intervention — not 'ticket acknowledged' or 'email replied to.' Ambiguous response definition is the most common source of SLA disputes between maintenance contractors and clients. Define it precisely: 'First response is satisfied when the assigned technician records In Progress status from the asset location.'
Agree hold time exclusion rules before activating the SLA
Hold time exclusions — periods excluded from the resolution clock because the delay is outside the maintenance team's control — must be agreed with all stakeholders before the SLA is active, not debated after each breach. Define the exhaustive list of accepted hold reasons and what documentation is required to apply them. Undefined hold rules become a source of conflict in every breach discussion.
Priority classification
Use asset criticality to auto-assign SLA priority
Configure asset criticality ratings (Critical, High, Medium, Low) and map them to SLA priority tiers. A failure on a Critical-rated asset defaults to P1; High-rated to P2. This ensures consistent priority assignment that doesn't depend on the reporter's subjective urgency assessment — which is particularly important for out-of-hours faults reported by non-technical staff.
Review SLA tier distribution monthly — P1 inflation is a system failure signal
If the percentage of work orders classified P1 is increasing over time without a corresponding increase in actual safety or production events, the classification system is being gamed. Investigate whether managers are using P1 to get faster service rather than because failures genuinely warrant emergency response. A healthy programme shows 5–10% P1 usage.
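A monthly distribution review can be partly automated by comparing each tier's share against the target usage ranges given earlier in this guide. The sketch below is illustrative; `tier_distribution` and `out_of_range` are hypothetical names, and the sample month is invented.

```python
from collections import Counter

# Target usage ranges (percent of all work orders) from the tier definitions.
TARGET_SHARE = {"P1": (5, 10), "P2": (15, 25), "P3": (50, 65), "P4": (10, 20)}


def tier_distribution(tiers: list[str]) -> dict[str, float]:
    """Percentage of work orders assigned to each tier."""
    counts = Counter(tiers)
    total = len(tiers)
    return {tier: 100.0 * counts[tier] / total for tier in TARGET_SHARE}


def out_of_range(distribution: dict[str, float]) -> dict[str, float]:
    """Tiers whose share falls outside the target usage range."""
    return {
        tier: share
        for tier, share in distribution.items()
        if not (TARGET_SHARE[tier][0] <= share <= TARGET_SHARE[tier][1])
    }


# Invented month of 100 work orders with an inflated P1 share.
month = ["P1"] * 18 + ["P2"] * 20 + ["P3"] * 50 + ["P4"] * 12
print(out_of_range(tier_distribution(month)))
```

A flagged tier is a prompt to investigate, not proof of gaming: a genuine run of safety events would also push the P1 share up.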
Define failure impact, not symptoms, in your priority criteria
'Lights flickering' is a symptom; 'electrical fault in occupied area creating fall-of-person risk' is the impact. Priority criteria based on operational impact produce more consistent classification than criteria based on technical symptoms, which require expertise to evaluate quickly under pressure.
Escalation design
Make escalation automatic, not manual
SLA escalations that rely on a team member noticing a breach and deciding to escalate are systematically late. Automated escalation — triggered at 75–80% of the SLA window consumed — ensures senior awareness before a breach occurs, not after. The pre-breach window is the most valuable operational moment in SLA management: it is the last opportunity to prevent the breach.
Escalate to a role, not a named person
Escalation rules that name specific individuals fail when those individuals are on leave, unavailable, or have left. Configure escalation to roles (Engineering Head, Maintenance Manager, Operations Director) so whoever holds the role receives the escalation regardless of personnel changes. For 24/7 SLAs, the escalation role must map to an on-call rotation — not a business-hours position.
Define the required action at each escalation level
An escalation notification that says 'SLA approaching breach' without specifying the expected action is an alert, not an escalation. Level 1 escalation should require: acknowledge receipt + confirm technician is on site or en route + provide ETA. Level 2 should require: confirm resources have been added or explain why the SLA will be breached. Without defined required actions, escalation produces awareness without accountability.
Compliance monitoring
Review breach root causes monthly, not just compliance rates
A 93% SLA compliance rate tells you that 7% of work orders breached their SLA. It tells you nothing about why. Monthly root cause review — are breaches caused by parts unavailability, technician capacity, classification errors, or coverage gaps? — is what drives actual improvement. Compliance rates without root cause analysis are reporting, not management.
Track SLA compliance by technician to identify coaching needs
Aggregate compliance rates hide technician-level variation. A team averaging 95% compliance may contain one technician at 78% — a coaching and development issue, not a system problem. Technician-level visibility makes the right intervention possible: targeted coaching, workload rebalancing, or skill gap training.
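The point about averages hiding variation is easy to see with a per-technician breakdown. In this sketch the technician names and figures are invented, and `compliance_by_technician` is a hypothetical helper that takes one (technician, sla_met) pair per closed work order.

```python
from collections import defaultdict


def compliance_by_technician(work_orders):
    """Return {technician: compliance %} from (technician, sla_met) pairs."""
    totals = defaultdict(lambda: [0, 0])  # technician -> [met, total]
    for technician, sla_met in work_orders:
        totals[technician][1] += 1
        if sla_met:
            totals[technician][0] += 1
    return {tech: 100.0 * met / total for tech, (met, total) in totals.items()}


# Invented data: two technicians, 50 closed work orders each.
closed = (
    [("Asha", True)] * 49 + [("Asha", False)] * 1
    + [("Ben", True)] * 46 + [("Ben", False)] * 4
)
rates = compliance_by_technician(closed)
team_rate = 100.0 * sum(met for _, met in closed) / len(closed)
print(rates, team_rate)
```

Here the team sits at 95% overall while one technician is at 92%, a gap that only appears once the rate is segmented.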
Export SLA compliance reports at the agreed client reporting frequency
For external maintenance contracts, SLA compliance reporting is a contractual obligation. Automated monthly or quarterly SLA compliance reports — segmented by priority tier, site, and work order category — build client confidence and provide the evidence base for contract renewal discussions. Delivering reports proactively (before the client asks) is itself a differentiating signal of mature operations management.
Maintenance SLA metrics and KPIs
A maintenance SLA that is not measured is not managed. These KPIs provide the operational data to govern the programme, hold teams accountable, and demonstrate service quality to clients and senior management.
SLA Compliance Rate
Unit: Percentage
Percentage of work orders completed within their SLA window, combining both response and resolution compliance. The primary governance metric for the entire programme. Tracked at overall level and broken down by priority tier, location, and technician. A declining compliance rate is an early warning signal requiring root cause investigation before the decline becomes a contract or regulatory issue.
Target: ≥ 95%
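As a worked example of the calculation, the sketch below counts a work order as compliant only when both its response and resolution SLAs were met, following the definition above. `compliance_rate` is a hypothetical function and the sample data is invented.

```python
def compliance_rate(outcomes):
    """Percentage of closed work orders that met both SLAs.

    outcomes is a list of (response_met, resolution_met) booleans,
    one pair per closed work order. Returns None when nothing has closed.
    """
    if not outcomes:
        return None
    compliant = sum(1 for response_met, resolution_met in outcomes
                    if response_met and resolution_met)
    return 100.0 * compliant / len(outcomes)


# Invented period: 40 closed work orders, one response breach, one resolution breach.
period = [(True, True)] * 38 + [(False, True), (True, False)]
rate = compliance_rate(period)
print(rate, rate >= 95.0)
```

Returning None for an empty period avoids reporting a misleading 0% or 100% before any work order has closed.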
Response SLA Compliance Rate
Unit: Percentage
Percentage of work orders where first technician attendance occurred within the priority tier's response time target. Tracked separately from resolution compliance because response breaches and resolution breaches have different root causes: response breaches typically indicate coverage or dispatch problems; resolution breaches typically indicate parts, complexity, or capacity problems.
Target: ≥ 98% (stricter than resolution)
Mean Time to Respond (MTTR)
Unit: Minutes / Hours
Average elapsed time from work order creation to first technician attendance, measured separately per priority tier. P1 MTTR should be well below the 60-minute target to provide a safety buffer. Rising MTTR over time signals a coverage, dispatch, or capacity problem before it manifests as a breach rate increase.
P1 target: ≤ 45 min avg (buffer vs 60 min SLA)
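Computing the mean per tier is a simple group-and-average. The sketch below assumes response times are already available in minutes; `mean_minutes_by_tier` is a hypothetical helper and the sample records are invented.

```python
from collections import defaultdict
from statistics import mean


def mean_minutes_by_tier(records):
    """Average response time in minutes per tier from (tier, minutes) pairs."""
    by_tier = defaultdict(list)
    for tier, minutes in records:
        by_tier[tier].append(minutes)
    return {tier: mean(values) for tier, values in by_tier.items()}


# Invented response times for one reporting period.
records = [("P1", 27), ("P1", 11), ("P1", 52), ("P2", 90), ("P2", 150)]
averages = mean_minutes_by_tier(records)
print(averages, averages["P1"] <= 45)
```

In this sample the P1 average is 30 minutes, inside the 45-minute buffer target, even though one individual response took 52 minutes. An average alone does not show how close single work orders came to the 60-minute SLA.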
Mean Time to Resolve (MTTRe)
Unit: Hours
Average elapsed time from work order creation to closure, per priority tier. Distinct from MTTR (time to respond). MTTRe reveals resolution capability: how quickly the team actually fixes problems once they are on site. Tracked separately for each tier to identify whether resolution struggles are concentrated in specific complexity levels.
P1 target: ≤ 3h avg (buffer vs 4h SLA)
SLA Breach Rate
Unit: Percentage
Percentage of work orders that breached their SLA window, the inverse of compliance rate. More useful when segmented by breach severity: marginally breached (up to 10% over the deadline), significantly breached (10–50% over the deadline), and critically breached (more than 50% over the deadline). Severity segmentation reveals whether breaches are systemic timing problems or occasional catastrophic failures.
Target: < 5% overall
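Severity segmentation depends on how far past the target a work order ran, expressed as a fraction of the target itself. A minimal sketch, with the band edges taken from the description above and `breach_severity` as a hypothetical function name:

```python
from datetime import timedelta


def breach_severity(elapsed: timedelta, target: timedelta) -> str:
    """Classify a work order by how far it overran its SLA target.

    Bands: up to 10% over is marginal, 10-50% over is significant,
    more than 50% over is critical.
    """
    overrun = (elapsed - target) / target  # fraction of the target exceeded
    if overrun <= 0:
        return "met"
    if overrun <= 0.10:
        return "marginal"
    if overrun <= 0.50:
        return "significant"
    return "critical"


# A 4-hour P1 resolution target against four illustrative elapsed times.
target = timedelta(hours=4)
for elapsed in (timedelta(hours=3), timedelta(hours=4, minutes=12),
                timedelta(hours=5), timedelta(hours=7)):
    print(elapsed, breach_severity(elapsed, target))
```

Expressing the overrun relative to the target lets the same bands apply to every tier, so a 12-minute overrun on a 4-hour P1 and a 3.6-hour overrun on a 72-hour P3 both count as marginal.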
Escalation Rate
Unit: Percentage
Percentage of work orders that triggered at least one escalation event, either pre-breach or breach. A high escalation rate signals that operational capacity is consistently failing to meet SLA windows without senior intervention. A low escalation rate (under 5%) indicates the programme is running without chronic strain. Escalation rate by tier reveals whether P1 escalations are disproportionate.
Target: < 5% overall; P1 < 10%
Hold Time as % of Resolution Time
Unit: Percentage
The proportion of total resolution time spent in approved hold periods, averaged across work orders. Excessive hold time (over 30% of resolution time) indicates that parts procurement, access management, or approval processes are the primary constraint on resolution speed, not technician capability. This metric reveals where the operational bottleneck actually is.
Target: < 20% for P1 and P2
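For illustration, the metric could be calculated per work order and then averaged, as in the sketch below. The tuple layout and the sample hours are hypothetical.

```python
from statistics import mean


def hold_share(hold_hours, resolution_hours):
    """Hold time as a percentage of total resolution time for one work order."""
    return 100.0 * hold_hours / resolution_hours


# (approved hold hours, total resolution hours) per hypothetical work order
orders = [(1.0, 4.0), (0.0, 5.0), (2.0, 5.0), (0.5, 10.0)]

average = mean(hold_share(h, r) for h, r in orders)
excessive = [(h, r) for h, r in orders if hold_share(h, r) > 30]

print(average)    # 17.5, under the 20% target
print(excessive)  # [(2.0, 5.0)], a single order at 40% hold time
```

Listing the individual orders above 30% alongside the average helps show whether hold time is a broad pattern or a few outliers.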
SLA Compliance by Location
Unit: Percentage per site
Compliance rate broken down by location, building, or site. Reveals sites with structural performance gaps, whether caused by insufficient technician coverage, poor contractor assignment, challenging access conditions, or parts supply issues. Location-level compliance data is essential for multi-site portfolio management and for identifying which contracts need remediation attention.
All sites target: ≥ 95%
Repeat Breach Rate
Unit: Percentage
Percentage of SLA breaches that occur on an asset or at a location that has already breached within the same quarter. A high repeat breach rate indicates that root cause remediation is not happening: the same systemic problem (coverage gap, parts unavailability, chronic understaffing) keeps causing breaches without the process improvements that should follow a breach review.
Target: < 2% of total work orders
SLA programme maturity benchmark
Ad-hoc (Level 1)
No formal SLA · Response driven by who shouts loudest · No timer tracking · No compliance data · Contract disputes unresolvable
Defined (Level 2)
SLA targets defined in policy but manually tracked · Spreadsheet breach log · Compliance 70–85% · No auto-escalation · Monthly manual reporting
Optimized (Level 3)
CMMS-enforced SLA · ≥95% compliance · Auto-escalation active · Multi-site profiles · Client reporting automated · Breach root cause reviewed monthly
Ad-hoc maintenance vs manual SLA tracking vs CMMS-enforced SLA
Three operating models exist for maintenance SLA management. The difference in operational outcome, audit readiness, and client confidence between them is significant.
| Dimension | Ad-hoc (No SLA) | Manual SLA Tracking | CMMS-Enforced SLA |
|---|---|---|---|
| SLA visibility | None — no agreed targets exist | SLA document exists but not visible in workflow | Real-time dashboard with countdown timers per work order |
| Response timer tracking | Not tracked — no measurement | Manual timestamp logging — error-prone, inconsistent | Automatic — starts at creation, stops at first response |
| Breach detection | Discovered when client or operations team complains | Discovered during end-of-month spreadsheet review | Real-time — breach detected the moment SLA window expires |
| Auto-escalation | Not available — escalation is manual if it happens at all | Manual — depends on manager noticing and deciding to escalate | Automatic — fires at 75–80% of window consumed and at breach |
| Multi-site SLA profiles | Not applicable — no SLA to differentiate | Possible in theory — practically unmanageable across sites | Native — each location has its own SLA profile, auto-applied |
| Priority classification | Whoever shouts loudest — no system | Manual — depends on reporter's judgment, highly variable | Automated from asset criticality + configurable override |
| Hold time management | Not tracked — no distinction between active and hold time | Noted in spreadsheet — no clock impact calculation | Managed: reason recorded, timestamps logged, clock paused per policy |
| Client compliance reporting | Cannot be produced — no data exists | Hours of manual aggregation — monthly or quarterly | One-click export — any date range, filtered by site or tier |
| Audit trail | No audit trail — breach claims are unresolvable | Partial — spreadsheet entries can be edited retroactively | Immutable — every timestamp is locked at the moment it occurs |
| Historical trend analysis | Not possible | Possible with significant manual effort across multiple spreadsheets | Automatic — trailing 12-month trend visible in dashboard at any time |
Practical recommendation: If you are currently operating with ad-hoc or spreadsheet-based SLA tracking, the immediate priority is not to tighten SLA targets — it is to build the measurement and enforcement infrastructure first. Implementing CMMS-enforced SLA at current performance levels establishes the data baseline. Once 6 months of clean compliance data exists, target tightening and escalation refinement can be done with operational evidence rather than guesswork.
Frequently asked questions
Detailed answers to the questions maintenance managers, operations directors, and FM contractors ask most frequently about maintenance SLA design and enforcement.
What is a maintenance SLA?
A maintenance SLA (Service Level Agreement) is a formal commitment that defines how quickly maintenance teams must respond to and resolve reported equipment failures or service requests — based on the severity and operational impact of the failure. A maintenance SLA establishes: priority tiers (Emergency, Urgent, Routine, Planned); response time targets per tier (the maximum time before a technician must attend); resolution time targets per tier (the maximum time to restore the asset to service); escalation triggers (what happens if targets are not met); and measurement and reporting methods (how compliance is tracked and communicated). SLAs convert vague expectations — 'fix things quickly' — into specific, measurable commitments that can be monitored, reported, and enforced.
What is the difference between response time and resolution time in a maintenance SLA?
Response time and resolution time measure two different SLA obligations. Response time is the period from work order creation to first technician attendance at the asset — the 'we will show up within X hours' commitment. The response clock starts when the fault is reported and stops when a qualified technician physically arrives at the asset (or confirms remote engagement). Resolution time is the period from work order creation to asset restoration and work order closure — the 'we will fix it within Y hours' commitment. Response time measures how quickly the team mobilises; resolution time measures how quickly the problem is actually solved. Both matter: fast response with slow resolution still leaves assets out of service. Mature SLA frameworks track both separately by priority tier, because response and resolution timelines behave very differently depending on fault complexity and parts availability.
What happens when a maintenance SLA is breached?
When an SLA is breached, three operational consequences should occur. First, automated escalation: the CMMS should notify the Engineering Head, Maintenance Manager, or Operations Director immediately — not after the fact. Automated escalation before the breach (at 80% of the SLA window consumed) is even more valuable, allowing senior management to intervene before the SLA is technically breached. Second, breach documentation: the CMMS should record the breach event with timestamps — when the SLA window expired, by how long the actual completion exceeded the target, and what status the work order was in at the breach point — creating an immutable audit trail. Third, root cause review: SLA breaches should be reviewed in a regular operational meeting — examining why the breach occurred (capacity, parts, classification error, coverage gap) and what systemic change prevents recurrence. Under external maintenance contracts, SLA breaches may also trigger financial penalties specified in the contract terms.
How many SLA priority tiers should we have?
Most maintenance operations should have four priority tiers: Emergency (P1) for immediate safety risks or complete operational failure requiring maximum-urgency response; Urgent (P2) for significant operational impact requiring same-day response; Routine (P3) for minor issues that don't affect normal operations, requiring response within 24–72 hours; and Planned (P4) for deferred maintenance with no operational impact. Four tiers provide enough granularity to differentiate genuine urgency without creating a classification system so complex that consistent application becomes difficult. The most common failure mode is P1 tier inflation — using P1 for work that is genuinely P2 or P3. A healthy P1 usage rate is typically 5–10% of all work orders; if P1 is consistently above 20%, the classification criteria are not being applied correctly or are genuinely too broad.
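A quick way to watch for P1 inflation is to track the P1 share of all work orders against the thresholds above. The sketch below assumes tiers are available as plain labels; the function names and sample counts are hypothetical.

```python
from collections import Counter


def p1_share(tiers):
    """P1 work orders as a percentage of all work orders."""
    return 100.0 * Counter(tiers)["P1"] / len(tiers)


def p1_inflated(tiers, threshold=20.0):
    """True when the P1 share exceeds the threshold that suggests inflation."""
    return p1_share(tiers) > threshold


healthy = ["P1"] * 2 + ["P2"] * 5 + ["P3"] * 10 + ["P4"] * 3
inflated = ["P1"] * 5 + ["P2"] * 5 + ["P3"] * 8 + ["P4"] * 2

print(p1_share(healthy), p1_inflated(healthy))    # 10.0 False
print(p1_share(inflated), p1_inflated(inflated))  # 25.0 True
```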
How does CMMS software enforce SLA compliance?
A CMMS enforces SLA compliance through four mechanisms. First, automatic SLA clock activation: when a work order is created, the CMMS assigns a priority tier and starts the response and resolution timers — no manual tracking required. Second, real-time breach monitoring: the CMMS compares the current time against SLA deadlines for all open work orders continuously, displaying approaching breaches in a dashboard and triggering notifications at configurable thresholds. Third, automated escalation: when a work order approaches or crosses its SLA deadline without closure, the CMMS automatically notifies the configured escalation recipients — ensuring senior awareness without relying on team members to notice and manually escalate. Fourth, immutable compliance recording: when a work order closes, the CMMS records whether the response and resolution SLAs were met, the elapsed times, and any breach events — creating a permanent compliance record that feeds reporting dashboards and cannot be retroactively altered.
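The first two mechanisms can be sketched in a few lines. This is not how any specific CMMS implements them: the function names are hypothetical, the P2 windows are invented for the example, and the 80% warning threshold stands in for a configurable setting.

```python
from datetime import datetime, timedelta

# Illustrative targets per tier: (response window, resolution window).
# The P2 values are assumptions for this sketch.
SLA_TARGETS = {
    "P1": (timedelta(hours=1), timedelta(hours=4)),
    "P2": (timedelta(hours=4), timedelta(hours=24)),
}


def start_clocks(created, tier):
    """Return the response and resolution deadlines fixed at creation."""
    response, resolution = SLA_TARGETS[tier]
    return created + response, created + resolution


def clock_status(created, deadline, now, warn_at=0.8):
    """Compare the current time with a deadline for an open work order."""
    if now >= deadline:
        return "breached"
    consumed = (now - created) / (deadline - created)
    return "at_risk" if consumed >= warn_at else "on_track"


created = datetime(2024, 3, 4, 9, 0)
respond_by, resolve_by = start_clocks(created, "P1")
for hour, minute in [(10, 0), (12, 30), (13, 0)]:
    now = datetime(2024, 3, 4, hour, minute)
    print(now.time(), clock_status(created, resolve_by, now))
# 10:00:00 on_track
# 12:30:00 at_risk
# 13:00:00 breached
```

A monitoring job would run `clock_status` over every open work order on a schedule and pass anything other than `on_track` to the escalation step.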
Can SLA timers be paused during maintenance holds?
This depends on the SLA policy configured for the specific maintenance context. Many maintenance SLA frameworks include 'hold time exclusions' — periods where the SLA clock is paused because the delay is outside the maintenance team's control. Common excludable hold reasons include: waiting for spare parts from a supplier (if the part is not a stocked item), waiting for access to restricted areas (tenant permission or third-party site access), waiting for specialist subcontractor attendance, or waiting for management approval of repair costs above a threshold. Hold time exclusions should be agreed explicitly in the SLA policy or contract — not applied unilaterally. When hold time is used, it should be documented with a reason, start timestamp, and end timestamp. Excessive hold time use (over 30% of total resolution time) is itself a performance indicator worth monitoring, as it may signal procurement or process problems rather than genuine uncontrollable delays.
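Where hold exclusions are agreed, the time counted against the SLA is the elapsed time minus the approved hold periods. A minimal sketch, assuming each hold is stored as a (reason, start, end) record and that holds never overlap:

```python
from datetime import datetime, timedelta


def sla_elapsed(created, now, holds):
    """Time counted against the SLA after excluding approved hold periods.

    `holds` is a list of (reason, start, end) tuples; `end` is None while
    the hold is still open. Holds are assumed not to overlap.
    """
    paused = timedelta()
    for _reason, start, end in holds:
        end = now if end is None else min(end, now)
        if end > start:
            paused += end - start
    return (now - created) - paused


created = datetime(2024, 5, 6, 8, 0)
now = datetime(2024, 5, 6, 14, 0)
holds = [
    ("awaiting non-stocked part", datetime(2024, 5, 6, 9, 0), datetime(2024, 5, 6, 11, 0)),
    ("awaiting site access", datetime(2024, 5, 6, 13, 0), None),  # still open
]
print(sla_elapsed(created, now, holds))  # 3:00:00
```

Six hours have passed on the wall clock, but only three count against the SLA. Keeping the reason with each record supports the documentation requirement described above.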
What is an SLA escalation matrix?
An SLA escalation matrix is a structured document that defines who is notified, when, and what action they must take when SLA performance thresholds are approached or breached. A well-designed escalation matrix has three or four levels: Level 1 fires when 70–80% of the SLA window is consumed — notifying the supervising manager to check status and confirm the work order will close on time. Level 2 fires at SLA breach — notifying the Engineering Head or Maintenance Director with a required acknowledgement response. Level 3 fires when breach extends beyond a defined threshold (e.g., 50% over SLA deadline) — notifying the Operations Director. Level 4 (for external contracts) may notify the client account manager. Key design principles: escalation should be automatic (not dependent on a team member deciding to escalate), should escalate to a role (not a named individual), and should specify what action the recipient must take — not just inform them of the breach.
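An escalation matrix of this kind maps naturally onto a small lookup table. In the sketch below, the 75% trigger for Level 1 is one assumed point within the 70–80% range, the role names are placeholders, and the optional Level 4 client notification is omitted.

```python
# (level, fraction of SLA window consumed that triggers it, role notified)
ESCALATION_MATRIX = [
    (1, 0.75, "supervising manager"),   # pre-breach warning
    (2, 1.00, "engineering head"),      # SLA breached
    (3, 1.50, "operations director"),   # 50% over the SLA deadline
]


def escalations_due(consumed):
    """Return every (level, role) whose threshold has been reached."""
    return [
        (level, role)
        for level, threshold, role in ESCALATION_MATRIX
        if consumed >= threshold
    ]


print(escalations_due(0.50))  # []
print(escalations_due(0.80))  # [(1, 'supervising manager')]
print(escalations_due(1.60))  # all three levels
```

The table stores roles rather than named individuals, in line with the design principles above, so staffing changes do not require edits to the matrix.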
How do you set realistic SLA response and resolution targets?
Realistic SLA targets should be set in three steps. First, establish a baseline using historical data: if you have a CMMS with work order history, run a percentile analysis of the actual P75, P90, and P95 response and resolution times for each priority tier. A target set at the P75 value is one that 75% of historical work orders already met, so roughly a quarter would breach at current performance; a target set at the P95 value is one that 95% already met. Second, adjust for operational constraints: SLA targets must account for technician availability (on-call coverage hours), travel time (for multi-site operations), and parts procurement reality. A 4-hour resolution SLA for a component that takes 6 hours to procure from the nearest supplier is a target you will breach consistently regardless of team performance. Third, improve incrementally: start with targets your team can consistently achieve (95%+ compliance), establish the SLA discipline and system, then tighten targets by 10–15% each annual review cycle as performance data validates capability improvement.
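The baseline step can be sketched with a nearest-rank percentile over historical response times. The sample data below is invented, and nearest-rank is only one of several common percentile definitions.

```python
def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it. `pct` is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling division
    return ordered[max(rank, 1) - 1]


# Hypothetical P1 response times in minutes from 20 historical work orders
response_minutes = [12, 15, 18, 20, 22, 25, 27, 30, 32, 35,
                    38, 40, 42, 45, 48, 52, 55, 58, 65, 90]

for pct in (75, 90, 95):
    print(f"P{pct}: {percentile(response_minutes, pct)} min")
# P75: 48 min
# P90: 58 min
# P95: 65 min
```

With this history, a 60-minute response target sits between P90 and P95, so 18 of the 20 sampled work orders (90%) would have met it.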
How do you report SLA performance to clients or senior management?
SLA compliance reports should be structured around three components. The headline metric: overall SLA compliance rate for the period — the percentage of work orders completed within their SLA window. The tier breakdown: compliance rate separately for each priority tier, because a high overall rate can mask poor performance on the critical P1 tier specifically. The trend analysis: compliance rate over the trailing 6–12 months — showing whether performance is stable, improving, or degrading. For external client reporting, add the site or location breakdown and a breach root cause summary — explaining what caused the breaches and what remediation is underway. In UniAsset, SLA compliance reports can be exported directly from the dashboard, pre-structured for client presentation, covering any date range and filtered by site, technician, or work order category.
What causes SLA breaches in maintenance operations?
SLA breaches in maintenance operations have five primary root causes. Technician capacity gaps: more work than available technicians can complete in the SLA windows, particularly during peak failure periods — a resource allocation problem. Parts unavailability: resolution SLAs cannot be met if critical parts are not in stock and take longer to procure than the SLA window allows — managed through critical spares inventory and pre-approved supplier agreements. Coverage gaps: for 24/7 SLA obligations, out-of-hours coverage relies on on-call arrangements; inadequate out-of-hours coverage means night-time and weekend SLAs systematically breach. Priority misclassification: using P1 for work that should be P2 or P3 inflates emergency workload and reduces the team's ability to respond to genuine P1 events within their tighter SLA window. And communication failure: technicians not receiving notifications promptly, or managers not monitoring the SLA dashboard, leaves breaches developing without intervention.
How does UniAsset handle multi-site SLA management?
UniAsset supports differentiated SLA profiles per site, location, or asset category — reflecting the reality that different buildings, contracts, and asset types operate under different SLA terms. Each location can be configured with its own SLA policy: different response times, different resolution windows, different escalation thresholds. When a work order is created for an asset at a specific location, it inherits the SLA profile for that location automatically — no manual SLA selection required. The SLA compliance dashboard can be filtered by location, enabling operations managers to see compliance performance across the entire portfolio and identify which sites are underperforming. For multi-client property management organizations, this means each client's contracted SLA terms are enforced independently, and client-specific compliance reports can be exported without manual aggregation.
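Conceptually, profile inheritance is a lookup keyed by location and tier with a fallback to portfolio defaults. The sketch below illustrates the idea only; it is not UniAsset's data model or API, and every name and value in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaProfile:
    response_minutes: int
    resolution_minutes: int


# Portfolio-wide defaults per tier, overridden where a site contract differs.
DEFAULT_PROFILES = {"P1": SlaProfile(60, 240), "P2": SlaProfile(240, 1440)}
SITE_OVERRIDES = {("hospital-east", "P1"): SlaProfile(30, 120)}


def profile_for(site, tier):
    """Resolve the SLA profile a new work order would inherit."""
    return SITE_OVERRIDES.get((site, tier), DEFAULT_PROFILES[tier])


print(profile_for("hospital-east", "P1"))  # stricter site-specific profile
print(profile_for("office-north", "P1"))   # falls back to the default
```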
What is the difference between internal and external maintenance SLAs?
Internal SLAs govern the performance commitments made between a maintenance team and the internal operations, production, or facilities stakeholders they serve — within the same organization. External SLAs are legally binding contractual commitments made between a maintenance service provider (FM contractor, equipment maintenance company) and their client. The key differences are enforceability and consequences. Internal SLAs are governance tools — breaches trigger internal escalation and management review, but there is no financial penalty. They create operational discipline and accountability without contractual exposure. External SLAs are contracts — breaches may trigger financial penalties, credit notes, termination clauses, or reputational damage. Both types require the same CMMS infrastructure to track and enforce, but external SLAs require stricter configuration (no retroactive status changes, immutable audit trail, client-accessible reporting) because they are subject to third-party audit.