Maintenance

Why Most Maintenance Teams Get SLA Management Wrong

UniAsset Team
Tags: SLA management maintenance, maintenance SLA escalation, work order escalation, SLA breach maintenance, maintenance response time

Most maintenance teams know they have SLA breaches. What they do not have is a structured accountability chain for what happens when a breach occurs.

A single notification that fires to a manager when a work order passes its deadline is not a response process. It is a notification that can be ignored, missed, or deprioritised in a busy day. If the manager does not act, nothing else happens. The work order continues to age. The equipment stays down. And the only record of the breach is an alert that nobody acknowledged.

This is not a people problem. It is a system design problem. SLA management that relies on one person responding to one alert is structurally fragile. This article makes the case for tiered escalation and explains how to configure it so that accountability is built into the system — not dependent on someone having the right morning.

What most teams get wrong about maintenance SLAs

The failure tends to take one of two forms.

No SLA at all. The maintenance team operates on informal priority judgement. Emergency situations get handled fast because everyone treats them as urgent. Routine repairs get handled when there is time. Nobody has committed to a response window in writing. When equipment stays down for five days on what should have been a 72-hour repair, there is no breach to point to — because there was no deadline to breach.

This approach can work for small teams with simple operations. It stops working when the team grows, when assets multiply, or when a compliance audit asks for documentation of response times. "We try to fix things quickly" is not an auditable SLA.

A flat alert that nobody escalates. The organisation has SLA windows defined. Work orders have deadlines. When a deadline is breached, an alert fires — to the maintenance manager, usually, or to the shift supervisor. One notification. One recipient.

If that person is unavailable, in a meeting, or simply overwhelmed by a busier-than-normal day, the alert sits in their inbox. Nothing else happens. The work order ages. There is no follow-up notification, no escalation to the next level of management, no automatic mechanism that ensures someone acts.

The flat alert model gives organisations the feeling of SLA management without the substance. It produces breach data but not breach responses.

The three components of a working SLA system

A maintenance SLA system that produces actual accountability has three components. Each is necessary. None is sufficient alone.

1. A deadline that is automatically set, not manually entered

Manual deadline entry is the first point of failure. If a manager has to type a due date when creating a work order, the quality of SLA data depends entirely on whether they remember to do it, whether they use consistent logic across different priorities, and whether they update it when circumstances change.

In UniAsset, SLA deadlines are calculated automatically at work order creation based on the priority level selected. Emergency work orders get a 4-hour window. Urgent work orders get 24 hours. Routine work orders get 72 hours. Planned work orders use the scheduled date. The calculation runs without any manual input — so every work order has a deadline from the first second, set consistently, by the same logic.
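As a rough illustration of that logic, here is a minimal Python sketch; the names SLA_WINDOWS and sla_deadline are invented for this example and are not UniAsset's actual API:

```python
from datetime import datetime, timedelta

# Response windows per priority level, as described above.
SLA_WINDOWS = {
    "EMERGENCY": timedelta(hours=4),
    "URGENT": timedelta(hours=24),
    "ROUTINE": timedelta(hours=72),
}

def sla_deadline(priority: str, created_at: datetime,
                 scheduled_date: datetime | None = None) -> datetime:
    """Compute the SLA deadline at work order creation."""
    if priority == "PLANNED":
        # Planned work uses its scheduled date as the deadline.
        if scheduled_date is None:
            raise ValueError("planned work orders need a scheduled date")
        return scheduled_date
    return created_at + SLA_WINDOWS[priority]

# e.g. an Urgent work order created now is due 24 hours from now:
# sla_deadline("URGENT", datetime.now())
```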

This consistency is what makes SLA reporting meaningful. If half your work orders have manually entered deadlines and half have none, your compliance rate is measuring compliance with your own data entry habits, not with actual response performance.

2. An escalation path, not just a single alert

This is the component most teams skip. Once a work order breaches its SLA, the response process should not depend on a single person seeing a single alert.

UniAsset's default escalation matrix has three levels. Level 1 fires at the moment of breach and notifies the Supervisor. Level 2 fires 60 minutes after breach and notifies the Department Head. Level 3 fires 240 minutes after breach and notifies the General Manager. Each level sends a CRITICAL in-app notification and an email. Each level fires automatically, regardless of whether the previous level was acknowledged.
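One way to picture the mechanics is as a small data structure plus a periodic check. This is an illustrative sketch, not UniAsset's implementation; EscalationLevel and due_escalations are names invented here:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class EscalationLevel:
    delay_after_breach: timedelta
    notify_role: str

# The three-level default matrix described above.
ESCALATION_MATRIX = [
    EscalationLevel(timedelta(minutes=0), "Supervisor"),
    EscalationLevel(timedelta(minutes=60), "Department Head"),
    EscalationLevel(timedelta(minutes=240), "General Manager"),
]

def due_escalations(deadline: datetime, now: datetime,
                    already_fired: set[int]) -> list[int]:
    """Indices of levels that should fire now. Each level fires once its
    delay has elapsed, independent of acknowledgement of earlier levels."""
    if now <= deadline:
        return []
    elapsed = now - deadline
    return [i for i, level in enumerate(ESCALATION_MATRIX)
            if elapsed >= level.delay_after_breach and i not in already_fired]
```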

The escalation is not punitive — it is protective. If the Supervisor is handling another emergency, the Department Head is notified without anyone having to manually chase up. If both are occupied, the GM is notified and can make a resource decision. The system ensures the breach is visible to progressively more senior stakeholders until someone acts.

3. Priority-aware deadlines (not everything is equally urgent)

A flat SLA — "all work orders must be resolved within 48 hours" — treats a ventilator failure the same as a broken coffee machine. This fails in both directions: it creates false urgency on low-stakes work, and it does not create enough urgency on high-stakes work.

Priority-aware SLAs assign different response windows to different urgency levels. The logic should reflect operational reality. An Emergency breakdown stops operations or creates safety risk — four hours is aggressive but appropriate. A Routine repair can run for 72 hours without serious consequence. Planned maintenance has a known future date as its deadline.

The combination of priority and criticality multipliers in UniAsset means that the same priority level can produce different deadlines for different assets. An Emergency work order on a Critical asset (ICU ventilator, production line motor) gets a tighter window than the same Emergency priority on a Low-criticality asset. The system encodes the operational reality that not all assets carry equal risk.

Asset criticality changes the equation

The concept of asset criticality is the multiplier that makes SLA windows reflect operational reality rather than just urgency categories.

Two assets can have the same work order priority — Emergency — and deserve very different response windows. An ICU ventilator and an office printer are both flagged Emergency (one genuinely critical, one hyperbolic). The criticality multiplier corrects for this. The ventilator is tagged Critical (0.5× multiplier) and gets a 2-hour window. The printer is tagged Low (1.5× multiplier) and gets a 6-hour window. Both are Emergency priority. Neither has an inaccurate deadline.
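In code terms, the multiplier is a simple scaling step on top of the earlier deadline sketch. The Critical 0.5x and Low 1.5x values come from the example above; the 1.0x value for Medium is an assumed neutral default:

```python
from datetime import timedelta

# Base windows per priority, as in the earlier sketch.
SLA_WINDOWS = {
    "EMERGENCY": timedelta(hours=4),
    "URGENT": timedelta(hours=24),
    "ROUTINE": timedelta(hours=72),
}

# Critical and Low multipliers from the example above; Medium is assumed.
CRITICALITY_MULTIPLIER = {"CRITICAL": 0.5, "MEDIUM": 1.0, "LOW": 1.5}

def adjusted_window(priority: str, criticality: str) -> timedelta:
    """Scale the base priority window by asset criticality."""
    return SLA_WINDOWS[priority] * CRITICALITY_MULTIPLIER[criticality]

# Emergency + Critical: 4h * 0.5 = 2h. Emergency + Low: 4h * 1.5 = 6h.
```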

This matters most at the extremes.

In healthcare: a Critical imaging system failure during a busy surgical day has a completely different risk profile from a Low-criticality patient entertainment system failure. The same Emergency priority, applied without a criticality modifier, treats them identically.

In manufacturing: a Critical production line conveyor failure costs the organisation thousands of dollars per hour of downtime. A Low-criticality breakroom refrigerator failure costs approximately the inconvenience of ordering lunch from outside. The criticality multiplier encodes this difference into the SLA calculation.

In logistics: a refrigerated trailer's temperature control unit is Critical — failure threatens cold-chain integrity and cargo value. A standard dry trailer's cargo tracking system is Medium. Identical work order priority, very different acceptable response windows.

Set criticality on the asset record. Change it as your operational context changes. It takes effect on new work orders immediately.

How escalation actually works (a real example)

Consider a hospital biomedical engineering team. An ICU ventilator fails at 14:00.

14:00 (T+0): A Corrective Maintenance work order is created at Emergency priority. The asset is tagged Critical. SLA deadline: 16:00 (2 hours). Technician Ravi is assigned and notified in-app.

14:30: Ravi arrives, begins diagnostic work. Status: In Progress.

16:00 — SLA breach: Ravi has identified the fault but the replacement part is not in stock. Work order status: On Hold. SLA progress bar turns red. Level 1 escalation fires automatically: CRITICAL in-app notification and email sent to Biomedical Supervisor Priya. Escalation logged on the work order timeline.

17:00 — 60 minutes after breach: The part has been ordered but not yet delivered. Still On Hold. Level 2 fires: CRITICAL notification sent to Engineering Head Anand. Anand reviews the work order, sees the hold reason, calls the supplier to expedite, and authorises sourcing from a nearby hospital as a temporary measure.

18:15: Borrowed part arrives. Ravi installs it, tests the ventilator, submits for approval.

18:30: Anand reviews and closes the work order. Total downtime: 4.5 hours.

What the escalation produced: The Engineering Head was automatically informed at Level 2, which prompted a decision (sourcing from another hospital) that would otherwise have required someone to manually escalate to him. That decision reduced downtime by an estimated 2 hours. The complete timeline — assignment, breach, escalation, resolution — is preserved in the work order record for the compliance audit that will follow.

No one had to remember to tell Anand. The system did it.

Configuring escalation for your organisation

When setting escalation delays, resist the temptation to set Level 1 to 0 minutes with aggressive downstream delays.

A Level 1 escalation at the exact moment of breach means the Supervisor is notified before they have had any grace period to respond. This trains the team to ignore Level 1 alerts because they fire before any reasonable response time has elapsed — and it means every breach immediately triggers a senior notification, which desensitises the chain over time.

A more effective configuration, sketched in code after the list:

  • Level 1 (Supervisor): 30–60 minutes after breach. This gives the assigned technician a short window to resolve a borderline breach before escalation fires. It also means Level 1 alerts carry weight — they represent a breach that has already been given time to self-resolve.
  • Level 2 (Department Head): 2–3× the Level 1 delay after breach. If Level 1 fires at 60 minutes, set Level 2 at 120–180 minutes after breach. This gives the Supervisor a realistic window to act before escalating further.
  • Level 3 (GM): Reserve for genuinely serious breaches. Set Level 3 at 240 minutes (4 hours) after breach or more. GM-level escalations should signal that something seriously wrong has happened — not that a routine repair ran 90 minutes over its SLA window.
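Expressed in the shape of the earlier escalation sketch, that guidance might look like this; the exact minute values are illustrative starting points, not UniAsset defaults:

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class EscalationLevel:               # same shape as the earlier sketch
    delay_after_breach: timedelta
    notify_role: str

CONFIGURED_MATRIX = [
    EscalationLevel(timedelta(minutes=60), "Supervisor"),        # grace window for the technician
    EscalationLevel(timedelta(minutes=150), "Department Head"),  # ~2.5x the Level 1 delay
    EscalationLevel(timedelta(minutes=240), "General Manager"),  # reserved for serious breaches
]
```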

The right configuration depends on your industry, your asset criticality profile, and your team's response capacity. Start with defaults and adjust based on what you observe in the first month of operation.

The reporting benefit — knowing your SLA compliance rate

Once your work orders have deadlines and your escalations are configured, SLA data begins to accumulate. After 30–60 days of operation, you have enough data to answer questions that were previously impossible:

What is our SLA compliance rate? The percentage of work orders closed within their SLA deadline. This is the primary KPI for maintenance team performance. A 95% compliance rate on Emergency work orders means one in twenty critical breakdowns is not being resolved within the committed window — which is the baseline for an improvement conversation.
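A sketch of the computation over hypothetical work order records (the WorkOrder shape below is invented for illustration, not UniAsset's schema):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class WorkOrder:                      # illustrative record shape
    asset_id: str
    wo_type: str                      # "CORRECTIVE", "PREVENTIVE", ...
    sla_deadline: datetime
    closed_at: datetime | None = None

def sla_compliance_rate(work_orders: list[WorkOrder]) -> float:
    """Share of closed work orders resolved within their SLA deadline."""
    closed = [wo for wo in work_orders if wo.closed_at is not None]
    if not closed:
        return 0.0
    on_time = sum(1 for wo in closed if wo.closed_at <= wo.sla_deadline)
    return on_time / len(closed)
```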

What is our reactive-to-proactive ratio? If corrective maintenance work orders outnumber preventive maintenance work orders by a wide margin, the PM programme is not working. The equipment is being maintained by failure rather than by schedule. That ratio, tracked monthly, tells leadership whether the investment in PM rules and scheduled service is having an effect. See work order management for more on PM work order types.
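With the same illustrative records, the ratio is a one-line aggregation:

```python
def reactive_to_proactive_ratio(work_orders: list[WorkOrder]) -> float:
    """Corrective work orders per preventive one; a high value means the
    equipment is being maintained by failure rather than by schedule."""
    corrective = sum(1 for wo in work_orders if wo.wo_type == "CORRECTIVE")
    preventive = sum(1 for wo in work_orders if wo.wo_type == "PREVENTIVE")
    return corrective / preventive if preventive else float("inf")
```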

Which assets generate the most breaches? Repeat SLA breaches on the same asset are a signal worth investigating. Persistent breach patterns may indicate parts availability issues, understaffing for that equipment type, or an asset reaching the end of reliable service life. The breach data identifies the problem. The asset criticality and maintenance cost data provide the context needed to act on it.
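Breach hotspots fall out of a simple count over the same records; this version counts work orders closed after their deadline:

```python
from collections import Counter

def top_breaching_assets(work_orders: list[WorkOrder], n: int = 10):
    """Assets ranked by SLA breach count, most frequent first."""
    breaches = Counter(wo.asset_id for wo in work_orders
                       if wo.closed_at and wo.closed_at > wo.sla_deadline)
    return breaches.most_common(n)
```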

These numbers require accurate SLA tracking at the work order level. They do not exist unless every work order has a deadline, every breach is recorded, and every resolution is timestamped. This is why the foundation — automatic deadlines, tiered escalation, consistent data — matters more than the reporting layer built on top of it.

Escalation as infrastructure, not surveillance

The case for tiered escalation is not about creating a system that blames people when things go wrong. Equipment fails. Parts go out of stock. Technicians get tied up on other emergencies. SLA breaches happen even in well-run operations.

The case for escalation is about ensuring that when a breach occurs, the right people know about it — automatically, without anyone having to manually decide to inform them. It is the infrastructure that makes accountability possible without requiring perfect human coordination in every instance.

In practice, most escalations at Level 2 and Level 3 result in resource decisions that improve outcomes: sourcing alternative parts, reassigning a technician, calling in a specialist, or making a temporary operational workaround. These decisions are faster when the right person is informed automatically than when they depend on someone remembering to make a phone call.

Organisations that implement tiered escalation consistently report that the most valuable outcome is not the escalation itself — it is the improvement in first-response behaviour that follows. When technicians know that an unresolved breach will reach their manager, then their manager's manager, there is a strong operational incentive to resolve quickly or communicate blockers clearly. The system creates the conditions for good behaviour without requiring surveillance.

See how SLA escalation works in UniAsset

Ready to put this into practice?

Start tracking your assets, scheduling maintenance, and gaining operational insights today.