intermediate
work-orders
sla
analytics

SLA and Performance Tracking

Configure SLAs per priority, watch the breach cron, and read the analytics page for MTTR and the tech leaderboard

Levy Fleets Team · May 18, 2026 · 6 min read

You cannot improve what you do not measure. Levy Service tracks SLA adherence, MTTR (mean time to resolve), tech utilization, and per-vehicle cost so you can spot the patterns that move the metrics.

How SLAs are computed

Every task gets an sla_due_at timestamp at creation time. Unless the creating rule sets its own SLA, the math is:

sla_due_at = created_at + DEFAULT_SLA_SECONDS[priority]

Default SLAs per priority (in seconds):

Priority | Default SLA
--- | ---
critical | 4 hours (14400s)
high | 24 hours (86400s)
medium | 72 hours (259200s)
low | 7 days (604800s)

These defaults live in src/lib/tasks/sla.ts as DEFAULT_SLA_SECONDS. To override them for a specific rule, set sla_seconds on that rule's row.
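A minimal TypeScript sketch of the due-date math, assuming a rule row whose sla_seconds may be null or absent. The function name computeSlaDueAt and the RuleRow shape are hypothetical; only DEFAULT_SLA_SECONDS, sla_seconds, and the per-priority values come from the text.

```typescript
// Hypothetical types for illustration; the real definitions in src/lib/tasks/sla.ts may differ.
type Priority = "critical" | "high" | "medium" | "low";

// Default SLA per priority, in seconds (values from the table above).
const DEFAULT_SLA_SECONDS: Record<Priority, number> = {
  critical: 4 * 60 * 60, // 14400
  high: 24 * 60 * 60, // 86400
  medium: 72 * 60 * 60, // 259200
  low: 7 * 24 * 60 * 60, // 604800
};

// Assumed rule-row shape: sla_seconds is null or absent when the rule has no override.
interface RuleRow {
  sla_seconds?: number | null;
}

// Computes sla_due_at once, when the task is created. A non-null sla_seconds on the
// creating rule takes precedence over the per-priority default.
function computeSlaDueAt(createdAt: Date, priority: Priority, rule?: RuleRow): Date {
  const slaSeconds = rule?.sla_seconds ?? DEFAULT_SLA_SECONDS[priority];
  return new Date(createdAt.getTime() + slaSeconds * 1000);
}
```

Because the deadline is computed and stored when the task is created, changing a rule's sla_seconds later does not move the deadline of tasks that already exist.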

The breach detection cron

The cron at /api/cron/task-sla-check runs every 15 minutes. For each open task where sla_due_at < NOW() that has not already breached, it:

  1. Writes a row to task_sla_breaches
  2. Sets tasks.sla_breached_at = NOW()
  3. Push-notifies the assignee and the ops_manager
  4. Posts the breach to Slack if a webhook URL is configured
  5. (Phase 4) Auto-escalates to a vendor if the rule says so

A task can only breach once. The breach row is the audit record.
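The core rule of the cron (only open, overdue, not-yet-breached tasks get a breach row) can be modeled in memory. This is a hypothetical sketch: runSlaCheck, the Task and BreachRow shapes, and the simplified two-value status are assumptions, and the real job reads and writes the database instead of arrays.

```typescript
// Hypothetical, simplified task shape; "open" stands in for any not-yet-closed status.
interface Task {
  id: string;
  status: "open" | "closed";
  sla_due_at: Date;
  sla_breached_at: Date | null;
}

// Hypothetical audit row, standing in for a task_sla_breaches record.
interface BreachRow {
  task_id: string;
  breached_at: Date;
}

// One pass of the breach check. Returns the breach rows written in this run.
// Tasks with sla_breached_at already set are skipped, so each task breaches at most once.
function runSlaCheck(tasks: Task[], breaches: BreachRow[], now: Date): BreachRow[] {
  const written: BreachRow[] = [];
  for (const task of tasks) {
    const overdue = task.status === "open" && task.sla_due_at.getTime() < now.getTime();
    if (!overdue || task.sla_breached_at !== null) continue;
    const row: BreachRow = { task_id: task.id, breached_at: now };
    breaches.push(row); // step 1: audit record
    task.sla_breached_at = now; // step 2: mark the task
    written.push(row); // steps 3-5 (push, Slack, escalation) would fan out from these rows
  }
  return written;
}
```

Running the check twice in a row writes nothing the second time, which is what makes the breach row a reliable audit record.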

Reading the analytics page

The analytics page at /dashboard/tasks/analytics has three primary surfaces:

MTTR over time

A line chart showing mean time to resolve, broken down by task type, over the last 90 days. The target line is the SLA threshold. If your line is consistently below the threshold, your SLAs are too loose. If it's consistently above, your team is overloaded or your SLAs are too tight.

Pulled from GET /api/tasks/analytics/mttr.
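To make the metric concrete, here is a sketch of MTTR per task type over a trailing window, using the closed_at - created_at definition from the KPI table. The ClosedTask shape, the task_type field name, and mttrHoursByType are hypothetical; the endpoint's actual query may group or filter differently.

```typescript
// Hypothetical closed-task shape; task_type is an assumed field name.
interface ClosedTask {
  task_type: string;
  created_at: Date;
  closed_at: Date;
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Mean time to resolve, in hours, per task type, for tasks closed in the last `days` days.
function mttrHoursByType(tasks: ClosedTask[], now: Date, days: number = 90): Record<string, number> {
  const since = now.getTime() - days * DAY_MS;
  const sums: Record<string, { totalMs: number; count: number }> = {};
  for (const t of tasks) {
    if (t.closed_at.getTime() < since) continue; // closed before the window starts
    if (!sums[t.task_type]) sums[t.task_type] = { totalMs: 0, count: 0 };
    sums[t.task_type].totalMs += t.closed_at.getTime() - t.created_at.getTime();
    sums[t.task_type].count += 1;
  }
  const mttr: Record<string, number> = {};
  for (const type of Object.keys(sums)) {
    mttr[type] = sums[type].totalMs / sums[type].count / HOUR_MS;
  }
  return mttr;
}
```

Comparing each type's value against its SLA in hours gives the same "above or below the target line" reading as the chart.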

Tech leaderboard

A sortable table with one row per tech, showing:

  • Tasks completed (last 90 days)
  • Average resolve time
  • SLA breach count
  • Total cost (parts + labor)
  • Utilization (% of clocked hours with an active task)

Pulled from the technician_performance materialized view, refreshed nightly at 03:30 by /api/cron/refresh-tech-performance. This means the leaderboard is up to ~24 hours stale — fine for trend analysis, not for real-time staffing.
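The utilization column is the least obvious of the five. One way to compute it, assuming clocked shifts and active-task periods are available as time intervals (the KPI table sources it from task_assignments × shift logs), is to clip task intervals to the shift and merge overlaps so concurrent tasks are not double counted. This is an illustrative sketch; the Interval shape and utilization function are hypothetical and not the materialized view's SQL.

```typescript
// Hypothetical half-open interval [start, end) in epoch milliseconds.
interface Interval {
  start: number;
  end: number;
}

// Fraction (0..1) of a clocked shift that overlaps at least one active task.
function utilization(shift: Interval, activeTasks: Interval[]): number {
  const shiftMs = shift.end - shift.start;
  if (shiftMs <= 0) return 0;
  // Clip each task to the shift, drop empty results, and sort by start time.
  const clipped = activeTasks
    .map((t) => ({ start: Math.max(t.start, shift.start), end: Math.min(t.end, shift.end) }))
    .filter((t) => t.end > t.start)
    .sort((a, b) => a.start - b.start);
  let coveredMs = 0;
  let runStart = -Infinity;
  let runEnd = -Infinity;
  for (const t of clipped) {
    if (t.start > runEnd) {
      // Gap before this task: bank the current merged run and start a new one.
      if (runEnd > runStart) coveredMs += runEnd - runStart;
      runStart = t.start;
      runEnd = t.end;
    } else {
      // Overlapping or touching: extend the current run.
      runEnd = Math.max(runEnd, t.end);
    }
  }
  if (runEnd > runStart) coveredMs += runEnd - runStart;
  return coveredMs / shiftMs;
}
```

For an 8-hour shift with tasks at hours 1-3, 2-4, and 6-10, the merged coverage is 3 + 2 = 5 hours, so utilization is 0.625.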

Cost-per-vehicle heatmap

A grid heatmap where each cell is a vehicle and the color intensity is the lifetime maintenance cost. Sort by zone, model, or vehicle age to spot patterns. Vehicles that show up dark red are the lemons worth retiring.

Pulled from GET /api/tasks/analytics/cost-per-vehicle.
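A sketch of how a heatmap cell could be derived: sum lifetime parts and labor cost per vehicle, then scale each total linearly against the most expensive vehicle to get a 0..1 color intensity. The CostRecord shape, costHeatmap, and the linear scale are assumptions for illustration; the endpoint may bucket or scale differently.

```typescript
// Hypothetical cost record: spend on one vehicle from one completed task.
interface CostRecord {
  vehicle_id: string;
  parts_cost: number;
  labor_cost: number;
}

interface HeatmapCell {
  total: number; // lifetime maintenance cost
  intensity: number; // 0..1, relative to the most expensive vehicle
}

function costHeatmap(records: CostRecord[]): Record<string, HeatmapCell> {
  // Lifetime total per vehicle.
  const totals: Record<string, number> = {};
  for (const r of records) {
    totals[r.vehicle_id] = (totals[r.vehicle_id] || 0) + r.parts_cost + r.labor_cost;
  }
  // Find the maximum so intensities are relative to the worst vehicle.
  const ids = Object.keys(totals);
  let max = 0;
  for (const id of ids) max = Math.max(max, totals[id]);
  const cells: Record<string, HeatmapCell> = {};
  for (const id of ids) {
    cells[id] = { total: totals[id], intensity: max > 0 ? totals[id] / max : 0 };
  }
  return cells;
}
```

A linear scale makes a single outlier push everyone else toward pale colors; a log or percentile scale is a reasonable alternative if that hides patterns in your fleet.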

The KPIs we track in the spec

Metric | Target | Source
--- | --- | ---
MTTR (critical tasks) | < 24h | tasks.closed_at - tasks.created_at where priority='critical'
MTTR (scheduled maintenance) | < 72h | Same formula, filtered to scheduled-maintenance tasks
% of vehicles in maintenance at any time | < 8% of active fleet | vehicles.status rollup
Tech utilization | > 60% of clocked hours have an active task | task_assignments × shift logs
Tasks resolved per tech-day | > 6 | tasks grouped by assignee_id and day
% of tasks auto-created by the rule engine | > 50% | tasks.created_by_rule_id IS NOT NULL
SLA breach rate | < 5% | task_sla_breaches count / tasks count
% of tasks with before and after photos | > 90% | task_photos join

If you miss the target on more than two of these, talk to your Levy CSM. Usually there's a config tweak that closes the gap.
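Because some targets are ceilings (< 24h, < 5%) and others are floors (> 60%, > 90%), "missing a target" depends on direction. A minimal sketch of that check, with hypothetical names (KpiReading, missedKpis, slaBreachRatePct, shouldTalkToCsm); the breach-rate formula follows the table's source column.

```typescript
// "below": the value must stay under the target; "above": it must exceed it.
type Direction = "below" | "above";

interface KpiReading {
  name: string;
  value: number;
  target: number;
  direction: Direction;
}

// Names of KPIs that miss their target. Equality counts as a miss because
// every target in the table is a strict inequality.
function missedKpis(readings: KpiReading[]): string[] {
  return readings
    .filter((k) => (k.direction === "below" ? k.value >= k.target : k.value <= k.target))
    .map((k) => k.name);
}

// SLA breach rate as a percentage: task_sla_breaches count / tasks count.
function slaBreachRatePct(breachCount: number, taskCount: number): number {
  return taskCount === 0 ? 0 : (breachCount * 100) / taskCount;
}

// Rule of thumb from the text: missing more than two targets is worth a CSM conversation.
function shouldTalkToCsm(readings: KpiReading[]): boolean {
  return missedKpis(readings).length > 2;
}
```

With made-up readings of 30h critical MTTR, 55% utilization, and 80% photo coverage, three targets are missed and shouldTalkToCsm returns true.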

Customizing SLAs per fleet

The default SLAs apply to all fleets. Per-subaccount overrides will live on the Settings → SLA Config page (Phase 5). Until that page ships, set custom SLAs by editing rule rows directly:

  1. Open the rule at /dashboard/task-rules/[id]
  2. Set sla_seconds to your desired value
  3. Save

The new SLA applies to tasks created by that rule going forward, not retroactively.

Notification fatigue

If 50 critical tasks breach SLA at once (you've had a bad day), breach pushes are batched into a single digest rather than sent as 50 individual Expo push notifications. The threshold is 5 breaches in 10 minutes; past that, you get one push along the lines of "12 SLA breaches in the last 10 minutes, tap to see."

Slack notifications are not batched; each breach triggers its own webhook call. If that gets noisy, set action_config.notify_slack to false on your low-priority rules.
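The push batching rule can be modeled as a small pure function. This sketch assumes "past that" means strictly more than 5 breaches inside a trailing 10-minute window for one recipient; the BreachEvent shape, buildBreachPushes, and the per-breach message text are hypothetical and do not describe how the service actually queues pushes.

```typescript
// Values from the text: a 10-minute window and a threshold of 5 breaches.
const DIGEST_WINDOW_MS = 10 * 60 * 1000;
const DIGEST_THRESHOLD = 5;

// Hypothetical breach event for one push recipient.
interface BreachEvent {
  task_id: string;
  breached_at_ms: number;
}

// Returns the push messages to send to one recipient. Assumption: once more than
// DIGEST_THRESHOLD breaches fall inside the trailing window, they collapse into one digest.
function buildBreachPushes(events: BreachEvent[], nowMs: number): string[] {
  const recent = events.filter(
    (e) => e.breached_at_ms <= nowMs && nowMs - e.breached_at_ms < DIGEST_WINDOW_MS
  );
  if (recent.length > DIGEST_THRESHOLD) {
    return [`${recent.length} SLA breaches in the last 10 minutes, tap to see`];
  }
  // At or under the threshold: one push per breach.
  return recent.map((e) => `SLA breached on task ${e.task_id}`);
}
```

Slack has no equivalent step, which is why turning off notify_slack per rule is the lever there.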

Diagnosing slow MTTR

Three common causes when MTTR climbs:

  1. Tech shortage: utilization is above 80%. Hire or outsource.
  2. Bad triage: too many tasks are marked critical that should be medium. Tune your rule priorities.
  3. Stuck blocked tasks: techs tap Block and nobody unblocks them. Filter the Kanban for blocked tasks daily.
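The three checks can be run as a quick triage over a fleet snapshot. The 80% utilization threshold comes from the list; the FleetSnapshot shape, the 25% critical-share ceiling, and the one-day blocked cutoff are hypothetical placeholders you would tune for your own fleet.

```typescript
// Hypothetical snapshot of the signals behind each cause.
interface FleetSnapshot {
  techUtilization: number; // 0..1, averaged across techs
  criticalShare: number; // fraction of open tasks marked critical
  blockedOverOneDay: number; // tasks sitting in Blocked for more than 24 hours
}

const MAX_UTILIZATION = 0.8; // from the text
const MAX_CRITICAL_SHARE = 0.25; // placeholder, not a Levy default

// Returns the likely causes of slow MTTR, in the order listed above.
function diagnoseSlowMttr(s: FleetSnapshot): string[] {
  const causes: string[] = [];
  if (s.techUtilization > MAX_UTILIZATION) causes.push("tech shortage: hire or outsource");
  if (s.criticalShare > MAX_CRITICAL_SHARE) causes.push("bad triage: tune rule priorities");
  if (s.blockedOverOneDay > 0) causes.push("stuck blocked tasks: review blocked tasks daily");
  return causes;
}
```

An empty result means none of the three usual suspects applies, and the cause is probably elsewhere.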

The target you should care about most

Keep MTTR for critical tasks under 24 hours. Every other metric flows from this one. If you nail it, your fleet is healthy.