SLA and Performance Tracking
You cannot improve what you do not measure. Levy Service tracks SLA adherence, MTTR (mean time to resolve), tech utilization, and per-vehicle cost so you can spot the patterns that move the metrics.
How SLAs are computed
Every task gets an sla_due_at timestamp at create time. The math is:
sla_due_at = created_at + DEFAULT_SLA_SECONDS[priority]
Default SLAs per priority (in seconds):
| Priority | Default SLA |
|---|---|
| critical | 4 hours (14400s) |
| high | 24 hours (86400s) |
| medium | 72 hours (259200s) |
| low | 7 days (604800s) |
These defaults live in src/lib/tasks/sla.ts as DEFAULT_SLA_SECONDS. Override per-rule by setting sla_seconds on the rule row.
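The due-date math can be sketched as below. This is a minimal illustration, not the contents of sla.ts: the function name computeSlaDueAt, its signature, and the way the rule override is passed in are assumptions. Only the DEFAULT_SLA_SECONDS name and its values come from this page.

```typescript
type Priority = "critical" | "high" | "medium" | "low";

// Values copied from the defaults table on this page.
const DEFAULT_SLA_SECONDS: Record<Priority, number> = {
  critical: 14400,
  high: 86400,
  medium: 259200,
  low: 604800,
};

// Hypothetical helper: the real function in sla.ts may be named and shaped differently.
// ruleSlaSeconds stands in for sla_seconds on the rule row, when one is set.
function computeSlaDueAt(
  createdAt: Date,
  priority: Priority,
  ruleSlaSeconds?: number | null,
): Date {
  const seconds = ruleSlaSeconds ?? DEFAULT_SLA_SECONDS[priority];
  return new Date(createdAt.getTime() + seconds * 1000);
}

const createdAt = new Date("2024-01-01T00:00:00Z");
const defaultDue = computeSlaDueAt(createdAt, "critical"); // 4 hours after createdAt
const overriddenDue = computeSlaDueAt(createdAt, "critical", 7200); // 2 hours after createdAt
```

A rule-level sla_seconds wins over the priority default; with no override, the priority alone decides the deadline.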
The breach detection cron
The cron at /api/cron/task-sla-check runs every 15 minutes. For each open task where sla_due_at < NOW():
- Write a row to task_sla_breaches if not already breached
- Set tasks.sla_breached_at = NOW()
- Push-notify the assignee and the ops_manager
- Slack-webhook the breach if a webhook URL is configured
- (Phase 4) Auto-escalate to a vendor if the rule says so
A task can only breach once. The breach row is the audit record.
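The "breach once" rule can be illustrated with a small in-memory sketch. The OpenTask shape and both function names are hypothetical, and the real cron works against the database rather than arrays. The point is the selection condition: past due and not yet marked breached.

```typescript
// Hypothetical in-memory view of an open task; field names mirror the columns named on this page.
interface OpenTask {
  id: string;
  slaDueAt: Date;
  slaBreachedAt: Date | null;
}

// Select tasks that are past their deadline and have no breach recorded yet.
function findNewBreaches(openTasks: OpenTask[], now: Date): OpenTask[] {
  return openTasks.filter(
    (task) => task.slaBreachedAt === null && task.slaDueAt.getTime() < now.getTime(),
  );
}

// Record the breach on a copy of the task. A real cron would also insert the
// task_sla_breaches row and send notifications at this point.
function markBreached(task: OpenTask, now: Date): OpenTask {
  return { ...task, slaBreachedAt: now };
}
```

Because the filter skips any task that already has sla_breached_at set, running the check again 15 minutes later does not produce a second breach for the same task.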
Reading the analytics page
The analytics page at /dashboard/tasks/analytics has three primary surfaces:
MTTR over time
A line chart showing mean time to resolve, broken down by task type, over the last 90 days. The target line is the SLA threshold. If your line is consistently below the threshold, your SLAs are too loose. If it's consistently above, your team is overloaded or your SLAs are too tight.
Pulled from GET /api/tasks/analytics/mttr.
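As a rough sketch of what the chart aggregates, the snippet below averages closed_at - created_at per task type. The ClosedTask shape, the taskType field, and the function name are assumptions for illustration. The response format of the endpoint is not documented here, and the 90-day windowing is left out.

```typescript
// Hypothetical shape of a closed task; taskType is an assumed field name.
interface ClosedTask {
  taskType: string;
  createdAt: Date;
  closedAt: Date;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Mean time to resolve in hours, grouped by task type.
function mttrHoursByType(tasks: ClosedTask[]): Record<string, number> {
  const totals: Record<string, { sumMs: number; count: number }> = {};
  for (const task of tasks) {
    const entry = totals[task.taskType] || { sumMs: 0, count: 0 };
    entry.sumMs += task.closedAt.getTime() - task.createdAt.getTime();
    entry.count += 1;
    totals[task.taskType] = entry;
  }
  const result: Record<string, number> = {};
  for (const type of Object.keys(totals)) {
    result[type] = totals[type].sumMs / totals[type].count / MS_PER_HOUR;
  }
  return result;
}
```

Comparing each type's value against its SLA threshold gives the same "above or below the target line" reading as the chart.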
Tech leaderboard
A sortable table with one row per tech, showing:
- Tasks completed (last 90 days)
- Average resolve time
- SLA breach count
- Total cost (parts + labor)
- Utilization (% of clocked hours with an active task)
Pulled from the technician_performance materialized view, refreshed nightly at 03:30 by /api/cron/refresh-tech-performance. This means the leaderboard is up to ~24 hours stale — fine for trend analysis, not for real-time staffing.
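To make the utilization column concrete, here is one way "% of clocked hours with an active task" could be computed from shift and task intervals. This is a sketch under assumptions: the Interval type and function names are hypothetical, and the actual materialized view may define the metric differently, for example in how it treats overlapping tasks.

```typescript
// Time range in epoch milliseconds.
interface Interval {
  start: number;
  end: number;
}

// Merge overlapping intervals so two concurrent tasks are not counted twice.
function mergeIntervals(intervals: Interval[]): Interval[] {
  const sorted = intervals.slice().sort((a, b) => a.start - b.start);
  const merged: Interval[] = [];
  for (const current of sorted) {
    const last = merged[merged.length - 1];
    if (last && current.start <= last.end) {
      last.end = Math.max(last.end, current.end);
    } else {
      merged.push({ start: current.start, end: current.end });
    }
  }
  return merged;
}

// Share of clocked shift time that overlaps at least one active task, as a percentage.
function utilizationPct(shifts: Interval[], activeTasks: Interval[]): number {
  const mergedShifts = mergeIntervals(shifts);
  const mergedTasks = mergeIntervals(activeTasks);
  let clockedMs = 0;
  let busyMs = 0;
  for (const shift of mergedShifts) {
    clockedMs += shift.end - shift.start;
    for (const task of mergedTasks) {
      const overlap = Math.min(shift.end, task.end) - Math.max(shift.start, task.start);
      busyMs += Math.max(0, overlap);
    }
  }
  return clockedMs === 0 ? 0 : (busyMs / clockedMs) * 100;
}
```

For an 8-hour shift with tasks active from hour 1 to 3 and hour 2 to 5, the merged busy time is 4 hours, so utilization is 50%. Task time outside the shift does not count.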
Cost-per-vehicle heatmap
A grid heatmap where each cell is a vehicle and the color intensity is the lifetime maintenance cost. Sort by zone, model, or vehicle age to spot patterns. Vehicles that show up dark red are the lemons worth retiring.
Pulled from GET /api/tasks/analytics/cost-per-vehicle.
The KPIs we track in the spec
| Metric | Target | Source |
|---|---|---|
| MTTR — critical tasks | < 24h | tasks.closed_at - tasks.created_at where priority='critical' |
| MTTR — scheduled maintenance | < 72h | same |
| % vehicles in maintenance at any time | < 8% of active fleet | vehicles.status rollup |
| Tech utilization | > 60% of clocked hours have an active task | task_assignments × shift logs |
| Tasks resolved per tech-day | > 6 | tasks group by assignee_id, day |
| % tasks auto-created by rule engine | > 50% | tasks.created_by_rule_id IS NOT NULL |
| SLA breach rate | < 5% | task_sla_breaches count / tasks count |
| % tasks with before+after photos | > 90% | task_photos join |
If you are missing the target on more than two of these, talk to your Levy CSM; usually there's a config tweak that closes the gap.
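Because some targets in the table are upper bounds ("<") and others are lower bounds (">"), a check against them needs to know the direction. The sketch below shows one way to count missed targets and to compute the SLA breach rate. The Kpi type and function names are hypothetical and not part of the Levy API.

```typescript
// "below" means the value must stay under the target; "above" means it must exceed it.
type Direction = "below" | "above";

interface Kpi {
  name: string;
  value: number;
  target: number;
  direction: Direction;
}

function meetsTarget(kpi: Kpi): boolean {
  return kpi.direction === "below" ? kpi.value < kpi.target : kpi.value > kpi.target;
}

// Names of the KPIs that currently miss their target.
function missedKpis(kpis: Kpi[]): string[] {
  return kpis.filter((kpi) => !meetsTarget(kpi)).map((kpi) => kpi.name);
}

// SLA breach rate as a percentage: breach rows divided by total tasks.
function slaBreachRatePct(breachCount: number, taskCount: number): number {
  return taskCount === 0 ? 0 : (breachCount / taskCount) * 100;
}
```

For example, 12 breaches across 400 tasks is a 3% breach rate, which meets the < 5% target, while 55% tech utilization misses the > 60% target.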
Customizing SLAs per fleet
The default SLAs apply to all fleets but can be overridden per-subaccount through the Settings → SLA Config page (Phase 5). Until that page ships, custom SLAs are set by editing rule rows directly:
- Open the rule at /dashboard/task-rules/[id]
- Set sla_seconds to your desired value
- Save
The new SLA applies to tasks created by that rule going forward, not retroactively.
Notification fatigue
If you have 50 critical tasks breaching SLA at once (you've had a bad day), Expo push will batch the notifications into a single digest rather than spamming 50 individual pushes. The threshold is 5 breaches in 10 minutes — past that, you get one push that says "12 SLA breaches in the last 10 minutes, tap to see."
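The digest decision can be sketched as follows. This is illustrative only: planPushes and the individual push text are hypothetical, and reading "past that" as strictly more than 5 breaches in the window is an assumption. Only the 5-in-10-minutes threshold and the digest wording come from this page.

```typescript
const DIGEST_THRESHOLD = 5; // breaches, from this page
const DIGEST_WINDOW_MS = 10 * 60 * 1000; // 10 minutes, from this page

// Returns the push messages to send for the breaches seen in the current window.
// Assumption: a digest replaces individual pushes once the count exceeds the threshold.
function planPushes(breachTimes: Date[], now: Date): string[] {
  const recent = breachTimes.filter((time) => {
    const ageMs = now.getTime() - time.getTime();
    return ageMs >= 0 && ageMs <= DIGEST_WINDOW_MS;
  });
  if (recent.length > DIGEST_THRESHOLD) {
    return [`${recent.length} SLA breaches in the last 10 minutes, tap to see.`];
  }
  // Hypothetical wording for an individual push.
  return recent.map(() => "A task assigned to you has breached its SLA.");
}
```

With 12 breaches inside the window this yields a single digest message; with 3 it yields three individual pushes.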
Slack notifications are not batched; each breach is a separate webhook. If this becomes noisy, set action_config.notify_slack to false on your low-priority rules.
Diagnosing slow MTTR
Three common causes when MTTR climbs:
- Tech shortage: utilization is above 80%. Hire or outsource.
- Bad triage: too many tasks are critical when they should be medium. Tune your rule priorities.
- Stuck blocked tasks: techs are tapping Block and nobody is unblocking. Filter the Kanban for blocked daily.
The target you should care about most
MTTR for critical tasks under 24 hours. Every other metric flows from this one. If you nail it, your fleet is healthy.