AI Ops Troubleshooting

Common problems with the AI Ops demand surface, recommender, and technician routes — and how to diagnose them.

Levy Fleets Team | May 18, 2026 | 6 min read

This page covers the most common AI Ops issues and how to resolve them.

The heat map is empty

Symptom: /dashboard/analytics/heat-maps loads but no hex polygons are drawn.

Check in order:

  1. Is ai_ops_enabled = true on the subaccount? If false, no crons run for it. Set it to true and wait one cycle.

  2. Has the backfill run? A new subaccount has no forecasts until the one-shot backfill builds them. Run:

    curl -X POST "https://fleets.levyelectric.com/api/internal/forecast/backfill" \
      -H "Authorization: Bearer $AI_OPS_INTERNAL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"subaccountId":"<uuid>"}'
    
  3. Does the subaccount have at least 14 days of ride history? Without it, the model can't fit a per-subaccount forecast. Check the rides count.

  4. Did the inference cron run recently? Look at demand_forecasts.created_at for the subaccount. If the newest row is over 2 hours old, the cron is stuck.
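The four checks above can be sketched as a single function that returns the first failing step. This is a minimal illustration, not part of the product: the function name and its inputs are hypothetical, and you would gather the input values by hand from the subaccount settings and the database. The 14-day and 2-hour thresholds come from the checklist.

```python
from typing import Optional

MIN_HISTORY_DAYS = 14          # check 3: minimum ride history
STUCK_CRON_AGE_MINUTES = 120   # check 4: "over 2 hours old"


def diagnose_empty_heat_map(
    ai_ops_enabled: bool,
    backfill_done: bool,
    ride_history_days: int,
    newest_forecast_age_minutes: Optional[float],
) -> str:
    """Return the first failing check, in the same order as the checklist.

    All inputs are hypothetical values collected manually; pass None for
    newest_forecast_age_minutes when demand_forecasts has no rows at all.
    """
    if not ai_ops_enabled:
        return "ai_ops_enabled is false: set it to true and wait one cycle"
    if not backfill_done:
        return "backfill has not run: trigger the one-shot backfill"
    if ride_history_days < MIN_HISTORY_DAYS:
        return "ride history is under 14 days: the model cannot fit yet"
    if (
        newest_forecast_age_minutes is None
        or newest_forecast_age_minutes > STUCK_CRON_AGE_MINUTES
    ):
        return "newest forecast is missing or over 2 hours old: cron is likely stuck"
    return "all four checks pass"


print(diagnose_empty_heat_map(True, True, 9, None))
```

Working through the checks in order matters because each later check only makes sense once the earlier ones pass: a disabled subaccount will never have fresh forecasts, whatever the cron is doing.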

The "Forecast may be stale" banner is showing

Symptom: Yellow banner on the heat-map page reads "Forecast may be stale — last refresh was X minutes ago."

The banner appears when the newest forecast is more than 90 minutes old. Causes:

  • The inference cron (/api/cron/ai-ops-run-inference) is failing. Check Vercel cron logs.
  • Modal is down and the local fallback forecaster is timing out on a large subaccount.
  • The feature builder (/api/cron/ai-ops-build-features) isn't producing rows, so inference has nothing to consume.

Diagnose by checking model_runs for recent rows. If model_runs has entries but demand_forecasts doesn't, inference is succeeding but writes are failing — check the DB connection.
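The model_runs versus demand_forecasts comparison can be written down as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, the two timestamps are the newest created_at values you read from each table yourself, and "recent" is taken to mean within the same 90-minute window the banner uses.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(minutes=90)  # banner threshold from this page


def diagnose_stale_forecast(
    now: datetime,
    newest_model_run: Optional[datetime],
    newest_forecast: Optional[datetime],
) -> str:
    """Classify a stale-banner report from two timestamps (None = no rows)."""
    if newest_forecast is not None and now - newest_forecast <= STALE_AFTER:
        return "fresh: no banner expected"
    # The forecast is stale or missing. A recent model_runs row means
    # inference itself ran, so the failure is on the write path.
    if newest_model_run is not None and now - newest_model_run <= STALE_AFTER:
        return "inference ran but forecast writes are failing: check the DB connection"
    return "no recent model_runs: check the inference cron, Modal, and the feature builder"


now = datetime(2026, 5, 18, 12, 0, tzinfo=timezone.utc)
print(diagnose_stale_forecast(now, now - timedelta(minutes=10), now - timedelta(hours=3)))
```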

No rebalance recommendations appear

Symptom: /dashboard/operations/rebalance is empty.

Check in order:

  1. Is ai_ops_tier pro or enterprise? Starter tier doesn't generate recommendations.
  2. Has the recommender cron run yet? It runs at minute 25 of each hour. Wait one cycle from the time ai_ops_tier was set.
  3. Are there any candidate hexes? The recommender needs both an under-supplied hex (forecast > supply) and an over-supplied hex (supply > forecast) in the same subaccount. If your fleet is well-balanced, no recs are generated — that's correct behavior.
  4. Is projected lift positive? Each candidate is dropped if (dest_gain - src_loss) * avg_fare - distance * tech_cost_per_mile_usd goes negative. A high tech_cost_per_mile_usd can suppress all recommendations.

If you suspect the math is too conservative, try temporarily lowering tech_cost_per_mile_usd to 0.25 and waiting for the next cron run.
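To see how tech_cost_per_mile_usd can suppress a candidate, it helps to work the lift formula by hand. The sketch below implements the expression from check 4 as written; the function name and the sample numbers are made up for illustration, and the units of dest_gain and src_loss (assumed here to be rides) are not specified on this page.

```python
def projected_lift_usd(
    dest_gain: float,
    src_loss: float,
    avg_fare: float,
    distance_miles: float,
    tech_cost_per_mile_usd: float,
) -> float:
    """(dest_gain - src_loss) * avg_fare - distance * tech_cost_per_mile_usd"""
    return (dest_gain - src_loss) * avg_fare - distance_miles * tech_cost_per_mile_usd


# Illustrative candidate: net gain of 2 rides at a $4 average fare, 5 miles away.
candidate = dict(dest_gain=3.0, src_loss=1.0, avg_fare=4.0, distance_miles=5.0)

print(projected_lift_usd(**candidate, tech_cost_per_mile_usd=2.0))   # -2.0, dropped
print(projected_lift_usd(**candidate, tech_cost_per_mile_usd=0.25))  # 6.75, kept
```

With these sample numbers the move earns $8 in fares. At $2.00 per mile the 5-mile trip costs $10, so the candidate goes negative and is dropped; at $0.25 per mile the trip costs $1.25 and the candidate survives.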

All vehicles flipped to maintenance unexpectedly

Symptom: Several vehicles in a subaccount switched to maintenance status, and no operator changed them.

This is the auto-maintenance rule in action — Phase 3 marks every pickup-stop vehicle as maintenance when a technician accepts a route. It is intentional, but limited:

  • It only happens when ai_ops_tier='enterprise'.
  • It only happens after a tech taps Accept on a planned route.
  • It auto-reverts when the route is completed or abandoned.

If you want to disable it, downgrade the subaccount to ai_ops_tier='pro'. Routes will stop being generated and the rule won't apply.

To recover stranded vehicles right now, find the route in rebalance_routes (status in_progress) and abandon it. Auto-maintenance vehicles will flip back to available.
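The rule's lifecycle can be modeled in a few lines to make the recovery step easier to reason about. This is a toy model, not the production code: the Route class, the event names, and every status string except in_progress, maintenance, and available are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Route:
    tier: str                      # value of ai_ops_tier for the subaccount
    pickup_vehicle_ids: List[str]  # vehicles at pickup stops on this route
    status: str = "planned"        # "planned" is an assumed initial status


def apply_route_event(route: Route, event: str, vehicle_status: Dict[str, str]) -> None:
    """Update route and vehicle statuses in place for one route event."""
    if event == "accept":
        route.status = "in_progress"
        if route.tier == "enterprise":
            for vid in route.pickup_vehicle_ids:
                vehicle_status[vid] = "maintenance"
    elif event in ("complete", "abandon"):
        route.status = "completed" if event == "complete" else "abandoned"
        # Simplification: any pickup vehicle still in maintenance is released.
        for vid in route.pickup_vehicle_ids:
            if vehicle_status.get(vid) == "maintenance":
                vehicle_status[vid] = "available"
    else:
        raise ValueError(f"unknown event: {event}")


fleet = {"v1": "available", "v2": "available", "v3": "available"}
route = Route(tier="enterprise", pickup_vehicle_ids=["v1", "v2"])
apply_route_event(route, "accept", fleet)
print(fleet)   # v1 and v2 are now in maintenance, v3 is untouched
apply_route_event(route, "abandon", fleet)
print(fleet)   # all three are available again
```

The model shows why abandoning the in_progress route is the recovery lever: the revert is tied to the route leaving in_progress, not to any per-vehicle action.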

Technician's Route tab is empty

Symptom: Tech opens the Route tab but no route is shown.

Check in order:

  1. Is the subaccount on ai_ops_tier='enterprise'? If not, no routes are generated.
  2. Is the tech logged into the right subaccount? The tab is subaccount-scoped.
  3. Has the route solver cron run today? It runs every 30 min during 06:00-22:00 local. If it's outside those hours, no new routes are built.
  4. Are there any stops to route? The solver needs at least one low-battery vehicle or one accepted recommendation to produce a route. If the fleet is healthy and no recs are accepted, no route is built.
  5. Is the tech assigned to today's shift? Routes are technician-scoped — only routes assigned to this technician_id are visible.
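Checks 3 and 5 are easy to get wrong by an hour or by one ID, so here is a small sketch of both. The function names and the route dictionaries are hypothetical, and treating 22:00 itself as inside the window is an assumption; this page only gives the range 06:00-22:00 local.

```python
from datetime import time
from typing import Dict, List

SOLVER_START = time(6, 0)
SOLVER_END = time(22, 0)


def in_solver_window(local_time: time) -> bool:
    """True if the route solver is expected to run at this local time."""
    return SOLVER_START <= local_time <= SOLVER_END


def visible_routes(routes: List[Dict], subaccount_id: str, technician_id: str) -> List[Dict]:
    """Routes a technician would see: same subaccount and assigned to them."""
    return [
        r for r in routes
        if r["subaccount_id"] == subaccount_id and r["technician_id"] == technician_id
    ]


routes = [
    {"id": "r1", "subaccount_id": "sub-a", "technician_id": "tech-1"},
    {"id": "r2", "subaccount_id": "sub-a", "technician_id": "tech-2"},
    {"id": "r3", "subaccount_id": "sub-b", "technician_id": "tech-1"},
]
print(in_solver_window(time(5, 45)))               # False
print(visible_routes(routes, "sub-a", "tech-1"))   # only r1
```

A technician logged into the wrong subaccount and a technician not assigned to the route both produce the same empty tab, which is why checks 2 and 5 are worth verifying separately.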

Offline completions never sync

Symptom: A technician completed stops with no signal, came back online, and the stops still show "pending sync."

The operator app flushes the offline queue when it regains focus and every 60 seconds. If stops are still pending after a few minutes:

  • Pull-to-refresh the Route tab to force a sync.
  • Force-close and reopen the app.
  • Check that the tech's auth session hasn't expired. Re-login if so.

If the API is returning 4xx for complete-stop calls, check Sentry for the route ID — usually the cause is the route having been abandoned in the meantime.
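When reading logs for a stuck queue, it helps to sort responses into "will succeed on retry" and "will never succeed". The sketch below is one way to triage by HTTP status; it is not the operator app's actual logic, the function name is hypothetical, and the assumption that an expired session surfaces as a 401 should be confirmed against the real API.

```python
from typing import Optional


def triage_sync_response(status_code: Optional[int]) -> str:
    """Suggest a next step for one complete-stop attempt.

    Pass None when the request never reached the server (no signal).
    """
    if status_code is None:
        return "retry"        # offline: the next flush should pick it up
    if 200 <= status_code < 300:
        return "synced"
    if status_code == 401:
        return "re-login"     # assumed signal for an expired auth session
    if 400 <= status_code < 500:
        return "investigate"  # retrying will not help; look up the route ID in Sentry
    return "retry"            # 5xx or anything unexpected: treat as transient


for code in (None, 200, 401, 409, 503):
    print(code, triage_sync_response(code))
```

The key distinction is that a 4xx response will keep failing on every 60-second flush, so a stop stuck on one needs a human to look at the route rather than more waiting.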

Recommendations look obviously wrong

Symptom: The recommender suggests moving vehicles into a hex you know is unrideable (closed street, permit dispute, etc.).

The model doesn't know about street closures or permit restrictions. There are two ways to handle this:

  1. Dismiss the recommendation. The model learns from dismissal patterns — repeated dismissals of the same destination hex reduce future confidence there.
  2. Use parking zones. If a hex falls outside any valid parking zone, supply won't accumulate there and the recommender stops suggesting moves to it. Make sure your zone geometry reflects current operating reality.

Modal costs are spiking

Symptom: A Modal billing alert fired.

The nightly retrain across all subaccounts should cost ~$15-30/month at our scale. Spikes usually mean:

  • A subaccount with abnormally large feature volume is dominating the retrain time.
  • Modal is auto-scaling more containers than expected.

Inspect the model_runs table for rows with unusually long duration_ms. Cap or shard the affected subaccount in the training script.
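One way to make "unusually long" concrete is to compare each run against the median duration. The sketch below assumes you have exported model_runs rows as dictionaries with subaccount_id and duration_ms keys; the function name and the 3x-median cutoff are arbitrary choices for illustration, not a documented threshold.

```python
from statistics import median
from typing import Dict, List


def slow_runs(rows: List[Dict], factor: float = 3.0) -> List[Dict]:
    """Return rows whose duration_ms exceeds factor times the median duration."""
    durations = [r["duration_ms"] for r in rows]
    if not durations:
        return []
    cutoff = factor * median(durations)
    return [r for r in rows if r["duration_ms"] > cutoff]


rows = [
    {"subaccount_id": "sub-a", "duration_ms": 1000},
    {"subaccount_id": "sub-b", "duration_ms": 1200},
    {"subaccount_id": "sub-c", "duration_ms": 1100},
    {"subaccount_id": "sub-d", "duration_ms": 9000},
]
print(slow_runs(rows))  # only the sub-d row exceeds 3 x 1150 ms
```

The median is used instead of the mean so that a single runaway subaccount does not inflate the baseline it is being compared against.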

Where to get help

  • Sentry — most AI Ops errors are tagged ai-ops. Filter by that tag.
  • Vercel cron logs — check execution status of all five AI Ops crons.
  • model_runs table — audit row per inference call, with MAPE, RMSE, duration.
  • Support — email support@levyelectric.com.