
Data Sources: Weather and Events

What feeds the AI Ops demand forecast — historical rides, weather from Tomorrow.io, events from PredictHQ, and the holiday calendar — and what happens when a source goes down.

Levy Fleets Team · May 18, 2026 · 5 min read

The AI Ops demand forecast is conditioned on four sources of data. Three are external; one is your own ride history. This page covers what they are, where they live, and what happens when one of them is unreachable.

Historical rides

The foundation of the forecast is your own rides table. The feature pipeline reads from ride_events_ts, a materialized view that counts ride starts per H3 hex × 1-hour bucket.

  • Training lookback is 90 days minimum, 365 days maximum.
  • A weekly lag feature (rides in the same hex 168 hours ago) captures day-of-week seasonality.
  • Subaccount-level fixed effects let the global model adjust for each operator's baseline volume.
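The weekly lag feature is a lookup of the same hex exactly 168 hours (7 × 24) earlier. The sketch below shows one way to compute it over hex × hour rows. It is a minimal sketch under some assumptions. The HexHourCount shape and the addWeeklyLag name are hypothetical. A missing row is treated as "no data" (null) rather than zero rides.

```typescript
// Hypothetical in-memory shape for rows of ride_events_ts: one row per H3 hex × 1-hour bucket.
interface HexHourCount {
  h3Index: string;     // H3 cell id (a bigint in the database; a string here for simplicity)
  bucketStart: number; // bucket start in epoch milliseconds, aligned to the hour
  rides: number;       // ride starts in this hex during this hour
}

const HOUR_MS = 60 * 60 * 1000;
const WEEKLY_LAG_HOURS = 168; // 7 days × 24 hours: same hour, same weekday, one week earlier

// Attach ridesLag168h to each row. Assumption: a missing row 168 hours back
// (for example during the first week of history) yields null, not 0.
function addWeeklyLag(
  rows: HexHourCount[],
): Array<HexHourCount & { ridesLag168h: number | null }> {
  const byKey = new Map<string, number>();
  for (const r of rows) byKey.set(`${r.h3Index}|${r.bucketStart}`, r.rides);

  return rows.map((r) => {
    const lagKey = `${r.h3Index}|${r.bucketStart - WEEKLY_LAG_HOURS * HOUR_MS}`;
    const lag = byKey.get(lagKey);
    return { ...r, ridesLag168h: lag === undefined ? null : lag };
  });
}
```

Because the lag is keyed on the hex as well as the hour, a busy downtown hex never borrows last week's count from a quiet neighbour.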

If the subaccount has fewer than 14 days of rides, the model falls back to a global city-density-tier baseline. The recommender stays hidden until 30 days of history are available.
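The two history thresholds work as a small gate. This is a hypothetical sketch, not the actual implementation. The gateByHistory function, the mode names, and measuring history in whole days are all assumptions.

```typescript
// Hypothetical gate for the history thresholds described above.
type ForecastMode = "global_baseline" | "trained_model";

interface HistoryGate {
  forecastMode: ForecastMode;
  recommenderVisible: boolean;
}

const MIN_DAYS_FOR_MODEL = 14;       // below this, use the global city-density-tier baseline
const MIN_DAYS_FOR_RECOMMENDER = 30; // below this, the recommender stays hidden

function gateByHistory(daysOfRides: number): HistoryGate {
  return {
    forecastMode: daysOfRides < MIN_DAYS_FOR_MODEL ? "global_baseline" : "trained_model",
    recommenderVisible: daysOfRides >= MIN_DAYS_FOR_RECOMMENDER,
  };
}
```

Between 14 and 29 days, a subaccount gets model forecasts but no recommendations.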

Weather: Tomorrow.io

Weather is pulled from Tomorrow.io. We chose it for its hyperlocal accuracy and for its cost at our volume.

Three features are extracted:

  • temp_c — temperature in Celsius
  • precip_prob — probability of precipitation (0.00 to 1.00)
  • wind_kph — wind speed in km/h

Weather observations are cached in the weather_observations table keyed by H3 hex × hour. The cache is shared across subaccounts in the same city, so we don't pay for duplicate calls.
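The cache is shared because its key contains only the hex and the hour, with no subaccount. The sketch below shows that deduplication. It assumes string keys and an in-memory set of cached keys. The cacheKey and keysToFetch names are hypothetical.

```typescript
interface WeatherRequest {
  subaccountId: string;
  h3Index: string;
  hourStart: number; // epoch milliseconds, aligned to the hour
}

// Hypothetical key format. It deliberately omits subaccountId, so two subaccounts
// asking about the same hex × hour hit the same cache entry.
function cacheKey(h3Index: string, hourStart: number): string {
  return `${h3Index}|${hourStart}`;
}

// Return the distinct hex × hour keys that still need a Tomorrow.io call.
function keysToFetch(requests: WeatherRequest[], cached: Set<string>): string[] {
  const needed = new Set<string>();
  for (const r of requests) {
    const key = cacheKey(r.h3Index, r.hourStart);
    if (!cached.has(key)) needed.add(key);
  }
  return Array.from(needed);
}
```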

When Tomorrow.io is down

The weather.ts client gracefully degrades to a climatology stub when TOMORROW_IO_API_KEY is unset or the API is unreachable. The stub uses long-run averages by hour-of-week.
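A climatology stub keyed by hour-of-week is a 168-entry table, indexed by weekday × 24 + hour. The sketch below shows the fallback decision. It makes several assumptions. Hour-of-week runs Monday-first in UTC, whereas a real deployment would likely use the market's local time. The live call is synchronous for brevity. The API key is passed in rather than read from TOMORROW_IO_API_KEY. The weatherFor and hourOfWeek names are hypothetical.

```typescript
interface WeatherFeatures {
  temp_c: number;
  precip_prob: number; // 0.00 to 1.00
  wind_kph: number;
}

// Assumed convention: 0 = Monday 00:00 UTC ... 167 = Sunday 23:00 UTC.
function hourOfWeek(d: Date): number {
  const mondayBased = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  return mondayBased * 24 + d.getUTCHours();
}

// climatology: 168 long-run averages, one per hour-of-week (hypothetical shape).
function weatherFor(
  d: Date,
  apiKey: string | undefined,
  live: ((d: Date) => WeatherFeatures) | null,
  climatology: WeatherFeatures[],
): { features: WeatherFeatures; source: "live" | "climatology" } {
  if (apiKey && live) {
    try {
      return { features: live(d), source: "live" };
    } catch {
      // API unreachable: fall through to the climatology stub.
    }
  }
  // Key unset or call failed: use the long-run average for this hour of the week.
  return { features: climatology[hourOfWeek(d)], source: "climatology" };
}
```

Returning the source alongside the features lets downstream code record that a forecast ran on stubbed weather.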

If the outage exceeds 2 hours, a Sentry alert fires. The model still produces forecasts during the outage, but accuracy degrades for hexes where weather is the dominant signal.
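The 2-hour rule needs the client to remember when the outage began, and to alert only once per outage. Below is a minimal tracker sketch. The OutageTracker class is hypothetical. The sendAlert callback stands in for the real Sentry call, which might be something like Sentry.captureMessage from the Sentry SDK.

```typescript
const OUTAGE_ALERT_AFTER_MS = 2 * 60 * 60 * 1000; // 2 hours

// Hypothetical outage tracker; sendAlert stands in for the Sentry integration.
class OutageTracker {
  private outageStartedAt: number | null = null;
  private alerted = false;
  private readonly sendAlert: (msg: string) => void;

  constructor(sendAlert: (msg: string) => void) {
    this.sendAlert = sendAlert;
  }

  // Any successful call ends the outage and re-arms the alert.
  recordSuccess(): void {
    this.outageStartedAt = null;
    this.alerted = false;
  }

  // Alert once when the outage has lasted strictly longer than 2 hours.
  recordFailure(nowMs: number): void {
    if (this.outageStartedAt === null) this.outageStartedAt = nowMs;
    if (!this.alerted && nowMs - this.outageStartedAt > OUTAGE_ALERT_AFTER_MS) {
      this.alerted = true;
      this.sendAlert("Tomorrow.io unreachable for more than 2 hours; serving climatology");
    }
  }
}
```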

Events: PredictHQ

Local events (concerts, sports, festivals) materially shift demand. We pull them from PredictHQ, which scores event impact per location.

For each hex × hour, the feature pipeline computes a single event_count feature: the number of impactful events within the hex during that hour.
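Counting events in a hex during an hour is an interval-overlap test against the window from the hour's start (inclusive) to the next hour (exclusive). The sketch below rests on several assumptions. Each event already carries the H3 index of its venue. Impact is a single numeric impactScore. "Impactful" means a score at or above a cutoff of 50. The field names, the cutoff, and the eventCount function are hypothetical and are not PredictHQ's API.

```typescript
// Hypothetical normalized event record.
interface EventRecord {
  h3Index: string;     // hex containing the venue (assumed precomputed)
  startMs: number;     // event start, epoch milliseconds
  endMs: number;       // event end, epoch milliseconds (exclusive)
  impactScore: number; // hypothetical impact score from the provider
}

const HOUR_MS = 60 * 60 * 1000;
const IMPACT_THRESHOLD = 50; // hypothetical cutoff for "impactful"

function eventCount(events: EventRecord[], h3Index: string, hourStart: number): number {
  const hourEnd = hourStart + HOUR_MS;
  return events.filter(
    (e) =>
      e.h3Index === h3Index &&
      e.impactScore >= IMPACT_THRESHOLD &&
      e.startMs < hourEnd && // starts before the hour ends
      e.endMs > hourStart,   // ends after the hour starts
  ).length;
}
```

A concert running from 19:00 to 22:00 counts toward the 19:00, 20:00 and 21:00 buckets, but not 22:00.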

When PredictHQ is down

The events.ts client falls back to a deterministic cyclical stub when PREDICTHQ_API_KEY is unset or the API fails. The stub returns event_count = 0 with a small day-of-week modulation.

Recommendations generated during an event-API outage are flagged "weather/events incomplete" in the audit table, so you know the forecast was running with degraded inputs.
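The fallback and the audit flag can be sketched together. The client reports which source produced the count, and the recommendation is flagged whenever that source was the stub. The eventsFor and auditFlag names are hypothetical. The API key is passed in rather than read from PREDICTHQ_API_KEY. The stub returns a flat 0, so the documented day-of-week modulation is deliberately omitted here.

```typescript
interface EventsResult {
  eventCount: number;
  source: "predicthq" | "stub";
}

// Hypothetical wrapper; fetchLive stands in for the real PredictHQ call.
function eventsFor(apiKey: string | undefined, fetchLive: () => number): EventsResult {
  if (apiKey) {
    try {
      return { eventCount: fetchLive(), source: "predicthq" };
    } catch {
      // API failed: fall through to the stub.
    }
  }
  // Stub baseline only; the day-of-week modulation is not modelled in this sketch.
  return { eventCount: 0, source: "stub" };
}

// Label written to the audit table when the events input was stubbed.
function auditFlag(events: EventsResult): string | null {
  return events.source === "stub" ? "weather/events incomplete" : null;
}
```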

Holidays

A hand-curated holiday_calendar table holds national holidays per country. Phase 1 covers:

  • United States
  • Canada
  • United Kingdom
  • Germany
  • Mexico

Markets outside these countries fall back to an is-weekend feature, which captures most of the holiday-like effect.
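The sketch below shows one way the calendar lookup and the is-weekend fallback could combine into a single feature. It assumes that a country counts as covered when it has any rows in holiday_calendar, and that dates are ISO strings evaluated in UTC. The HolidayCalendar shape and the holidayFeature function are hypothetical.

```typescript
// Hypothetical in-memory view of holiday_calendar: country code -> set of ISO dates.
type HolidayCalendar = Map<string, Set<string>>;

// Covered countries: 1 on a listed holiday, else 0.
// Uncovered countries: fall back to is-weekend (Saturday or Sunday, UTC).
function holidayFeature(country: string, isoDate: string, calendar: HolidayCalendar): number {
  const holidays = calendar.get(country);
  if (holidays) {
    return holidays.has(isoDate) ? 1 : 0;
  }
  const day = new Date(`${isoDate}T00:00:00Z`).getUTCDay(); // 0 = Sunday, 6 = Saturday
  return day === 0 || day === 6 ? 1 : 0;
}
```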

To add a country, insert rows into holiday_calendar with the country code, date, and a holiday name. The feature pipeline picks it up on the next hourly run.

Models: Modal

The trained LightGBM models live on Modal, a serverless platform for running Python code. Nightly training:

  1. Pulls all demand_features rows over the trailing 90 days.
  2. Trains three regressors — one per horizon (1h, 4h, 24h).
  3. Serializes them as forecast-{subaccount}-h{horizon}.txt artifacts.
  4. Writes a manifest with model versions and per-subaccount MAPE.
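Training itself runs in Python on Modal. The bookkeeping in steps 3 and 4 is simple enough to sketch in TypeScript: artifact naming and per-subaccount MAPE. Several details here are assumptions. MAPE is reported as a percentage, and rows with zero actual rides are skipped because MAPE is undefined there. The ManifestEntry shape, the version string, and the function names are hypothetical.

```typescript
const HORIZONS = [1, 4, 24] as const;
type Horizon = (typeof HORIZONS)[number];

// Artifact name following the forecast-{subaccount}-h{horizon}.txt pattern.
function artifactName(subaccount: string, horizon: Horizon): string {
  return `forecast-${subaccount}-h${horizon}.txt`;
}

// Mean absolute percentage error, in percent; rows with actual = 0 are skipped.
function mape(actual: number[], predicted: number[]): number {
  let sum = 0;
  let n = 0;
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] === 0) continue;
    sum += Math.abs((actual[i] - predicted[i]) / actual[i]);
    n += 1;
  }
  return n === 0 ? NaN : (100 * sum) / n;
}

// Hypothetical manifest entry shape.
interface ManifestEntry {
  subaccount: string;
  horizon: Horizon;
  artifact: string;
  modelVersion: string;
  mape: number;
}

function buildManifest(
  modelVersion: string,
  evals: Array<{ subaccount: string; horizon: Horizon; actual: number[]; predicted: number[] }>,
): ManifestEntry[] {
  return evals.map((e) => ({
    subaccount: e.subaccount,
    horizon: e.horizon,
    artifact: artifactName(e.subaccount, e.horizon),
    modelVersion,
    mape: mape(e.actual, e.predicted),
  }));
}
```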

Inference is exposed as a /predict web endpoint that the Next.js inference cron POSTs to.

When Modal is down

The forecasting.ts library falls back to a pure-TypeScript gradient-boosted regressor that trains in-process on every inference call. It's slower (subaccounts with more than 5,000 feature rows take over a second) and less accurate, but the forecast pipeline keeps running. Modal absence is logged but is not customer-visible.
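The remote-first, local-fallback flow can be sketched with standard fetch. The caller gets a result either way, and a Modal failure only reaches the operator log. The request and response shapes for /predict are hypothetical. The local argument stands in for the in-process TypeScript regressor, and the function names are assumptions.

```typescript
// Hypothetical request/response shapes for the Modal /predict endpoint.
interface PredictRequest {
  subaccountId: string;
  horizon: 1 | 4 | 24;
  rows: number[][];
}

interface PredictResult {
  predictions: number[];
  source: "modal" | "local";
}

// POST the feature rows to the Modal web endpoint.
async function predictViaModal(url: string, req: PredictRequest): Promise<number[]> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Modal /predict returned HTTP ${res.status}`);
  const body = (await res.json()) as { predictions: number[] };
  return body.predictions;
}

// Try Modal first; on any failure, log it and use the in-process regressor.
async function predictWithFallback(
  req: PredictRequest,
  remote: (req: PredictRequest) => Promise<number[]>,
  local: (req: PredictRequest) => number[],
  log: (msg: string) => void,
): Promise<PredictResult> {
  try {
    return { predictions: await remote(req), source: "modal" };
  } catch (err) {
    // Operator-facing log only; nothing is surfaced to the customer.
    log(`Modal unavailable, using in-process regressor: ${String(err)}`);
    return { predictions: local(req), source: "local" };
  }
}
```

In the inference cron, remote would be something like (r) => predictViaModal(predictUrl, r), where predictUrl is the deployed endpoint's address.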

Feature freshness

Every layer of the pipeline writes a freshness timestamp:

  • weather_observations.fetched_at — when Tomorrow.io was last polled
  • demand_features.feature_built_at — when the feature row was assembled
  • demand_forecasts.created_at — when the model wrote the prediction
  • model_runs.completed_at — when the last training run finished

If any of these are stale by more than 90 minutes, the dashboard shows the stale forecast banner (see Demand forecast map).
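The banner check compares the latest value of each timestamp against the current time. Below is a minimal sketch. It assumes the four inputs are the most recent value of each column, and it applies the single 90-minute threshold stated above to every layer. The Freshness shape and the function names are hypothetical.

```typescript
const STALE_AFTER_MS = 90 * 60 * 1000; // 90 minutes

// Hypothetical snapshot: the most recent value of each freshness column.
interface Freshness {
  weatherFetchedAt: Date;  // weather_observations.fetched_at
  featureBuiltAt: Date;    // demand_features.feature_built_at
  forecastCreatedAt: Date; // demand_forecasts.created_at
  modelCompletedAt: Date;  // model_runs.completed_at
}

// Names of the layers that are more than 90 minutes old.
function staleLayers(f: Freshness, now: Date): string[] {
  const layers: Array<[string, Date]> = [
    ["weather_observations.fetched_at", f.weatherFetchedAt],
    ["demand_features.feature_built_at", f.featureBuiltAt],
    ["demand_forecasts.created_at", f.forecastCreatedAt],
    ["model_runs.completed_at", f.modelCompletedAt],
  ];
  return layers
    .filter(([, ts]) => now.getTime() - ts.getTime() > STALE_AFTER_MS)
    .map(([name]) => name);
}

// The banner shows if any single layer is stale.
function showStaleBanner(f: Freshness, now: Date): boolean {
  return staleLayers(f, now).length > 0;
}
```

Returning the stale layer names, rather than a bare boolean, tells you which stage of the pipeline to look at first.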

Auditing a forecast

To see what fed a given forecast for a given hex × hour:

SELECT * FROM demand_features
WHERE subaccount_id = '<uuid>'
  AND h3_index = <bigint>
  AND bucket_start = '<timestamp>';

Then join to weather_observations and the events feature for the same key. The row also has a model_version matching demand_forecasts.model_version, so you can trace exactly which trained artifact produced the prediction.