Data Sources: Weather and Events
The AI Ops demand forecast is conditioned on four sources of data: your own ride history plus three external sources (weather, events, and holidays). This page covers what each source is, where it lives, and what happens when it is unreachable. It also covers the Modal inference engine that consumes them.
Historical rides
The foundation of the forecast is your own rides table. Specifically, the feature pipeline reads from ride_events_ts, a materialized view that aggregates ride starts into H3 hex × 1-hour buckets.
- Training lookback is 90 days minimum, 365 days maximum.
- A weekly lag feature (rides in the same hex 168 hours ago) captures day-of-week seasonality.
- Subaccount-level fixed effects let the global model adjust for each operator's baseline volume.
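The weekly lag feature can be sketched as a keyed lookup. This is a minimal illustration, not the pipeline's code: the row shape, the string form of the H3 index, and the `weeklyLag` helper are all hypothetical. A missing bucket 168 hours earlier yields `null` rather than 0, so "no data" stays distinguishable from "no rides".

```typescript
// Simplified, hypothetical row shape for ride_events_ts:
// one row per H3 hex × 1-hour bucket. h3Index is a string here for readability.
interface HexHourRow {
  h3Index: string;
  bucketStart: number; // epoch milliseconds, aligned to the top of the hour
  rides: number;       // ride starts in this hex during this hour
}

const HOUR_MS = 60 * 60 * 1000;
const WEEKLY_LAG_HOURS = 168; // 7 days × 24 hours

const keyOf = (h3Index: string, bucketStart: number): string => `${h3Index}:${bucketStart}`;

// For every row, look up the same hex exactly 168 hours earlier.
// Returns null when that earlier bucket is absent from the input.
function weeklyLag(rows: HexHourRow[]): Map<string, number | null> {
  const ridesByKey = new Map<string, number>();
  for (const r of rows) ridesByKey.set(keyOf(r.h3Index, r.bucketStart), r.rides);

  const lags = new Map<string, number | null>();
  for (const r of rows) {
    const lagStart = r.bucketStart - WEEKLY_LAG_HOURS * HOUR_MS;
    // `??` only replaces undefined, so a genuine 0-ride lag is kept as 0.
    lags.set(keyOf(r.h3Index, r.bucketStart), ridesByKey.get(keyOf(r.h3Index, lagStart)) ?? null);
  }
  return lags;
}
```

Because the key includes the hex, one hex never borrows another hex's history.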
If the subaccount has fewer than 14 days of rides, the model falls back to a global city-density-tier baseline. The recommender stays hidden until 30 days of history are available.
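The two history thresholds act independently, which a small gate function makes explicit. This is a hypothetical sketch: `gateByHistory` and its return labels are illustrative names, and only the 14-day and 30-day cutoffs come from this page.

```typescript
// Thresholds from this page; everything else here is illustrative.
const MIN_DAYS_FOR_MODEL = 14;       // below this, use the global density-tier baseline
const MIN_DAYS_FOR_RECOMMENDER = 30; // below this, the recommender is hidden

type ForecastSource = "model" | "density_tier_baseline";

interface HistoryGate {
  forecastSource: ForecastSource;
  showRecommender: boolean;
}

function gateByHistory(daysOfHistory: number): HistoryGate {
  return {
    forecastSource: daysOfHistory < MIN_DAYS_FOR_MODEL ? "density_tier_baseline" : "model",
    showRecommender: daysOfHistory >= MIN_DAYS_FOR_RECOMMENDER,
  };
}
```

A subaccount with 20 days of rides therefore gets model-based forecasts but no recommender yet.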
Weather: Tomorrow.io
Weather is pulled from Tomorrow.io. We chose it for hyperlocal accuracy and cost at our volume.
Three features are extracted:
- temp_c: temperature in Celsius
- precip_prob: probability of precipitation (0.00 to 1.00)
- wind_kph: wind speed in km/h
Weather observations are cached in the weather_observations table keyed by H3 hex × hour. The cache is shared across subaccounts in the same city, so we don't pay for duplicate calls.
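The sharing comes from what the cache key leaves out. The sketch below uses an in-memory `Map` as a stand-in for the weather_observations table; the `WeatherCache` class and `WeatherFetcher` signature are hypothetical, not the real weather.ts API. Since the key is hex × hour with no subaccount, two subaccounts covering the same hex hit the same row.

```typescript
interface WeatherFeatures {
  temp_c: number;
  precip_prob: number; // 0.00 to 1.00
  wind_kph: number;
}

// Hypothetical signature for a single Tomorrow.io lookup.
type WeatherFetcher = (h3Index: string, hourStart: number) => Promise<WeatherFeatures>;

// In-memory stand-in for the weather_observations table.
// The key is hex × hour only; there is no subaccount in it.
class WeatherCache {
  private readonly rows = new Map<string, WeatherFeatures>();
  private readonly fetchFromApi: WeatherFetcher;

  constructor(fetchFromApi: WeatherFetcher) {
    this.fetchFromApi = fetchFromApi;
  }

  async get(h3Index: string, hourStart: number): Promise<WeatherFeatures> {
    const key = `${h3Index}:${hourStart}`;
    const cached = this.rows.get(key);
    if (cached !== undefined) return cached;
    // Cache miss: one paid API call, stored for every subaccount.
    // Concurrent misses for the same key could still double-fetch;
    // caching the in-flight promise would close that gap.
    const fresh = await this.fetchFromApi(h3Index, hourStart);
    this.rows.set(key, fresh);
    return fresh;
  }
}
```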
When Tomorrow.io is down
The weather.ts client gracefully degrades to a climatology stub when TOMORROW_IO_API_KEY is unset or the API is unreachable. The stub uses long-run averages by hour-of-week.
If the outage exceeds 2 hours, a Sentry alert fires. The model still produces forecasts during the outage, but accuracy degrades for hexes where weather is the dominant signal.
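The degradation path and the 2-hour alert can be sketched together. Assumptions to flag: the `WeatherClient` class, its options object, and the injected `alert` callback (where a Sentry capture call would go) are hypothetical. The climatology table is modelled as 168 entries indexed by UTC hour-of-week. A missing API key serves the stub without starting the outage clock, since that is a configuration state rather than an outage. The alert fires once per outage and resets when the API recovers.

```typescript
interface WeatherFeatures {
  temp_c: number;
  precip_prob: number;
  wind_kph: number;
}

// 0 = Sunday 00:00 UTC ... 167 = Saturday 23:00 UTC.
function hourOfWeek(d: Date): number {
  return d.getUTCDay() * 24 + d.getUTCHours();
}

const OUTAGE_ALERT_MS = 2 * 60 * 60 * 1000; // alert once an outage passes 2 hours

interface WeatherClientOptions {
  apiKey: string | undefined;                        // e.g. the TOMORROW_IO_API_KEY value
  fetchLive: (at: Date) => Promise<WeatherFeatures>; // the real Tomorrow.io call
  climatology: readonly WeatherFeatures[];           // 168 long-run averages by hour-of-week
  alert: (message: string) => void;                  // e.g. a Sentry capture call
}

class WeatherClient {
  private outageSince: number | null = null;
  private alerted = false;
  private readonly opts: WeatherClientOptions;

  constructor(opts: WeatherClientOptions) {
    this.opts = opts;
  }

  async get(at: Date, now: number = Date.now()): Promise<{ features: WeatherFeatures; degraded: boolean }> {
    if (this.opts.apiKey) {
      try {
        const features = await this.opts.fetchLive(at);
        this.outageSince = null; // API is back: reset the outage clock
        this.alerted = false;
        return { features, degraded: false };
      } catch {
        this.recordFailure(now);
      }
    }
    // Missing key or failed call: serve the climatology stub.
    const stub = this.opts.climatology[hourOfWeek(at)];
    if (stub === undefined) throw new Error("climatology must have 168 entries");
    return { features: stub, degraded: true };
  }

  private recordFailure(now: number): void {
    if (this.outageSince === null) this.outageSince = now;
    if (!this.alerted && now - this.outageSince > OUTAGE_ALERT_MS) {
      this.opts.alert("Tomorrow.io unreachable for more than 2 hours; serving climatology");
      this.alerted = true;
    }
  }
}
```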
Events: PredictHQ
Local events (concerts, sports, festivals) materially shift demand. We pull them from PredictHQ, which scores event impact per location.
For each hex × hour, the feature pipeline computes a single event_count feature: the number of impactful events within the hex during that hour.
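Counting events for a hex × hour amounts to an interval-overlap test plus an impact filter. This sketch makes several assumptions that this page does not state: events are already assigned to an H3 hex, each event carries a single numeric `impact` score, and "impactful" means a score of at least 50. The `ScoredEvent` shape is hypothetical and is not PredictHQ's raw response format.

```typescript
// Hypothetical, pre-processed event shape (not PredictHQ's raw API response).
interface ScoredEvent {
  h3Index: string; // hex containing the event venue
  startMs: number; // epoch milliseconds
  endMs: number;
  impact: number;  // impact score for this location
}

const HOUR_MS = 3_600_000;
const IMPACT_THRESHOLD = 50; // assumption: cutoff for "impactful"

// Number of impactful events in `h3Index` that overlap [bucketStart, bucketStart + 1h).
function eventCount(events: readonly ScoredEvent[], h3Index: string, bucketStartMs: number): number {
  const bucketEndMs = bucketStartMs + HOUR_MS;
  return events.filter(
    (e) =>
      e.h3Index === h3Index &&
      e.impact >= IMPACT_THRESHOLD &&
      e.startMs < bucketEndMs && // starts before the hour ends
      e.endMs > bucketStartMs,   // ends after the hour starts
  ).length;
}
```

With half-open intervals, an event ending exactly at 22:00 does not count toward the 22:00 bucket.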
When PredictHQ is down
The events.ts client falls back to a deterministic cyclical stub when PREDICTHQ_API_KEY is unset or the API fails. The stub returns event_count = 0 with a small day-of-week modulation.
Recommendations generated during an event-API outage are flagged "weather/events incomplete" in the audit table, so you know the forecast was running with degraded inputs.
Holidays
A hand-curated holiday_calendar table holds national holidays per country. Phase 1 covers:
- United States
- Canada
- United Kingdom
- Germany
- Mexico
Markets outside these countries fall back to an is-weekend flag, which captures most of the holiday-like effect.
To add a country, insert rows into holiday_calendar with the country code, date, and a holiday name. The feature pipeline picks it up on the next hourly run.
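The holiday feature with its weekend fallback might look like the sketch below. Assumptions: country codes are ISO 3166-1 alpha-2 (so the United Kingdom is `GB`), dates are stored as `YYYY-MM-DD`, and the `holidayFlag` helper is hypothetical. Coverage is derived from the table itself rather than a hard-coded list. That way, inserting rows for a new country moves it off the fallback on the next run, as described above.

```typescript
interface HolidayRow {
  country_code: string; // assumed ISO 3166-1 alpha-2, e.g. "US", "GB"
  date: string;         // "YYYY-MM-DD"
  holiday_name: string;
}

function isWeekend(day: Date): boolean {
  const dow = day.getUTCDay();
  return dow === 0 || dow === 6; // Sunday or Saturday
}

// Holiday-like flag for one country and UTC day.
function holidayFlag(calendar: readonly HolidayRow[], countryCode: string, day: Date): boolean {
  // A country is covered as soon as it has any row in holiday_calendar.
  const covered = calendar.some((r) => r.country_code === countryCode);
  if (!covered) return isWeekend(day); // uncovered market: weekend fallback
  const iso = day.toISOString().slice(0, 10);
  return calendar.some((r) => r.country_code === countryCode && r.date === iso);
}
```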
Modal (inference engine)
The trained LightGBM models live on Modal, a serverless platform for running Python code. Nightly training:
- Pulls all demand_features rows over the trailing 90 days.
- Trains three regressors, one per horizon (1h, 4h, 24h).
- Serializes them as forecast-{subaccount}-h{horizon}.txt artifacts.
- Writes a manifest with model versions and per-subaccount MAPE.
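The artifact naming and the MAPE figure in the manifest can be illustrated as follows. Only the file-name pattern and the three horizons come from this page. The `ManifestEntry` shape and helper names are hypothetical, and skipping zero actuals (where the percentage error is undefined) is an assumption about how the real job handles them.

```typescript
const HORIZONS = [1, 4, 24] as const;
type Horizon = (typeof HORIZONS)[number];

function artifactName(subaccount: string, horizon: Horizon): string {
  return `forecast-${subaccount}-h${horizon}.txt`;
}

// Mean absolute percentage error, in percent.
// Zero actuals are skipped because |a - p| / a is undefined there (assumption).
function mape(actual: readonly number[], predicted: readonly number[]): number {
  if (actual.length !== predicted.length) throw new Error("length mismatch");
  let total = 0;
  let count = 0;
  actual.forEach((a, i) => {
    if (a === 0) return;
    total += Math.abs((a - predicted[i]!) / a);
    count++;
  });
  return count === 0 ? Number.NaN : (total / count) * 100;
}

// Hypothetical manifest row: one per subaccount × horizon.
interface ManifestEntry {
  subaccount: string;
  horizon: Horizon;
  artifact: string;
  modelVersion: string;
  mape: number;
}

function manifestEntry(
  subaccount: string,
  horizon: Horizon,
  modelVersion: string,
  actual: readonly number[],
  predicted: readonly number[],
): ManifestEntry {
  return { subaccount, horizon, artifact: artifactName(subaccount, horizon), modelVersion, mape: mape(actual, predicted) };
}
```

For example, actuals of 100 and 200 predicted as 110 and 180 are each 10% off, giving a MAPE of 10.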
Inference is exposed as a /predict web endpoint that the Next.js inference cron POSTs to.
When Modal is down
The forecasting.ts library falls back to a pure-TypeScript gradient-boosted regressor that trains in-process on every inference call. It's slower (subaccounts with more than 5,000 feature rows take over a second) and less accurate, but the forecast pipeline keeps running. Modal absence is logged but is not customer-visible.
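The Modal-first, local-fallback flow can be sketched with both engines injected as functions. This is a hypothetical shape, not the actual forecasting.ts interface. In production, `modal` would wrap the POST to the /predict endpoint, whose request body is not documented here, and `local` would wrap the in-process regressor. The fallback is logged but still returns predictions, matching the "logged but not customer-visible" behavior.

```typescript
// Hypothetical feature row: feature name -> value.
type FeatureRow = Record<string, number>;
type Predictor = (rows: FeatureRow[]) => Promise<number[]>;

async function forecastWithFallback(
  rows: FeatureRow[],
  modal: Predictor, // e.g. wraps the POST to Modal's /predict endpoint
  local: Predictor, // in-process TypeScript gradient-boosted regressor
  log: (message: string) => void,
): Promise<{ predictions: number[]; engine: "modal" | "local" }> {
  try {
    return { predictions: await modal(rows), engine: "modal" };
  } catch (err) {
    // Logged for operators; the caller still gets a forecast.
    log(`Modal unavailable, using local regressor: ${String(err)}`);
    return { predictions: await local(rows), engine: "local" };
  }
}
```

Returning the `engine` alongside the predictions lets the caller record which path produced each forecast.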
Feature freshness
Every layer of the pipeline writes a freshness timestamp:
- weather_observations.fetched_at: when Tomorrow.io was last polled
- demand_features.feature_built_at: when the feature row was assembled
- demand_forecasts.created_at: when the model wrote the prediction
- model_runs.completed_at: when the last training run finished
If any of these are stale by more than 90 minutes, the dashboard shows the stale forecast banner (see Demand forecast map).
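The banner rule is an "any timestamp older than 90 minutes" check. In this sketch each field is named after one of the four columns listed above; the `staleSources` helper and the camel-case field names are hypothetical. Returning the names of the stale layers, not just a boolean, shows which part of the pipeline is behind.

```typescript
const STALE_AFTER_MS = 90 * 60 * 1000; // 90 minutes

type FreshnessTimestamps = {
  weatherFetchedAt: Date;    // weather_observations.fetched_at
  featureBuiltAt: Date;      // demand_features.feature_built_at
  forecastCreatedAt: Date;   // demand_forecasts.created_at
  modelRunCompletedAt: Date; // model_runs.completed_at
};

// Names of the layers whose timestamp is more than 90 minutes old.
function staleSources(ts: FreshnessTimestamps, now: Date): string[] {
  return Object.entries(ts)
    .filter(([, t]) => now.getTime() - t.getTime() > STALE_AFTER_MS)
    .map(([name]) => name);
}

function showStaleBanner(ts: FreshnessTimestamps, now: Date): boolean {
  return staleSources(ts, now).length > 0;
}
```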
Auditing a forecast
To see what fed a given forecast for a given hex × hour:
SELECT * FROM demand_features
WHERE subaccount_id = '<uuid>'
AND h3_index = <bigint>
AND bucket_start = '<timestamp>';
Then join to weather_observations and the events feature for the same key. The row also has a model_version matching demand_forecasts.model_version, so you can trace exactly which trained artifact produced the prediction.
Related
- Demand forecast map — how the data is rendered.
- Troubleshooting — what to do when a source is degraded.