A/B Testing with Bayesian Winner Picker

Run 2-4 variant tests on subject, body, CTA, and send time - Engage picks the winner automatically using Bayesian statistics.

Levy Fleets Team | May 18, 2026 | 9 min read

A/B Testing

Engage lets you run 2-4 variants on a campaign and auto-promote the winner. Behind the scenes it uses a Bayesian model - which sounds intimidating but is actually simpler to read than the t-tests most marketing tools use.

Navigation

A/B variants are configured inside the Campaign Builder under the A/B Variants tab.

What You Can Test

Element      | Examples
Subject line | "Your scooter misses you" vs "Come back for 20% off"
Body copy    | Long-form vs short-form, formal vs casual
CTA text     | "Ride now" vs "Take a ride" vs "Unlock a scooter"
Send time    | 10 AM vs 6 PM local
Channel      | Email vs push (rare - usually a journey decision)

You can test any combination of these by creating distinct variants. The traffic split is configurable per variant.

How the Test Runs

  1. You attach 2-4 variants to a campaign.
  2. Each recipient gets assigned to one variant deterministically (seeded by customer_id), so a single rider always sees the same variant if they show up in multiple sends.
  3. As engagement events stream in (delivered, opened, clicked), Engage updates its belief about each variant's true performance.
  4. Once each variant has at least 500 sends AND one variant has a 95%+ probability of being best, the winner is locked in.
  5. If you toggled Send remaining to winner, the rest of the audience receives the winning variant.
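
Step 2 can be pictured as hashing the customer ID into a stable number and mapping that number onto the traffic split. The sketch below is one way to do this in Python, not Engage's actual implementation: the function name assign_variant, the choice of SHA-256, and the campaign_id salt are all assumptions made for illustration.

```python
import hashlib


def assign_variant(customer_id: str, campaign_id: str, weights: dict[str, float]) -> str:
    """Map a customer to a variant deterministically, respecting split weights.

    Hypothetical helper: the hashing scheme and campaign_id salt are assumptions.
    """
    # Hash the IDs into a stable number in [0, 1].
    digest = hashlib.sha256(f"{campaign_id}:{customer_id}".encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64

    # Walk the cumulative weights until the point falls inside a variant's slice.
    total = sum(weights.values())
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return name
    return name  # guard against floating-point rounding at the top edge


weights = {"A": 1.0, "B": 1.0, "C": 2.0}  # C gets half the traffic
first = assign_variant("rider-42", "spring-promo", weights)
second = assign_variant("rider-42", "spring-promo", weights)
print(first == second)  # the same rider always lands in the same variant
```

Because the result depends only on the inputs, re-running a send never moves a rider to a different variant. The realized split matches the weights only approximately, and more closely as the audience grows.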

What Bayesian Actually Means Here

You may have learned A/B testing through "p-values" and "statistical significance." That approach (called frequentist) asks: "if the variants were actually identical, how often would I see results this extreme?" It's a useful question, but slow to give you a clear answer and easy to misuse.

The Bayesian approach asks the question you actually want answered: "What's the probability variant B is better than variant A?"

Engage gives you that probability directly. The campaign analytics page shows something like:

Variant A: 4.1% click rate (n=523), 12% chance of being best
Variant B: 5.8% click rate (n=518), 87% chance of being best
Variant C: 3.9% click rate (n=515), 1% chance of being best

When any variant crosses 95% and every variant has at least 500 sends, that's your winner.

The Math (Lightly Explained)

Skip this section if you are not a stats nerd - the tool works without it.

Engage models each variant as a Beta-Binomial:

  • The Binomial part is the basic mechanic - of N sends, some succeeded (opened, clicked, whatever you defined).
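  • The Beta part is a prior distribution over the true success rate. Engage uses a uniform prior, Beta(1, 1), which treats every rate between 0% and 100% as equally likely before any data arrives.
  • After observing data, the posterior is also a Beta distribution. With a uniform prior it is Beta(1 + successes, 1 + failures).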

To estimate "P(variant B is best)," Engage takes many random samples from each variant's posterior and counts how often each variant wins. Each Beta sample is built from two Gamma draws, as X / (X + Y), and the Gamma draws come from the Marsaglia-Tsang algorithm, which is fast and numerically stable. The same procedure works for any reasonable variant count.
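
The sample-and-count procedure is short enough to sketch in Python. This is an illustration under assumptions, not Engage's code: the function name probability_best is made up, the click counts are rounded from the rates in the example display above, and the draws come from Python's standard-library Beta sampler rather than a hand-written Marsaglia-Tsang routine.

```python
import random


def probability_best(
    results: dict[str, tuple[int, int]], draws: int = 20_000, seed: int = 0
) -> dict[str, float]:
    """Estimate P(best) per variant from (successes, sends) counts.

    Assumes a uniform Beta(1, 1) prior, so each posterior is
    Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # One draw from each variant's posterior.
        samples = {
            name: rng.betavariate(1 + successes, 1 + sends - successes)
            for name, (successes, sends) in results.items()
        }
        # The variant with the highest sampled rate wins this round.
        wins[max(samples, key=samples.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Hypothetical click counts: (clicks, sends) per variant.
results = {"A": (21, 523), "B": (30, 518), "C": (20, 515)}
for name, p in probability_best(results).items():
    print(f"Variant {name}: {p:.0%} chance of being best")
```

More draws give a steadier estimate at the cost of compute; the 20,000 used here is an arbitrary choice for the sketch.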

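The practical upshot: you get a clean probability number that updates in real time as data comes in and is much harder to misread than a p-value. Early readings on small samples are still noisy, which is why Engage also enforces a minimum sample size before declaring a winner.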

Setting Up an A/B Test

  1. In the Campaign Builder, open the A/B Variants tab.
  2. Click Add Variant.
  3. For each variant:
    • Give it a name (A: original subject, B: emoji subject)
    • Attach a template (variants typically share a template; subject-line variants override just the subject field)
    • Set the traffic split weight (defaults are equal weighting)
  4. Optional: toggle Hold out 10% to keep a control group that never sees any variant - useful for measuring incremental impact.
  5. Optional: toggle Send remaining to winner so once a winner is picked, the unsent audience receives the winning variant.

Save and send.

Reading the Results

The campaign analytics page shows per-variant breakdowns:

Column          | What it means
Sends           | Recipients in this variant
Delivered       | Provider confirmed delivery
Open rate       | (email/push)
Click rate      | (email/SMS)
Conversion rate | Hit the goal in the attribution window
P(best)         | Probability this variant is the best of the set

Once one variant hits 95% and all have at least 500 sends, you'll see a "Winner declared" banner.
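
The two conditions behind the banner can be written as a small check. This is a sketch of the rule as described in this article, with a hypothetical function name (declare_winner) and input shape; it is not Engage's internal code.

```python
from typing import Optional


def declare_winner(
    sends: dict[str, int],
    p_best: dict[str, float],
    min_sends: int = 500,
    threshold: float = 0.95,
) -> Optional[str]:
    """Return the winning variant's name, or None if the gate is not met."""
    # Gate 1: every variant needs enough sends.
    if any(count < min_sends for count in sends.values()):
        return None
    # Gate 2: the leading variant must reach the probability threshold.
    leader = max(p_best, key=p_best.get)
    return leader if p_best[leader] >= threshold else None


# Numbers from the example display earlier in this article.
sends = {"A": 523, "B": 518, "C": 515}
p_best = {"A": 0.12, "B": 0.87, "C": 0.01}
print(declare_winner(sends, p_best))  # None: B leads at 87%, short of the 95% bar
```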

Minimum Sample Size

The 500-sends-per-variant gate exists to prevent calling early winners on noisy data. Even with a strong Bayesian framework, 50 sends per variant won't tell you much.

If your audience is smaller than 500 per variant:

  • The test still runs, but no winner gets declared automatically.
  • Use the manual Force winner button on the analytics page if you've reviewed the data and want to lock in one variant.
  • Or skip A/B testing for small audiences - you'd be better off comparing two full sends across two weeks.

Holdout Groups

If you toggled Hold out 10%, that 10% gets no message at all. Their conversion rate becomes the "baseline" - the rate at which the goal would have happened without any send.

The incremental lift of your campaign = winning variant conversion rate minus baseline conversion rate. This is the true "did the campaign matter?" number.
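
As a worked example of that subtraction, here is a minimal Python sketch. The function name and all the counts are made up for illustration.

```python
def incremental_lift(
    winner_conversions: int,
    winner_sends: int,
    holdout_conversions: int,
    holdout_size: int,
) -> float:
    """Winning variant conversion rate minus holdout (baseline) conversion rate."""
    winner_rate = winner_conversions / winner_sends
    baseline_rate = holdout_conversions / holdout_size
    return winner_rate - baseline_rate


# Hypothetical numbers: 60 of 1,000 winner recipients converted (6%),
# and 20 of 500 held-out riders converted with no message at all (4%).
lift = incremental_lift(60, 1000, 20, 500)
print(f"Incremental lift: {lift * 100:.1f} percentage points")
```

With these numbers the campaign accounts for about 2 percentage points of conversion; the other 4 points would have happened anyway.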

A/B Testing Inside Journeys

Journeys do not currently support per-step A/B testing. If you want to test a step inside a journey, run a standalone campaign for that step first, pick the winner, then bake the winning template into the journey.

Best Practices

  • Test one thing at a time. If A has a new subject AND new body, you don't know which one moved the needle.
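  • Be patient. Hitting 95% probability with realistic effect sizes usually takes 1,000-5,000 sends per variant. If you only have 200 riders per variant, you are below the 500-send gate, so no winner is declared automatically: either grow the audience or use Force winner and accept a less-confident result.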
  • Watch for confounders. A subject-line test that runs only on Tuesday is also a "Tuesday vs other day" test. Run for at least a full week unless your audience is huge.
  • Document your winner. Once a variant wins, update your default template so the next campaign starts from the better baseline.

Troubleshooting

Variant traffic split is uneven

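Variant assignment is deterministic per recipient, so the realized split depends on which customers happen to be in the audience rather than being forced to exact proportions. Below ~200 sends per variant, the split converges slowly. At 500+ it should look close to your target weights.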

No winner declared after 5,000 sends

This means no variant has hit 95% probability yet, which usually means the variants are genuinely similar. Either accept that and pick whichever you prefer, or stop the test and design a more differentiated B variant.

Click rate is zero on all variants

Check the funnel - if delivered is also zero, the dispatch is broken. If delivered is high and clicked is zero, your link tracking is broken (usually a malformed URL with a double-brace placeholder left unresolved).
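
If you can export the links from a rendered message, a quick scan for leftover placeholders can confirm the second case. The helper below is a hypothetical sketch, not an Engage feature: it flags any URL that still contains double braces, either raw or percent-encoded.

```python
def find_unresolved_links(urls: list[str]) -> list[str]:
    """Return URLs that still contain template braces, raw or percent-encoded."""
    markers = ("{{", "}}", "%7b%7b", "%7d%7d")
    return [url for url in urls if any(m in url.lower() for m in markers)]


# Example links (example.com placeholders, not real campaign URLs).
links = [
    "https://example.com/ride",
    "https://example.com/unlock?code={{promo_code}}",
    "https://example.com/%7B%7Blink%7D%7D",
]
for bad in find_unresolved_links(links):
    print("Unresolved placeholder:", bad)
```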


Need Help?

For A/B testing help, contact support@levyelectric.com.