Experiments – Poisondart

// Experiment 01

The Recency Decay Curve

We map how purchase intent decays over time at the individual subscriber level – then use it to optimise send timing per contact, not per segment.

Engagement · Segmentation · 8 weeks

// recency_decay_curve
hypothesis: intent decays as fn(t) post-purchase
output: personalised send probability score

// Experiment 02

The Cadence Tolerance Test

We identify each segment's individual frequency ceiling – the point at which additional sends reduce revenue per send rather than increase it.

Send frequency · Revenue per send · 8 weeks

// cadence_tolerance_test
variable: sends/week across matched cohorts
output: frequency ceiling per segment

// Experiment 03

The Attribution Window Calibration

We measure your actual purchase decision window – not Klaviyo's default – and recalibrate attribution to reflect real buyer behaviour for your specific product category.

Attribution · Revenue accuracy · 4 weeks

// attribution_window_calibration
method: post-purchase survey + event timestamps
output: custom attribution window

// Experiment 04

The Subject Line Entropy Experiment

We measure habituation rate against subject line predictability and test whether introducing controlled randomness resets engagement – based on sensory adaptation principles from neuroscience.

Open rate · Engagement decay · 6 weeks

// subject_line_entropy
principle: sensory adaptation / signal detection
output: optimal entropy threshold

// Experiment 05

The Segment Dissolution Test

Using a holdout group methodology, we test whether your current segmentation scheme is actually predictive of purchase behaviour – or inherited convention. If your segments aren't earning their complexity, we dissolve them.

Segmentation · Holdout methodology · Statistical significance p < 0.05 · 6 weeks

// segment_dissolution_test
method: holdout group · matched cohort design
null hypothesis: segment messaging = generic messaging (reject at p < 0.05)
output: validate or dissolve each segment

// Experiment 01 – Engagement · Segmentation

The Recency Decay Curve

We map how purchase intent and engagement probability decay over time at the individual subscriber level, then use the resulting model to optimise send timing per contact.

// Hypothesis

Purchase intent and email engagement do not decay uniformly across a subscriber list. Instead, each contact follows an individual decay function that is a product of their purchase history, category affiliation, and historical engagement patterns. A model that treats all contacts as equivalently engaged – or equivalently disengaged – systematically misdirects send volume and suppresses revenue.

// Variables

Independent variable

Days since last purchase / last engagement event

Dependent variable

Open probability, click probability, purchase probability per send

Control condition

Standard segment-level send cadence

Test condition

Send cadence adjusted per contact based on predicted engagement score

Confounders controlled

Seasonality, campaign type, product category, acquisition source

Significance threshold

p < 0.05 on revenue per send, lift vs control group

// Method

We extract the full event history for each contact from Klaviyo – open events, click events, purchase events, site visits – and fit an exponential decay function to their engagement probability over time. This produces a personalised engagement score, updated weekly, for every contact in the account. We then split the list: control group receives standard cadence, test group receives cadence calibrated to their individual score. We run for 8 weeks and measure revenue per send, unsubscribe rate, and list health metrics.
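The decay fit described above can be sketched as follows. This is a minimal illustration, not the production model: it assumes engagement probability follows p(t) = p0 · exp(−λt) and fits it by least squares on log(p). The function names and the sample data are hypothetical.

```python
import math


def fit_exponential_decay(days, probs):
    """Fit p(t) = p0 * exp(-lam * t) by least squares on log(p).

    Returns (p0, lam, half_life_days). Observations with p <= 0 are
    skipped because log(p) is undefined for them.
    """
    pairs = [(t, math.log(p)) for t, p in zip(days, probs) if p > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive observations")
    mean_t = sum(t for t, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pairs)
    if sxx == 0:
        raise ValueError("observations must span more than one day value")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pairs)
    slope = sxy / sxx
    lam = -slope
    p0 = math.exp(mean_y - slope * mean_t)
    half_life = math.log(2) / lam if lam > 0 else math.inf
    return p0, lam, half_life


def engagement_score(p0, lam, days_since_event):
    """Predicted engagement probability, capped at 1.0."""
    return min(1.0, p0 * math.exp(-lam * days_since_event))


# Synthetic example (assumed data): a contact whose engagement halves
# every 11.3 days, starting from 60%.
sample_days = [0, 7, 14, 21, 28]
sample_probs = [0.6 * 0.5 ** (t / 11.3) for t in sample_days]
p0, lam, half_life = fit_exponential_decay(sample_days, sample_probs)
print(round(half_life, 1), round(engagement_score(p0, lam, 14), 3))
```

Fitting in log space keeps the sketch dependency-free. A real fit on noisy per-contact data would likely need a more robust estimator and enough events per contact to be meaningful.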

// Typical finding

// result

The decay curve is steeper than most brands assume and varies significantly by product category. Impulse-purchase categories show 70%+ engagement decay within 14 days of last purchase. Considered-purchase categories maintain elevated engagement for 45-90 days. Contacts in the low-score tier (<20% engagement probability) generate negative ROI when sent standard campaign volume – their unsubscribe contribution outweighs their revenue contribution. Reducing their cadence typically increases list health metrics without meaningful revenue loss, while freeing send volume for high-probability contacts.
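As a rough illustration of the tiering, the sketch below buckets contacts by engagement score using the >60%, 20-60% and <20% bands shown in the terminal output. The function names and the sample scores are assumptions for illustration only.

```python
from collections import Counter


def score_tier(score):
    """Map an engagement probability (0-1) to a tier label."""
    if score > 0.60:
        return "high"
    if score >= 0.20:
        return "mid"
    return "low"


def tier_counts(scores):
    """Count contacts per tier for a list of engagement scores."""
    return Counter(score_tier(s) for s in scores)


# Hypothetical scores for five contacts.
print(tier_counts([0.82, 0.41, 0.12, 0.05, 0.63]))
```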

// Terminal output

// recency_decay_curve · run complete
account: [UK fashion · £6.4m]
contacts_modelled: 84,203

decay_model: exponential · half-life = 11.3 days
r² = 0.847 (strong fit)

segment_high_prob (>60%): 18,440 contacts
segment_mid_prob (20-60%): 31,200 contacts
segment_low_prob (<20%): 34,563 contacts

test_result:
  revenue_per_send: +34% vs control
  unsubscribe_rate: -41% vs control
  list_health_score: improved significantly

status: // CONFIRMED · deploying score model

Want this experiment run on your account? We run it as part of every Catalyst, Compound and Apex programme. Book the diagnostic audit to start.

Book the audit →

// Experiment 02 – Send Frequency · Revenue per Send

The Cadence Tolerance Test

We identify the precise frequency ceiling for each list segment – the point at which additional sends begin to erode revenue per send rather than compound it.

// Hypothesis

Email send frequency follows a dose-response relationship with revenue per send. Below the optimal frequency, revenue is left uncaptured. Above it, marginal sends generate diminishing – and eventually negative – returns as unsubscribe rate climbs and engagement decays. This ceiling is not universal: it varies by segment, category, and list composition. The experiment locates it precisely for each segment rather than applying a blanket frequency across the list.

// Variables

Independent variable

Send frequency (1, 2, 3, 4 emails per week)

Dependent variable

Revenue per send, unsubscribe rate, engagement decay rate

Control condition

Current send frequency (typically 2-3/week)

Test conditions

Four matched cohorts: 1/wk, 2/wk, 3/wk, 4/wk

Confounders controlled

Content quality, subject line format, send time, cohort composition

Run duration

8 weeks minimum for statistical stability

// Method

We create four matched cohorts from the active subscriber list, balanced by recency, purchase history, and engagement tier. Each cohort receives identical content at different frequencies over 8 weeks. We measure revenue per send (not total revenue – which will always favour higher frequency in the short term) alongside unsubscribe rate and 30-day engagement decay. The frequency ceiling is defined as the point at which the revenue per send curve turns downward – i.e. additional sends cost more in unsubscribes than they generate in purchases.
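One way to read the ceiling off cohort results is sketched below. It assumes the ceiling is the last frequency before revenue per send starts to fall, which is an interpretation of the definition above rather than a confirmed formula. The function names are hypothetical, and the example figures are the ones from the terminal output that follows.

```python
def revenue_per_send(revenue, sends):
    """Revenue attributed to a cohort divided by emails sent to it."""
    if sends <= 0:
        raise ValueError("sends must be positive")
    return revenue / sends


def frequency_ceiling(rps_by_frequency):
    """Return the highest sends/week reached before revenue per send falls.

    rps_by_frequency maps sends per week -> revenue per send.
    """
    freqs = sorted(rps_by_frequency)
    ceiling = freqs[0]
    for prev, cur in zip(freqs, freqs[1:]):
        if rps_by_frequency[cur] <= rps_by_frequency[prev]:
            break  # the curve has turned downward
        ceiling = cur
    return ceiling


# Revenue per send (GBP) by cohort, taken from the example run below.
cohorts = {1: 0.38, 2: 0.61, 3: 0.47, 4: 0.29}
print(frequency_ceiling(cohorts))
```

This treats any drop as the turning point. With noisy cohort data, a significance check between adjacent cohorts would be a sensible addition before acting on the result.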

// Typical finding

// result

Most accounts are sending at or above their frequency ceiling without knowing it. The ceiling is typically lower than marketers assume (2/week outperforms 4/week on revenue per send in the majority of accounts we test) but higher than their most conservative instincts suggest (1/week consistently underperforms 2/week). Critically, the ceiling varies by segment – VIP and recent buyers tolerate higher frequency; lapsed and low-engagement contacts punish over-sending severely.

// Terminal output

// cadence_tolerance_test · run complete
account: [UK beauty · £9.8m]
cohorts: 4 · n=8,400 each · matched design

revenue_per_send by frequency:
  1/week: £0.38 (under-sending)
  2/week: £0.61 (optimal)
  3/week: £0.47 (declining)
  4/week: £0.29 (above ceiling)

frequency_ceiling: 2 sends/week
unsub_rate at ceiling: 0.12%/send
unsub_rate above ceiling: 0.34%/send

status: // CONFIRMED · reducing to 2/week

// Experiment 03 – Attribution · Revenue Accuracy

The Attribution Window Calibration

We measure your actual purchase decision window using post-purchase data and recalibrate Klaviyo attribution to reflect real buyer behaviour – not platform defaults.

// Hypothesis

Klaviyo's default attribution window (5-day click, 1-day open) was designed for a median e-commerce use case. It systematically understates email revenue for considered-purchase categories (where the decision window is longer) and overstates it for impulse categories (where the decision is made within minutes). Using the default means every business decision based on email revenue data is made on inaccurate figures. The experiment measures the actual decision window for your specific product category and recalibrates accordingly.

// Variables

Measurement method

Post-purchase survey (n=500 minimum) + Klaviyo event timestamp analysis

Primary measure

Time from first email touch to purchase event, per order

Secondary measure

Number of email touches before purchase, channel mix

Output

Custom attribution window that reflects actual buyer behaviour

Revenue impact

Recalculated email revenue share – often significantly different from default

Run duration

4 weeks data collection + 1 week analysis
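The primary measure can be illustrated with a short sketch that turns first-touch and purchase timestamps into decision windows and summarises them as percentiles. It assumes the timestamps have already been exported per order. The function names and the linear-interpolation percentile are illustrative choices, not the documented method.

```python
from datetime import datetime


def decision_window_days(first_touch, purchase):
    """Days between the first email touch and the purchase event."""
    return (purchase - first_touch).total_seconds() / 86400


def percentile(values, q):
    """Linear-interpolated percentile of values, with q in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def share_within(windows, window_days):
    """Fraction of orders whose decision window fits an attribution window."""
    return sum(1 for w in windows if w <= window_days) / len(windows)


# Hypothetical order: first email touch on 1 Jan, purchase on 9 Jan at noon.
print(decision_window_days(datetime(2024, 1, 1), datetime(2024, 1, 9, 12)))
```

Comparing `share_within(windows, 5)` against the share for a candidate window gives a quick view of how many orders a 5-day click window would miss.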

// Why this matters

If your attribution window is wrong, every decision downstream is wrong. You might be underfunding flows that are genuinely driving revenue because the attribution window doesn't capture the full decision cycle. Or overfunding flows that look good on paper because open-based attribution is inflating the numbers. We have seen accounts where recalibrating attribution changed the reported email revenue share by more than 40% – in both directions.

// Typical finding

// result

High-consideration categories (premium skincare, furniture, considered fashion) consistently show decision windows of 7-21 days from first email touch. Impulse and consumable categories (food, supplements, beauty basics) show 80%+ of purchases occurring within 24 hours of the triggering email. Open-based attribution inflates email revenue by 25-60% in most accounts. The recalibrated window produces a more accurate (typically lower) revenue figure, but identifies the flows that are genuinely driving decisions – often different from what the default attribution suggests.

// Terminal output

// attribution_window_calibration · complete
account: [UK homewares · £12m+]
survey_responses: 847
event_records_analysed: 12,403

decision_window_p50: 8.3 days
decision_window_p90: 19.1 days
touches_before_purchase_median: 3.2

default_attribution_window: 5-day click
calibrated_attribution_window: 12-day click

email_revenue_default: £847,000/mo
email_revenue_calibrated: £1,102,000/mo

status: // RECALIBRATED · +30% revenue identified


// Experiment 04 – Open Rate · Engagement Decay

The Subject Line Entropy Experiment

We measure habituation rate against subject line predictability and test whether introducing controlled randomness into subject line format resets engagement – based on sensory adaptation principles from neuroscience.

// Hypothesis

Sensory adaptation – the neurological process by which repeated stimuli produce diminishing response – applies to email subject lines. A list that consistently receives subject lines following a predictable format (capitalisation pattern, length range, punctuation style, tone) habituates to those signals over time, producing measurable open rate decay independent of content quality. Introducing controlled entropy – deliberate, systematic variation in subject line format – disrupts the adaptation cycle and resets engagement. This is a signal detection problem, not a copywriting problem.

// Variables

Independent variable

Subject line format entropy score (0=fully predictable, 1=maximum variation)

Dependent variable

Open rate, open rate decay slope over 6-week window

Control condition

Current subject line format (typically low entropy)

Test condition

Deliberately varied format: length, case, punctuation, structure rotated systematically

Entropy dimensions

Capitalisation · Length (2-12 words) · Punctuation · Tone · Structure (question/statement/fragment)

Run duration

6 weeks to measure decay slope difference
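The source does not say how the entropy score is computed. One plausible sketch, offered purely as an assumption, is normalised Shannon entropy per format dimension, averaged across dimensions, which yields 0 for a fully predictable format and 1 for maximum variation. All names and the sample labels below are hypothetical.

```python
import math
from collections import Counter


def normalised_entropy(labels, n_categories):
    """Shannon entropy of labels, scaled to 0-1 by log(n_categories)."""
    if n_categories < 2 or not labels:
        return 0.0
    total = len(labels)
    h = -sum((c / total) * math.log(c / total) for c in Counter(labels).values())
    return max(0.0, h) / math.log(n_categories)


def format_entropy_score(subject_features, category_counts):
    """Average normalised entropy across format dimensions.

    subject_features: one dict per subject line, dimension -> label.
    category_counts: dimension -> number of possible labels (assumed known).
    """
    scores = [
        normalised_entropy([f[dim] for f in subject_features], k)
        for dim, k in category_counts.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical labelling of three recent subject lines on two dimensions.
recent = [
    {"case": "title", "structure": "statement"},
    {"case": "title", "structure": "question"},
    {"case": "title", "structure": "fragment"},
]
print(format_entropy_score(recent, {"case": 3, "structure": 3}))
```

Labelling each subject line (especially tone) is the hard part and is left outside this sketch.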

// The science behind it

This experiment draws directly on atmospheric data signal detection methodology – specifically the challenge of identifying meaningful signals in environments with high background noise and repetitive pattern interference. An inbox is a high-noise, high-repetition environment. The human visual system is tuned to detect novelty and discount repetition. Subject lines that look the same as previous ones are processed faster and with less attention – regardless of whether the content is genuinely different. Entropy is the antidote.

// Typical finding

// result

Accounts with low subject line entropy (consistent format across campaigns) show a measurable open rate decay of 0.3-0.8 percentage points per month, independent of content quality. Introducing a structured entropy programme – rotating format deliberately rather than randomly – produces open rate stabilisation and, in most accounts, a 4-12 week period of elevated opens as the list re-engages with the novelty signal. The key insight: it's not about writing better subject lines. It's about ensuring consecutive subject lines are sufficiently different from each other that adaptation cannot occur.
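The decay slope quoted above can be estimated as the least-squares slope of open rate against time. The sketch below assumes monthly open-rate readings in percentage points; the function name and the sample series are hypothetical.

```python
def open_rate_slope(months, open_rates_pct):
    """Least-squares slope of open rate (percentage points) per month."""
    n = len(months)
    if n < 2 or n != len(open_rates_pct):
        raise ValueError("need two or more paired observations")
    mean_x = sum(months) / n
    mean_y = sum(open_rates_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    if sxx == 0:
        raise ValueError("months must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, open_rates_pct))
    return sxy / sxx


# Synthetic series: open rate falling by 0.6 percentage points each month.
print(round(open_rate_slope([0, 1, 2, 3], [30.0, 29.4, 28.8, 28.2]), 2))
```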

// Terminal output

// subject_line_entropy · run complete
account: [UK fashion · £6.4m]
baseline_entropy_score: 0.12 (low variety)
open_rate_decay_slope: -0.6pp/month

intervention: entropy score → 0.71
format_dimensions_varied: 5
run_duration: 6 weeks

result:
  decay_slope: +0.1pp/month (stabilised)
  avg_open_rate_w1-6: +4.2pp vs control
  revenue_per_send: +18% vs control

status: // CONFIRMED · entropy programme deployed


// Experiment 05 – Segmentation · Holdout Methodology

The Segment Dissolution Test

Using a holdout group methodology, we test whether your current segmentation scheme is actually predictive of purchase behaviour – or inherited convention that adds complexity without generating lift.

// Hypothesis

Segmentation complexity has a cost – setup time, maintenance overhead, content production, and cognitive load. That cost is only justified if the segmentation produces a statistically significant lift versus a non-segmented baseline. Many accounts accumulate segments over time without ever testing whether the segments are actually working. The result is an account that looks sophisticated but performs no better – and sometimes worse – than a simpler approach would. This experiment tests the null hypothesis: that your segment-specific messaging does not outperform generic messaging at p < 0.05.

// Variables

Method

Holdout group design · 10% of each segment pooled into control

Null hypothesis

Segment messaging = generic messaging (no lift); rejected at p < 0.05

Test condition

Holdout group receives generic sends instead of segment-specific content

Primary measure

Revenue per send, conversion rate per segment vs holdout

Decision rule

If segment messaging fails to outperform generic at p < 0.05: dissolve segment

Run duration

6 weeks · minimum 1,000 contacts per segment for statistical power
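The decision rule can be sketched for the conversion-rate measure with a one-sided two-proportion z-test. This is an assumed choice of test, since the source does not name one, and revenue per send would need a different test. Function names and the sample counts are hypothetical.

```python
import math


def one_sided_p_value(conv_segment, n_segment, conv_holdout, n_holdout):
    """P-value for H0: segment rate <= holdout rate (pooled z-test)."""
    p_seg = conv_segment / n_segment
    p_hold = conv_holdout / n_holdout
    pooled = (conv_segment + conv_holdout) / (n_segment + n_holdout)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_segment + 1 / n_holdout))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of lift
    z = (p_seg - p_hold) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def decide(p_value, alpha=0.05):
    """Apply the decision rule: keep the segment only if lift is significant."""
    return "CONFIRMED" if p_value < alpha else "DISSOLVED"


# Hypothetical counts: 120/1000 conversions on segment messaging
# versus 80/1000 on generic messaging in the holdout.
p = one_sided_p_value(120, 1000, 80, 1000)
print(decide(p))
```

With many segments tested at once, some will pass at p < 0.05 by chance, so a correction for multiple comparisons may be worth considering.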

// Why segments fail

Segments are usually created based on intuition about customer differences, not empirical evidence that those differences are purchase-relevant. A segment for "customers who bought product X" makes intuitive sense – but if those customers buy the same things in response to the same emails as everyone else, the segment is doing nothing except adding maintenance overhead. The dissolution test answers the question every account should ask but almost none do: is this segment actually earning its complexity?

// Typical finding

// result

In a typical account with 8-12 active segments, 3-5 segments fail to demonstrate statistically significant lift over generic messaging. These are dissolved, reducing account complexity by 30-40% while preserving – and often improving – revenue. The segments that survive the test are genuinely predictive and receive increased content investment. The segments that don't survive are typically demographic or product-category-based; the ones that do are typically behaviour-based (recency, frequency, purchase value).

// Terminal output

// segment_dissolution_test · run complete
account: [UK homewares · £12m+]
segments_tested: 11
holdout_size: 10% per segment
run_duration: 6 weeks

results_by_segment:
  vip_buyers: CONFIRMED (p=0.003)
  recent_30d: CONFIRMED (p=0.011)
  high_aov: CONFIRMED (p=0.028)
  product_cat_a: DISSOLVED (p=0.34)
  product_cat_b: DISSOLVED (p=0.28)
  gender_m: DISSOLVED (p=0.61)
  [+5 more...]

segments_dissolved: 6 of 11
complexity_reduction: 55%
revenue_impact: +8% (less noise, more signal)

status: // COMPLETE · account restructured

Five proprietary experiments.
