// The experiments

Five proprietary experiments.

Every account we work in gets put through the same five experiments. Real methodology. Controlled variables. Statistical significance thresholds. These are not best practices dressed up in lab coats. They are experiments – and the results vary by account.

// Experiment 01
The Recency Decay Curve
We map how purchase intent decays over time at the individual subscriber level – then use it to optimise send timing per contact, not per segment.
Engagement · Segmentation · 8 weeks
// recency_decay_curve
hypothesis: intent decays as fn(t) post-purchase
output: personalised send probability score
// Experiment 02
The Cadence Tolerance Test
We identify each segment's individual frequency ceiling – the point at which additional sends reduce revenue per send rather than increase it.
Send frequency · Revenue per send · 8 weeks
// cadence_tolerance_test
variable: sends/week across matched cohorts
output: frequency ceiling per segment
// Experiment 03
The Attribution Window Calibration
We measure your actual purchase decision window – not Klaviyo's default – and recalibrate attribution to reflect real buyer behaviour for your specific product category.
Attribution · Revenue accuracy · 4 weeks
// attribution_window_calibration
method: post-purchase survey + event timestamps
output: custom attribution window
// Experiment 04
The Subject Line Entropy Experiment
We measure habituation rate against subject line predictability and test whether introducing controlled randomness resets engagement – based on sensory adaptation principles from neuroscience.
Open rate · Engagement decay · 6 weeks
// subject_line_entropy
principle: sensory adaptation / signal detection
output: optimal entropy threshold
// Experiment 05
The Segment Dissolution Test
Using a holdout group methodology, we test whether your current segmentation scheme is actually predictive of purchase behaviour – or inherited convention. If your segments aren't earning their complexity, we dissolve them.
Segmentation · Holdout methodology · Statistical significance p < 0.05 · 6 weeks
// segment_dissolution_test
method: holdout group · matched cohort design
null hypothesis: segment messaging does not outperform generic messaging (reject at p < 0.05)
output: validate or dissolve each segment
// Experiment 01 – Engagement · Segmentation
The Recency Decay Curve
We map how purchase intent and engagement probability decay over time at the individual subscriber level, then use the resulting model to optimise send timing per contact.
// Hypothesis
Purchase intent and email engagement do not decay uniformly across a subscriber list. Instead, each contact follows an individual decay function shaped by their purchase history, category affiliation, and historical engagement patterns. A model that treats all contacts as equivalently engaged – or equivalently disengaged – systematically misdirects send volume and suppresses revenue.
// Variables
Independent variable
Days since last purchase / last engagement event
Dependent variable
Open probability, click probability, purchase probability per send
Control condition
Standard segment-level send cadence
Test condition
Send cadence adjusted per contact based on predicted engagement score
Confounders controlled
Seasonality, campaign type, product category, acquisition source
Significance threshold
p < 0.05 on revenue-per-send lift vs control group
// Method
We extract the full event history for each contact from Klaviyo – open events, click events, purchase events, site visits – and fit an exponential decay function to their engagement probability over time. This produces a personalised engagement score, updated weekly, for every contact in the account. We then split the list: control group receives standard cadence, test group receives cadence calibrated to their individual score. We run for 8 weeks and measure revenue per send, unsubscribe rate, and list health metrics.
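A minimal sketch of the fitting step, assuming a contact's events have already been aggregated into (days since last event, observed engagement probability) pairs. It fits p(t) = p0 · exp(−λt) by ordinary least squares on log p, a simple stand-in for whatever fitting procedure is used in practice, and reports the half-life ln 2 / λ that appears in the run output. All function names and the example data are hypothetical.

```python
import math


def fit_exponential_decay(days, rates):
    """Fit rate(t) = p0 * exp(-lam * t) by least squares on log(rate).

    days  -- days since last purchase or engagement event
    rates -- observed engagement probability at each day (all > 0)
    Returns (p0, lam).
    """
    if len(days) != len(rates) or len(days) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(r) for r in rates]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("days must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    slope = sxy / sxx  # slope of log(rate) against t, i.e. -lam
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def half_life(lam):
    """Days for predicted engagement probability to halve."""
    return math.log(2) / lam


def engagement_score(p0, lam, t):
    """Predicted engagement probability t days after the last event."""
    return p0 * math.exp(-lam * t)


# Illustrative, noise-free data generated from p0 = 0.6 and a 10-day half-life.
true_lam = math.log(2) / 10
days = [0, 7, 14, 21, 28]
rates = [0.6 * math.exp(-true_lam * t) for t in days]
p0, lam = fit_exponential_decay(days, rates)
print(round(p0, 3), round(half_life(lam), 1))  # 0.6 10.0
```

Fitting in log space weights small probabilities heavily; on sparse real event data a weighted or nonlinear fit may behave better.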
// Typical finding
// result
The decay curve is steeper than most brands assume and varies significantly by product category. Impulse-purchase categories show 70%+ engagement decay within 14 days of last purchase. Considered-purchase categories maintain elevated engagement for 45-90 days. Contacts in the low-score tier (<20% engagement probability) generate negative ROI when they receive standard campaign volume – their unsubscribe contribution outweighs their revenue contribution. Reducing their cadence typically increases list health metrics without meaningful revenue loss, while freeing send volume for high-probability contacts.
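The negative-ROI claim for the low-score tier comes down to a per-send comparison. A minimal sketch, assuming each contact has a predicted engagement probability and that a lost subscriber can be valued as a single hypothetical figure (`subscriber_value`). The tier cut-offs mirror the >60%, 20-60% and <20% bands in the run output below; every number in the example is illustrative, not measured.

```python
def tier(score):
    """Bucket a predicted engagement probability into high/mid/low bands."""
    if score > 0.60:
        return "high_prob"
    if score >= 0.20:
        return "mid_prob"
    return "low_prob"


def net_value_per_send(revenue_per_send, unsub_rate_per_send, subscriber_value):
    """Expected revenue from one send minus the expected value lost to unsubscribes.

    subscriber_value is a hypothetical estimate of the future revenue a contact
    generates while they remain subscribed.
    """
    return revenue_per_send - unsub_rate_per_send * subscriber_value


# Illustrative figures only.
low = net_value_per_send(revenue_per_send=0.05, unsub_rate_per_send=0.005, subscriber_value=20.0)
high = net_value_per_send(revenue_per_send=0.90, unsub_rate_per_send=0.001, subscriber_value=20.0)
print(tier(0.12), round(low, 2))   # low_prob -0.05
print(tier(0.75), round(high, 2))  # high_prob 0.88
```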
// Terminal output
// recency_decay_curve · run complete
account: [UK fashion · £6.4m]
contacts_modelled: 84,203

decay_model: exponential · half-life = 11.3 days
r² = 0.847 (strong fit)

segment_high_prob (>60%): 18,440 contacts
segment_mid_prob (20-60%): 31,200 contacts
segment_low_prob (<20%): 34,563 contacts

test_result:
  revenue_per_send: +34% vs control
  unsubscribe_rate: -41% vs control
  list_health_score: improved significantly

status: // CONFIRMED · deploying score model
Want this experiment run on your account? We run it as part of every Catalyst, Compound and Apex programme. Book the diagnostic audit to start.
Book the audit →
// Experiment 02 – Send Frequency · Revenue per Send
The Cadence Tolerance Test
We identify the precise frequency ceiling for each list segment – the point at which additional sends begin to erode revenue per send rather than compound it.
// Hypothesis
Email send frequency follows a dose-response relationship with revenue per send. Below the optimal frequency, revenue is left uncaptured. Above it, marginal sends generate diminishing – and eventually negative – returns as unsubscribe rate climbs and engagement decays. This ceiling is not universal: it varies by segment, category, and list composition. The experiment locates it precisely for each segment rather than applying a blanket frequency across the list.
// Variables
Independent variable
Send frequency (1, 2, 3, 4 emails per week)
Dependent variable
Revenue per send, unsubscribe rate, engagement decay rate
Control condition
Current send frequency (typically 2-3/week)
Test conditions
Four matched cohorts: 1/wk, 2/wk, 3/wk, 4/wk
Confounders controlled
Content quality, subject line format, send time, cohort composition
Run duration
8 weeks minimum for statistical stability
// Method
We create four matched cohorts from the active subscriber list, balanced by recency, purchase history, and engagement tier. Each cohort receives identical content at different frequencies over 8 weeks. We measure revenue per send (not total revenue – which will always favour higher frequency in the short term) alongside unsubscribe rate and 30-day engagement decay. The frequency ceiling is defined as the point at which the revenue per send curve peaks and starts to fall – i.e. additional sends cost more in unsubscribes than they generate in purchases.
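One common way to build matched cohorts is stratified random assignment. A minimal sketch, assuming each contact can be mapped to a stratum key (for example a recency band plus an engagement tier); the function names, seed and contact fields are hypothetical. Within each stratum, contacts are shuffled and dealt round-robin, so every cohort gets a near-equal share of every stratum.

```python
import random
from collections import defaultdict

FREQUENCIES = [1, 2, 3, 4]  # sends per week, one cohort per frequency


def matched_cohorts(contacts, stratum_of, n_cohorts=4, seed=42):
    """Stratified random assignment.

    Contacts are grouped by stratum (e.g. recency band, purchase history,
    engagement tier), shuffled within each stratum and dealt round-robin, so
    every cohort receives a near-equal share of every stratum.
    """
    strata = defaultdict(list)
    for contact in contacts:
        strata[stratum_of(contact)].append(contact)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    cohorts = [[] for _ in range(n_cohorts)]
    offset = 0
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for i, contact in enumerate(members):
            cohorts[(offset + i) % n_cohorts].append(contact)
        offset += len(members)  # continue the rotation so remainders spread out
    return cohorts


# Hypothetical contacts: two recency bands, one engagement tier.
contacts = [
    {"id": i, "recency": "0-30d" if i < 20 else "31-90d", "tier": "mid"}
    for i in range(40)
]
cohorts = matched_cohorts(contacts, lambda c: (c["recency"], c["tier"]))
plan = dict(zip(FREQUENCIES, cohorts))  # frequency -> cohort
print({freq: len(members) for freq, members in plan.items()})  # {1: 10, 2: 10, 3: 10, 4: 10}
```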
// Typical finding
// result
Most accounts are sending at or above their frequency ceiling without knowing it. The ceiling is typically lower than marketers assume (2/week outperforms 4/week on revenue per send in the majority of accounts we test) but higher than their most conservative instincts suggest (1/week consistently underperforms 2/week). Critically, the ceiling varies by segment – VIP and recent buyers tolerate higher frequency; lapsed and low-engagement contacts punish over-sending severely.
// Terminal output
// cadence_tolerance_test · run complete
account: [UK beauty · £9.8m]
cohorts: 4 · n=8,400 each · matched design

revenue_per_send by frequency:
  1/week: £0.38 (under-sending)
  2/week: £0.61 (optimal)
  3/week: £0.47 (declining)
  4/week: £0.29 (above ceiling)

frequency_ceiling: 2 sends/week
unsub_rate at ceiling: 0.12%/send
unsub_rate above ceiling: 0.34%/send

status: // CONFIRMED · reducing to 2/week
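Reading the ceiling off the cohort results is a small scan. A minimal sketch, assuming one revenue-per-send figure per tested frequency; `frequency_ceiling` is a hypothetical name. It returns the first frequency after which revenue per send falls, which reproduces the 2/week ceiling from the UK beauty output above. Because it takes the first local peak, a noisy curve with several peaks would call for confidence intervals rather than point estimates.

```python
def frequency_ceiling(rps_by_freq):
    """Return the weekly send frequency at which revenue per send peaks.

    Scans tested frequencies in ascending order and stops at the first point
    where the next frequency lowers revenue per send.
    """
    freqs = sorted(rps_by_freq)
    for lower, higher in zip(freqs, freqs[1:]):
        if rps_by_freq[higher] < rps_by_freq[lower]:
            return lower
    return freqs[-1]  # no decline within the tested range


# Revenue per send (GBP) by cohort, from the UK beauty run output above.
observed = {1: 0.38, 2: 0.61, 3: 0.47, 4: 0.29}
print(frequency_ceiling(observed))  # 2
```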
// Experiment 03 – Attribution · Revenue Accuracy
The Attribution Window Calibration
We measure your actual purchase decision window using post-purchase data and recalibrate Klaviyo attribution to reflect real buyer behaviour – not platform defaults.
// Hypothesis
Klaviyo's default attribution window (5-day click, 1-day open) was designed for a median ecom use case. It systematically understates email revenue for considered-purchase categories (where the decision window is longer than the attribution window) and overstates it for impulse categories (where the decision is made within minutes). Using the default means every business decision based on email revenue data is made on inaccurate figures. The experiment measures the actual decision window for your specific product category and recalibrates accordingly.
// Variables
Measurement method
Post-purchase survey (n=500 minimum) + Klaviyo event timestamp analysis
Primary measure
Time from first email touch to purchase event, per order
Secondary measure
Number of email touches before purchase, channel mix
Output
Custom attribution window that reflects actual buyer behaviour
Revenue impact
Recalculated email revenue share – often significantly different from default
Run duration
4 weeks data collection + 1 week analysis
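A minimal sketch of the window measurement, assuming time from first email touch to purchase has already been computed per order from event timestamps. It reports percentiles (like the p50/p90 figures in the output below) and the share of orders a candidate window would capture. How the calibrated window is chosen from these figures isn't spelled out here, so the code stops at the descriptive statistics; the function names and data are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0-100)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def window_coverage(values, window_days):
    """Share of orders whose first-touch-to-purchase time falls inside the window."""
    return sum(1 for v in values if v <= window_days) / len(values)


# Hypothetical days from first email touch to purchase, one value per order.
days_to_purchase = [0.5, 2, 3, 6, 8, 9, 11, 14, 19, 25]
print(percentile(days_to_purchase, 50))       # 8.5
print(window_coverage(days_to_purchase, 5))   # 0.3
print(window_coverage(days_to_purchase, 12))  # 0.7
```

The interpolation rule matches the default "linear" method of common percentile implementations; other rules give slightly different values on small samples.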
// Why this matters
If your attribution window is wrong, every decision downstream is wrong. You might be underfunding flows that are genuinely driving revenue because the attribution window doesn't capture the full decision cycle. Or overfunding flows that look good on paper because open-based attribution is inflating the numbers. We have seen accounts where recalibrating attribution changed the reported email revenue share by more than 40% – in both directions.
// Typical finding
// result
High-consideration categories (premium skincare, furniture, considered fashion) consistently show decision windows of 7-21 days from first email touch. Impulse and consumable categories (food, supplements, beauty basics) show 80%+ of purchases occurring within 24 hours of the triggering email. Open-based attribution inflates email revenue by 25-60% in most accounts. The recalibrated window produces a more accurate (typically lower) revenue figure, but identifies the flows that are genuinely driving decisions – often different from what the default attribution suggests.
// Terminal output
// attribution_window_calibration · complete
account: [UK homewares · £12m+]
survey_responses: 847
event_records_analysed: 12,403

decision_window_p50: 8.3 days
decision_window_p90: 19.1 days
touches_before_purchase_median: 3.2

default_attribution_window: 5-day click
calibrated_attribution_window: 12-day click

email_revenue_default: £847,000/mo
email_revenue_calibrated: £1,102,000/mo

status: // RECALIBRATED · +30% revenue identified
// Experiment 04 – Open Rate · Engagement Decay
The Subject Line Entropy Experiment
We measure habituation rate against subject line predictability and test whether introducing controlled randomness into subject line format resets engagement – based on sensory adaptation principles from neuroscience.
// Hypothesis
Sensory adaptation – the neurological process by which repeated stimuli produce diminishing response – applies to email subject lines. A list that consistently receives subject lines following a predictable format (capitalisation pattern, length range, punctuation style, tone) habituates to those signals over time, producing measurable open rate decay independent of content quality. Introducing controlled entropy – deliberate, systematic variation in subject line format – disrupts the adaptation cycle and resets engagement. This is a signal detection problem, not a copywriting problem.
// Variables
Independent variable
Subject line format entropy score (0=fully predictable, 1=maximum variation)
Dependent variable
Open rate, open rate decay slope over 6-week window
Control condition
Current subject line format (typically low entropy)
Test condition
Deliberately varied format: length, case, punctuation, structure rotated systematically
Entropy dimensions
Capitalisation · Length (2-12 words) · Punctuation · Tone · Structure (question/statement/fragment)
Run duration
6 weeks to measure decay slope difference
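The variables above don't say how the entropy score is computed. One plausible construction, offered as an assumption rather than the method used here: label each subject line on every format dimension, take the Shannon entropy of each dimension's labels normalised by the log of the number of options, and average across dimensions. This measures how evenly formats are used, not their order, so on its own it says nothing about whether consecutive lines differ. Names and labels are hypothetical.

```python
import math
from collections import Counter


def normalised_entropy(labels, n_categories):
    """Shannon entropy of one format dimension, scaled to 0-1.

    0 means the same option on every send; 1 means all n_categories options
    were used equally often.
    """
    if n_categories < 2:
        raise ValueError("need at least two categories")
    counts = Counter(labels)
    total = len(labels)
    h = sum((c / total) * math.log(total / c) for c in counts.values())
    return h / math.log(n_categories)


def format_entropy_score(labels_by_dimension, categories_per_dimension):
    """Mean normalised entropy across format dimensions."""
    scores = [
        normalised_entropy(labels, categories_per_dimension[dim])
        for dim, labels in labels_by_dimension.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical labels for the last six subject lines on two dimensions.
history = {
    "structure": ["question", "statement", "fragment"] * 2,
    "capitalisation": ["title case"] * 6,
}
score = format_entropy_score(history, {"structure": 3, "capitalisation": 3})
print(round(score, 2))  # 0.5
```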
// The science behind it
This experiment draws directly on signal detection methodology from atmospheric data analysis – specifically the challenge of identifying meaningful signals in environments with high background noise and repetitive pattern interference. An inbox is a high-noise, high-repetition environment. The human visual system is tuned to detect novelty and discount repetition. Subject lines that look the same as previous ones are processed faster and with less attention – regardless of whether the content is genuinely different. Entropy is the antidote.
// Typical finding
// result
Accounts with low subject line entropy (consistent format across campaigns) show a measurable open rate decay of 0.3-0.8 percentage points per month, independent of content quality. Introducing a structured entropy programme – rotating format deliberately rather than randomly – produces open rate stabilisation and, in most accounts, a 4-12 week period of elevated opens as the list re-engages with the novelty signal. The key insight: it's not about writing better subject lines. It's about ensuring consecutive subject lines are sufficiently different from each other that adaptation cannot occur.
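Rotating format deliberately, rather than randomly, can be made concrete with a simple rotation schedule. A minimal sketch, assuming a hypothetical set of options for each of the five format dimensions named in the variables: each dimension cycles through its options, so consecutive sends always differ on every dimension, which independent random draws cannot guarantee.

```python
# Hypothetical option sets for the five format dimensions.
DIMENSIONS = {
    "capitalisation": ["sentence case", "lower case", "title case"],
    "length": ["2-4 words", "5-8 words", "9-12 words"],
    "punctuation": ["none", "question mark", "ellipsis", "colon"],
    "tone": ["direct", "playful", "understated"],
    "structure": ["question", "statement", "fragment"],
}


def format_for_send(i, dimensions=DIMENSIONS):
    """Format spec for the i-th send: each dimension cycles through its options."""
    return {name: options[i % len(options)] for name, options in dimensions.items()}


def dimensions_changed(a, b):
    """Number of format dimensions on which two specs differ."""
    return sum(a[k] != b[k] for k in a)


schedule = [format_for_send(i) for i in range(8)]
print(schedule[0]["structure"], schedule[1]["structure"])  # question statement
```

With option lists of length 3 and 4, the full combination repeats every 12 sends; longer option lists stretch that cycle.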
// Terminal output
// subject_line_entropy · run complete
account: [UK fashion · £6.4m]
baseline_entropy_score: 0.12 (low variety)
open_rate_decay_slope: -0.6pp/month

intervention: entropy score → 0.71
format_dimensions_varied: 5
run_duration: 6 weeks

result:
  decay_slope: +0.1pp/month (stabilised)
  avg_open_rate_w1-6: +4.2pp vs control
  revenue_per_send: +18% vs control

status: // CONFIRMED · entropy programme deployed
// Experiment 05 – Segmentation · Holdout Methodology
The Segment Dissolution Test
Using a holdout group methodology, we test whether your current segmentation scheme is actually predictive of purchase behaviour – or inherited convention that adds complexity without generating lift.
// Hypothesis
Segmentation complexity has a cost – setup time, maintenance overhead, content production, and cognitive load. That cost is only justified if the segmentation produces a statistically significant lift versus a non-segmented baseline. Many accounts accumulate segments over time without ever testing whether the segments are actually working. The result is an account that looks sophisticated but performs no better – and sometimes worse – than a simpler approach would. This experiment tests the null hypothesis that your segment-specific messaging does not outperform generic messaging, and rejects it only at p < 0.05.
// Variables
Method
Holdout group design · 10% of each segment pooled into control
Null hypothesis
Segment messaging does not outperform generic messaging (reject at p < 0.05)
Test condition
Holdout group receives generic sends instead of segment-specific content
Primary measure
Revenue per send, conversion rate per segment vs holdout
Decision rule
If segment messaging fails to outperform generic at p < 0.05: dissolve segment
Run duration
6 weeks · minimum 1,000 contacts per segment for statistical power
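The decision rule reduces to a one-sided significance test per segment. A minimal sketch on conversion rate, one of the two primary measures listed, using a pooled two-proportion z-test with the normal approximation. Revenue per send is skewed and would more naturally call for a bootstrap or a test on per-contact revenue, which this sketch doesn't cover. Function names and counts are hypothetical.

```python
import math


def lift_p_value(conv_seg, n_seg, conv_hold, n_hold):
    """One-sided p-value for H0: segment conversion rate <= holdout rate.

    Pooled two-proportion z-test using the normal approximation, which is
    reasonable when both groups have enough conversions.
    """
    p_seg = conv_seg / n_seg
    p_hold = conv_hold / n_hold
    pooled = (conv_seg + conv_hold) / (n_seg + n_hold)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_seg + 1 / n_hold))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of lift
    z = (p_seg - p_hold) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail, P(Z >= z)


def decide(conv_seg, n_seg, conv_hold, n_hold, alpha=0.05):
    """Apply the decision rule: keep the segment only if lift is significant."""
    p = lift_p_value(conv_seg, n_seg, conv_hold, n_hold)
    return ("CONFIRMED" if p < alpha else "DISSOLVED"), p


# Hypothetical counts: 6% vs 3% conversion, at two sample sizes.
print(decide(540, 9000, 30, 1000)[0])  # CONFIRMED
print(decide(54, 900, 3, 100)[0])      # DISSOLVED
```

The same 6% vs 3% gap clears p < 0.05 with 10,000 contacts but not with 1,000, because the test's power depends on the size of both the segment and the holdout.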
// Why segments fail
Segments are usually created based on intuition about customer differences, not empirical evidence that those differences are purchase-relevant. A segment for "customers who bought product X" makes intuitive sense – but if those customers buy the same things in response to the same emails as everyone else, the segment is doing nothing except adding maintenance overhead. The dissolution test answers the question every account should ask but almost none do: is this segment actually earning its complexity?
// Typical finding
// result
In a typical account with 8-12 active segments, 3-5 segments fail to demonstrate statistically significant lift over generic messaging. These are dissolved, reducing account complexity by 30-40% while preserving – and often improving – revenue. The segments that survive the test are genuinely predictive and receive increased content investment. The segments that don't survive are typically demographic or product-category-based; the ones that do are typically behaviour-based (recency, frequency, purchase value).
// Terminal output
// segment_dissolution_test · run complete
account: [UK homewares · £12m+]
segments_tested: 11
holdout_size: 10% per segment
run_duration: 6 weeks

results_by_segment:
  vip_buyers: CONFIRMED (p=0.003)
  recent_30d: CONFIRMED (p=0.011)
  high_aov: CONFIRMED (p=0.028)
  product_cat_a: DISSOLVED (p=0.34)
  product_cat_b: DISSOLVED (p=0.28)
  gender_m: DISSOLVED (p=0.61)
  [+5 more...]

segments_dissolved: 6 of 11
complexity_reduction: 55%
revenue_impact: +8% (less noise, more signal)

status: // COMPLETE · account restructured