🩻 Level 2 of 3 · Session 4 of 5

Level 2 · Exploration

Meta-Analysis Basics

When Numbers Can Be Pooled

When is statistical pooling of health economics evidence appropriate, and when does a pooled estimate create false precision that misleads policy?

The fruit salad problem

A pooled estimate from a meta-analysis has the appearance of authority: a single number, a confidence interval, a forest plot. It feels more definitive than five separate study findings pointing in different directions. This authority is often warranted. Sometimes it is not, and the damage from pooling studies that should not be pooled is worse than the messiness of acknowledging heterogeneity, because a false pooled estimate enters a brief with a precision it has not earned.

The classic objection to inappropriate pooling is the fruit salad problem: combining apples and oranges produces neither a better apple nor a better orange. It produces a meaningless average that describes nothing that actually exists. For health economics evidence on financial risk protection in India, the fruit salad risk is acute: a meta-analysis pooling community-based health insurance studies from Ghana, Rwanda, Kenya, and two Indian states is not evidence about India; it is evidence about the average of four different health systems that share almost nothing except the label "LMIC."

❌ Fruit salad
Pool everything with the same outcome label
Ghana CBHI + Rwanda mutuelles + Kerala insurance + Delhi PM-JAY → pooled estimate of "insurance reduces OOP by 28% in LMICs." Cited in a brief as evidence for India. Describes no real system anywhere.
vs
✓ Comparable ingredients
Pool studies with genuine clinical and contextual similarity
Three Indian state-level PM-JAY evaluations with similar design, comparable benefit packages, and the same OOP outcome measure → pooled estimate with clear scope of inference. Describes something real.

The three gates before pooling

Before any set of studies can legitimately be pooled, they must pass three gates. Failing any one of them means pooling should not proceed; the appropriate output is a narrative synthesis with an explicit statement of why statistical combination is not justified.

01
Clinical homogeneity
Are the interventions and populations genuinely comparable?
Studies must share the same intervention type (not just the same category label), comparable populations, and a similar delivery context. "Community health insurance" in a formal employer-linked Rwandan scheme and "community health insurance" in a voluntary informal-sector Indian cooperative are not the same intervention despite sharing a name.
India test: Would pooling this set produce an estimate that could meaningfully describe the expected effect of PM-JAY or a specific state insurance scheme? If the interventions are too diverse, the answer is no.
02
Methodological homogeneity
Do the studies measure outcomes the same way?
Catastrophic health expenditure defined as >10% of household consumption is not the same as OOP exceeding 40% of non-food expenditure, even though both are labelled "catastrophic expenditure." Pooling these produces a number that is definitionally incoherent. Study design also matters: pooling RCTs with observational difference-in-differences studies inflates apparent precision.
India test: Are all studies using the same expenditure threshold and the same denominator (consumption vs. income vs. non-food expenditure)? If not, do not pool.
03
Statistical homogeneity
Is the observed variation consistent with sampling error alone?
The I² statistic measures the proportion of variability in effect estimates that is due to heterogeneity rather than chance. An I² above 75% means the studies differ so much in their findings that a pooled average is largely meaningless; the variation itself is the finding. A high I² is not a problem to be corrected by switching to a random-effects model; it is a signal to investigate why the studies differ.
India test: If I² > 75% in a pool of LMIC health insurance studies, subgroup by India-only, South Asia-only, or by comparable insurance design. The subgroup result is more useful than the full pool.
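Gate 02 can be made concrete with a short sketch. The two catastrophic-expenditure definitions it contrasts can classify the very same household differently, which is exactly why pooling them is definitionally incoherent. The household figures below are invented for illustration; the 10% and 40% thresholds are the ones named above.

```python
# Illustrative sketch of the two "catastrophic expenditure" definitions from
# Gate 02. The household figures are hypothetical; only the thresholds come
# from the text above.

def che_share_of_consumption(oop, total_consumption, threshold=0.10):
    """CHE if OOP health spending exceeds 10% of total household consumption."""
    return oop / total_consumption > threshold

def che_capacity_to_pay(oop, total_consumption, food_spending, threshold=0.40):
    """CHE if OOP exceeds 40% of non-food (capacity-to-pay) expenditure."""
    return oop / (total_consumption - food_spending) > threshold

# Hypothetical household: 12,000 monthly consumption, 7,500 of it on food,
# 1,500 out-of-pocket on health.
oop, consumption, food = 1_500, 12_000, 7_500

print(che_share_of_consumption(oop, consumption))   # True  (12.5% > 10%)
print(che_capacity_to_pay(oop, consumption, food))  # False (33.3% < 40%)
```

The same household is "catastrophic" under one definition and not under the other, so a pooled estimate across studies using different definitions describes neither.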

Reading a forest plot: what AI often gets wrong

When you ask an AI tool to summarise a meta-analysis, it will typically report the pooled estimate and confidence interval: the bottom diamond on the forest plot. It will usually not flag a high I², will not note whether individual study estimates cross the line of no effect while the pooled estimate does not, and will not raise the question of whether the studies should have been pooled at all. These are your jobs.

Schematic forest plot β€” health insurance and catastrophic expenditure (illustrative)
Ghana CBHI 2019
−0.38 [−0.54, −0.22]
Rwanda mutuelles 2020
−0.18 [−0.28, −0.08]
India PM-JAY 2021
−0.06 [−0.18, +0.06]
Kenya NHIF 2022
−0.29 [−0.39, −0.19]

Pooled (I²=78%)
−0.23 [−0.34, −0.12]
Notice: the India estimate crosses the null (no effect). The pooled estimate, driven by Ghana, Rwanda, and Kenya, does not represent India. I²=78% means 78% of the variation is genuine heterogeneity, not sampling error. An AI summarising this as "insurance reduces catastrophic expenditure by 23% across LMICs" would be accurate and misleading simultaneously.
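The heterogeneity statistic in the schematic plot can be reproduced by hand. The sketch below applies standard inverse-variance fixed-effect pooling and Cochran's Q to the four illustrative study estimates, backing out each standard error from its 95% CI. (The plot's pooled −0.23 presumably comes from a random-effects model, so the fixed-effect value here differs slightly; the I² is the same either way.)

```python
# Inverse-variance fixed-effect pooling of the schematic forest plot above.
# Effects and 95% CIs are the illustrative numbers from the plot; each SE is
# backed out as (upper - lower) / (2 * 1.96).
studies = {
    "Ghana CBHI 2019":       (-0.38, -0.54, -0.22),
    "Rwanda mutuelles 2020": (-0.18, -0.28, -0.08),
    "India PM-JAY 2021":     (-0.06, -0.18, +0.06),
    "Kenya NHIF 2022":       (-0.29, -0.39, -0.19),
}

weights, effects = [], []
for effect, lower, upper in studies.values():
    se = (upper - lower) / (2 * 1.96)
    weights.append(1 / se**2)  # inverse-variance weight
    effects.append(effect)

pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Cochran's Q and I^2 = max(0, (Q - df) / Q)
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(studies) - 1
i_squared = max(0.0, (q - df) / q)

print(f"pooled effect: {pooled:.2f}")  # pooled effect: -0.21
print(f"I^2: {i_squared:.0%}")         # I^2: 77%
```

An I² of roughly 77% from these four numbers matches the plot's 78%: three quarters of the spread between studies is genuine difference, not sampling noise.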

What to do when you cannot pool

A finding of high heterogeneity or incompatible study designs is not a dead end; it is a result. The appropriate response is a synthesis approach matched to the reason pooling fails:

| Situation | Verdict | Appropriate synthesis | What to report |
|---|---|---|---|
| Studies comparable, I² < 50% | Pool | Fixed- or random-effects meta-analysis | Pooled estimate, CI, I², sensitivity analysis removing outlier studies |
| Studies comparable, 50% ≤ I² ≤ 75% | Caution | Pool with subgroup analysis | Pooled estimate with explicit I² caveat; subgroup results by region or intervention type; investigation of heterogeneity sources |
| High heterogeneity, I² > 75% | Don't pool | Narrative synthesis | Direction and range of effects across studies; explicit statement of why pooling is not appropriate; subgroup by India/South Asia if data allow |
| Methodologically incompatible outcomes | Don't pool | Narrative synthesis | Tabular presentation of each study's effect estimate with its outcome definition; note the incompatibility explicitly in the brief's methodology note |
| India estimate crosses null, LMIC pool significant | Disaggregate | Report India separately | "The pooled LMIC estimate suggests benefit; the single available India study (PM-JAY) shows no significant effect on financial protection [CI crosses null]. India-specific evidence is insufficient to draw a conclusion." This is the most honest and policy-useful summary. |
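The decision rules above can be codified in a few lines. This is a minimal sketch, not a substitute for judgment: the function name and return strings are mine; only the 50% and 75% thresholds and the verdicts come from the table.

```python
# Minimal sketch codifying the synthesis-decision table above.
# Function name and labels are illustrative; thresholds are from the table.

def synthesis_recommendation(comparable: bool, i_squared: float,
                             india_crosses_null_but_pool_significant: bool = False) -> str:
    """Map study-set characteristics to a synthesis approach."""
    if not comparable:
        return "narrative synthesis: outcomes methodologically incompatible"
    if india_crosses_null_but_pool_significant:
        return "disaggregate: report the India estimate separately"
    if i_squared > 0.75:
        return "narrative synthesis: heterogeneity too high to pool"
    if i_squared >= 0.50:
        return "pool with subgroup analysis and explicit I^2 caveat"
    return "pool: fixed- or random-effects meta-analysis"

print(synthesis_recommendation(True, 0.78))
# -> narrative synthesis: heterogeneity too high to pool
```

Note the ordering: incompatible outcomes and a diverging India estimate both override the I² checks, mirroring the table's "don't pool" and "disaggregate" rows.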

Check whether a pool is legitimate

🩻 Pooling decision tool: describe your study set

Describe the studies you are considering pooling: their countries, intervention types, outcome definitions, and any I² or heterogeneity statistics you have. The tool will assess whether pooling is legitimate and recommend the appropriate synthesis approach.

Outcome being synthesised
Policy question this serves
🩻 Pooling assessment

🎯 Key takeaway

A pooled estimate is not automatically more reliable than individual study findings; it is only more reliable if the studies were sufficiently similar to pool. The three gates (clinical, methodological, and statistical homogeneity) are the tests. An I² above 75% is the evidence that the studies are telling different stories, not the same one. For WHO India work, the most common and most consequential error is citing a pooled LMIC estimate as India evidence when the India-specific study in the pool shows no significant effect. Report India separately. The heterogeneity is the finding. Session 5 closes Level 2 with surveillance: how to keep your evidence base current once you have built it.