Five layers from raw evidence to embodied judgment
World → Perception → Agentic → Generative → Embodied. Each layer answers a distinct question. Each has a crème-de-la-crème tool stack. And a feedback loop ties them all together.
Layer I · World · What is true in the world?
"What is true across the general evidence space — before you touch a single paper?"
Problem framing → Search → Screen → Extract → Synthesise → Brief
Crème-de-la-crème tools · 2026
Primary · PECO-F framing
Claude Opus 4 / Sonnet 4.6
Nuanced LMIC reasoning. Best at disambiguating "health financing" from clinical questions. Output: structured PECO-F frame + search keyword set.
Sequential stack · Ukubona method
xAI Grok → Gemini → GPT-4o → Claude
Each model in sequence, cumulative prompt: previous output is fed forward. Not parallel — sequential and deliberate.
The Ukubona Sequential Stack · why this order
Step 1 · xAI Grok — Zeitgeist & Real-Time Signal
Grok reads X (Twitter) in real time and is updated on a ~24hr cycle. It captures the living discourse — what practitioners, policymakers, and critics are actually saying now. No other frontier model does this. Start here to ground the question in present reality before any archival pass.
Step 2 · Google Gemini — Archival Depth & Data Moat
Gemini is grounded in Google's unmatched data infrastructure: Search, Maps, YouTube, Scholar, and advertising signals that index human attention at planetary scale. Feed Grok's output here. Gemini anchors the zeitgeist in documented, retrievable evidence — especially strong on WHO, World Bank, and grey government sources.
Step 3 · OpenAI GPT-4o — Abstraction & Frameworks
GPT is the most powerful abstractor in the stack — and precisely because of that, the most prone to confident hallucination when ungrounded. Fed sequentially after Grok and Gemini, its tendency to confabulate is constrained by the prior context it must account for. Use it to push the accumulated evidence toward structured frameworks, ICER tables, and policy logic trees.
Step 4 · Anthropic Claude — Caution, Generation & Code
Claude closes the loop. Generous token window, extreme care with uncertainty, and unmatched at artifact generation: schemas, briefs, code, and structured outputs. The prior three models have grounded and stress-tested the prompt; Claude now builds. This is not the fastest path — it is the most defensible one.
+ 1 · Expert Human in the Loop — No AGI yet
The prompter is not neutral infrastructure. The Ukubona method treats the human expert as the fifth agent: setting the cumulative prompt strategy, reading what each model reveals about its own blind spots, and deciding when the stack has converged. This is a methodology, not a workflow. The sequence encodes a theory of where each model's epistemic character is strongest.
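The cumulative-prompt mechanic can be sketched in a few lines of Python. This is a sketch under stated assumptions: `call_model` is a hypothetical stand-in for the four vendor APIs, and the essential move is the fold — each model receives the question plus everything produced so far.

```python
# Sketch of the Ukubona sequential stack: each model's output is
# appended to the context the next model receives. `call_model` is
# a hypothetical adapter; in practice it routes to the vendor's API.
STACK = ["grok", "gemini", "gpt", "claude"]

def call_model(name: str, prompt: str) -> str:
    # Placeholder response; replace with a real API call per vendor.
    return f"[{name} response to {len(prompt)} chars of context]"

def run_stack(question: str) -> list[str]:
    context = question
    outputs = []
    for name in STACK:
        out = call_model(name, context)
        outputs.append(out)
        context += "\n\n" + out  # cumulative: fed forward, not parallel
    return outputs
```

The point of the fold is that later models cannot ignore earlier ones: GPT's abstraction step, for instance, must account for Grok's zeitgeist pass and Gemini's archival pass already sitting in its context.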
Feedback loop: if Layer V (Embodied) flags urban bias or transferability failure, the signal returns here — the WorldInput is tightened (e.g. population narrowed to "rural BPL households, Tier-3 districts") and the pipeline re-runs from this layer. This is the adaptive loop neither xAI nor the original session formalized.
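The WorldInput referenced above is never defined in the page; a minimal sketch, with hypothetical field names and a stdlib dataclass standing in for the Pydantic style used by the later schemas:

```python
# Hypothetical WorldInput container for Layer I. Field names are
# illustrative, not from the source.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WorldInput:
    question: str                        # PECO-F framed question
    population: str                      # e.g. "rural BPL households, Tier-3 districts"
    keywords: List[str] = field(default_factory=list)
    rerun_reason: Optional[str] = None   # set when Layer V sends the signal back

def tighten(w: WorldInput, reason: str, population: str) -> WorldInput:
    # Feedback: narrow the population, record why, then re-run from Layer I.
    return WorldInput(w.question, population, list(w.keywords), reason)
```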
Layer II · Perception · What is true in these documents?
"Which real papers and reports actually contain the evidence your question needs?"
Crème-de-la-crème tools · 2026
Primary · indexed literature
Dimensions.ai + Semantic Scholar
LMIC filters, grant/dataset cross-linking. Semantic Scholar TL;DRs pre-screen relevance. Together they surface what PubMed alone misses.
Primary · grey literature
Claude (PDF upload) + Humata
Unmatched on government PDFs: NSSO, HTAIn assessments, state NHAs, PM-JAY evaluation reports. Claude handles 200k-token documents; Humata for rapid single-doc Q&A.
Secondary · visual network
Connected Papers + Litmaps
Catch seminal works a keyword search misses. Essential for health financing: the citation graph reveals the 3 papers every other paper cites.
Secondary · citation quality
Scite.ai
Distinguishes supporting vs contrasting citations. Surfaces papers that refute PM-JAY findings — critical for equity briefing honesty.
PubMed + grey literature wiring · perception layer
app/services/pubmed.py + pdf.py
from Bio import Entrez
import fitz  # PyMuPDF

Entrez.email = "who-india@example.org"

def search_pubmed(query, n=10):
    h = Entrez.esearch(db="pubmed", term=query, retmax=n)
    return Entrez.read(h)["IdList"]

def extract_pdf(path):
    doc = fitz.open(path)
    return "\n".join(p.get_text() for p in doc)

# Grey sources: HTAIn, NSSO, state NHAs
GREY_URLS = [
    "https://htain.icmr.org.in/...",
    "https://mospi.gov.in/nsso...",
]
The grey literature gap is Layer II's defining problem. A PubMed-only Perception layer is a WEIRD-data bias machine. This layer must explicitly route to NSSO, HTAIn, state NHA portals, and MoHFW evaluation repositories — not as a supplement, but as co-equal sources. Claude PDF upload is the fastest path for dense government reports.
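Co-equal routing can be enforced mechanically. A sketch with a hypothetical minimal record shape: tag every hit with its channel and interleave indexed and grey results, so downstream screening never sees an indexed-only stream.

```python
# Sketch: merge indexed and grey results into one co-equal stream.
# The dict record shape is illustrative, not from the source.
def merge_sources(pubmed_ids, grey_urls):
    indexed = [{"id": pid, "channel": "indexed"} for pid in pubmed_ids]
    grey = [{"id": url, "channel": "grey"} for url in grey_urls]
    # Interleave rather than append, so screeners see both channels early.
    merged = []
    for pair in zip(indexed, grey):
        merged.extend(pair)
    longer = indexed if len(indexed) > len(grey) else grey
    merged.extend(longer[len(merged) // 2:])
    return merged
```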
Layer III · Agentic · What can be extracted and structured?
"Which findings, numbers, and metrics survive a rigorous extraction pass?"
Crème-de-la-crème tools · 2026
Primary · structured extraction
Elicit
Still the strongest single agent for health economics extraction. Pulls ICERs, equity metrics, population subgroups, cost-per-DALY tables across dozens of papers simultaneously. Outperforms GPT-4o on column consistency.
Secondary · complex documents
Claude Projects + Humata
For multi-PDF corpus extraction where Elicit doesn't ingest the document type. Claude Projects maintains extraction schema across the entire corpus context window.
PRISMA automation
Rayyan + Nested Knowledge
Rayyan for collaborative title/abstract screening with AI pre-label; Nested Knowledge for life-sciences PRISMA audit trails and meta-analysis setups.
Systematic review infra
DistillerSR
Enterprise-grade PRISMA compliance, HTA context. Use when the output must be defensible to a regulatory or HTA board (HTAIn submissions).
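The row shape these extraction tools should converge on is not shown in the page; a minimal sketch with hypothetical field names, using a stdlib dataclass in the spirit of the later schemas:

```python
# Hypothetical extraction row for the Agentic layer. Fields are
# illustrative; real HTA extractions carry many more columns.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractionRow:
    paper_id: str                 # DOI or grey-source URL
    icer: Optional[float]         # cost per DALY averted, if reported
    currency_year: Optional[int]  # ICERs are not comparable across years
    subgroup: str                 # e.g. "rural BPL", "urban insured"
    equity_note: str              # who is excluded from the sample
    prisma_stage: str             # "included" | "excluded", for the audit trail
```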
Layer IV · Generative · What should be communicated?
"How do structured findings become a decision-ready narrative?"
Crème-de-la-crème tools · 2026
Primary · synthesis + brief
Claude Opus 4 / Sonnet 4.6
Unmatched long-context synthesis. Maintains argument coherence across 50+ extracted rows. Calibrates tone for MoHFW Joint Secretary audience. Produces assumption statements, sensitivity narratives, and executive summaries in one pass.
Secondary · multi-doc corpus
NotebookLM (Google)
Excels when source PDFs must remain cited and attributable in the brief. Audio overview feature useful for rapid team orientation on a new corpus.
Summary layer · plain language
SciSpace + Scholarcy
For explaining complex clinical methods to non-specialist policy audiences. SciSpace annotates PDFs in real time; Scholarcy generates flashcard-style triage.
Medical synthesis
OpenEvidence + Evidence Hunt
For clinical evidence threads embedded in health financing questions. OpenEvidence is fastest for evidence-based Q&A from clinical literature.
Synthesis schema
app/schemas/generative.py
from typing import List, Optional
from pydantic import BaseModel

class Synthesis(BaseModel):
    summary: str                 # 2–3 sentence executive lead
    key_findings: List[str]      # ordered by policy weight
    uncertainties: List[str]     # "robust if X, fails if Y"
    equity_summary: str          # who benefits, who is excluded
    budget_note: Optional[str]   # fiscal space context
    brief_draft: str             # the policy brief itself
The brief-to-decision gap lives here. A 45-page systematic review does not serve a Joint Secretary under a 48-hour deadline. The Generative layer's sole test: can the brief_draft field be handed to a decision-maker right now? If not, the synthesis has not compressed far enough.
Layer V · Embodied · Should we act on it?
"Does this evidence actually warrant the decision the brief recommends — given this specific context?"
Problem framing → Search → Screen → Extract → Synthesise → Judgment
This is the Human layer · no AI substitute (yet)
Primary audit stack · 2026
Human + Claude Projects / Grok-3 / GPT-4o
The "Embodied" layer is not a tool — it is the WHO India health economist applying four checks that no current LLM passes reliably: equity auditing (who is excluded), transferability (Tamil Nadu is not Bihar), budget reality (fiscal space), and political feasibility (what MoHFW will actually act on).
Four embodied checks
Check 1 · Equity
Who is excluded from the evidence base? Are the included populations representative of the BPL households PM-JAY targets, or do they proxy urban, insured, or literate populations?
Check 2 · Transferability
A finding from Kerala does not transfer to Bihar without a transferability statement. What infrastructure, literacy, and provider density assumptions does the evidence embed?
Check 3 · Budget reality
Is the recommended intervention within fiscal space? An ICER below the threshold is irrelevant if the Ministry cannot allocate the implementation budget in the current cycle.
Check 4 · Strategic distortion
Who commissioned this evidence? Who benefits from the recommendation? Embodied judgment is the layer that reads the political economy of the evidence — the one place AI cannot yet go.
Decision schema + feedback trigger
app/schemas/embodied.py
from typing import List, Optional
from pydantic import BaseModel

class Decision(BaseModel):
    decision: str                  # adopt | reject | conditional
    modifications: List[str]       # required caveats
    equity_flags: List[str]        # e.g. "urban bias"
    transferable: bool             # to target state/district
    political_flag: bool           # MoHFW action feasibility
    rerun_trigger: Optional[str]   # → refeed to World layer
The loop: when rerun_trigger is set (e.g. "urban bias detected → narrow to rural BPL"), the pipeline returns to Layer I. The WorldInput is updated, the search re-runs with tighter inclusion criteria, and Layers II–IV are re-executed. This is Ukubona's differentiator — the adaptive loop neither the original session nor xAI's version formalized.
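The rerun mechanic reads naturally as a bounded loop. A sketch under stated assumptions: `run_layers` and `judge` are hypothetical stand-ins for the pipeline and the human embodied audit, decisions are plain dicts, and the iteration cap stops a persistent flag from looping forever.

```python
# Sketch of the adaptive loop: re-run Layers I–IV until Layer V stops
# setting rerun_trigger. `run_layers` and `judge` are hypothetical.
def adaptive_loop(world_input, run_layers, judge, max_iters=3):
    decision = {}
    for _ in range(max_iters):
        brief = run_layers(world_input)      # Layers I–IV
        decision = judge(brief)              # Layer V: embodied checks
        trigger = decision.get("rerun_trigger")
        if not trigger:
            return decision                  # converged, no flag raised
        # Tighten the WorldInput per the trigger and go around again.
        world_input = {**world_input, "population": trigger}
    return decision  # last judgment, even if still flagged
```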
The complete chain
W → P → A → G → E ↻ W
world · perception · agentic · generative · embodied → rerun if flagged