Five layers from raw evidence to embodied judgment
World → Perception → Agentic → Generative → Embodied. Each layer answers a distinct question. Each has a crème-de-la-crème tool stack. And a feedback loop ties them all together.
Layer I · World · What is true in the world?
"What is true across the general evidence space — before you touch a single paper?"
Problem framing → Search → Screen → Extract → Synthesise → Brief
Crème-de-la-crème tools · 2026
Primary · PECO-F framing
Claude Opus 4 / Sonnet 4.6
Nuanced LMIC reasoning. Best at disambiguating "health financing" from clinical questions. Output: structured PECO-F frame + search keyword set.
Sequential stack · Ukubona method
xAI Grok → Gemini → GPT-4o → Claude
Each model in sequence, cumulative prompt: previous output is fed forward. Not parallel — sequential and deliberate.
The Ukubona Sequential Stack · why this order
Step 1 · xAI Grok — Zeitgeist & Real-Time Signal
Grok reads X (Twitter) in real time and is updated on a ~24hr cycle. It captures the living discourse — what practitioners, policymakers, and critics are actually saying now. No other frontier model does this. Start here to ground the question in present reality before any archival pass.
Step 2 · Google Gemini — Archival Depth & Data Moat
Gemini is grounded in Google's unmatched data infrastructure: Search, Maps, YouTube, Scholar, and advertising signals that index human attention at planetary scale. Feed Grok's output here. Gemini anchors the zeitgeist in documented, retrievable evidence — especially strong on WHO, World Bank, and grey government sources.
Step 3 · OpenAI GPT-4o — Abstraction & Frameworks
GPT is the most powerful abstractor in the stack — and precisely because of that, the most prone to confident hallucination when ungrounded. Fed sequentially after Grok and Gemini, its tendency to confabulate is constrained by the prior context it must account for. Use it to push the accumulated evidence toward structured frameworks, ICER tables, and policy logic trees.
Step 4 · Anthropic Claude — Caution, Generation & Code
Claude closes the loop. Generous token window, extreme care with uncertainty, and unmatched at artifact generation: schemas, briefs, code, and structured outputs. The prior three models have grounded and stress-tested the prompt; Claude now builds. This is not the fastest path — it is the most defensible one.
+ 1 · Expert Human in the Loop — No AGI yet
The prompter is not neutral infrastructure. The Ukubona method treats the human expert as the fifth agent: setting the cumulative prompt strategy, reading what each model reveals about its own blind spots, and deciding when the stack has converged. This is a methodology, not a workflow. The sequence encodes a theory of where each model's epistemic character is strongest.
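The cumulative-prompt mechanic can be sketched in a few lines of Python. This is a sketch under stated assumptions: `call_model` is a hypothetical stand-in for the four vendor APIs, and the essential move is the fold — each model receives the question plus everything produced so far.

```python
# Sketch of the Ukubona sequential stack: each model's output is
# appended to the context the next model receives. `call_model` is
# a hypothetical adapter; in practice it routes to the vendor's API.
STACK = ["grok", "gemini", "gpt", "claude"]

def call_model(name: str, prompt: str) -> str:
    # Placeholder response; replace with a real API call per vendor.
    return f"[{name} response to {len(prompt)} chars of context]"

def run_stack(question: str) -> list[str]:
    context = question
    outputs = []
    for name in STACK:
        out = call_model(name, context)
        outputs.append(out)
        context += "\n\n" + out  # cumulative: fed forward, not parallel
    return outputs
```

The point of the fold is that later models cannot ignore earlier ones: GPT's abstraction step, for instance, must account for Grok's zeitgeist pass and Gemini's archival pass already sitting in its context.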
Feedback loop: if Layer V (Embodied) flags urban bias or transferability failure, the signal returns here — the WorldInput is tightened (e.g. population narrowed to "rural BPL households, Tier-3 districts") and the pipeline re-runs from this layer. This is the adaptive loop neither xAI nor the original session formalized.
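The WorldInput referenced above is never defined in the page; a minimal sketch, with hypothetical field names and a stdlib dataclass standing in for the Pydantic style used by the later schemas:

```python
# Hypothetical WorldInput container for Layer I. Field names are
# illustrative, not from the source.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WorldInput:
    question: str                        # PECO-F framed question
    population: str                      # e.g. "rural BPL households, Tier-3 districts"
    keywords: List[str] = field(default_factory=list)
    rerun_reason: Optional[str] = None   # set when Layer V sends the signal back

def tighten(w: WorldInput, reason: str, population: str) -> WorldInput:
    # Feedback: narrow the population, record why, then re-run from Layer I.
    return WorldInput(w.question, population, list(w.keywords), reason)
```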
Layer II · Perception · What is true in these documents?
"Which real papers and reports actually contain the evidence your question needs?"
Crème-de-la-crème tools · 2026
Primary · indexed literature
Dimensions.ai + Semantic Scholar
LMIC filters, grant/dataset cross-linking. Semantic Scholar TL;DRs pre-screen relevance. Together they surface what PubMed alone misses.
Primary · grey literature
Claude (PDF upload) + Humata
Unmatched on government PDFs: NSSO, HTAIn assessments, state NHAs, PM-JAY evaluation reports. Claude handles 200k-token documents; Humata for rapid single-doc Q&A.
Secondary · visual network
Connected Papers + Litmaps
Catch seminal works a keyword search misses. Essential for health financing: the citation graph reveals the 3 papers every other paper cites.
Secondary · citation quality
Scite.ai
Distinguishes supporting vs contrasting citations. Surfaces papers that refute PM-JAY findings — critical for equity briefing honesty.
PubMed + grey literature wiring · perception layer
app/services/pubmed.py + pdf.py
from Bio import Entrez
import fitz  # PyMuPDF

Entrez.email = "who-india@example.org"

def search_pubmed(query, n=10):
    h = Entrez.esearch(db="pubmed", term=query, retmax=n)
    return Entrez.read(h)["IdList"]

def extract_pdf(path):
    doc = fitz.open(path)
    return "\n".join(p.get_text() for p in doc)

# Grey sources: HTAIn, NSSO, state NHAs
GREY_URLS = [
    "https://htain.icmr.org.in/...",
    "https://mospi.gov.in/nsso...",
]
The grey literature gap is Layer II's defining problem. A PubMed-only Perception layer is a WEIRD-data bias machine. This layer must explicitly route to NSSO, HTAIn, state NHA portals, and MoHFW evaluation repositories — not as a supplement, but as co-equal sources. Claude PDF upload is the fastest path for dense government reports.
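Co-equal routing can be enforced mechanically. A sketch with a hypothetical minimal record shape: tag every hit with its channel and interleave indexed and grey results, so downstream screening never sees an indexed-only stream.

```python
# Sketch: merge indexed and grey results into one co-equal stream.
# The dict record shape is illustrative, not from the source.
def merge_sources(pubmed_ids, grey_urls):
    indexed = [{"id": pid, "channel": "indexed"} for pid in pubmed_ids]
    grey = [{"id": url, "channel": "grey"} for url in grey_urls]
    # Interleave rather than append, so screeners see both channels early.
    merged = []
    for pair in zip(indexed, grey):
        merged.extend(pair)
    longer = indexed if len(indexed) > len(grey) else grey
    merged.extend(longer[len(merged) // 2:])
    return merged
```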
Layer III · Agentic · What can be extracted and structured?
"Which findings, numbers, and metrics survive a rigorous extraction pass?"
Crème-de-la-crème tools · 2026
Primary · structured extraction
Elicit
Still the strongest single agent for health economics extraction. Pulls ICERs, equity metrics, population subgroups, cost-per-DALY tables across dozens of papers simultaneously. Outperforms GPT-4o on column consistency.
Secondary · complex documents
Claude Projects + Humata
For multi-PDF corpus extraction where Elicit doesn't ingest the document type. Claude Projects maintains extraction schema across the entire corpus context window.
PRISMA automation
Rayyan + Nested Knowledge
Rayyan for collaborative title/abstract screening with AI pre-label; Nested Knowledge for life-sciences PRISMA audit trails and meta-analysis setups.
Systematic review infra
DistillerSR
Enterprise-grade PRISMA compliance, HTA context. Use when the output must be defensible to a regulatory or HTA board (HTAIn submissions).
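The row shape these extraction tools should converge on is not shown in the page; a minimal sketch with hypothetical field names, using a stdlib dataclass in the spirit of the later schemas:

```python
# Hypothetical extraction row for the Agentic layer. Fields are
# illustrative; real HTA extractions carry many more columns.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractionRow:
    paper_id: str                 # DOI or grey-source URL
    icer: Optional[float]         # cost per DALY averted, if reported
    currency_year: Optional[int]  # ICERs are not comparable across years
    subgroup: str                 # e.g. "rural BPL", "urban insured"
    equity_note: str              # who is excluded from the sample
    prisma_stage: str             # "included" | "excluded", for the audit trail
```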
Layer IV · Generative · What should be communicated?
"How do structured findings become a decision-ready narrative?"
Crème-de-la-crème tools · 2026
Primary · synthesis + brief
Claude Opus 4 / Sonnet 4.6
Unmatched long-context synthesis. Maintains argument coherence across 50+ extracted rows. Calibrates tone for MoHFW Joint Secretary audience. Produces assumption statements, sensitivity narratives, and executive summaries in one pass.
Secondary · multi-doc corpus
NotebookLM (Google)
Excels when source PDFs must remain cited and attributable in the brief. Audio overview feature useful for rapid team orientation on a new corpus.
Summary layer · plain language
SciSpace + Scholarcy
For explaining complex clinical methods to non-specialist policy audiences. SciSpace annotates PDFs in real time; Scholarcy generates flashcard-style triage.
Medical synthesis
OpenEvidence + Evidence Hunt
For clinical evidence threads embedded in health financing questions. OpenEvidence is fastest for evidence-based Q&A from clinical literature.
Synthesis schema
app/schemas/generative.py
from typing import List, Optional
from pydantic import BaseModel

class Synthesis(BaseModel):
    summary: str                 # 2–3 sentence executive lead
    key_findings: List[str]      # ordered by policy weight
    uncertainties: List[str]     # "robust if X, fails if Y"
    equity_summary: str          # who benefits, who is excluded
    budget_note: Optional[str]   # fiscal space context
    brief_draft: str             # the policy brief itself
The brief-to-decision gap lives here. A 45-page systematic review does not serve a Joint Secretary under a 48-hour deadline. The Generative layer's sole test: can the brief_draft field be handed to a decision-maker right now? If not, the synthesis has not compressed far enough.
Layer V · Embodied · Should we act on it?
"Does this evidence actually warrant the decision the brief recommends — given this specific context?"
Problem framing → Search → Screen → Extract → Synthesise → Judgment
This is the Human layer · no AI substitute (yet)
Primary audit stack · 2026
Human + Claude Projects / Grok-3 / GPT-4o
The "Embodied" layer is not a tool — it is the WHO India health economist applying four checks that no current LLM passes reliably: equity auditing (who is excluded), transferability (Tamil Nadu is not Bihar), budget reality (fiscal space), and political feasibility (what MoHFW will actually act on).
Four embodied checks
Check 1 · Equity
Who is excluded from the evidence base? Are the included populations representative of the BPL households PM-JAY targets, or do they proxy urban, insured, or literate populations?
Check 2 · Transferability
A finding from Kerala does not transfer to Bihar without a transferability statement. What infrastructure, literacy, and provider density assumptions does the evidence embed?
Check 3 · Budget reality
Is the recommended intervention within fiscal space? An ICER below the threshold is irrelevant if the Ministry cannot allocate the implementation budget in the current cycle.
Check 4 · Strategic distortion
Who commissioned this evidence? Who benefits from the recommendation? Embodied judgment is the layer that reads the political economy of the evidence — the one place AI cannot yet go.
Decision schema + feedback trigger
app/schemas/embodied.py
from typing import List, Optional
from pydantic import BaseModel

class Decision(BaseModel):
    decision: str                  # adopt | reject | conditional
    modifications: List[str]       # required caveats
    equity_flags: List[str]        # e.g. "urban bias"
    transferable: bool             # to target state/district
    political_flag: bool           # MoHFW action feasibility
    rerun_trigger: Optional[str]   # → refeed to World layer
The loop: when rerun_trigger is set (e.g. "urban bias detected → narrow to rural BPL"), the pipeline returns to Layer I. The WorldInput is updated, the search re-runs with tighter inclusion criteria, and Layers II–IV are re-executed. This is Ukubona's differentiator — the adaptive loop neither the original session nor xAI's version formalized.
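The rerun mechanic reads naturally as a bounded loop. A sketch under stated assumptions: `run_layers` and `judge` are hypothetical stand-ins for the pipeline and the human embodied audit, decisions are plain dicts, and the iteration cap stops a persistent flag from looping forever.

```python
# Sketch of the adaptive loop: re-run Layers I–IV until Layer V stops
# setting rerun_trigger. `run_layers` and `judge` are hypothetical.
def adaptive_loop(world_input, run_layers, judge, max_iters=3):
    decision = {}
    for _ in range(max_iters):
        brief = run_layers(world_input)      # Layers I–IV
        decision = judge(brief)              # Layer V: embodied checks
        trigger = decision.get("rerun_trigger")
        if not trigger:
            return decision                  # converged, no flag raised
        # Tighten the WorldInput per the trigger and go around again.
        world_input = {**world_input, "population": trigger}
    return decision  # last judgment, even if still flagged
```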
The complete chain
W → P → A → G → E ↻ W
world · perception · agentic · generative · embodied → rerun if flagged