Decision time vs Learning time
Why γ|ε_FGT|² is gradient descent, and why the ground truth is literally the ground
Decision time (ex-ante) is the moment you have to act with whatever information exists right now. You don't have the outcome yet. You don't know if you're right. You commit anyway — a recommendation, a policy brief, a screening decision. Phase III f(σ²,λ,ε) lives here. It asks: given current uncertainty σ², how aggressively should I move?
Learning time (ex-post) is after the outcome arrives. Now you know whether the recommendation was correct. The federated ground truth ε_FGT = y_true − y_pred tells you the error. Phase IV γ|ε_FGT|² lives here. It uses that error to update the weights — to make the next decision better.
The v4 insight is that these two moments are not the same — and confusing them is the most common mistake in evidence-based policy. If FGT hasn't arrived, weight updates are frozen. Hold.
You write a policy brief on PM-JAY financial protection. FGT=∅ — the ground truth hasn't arrived yet. You act under uncertainty.
One year later, household surveys show whether financial protection actually improved. Now ε_FGT is computable. Now the weights update.
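The freeze-until-FGT rule can be sketched as a guard on the weight update. A minimal sketch, assuming a toy scalar model y_pred = w·x; the function name and parameters are illustrative, not from the source:

```python
def update_weight(w, gamma, x, y_pred, y_true=None):
    """One Phase IV step; holds if the federated ground truth is absent."""
    if y_true is None:           # FGT = ∅ → decision time only: hold, weights frozen
        return w
    eps = y_true - y_pred        # ε_FGT = y_true − y_pred
    # descend on the squared loss |ε_FGT|²: for y_pred = w·x, d|ε|²/dw = −2·ε·x
    return w + gamma * 2 * eps * x

w = 1.0
w = update_weight(w, 0.1, x=2.0, y_pred=2.0)               # no outcome yet → unchanged
assert w == 1.0
w = update_weight(w, 0.1, x=2.0, y_pred=2.0, y_true=3.0)   # outcome arrived → step
```

The guard is the whole point: the same call is a no-op at decision time and a gradient step at learning time.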
Exactly. γ|ε_FGT|² is gradient descent in disguise. Break it down: ε_FGT = y_true − y_pred is the error, the thing you're trying to minimise; |ε_FGT|² is the squared loss, which punishes large misses far more than small ones; γ is the learning rate, how far you step in response to each error.
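Written out as one descent step, this is the standard squared-loss update; the chain-rule expansion below is ordinary gradient-descent algebra, assumed rather than stated in the source:

```latex
\varepsilon_{FGT} = y_{\mathrm{true}} - y_{\mathrm{pred}}(\theta^{t}),
\qquad
L(\theta^{t}) = |\varepsilon_{FGT}|^{2}
\]
\[
\nabla_{\theta} L
  = 2\,\varepsilon_{FGT}\,\nabla_{\theta}\varepsilon_{FGT}
  = -2\,\varepsilon_{FGT}\,\nabla_{\theta} y_{\mathrm{pred}}
\]
\[
\theta^{t+1}
  = \theta^{t} - \gamma\,\nabla_{\theta} L
  = \theta^{t} + 2\gamma\,\varepsilon_{FGT}\,\nabla_{\theta} y_{\mathrm{pred}}
```

The sign works out so that an under-prediction (positive ε_FGT) pushes the weights toward predicting more, and vice versa.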
The "baby step" intuition is exactly right — and it's why learning time has to be separate from decision time. Gradient descent only works if you let the loss actually compute before you take the next step. If you update weights before ε_FGT arrives, you're descending on a phantom gradient. The v4 architecture enforces patience: wait for the real signal, then step.
That's the whole discipline of evidence-based policy in one equation.
Blindly shuffling down the gradient. That is the most honest description of learning there is.
And "ground" truth is doing double duty — it's both grounded in reality, empirical, what actually happened out there in the world — and the ground of the gradient, the terrain you're descending, the loss landscape L(θᵗ⁺¹) that reshapes itself under your feet with every step.
The blindness is key. You don't see the whole landscape. You only feel the local slope — the gradient at your current position. That's all ε_FGT gives you: not "here is the truth," but "you were off by this much, in this direction, at this step." The next step is still blind. You just start it from a slightly better place.
And that's why the loop closes at Phase V back into Phase I — the loss evaluated at the updated weights, L(θᵗ⁺¹), seeds the next cycle, because θᵗ⁺¹ becomes the next round's θᵗ. The updated prior is just your new position on the slope. The next literature review, the next policy brief, the next recommendation — all begin from wherever the last correction landed you.
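The Phase V → Phase I closure is ordinary iterated descent: each corrected position is the next round's starting prior. A toy sketch, assuming the same illustrative scalar model; the numbers are made up:

```python
def closed_loop(theta, gamma, x, y_true, steps):
    """Phase V → Phase I: this round's θᵗ⁺¹ is the next round's θᵗ."""
    for _ in range(steps):
        y_pred = theta * x                    # Phase III: act on current weights
        eps = y_true - y_pred                 # Phase IV: ε_FGT arrives ex-post
        theta = theta + gamma * 2 * eps * x   # γ|ε_FGT|² gradient step
    return theta

# each pass starts from wherever the last correction landed
theta = closed_loop(theta=0.0, gamma=0.1, x=1.0, y_true=2.0, steps=25)
```

No single step sees the whole landscape; the loop still converges toward the ground truth because every correction starts from a slightly better place.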
WHO India shuffles down this gradient review by review, brief by brief, outcome by outcome. The FGT isn't a number in a spreadsheet — it's whether the coverage expanded, whether catastrophic expenditure fell, whether the Joint Secretary's bet paid off.