Decontaminated Endpoint Scaling Law (Compute Vs Final Loss)
Same Chinchilla-style 3-parameter fit as the main report:
\(L_\infty(C) = E + A\,(C/10^{18})^{-\alpha}\), fit per
\((\textrm{mix}, \textrm{LR}, \textrm{val set})\) on the small ladder
(3e18 → 3e20) by \(\texttt{scipy.optimize.curve\_fit}\) with \(E < \min y\), \(A,\alpha \ge 0\),
using the identical code path (\(\texttt{fit\_floor\_power}\)). There are four val sets: the original
(contaminated) 12,500-window math val as the anchor, and three paranoid decon sets that drop val docs
with any verified train near-duplicate at Jaccard ≥ 0.90, 0.75, or 0.50. Losses come from the
decon eval sweep (one v6e-4 job per checkpoint, with all four sets evaluated together in-harness).
The 1e21 and 1e22 actuals are plotted as triangles and are held out: the fit uses only points
below the cutoff. The log-log 2-parameter fit is available as a toggle.
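
For readers who want to reproduce the shape of the 3-parameter fit, the following is a minimal sketch and not the report's actual \(\texttt{fit\_floor\_power}\). It assumes the constraints are passed to \(\texttt{curve\_fit}\) as box bounds. The names `floor_power` and `fit_floor_power_sketch`, the starting point `p0`, the intermediate ladder budgets, and the synthetic losses are all hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

C_REF = 1e18  # reference compute used to normalise C, as in the formula above


def floor_power(c, E, A, alpha):
    """Floor-plus-power-law form: E + A * (C / 1e18) ** (-alpha)."""
    return E + A * (np.asarray(c, dtype=float) / C_REF) ** (-alpha)


def fit_floor_power_sketch(compute, loss):
    """Hypothetical stand-in for the report's fit; returns (E, A, alpha)."""
    compute = np.asarray(compute, dtype=float)
    loss = np.asarray(loss, dtype=float)
    y_min, y_max = loss.min(), loss.max()
    spread = max(y_max - y_min, 1e-6)

    # Constraints from the text: E < min(y), A >= 0, alpha >= 0.
    # curve_fit bounds are inclusive, so the upper bound on E is nudged
    # just below min(y) to keep the inequality strict.
    lower = [-np.inf, 0.0, 0.0]
    upper = [np.nextafter(y_min, -np.inf), np.inf, np.inf]

    # Assumed starting point, strictly inside the bounds.
    E0 = y_min - 0.5 * spread
    p0 = [E0, y_max - E0, 0.3]

    popt, _ = curve_fit(floor_power, compute, loss, p0=p0, bounds=(lower, upper))
    return tuple(float(p) for p in popt)


# Synthetic, noiseless example on an assumed small ladder (3e18 to 3e20).
# These are NOT measured losses; they are generated from known parameters.
compute = np.array([3e18, 1e19, 3e19, 1e20, 3e20])
true_E, true_A, true_alpha = 1.0, 0.8, 0.25
loss = floor_power(compute, true_E, true_A, true_alpha)

E_hat, A_hat, alpha_hat = fit_floor_power_sketch(compute, loss)

# Extrapolate to a held-out budget that the fit never saw.
pred_1e21 = float(floor_power(1e21, E_hat, A_hat, alpha_hat))
print(f"E={E_hat:.4f} A={A_hat:.4f} alpha={alpha_hat:.4f} L(1e21)={pred_1e21:.4f}")
```

In this sketch the held-out budgets only enter through the final extrapolation, mirroring how the 1e21 and 1e22 actuals are compared against the curve without influencing it. With real, noisy losses the three parameters are strongly correlated, so the recovered values can be sensitive to `p0`.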