perf(linalg,mpd): compute MPD cluster+hc2_bm CR2 precomputes once#566
Conversation
ac5d27d to
2e20408
Compare
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment: ✅ Looks goodExecutive Summary
MethodologyNo findings. The shared core preserves the registered CR2 sandwich and Satterthwaite contrast-DOF formulas, and the registry now explicitly documents the one-precompute MPD path at Code QualityFinding 1 — P3 informational: dormant weighted fallback omits
PerformanceNo findings. The mechanism test at MaintainabilityNo findings beyond the P3 fallback note above. Centralizing CR2 precomputes reduces duplicated math paths. Tech DebtNo findings. The removed TODO row matches the completed MPD SecurityNo findings. No secrets or security-sensitive changes identified. Documentation/TestsNo findings. Static review of the added tests shows relevant coverage in |
MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") built the expensive CR2 Bell-McCaffrey per-cluster precomputes (A_g eigendecompositions, S_W, the residual-maker M = I - H) twice per fit: once in solve_ols's vcov path (whose per-coefficient DOF was then discarded) and again in the separate _compute_cr2_bm_contrast_dof call for the per-coef + post-period-average ATT Satterthwaite DOF. Extract a shared core `_compute_cr2_bm_vcov_and_dof` (vcov + DOF for arbitrary contrasts from one precompute build); `_compute_cr2_bm` and _compute_cr2_bm_contrast_dof` become thin bit-identical wrappers over it (removing the formerly-duplicated ~50-line precompute block, so every CR2 caller routes through one implementation). The shared filter guards the residuals subscript with `residuals is not None` so DOF-only callers that pass weights with zero rows do not crash. At the MPD fit level, bypass solve_ols's vcov on the cluster+hc2_bm (non-survey, unweighted) path and compute vcov + the combined per-coef/avg-ATT DOF from one call, then expand with NaN for dropped columns; the `survey_weights is None` clause keeps the bypass byte-identical to solve_ols (else it falls back to the prior two-call behavior). Bit-identical vcov_, per-period DOF, avg-ATT DOF, p-values, and CIs (proven at atol=0 across balanced, unbalanced, and rank-deficient designs); a mechanism test asserts the per-cluster adjustment matrix is built exactly once per cluster (down from twice). No methodology, numerical, or public-API change. Updates the stale REGISTRY note that described the prior twice-built precompute. Resolves the MPD cluster+hc2_bm Performance row in TODO.md and completes the MultiPeriodDiD follow-up noted in the #565 LinearRegression entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2e20408 to
974455d
Compare

Summary
_compute_cr2_bm_vcov_and_dofcore inlinalg.py; make_compute_cr2_bmand_compute_cr2_bm_contrast_dofthin bit-identical wrappers over it. Removes the ~50-line precompute block that was duplicated between the two functions, so every CR2 caller (_compute_robust_vcov_numpy, StackedDiD, SunAbraham, WooldridgeDiD, the weighted singleton dispatch, MPD) routes through one implementation.MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")(non-survey, unweighted) previously built the expensive CR2 Bell-McCaffrey per-cluster precomputes (theA_gadjustment-matrix eigendecompositions,S_W, the residual-makerM = I − H) twice per fit — once insolve_ols's vcov path (whose per-coefficient DOF was then discarded) and again in the separate_compute_cr2_bm_contrast_dofcall for the per-coefficient + post-period-average ATT Satterthwaite DOF. The fit now bypassessolve_ols's vcov on this path and computes vcov and the combined DOF from a single precompute build, then expands the reduced vcov with NaN for dropped columns.residualssubscript withresiduals is not None, so DOF-only callers passing weights with zero rows do not crash. Thesurvey_weights is Noneclause keeps the bypass byte-identical tosolve_ols(any hypothetical weighted path falls back to the prior two-call behavior).cluster+hc2_bmPerformance row fromTODO.md; completes the follow-up flagged in the perf(linalg): compute hc2_bm CR2 sandwich once, not twice (#475) #565LinearRegressionCHANGELOG entry.Methodology references (required if estimator / math changes)
docs/methodology/REGISTRY.md→ "Variance Estimation, Cluster-Robust SE";clubSandwich::vcovCR(type="CR2") + coef_test(test="Satterthwaite").Validation
tests/test_estimators_vcov_type.py(newTestMPDClusterHC2BMSharedPrecompute: asserts the per-cluster adjustment matrix is built exactlyGtimes — one precompute build, down from2·G— on balanced and unbalanced designs, plus fit determinism);tests/test_linalg_hc2_bm.py(wrapper/coreatol=0equivalence for the per-coef and DOF-only paths, unweighted + weighted; zero-weight +residuals=NoneDOF-only no-crash regression).atol=0forvcov_, per-period DOF, avg-ATT DOF, p-values, and CIs across balanced, unbalanced, and rank-deficient designs (60 fields × 3 scenarios). The existing R/clubSandwichgoldens (test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof,tests/test_linalg_hc2_bm.py) remain the independent absolute-value anchor and pass. Targeted suites green (test_estimators_vcov_type,test_linalg_hc2_bm,test_methodology_wls_cr2,test_estimators, plus the StackedDiD/SunAbraham/Wooldridge callers);black/ruffclean;mypy0 new errors vs baseline.Security / privacy
🤖 Generated with Claude Code