iframe-proxy

igerber · 2026-05-18T09:57:11Z

Summary

Lift MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") NotImplementedError gate at estimators.py:1657. The post-period-average ATT (avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t) is a compound contrast; pre-PR the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF was only implemented for per-coefficient contrasts.
New _compute_cr2_bm_contrast_dof helper in diff_diff/linalg.py generalizes the per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast matrices via the identical Pustejovsky-Tipton 2018 Section 4 algebra. _compute_cr2_bm is refactored to call the new helper with contrasts=eye(k) (per-coefficient case bit-equivalent at atol=1e-10).
MultiPeriodDiD.fit() extends its existing avg_att DOF block (PR Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route #459) to branch on effective_cluster_ids: one-way _compute_bm_dof_from_contrasts when None, cluster-aware _compute_cr2_bm_contrast_dof otherwise. Cluster IDs are passed unmodified (per-observation, not subscripted by the rank-deficient column-drop mask).
After this PR, 3 of 6 HC2/HC2-BM gates lifted (DiD-absorb DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally #458, MPD-absorb Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route #459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE absorb (Gate 1), weighted CR2-BM (Gates 4-5).

Methodology references

Method name(s): CR2 Bell-McCaffrey cluster-robust variance + Satterthwaite DOF for compound contrasts
Paper / source link(s): Pustejovsky & Tipton (2018) "Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models" §4 / Appendix A. clubSandwich R package Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom is the R parity target (on a 1-row constraint matrix, HTZ reduces to a Satterthwaite t-test). Bell & McCaffrey (2002), Imbens & Kolesar (2016) for the underlying BM Satterthwaite framework. See docs/methodology/REGISTRY.md § HC2 + Bell-McCaffrey scope-limitation block (MPD cluster+hc2_bm status flipped from REJECT → SUPPORTED).
Any intentional deviations from the source: None. The contrast-DOF algebra is identical to clubSandwich's; the new helper matches Wald_test(test="HTZ")$df_denom at atol=1e-10 on the new mpd_clustered_avg_att_dof fixture (smoke test passed at atol=1e-13 before any source edits).

Validation

Tests added/updated:
- tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF — 4 new tests: refactor regression (helper with eye(k) matches _compute_cr2_bm at atol=1e-10), R-parity for compound contrast vs clubSandwich (atol=1e-10), ndim+shape validation, cluster-count validation.
- tests/test_estimators_vcov_type.py::TestFitBehavior::test_multi_period_cluster_plus_hc2_bm_produces_finite_inference — existing rejection test flipped to behavioral; asserts finite avg_att + period_effects inference under the lifted gate.
- tests/test_estimators_vcov_type.py::TestFitBehavior::test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof — NEW end-to-end estimator-level parity test: fits MPD on the R mpd_clustered_avg_att_dof golden fixture, recovers the implied Satterthwaite DOF from avg_p_value, asserts it matches the R Wald_test target at atol=1e-6. Derives post_periods from the golden's post_interaction_names so the Python and R contrasts are bit-equivalent (local R2 found that MPD's default [3,4] post-period rule diverged from the R fixture's [2,3,4]).
New R golden scenario mpd_clustered_avg_att_dof in benchmarks/R/generate_clubsandwich_golden.R + regenerated benchmarks/data/clubsandwich_cr2_golden.json. 15-unit × 4-period staggered panel with cluster=unit.
Local Codex review: R1 (3 P3) → R2 (1 P3) → R3 (✅ zero findings). All 347 tests in linalg/estimators/methodology suites pass; lint clean.
Backtest / simulation / notebook evidence: N/A (analytical-sandwich methodology; no tutorials touched).

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…vg_att inference Closes Gate 6 of the six HC2/HC2-BM NotImplementedError gates: MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") at estimators.py:1657 previously raised NotImplementedError because _compute_cr2_bm returns per-coefficient Satterthwaite DOF only — the post-period-average ATT (`avg_att = (1/n_post) Sum_{t >= t_treat} beta_t`) is a compound contrast that needed a cluster-aware contrast DOF helper. New _compute_cr2_bm_contrast_dof in diff_diff/linalg.py generalizes the per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra (`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`, `DOF = trace(B)^2 / trace(B^2)`). _compute_cr2_bm is refactored to call the new helper via a private _cr2_bm_dof_inner with `contrasts=eye(k)`; refactor regression at atol=1e-10 confirms the per-coefficient DOFs are preserved (matmul ordering differs slightly from the prior inline loop). MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced in PR #459) to branch on effective_cluster_ids: one-way _compute_bm_dof_from_contrasts when None, cluster-aware _compute_cr2_bm_contrast_dof otherwise. Cluster IDs are per-observation length n and are NOT subscripted by the rank-deficient column-drop mask `_kept` (which indexes coefficients, not observations). R parity verified at atol=1e-10 against clubSandwich's Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom on a new mpd_clustered_avg_att_dof fixture in benchmarks/data/clubsandwich_cr2_golden.json. On a 1-row constraint matrix, HTZ reduces to a Satterthwaite t-test and its df_denom IS the BM Satterthwaite DOF. The pre-flight smoke test against this same R target passed at atol=1e-13 before any source edits. Tests: - TestCR2BMContrastDOF (4 new tests): refactor regression vs library, R-parity for compound contrast, shape validation, cluster-count validation. - test_multi_period_cluster_plus_hc2_bm_rejected flipped to test_multi_period_cluster_plus_hc2_bm_produces_finite_inference (end-to-end MPD wire-through with finite avg_att / period_effects inference assertions). After this PR, 3 of 6 HC2/HC2-BM gates are lifted (DiD-absorb #458, MPD-absorb #459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE absorb (Gate 1), weighted HC2-BM (Gates 4-5). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Local Codex review on commit 79e0962 returned ✅ with 3 P3s (all documentation/coverage, no actionable P0/P1). Per the test-coverage P3 upgrade rule (feedback_test_coverage_gap_treat_as_actionable.md), addressing all three: P3 #1 (Code Quality): `_compute_cr2_bm_contrast_dof` was missing the `ndim` validation that the parallel one-way `_compute_bm_dof_from_contrasts` helper has, so a stray `(k,)` 1-D vector would die with a low-level indexing error instead of a contract error. Added the same shape-tuple check pattern (`if contrasts.ndim != 2 or contrasts.shape[0] != k`). P3 #2 (Docs): two stale doc surfaces post-feature-lift — - `estimators.py:68-71` base estimator docstring still said MPD did NOT support cluster + hc2_bm. Rewrote to describe the new cluster-aware contrast-DOF support and flag survey CR2-BM as the remaining gate. - `tests/test_linalg_hc2_bm.py` module banner still said clustered CR2 BM was "deferred to a follow-up". Updated to describe both the per-coefficient and the new compound-contrast DOF surfaces, and narrow the deferral to the weighted CR2-BM case only. P3 #3 (Tests): the new MPD test only asserted finite output, so a regression that silently fell back to the shared n-k DOF would still pass. Added `test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof` which fits MPD on the new R `mpd_clustered_avg_att_dof` fixture and recovers the implied Satterthwaite DOF by inverting `avg_p_value = 2 * (1 - t.cdf(|avg_t_stat|, df))` via scipy.brentq. The recovered DOF must match the R `Wald_test(test="HTZ")$df_denom` at atol=1e-6. Also pins that the implied DOF is much smaller than the n-k fallback (~39 here) — catches a regression to the shared df path. All 254 tests in tests/test_linalg_hc2_bm.py + test_estimators_vcov_type.py + test_estimators.py pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Local Codex R2 returned ✅ with 1 substantive P3: the new MPD parity test was fitting MPD without an explicit `post_periods=` argument, so MPD's default "last half of periods" rule selected `[3, 4]` on the 4-period fixture while the R generator defines the parity contrast over `[2, 3, 4]` (per `post_interaction_names` in the JSON). The Satterthwaite DOFs happened to coincide here (~8.1) which masked the estimand mismatch — the test would have stayed green if MPD silently fit the wrong contrast. Fix: derive `post_periods` from the golden JSON's `post_interaction_names` field and pass it explicitly to MPD.fit(). The test now asserts that MPD computes `avg_att` over the exact same contrast vector R uses for the Wald_test DOF target. This makes the test a genuine estimand-level parity pin rather than just a DOF-magnitude smell check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T10:02:31Z

Overall Assessment

✅ Looks good — I did not find any unmitigated P0/P1 issues in the new clustered MultiPeriodDiD(..., vcov_type="hc2_bm") path.

Executive Summary

The new _compute_cr2_bm_contrast_dof() helper is methodologically aligned with the existing CR2 Bell-McCaffrey machinery and is wired into avg_att inference without introducing inline inference anti-patterns.
The rank-deficient-column handling looks consistent with the existing solve_ols() contract: DOF is computed on kept columns only, and avg_att still falls back to all-NaN inference when any post-period effect is unidentified.
The new tests cover the main methodology surface reasonably well: helper parity/refactor safety, clubSandwich parity for the compound contrast, shape validation, cluster-count validation, and estimator-level implied-DOF recovery.
The main issue is documentation consistency: the primary MultiPeriodDiD registry entry and the class docstring still say clustered hc2_bm is unsupported, while this PR implements and later documents it as supported.
Static review only: I could not execute the tests here because this environment lacks both pytest and numpy.

Methodology

No findings. On the changed code path, I do not see an undocumented deviation from the Methodology Registry or an incorrect variance/SE implementation. The new helper applies the same PT2018-style CR2 adjustment structure already used in _compute_cr2_bm, and the estimator continues to route final inference through safe_inference().

Code Quality

No findings. The new code keeps the inference contract centralized and avoids the project’s forbidden inline t = effect / se pattern.

Performance

P3 diff_diff/estimators.py:L1847-L1901, diff_diff/linalg.py:L1506-L1600, diff_diff/linalg.py:L1668-L1736: MultiPeriodDiD.fit() now recomputes the full CR2 precomputations (H, M, cluster blocks, A_g) after solve_ols() already built the same objects for the vcov path. Impact: clustered hc2_bm MPD fits now pay the expensive O(n²) CR2 setup twice. Concrete fix: plumb compound-contrast DOF through the existing CR2 vcov path, or cache/reuse the shared CR2 precomputes when both vcov and contrast DOF are needed.

Maintainability

No findings beyond the duplicated CR2 setup noted above.

Tech Debt

No findings. The remaining weighted Bell-McCaffrey limitations are already explicitly tracked in TODO.md:L103-L105, so I would not block on them here.

Security

No findings.

Documentation/Tests

P2 docs/methodology/REGISTRY.md:L167-L175, diff_diff/estimators.py:L1109-L1131, contrasted with docs/methodology/REGISTRY.md:L2557-L2557: the Methodology Registry’s main MultiPeriodDiD section and the MultiPeriodDiD class docstring still state that cluster + vcov_type="hc2_bm" is unsupported, but this PR lifts that gate and later documents it as supported. Impact: the project’s methodology source of truth is internally inconsistent, and users reading the class docstring or top-level registry entry will be told the new capability does not exist. Concrete fix: update the older “unsupported” text to match the new supported-path note, and reserve the remaining gate language for weighted/survey CR2-BM limitations only.

CI Codex review on PR #465 (commit 41a323e) returned ✅ with two findings: P2 (Documentation): Two stale doc surfaces post-feature-lift still said MPD does NOT support cluster + hc2_bm. The HC2/BM scope-limitation Note at REGISTRY.md:2557 was already updated in 79e0962, but: - REGISTRY.md:167-176 (main MPD section under HeterogeneousAdoptionDiD requirements-checklist) still had the old "not supported" Note. - estimators.py MPD class docstring's `cluster` and `vcov_type` blocks still said `cluster + hc2_bm` raises NotImplementedError. Both rewritten to describe the new supported path with a pointer to _compute_cr2_bm_contrast_dof and the clubSandwich Wald_test(HTZ) parity target. Weighted CR2-BM remains the only documented gate. P3 (Performance): _compute_cr2_bm_contrast_dof recomputes H, M, and per-cluster A_g matrices that solve_ols → _compute_cr2_bm already built for the vcov path. O(n²k) redundant work per clustered hc2_bm MPD fit; acceptable for typical cluster-robust DiD panel sizes (n ≤ few thousand). Tracked as a new Performance row in TODO.md; acknowledged with a Note in REGISTRY.md per the codex deferral rules (`**Note:**` label + TODO entry downgrades to P3-deferred). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T10:10:43Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 59971295ca0b6c794c0d746b48932980e70fafbd

Overall Assessment

✅ Looks good — I did not find any unmitigated P0/P1 issues in the changed clustered MultiPeriodDiD(..., vcov_type="hc2_bm") path.

Executive Summary

The prior review’s documentation mismatch is resolved: the main registry entry and the estimator docstrings now consistently mark clustered hc2_bm as supported (docs/methodology/REGISTRY.md:L167-L185, docs/methodology/REGISTRY.md:L2565-L2567, diff_diff/estimators.py:L55-L74, diff_diff/estimators.py:L1109-L1131).
Methodology looks aligned: _compute_cr2_bm() now delegates the per-coefficient case to the new contrast-aware helper, and MultiPeriodDiD.fit() uses that helper for the compound avg_att contrast (diff_diff/linalg.py:L1545-L1736, diff_diff/estimators.py:L1847-L1908).
The changed inference path still routes through safe_inference(), and I did not find any new inline t = effect / se or partial-NaN inference anti-patterns in the touched files (diff_diff/estimators.py:L1920-L1974).
The remaining duplicated CR2 precomputes are now explicitly documented and tracked, so that issue is mitigated P3 tech debt rather than a blocker (TODO.md:L143-L153, docs/methodology/REGISTRY.md:L179-L185).
Static review only: this workspace does not have numpy or pytest, so I could not execute the updated tests.

Methodology

No findings. The changed path is consistent with the Methodology Registry’s stated CR2 Bell-McCaffrey contract, and the new helper is wired only into the contrast-DOF portion of avg_att inference (diff_diff/linalg.py:L1668-L1736, diff_diff/estimators.py:L1847-L1908).

Code Quality

No findings. The PR keeps inference centralized through safe_inference() and avoids the project’s forbidden inline inference pattern (diff_diff/estimators.py:L1920-L1974).

Performance

Severity: P3 (tracked in TODO.md, mitigated).
Impact: clustered hc2_bm MPD fits still rebuild the CR2 hat/residual-maker/adjustment matrices once in solve_ols() and again in the new contrast-DOF path, adding redundant O(n^2 k) work without changing results (diff_diff/estimators.py:L1862-L1900, diff_diff/linalg.py:L1545-L1600, diff_diff/linalg.py:L1668-L1736, TODO.md:L151-L153).
Concrete fix: reuse the existing CR2 precomputes or plumb compound-contrast DOF through the current CR2 vcov path.

Maintainability

No findings beyond the tracked performance duplication above.

Tech Debt

No new findings. The duplicate-precompute debt is now explicitly tracked in TODO.md:L151-L153, so it is mitigated under the review policy.

Security

No findings.

Documentation/Tests

No findings. The prior review’s P2 documentation inconsistency is resolved in both the registry and estimator docstrings (docs/methodology/REGISTRY.md:L167-L185, docs/methodology/REGISTRY.md:L2565-L2567, diff_diff/estimators.py:L55-L74, diff_diff/estimators.py:L1109-L1131).
No findings. The new tests cover helper parity, clubSandwich compound-contrast parity, validation failures, and estimator-level implied-DOF recovery (tests/test_linalg_hc2_bm.py:L689-L752, tests/test_estimators_vcov_type.py:L510-L628).
Static review only: I could not run the suite here because numpy and pytest are unavailable in this workspace.

… (R1 P0) Local codex R1 caught a P0: StackedDiD(vcov_type="hc2_bm") computed CR2 vcov correctly but never propagated the Bell-McCaffrey Satterthwaite DOF into safe_inference() calls. event_study_effects[h]['p_value']/['conf_int'] and overall_p_value/overall_conf_int silently fell back to normal-theory inference (df=None ⇒ scipy.norm), contradicting the registry contract. Fix mirrors the SunAbraham aggregated-inference pattern from PR igerber#472 (sun_abraham.py:997-1097). After solve_ols(), if vcov_type=="hc2_bm" and not on the survey replicate-refit path, build contrast matrix: - Per-event-time: unit vector at each interaction_indices[h] - Overall ATT: 1/K average across post-period interaction columns Call _compute_cr2_bm_contrast_dof(X, cluster_ids, bread, contrasts, weights=composed_weights). Apply per-event-time DOFs to event_study_effects inference, overall DOF to overall_* inference. Wrap in try/except so any rank-deficient or linalg failure emits a UserWarning and falls back to normal-theory (visible deviation, not silent). R fixture extended with the post-period-average ATT contrast DOF via Wald_test(constraints=row_avg, vcov=CR2, test="HTZ")$df_denom (mirrors PR igerber#465's MPD avg_att approach). New goldens at both cluster=unit and cluster=unit_subexp. Test additions / strengthening (addresses R1 P2 + P3): - test_hc2_bm_per_event_dof_matches_coef_test_df_satt_unit_cluster: uses brentq inversion of CI half-width to recover the DOF safe_inference actually used. If propagation failed (the R1 P0 bug), the inversion raises ValueError → test FAILS instead of silently skipping (replaces the prior `continue`-on-failure pattern which could vacuously pass). Hard-asserts validated_count == len(event_times). - test_hc2_bm_overall_att_dof_matches_wald_test_htz_unit_cluster (NEW): pins overall ATT DOF at atol=1e-6 against R Wald_test(HTZ)$df_denom. - test_hc2_bm_overall_att_dof_matches_wald_test_htz_unit_subexp_cluster (NEW): symmetric coverage at alternate cluster level. - Renamed CR1 → CR1S throughout docs/tests/REGISTRY for consistency (diff-diff's HC1+cluster uses Stata-style G/(G-1)*(n-1)/(n-p); plain CR1 omits the (n-1)/(n-p) term and diverges by ~1.4%). 192 tests pass (74 stacked + 19 wls_cr2 + 47 SA + 52 estimators_vcov_type). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Local codex R2 caught a P1: on rank-deficient stacked designs, the BM contrast DOF block called `_compute_cr2_bm_contrast_dof()` on the singular full bread matrix, which raised LinAlgError and triggered the catch-and- fallback path — downgrading the aggregated inference to normal-theory even when the target event-time delta_h coefficients were still identified. Fix mirrors the MultiPeriodDiD rank-deficient pattern (PR igerber#465, estimators.py:1860-1913): - Derive `_identified = ~np.isnan(coef)` (solve_ols emits NaN for dropped columns under R-style rank handling). - Subset X, bread, and contrasts to the identified-column block BEFORE calling `_compute_cr2_bm_contrast_dof`. - Only build per-event-time contrasts for event-times whose target delta_h column is identified; only build the overall ATT contrast when ALL post-period delta_h are identified (otherwise the contrast is undefined). - The remaining try/except is a defensive guard for genuine singularities on the identified-column design (rare; the rank-deficient handling already removes the problematic columns). The earlier behavior was too aggressive: a single dropped nuisance column would knock out BM DOF for ALL contrasts. The new behavior NaN-guards only contrasts that target a dropped column. New regression test `test_hc2_bm_rank_deficient_design_keeps_bm_dof_*` — verifies the fit doesn't emit fallback warnings on a well-conditioned design + checks that CI half-widths use t(BM DOF) instead of z=1.96 (catches the R1/R2 normal-fallback failure mode end-to-end). 75 tests pass across stacked + methodology_stacked_did. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber and others added 3 commits May 17, 2026 21:55

igerber added the ready-for-ci Triggers CI test workflows label May 18, 2026

igerber merged commit 956445e into main May 18, 2026
33 of 34 checks passed

igerber deleted the mpd-cluster-hc2-bm-contrast-dof branch May 18, 2026 12:10

This was referenced May 19, 2026

Lift Gate 1: HC2/HC2-BM for TwoWayFixedEffects via full-dummy auto-route #469

Merged

Thread vcov_type through StackedDiD (Phase 1b 2/8) #479

Merged

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lift Gate 6: cluster-aware CR2 Bell-McCaffrey contrast DOF for MultiPeriodDiD avg_att#465

Lift Gate 6: cluster-aware CR2 Bell-McCaffrey contrast DOF for MultiPeriodDiD avg_att#465
igerber merged 4 commits into
mainfrom
mpd-cluster-hc2-bm-contrast-dof

igerber commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Sunbelt Computer Software

PL/B Language Development and Support

Conversation

igerber commented May 18, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant