Lift Gate 6: cluster-aware CR2 Bell-McCaffrey contrast DOF for MultiPeriodDiD avg_att by igerber · Pull Request #465 · igerber/diff-diff · GitHub

Lift Gate 6: cluster-aware CR2 Bell-McCaffrey contrast DOF for MultiPeriodDiD avg_att#465

Merged
igerber merged 4 commits into main from mpd-cluster-hc2-bm-contrast-dof on May 18, 2026

Conversation

@igerber

@igerber igerber commented May 18, 2026

Owner

Summary

  • Lift MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") NotImplementedError gate at estimators.py:1657. The post-period-average ATT (avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t) is a compound contrast; pre-PR the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF was only implemented for per-coefficient contrasts.
  • New _compute_cr2_bm_contrast_dof helper in diff_diff/linalg.py generalizes the per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast matrices via the identical Pustejovsky-Tipton 2018 Section 4 algebra. _compute_cr2_bm is refactored to call the new helper with contrasts=eye(k) (per-coefficient case bit-equivalent at atol=1e-10).
  • MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced in PR #459, "Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route") to branch on effective_cluster_ids: one-way _compute_bm_dof_from_contrasts when None, cluster-aware _compute_cr2_bm_contrast_dof otherwise. Cluster IDs are passed unmodified (per-observation, not subscripted by the rank-deficient column-drop mask).
  • After this PR, 3 of 6 HC2/HC2-BM gates are lifted (DiD-absorb in #458, "DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally"; MPD-absorb in #459, "Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route"; MPD-cluster-contrast-DOF in this PR). Remaining: TWFE absorb (Gate 1), weighted CR2-BM (Gates 4-5).
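
As an illustration of why avg_att is a compound contrast, the sketch below writes the post-period average as `c @ beta` with weights `1/n_post` on the post-period coefficients. The helper name `average_contrast`, the coefficient values, and the covariance matrix are all hypothetical and are not taken from diff-diff.

```python
import numpy as np


def average_contrast(k, post_idx):
    """Return a length-k vector with weight 1/n_post on each post-period coefficient."""
    c = np.zeros(k)
    c[list(post_idx)] = 1.0 / len(post_idx)
    return c


# Hypothetical fit: 5 coefficients, the last three are post-period effects.
beta = np.array([0.2, -0.1, 1.0, 1.5, 2.0])
vcov = np.diag([0.04, 0.04, 0.09, 0.09, 0.09])  # illustrative covariance only

c = average_contrast(len(beta), post_idx=[2, 3, 4])
avg_att = c @ beta              # (1.0 + 1.5 + 2.0) / 3 = 1.5
avg_se = np.sqrt(c @ vcov @ c)  # sqrt(3 * 0.09 / 9) = sqrt(0.03)
```

Because `c` loads on several coefficients at once, none of the per-coefficient Satterthwaite DOFs applies to `avg_att`; the DOF has to be computed for `c` itself.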

Methodology references

  • Method name(s): CR2 Bell-McCaffrey cluster-robust variance + Satterthwaite DOF for compound contrasts
  • Paper / source link(s): Pustejovsky & Tipton (2018) "Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models" §4 / Appendix A. clubSandwich R package Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom is the R parity target (on a 1-row constraint matrix, HTZ reduces to a Satterthwaite t-test). Bell & McCaffrey (2002), Imbens & Kolesar (2016) for the underlying BM Satterthwaite framework. See docs/methodology/REGISTRY.md § HC2 + Bell-McCaffrey scope-limitation block (MPD cluster+hc2_bm status flipped from REJECT → SUPPORTED).
  • Any intentional deviations from the source: None. The contrast-DOF algebra is identical to clubSandwich's; the new helper matches Wald_test(test="HTZ")$df_denom at atol=1e-10 on the new mpd_clustered_avg_att_dof fixture (smoke test passed at atol=1e-13 before any source edits).

Validation

  • Tests added/updated:
    • tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF — 4 new tests: refactor regression (helper with eye(k) matches _compute_cr2_bm at atol=1e-10), R-parity for compound contrast vs clubSandwich (atol=1e-10), ndim+shape validation, cluster-count validation.
    • tests/test_estimators_vcov_type.py::TestFitBehavior::test_multi_period_cluster_plus_hc2_bm_produces_finite_inference — existing rejection test flipped to behavioral; asserts finite avg_att + period_effects inference under the lifted gate.
    • tests/test_estimators_vcov_type.py::TestFitBehavior::test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof — NEW end-to-end estimator-level parity test: fits MPD on the R mpd_clustered_avg_att_dof golden fixture, recovers the implied Satterthwaite DOF from avg_p_value, asserts it matches the R Wald_test target at atol=1e-6. Derives post_periods from the golden's post_interaction_names so the Python and R contrasts are bit-equivalent (local R2 found that MPD's default [3,4] post-period rule diverged from the R fixture's [2,3,4]).
  • New R golden scenario mpd_clustered_avg_att_dof in benchmarks/R/generate_clubsandwich_golden.R + regenerated benchmarks/data/clubsandwich_cr2_golden.json. 15-unit × 4-period staggered panel with cluster=unit.
  • Local Codex review: R1 (3 P3) → R2 (1 P3) → R3 (✅ zero findings). All 347 tests in linalg/estimators/methodology suites pass; lint clean.
  • Backtest / simulation / notebook evidence: N/A (analytical-sandwich methodology; no tutorials touched).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 3 commits May 17, 2026 21:55
…vg_att inference

Closes Gate 6 of the six HC2/HC2-BM NotImplementedError gates:
MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") at estimators.py:1657
previously raised NotImplementedError because _compute_cr2_bm returns
per-coefficient Satterthwaite DOF only — the post-period-average ATT
(`avg_att = (1/n_post) Sum_{t >= t_treat} beta_t`) is a compound
contrast that needed a cluster-aware contrast DOF helper.

New _compute_cr2_bm_contrast_dof in diff_diff/linalg.py generalizes the
per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast
matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra
(`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`,
`DOF = trace(B)^2 / trace(B^2)`). _compute_cr2_bm is refactored to
call the new helper via a private _cr2_bm_dof_inner with
`contrasts=eye(k)`; refactor regression at atol=1e-10 confirms the
per-coefficient DOFs are preserved (matmul ordering differs slightly
from the prior inline loop).
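
For readers who want to see the algebra spelled out, here is a minimal unweighted sketch of a CR2 Bell-McCaffrey Satterthwaite DOF for arbitrary contrasts, assuming a working-independence model and a full-rank design. It is not the diff-diff implementation: the function names, the signature, and the eigendecomposition-based inverse square root are assumptions made for illustration.

```python
import numpy as np


def _inv_sqrt_sym(S, tol=1e-12):
    """Symmetric inverse square root; near-zero eigen-directions are mapped to 0."""
    vals, vecs = np.linalg.eigh(S)
    inv_sqrt = np.where(vals > tol, 1.0 / np.sqrt(np.clip(vals, tol, None)), 0.0)
    return (vecs * inv_sqrt) @ vecs.T


def cr2_contrast_dof(X, cluster_ids, contrasts):
    """Satterthwaite DOF for each column of a (k, m) contrast matrix (hypothetical helper)."""
    n, k = X.shape
    contrasts = np.asarray(contrasts, dtype=float)
    if contrasts.ndim != 2 or contrasts.shape[0] != k:
        raise ValueError(f"contrasts must have shape ({k}, m); got {contrasts.shape}")
    groups = [np.flatnonzero(cluster_ids == g) for g in np.unique(cluster_ids)]
    if len(groups) < 2:
        raise ValueError("need at least two clusters")

    bread_inv = np.linalg.inv(X.T @ X)
    resid_maker = np.eye(n) - X @ bread_inv @ X.T  # I - H

    # CR2 adjustment matrices A_g = (I - H_gg)^(-1/2), one per cluster.
    adjust = [_inv_sqrt_sym(resid_maker[np.ix_(idx, idx)]) for idx in groups]

    dof = np.empty(contrasts.shape[1])
    for j in range(contrasts.shape[1]):
        c = contrasts[:, j]
        cols = []
        for idx, A_g in zip(groups, adjust):
            omega_g = A_g @ X[idx] @ bread_inv @ c      # length n_g
            cols.append(resid_maker[idx].T @ omega_g)   # length n
        G = np.column_stack(cols)                       # (n, n_clusters)
        B = G.T @ G
        dof[j] = np.trace(B) ** 2 / np.trace(B @ B)
    return dof
```

Passing `np.eye(k)` as `contrasts` gives one DOF per coefficient, which is the special case the refactor regression compares against; any other column gives the DOF for that compound contrast. The result can never exceed the number of clusters, since `B` has at most that many nonzero eigenvalues.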

MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced
in PR #459) to branch on effective_cluster_ids: one-way
_compute_bm_dof_from_contrasts when None, cluster-aware
_compute_cr2_bm_contrast_dof otherwise. Cluster IDs are per-observation
length n and are NOT subscripted by the rank-deficient column-drop
mask `_kept` (which indexes coefficients, not observations).

R parity verified at atol=1e-10 against clubSandwich's
Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom on a new
mpd_clustered_avg_att_dof fixture in
benchmarks/data/clubsandwich_cr2_golden.json. On a 1-row constraint
matrix, HTZ reduces to a Satterthwaite t-test and its df_denom IS the
BM Satterthwaite DOF. The pre-flight smoke test against this same R
target passed at atol=1e-13 before any source edits.

Tests:
- TestCR2BMContrastDOF (4 new tests): refactor regression vs library,
  R-parity for compound contrast, shape validation, cluster-count
  validation.
- test_multi_period_cluster_plus_hc2_bm_rejected flipped to
  test_multi_period_cluster_plus_hc2_bm_produces_finite_inference
  (end-to-end MPD wire-through with finite avg_att / period_effects
  inference assertions).

After this PR, 3 of 6 HC2/HC2-BM gates are lifted (DiD-absorb #458,
MPD-absorb #459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE
absorb (Gate 1), weighted HC2-BM (Gates 4-5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local Codex review on commit 79e0962 returned ✅ with 3 P3s (all
documentation/coverage, no actionable P0/P1). Per the test-coverage P3
upgrade rule (feedback_test_coverage_gap_treat_as_actionable.md),
addressing all three:

P3 #1 (Code Quality): `_compute_cr2_bm_contrast_dof` was missing the
`ndim` validation that the parallel one-way `_compute_bm_dof_from_contrasts`
helper has, so a stray `(k,)` 1-D vector would die with a low-level
indexing error instead of a contract error. Added the same shape-tuple
check pattern (`if contrasts.ndim != 2 or contrasts.shape[0] != k`).
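
A minimal sketch of that contract check, assuming a standalone helper (the name `validate_contrasts` and the error wording are hypothetical; only the condition itself is quoted from the fix above):

```python
import numpy as np


def validate_contrasts(contrasts, k):
    """Reject anything that is not a 2-D (k, m) array before any indexing happens."""
    contrasts = np.asarray(contrasts, dtype=float)
    if contrasts.ndim != 2 or contrasts.shape[0] != k:
        raise ValueError(
            f"contrasts must have shape (k, m) with k={k}; got shape {contrasts.shape}"
        )
    return contrasts


# A single contrast has to be passed as a column, not as a bare 1-D vector.
single = np.array([0.0, 0.5, 0.5])
ok = validate_contrasts(single[:, None], k=3)  # shape (3, 1)
```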

P3 #2 (Docs): two stale doc surfaces post-feature-lift —
  - `estimators.py:68-71` base estimator docstring still said MPD did
    NOT support cluster + hc2_bm. Rewrote to describe the new
    cluster-aware contrast-DOF support and flag survey CR2-BM as the
    remaining gate.
  - `tests/test_linalg_hc2_bm.py` module banner still said clustered
    CR2 BM was "deferred to a follow-up". Updated to describe both the
    per-coefficient and the new compound-contrast DOF surfaces, and
    narrow the deferral to the weighted CR2-BM case only.

P3 #3 (Tests): the new MPD test only asserted finite output, so a
regression that silently fell back to the shared n-k DOF would still
pass. Added `test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof`
which fits MPD on the new R `mpd_clustered_avg_att_dof` fixture and
recovers the implied Satterthwaite DOF by inverting
`avg_p_value = 2 * (1 - t.cdf(|avg_t_stat|, df))` via scipy.optimize.brentq. The
recovered DOF must match the R `Wald_test(test="HTZ")$df_denom` at
atol=1e-6. Also pins that the implied DOF is much smaller than the
n-k fallback (~39 here) — catches a regression to the shared df path.
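
The inversion can be sketched as follows. The helper name `implied_dof` and the bracket `[0.5, 1e6]` are assumptions, not the test's actual code; the round trip uses made-up values for the t statistic and DOF.

```python
from scipy import stats
from scipy.optimize import brentq


def implied_dof(t_stat, p_value, lo=0.5, hi=1e6):
    """Solve 2 * (1 - t.cdf(|t_stat|, df)) = p_value for df on [lo, hi]."""
    abs_t = abs(t_stat)

    def gap(df):
        # For fixed |t| the two-sided p-value decreases as df grows,
        # so gap changes sign exactly once inside a wide enough bracket.
        return 2.0 * stats.t.sf(abs_t, df) - p_value

    return brentq(gap, lo, hi, xtol=1e-12)


# Round trip with illustrative numbers: build a p-value from a known DOF, then recover it.
p = 2.0 * stats.t.sf(2.3, 8.1)
recovered = implied_dof(2.3, p)
```

If the estimator had silently used a different DOF (for example the shared n-k value), the recovered number would land far from the R target, which is what makes this check sensitive to a fallback regression.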

All 254 tests in tests/test_linalg_hc2_bm.py + test_estimators_vcov_type.py
+ test_estimators.py pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local Codex R2 returned ✅ with 1 substantive P3: the new MPD parity
test was fitting MPD without an explicit `post_periods=` argument, so
MPD's default "last half of periods" rule selected `[3, 4]` on the
4-period fixture while the R generator defines the parity contrast
over `[2, 3, 4]` (per `post_interaction_names` in the JSON). The
Satterthwaite DOFs happened to coincide here (~8.1), which masked the
estimand mismatch — the test would have stayed green if MPD silently
fit the wrong contrast.

Fix: derive `post_periods` from the golden JSON's
`post_interaction_names` field and pass it explicitly to MPD.fit().
The test now asserts that MPD computes `avg_att` over the exact same
contrast vector R uses for the Wald_test DOF target.

This makes the test a genuine estimand-level parity pin rather than
just a DOF-magnitude smell check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions


CI Codex review on PR #465 (commit 41a323e) returned ✅ with two
findings:

P2 (Documentation): Two stale doc surfaces post-feature-lift still said
MPD does NOT support cluster + hc2_bm. The HC2/BM scope-limitation Note
at REGISTRY.md:2557 was already updated in 79e0962, but:
- REGISTRY.md:167-176 (main MPD section under HeterogeneousAdoptionDiD
  requirements-checklist) still had the old "not supported" Note.
- estimators.py MPD class docstring's `cluster` and `vcov_type` blocks
  still said `cluster + hc2_bm` raises NotImplementedError.

Both rewritten to describe the new supported path with a pointer to
_compute_cr2_bm_contrast_dof and the clubSandwich Wald_test(HTZ)
parity target. Weighted CR2-BM remains the only documented gate.

P3 (Performance): _compute_cr2_bm_contrast_dof recomputes H, M, and
per-cluster A_g matrices that solve_ols → _compute_cr2_bm already
built for the vcov path. O(n²k) redundant work per clustered hc2_bm
MPD fit; acceptable for typical cluster-robust DiD panel sizes
(n ≤ few thousand). Tracked as a new Performance row in TODO.md;
acknowledged with a Note in REGISTRY.md per the codex deferral rules
(`**Note:**` label + TODO entry downgrades to P3-deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions


@igerber igerber added the ready-for-ci Triggers CI test workflows label May 18, 2026
@igerber igerber merged commit 956445e into main May 18, 2026
33 of 34 checks passed
@igerber igerber deleted the mpd-cluster-hc2-bm-contrast-dof branch May 18, 2026 12:10
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
… (R1 P0)

Local codex R1 caught a P0: StackedDiD(vcov_type="hc2_bm") computed CR2
vcov correctly but never propagated the Bell-McCaffrey Satterthwaite DOF
into safe_inference() calls. event_study_effects[h]['p_value']/['conf_int']
and overall_p_value/overall_conf_int silently fell back to normal-theory
inference (df=None ⇒ scipy.norm), contradicting the registry contract.

Fix mirrors the SunAbraham aggregated-inference pattern from PR igerber#472
(sun_abraham.py:997-1097). After solve_ols(), if vcov_type=="hc2_bm" and
not on the survey replicate-refit path, build contrast matrix:
  - Per-event-time: unit vector at each interaction_indices[h]
  - Overall ATT: 1/K average across post-period interaction columns
Call _compute_cr2_bm_contrast_dof(X, cluster_ids, bread, contrasts,
weights=composed_weights). Apply per-event-time DOFs to event_study_effects
inference, overall DOF to overall_* inference. Wrap in try/except so any
rank-deficient or linalg failure emits a UserWarning and falls back to
normal-theory (visible deviation, not silent).
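
The contrast matrix described above can be sketched like this. The helper name `build_contrasts`, the dict layout of `interaction_indices` (event time mapped to column index), and the example values are assumptions for illustration only.

```python
import numpy as np


def build_contrasts(k, interaction_indices, post_event_times):
    """Stack per-event-time unit vectors plus one overall-ATT averaging column.

    interaction_indices: dict mapping event time h -> column index in the design.
    Returns (event_times, contrasts) where contrasts has shape (k, n_event_times + 1).
    """
    event_times = sorted(interaction_indices)
    contrasts = np.zeros((k, len(event_times) + 1))
    for j, h in enumerate(event_times):
        contrasts[interaction_indices[h], j] = 1.0        # unit vector for delta_h
    post_cols = [interaction_indices[h] for h in post_event_times]
    contrasts[post_cols, -1] = 1.0 / len(post_cols)        # 1/K average over post periods
    return event_times, contrasts


# Hypothetical design with 6 columns; event times -2, 0, 1, 2 sit in columns 2..5.
idx = {-2: 2, 0: 3, 1: 4, 2: 5}
event_times, C = build_contrasts(6, idx, post_event_times=[0, 1, 2])
```

One call to a contrast-DOF helper on `C` then yields the per-event-time DOFs in the leading columns and the overall-ATT DOF in the last column.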

R fixture extended with the post-period-average ATT contrast DOF via
Wald_test(constraints=row_avg, vcov=CR2, test="HTZ")$df_denom (mirrors PR
igerber#465's MPD avg_att approach). New goldens at both cluster=unit and
cluster=unit_subexp.

Test additions / strengthening (addresses R1 P2 + P3):
- test_hc2_bm_per_event_dof_matches_coef_test_df_satt_unit_cluster:
  uses brentq inversion of CI half-width to recover the DOF safe_inference
  actually used. If propagation failed (the R1 P0 bug), the inversion
  raises ValueError → test FAILS instead of silently skipping (replaces
  the prior `continue`-on-failure pattern which could vacuously pass).
  Hard-asserts validated_count == len(event_times).
- test_hc2_bm_overall_att_dof_matches_wald_test_htz_unit_cluster (NEW):
  pins overall ATT DOF at atol=1e-6 against R Wald_test(HTZ)$df_denom.
- test_hc2_bm_overall_att_dof_matches_wald_test_htz_unit_subexp_cluster
  (NEW): symmetric coverage at alternate cluster level.
- Renamed CR1 → CR1S throughout docs/tests/REGISTRY for consistency
  (diff-diff's HC1+cluster uses Stata-style G/(G-1)*(n-1)/(n-p); plain
  CR1 omits the (n-1)/(n-p) term and diverges by ~1.4%).

192 tests pass (74 stacked + 19 wls_cr2 + 47 SA + 52 estimators_vcov_type).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
Local codex R2 caught a P1: on rank-deficient stacked designs, the BM
contrast DOF block called `_compute_cr2_bm_contrast_dof()` on the singular
full bread matrix, which raised LinAlgError and triggered the catch-and-
fallback path — downgrading the aggregated inference to normal-theory
even when the target event-time delta_h coefficients were still identified.

Fix mirrors the MultiPeriodDiD rank-deficient pattern (PR igerber#465,
estimators.py:1860-1913):
- Derive `_identified = ~np.isnan(coef)` (solve_ols emits NaN for dropped
  columns under R-style rank handling).
- Subset X, bread, and contrasts to the identified-column block BEFORE
  calling `_compute_cr2_bm_contrast_dof`.
- Only build per-event-time contrasts for event-times whose target delta_h
  column is identified; only build the overall ATT contrast when ALL
  post-period delta_h are identified (otherwise the contrast is undefined).
- The remaining try/except is a defensive guard for genuine singularities
  on the identified-column design (rare; the rank-deficient handling
  already removes the problematic columns).

The earlier behavior was too aggressive: a single dropped nuisance column
would knock out BM DOF for ALL contrasts. The new behavior NaN-guards
only contrasts that target a dropped column.
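
A minimal sketch of the identified-column subsetting, assuming NaN coefficients mark dropped columns as described above. The helper name `subset_to_identified` and its return layout are hypothetical.

```python
import numpy as np


def subset_to_identified(X, bread, contrasts, coef):
    """Restrict X, bread, and contrasts to columns whose coefficient is not NaN.

    Contrasts that load on any dropped coefficient are excluded (they are undefined);
    `usable` reports which original contrast columns survived.
    """
    identified = ~np.isnan(coef)
    # A contrast is usable only if it puts zero weight on every dropped column.
    usable = ~np.any(contrasts[~identified] != 0, axis=0)
    X_id = X[:, identified]
    bread_id = bread[np.ix_(identified, identified)]
    contrasts_id = contrasts[np.ix_(identified, usable)]
    return X_id, bread_id, contrasts_id, usable


# Illustrative data: the middle coefficient was dropped by the solver.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
coef = np.array([1.0, np.nan, 2.0])
X_id, bread_id, C_id, usable = subset_to_identified(X, X.T @ X, np.eye(3), coef)
```

Here a dropped nuisance column removes only the contrast that targets it; the other two contrasts keep their DOF computation on the reduced design.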

New regression test `test_hc2_bm_rank_deficient_design_keeps_bm_dof_*` —
verifies the fit doesn't emit fallback warnings on a well-conditioned
design + checks that CI half-widths use t(BM DOF) instead of z=1.96
(catches the R1/R2 normal-fallback failure mode end-to-end).

75 tests pass across stacked + methodology_stacked_did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
