SpilloverDiD: Gardner GMM first-stage correction (Wave D) by igerber · Pull Request #462 · igerber/diff-diff · GitHub
Skip to content

SpilloverDiD: Gardner GMM first-stage correction (Wave D)#462

Merged
igerber merged 4 commits into
mainfrom
spillover-conley-wave-d-gmm-correction
May 17, 2026
Merged

SpilloverDiD: Gardner GMM first-stage correction (Wave D)#462
igerber merged 4 commits into
mainfrom
spillover-conley-wave-d-gmm-correction

Conversation

@igerber

@igerber igerber commented May 17, 2026

Copy link
Copy Markdown
Owner

Summary

  • Adds the Gardner (2022) GMM first-stage uncertainty correction to SpilloverDiD's stage-2 variance across all three vcov paths (HC1, Conley, CR1 via cluster=<col>) for both event_study=False AND event_study=True. Point estimates byte-identical to Wave B/C; SE values shift upward by 1-few percent.
  • Documented synthesis of Butts (2021) §3.1 (IF construction for spillover-aware DiD) + Gardner (2022) §4 (two-stage GMM sandwich) + Conley (1999) (spatial kernel). No reference software combines all three.
  • Closes the documented Wave B/C "SEs biased downward by a few percent" caveat. Closes the Gardner-GMM follow-up row in TODO.md.

Methodology references (required if estimator / math changes)

  • Method name(s): SpilloverDiD stage-2 variance — Gardner GMM first-stage correction (Wave D synthesis)
  • Paper / source link(s):
    • Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. arXiv:2105.03737v3 (§3.1)
    • Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943 (§4)
    • Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. JoE 92(1), 1-45
  • Any intentional deviations from the source (and why):
    • vcov_type="classical" raises NotImplementedError upfront — the Wave D synthesis has not been derived for the homoskedastic meat structure (sigma_hat^2 * (X_10' X_10)) and the heteroskedasticity-robust IF outer-product form is the canonical Gardner formulation. Users get a clear remediation pointer to "hc1" / "conley" / cluster=<col>. Documented in REGISTRY.md SpilloverDiD restrictions block and in the fit() docstring.
    • No finite-sample multiplier on the Conley path — preserves conleyreg / Wave B convention. HC1 uses n/(n-p) and CR1 uses G/(G-1) * (n-1)/(n-p).

Validation

  • Tests added/updated: tests/test_spillover.py — new test classes TestSpilloverDiDWaveDGmmCorrectedHc1Hand (hand-derived Psi on a 4-unit × 3-period over-identified panel; atol=1e-12), TestSpilloverDiDWaveDGmmCorrectedEventStudy (vcov shape on event-study path), TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract (rank-deficient column propagation), TestSpilloverDiDWaveDGmmCorrectedValidatorWiring (Conley validator fires from the new helper), TestSpilloverDiDWaveDGmmCorrectedFitIdempotence (clone + repeat-fit bit-identity), TestSpilloverDiDWaveDPublicVarianceContract (end-to-end public cluster=<col> CR1 routing, single-cluster rejection, classical NotImplementedError). Wave B/C SE goldens re-pinned at TestSpilloverDiDEventStudyBackwardCompat (_WAVE_B_GOLDEN_*_WAVE_D_GOLDEN_*; pre-Wave-D references retained as _WAVE_B_UNCORRECTED_* for the directional inflation invariant test_wave_d_se_inflates_relative_to_wave_b_uncorrected).
  • Backtest / simulation / notebook evidence (if applicable): N/A — no tutorial changes. Hand-derivation worksheet at /tmp/wave_d_phase1_handderivation.py (developer-side, not committed) pins the closed-form Psi values that the test fixtures assert against.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

Closes the documented Wave B/C "SEs biased downward by a few percent"
caveat across all three vcov_type paths (HC1, Conley, cluster=<col> for
CR1) for both event_study=False AND event_study=True. Point estimates
byte-identical to Wave B/C; SE values shift upward by 1-few percent.

Documented synthesis of Butts (2021) Section 3.1 (IF construction for
spillover-aware DiD) + Gardner (2022) Section 4 (two-stage GMM sandwich)
+ Conley (1999) (spatial kernel). No reference software combines all
three -- did2s implements GMM without Conley; conleyreg/acreg implement
Conley without two-stage correction. Wave D is the synthesis.

Unified IF outer-product formula:
  psi_i  = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}
  meat   = Psi' @ K @ Psi
  vcov   = (X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1}
Kernel K is path-dependent: identity (HC1), block-indicator (cluster),
spatial kernel (Conley). Finite-sample multipliers: n/(n-p) for HC1,
G/(G-1) * (n-1)/(n-p) for cluster CR1, none for Conley.

Public surface:
- No new kwarg -- correction is unconditional.
- Wave D variance mode dispatch derives from the public contract:
    vcov_type="conley"  -> "conley"
    cluster=<col>        -> "cluster" (CR1)
    otherwise            -> "hc1"
- vcov_type="classical" now raises NotImplementedError upfront; the
  Wave D synthesis has not been derived for the homoskedastic meat
  structure (sigma_hat^2 * (X_10' X_10)). REGISTRY restrictions block
  updated to list "classical" alongside "hc2"/"hc2_bm".
- Single-cluster sample raises ValueError ("at least 2 clusters") per
  the standard CR1 rejection (mirrors linalg.py:1942).

Implementation:
- New module-level helper _compute_gmm_corrected_meat in two_stage.py
  (existing _compute_gmm_variance method untouched).
- New module-level helper _build_butts_fe_design_csr in spillover.py.
- _compute_conley_meat factored out of _compute_conley_vcov in conley.py
  so the same kernel-application code path handles both standard
  sandwich (X * residuals) and Wave D IF outer product (Psi).
- SpilloverDiD.fit() bypasses solve_ols's vcov computation, builds the
  fit-sample FE designs + eps_10/eps_2, calls the GMM helper, and
  wraps with the bread sandwich. Rank-deficient column drops handled
  via kept_col_mask + vcov re-inflation with NaN at dropped positions.

Wave B/C SE goldens re-pinned (_WAVE_B_GOLDEN_* -> _WAVE_D_GOLDEN_*);
pre-Wave-D references retained as _WAVE_B_UNCORRECTED_* commented
baselines for the directional inflation invariant.

New test classes:
- TestSpilloverDiDWaveDGmmCorrectedHc1Hand (hand-derived Psi on a
  4-unit x 3-period over-identified panel; atol=1e-12)
- TestSpilloverDiDWaveDGmmCorrectedEventStudy (vcov shape on
  event-study path)
- TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract
  (rank-deficient column propagation)
- TestSpilloverDiDWaveDGmmCorrectedValidatorWiring (Conley validator
  fires from the new helper)
- TestSpilloverDiDWaveDGmmCorrectedFitIdempotence (clone + repeat-fit
  bit-identity)
- TestSpilloverDiDWaveDPublicVarianceContract (end-to-end public
  cluster=<col> CR1 routing, single-cluster rejection, classical
  NotImplementedError)

All 223 existing spillover tests pass; full regression set across
spillover/two_stage/conley_vcov/estimators_vcov_type clean. Closes
the Gardner-GMM follow-up row in TODO.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

When the kept stage-2 design saturates the sample (n_obs == effective_rank
after rank-deficient drops), the HC1 multiplier n/(n-p) and the CR1
multiplier (n-1)/(n-p) are mathematically undefined. The original Wave D
helper used `max(n - p_2, 1)` to clamp the denominator, which silently
fabricated finite multipliers on underdetermined fits — `result.se` and
per-coefficient SEs could stay finite even when only `t_stat`/`p_value`/CI
were NaN-gated via `df_resid=0`. That violates the no-silent-failures
contract.

Fix: when n - p_2 <= 0, return NaN meat with an explicit UserWarning
so the SE surface NaN-propagates consistently with the inference fields.
The Conley path is unaffected (no finite-sample multiplier on that
branch by convention).

Tests: new `test_saturated_design_yields_nan_se_not_finite` in
TestSpilloverDiDWaveDPublicVarianceContract exercises both the HC1 and
CR1 paths on a synthetic n=p_2=4 Psi fixture; asserts NaN meat AND
the saturation warning fires.

Docs: replaced "Wave B MVP limitations" section heading at
docs/api/spillover.rst with "Restrictions and follow-ups" (the
section now describes the shipped Wave D variance + remaining
limitations); updated the SpilloverDiD vs TwoStageDiD comparison table
to label the Conley and cluster rows "(Wave D GMM-corrected sandwich)"
instead of "(via solve_ols at stage 2)".

All 224 spillover tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 4be416a8c6eccdabf6eeca950c8026ce9b91da7b


Overall Assessment

✅ Looks good — this re-review resolves the prior P1/P0 issues, and I did not find any unmitigated diff-scoped P0 or P1 findings.

Executive Summary

  • Affected method: SpilloverDiD Wave D stage-2 variance, i.e. the Gardner GMM first-stage correction across HC1 / Conley / cluster.
  • The prior saturated-design issue is fixed: the new helper now fails closed to NaN when n - p_2 <= 0 on HC1/CR1, instead of fabricating finite SEs on undefined small-sample corrections (diff_diff/two_stage.py:L209-L264, tests/test_spillover.py:L4461-L4507).
  • The earlier public-surface regressions also look fixed: cluster=<col> is routed back to CR1, and vcov_type="classical" now rejects up front with a clear message (diff_diff/spillover.py:L2205-L2226, diff_diff/spillover.py:L2771-L2807, tests/test_spillover.py:L4409-L4526).
  • Against the project’s source material, the Wave D implementation matches the documented Butts/Gardner/Conley synthesis in docs/methodology/REGISTRY.md:L3048-L3097 and the in-code variance docstrings in diff_diff/two_stage.py:L73-L143 and diff_diff/conley.py:L781-L830.
  • P3 only: the user-facing docs are updated incompletely. Some CHANGELOG.md Spillover bullets still describe the old pre-Wave-D variance caveat, and the API page still omits the new vcov_type="classical" restriction (CHANGELOG.md:L16-L17, docs/api/spillover.rst:L221-L226).
  • I could not execute the tests in this review environment because the available python lacks numpy and pytest; this review is by code inspection.

Methodology

  • No findings. The affected method is clearly identified and scoped: SpilloverDiD Wave D variance in diff_diff/two_stage.py:L56-L290 and diff_diff/spillover.py:L2692-L2836. The implementation aligns with the registry’s documented synthesis and documented deviations in docs/methodology/REGISTRY.md:L3048-L3097, including HC1/CR1 finite-sample multipliers, no Conley multiplier, and the explicit vcov_type="classical" rejection.

Code Quality

  • No findings. The refactor cleanly separates kernel application into diff_diff/conley.py:L768-L1050 and keeps the public fit-path logic readable despite the new variance routing.

Performance

  • No findings. The Conley refactor preserves the existing sparse/dense dispatch and reuses the same kernel code path rather than duplicating it.

Maintainability

  • No findings. The new helper boundaries are reasonable: _compute_gmm_corrected_meat owns the IF meat, _compute_conley_meat owns kernel application, and SpilloverDiD.fit() owns public-path dispatch.

Tech Debt

  • No findings. The main Wave D follow-up is properly closed out of TODO.md, and I did not see new untracked correctness debt introduced by this diff.

Security

  • No findings.

Documentation/Tests

  • Severity: P3
    Impact: The documentation is internally inconsistent after Wave D. In the same Unreleased changelog, older Spillover bullets still say event-study/base Spillover SEs omit the Gardner correction, while the code and registry now say Wave D ships that correction. Separately, the API page still documents the HC2/HC2_BM restriction but not the new vcov_type="classical" hard rejection (CHANGELOG.md:L16-L17, docs/api/spillover.rst:L221-L226, diff_diff/spillover.py:L2205-L2226, docs/methodology/REGISTRY.md:L3048-L3092).
    Concrete fix: Update the stale Spillover CHANGELOG.md bullets to say their variance caveat is superseded by Wave D, and add a vcov_type="classical" restriction note to docs/api/spillover.rst alongside the HC2/HC2_BM note.

…lassical restriction to api/spillover.rst

Pre-this-commit, the [Unreleased] block contained internally inconsistent
SpilloverDiD bullets: the Wave B and Wave C entries (added first) each
said the Gardner GMM correction was "NOT applied" / "planned follow-up",
while the Wave D entry (added later in the same release) shipped that
exact correction. Update the Wave B and Wave C variance subsections to
note explicitly that they are "Superseded by the Wave D Gardner GMM
first-stage correction in this same release", cross-referencing the
Wave D bullet.

Also: docs/api/spillover.rst documented the HC2/HC2_BM
NotImplementedError but omitted the new vcov_type="classical"
restriction. Add a classical bullet alongside the HC2/HC2_BM one with
the documented synthesis pointer to hc1 / conley / cluster.

Docs-only change; no source / test edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 82502a4e2cff9900d6f3790e001ea9560bdd0588


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Affected method: SpilloverDiD Wave D stage-2 variance, i.e. the Gardner GMM first-stage correction across HC1 / Conley / cluster. Cross-checking the implementation against the Methodology Registry and the new helper docstrings did not reveal an unmitigated methodology mismatch; the IF formula, HC1/CR1 small-sample multipliers, Conley no-multiplier choice, and vcov_type="classical" rejection all match the documented contract (docs/methodology/REGISTRY.md:L3048-L3097, diff_diff/two_stage.py:L56-L290, diff_diff/spillover.py:L2098-L2226, diff_diff/conley.py:L768-L1056).
  • The prior re-review docs/API issue is mostly resolved: docs/api/spillover.rst now documents both the shipped Wave D correction and the new vcov_type="classical" restriction (docs/api/spillover.rst:L177-L235).
  • Severity: P2. The new FE-design builder preserves pre-mask unit/time codes after finite_mask, so supported warn-and-drop fits can create all-zero FE columns and force the GMM stage onto the dense XtX_10.toarray() fallback; that is not a correctness problem on small panels, but it can become a large-memory path on bigger ones (diff_diff/spillover.py:L1500-L1564, diff_diff/spillover.py:L2719-L2751, diff_diff/two_stage.py:L165-L185).
  • Severity: P3. CHANGELOG.md still contradicts the new Wave C/Wave D status by listing event_study=True and the Gardner correction as deferred in the older Spillover bullet, and diff_diff/guides/llms-full.txt still frames shipped Wave C/D functionality under a “Wave B MVP — planned follow-ups” heading (CHANGELOG.md:L16-L17, diff_diff/guides/llms-full.txt:L501-L506).
  • I could not execute the test suite here because the available python lacks numpy and pytest; this review is by code inspection.

Methodology

No findings. The Wave D variance path matches the documented Butts/Gardner/Conley synthesis, including the explicitly documented deviations for Conley finite-sample adjustment and vcov_type="classical" rejection (docs/methodology/REGISTRY.md:L3048-L3097, diff_diff/two_stage.py:L56-L290, diff_diff/spillover.py:L2098-L2226).

Code Quality

No findings.

Performance

  • Severity: P2. Impact: after finite_mask, unit_codes_fit / time_codes_fit are sliced but not re-factorized before _build_butts_fe_design_csr(). If a supported warn-and-drop fit removes an interior unit code, the helper materializes an all-zero FE column; that makes X_10'X_10 exactly singular and deterministically sends _compute_gmm_corrected_meat() into the dense np.linalg.lstsq(XtX_10.toarray(), ...) fallback. On larger panels this can turn a supported fit into a large dense allocation or OOM even though the remaining estimation sample is identified. Concrete fix: compact the post-mask unit/time codes before building X_1 / X_10 (or have _build_butts_fe_design_csr() drop unused codes/columns internally). Refs: diff_diff/spillover.py:L1500-L1564, diff_diff/spillover.py:L2719-L2751, diff_diff/two_stage.py:L165-L185.

Maintainability

No findings.

Tech Debt

No findings. The prior Wave D follow-up row is correctly removed from TODO.md, and I did not see new untracked correctness debt in the diff.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: the API doc is updated, but the release notes are still internally inconsistent because the older SpilloverDiD changelog bullet continues to list event_study=True and the Gardner GMM correction as deferred features immediately after the new Wave C/Wave D bullets say they shipped. The internal LLM guide also still labels shipped Wave C/D behavior as “Wave B MVP — planned follow-ups.” Concrete fix: remove or annotate the superseded deferred-feature text in CHANGELOG.md, and retitle/split the corresponding section in diff_diff/guides/llms-full.txt so only genuinely pending items remain under follow-ups. Refs: CHANGELOG.md:L16-L17, diff_diff/guides/llms-full.txt:L501-L506.
  • Test execution: not run in this environment because the available python lacks numpy and pytest.

…ollow-up lists

P2 Performance: when callers pass pre-mask integer codes that have had
interior values dropped via finite_mask (a supported warn-and-drop fit),
the code arrays are sparse — e.g. unit_codes = [0, 1, 3, 4] with code 2
dropped. Building X_10 on the raw codes materialized an all-zero FE
column at index 2, forcing sparse_factorized onto the dense
lstsq/XtX_10.toarray() fallback unnecessarily (large-memory path on big
panels). Fix: re-factorize via pd.factorize at the top of
_build_butts_fe_design_csr to compact the code space to 0..n_unique-1
(no-op when codes are already contiguous). Mirrors the column-space
convention of TwoStageDiD._build_fe_design.

P3 Docs: reconcile two stale follow-up lists where shipped Wave C/D
items still appeared as "planned":

- CHANGELOG Wave B bullet listed event_study=True and Gardner GMM
  correction under "Deferred features (planned follow-ups)" alongside
  genuinely-pending items (covariates, survey_design, ring_method,
  d_bar selection, sparse path). Add "as of Wave B ship-time" qualifier
  + "Shipped in same release: ..." note pointing at the Wave C/D
  bullets above.
- llms-full.txt heading "Restrictions (Wave B MVP — planned
  follow-ups)" was misleading since items below included a mix of
  shipped (event_study, GMM correction) and pending features. Retitle
  to "Restrictions and Wave C/D status" and add the vcov_type=classical
  Wave D restriction bullet alongside the existing covariates /
  survey_design restrictions.

All 224 spillover tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 17, 2026
@igerber igerber merged commit 1dc6a59 into main May 17, 2026
33 of 34 checks passed
@igerber igerber deleted the spillover-conley-wave-d-gmm-correction branch May 17, 2026 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant