TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final) by igerber · Pull Request #498 · igerber/diff-diff · GitHub
Skip to content

TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final)#498

Merged
igerber merged 4 commits into
mainfrom
feature/two-stage-did-vcov-type-phase1b
May 30, 2026
Merged

TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final)#498
igerber merged 4 commits into
mainfrom
feature/two-stage-did-vcov-type-phase1b

Conversation

@igerber

@igerber igerber commented May 30, 2026

Copy link
Copy Markdown
Owner

Summary

  • Thread vcov_type="hc1" through TwoStageDiD (Phase 1b interstitial Add comprehensive code review for diff-diff library #5 — the final standalone estimator, completing the initiative across all 8). Accept only {"hc1"}; reject {classical, hc2, hc2_bm, conley} with GMM-meat-specific messages. Adds vcov_type / cluster_name / n_clusters + to_dict() to TwoStageDiDResults; summary() renders the unit-cluster CR1 label with bootstrap + survey suppression gates. Defensive bootstrap n_clusters<2 / n_psu<2 NaN guard; cluster= + replicate-weights raises NotImplementedError.
  • Report the post-drop effective cluster count in metadata (always-treated units are excluded before estimation), counted via np.unique(df[cluster_var]) so it matches the variance exactly — including always-treated drops and NaN cluster IDs.
  • Expose vcov_type explicitly on the two_stage_did() convenience wrapper (was hidden behind **kwargs), matching the sibling wrappers.
  • Docs: REGISTRY IF-vs-sandwich taxonomy → N=5 + TwoStageDiD **Note**; llms-full.txt signature; both checked-in autosummary RSTs; CHANGELOG; TODO (initiative marked complete + conley follow-up row).

Methodology references

  • Method name(s): TwoStageDiD — Gardner (2022) two-stage DiD, GMM cluster-sandwich (did2s).
  • Paper / source link(s): Gardner (2022) arXiv:2207.05943; Butts & Gardner (2022) did2s, R Journal 14(1).
  • Intentional deviations: vcov_type permanently narrow to {"hc1"} — the GMM-corrected meat S_g = γ̂'c_g − X_2g'ε_2g folds first-stage FE uncertainty into the score, so no single hat matrix spans both stages on which HC2 leverage / Bell-McCaffrey DOF can be defined (documented in REGISTRY.md IF-vs-sandwich taxonomy + TwoStageDiD section, mirroring the SpilloverDiD classical rejection). did2s no-FSA convention: the CR1 family label carries no (n-1)/(n-p) factor (**Note (deviation from R):**). vcov_type="conley" deferred (TODO.md).

Validation

  • Tests added/updated: tests/test_two_stage.py::TestTwoStageDiDVcovType (36 tests) — invalid-vcov rejection, hc1 no-op bit-equality across analytical / cluster / TSL-survey / replicate-survey / bootstrap paths (parametrized over aggregate modes), degenerate-bootstrap full public NaN propagation, post-drop + NaN-cluster metadata parity, summary labels, set_params fit-time revalidation, cluster=+replicate rejection, convenience-fn threading.
  • Evidence: ATT/SE bit-identical vs pre-PR baseline across default / cluster / bootstrap (incl. per-horizon + per-group SE). 4 local Codex rounds (full standard mode) converged clean (no actionable findings).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

igerber and others added 4 commits May 30, 2026 07:23
…terstitial #5, final)

TwoStageDiD's variance is the Gardner/did2s two-stage GMM cluster-sandwich
(always clusters; default at unit) — a structural twin of ImputationDiD, NOT
the GMM×HC2-BM beast the tracker described (that was SpilloverDiD's helper).

Add vcov_type="hc1" accepting only {hc1}; reject {classical,hc2,hc2_bm} (the
GMM-corrected meat S_g = gamma_hat' c_g - X_2g' eps_2g folds in first-stage
uncertainty, so no single hat matrix spans both stages for HC2 leverage /
BM-DOF) and conley (deferred). Results gains vcov_type/cluster_name/n_clusters
+ to_dict(); summary() renders the unit-cluster CR1 label with bootstrap +
survey suppression gates. Bootstrap n_clusters<2 NaN guard (load-bearing,
post-drop perturbation count) + survey n_psu<2 defense. cluster= + replicate
weights raises NotImplementedError.

Docs: REGISTRY taxonomy -> N=5 + TwoStageDiD Note, llms-full signature, both
autosummary RSTs, CHANGELOG, TODO (initiative complete + conley follow-up).
34 new tests; ATT/SE bit-identical vs baseline across default/cluster/bootstrap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… P2)

n_clusters / cluster_name were derived from the full input `data`, but the GMM
sandwich computes variance over `cluster_ids=df[cluster_var]` on the POST-DROP
fit sample (always-treated units are removed before estimation). When an
always-treated unit/cluster is excluded, reporting the full-input count
overstates the effective G the SE is based on. Count clusters on `df` instead,
matching the variance. Survey suppression (cluster_name=None) is unchanged;
the Wave E.3 full-domain survey accounting is a separate, intentional path.

Adds a regression test asserting n_clusters equals the post-drop count when an
always-treated cohort is dropped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ct tests (codex P3)

- two_stage_did(): expose vcov_type="hc1" explicitly (was hidden behind **kwargs)
  and forward it, matching the imputation_did/efficient_did sibling wrappers — the
  convenience API surface, generated signature, and IDE help now show the param.
- Degenerate-bootstrap tests now assert the FULL public NaN-propagation contract
  (overall t_stat/p_value/conf_int + every event-study/group inference field) via
  a shared _assert_full_bootstrap_nan helper, not just overall_se, so a partial
  regression in _build_nan_bootstrap_results can't slip through.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(codex P2)

n_clusters used Series.nunique() (drops NaN), but the GMM sandwich counts
np.unique(cluster_ids) (keeps a single NaN group). A non-survey cluster= column
with missing IDs would make the reported G undercount the SE's actual cluster
count. Count clusters the same way the variance does — np.unique(df[cluster_var])
— which also consolidates the two non-survey branches and still excludes
always-treated-dropped units (df, not data). Adds a NaN-cluster regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

@igerber

igerber commented May 30, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 30, 2026
@igerber igerber merged commit b3dc8d0 into main May 30, 2026
33 of 34 checks passed
igerber added a commit that referenced this pull request May 30, 2026
…ed stale rows

Cleanup of TODO.md now that the vcov_type threading initiative is complete (all 8
standalone estimators merged, TwoStageDiD #498 last). TODO.md only — no methodology
or source changes.

Compact prune of the completed initiative's leftovers:
- the `| Done |` umbrella row + its orphan blank lines (rejoins the methodology
  table to its header), the Tier B threading bullet, and the stale duplicate
  TwoStageDiD-Conley row whose "`__init__` lacks `vcov_type`" premise is false post-#498
- the "Rows 104-105 LIFTED" comment block, the two ~~LIFTED~~ weighted-BM rows, and
  the Tier C LIFTED bullet (clubSandwich WLS-CR2 port, #475)
- two resolved-marker HTML comments (WooldridgeDiD cohort_share; PreTrendsPower)
- rewrote the Standard Error Consistency prose to "complete" and repointed its
  weighted-CR2 gate at the open multi-absorb row

Staleness audit of the ~50 remaining follow-up rows (5 subagents; every finding
re-verified against current source before acting -- the vast majority are genuine
open deferrals):
- removed the `bias_corrected_local_linear: weights=` row (shipped; residual
  unweighted-DPI gap already tracked by the sibling row) and narrowed the Tier D
  lprobust bullet's stale `weights` -> `weight-aware auto-bandwidth DPI`
- removed the `compute_survey_metadata`/`raw_w_for_meta` dedup row (done via the
  shared `survey._resolve_survey_for_fit` helper)
- tightened the HAD Phase-4.5-C survey-aware-pretests row: dropped the shipped
  pweight+PSU+FPC+strata narration, kept the two open items (replicate-weight
  designs; lonely_psu='adjust'+singleton on the Stute family)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request May 30, 2026
…ed stale rows

Cleanup of TODO.md now that the vcov_type threading initiative is complete (all 8
standalone estimators merged, TwoStageDiD #498 last). TODO.md only — no methodology
or source changes.

Compact prune of the completed initiative's leftovers:
- the `| Done |` umbrella row + its orphan blank lines (rejoins the methodology
  table to its header), the Tier B threading bullet, and the stale duplicate
  TwoStageDiD-Conley row whose "`__init__` lacks `vcov_type`" premise is false post-#498
- the "Rows 104-105 LIFTED" comment block, the two ~~LIFTED~~ weighted-BM rows, and
  the Tier C LIFTED bullet (clubSandwich WLS-CR2 port, #475)
- two resolved-marker HTML comments (WooldridgeDiD cohort_share; PreTrendsPower)
- rewrote the Standard Error Consistency prose to "complete" and repointed its
  weighted-CR2 gate at the open multi-absorb row

Staleness audit of the ~50 remaining follow-up rows (5 subagents; every finding
re-verified against current source before acting -- the vast majority are genuine
open deferrals):
- removed the `bias_corrected_local_linear: weights=` row (shipped; residual
  unweighted-DPI gap already tracked by the sibling row) and narrowed the Tier D
  lprobust bullet's stale `weights` -> `weight-aware auto-bandwidth DPI`
- removed the `compute_survey_metadata`/`raw_w_for_meta` dedup row (done via the
  shared `survey._resolve_survey_for_fit` helper)
- tightened the HAD Phase-4.5-C survey-aware-pretests row: dropped the shipped
  pweight+PSU+FPC+strata narration, kept the two open items (replicate-weight
  designs; lonely_psu='adjust'+singleton on the Stute family)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber deleted the feature/two-stage-did-vcov-type-phase1b branch May 31, 2026 18:42
@igerber igerber mentioned this pull request Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant