Add LPDiD complex-survey R-parity validation vs survey::svyglm (Phase D2) by igerber · Pull Request #591 · igerber/diff-diff · GitHub
Add LPDiD complex-survey R-parity validation vs survey::svyglm (Phase D2) #591

Merged: igerber merged 1 commit into main from docs/lpdid-survey-parity on Jul 1, 2026

igerber (Owner) commented Jun 30, 2026

Summary

  • Phase D2 of the LP-DiD salvage initiative: R-parity validation of the already-merged D1 LPDiD complex-survey path against survey::svyglm (Lumley). Verify-and-document only — no diff_diff/ change. Closes Phase D.
  • New benchmarks/R/generate_lpdid_survey_golden.R + committed lpdid_survey_panel.csv / lpdid_survey_golden.json (own seed, so the absorbing / non-absorbing goldens stay byte-identical).
  • New TestLPDiDSurveyParityR pins three structurally-distinct survey variance paths end-to-end through LPDiD.fit(survey_design=...), each per-horizon (point/SE/df) + pooled-post/pre:
    • VW full design (strata + PSU + FPC) vs svyglm(ids=~psu, strata, fpc, weights); df = n_PSU - n_strata (= 10).
    • inject weights-only, unit injected as PSU vs svyglm(ids=~unit, weights); df = n_PSU - 1 (= 139).
    • covariate direct inclusion vs svyglm(Dy ~ tr + x + factor(time), ...).
  • Plus an FPC-shrinks-SE ordering check (both SEs svyglm-pinned) and a weighting-flows guard.
  • Each LP-DiD horizon uses a fresh svydesign over that horizon's clean long-difference sample (matching the library's per-sample survey_design.resolve(sample), not subset() of a full design). svyglm is the reference implementation of the Binder TSL sandwich, so it anchors the variance directly; the clean-sample construction is independently cross-checked against alexCardazzi/lpdid (the unweighted VW event study matches to <1e-8), and every svyglm WLS point is gated == weighted feols on the same sample (<1e-7).
  • REGISTRY ## LPDiD Deviation #8 + checklist upgraded to "validated vs survey::svyglm"; CHANGELOG entry; D2 TODO row closed.
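The per-horizon "clean long-difference sample" referred to above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name and the column names `unit`, `time`, `y`, `d` are hypothetical, and it assumes a balanced panel with consecutive periods, an absorbing binary treatment, and a post-treatment horizon `h >= 0`. Each horizon keeps newly treated observations plus controls that are still untreated at `t + h`, so a survey design resolved on this sample sees only the rows that enter the regression.

```python
import pandas as pd

def clean_long_difference_sample(panel, h, unit="unit", time="time", y="y", d="d"):
    """Horizon-h LP-DiD sample (hypothetical helper; absorbing treatment, h >= 0)."""
    df = panel.sort_values([unit, time]).copy()
    g = df.groupby(unit)
    # Long difference of the outcome: y_{t+h} - y_{t-1}, within unit.
    df["Dy"] = g[y].shift(-h) - g[y].shift(1)
    # Treatment switch: D_t - D_{t-1} (1 for units entering treatment at t).
    df["tr"] = df[d] - g[d].shift(1)
    # Treatment status at t+h, used to pick clean controls.
    df["d_lead"] = g[d].shift(-h)
    newly_treated = df["tr"] == 1
    clean_control = df["d_lead"] == 0
    keep = (newly_treated | clean_control) & df["Dy"].notna() & df["tr"].notna()
    return df.loc[keep, [unit, time, "Dy", "tr"]].reset_index(drop=True)

# Toy panel: unit 1 is treated from t=3 onward, unit 2 is never treated.
panel = pd.DataFrame({
    "unit": [1] * 5 + [2] * 5,
    "time": list(range(1, 6)) * 2,
    "y": [1.0, 2.0, 3.0, 10.0, 11.0] + [1.0] * 5,
    "d": [0, 0, 1, 1, 1] + [0] * 5,
})
sample_h1 = clean_long_difference_sample(panel, h=1)
```

In the toy panel, unit 1's already-treated and about-to-be-treated rows are dropped at `h=1`, which is the reason the design has to be rebuilt per horizon rather than taken from the full panel.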

Methodology references

  • Method name(s): LPDiD complex-survey variance (Binder 1983 stratified-PSU Taylor-linearization sandwich). No estimator or default-behavior change — this PR validates the merged D1 path.
  • Paper / source link(s): Dube, Girardi, Jordà & Taylor (2025), A Local Projections Approach to Difference-in-Differences. Survey-variance reference: survey::svyglm (Lumley 2004).
  • Any intentional deviations from the source (and why): None new. svyglm is itself the reference TSL implementation, so it anchors the variance directly (no third-party survey package gate needed). Documented in docs/methodology/REGISTRY.md ## LPDiD Deviation #8.
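For readers unfamiliar with the variance being validated, here is a hedged NumPy sketch of a single-stage stratified-PSU Taylor-linearization sandwich for weighted least squares. It is not the diff_diff code path and not svyglm itself: the function name `tsl_wls` is hypothetical, and `fpc` is assumed to be a per-observation array holding the population PSU count of each row's stratum. PSU score totals are centered within strata, scaled by `n_h / (n_h - 1)` and the finite-population factor `1 - n_h / N_h`, and the design degrees of freedom are `n_PSU - n_strata`.

```python
import numpy as np

def tsl_wls(X, y, w, strata, psu, fpc=None):
    """WLS point estimates with a stratified-PSU linearization variance (sketch)."""
    X, y, w = np.asarray(X, float), np.asarray(y, float), np.asarray(w, float)
    strata, psu = np.asarray(strata), np.asarray(psu)
    bread = X.T @ (w[:, None] * X)                  # X'WX
    beta = np.linalg.solve(bread, X.T @ (w * y))    # WLS coefficients
    scores = X * (w * (y - X @ beta))[:, None]      # per-row estimating-equation scores
    meat = np.zeros((X.shape[1], X.shape[1]))
    levels = np.unique(strata)
    n_psu_total = 0
    for s in levels:
        in_s = strata == s
        psus = np.unique(psu[in_s])
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("lonely PSU: stratum has fewer than two PSUs")
        # Sum scores within each PSU, then center within the stratum.
        totals = np.vstack([scores[in_s & (psu == p)].sum(axis=0) for p in psus])
        centered = totals - totals.mean(axis=0)
        f_h = 0.0 if fpc is None else n_h / np.asarray(fpc, float)[in_s][0]
        meat += (1.0 - f_h) * n_h / (n_h - 1) * (centered.T @ centered)
        n_psu_total += n_h
    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv
    return beta, np.sqrt(np.diag(cov)), n_psu_total - len(levels)

# Simulated design: 2 strata x 6 PSUs x 5 observations (illustrative data only).
rng = np.random.default_rng(0)
strata = np.repeat(np.arange(2), 30)
psu = np.repeat(np.arange(12), 5)
n = strata.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
w = rng.uniform(1.0, 3.0, size=n)

beta, se_fpc, df_full = tsl_wls(X, y, w, strata, psu, fpc=np.full(n, 20.0))
_, se_nofpc, _ = tsl_wls(X, y, w, strata, psu)
# Weights-only analogue: one stratum, every row its own PSU.
_, _, df_unit = tsl_wls(X, y, w, np.zeros(n, int), np.arange(n))
```

With 12 PSUs in 2 strata the design df is 10, and the weights-only analogue gives `n_PSU - 1`; supplying the FPC can only shrink the standard errors, which is the ordering the PR's FPC check pins.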

Validation

  • Tests added/updated: tests/test_methodology_lpdid.py (TestLPDiDSurveyParityR: VW ES+pooled, weights-only inject, direct-covariate, FPC-shrinks-SE, weighting-flows). Tolerances: point abs=1e-6, SE rtol=1e-5/abs=1e-7, df exact. 17/17 pure-Python + 5/5 Rust green; existing LPDiD parity classes unchanged.
  • Backtest / simulation / notebook evidence: generator fail-closed gates — alexCardazzi clean-sample cross-check (matched to 0.00e+00), feols == svyglm point gate, no-lonely-PSU guard. Existing lpdid_test_panel.csv / lpdid_golden.json / lpdid_nonabsorbing_* confirmed byte-identical.
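The stated tolerances could be applied along these lines. This is only a sketch: the helper name `assert_parity` and the dictionary keys are hypothetical, the numbers in the usage lines are placeholders rather than values from the golden file, and it assumes NumPy's `assert_allclose` semantics (`|actual - desired| <= atol + rtol * |desired|`), which may differ from what the actual test class uses.

```python
import numpy as np

def assert_parity(py, r):
    """Compare one horizon's Python result to its R golden (hypothetical helper)."""
    # Point estimate: absolute tolerance only.
    np.testing.assert_allclose(py["coef"], r["coef"], rtol=0.0, atol=1e-6)
    # Standard error: relative tolerance with a small absolute floor.
    np.testing.assert_allclose(py["se"], r["se"], rtol=1e-5, atol=1e-7)
    # Design degrees of freedom must match exactly.
    assert py["df"] == r["df"]

# Placeholder values for illustration only (not taken from the goldens).
r_ref = {"coef": 0.5, "se": 0.1, "df": 10}
py_ok = {"coef": 0.5 + 5e-7, "se": 0.1 * (1 + 5e-6), "df": 10}
assert_parity(py_ok, r_ref)
```

Keeping the df comparison exact rather than approximate means a wrong PSU or stratum count fails loudly even when the SE happens to land within tolerance.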

Security / privacy

  • Confirm no secrets/PII in this PR: Yes (synthetic benchmark data only).

Generated with Claude Code

igerber added the ready-for-ci label ("Triggers CI test workflows") Jun 30, 2026
…(Phase D2)

Pin the merged D1 LPDiD survey path end-to-end against survey::svyglm (Lumley)
goldens on a dedicated staggered-absorbing survey panel. Closes Phase D of the
LP-DiD salvage initiative. No diff_diff/ change (verify-and-document).

Three structurally-distinct survey variance paths are R-validated per-horizon
(point/SE/df) + pooled-post/pre:
  * VW full design (strata + PSU + FPC) vs svyglm(ids=~psu, strata, fpc, weights);
    df = n_PSU - n_strata = 10.
  * weights-only, unit injected as PSU vs svyglm(ids=~unit, weights); df = n_PSU - 1.
  * direct covariate inclusion vs svyglm(Dy ~ tr + x + factor(time)).

Each LP-DiD horizon uses a FRESH svydesign over that horizon's clean long-difference
sample (matching the library's per-sample resolution, not subset() of a full design).
svyglm is the reference implementation of the Binder TSL sandwich, so it anchors the
variance directly; the clean-sample construction is independently cross-checked against
alexCardazzi/lpdid (unweighted VW event study matches to <1e-8), and every svyglm WLS
point is gated == weighted feols on the same sample (<1e-7).

Tolerances: point abs=1e-6, SE rtol=1e-5/abs=1e-7, df exact. New
TestLPDiDSurveyParityR + generate_lpdid_survey_golden.R + lpdid_survey_panel.csv /
lpdid_survey_golden.json (own seed -> absorbing/non-absorbing goldens byte-identical).
REGISTRY Deviation #8 + checklist upgraded to "validated vs svyglm"; CHANGELOG; TODO row closed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
igerber force-pushed the docs/lpdid-survey-parity branch from e6a0d96 to ad053ff June 30, 2026 22:42
igerber merged commit 6596d1b into main Jul 1, 2026
25 checks passed
igerber deleted the docs/lpdid-survey-parity branch July 1, 2026 00:24
igerber mentioned this pull request Jul 1, 2026