Comparing v3.6.0...v3.6.1 · igerber/diff-diff · GitHub
base repository: igerber/diff-diff
base: v3.6.0
head repository: igerber/diff-diff
compare: v3.6.1
  • 12 commits
  • 60 files changed
  • 3 contributors

Commits on Jun 29, 2026

  1. 491b8d5
  2. test(lpdid): add R-parity validation harness (Dube et al. 2025), Phase B2 (#583)
    
    Pin the absorbing LPDiD estimator against the method authors' own R recipes
    (danielegirardi/lpdid) with an alexCardazzi/lpdid cross-check gate:
    - benchmarks/R/generate_lpdid_golden.R: in-R panel (+ interior-gap unit) and 6
      variants (variance-weighted, reweight, pmd, direct-covariate, pooled,
      RA-point); writes committed lpdid_test_panel.csv + lpdid_golden.json.
    - tests/test_methodology_lpdid.py: skip-guarded parity (att/se to ~1e-12,
      cross-platform asserted at 1e-6/1e-7).
    - benchmarks/python/coverage_lpdid_ra.py + lpdid_ra_coverage.json: ungated
      Monte-Carlo study validating the RA influence-function SE calibration (~0.95).
    
    Resolves the two provisional REGISTRY deviation notes in the library's favour
    with no estimator change: the RA SE matches the Stata teffects convention
    (point-anchored, SE pinned + coverage-validated; no R-package analogue), and the
    pooled estimand matches the authors' fixed-composition recipe (correcting the
    prior "horizon-stacked" wording). no_composition documented as more paper-faithful
    than the R packages (B1-tested). Ticks the REGISTRY B2 checklist box.
    
    Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    igerber and claude authored Jun 29, 2026
    48e1f4c
  3. feat(staggered): materialize non-estimable (g,t) cells as NaN in CallawaySantAnna (#582)
    
    * feat(staggered): materialize non-estimable (g,t) cells as NaN in CallawaySantAnna
    
    Uniformly materialize a NaN entry (with a machine-readable skip_reason) for every
    non-estimable (g,t) group-time cell across all CS estimation paths (no-covariate
    regression, covariate regression, IPW/DR, repeated cross-section, survey-weighted)
    instead of omitting it. Previously only the covariate-singular case materialized
    NaN; the other paths dropped the cell silently from the grid.
    
    Cells carry no influence-function entry, so they are excluded from every
    aggregation (simple/group/calendar/event-study), balance_e, and bootstrap -- all
    aggregate point estimates and SEs, plus event-study n_groups / by-group n_periods,
    are numerically unchanged and continue to match R did's aggte(). A fit where no
    cell is estimable still raises ValueError. to_dataframe("group_time") now includes
    the NaN rows and a skip_reason column.
    
    Documented per-cell surface deviation from R's att_gt (which omits the rows).
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * fix(staggered): uniform no-IF NaN cells + cover covariate paths (review #582)
    
    - The covariate-regression non-finite cell now materializes via _nan_gt_entry
      with NO influence-function entry, matching the other paths and the documented
      REGISTRY/helper contract (previously it wrote a zero-IF entry and ran batch
      inference). Aggregates and SEs are unchanged -- the cell was finite-masked /
      IF-membership-filtered out either way; now the "NaN cells carry no IF entry"
      invariant holds uniformly across all paths.
    - Extend the materialization test to cover covariate IPW/DR (panel + RCS) paths.
    - Remove the now-implemented CallawaySantAnna NaN-cell row from TODO.md.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * fix(staggered): treat non-finite ATT(g,t) as non-estimable in general/RCS paths (review #582 P1)
    
    The general (IPW/DR) and RCS estimable builders treated `att_gt is not None` as
    estimable even when att_gt was non-finite (NaN/inf): they stored effect=att_gt
    (which could surface inf), kept the influence-function entry, and did not count
    the cell in the consolidated skip total. Both now branch on
    `att_gt is None or not np.isfinite(att_gt)` first and materialize via
    _nan_gt_entry(skip_reason="non_finite_regression") with NO IF entry, so the
    documented contract (non-estimable cells are NaN entries, carry no IF, excluded
    from the bootstrap) holds uniformly across every path. Aggregate estimates and
    SEs are unchanged (these cells were finite-masked / IF-filtered out either way).
    Adds a regression test mimicking the inf-ATT-with-IF leak.
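
The guard described above can be sketched as follows. This is a minimal illustration, not the library's code: `classify_cell` and its return shape are hypothetical names; only the branch condition and the `"non_finite_regression"` reason string come from the commit message.

```python
import numpy as np

def classify_cell(att_gt):
    """Hypothetical stand-in for the estimable-cell builders.

    Returns (effect, skip_reason, keep_influence_function).
    """
    # Check for None first so np.isfinite is never called on None.
    if att_gt is None or not np.isfinite(att_gt):
        # Non-estimable: NaN effect, a machine-readable reason, no IF entry.
        return float("nan"), "non_finite_regression", False
    # Estimable: keep the effect and its influence-function entry.
    return float(att_gt), None, True
```

Under this sketch an `inf` ATT can no longer be stored as the effect, so it cannot reach inference and produce an infinite t-statistic.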
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * fix(staggered): guard non-finite ATT in the no-covariate path too (review #582 P0)
    
    Completes the uniform invariant: every CallawaySantAnna estimable-cell builder
    (no-covariate vectorized, covariate regression, general IPW/DR, RCS) now routes a
    non-finite ATT(g,t) to _nan_gt_entry(skip_reason="non_finite_regression") with NO
    influence-function entry, skipping batch inference and bootstrap membership. The
    no-covariate diff-in-means ATT is finite given n_t,n_c>0, but a non-finite outcome
    (inf survives the NaN-only valid mask) could otherwise store inf as the effect and
    produce t_stat=inf / p=0 / infinite CI via safe_inference_batch. Aggregate
    estimates and SEs are unchanged. Adds a regression test injecting an inf outcome
    through the no-covariate path.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * fix(staggered): omit all-non-estimable relative-time buckets from event study (review #582 P1)
    
    _aggregate_event_study() appended an all-NaN row (effect=NaN, se=NaN, n_groups=0)
    for a relative-time bucket whose cells are all non-estimable, instead of omitting
    the bucket. With NaN-cell materialization this surfaced new all-NaN event-study
    rows where the bucket previously had no cells (and thus no row) -- an aggregate-
    surface change vs the prior omit behavior and R did::aggte(). The bucket is now
    dropped when finite-filtering leaves no cell (tracked via a kept-periods list so
    the result lists stay aligned), matching _aggregate_by_group, which already omits
    all-NaN groups. Adds a test asserting an all-non-estimable relative time is absent
    from event_study_effects.
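
A rough sketch of the bucket-dropping logic, assuming cells arrive as `(relative_time, effect, weight)` tuples. The function name, the input shape, and the weighted-average aggregation are illustrative assumptions; the real `_aggregate_event_study()` is not shown in this log.

```python
import numpy as np

def aggregate_event_study_sketch(cells):
    """cells: iterable of (relative_time, effect, weight). Hypothetical shape."""
    buckets = {}
    for e, effect, weight in cells:
        buckets.setdefault(e, []).append((effect, weight))

    kept_periods, effects = [], []
    for e in sorted(buckets):
        # Finite-filter the bucket: materialized NaN cells drop out here.
        finite = [(x, w) for x, w in buckets[e] if np.isfinite(x)]
        if not finite:
            continue  # all cells non-estimable: omit the bucket, no NaN row
        xs, ws = zip(*finite)
        kept_periods.append(e)  # keeps the result lists aligned
        effects.append(float(np.average(xs, weights=ws)))
    return kept_periods, effects
```

Appending to `kept_periods` and `effects` in the same branch is what keeps the two lists the same length once a bucket is skipped.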
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * docs(staggered): drop nonexistent "calendar" aggregation from CS NaN-cell notes (review #582 P3)
    
    CallawaySantAnna's aggregation options are simple / event_study / group / all
    (there is no calendar aggregation). Remove "calendar" from the REGISTRY edge-case
    Note and the CHANGELOG entry listing which aggregations exclude materialized NaN
    cells.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * fix(staggered): report real treated/control counts on non-estimable cells (review #582 P2)
    
    The per-cell helper sentinel returns (and the covariate-reg empty-control batch
    site) hardcoded n_treated=n_control=0 for materialized NaN cells even after the
    observation masks had been computed, so group_time_effects / to_dataframe could
    show zero counts for a cell that actually had treated (or control) observations.
    The zero-control / zero-weight exits in _compute_att_gt_fast and _compute_att_gt_rc,
    and the covariate-reg empty-control batch site, now return the observed counts;
    missing-period exits (masks not yet built) keep 0. Display-only metadata --
    estimates, SEs, and aggregation are unchanged.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * test(staggered): handle materialized NaN cells in ported CS tests
    
    Three test_csdid_ported.py tests relied on non-estimable (g,t) cells being
    OMITTED from group_time_effects; with the new NaN-cell materialization those
    cells are present as NaN, so the tests' membership / golden-iteration guards no
    longer skip them. Update them to preserve their intent on the FINITE cells:
    - test_some_units_treated_first_period: a first-period cohort (no base period) is
      now all-NaN (missing_period) rather than absent -> assert it is all-NaN.
    - test_zero_pretreatment_outcomes: skip NaN pre-cells (the last cohort under
      not_yet_treated has no controls); finite pre-cells are still ~0.
    - test_golden_fewer_periods: skip NaN cells (gapped panel where base g-1 is
      unobserved -> missing_period; R falls back to an available base) -> R-parity on
      the finite cells.
    
    No source change; the cells are correctly non-estimable, only now visible.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    igerber and claude authored Jun 29, 2026
    81f4e84
  4. Add LPDiD non-absorbing treatment (entry-effect estimands), Phase C1 (#584)
    
    * feat(lpdid): non-absorbing treatment (entry-effect estimands), Phase C1
    
    Implement Dube, Girardi, Jorda & Taylor (2025) Section 4.2 non-absorbing
    treatment for LPDiD via a new `non_absorbing` parameter:
    
    - "first_entry" (Eq. 12): effect of entering treatment for the first time and
      staying treated; reuses the absorbing clean control, restricts only the
      treated set. Bit-identical to the absorbing path on absorbing panels.
    - "effect_stabilization" (Eq. 13, `stabilization_window=L`): units whose
      treatment has been stable for >= L periods serve as clean controls, so
      estimation is feasible with few/no never-treated units.
    
    Default `non_absorbing=None` is unchanged (absorbing path, still rejects
    non-absorbing input). Mode-aware clean-sample masks evaluate window conditions
    via cumulative treatment-change/level lookups with a documented "untreated
    before the first observed period" boundary convention; placebo horizons use the
    full pre-span window so pre-trends are uncontaminated; a per-horizon clean-
    treated indicator threads through the estimator / RA / reweight / pooled paths
    so re-entry events are classified correctly. Non-absorbing modes require a
    gap-free panel within each unit's observed span.
    
    Pure-Python validation (tests/test_lpdid.py::TestLPDiDNonAbsorbing): absorbing
    reduction, single-cohort reduction, re-entry mechanism, boundary retention,
    negative-horizon placebos, non-negative weighting, stabilized-control
    admission, equal-weight recovery, and DGP recovery; absorbing tests + R-parity
    goldens unchanged. Exit-event dynamics, R-package parity (PR-C2), and
    survey-design support are tracked follow-ups.
    
    * fix(lpdid): non-absorbing pooled-pre uses deepest reach-back horizon
    
    Codex P1: `_build_pooled_sample(kind="pre")` passed horizon=0 to the
    non-absorbing masks, so the effect_stabilization clean window only covered
    [t-L, t] instead of the pooled-pre reach-back to the most-negative horizon
    ([t - max(L, -h), t-1]). A unit with a prior treated spell at t-3 (clean at
    t-1) leaked into a [-3, -2] pooled-pre placebo and biased it. Pre windows now
    use min(horizons); the absorbing branch keeps horizon=0 (not-yet-treated at t
    already implies a clean pre-span, so its R-parity goldens are unchanged).
    
    Adds a deterministic regression test (spell entrants excluded from the
    pooled-pre sample; verified to fail at 0.286 before the fix).
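
The window arithmetic in this fix can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names, the representation of a unit's history as a list of periods in which its treatment changed, and the reading of "clean" as "no change inside the inclusive window". Only the two window formulas are taken from the commit message.

```python
def window_before_fix(t, L):
    # Pre-fix behaviour as described: horizon=0 gives [t-L, t].
    return t - L, t

def pooled_pre_window(t, L, horizons):
    # Post-fix: reach back to the most negative pooled-pre horizon.
    h = min(horizons)
    return t - max(L, -h), t - 1

def is_clean(change_periods, window):
    """True if no treatment change falls inside the inclusive window."""
    start, end = window
    return not any(start <= s <= end for s in change_periods)
```

With `t=10`, `L=1`, and pooled-pre horizons `[-3, -2]`, a unit whose treatment changed at period 7 passes the old window `(9, 10)` but fails the new one `(7, 9)`, which is the leak the regression test guards against.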
    igerber authored Jun 29, 2026
    4f1a0a3
  5. 8b91688
  6. feat(estimators): iterative alternating-projection demeaning for N-way absorbed FE (#586)
    
    N>1 absorbed fixed effects used single-pass sequential demeaning, which is the
    exact (weighted) Frisch-Waugh-Lovell residualization only on balanced
    orthogonal-FE panels; on unbalanced panels it was a biased approximation
    (coefficients off by ~1e-2 in tested cases).
    
    Add an N-way method-of-alternating-projections engine demean_by_groups() in
    utils.py; route the DiD/MultiPeriodDiD absorb= paths and the shared two-way
    within_transform() through it, fixing TWFE / SunAbraham / BaconDecomposition on
    unbalanced unweighted panels too. Lift the weighted-multi-absorb rejection
    (now supported via weighted MAP).
    
    Single-absorb and balanced-panel results are byte-stable; the weighted
    within_transform output is bit-identical; R-parity goldens unchanged.
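
For readers unfamiliar with the method of alternating projections, here is a minimal NumPy sketch of the idea. It is not the library's `demean_by_groups()`: the function name `demean_map`, its signature, the convergence rule, and the defaults are assumptions, and it assumes strictly positive weights.

```python
import numpy as np

def demean_map(y, groups, weights=None, tol=1e-10, max_iter=10_000):
    """Residualize y on several sets of group dummies by sweeping
    (weighted) group-mean subtraction until the sweeps stop changing y."""
    resid = np.asarray(y, dtype=float).copy()
    w = np.ones_like(resid) if weights is None else np.asarray(weights, dtype=float)
    # Integer codes 0..G-1 for each fixed-effect dimension.
    codes = [np.unique(np.asarray(g).ravel(), return_inverse=True)[1] for g in groups]
    totals = [np.bincount(c, weights=w) for c in codes]
    for _ in range(max_iter):
        before = resid.copy()
        for c, tot in zip(codes, totals):
            means = np.bincount(c, weights=w * resid) / tot
            resid -= means[c]  # project out this dimension's group means
        if np.max(np.abs(resid - before)) < tol:
            break
    return resid
```

Calling it with `max_iter=1` reproduces the single-pass sequential demeaning that the commit describes as exact only on balanced panels; on an unbalanced panel the iterated result should instead agree with the residuals from a regression on the full set of dummies.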
    
    Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    igerber and claude authored Jun 29, 2026
    6126f9b

Commits on Jun 30, 2026

  1. 21f0c30
  2. Add LPDiD complex-survey-design support (Phase D1) (#590)

    * Add LPDiD complex-survey-design support (Phase D1)
    
    Adds a `survey_design=` argument to `LPDiD.fit()` (a `SurveyDesign` with
    probability weights + optional strata/PSU/FPC), matching the library-wide
    fit()-time convention. On the variance-weighted default path each horizon's
    long-difference regression is fit by WLS on the survey weights, and the SE is
    the stratified-PSU Taylor-linearization (Binder TSL) sandwich with
    `df = n_PSU - n_strata`, reusing `diff_diff/survey.py` (`compute_survey_vcov`).
    
    The design is re-resolved on each realized (post-clean-control) sample so
    weights/strata/PSU align with the regression rows; with no explicit PSU the
    unit is injected as the PSU. Fails closed to NaN on under-identified samples.
    Rejects `survey_design` with `reweight=True` (the equally-weighted /
    regression-adjustment IF path), replicate-weight designs, and non-pweight
    types (deferred follow-ups). `LPDiDResults` gains `survey_metadata` /
    `n_strata` / `n_psu`, a `"survey_tsl"` vcov_type, and a Survey Design block in
    `summary()`. The non-survey path is byte-for-byte unchanged.
    
    Validated against `survey::svyglm` on the stacked long difference (numeric
    golden parity is the D2 follow-up); 15 new pure-Python invariant tests
    (reduction/unit-clustering, FPC-shrinks-SE, stratification, lonely-PSU,
    NaN-consistency, weighting-moves-point, metadata, rejection paths).
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * fix(lpdid): report survey PSU count as headline G for only_event survey fits
    
    CI-codex P2: under a survey design the effective variance cluster is the PSU
    (cluster_name reports the PSU column), but for only_event=True fits (pooled is
    None) headline_n_clusters fell back to the panel unit count -- so an explicit
    PSU design with n_psu != n_units could display the unit count mislabeled as
    G=<psu>. Per-row event-study n_clusters and inference were already computed on
    the realized survey design, so this was a metadata/labeling issue only, not a
    wrong SE/p-value. Fix: when a survey design is active, seed headline_n_clusters
    from the panel-level effective PSU count (the pooled-post override still prefers
    the realized survey-sample count when available). Regression test added
    (only_event=True, explicit PSU, n_psu != n_units).
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * fix(lpdid): build survey sandwich on the kept-column design (rank-deficient contract)
    
    CI-codex P1: `_estimate_survey_sample` computed the Binder TSL sandwich on the
    UNREDUCED design and recomputed `response - design @ coef`. When the rank
    handler drops a redundant direct-inclusion covariate / absorbed dummy / lag
    (setting that coef to NaN while `treatment_entry` stays identified), the NaN
    coef propagated through the residuals and the full-design X'WX bread
    singularized, collapsing an otherwise-identified treatment SE/t/p/CI to NaN --
    violating the library's rank-deficient contract that the non-survey solve_ols
    path honors.
    
    Fix: keep solve_ols's returned residuals (original-scale, computed on the
    identified reduced design) and build `compute_survey_vcov` on `design[:, kept]`
    where `kept = isfinite(coef)`, mapping treatment back to its kept-column index.
    If treatment itself is dropped, the effect is NaN and the SE stays NaN;
    `rank_deficient_action="error"` still raises from solve_ols. Regression test
    (duplicate + constant covariate, `silent`) asserts the treatment SE stays
    finite and equals the non-redundant reference fit, and that `error` raises.
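
The kept-column idea can be sketched as below. This is an illustration under stated assumptions: `se_on_kept_columns` is a hypothetical helper, and a plain unclustered HC0 sandwich stands in for the Binder TSL sandwich that `compute_survey_vcov` computes. Only the `kept = isfinite(coef)` masking, the `design[:, kept]` subsetting, and the index remapping follow the commit message.

```python
import numpy as np

def se_on_kept_columns(design, resid, coef, treat_col):
    """SE of the treatment coefficient using only identified columns."""
    kept = np.isfinite(coef)            # dropped columns carry NaN coefs
    if not kept[treat_col]:
        return float("nan")             # treatment itself was dropped
    Xk = design[:, kept]
    # Position of the treatment column within the reduced design.
    treat_k = int(np.sum(kept[:treat_col]))
    bread = np.linalg.inv(Xk.T @ Xk)    # invertible once redundancy is gone
    scores = Xk * resid[:, None]        # resid comes from the reduced fit
    vcov = bread @ (scores.T @ scores) @ bread
    return float(np.sqrt(vcov[treat_k, treat_k]))
```

Because the residuals are taken from the reduced fit rather than recomputed from NaN coefficients, a redundant covariate no longer turns the treatment SE into NaN.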
    
    CI-codex P2: type-check `survey_design` before `_survey_columns` accesses its
    attributes, so a non-SurveyDesign argument raises the intended TypeError rather
    than an incidental AttributeError (test added).
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    igerber and claude authored Jun 30, 2026
    c8928d2
  3. docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention (#589)
    
    * docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention
    
    Re-examined the TODO row proposing the survey TSL finite-sample correction
    count only positive-weight PSUs so the SE is invariant to zero-weight
    (subpopulation / padded) rows. Investigation shows the premise conflicts with
    the library's documented, R-validated convention:
    
    - `_compute_stratified_psu_meat`'s per-stratum correction
      (1 - f_h)*n_PSU_h/(n_PSU_h - 1) and PSU-mean centering intentionally keep
      genuine-subpopulation zero-weight PSUs. This is the full-design domain
      estimator of Lumley (2004 Section 3.4) / R survey::svyrecvar(subset()),
      already documented in REGISTRY section "Subpopulation Analysis".
    - The ATT is exactly invariant; the survey SE is deliberately NOT invariant to
      genuine-subpopulation zeroing (it should differ from a naive physical subset
      -- that is the whole point of subpopulation()). R produces the matching SE
      (only df differs).
    - Zero-weight rows that reuse an existing PSU label are already bit-invariant.
      The only invariance-violating shape -- appending synthetic new all-zero PSUs
      -- arises in no estimator path (domain padding goes through prep.py's
      zero-padded full-design cell variance, which retains the real PSU layout).
    
    Forcing the meat to positive-weight-only counting would break the documented
    Lumley/R parity, so the item is waived (no estimator behavior change):
    
    - TODO.md: move the row from Actionable Backlog to "Won't-fix / waived
      (decisions on the record)" with the Lumley/R justification.
    - REGISTRY.md: add a Note in section "Subpopulation Analysis" making explicit
      that the TSL meat finite-sample correction counts zero-weight PSUs by design.
    - tests/test_survey.py: add TestZeroWeightPsuConventionWaiver regression-lock
      (inert existing-PSU padding is bit-invariant; subpopulation zeroing keeps the
      full PSU structure so its SE differs from a naive subset). A future
      positive-weight-only change would collapse the two and trip the test.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * test(survey): add direct _compute_stratified_psu_meat full-design unit test
    
    Addresses the CI review's actionable P3 (Documentation/Tests): the SE-level
    test only asserts that subpopulation zeroing differs from a physical subset,
    which catches a full positive-weight-only rewrite but could miss a partial
    edit (e.g. changing only the finite-sample denominator while still centering
    over the zero PSU).
    
    Adds a direct unit test on _compute_stratified_psu_meat with crafted PSU
    scores including one all-zero-score PSU (a fully zeroed subpopulation PSU),
    asserting the exact full-design meat formula (n_PSU_h including the zero
    PSU) and that it is NOT the positive-weight-only meat. Any change to the
    centering OR the denominator now trips the lock.
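
To make the two conventions concrete, here is a small sketch of a stratified-PSU meat using the per-stratum factor (1 - f_h)*n_PSU_h/(n_PSU_h - 1) and PSU-mean centering quoted above. It is not `_compute_stratified_psu_meat`: the function name, the `sampling_fraction` argument (a dict of f_h by stratum label), and the choice to skip single-PSU strata are assumptions.

```python
import numpy as np

def stratified_psu_meat_sketch(scores, strata, psu, sampling_fraction=None):
    scores = np.asarray(scores, dtype=float)
    if scores.ndim == 1:
        scores = scores[:, None]
    strata, psu = np.asarray(strata), np.asarray(psu)
    k = scores.shape[1]
    meat = np.zeros((k, k))
    for h in np.unique(strata):
        in_h = strata == h
        labels = np.unique(psu[in_h])   # every PSU counts, zero-score or not
        n_h = len(labels)
        if n_h < 2:
            continue                    # lonely PSU: skipped in this sketch
        totals = np.vstack([scores[in_h & (psu == p)].sum(axis=0) for p in labels])
        centered = totals - totals.mean(axis=0)   # PSU-mean centering
        f_h = 0.0 if sampling_fraction is None else sampling_fraction[h.item()]
        meat += (1.0 - f_h) * n_h / (n_h - 1) * (centered.T @ centered)
    return meat
```

With PSU score totals of 1, 3, and 0 in one stratum, the full-design meat is 7, while physically dropping the all-zero PSU first gives 4. That gap is what the regression lock relies on.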
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    * test(survey): model a true zero-weight PSU in the direct meat fixture
    
    Addresses the re-review's actionable P3: the direct meat test represented its
    all-zero PSU via zero score rows only, with weights left all ones. A future
    denominator-only edit that reads resolved.weights to drop positive-weight PSUs
    would not have been caught. Set PSU 2's weights to 0 so the fixture models a
    true fully zero-weight subpopulation PSU. The current meat ignores weights
    (it operates on scores), so the expected value is unchanged; the change only
    hardens the regression lock.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    igerber and claude authored Jun 30, 2026
    6b052e6
  4. 193f0ea

Commits on Jul 1, 2026

  1. 6596d1b
  2. Bump version to 3.6.1 (#593)

    Release 3.6.1. Changes since 3.6.0:
    - LPDiD non-absorbing (reversible) treatment with entry-effect estimands
      (Dube, Girardi, Jorda & Taylor 2025) + complex-survey-design support
      (survey_design=), each R-parity validated.
    - TROP non-absorbing (on/off) treatment support, opt-in local method
      (Athey, Imbens, Qu & Viviano 2025).
    - Weighted multiple absorbed fixed effects (absorb=[a, b, ...]) via
      iterative alternating-projection demeaning.
    - CallawaySantAnna materializes non-estimable (g,t) cells as NaN.
    - Fix: BusinessReport appendix render failures now surfaced.
    - R-parity validation backfill for the LPDiD absorbing/non-absorbing/survey
      paths; survey zero-weight-PSU SE-invariance item waived (Lumley
      full-design convention); SciPy lower-bound doc alignment.
    
    Promotes the CHANGELOG [Unreleased] section to [3.6.1] - 2026-07-01 and
    syncs the version across __init__.py, pyproject.toml, rust/Cargo.toml,
    llms-full.txt, and CITATION.cff.
    
    Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    igerber and claude authored Jul 1, 2026
    2bb4be9