StaggeredTripleDifference methodology validation + opt-in Eq-4.14 overall ATT by igerber · Pull Request #504 · igerber/diff-diff · GitHub
Skip to content

StaggeredTripleDifference methodology validation + opt-in Eq-4.14 overall ATT#504

Merged
igerber merged 2 commits into
mainfrom
feature/staggered-triple-diff-validation
May 31, 2026
Merged

StaggeredTripleDifference methodology validation + opt-in Eq-4.14 overall ATT#504
igerber merged 2 commits into
mainfrom
feature/staggered-triple-diff-validation

Conversation

@igerber

@igerber igerber commented May 31, 2026

Copy link
Copy Markdown
Owner

Summary

  • StaggeredTripleDifference methodology-review-tracker: In Progress → Complete. Validates the diff_diff source against Ortiz-Villavicencio & Sant'Anna (2025, arXiv:2505.09942v3); PR-A (StaggeredTripleDifference PR-A: Ortiz-Villavicencio & Sant'Anna (2025) paper review #499) added the paper review on file, this PR validates the implementation against it.
  • New opt-in overall_att_es (paper Eq. 4.14 overall) — the unweighted mean of the post-treatment event-study effects ES(e) — on StaggeredTripleDiffResults (with overall_se_es / overall_t_stat_es / overall_p_value_es / overall_conf_int_es), populated only under aggregate="event_study"/"all". The default overall_att (Callaway-Sant'Anna simple post-treatment average, the library-wide convention) is unchanged. Computed via a side-channel stash on the shared CallawaySantAnnaAggregationMixin._aggregate_event_study (no return-signature change; CallawaySantAnna unaffected), over post-treatment e >= -anticipation. Analytical SE = the influence function of the mean (per-event-time combined IFs averaged, routed through the same survey-aware variance estimator as the per-e effects via a new _se_from_psi helper); a multiplier-bootstrap SE replaces it under n_bootstrap>0.
  • REGISTRY fidelity fixes: formalized the previously-unlabeled overall-aggregation prose under a **Note:** documenting both overalls; consolidated the duplicate aggregation-weight deviation and fixed a P(G=g) vs R P(S=g) mislabel.
  • Edge-case hardening surfaced during validation: empty event-study / degenerate-IF aggregation now fails soft (NaN + warning) instead of raising; the shared bootstrap mixin guards empty rel_periods (balance_e-emptied event studies); overall_att_es uses its own replicate-weight effective df.
  • Tracker row → Complete with a Verified Components / R Comparison Results block; docs/references.rst pinned to arXiv:2505.09942v3; autosummary stubs + CHANGELOG updated.

Methodology references

  • Method name(s): StaggeredTripleDifference (staggered triple-differences / DDD), built on the shared CallawaySantAnna aggregation + multiplier-bootstrap mixins.
  • Paper / source link(s): Ortiz-Villavicencio, M. & Sant'Anna, P. H. C. (2025). "Better Understanding Triple Differences Estimators." arXiv:2505.09942v3. R reference: triplediff::ddd(panel=TRUE) + agg_ddd().
  • Intentional deviations (and why) — documented in docs/methodology/REGISTRY.md ## StaggeredTripleDifference, all verified non-masking against the v3 paper:
    • Comparison-cohort admissibility g_c > max(t, base_period) + anticipation (matches the companion R triplediff; the paper states g_c > max(g,t)) — valid cell-by-cell and base-period/anticipation-aware.
    • Aggregation weights P(S=g, Q=1) (eligible-treated; matches the paper's Eq. 4.13 where G_i is finite only for Q=1) vs R's P(S=g) — drives the larger tolerance on aggregated quantities.
    • Per-cohort group-effect WIF (conservative vs R wif=NULL).
    • Default overall_att = CS-simple post-treatment average (library convention); the paper's Eq. 4.14 overall is available opt-in as overall_att_es.
    • Cluster-robust analytical SEs accepted-but-deferred (the multiplier bootstrap provides unit-level clustering).

Validation

  • Tests added/updated:
    • tests/test_methodology_staggered_triple_diff.py — paper-equation-anchored Verified Components (Theorem 4.1 / Eq. 4.5 RA=IPW=DR identification; Eq. 4.1 three-term DDD decomposition; Eqs. 4.11-4.12 optimal-GMM weight normalization + single-group reduction; Eq. 4.13 event-study cohort-share weighting; Eq. 4.14 / Cor. 4.2 overall), overall_att_es R cross-validation, balance_e-empties-event-study fail-soft, and the aggregation return-contract arity.
    • tests/test_survey_staggered_ddd.pyoverall_att_es under survey weighting (uniform-equivalence, nontrivial-weights-change-SE, full design, replicate weights).
    • tests/test_staggered_triple_diff.py — result-surface smoke (summary() / to_dict() expose the new fields; None for the default fit).
  • R parity: cross-validated against triplediff::ddd(panel=TRUE) + agg_ddd() — group-time ATT(g,t) exact, SE within 1%; Eq. 4.14 overall within 10% (ATT) / 3% (SE). CSV fixtures gitignored / regenerated on-the-fly from benchmarks/R/benchmark_staggered_triplediff.R; JSON golden committed. CallawaySantAnna + StaggeredDiD suites pass (shared-mixin regression guard).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

…rall ATT

Validates the StaggeredTripleDifference source against Ortiz-Villavicencio &
Sant'Anna (2025, arXiv:2505.09942v3) and promotes the methodology-review row to
Complete. Adds an opt-in Eq-4.14 overall ATT (overall_att_es).

Source:
- New _se_from_psi helper factored from _compute_aggregated_se_with_wif
  (survey/replicate/simple variance dispatch), reused for the overall SE.
- _aggregate_event_study stashes self._event_study_overall (mirroring
  _event_study_vcov; no return-signature change, CallawaySantAnna unaffected):
  unweighted mean of post-treatment ES(e) over e >= -anticipation.
- StaggeredTripleDiffResults gains overall_att_es/_se_es/_t_stat_es/_p_value_es/
  _conf_int_es (populated only under aggregate in {event_study, all}); rendered
  in summary() and to_dict(). Default overall_att unchanged.
- Bootstrap parity: per-draw mean of post-treatment ES(e) draws; cluster-
  unidentified NaN guard mirrored for the new scalars.

Methodology docs (REGISTRY ## StaggeredTripleDifference):
- Formalize the previously-unlabeled overall-aggregation prose under a Note
  documenting both overalls (default CS-simple vs opt-in Eq-4.14).
- Consolidate the duplicate aggregation-weight deviation; fix the P(G=g) vs R
  P(S=g) mislabel.

Tests:
- Paper-equation-anchored Verified Components (Thm 4.1/Eq 4.5, Eq 4.1, Eqs
  4.11-4.12, Eq 4.13, Eq 4.14/Cor 4.2) + overall_att_es R cross-validation +
  bootstrap/survey cross-surface coverage.

Tracker/refs: METHODOLOGY_REVIEW.md row -> Complete with Verified Components / R
Comparison Results; priority queue pruned; references.rst pinned to v3; CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

…te=all / cluster-unidentified coverage

CI Codex review findings on PR #504:

- P2: the Eq. 4.14 overall (overall_att_es) terminal warning re-derived its
  trigger from the post-bootstrap state, so a bootstrap that NaN'd overall_se_es
  for unrelated reasons (e.g. a single-PSU/cluster-unidentified survey design)
  was misdiagnosed as an analytical non-finite influence function. Gate the
  warning on whether the ANALYTICAL SE was non-finite (captured before the
  bootstrap override), and broaden the message to name both causes (non-finite
  IF or unidentified variance). The bootstrap path already emits its own
  authoritative "variance unidentified" warning.
- P3 (tests): add a direct regression that aggregate="all" populates
  overall_att_es and matches the aggregate="event_study" value/inference
  bit-for-bit on the same data.
- Add a single-PSU cluster-unidentified bootstrap regression: overall_att_es
  keeps its point estimate, SE+inference are NaN-consistent, and the Eq. 4.14
  warning (when emitted) is cause-accurate.

No numeric output changes; inference fields remain NaN-consistent in all paths.
Affected suites pass, incl. the shared-mixin CallawaySantAnna regression guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 31, 2026
@igerber igerber merged commit d0a8752 into main May 31, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/staggered-triple-diff-validation branch May 31, 2026 13:35
@igerber igerber mentioned this pull request Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant