StaggeredTripleDifference methodology validation + opt-in Eq-4.14 overall ATT #504
Merged
…rall ATT
Validates the StaggeredTripleDifference source against Ortiz-Villavicencio &
Sant'Anna (2025, arXiv:2505.09942v3) and promotes the methodology-review row to
Complete. Adds an opt-in Eq-4.14 overall ATT (overall_att_es).
Source:
- New _se_from_psi helper factored from _compute_aggregated_se_with_wif
(survey/replicate/simple variance dispatch), reused for the overall SE.
- _aggregate_event_study stashes self._event_study_overall (mirroring
_event_study_vcov; no return-signature change, CallawaySantAnna unaffected):
unweighted mean of post-treatment ES(e) over e >= -anticipation.
- StaggeredTripleDiffResults gains overall_att_es/_se_es/_t_stat_es/_p_value_es/
_conf_int_es (populated only under aggregate in {event_study, all}); rendered
in summary() and to_dict(). Default overall_att unchanged.
- Bootstrap parity: per-draw mean of post-treatment ES(e) draws; cluster-
unidentified NaN guard mirrored for the new scalars.
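The Eq-4.14 overall described in the list above can be sketched as follows. This is a minimal illustration, not the library's code: the function names, the dict-based inputs, and the influence-function scaling (each `psi[e]` averages to the estimation error of ES(e)) are assumptions, and only the simple i.i.d. variance is shown, not the survey/replicate dispatch handled by `_se_from_psi`.

```python
import numpy as np


def overall_from_event_study(es, psi, anticipation=0):
    """Unweighted mean of post-treatment ES(e) plus a simple IF-based SE.

    es  : dict mapping event time e -> point estimate ES(e)
    psi : dict mapping event time e -> length-n influence-function vector,
          assumed scaled so that ES_hat(e) - ES(e) is roughly mean(psi[e])
    """
    post = sorted(e for e in es if e >= -anticipation)
    if not post:
        # Nothing to average (e.g. an emptied event study): fail soft.
        return float("nan"), float("nan")
    att = float(np.mean([es[e] for e in post]))
    # The influence function of a mean is the unit-by-unit mean of the IFs.
    psi_bar = np.mean(np.vstack([psi[e] for e in post]), axis=0)
    n = psi_bar.shape[0]
    se = float(np.sqrt(np.mean(psi_bar ** 2) / n))
    return att, se


def overall_bootstrap_draws(es_draws, event_times, anticipation=0):
    """Per-draw mean of post-treatment ES(e) draws.

    es_draws    : array of shape (n_draws, n_event_times)
    event_times : event time of each column of es_draws
    """
    es_draws = np.asarray(es_draws, dtype=float)
    keep = np.asarray(event_times) >= -anticipation
    return es_draws[:, keep].mean(axis=1)
```

Averaging the influence functions before taking the variance keeps the covariance between event times in the SE, which averaging per-e standard errors would lose.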
Methodology docs (REGISTRY ## StaggeredTripleDifference):
- Formalize the previously-unlabeled overall-aggregation prose under a Note
documenting both overalls (default CS-simple vs opt-in Eq-4.14).
- Consolidate the duplicate aggregation-weight deviation; fix the P(G=g) vs R
P(S=g) mislabel.
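To illustrate the aggregation-weight deviation, the sketch below contrasts cohort shares among eligible-treated units, P(S=g, Q=1), with unconditional cohort shares, P(S=g). The function name, the array inputs, the use of `inf` for never-enabled units, and the normalization across all finite cohorts are assumptions made for illustration only.

```python
import numpy as np


def cohort_weights(S, Q, eligible_only=True):
    """Normalized cohort shares over the finite cohorts in S.

    S : enabling period per unit (np.inf for never-enabled units; assumed coding)
    Q : eligibility indicator per unit (1 = eligible)
    eligible_only=True  -> shares proportional to P(S=g, Q=1)
    eligible_only=False -> shares proportional to P(S=g)
    """
    S = np.asarray(S, dtype=float)
    Q = np.asarray(Q)
    cohorts = np.unique(S[np.isfinite(S)])
    mask = (Q == 1) if eligible_only else np.ones(S.shape, dtype=bool)
    shares = np.array([np.mean((S == g) & mask) for g in cohorts])
    weights = shares / shares.sum()
    return dict(zip(cohorts.tolist(), weights.tolist()))
```

When eligibility rates differ across cohorts, the two weightings diverge, which is consistent with aggregated quantities needing a looser tolerance than the group-time effects.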
Tests:
- Paper-equation-anchored Verified Components (Thm 4.1/Eq 4.5, Eq 4.1, Eqs
4.11-4.12, Eq 4.13, Eq 4.14/Cor 4.2) + overall_att_es R cross-validation +
bootstrap/survey cross-surface coverage.
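A cross-validation against R of this kind usually reduces to a relative-tolerance comparison. The helper below is a hypothetical sketch, assuming the percentages quoted in the PR summary (10% on the ATT, 3% on the SE for the Eq. 4.14 overall) are relative differences against the R value; it is not the test suite's actual assertion.

```python
def within_rel_tol(value, reference, rtol):
    """True when value is within rtol (relative) of the reference value."""
    return abs(value - reference) <= rtol * abs(reference)


# Tolerances taken from the PR summary; the dict layout is an assumption.
EQ_4_14_TOLERANCES = {"att": 0.10, "se": 0.03}
```

For example, `within_rel_tol(py_att, r_att, EQ_4_14_TOLERANCES["att"])` would gate the point estimate, with `py_att` and `r_att` standing in for the Python and R results.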
Tracker/refs: METHODOLOGY_REVIEW.md row -> Complete with Verified Components / R
Comparison Results; priority queue pruned; references.rst pinned to v3; CHANGELOG.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…te=all / cluster-unidentified coverage
CI Codex review findings on PR #504:
- P2: the Eq. 4.14 overall (overall_att_es) terminal warning re-derived its
trigger from the post-bootstrap state, so a bootstrap that NaN'd overall_se_es
for unrelated reasons (e.g. a single-PSU/cluster-unidentified survey design) was
misdiagnosed as an analytical non-finite influence function. Gate the warning on
whether the ANALYTICAL SE was non-finite (captured before the bootstrap
override), and broaden the message to name both causes (non-finite IF or
unidentified variance). The bootstrap path already emits its own authoritative
"variance unidentified" warning.
- P3 (tests): add a direct regression that aggregate="all" populates
overall_att_es and matches the aggregate="event_study" value/inference
bit-for-bit on the same data.
- Add a single-PSU cluster-unidentified bootstrap regression: overall_att_es
keeps its point estimate, SE+inference are NaN-consistent, and the Eq. 4.14
warning (when emitted) is cause-accurate.
No numeric output changes; inference fields remain NaN-consistent in all paths.
Affected suites pass, incl. the shared-mixin CallawaySantAnna regression guard.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
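The P2 fix can be pictured with a small sketch: record whether the analytical SE was non-finite before any bootstrap override, and warn only on that flag. Everything here is hypothetical (function names, warning text, the returned dict, and the normal-approximation inference); it shows the gating pattern, not the library's implementation.

```python
import math
import warnings


def safe_inference(att, se, z_crit=1.959963984540054):
    """Return (t_stat, p_value, conf_int); all NaN when the SE is unusable."""
    if not (math.isfinite(se) and se > 0):
        nan = float("nan")
        return nan, nan, (nan, nan)
    t = att / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided normal p-value
    return t, p, (att - z_crit * se, att + z_crit * se)


def finalize_overall(att, analytical_se, bootstrap_se=None):
    # Capture the analytical state BEFORE the bootstrap can override the SE.
    analytical_nonfinite = not math.isfinite(analytical_se)
    se = analytical_se if bootstrap_se is None else bootstrap_se
    if analytical_nonfinite:
        warnings.warn(
            "overall_att_es: analytical SE is non-finite "
            "(non-finite influence function or unidentified variance).",
            UserWarning,
        )
    t, p, ci = safe_inference(att, se)
    return {"att": att, "se": se, "t_stat": t, "p_value": p, "conf_int": ci}
```

A bootstrap that returns a NaN SE on a finite analytical SE therefore produces NaN inference without this warning, leaving the diagnosis to the bootstrap path's own message.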

Summary
- Validates the `diff_diff` source against Ortiz-Villavicencio & Sant'Anna (2025, arXiv:2505.09942v3). PR-A (StaggeredTripleDifference PR-A: Ortiz-Villavicencio & Sant'Anna (2025) paper review #499) added the paper review on file; this PR validates the implementation against it.
- Adds opt-in `overall_att_es` (paper Eq. 4.14 overall), the unweighted mean of the post-treatment event-study effects ES(e), on `StaggeredTripleDiffResults` (with `overall_se_es` / `overall_t_stat_es` / `overall_p_value_es` / `overall_conf_int_es`), populated only under `aggregate="event_study"` / `"all"`. The default `overall_att` (Callaway-Sant'Anna simple post-treatment average, the library-wide convention) is unchanged. It is computed via a side-channel stash on the shared `CallawaySantAnnaAggregationMixin._aggregate_event_study` (no return-signature change; CallawaySantAnna unaffected), over post-treatment `e >= -anticipation`. The analytical SE comes from the influence function of the mean (per-event-time combined IFs averaged, then routed through the same survey-aware variance estimator as the per-e effects via a new `_se_from_psi` helper); a multiplier-bootstrap SE replaces it under `n_bootstrap>0`.
- REGISTRY: formalized the previously unlabeled overall-aggregation prose under a `**Note:**` documenting both overalls; consolidated the duplicate aggregation-weight deviation and fixed a `P(G=g)` vs R `P(S=g)` mislabel.
- `rel_periods` (`balance_e`-emptied event studies); `overall_att_es` uses its own replicate-weight effective df.
- `docs/references.rst` pinned to arXiv:2505.09942v3; autosummary stubs + CHANGELOG updated.

Methodology references
- `StaggeredTripleDifference` (staggered triple-differences / DDD), built on the shared CallawaySantAnna aggregation + multiplier-bootstrap mixins.
- `triplediff::ddd(panel=TRUE)` + `agg_ddd()`.
- Deviations documented in `docs/methodology/REGISTRY.md` `## StaggeredTripleDifference`, all verified non-masking against the v3 paper:
  - `g_c > max(t, base_period) + anticipation` (matches the companion R `triplediff`; the paper states `g_c > max(g,t)`): valid cell-by-cell and base-period/anticipation-aware.
  - `P(S=g, Q=1)` (eligible-treated; matches the paper's Eq. 4.13, where `G_i` is finite only for `Q=1`) vs R's `P(S=g)`: this drives the larger tolerance on aggregated quantities.
  - `wif=NULL`).
  - `overall_att` = CS-simple post-treatment average (library convention); the paper's Eq. 4.14 overall is available opt-in as `overall_att_es`.

Validation
- `tests/test_methodology_staggered_triple_diff.py`: paper-equation-anchored Verified Components (Theorem 4.1 / Eq. 4.5 RA=IPW=DR identification; Eq. 4.1 three-term DDD decomposition; Eqs. 4.11-4.12 optimal-GMM weight normalization + single-group reduction; Eq. 4.13 event-study cohort-share weighting; Eq. 4.14 / Cor. 4.2 overall), `overall_att_es` R cross-validation, `balance_e`-empties-event-study fail-soft, and the aggregation return-contract arity.
- `tests/test_survey_staggered_ddd.py`: `overall_att_es` under survey weighting (uniform-equivalence, nontrivial-weights-change-SE, full design, replicate weights).
- `tests/test_staggered_triple_diff.py`: result-surface smoke (`summary()` / `to_dict()` expose the new fields; `None` for the default fit).
- `triplediff::ddd(panel=TRUE)` + `agg_ddd()`: group-time ATT(g,t) exact, SE within 1%; Eq. 4.14 overall within 10% (ATT) / 3% (SE). CSV fixtures gitignored / regenerated on-the-fly from `benchmarks/R/benchmark_staggered_triplediff.R`; JSON golden committed. CallawaySantAnna + StaggeredDiD suites pass (shared-mixin regression guard).

Security / privacy
🤖 Generated with Claude Code