Wave 4 Tier A drain: EfficientDiD anticipation note, generate_ddd_panel_data, TROP data-setup helper by igerber · Pull Request #455 · igerber/diff-diff · GitHub
Skip to content

Wave 4 Tier A drain: EfficientDiD anticipation note, generate_ddd_panel_data, TROP data-setup helper#455

Merged
igerber merged 6 commits into
mainfrom
wave-4-tier-a-drain
May 16, 2026
Merged

Wave 4 Tier A drain: EfficientDiD anticipation note, generate_ddd_panel_data, TROP data-setup helper#455
igerber merged 6 commits into
mainfrom
wave-4-tier-a-drain

Conversation

@igerber

@igerber igerber commented May 16, 2026

Copy link
Copy Markdown
Owner

Summary

  • Item 2efficient_did: align REGISTRY note with last_cohort × anticipation trim (efficient_did.py:470 already trims at last_g - anticipation; REGISTRY now describes the anticipation>0 case; docstring cross-references added; regression test in TestLastCohortControl).
  • Item 3prep_dgp: add generate_ddd_panel_data for panel-structured DDD power-analysis DGPs (balanced panel of n_units × n_periods; unit-level time-invariant group/partition; derived post; DDD-CPT preserved by construction). Exported from diff_diff and diff_diff.prep; autofunction stub in docs/api/prep.rst; 14 new tests including a deterministic recovery test (noise_sd=0, ATT recovery to ~1e-15); CHANGELOG entry under [Unreleased] ### Added. Cross-sectional generate_ddd_data unchanged.
  • Item 4trop: extract shared _setup_trop_data(...) helper from TROP.fit() (local path) and _fit_global() — ~85 LoC of near-identical data-setup logic deduped into one helper in trop_local.py. The global-method staggered-adoption check stays in _fit_global as a post-helper validation. Helper returns all four index mappings uniformly (unit_to_idx/period_to_idx/idx_to_unit/idx_to_period); parity regression test verifies round-trip + shape consistency. Behavior-preserving (84 non-slow TROP tests green).
  • TODO.md: 3 Tier-A backlog rows drained; 2 follow-up rows added (TROP bootstrap-loop dedup; TripleDifference power auto-routing).

Methodology references (required if estimator / math changes)

  • Method name(s): EfficientDiD (REGISTRY-only update, no estimator math touched); generate_ddd_panel_data is a new utility DGP, not an estimator; TROP refactor is data-setup extraction, not methodology.
  • Paper / source link(s): No new citations introduced. EfficientDiD's last_g - anticipation trim is the existing code's behavior (PR EfficientDiD: cluster-robust SEs, last-cohort control, Hausman pretest, small cohort warning #230); this PR aligns REGISTRY to that documented behavior. The DDD-CPT identifying assumption for generate_ddd_panel_data follows the standard triple-difference identification (see generate_ddd_data for the cross-sectional analog).
  • Any intentional deviations from the source (and why): None. REGISTRY now matches code; anticipation-contaminated periods are excluded from the pseudo-control's pre-treatment window (methodologically correct).

Validation

  • Tests added/updated:
    • tests/test_efficient_did.py::TestLastCohortControl::test_last_cohort_with_anticipation_trims_at_last_g_minus_anticipation (regression guard for the anticipation trim).
    • tests/test_prep.py::TestGenerateDddPanelData (14 tests including deterministic ATT recovery, finite-sample ATT recovery, validation, balance, time-invariance).
    • tests/test_trop.py::TestTROPModuleSplit::test_setup_trop_data_internal_contract (parity regression for the shared helper's return contract).
  • Backtest / simulation / notebook evidence (if applicable): N/A — no tutorial updates.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 03fce79fc5204b273ee731e24a42d0be5bbc1b2c


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review status: the prior generate_ddd_panel_data() blocker is resolved. The generator now fails fast when rounded cell counts would leave any (group, partition) cell empty and includes regression coverage for both infeasible and smallest-feasible configurations (diff_diff/prep_dgp.py:L1264-L1367, tests/test_prep.py:L1237-L1341).
  • EfficientDiD’s control_group="last_cohort" docs and Methodology Registry now match the existing last_g - anticipation trim, and the PR adds a direct regression test for that behavior (diff_diff/efficient_did.py:L161-L180, diff_diff/efficient_did.py:L463-L480, docs/methodology/REGISTRY.md:L875-L880, docs/methodology/REGISTRY.md:L934-L939, tests/test_efficient_did.py:L1128-L1158).
  • The TROP refactor centralizes shared fit-time setup in _setup_trop_data(), and both local/global paths now consume the same validated context; I did not find a behavior-changing estimator or inference regression in the changed surface (diff_diff/trop_local.py:L74-L194, diff_diff/trop.py:L472-L486, diff_diff/trop_global.py:L587-L605, tests/test_trop.py:L4149-L4190).
  • Remaining TROP bootstrap dedup and TripleDifference power auto-routing work is explicitly tracked in TODO.md, so those items are informational only (TODO.md:L87-L88).
  • Validation note: this was a static review only. I could not run the targeted tests because python in this environment lacks numpy, and pytest is not installed.

Methodology

No findings. The previous DDD generator issue is fixed, and the new generator now enforces the 8-cell contract that TripleDifference.fit() requires (diff_diff/prep_dgp.py:L1264-L1367, diff_diff/triple_diff.py:L776-L798, tests/test_prep.py:L1237-L1341).

Code Quality

No findings. The TROP extraction is a straightforward consolidation of duplicated setup logic into one helper used by both fit paths (diff_diff/trop_local.py:L74-L194, diff_diff/trop.py:L472-L486, diff_diff/trop_global.py:L587-L605).

Performance

No findings.

Maintainability

P3 informational — Impact: the PR improves TROP maintainability by deduplicating fit-time setup, but intentionally leaves bootstrap-loop dedup for follow-up. Concrete fix: none required for this PR; the remaining work is already tracked in TODO.md:L87-L87.

Tech Debt

P3 informational — Impact: TripleDifference power simulation is not yet auto-routed to generate_ddd_panel_data() when n_periods > 2, but the limitation is explicitly tracked and disclosed. Concrete fix: none required for this PR; follow-up is tracked in TODO.md:L88-L88.

Security

No findings.

Documentation/Tests

No blocking findings. Public API exposure and docs for generate_ddd_panel_data() were updated consistently across diff_diff/prep.py:L18-L23, diff_diff/__init__.py:L120-L128, diff_diff/__init__.py:L409-L413, and docs/api/prep.rst:L73-L78, and the new tests cover the previously missing empty-cell edge case plus the smallest feasible success case (tests/test_prep.py:L1237-L1341). Validation note: I could not execute the test suite here because python lacks numpy and pytest is unavailable.

igerber and others added 5 commits May 16, 2026 13:02
The `control_group="last_cohort"` path in EfficientDiD.fit (line 470) trims
periods at `last_g - anticipation`, excluding anticipation-contaminated
periods from the pseudo-control's pre-treatment window. REGISTRY.md
previously described only the `anticipation=0` case. Aligns both the
Edge Cases bullet and the algorithm Note to the code's actual behavior,
cross-references the interaction in the `control_group` and `anticipation`
docstring entries, and adds a regression test asserting the trim cuts at
`last_g - anticipation` rather than `last_g`.

Closes Tier-A backlog item 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new public DGP function that generates a balanced panel of n_units
observed over n_periods with two unit-level binary dimensions (`group`,
`partition`, time-invariant) and a derived `post` indicator. The DDD-CPT
identifying assumption holds because `group_partition_interaction` enters
only as a unit-level (time-invariant) effect, leaving the triple
interaction `treatment_effect * group * partition * post` as the sole
source of differential group × partition trend.

The existing cross-sectional `generate_ddd_data` remains unchanged.
Compatible with `TripleDifference.fit(..., time="post")` directly; the
binary 2×2×2 estimator surface is unchanged. Auto-routing of
`power.simulate_power` to the panel DGP for `n_periods > 2` is deferred
to a follow-up (TODO.md row added).

Exports: top-level `diff_diff` and `diff_diff.prep` re-export; new
autofunction stub in docs/api/prep.rst. Tests in
tests/test_prep.py::TestGenerateDddPanelData (14 tests) including a
deterministic recovery test (noise_sd=0, ATT recovery to ~1e-15) and a
finite-sample recovery test.

Closes Tier-A backlog item 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`TROP.fit()` (local path) and `_fit_global()` previously duplicated ~85
lines of data-setup logic each: panel pivoting, absorbing-state
validation, treated/control unit identification, first-treatment-period
detection, and pre/post period counting. The duplicated blocks were
near-identical line-by-line, differing only in which index mappings
the caller built (local built all four; global built only the forward
maps).

Extracts `_setup_trop_data(...)` in trop_local.py (alongside the existing
`_validate_and_pivot_treatment` helper). Both callers now invoke it and
unpack only the fields they consume. The helper returns all four index
mappings (`unit_to_idx`, `period_to_idx`, `idx_to_unit`, `idx_to_period`)
uniformly, eliminating a class of subtle drift bug; `_fit_global` gains
two unused locals as a trade-off.

The global-method-specific staggered-adoption check stays in
`_fit_global` as a post-helper validation (it depends on estimator
semantics, not data preparation). Bootstrap-loop dedup (~40 LoC across
`_bootstrap_variance` / `_bootstrap_variance_global`) is deferred to a
follow-up (TODO.md row added).

Adds a parity regression test `TestTROPModuleSplit::test_setup_trop_data_internal_contract`
that round-trip-verifies the index mappings, shape consistency, and
treated/control partition disjointness.

Behavior-preserving: TROP test suite (84 non-slow tests) is the safety
net.

Closes Tier-A backlog item 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes three Methodology/Correctness rows (EfficientDiD anticipation
trim, generate_ddd_panel_data, TROP fit/_fit_global dedup) and the
three corresponding Tier A bullets, addressed in this PR. Adds two
follow-up rows for the deferred scope (TROP bootstrap-loop dedup,
TripleDifference power auto-routing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l_data

The R1 review identified that independent marginal sampling of `group` and
`partition` could leave one of the four (group, partition) cells empty
under valid inputs — e.g., `n_units=4, group_frac=partition_frac=0.25`
rounds to `n_group_1=1` and `n_p1_g1=0`, so the (1, 1) cell collapses
before `TripleDifference.fit`'s 2x2x2 cell-presence check.

Switches to stratified allocation: assign `group` to its requested
fraction, then within each group stratum, draw `partition=1` at
`partition_frac`. Adds a targeted `ValueError` when the rounded cell
counts would leave any (group, partition) cell empty (with the four
cell counts in the error message so users can pick a feasible config).

Adds two regression tests:
- `test_infeasible_cell_counts_raise` exercises both the `n_units=4`
  small-marginal case and an `n_units=10, group_frac=0.1` case.
- `test_smallest_feasible_config_populates_all_cells` verifies the
  smallest feasible config (`n_units=4, fracs=0.5`) yields one unit per
  cell and that `TripleDifference.fit(..., time="post")` succeeds on
  it (the contract the docstring advertises).

Updates the `group_frac` / `partition_frac` docstring entries to describe
the stratified allocation guarantee, and the `[Unreleased]` CHANGELOG
entry to mention the cell-coverage invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the wave-4-tier-a-drain branch from 03fce79 to 4aa3009 Compare May 16, 2026 17:05
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 4aa3009308294665d03d16544ce453dccc7bbf2b


Overall Assessment

⚠️ Needs changes

One unmitigated P1 remains in the new generate_ddd_panel_data() guidance: it advertises panel-power usage with direct TripleDifference().fit(..., time="post"), but the DDD estimator is the repeated-cross-section (panel=FALSE) implementation and defaults to row-level inference unless the caller clusters.

Executive Summary

  • The prior blocker on empty DDD cells is resolved: generate_ddd_panel_data() now rejects rounded (group, partition) allocations that would leave any cell empty, and the new tests cover both infeasible and smallest-feasible cases (diff_diff/prep_dgp.py:L1279-L1306, tests/test_prep.py:L1237-L1288).
  • EfficientDiD’s control_group="last_cohort" documentation now matches the existing last_g - anticipation trim, and the PR adds a direct regression guard for that behavior (diff_diff/efficient_did.py:L161-L180, diff_diff/efficient_did.py:L463-L479, docs/methodology/REGISTRY.md:L875-L880, docs/methodology/REGISTRY.md:L934-L939, tests/test_efficient_did.py:L1128-L1159).
  • The TROP _setup_trop_data() extraction looks behavior-preserving for the local/global fit paths; I did not find a new validation or inference regression in that refactor (diff_diff/trop_local.py:L74-L194, diff_diff/trop.py:L471-L486, diff_diff/trop_global.py:L587-L626, tests/test_trop.py:L4149-L4190).
  • P1 [Newly identified]: generate_ddd_panel_data() is documented as suitable for panel power analysis and shows direct TripleDifference().fit(..., time="post") usage, but TripleDifference explicitly implements the repeated-cross-section panel=FALSE surface and uses row-level IF/df = n_obs - 8 inference unless cluster is provided (diff_diff/prep_dgp.py:L1176-L1179, diff_diff/prep_dgp.py:L1256-L1262, diff_diff/triple_diff.py:L16-L17, diff_diff/triple_diff.py:L697-L702, docs/methodology/REGISTRY.md:L1719-L1745). With the new unit FE / within-unit serial structure, that guidance can silently understate SEs and overstate power.
  • Validation note: this was a static review only. I could not run the targeted tests here because the environment lacks numpy, pandas, and pytest.

Methodology

  • P1 [Newly identified] Impact: the new DGP adds unit fixed effects and within-unit serial structure, then markets itself as a panel-power DGP and demonstrates unclustered direct use with TripleDifference().fit(..., time="post") (diff_diff/prep_dgp.py:L1176-L1179, diff_diff/prep_dgp.py:L1256-L1262). That conflicts with the documented DDD methodology surface, which is the repeated-cross-section panel=FALSE implementation with individual-level default SE and only optional cluster-robust inference (diff_diff/triple_diff.py:L16-L17, docs/methodology/REGISTRY.md:L1719-L1745, diff_diff/triple_diff.py:L697-L702). Concrete fix: either document the supported panel path as TripleDifference(cluster="unit").fit(...) and add a Note:/deviation entry to the methodology docs, or remove the current “panel power analysis / direct fit” framing until cluster-aware support is explicit.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • P3 informational — Impact: the TROP fit-time setup dedup is a net maintainability improvement, and the remaining bootstrap-loop dedup is explicitly tracked rather than silently deferred (TODO.md:L87-L87). Concrete fix: none required for this PR.

Tech Debt

  • P3 informational — Impact: simulate_power still routes TripleDifference to the fixed 2×2×2 cross-sectional DGP; the new panel DGP is not auto-selected yet, but that limitation is explicitly tracked (diff_diff/power.py:L649-L685, diff_diff/power.py:L825-L829, TODO.md:L88-L88). Concrete fix: none required for this PR.

Security

  • No findings.

Documentation/Tests

  • P2 Impact: the new generate_ddd_panel_data() tests lock cell-population and ATT-recovery behavior, but they do not lock the inference contract for the panel-generated path, so the unclustered usage above could regress unnoticed (tests/test_prep.py:L1260-L1341). Concrete fix: add a regression that uses the documented panel path, e.g. TripleDifference(cluster="unit"), on panel-generated data.

Path to Approval

  1. Update the generate_ddd_panel_data() public guidance so the supported example uses TripleDifference(cluster="unit").fit(..., time="post"), or explicitly state that unclustered direct fit is point-estimate-only and not valid for panel inference/power.
  2. Add a Note: to docs/methodology/REGISTRY.md or equivalent methodology-facing docs documenting that this DGP generates repeated unit rows with serial correlation and therefore requires unit clustering when used with TripleDifference.
  3. Add a regression test that exercises the clustered panel path on generate_ddd_panel_data() output.

R3 review flagged that `generate_ddd_panel_data` advertises direct
`TripleDifference().fit(..., time="post")` usage, but `TripleDifference`
is the repeated-cross-section `panel=FALSE` estimator whose default
analytical SE treats each row as iid (df = n_obs - 8). With the new
panel-shaped output (unit FE + within-unit serial correlation),
unclustered SE understates variance and overstates power.

Updates:
- DGP docstring gains a `.. warning::` block stating the panel + cluster
  requirement; the `Examples` block now demonstrates
  `TripleDifference(cluster="unit").fit(...)` as the recommended pattern.
- REGISTRY.md `## TripleDifference` SE section gains a "Note (panel-shaped
  input, `generate_ddd_panel_data`)" paragraph documenting the
  repeated-cross-section semantics and the cluster contract.
- CHANGELOG `[Unreleased]` entry tightened to mention the cluster
  requirement explicitly.
- New test `test_recommended_clustered_panel_path` locks the documented
  pattern: clustered fit succeeds, `n_clusters == n_units`, ATT point
  estimate is invariant to clustering, but SE differs between clustered
  and unclustered fits (within-unit correlation is non-trivial).

Point estimate semantics unchanged. Fix is documentation + invariance
test only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026
@igerber igerber merged commit 9aedd33 into main May 16, 2026
33 of 34 checks passed
@igerber igerber deleted the wave-4-tier-a-drain branch May 16, 2026 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant