{{ message }}
dCDH heterogeneity: per-path + global placebo predict_het R-parity + df threading#449
Merged
Merged
Conversation
Closes TODO #422 + pilot-412 in a single PR (same surface, same R fixture, same parity story). Phase 0 probe verified R behavior: did_multiplegt_dyn(by_path, predict_het, placebo) emits per-path heterogeneity OLS results on backward (placebo) horizons via R's per-by_level dispatcher (DIDmultiplegtDYN:::did_multiplegt_main placebo block at the `effect = matrix(-i, ...)` rbind site). New scenario 22 in benchmarks/R/generate_dcdh_dynr_test_values.R captures this with predict_het=list("het_x", c(-1)) — the c(-1) sentinel triggers "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" per the R source path read at script time. Phase 1A implementation (non-survey): _compute_heterogeneity_test gains a placebo: int = 0 parameter and iterates forward (1..L_max) and backward (-1..-placebo) horizons in a single loop. Explicit `if out_idx < 0: continue` eligibility guard prevents numpy negative-index silent wrap on N_mat[g, out_idx] when F_g - 1 + l_h < 0. _compute_path_heterogeneity_test forwards the param; fit() passes placebo=L_max if self.placebo else 0 to both global and per-path call sites. to_dataframe(level="by_path") placebo rows now read het_* values from path_heterogeneity_effects negative-int keys (mirroring the existing path_placebo_event_study negative-key convention) instead of the pre-PR hardcoded NaN-fill. Survey gate: when survey_design is active AND placebo > 0 + heterogeneity is requested, _compute_heterogeneity_test raises NotImplementedError eagerly with a documented message. The Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, which is a separate library-extension claim that needs its own derivation. Forward-horizon predict_het + survey continues to work unchanged. Pre-period allocator derivation tracked as a new follow-up TODO row. Phase 2 (df threading): _compute_heterogeneity_test now passes df = n_obs - n_params to safe_inference on the non-survey OLS path, matching R did_multiplegt_dyn(predict_het=...)'s t-distribution inference (qt(0.975, df.residual(model)) site). Pre-PR Python used df=None (Z critical), producing 0.1-2% rtol gaps on p_value/conf_int vs R. Existing forward-horizon parity tests now pin t/p/CI at INFERENCE_RTOL=1e-4 (was unpinned). Rank- deficient designs use design.shape[1] as df denominator (pre-drop column count); fully rank-deficient is NaN-short-circuited by the existing guard. Near-rank-deficient edge case tracked as a new Low TODO follow-up. R parity: scenario 22 (multi_path_reversible_predict_het_with_placebo, placebo=2, effects=3, by_path=3) pinned at BETA_RTOL=1e-6/SE_RTOL=1e-5 for beta/se/t_stat/n_obs and INFERENCE_RTOL=1e-4 for p_value/conf_int across 3 paths × (3 forward + 2 placebo) = 15 horizons. Cross-surface tests (TestByPathPredictHetPlacebo): placebo het column population, survey-gate NotImplementedError, forward+survey anti-regression, out_idx<0 eligibility guard, single-path telescope (path_heterogeneity_effects[(only_path,)] == heterogeneity_effects bit- exactly), summary rendering. The two existing local-invariant tests (test_*_inference_matches_safe_inference) refactored to verify SE-derivation wiring (t_stat=beta/se, conf_int symmetric around beta, p_value in [0,1]) without back-deriving n_params. REGISTRY: heterogeneity Z-vs-t deviation note replaced with positive "R parity (post-2026-05-15 df threading)" framing including the rank- deficient caveat. New "Per-path placebo heterogeneity" Note documents the R parity, syntax requirements (c(-1) sentinel), survey gate, and test anchors. CHANGELOG entry under [Unreleased]. llms-full.txt by_path entry extended with placebo het composition + survey-gate mention. API rst extended with the same. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_het parity
Two changes that resolve scope/contract issues with the previous commit on
this branch:
1. Survey gate (scope correctness). The eager `if use_survey and placebo > 0:
raise NotImplementedError` at function entry of `_compute_heterogeneity_test`
fired on EVERY survey + heterogeneity call when fit() passed
`placebo=L_max if self.placebo else 0` with default `placebo=True`. That
broke the previously-supported forward-horizon survey + predict_het path,
contradicting the changelog's "Forward-horizon predict_het + survey_design
is supported" claim. Refactored to a per-iteration backstop inside the
horizon loop (raises only when actually about to compute a backward
iteration under survey). fit() at both global and per-path heterogeneity
call sites now wraps the placebo arg with a warn-and-skip: if
`survey_design + placebo + heterogeneity` are all active, emit a
UserWarning explaining backward-horizon predict_het is deferred under
survey designs and pass `placebo=0` so only forward-horizon heterogeneity
is computed. The function-level NotImplementedError remains as a defensive
backstop for direct callers that bypass fit().
2. Global placebo predict_het parity (scope completeness). The previous
commit extended `_compute_heterogeneity_test` to compute backward horizons,
which means `results.heterogeneity_effects` (global, non-by_path surface)
also gains negative-int keys when `placebo=True + heterogeneity`. The R-
parity coverage was scoped to per-path only (scenario 22). Added scenario
23 (`multi_path_reversible_predict_het_with_placebo_global`) that calls
`did_multiplegt_dyn(predict_het=list("het_x", c(-1)), placebo=2, effects=3)`
WITHOUT by_path; verified R emits forward (effect=1,2,3) AND placebo
(effect=-1,-2) rows on the global surface with 3 paths × 30 switchers each
= n_obs=90 per row. New parity test class
`TestDCDHDynRParityHeterogeneityWithPlacebo` pins all 6 inference fields
across 5 horizons at the same tolerances as the per-path version.
Test changes:
- `test_predict_het_placebo_survey_design_raises` renamed to
`test_predict_het_placebo_survey_design_warns_and_skips_backward`;
asserts UserWarning is emitted, no exception raised, and both
`heterogeneity_effects` and `path_heterogeneity_effects` contain only
positive-int keys (forward-only emission).
- New `test_compute_heterogeneity_test_direct_call_raises_on_backward_survey`
exercises the per-iteration backstop directly; locks the API contract
for any future internal call site that bypasses fit().
- New `test_parity_multi_path_reversible_predict_het_with_placebo_global`
pins scenario 23 R-parity.
Doc updates: REGISTRY heterogeneity Note clarifies "warn + skip" semantics
(was "raises NotImplementedError"), references both global and per-path
parity classes. CHANGELOG entry rewritten to reflect global + per-path
scope. llms-full.txt and API rst updated similarly.
Test count: 312 -> 314 (added 2; the renamed test still passes under its
new contract).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The top-level heterogeneity Note in REGISTRY.md still said "This
implementation provides only post-treatment regressions" — stale after
the placebo predict_het additions in this branch's earlier commits.
Updated to reflect:
1. Python now matches R on per-horizon placebo regressions (when
`placebo=True` + `heterogeneity=` are co-set) — refers readers to
the "Placebo predict_het" sub-note for the full contract (global +
per-path scope, survey warn+skip behavior, R-parity classes).
2. The remaining gap from R is the joint null Wald F-test across all
predict_het rows — Python emits per-horizon inference only.
3. The `controls` mutex stays unchanged.
Cross-references the existing render-time note in `_render_heterogeneity_section`
("Per-horizon regressions only (no joint F-test)") so the registry text
matches the user-facing summary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two informational findings from the latest review: 1. Stale `to_dataframe(level="by_path")` docstring at `chaisemartin_dhaultfoeuille_results.py:1530-1558` still claimed placebo `het_*` columns are NaN. Updated to document the post-#422 contract: positive-horizon AND negative-horizon (placebo) rows are both populated when `placebo=True + heterogeneity=` are co-set; placebo rows under `survey_design` remain NaN with a fit-time UserWarning. 2. JSON golden-fixture type instability for empty `placebo_predict_het` / `placebo_horizons` slots. R's `jsonlite::toJSON` serializes plain `list()` as `[]` (array) but populated named lists as `{}` (object), so consumers iterating `.items()` on the slot saw different shapes across scenarios. Fixed both ends: - R-side: extractors initialize empty slots with `structure(list(), names = character(0))` which jsonlite serializes as `{}` even when empty. Verified across 4 scenarios (20, 21, 22, 23) — all `placebo_predict_het` / `placebo_horizons` slots now serialize as objects regardless of population. - Python-side: added `_as_dict` helper in the parity test module as a defensive backstop coercing any non-dict (None / [] / missing) to {}. Used at the two call sites that read optional placebo slots so consumers can call `.items()` uniformly. The golden JSON regenerated; type-stable across all scenarios (verified via `jq` on each scenario's predict_het type). 314 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…tract Two stale in-code documentation surfaces from the latest review: 1. `chaisemartin_dhaultfoeuille.py:980-989` — the `heterogeneity` parameter docstring on `fit()` still said "post-treatment regressions only (no placebo regressions)". Updated to document the post-#422 contract: per-horizon OLS regressions on forward AND backward (placebo) horizons when `placebo=True`; survey_design composes with forward horizons but warns + skips backward horizons until the pre-period cell allocator is derived. 2. `chaisemartin_dhaultfoeuille_results.py:1279-1286` — the heterogeneity summary note didn't mention the survey forward-only fallback. Extended the note to cover the gating semantics so users reading `result.summary()` under `survey_design + heterogeneity` know what they're getting. Both surfaces now match the contract already documented in the API rst, REGISTRY, and CHANGELOG. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TDL77
pushed a commit
to TDL77/diff-diff
that referenced
this pull request
May 18, 2026
…h test Two follow-ups closing the residual Low TODOs from PR igerber#449: 1. igerber#64 — Rank-deficient df threading. `_compute_heterogeneity_test`'s non-survey OLS path now computes `df = n_obs - rank(design)` via `_detect_rank_deficiency` (the same helper `solve_ols` calls internally), matching R's `df.residual = n - rank(design)` post-drop convention. For full-rank designs `rank == n_params` and behavior is bit-identical to the pre-PR `n_obs - n_params` path — all 4 forward-horizon parity tests still pass at the same `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` / `INFERENCE_RTOL=1e-4` tolerances. For near-rank-deficient designs that `solve_ols` retains rather than NaN-out (e.g., cohort-collinearity at high horizons), the post-drop rank is strictly lower than `n_params`, so the post-PR `df` is strictly larger, matching R's `lm()` convention. Locked by `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo::test_heterogeneity_df_uses_post_drop_rank` which pins `safe_inference(... df=n_obs - rank)` against the stored `t_stat` / `p_value` / `conf_int` at `atol=1e-12`. 2. igerber#62 — Negative-baseline path regression test. New `TestByPathNonBinary::test_negative_baseline_path_supported` exercises switchers with `D_{g,1} = -1` and asserts that `path_effects` correctly contains negative-baseline tuple keys (`(-1, 0, 0, 0)`, `(-1, 1, 1, 1)`). The existing `test_negative_integer_D_supported` only covered paths with negative values in non-baseline positions (e.g., `(0, -1, -1, -1)`), which does not trigger R's documented `substr(path, 1, 1)` baseline-extraction bug. Python's tuple-key matching is correct under any baseline value; this test pins the contract. No R-parity fixture is added because R is the buggy side on this regime — the deviation is already documented in the REGISTRY non-binary treatment Note. TODO.md: drops the two corresponding Low rows (igerber#419 negative-baseline test gap + follow-up rank-deficient df threading). REGISTRY heterogeneity caveat updated to drop the "rank-deficient gap" prose and replace with the positive "uses post-drop rank, matching R's lm() convention" claim. CHANGELOG `[Unreleased]` extended with both items. Test count: 314 -> 316. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TDL77
pushed a commit
to TDL77/diff-diff
that referenced
this pull request
May 18, 2026
Two P3 informational findings from R2: 1. TODO.md Tier B backlog still listed the dCDH heterogeneity df- threading and by-path placebo predict_het items as open, but PR igerber#449 closed both. Replaced the two stale bullets with a single bullet for the remaining survey + backward-horizon allocator derivation (the one Medium follow-up explicitly tracked in the wrap-up commit). 2. Three test-prose comments still said `df = n_obs - n_params` while the implementation and REGISTRY now use `df = n_obs - rank(design)`. Updated each comment to the post-drop rank wording; full-rank designs continue to have `rank == n_params` so the SE-derivation invariants under test are unchanged. Comment-only drift; no behavior change. 317 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Closes TODO #422 (per-path placebo
predict_hetemission) and pilot-412 (heterogeneity OLS df threading) in a single bundle. Same surface, same R fixture, single clean R-parity story.R-fixture probe (scenario 22) verified that
did_multiplegt_dyn(by_path, predict_het, placebo)emits per-by_level heterogeneity OLS rows on backward (placebo) horizons via R's per-by_level dispatcher (DIDmultiplegtDYN:::did_multiplegt_mainplacebo block at theeffect = matrix(-i, ...)rbind site). Scenario 23 confirmed the same emission on the global (non-by_path) surface. Both surfaces now mirror in Python.R syntax note:
did_multiplegt_dyn(predict_het=list(\"X\", c(...)), placebo=N)rejects positive-only horizons (c(1, 2, 3)errors with "specified numbers in predict_het that exceed the number of placebos" because R reuses the same horizon vector for both forward and backward indexing). Thec(-1)sentinel triggers "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — both new fixtures use this syntax.Survey gate is warn + skip: when
survey_design + placebo + heterogeneityare co-set, fit() emits aUserWarningand falls back to forward-horizon-only heterogeneity. The Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, which is a separate library-extension claim deferred to a follow-up methodology PR. Forward-horizonpredict_het + survey_designcontinues to work unchanged.Phase 2 df threading:
_compute_heterogeneity_testnow passesdf = n_obs - n_paramstosafe_inferenceon the non-survey OLS path (matches Rqt(0.975, df.residual(model))); pre-PR Python useddf=None(Z critical), producing 0.1-2% rtol gaps onp_value/conf_intvs R. Forward parity tests tightened from "unpinned" toINFERENCE_RTOL=1e-4. Rank-deficient designs usedesign.shape[1]as df denominator (pre-drop column count); fully rank-deficient is NaN-short-circuited by the existing guard. Near-rank-deficient edge case tracked as a Low TODO follow-up.Methodology references (required if estimator / math changes)
ChaisemartinDHaultfoeuille.predict_het×placebo×by_path/paths_of_interestDIDmultiplegtDYN 2.3.3did_multiplegt_mainplacebo + predict_het blockpredict_hetrows: R aggregates; Python emits per-horizon inference only. Documented in REGISTRY heterogeneity Note.n_obs - n_params(pre-drop column count) vs R'sdf.residual = n - rankpost-drop. Affects only near-rank-deficient designs thatsolve_olsretains rather than NaN-out. Tracked as Low TODO.Validation
tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo(scenario 23, global) +::TestDCDHDynRParityByPathHeterogeneityWithPlacebo(scenario 22, per-path × 3 paths) pinning all 6 inference fields atBETA_RTOL=1e-6/SE_RTOL=1e-5forbeta/se/t_stat/n_obsandINFERENCE_RTOL=1e-4forp_value/conf_int.TestDCDHDynRParityHeterogeneityandTestDCDHDynRParityByPathHeterogeneityextended to assertp_value/conf_intat the newINFERENCE_RTOL=1e-4(was unpinned pre-PR; replaced the Z-vs-t deviation note with the positive parity claim).tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebocovers placebo het column population, survey-gate warn+skip, forward+survey anti-regression,out_idx<0eligibility guard, single-path telescope (path_heterogeneity_effects[(only_path,)] == heterogeneity_effectsbit-exactly), summary rendering, and direct-callNotImplementedErrorbackstop.test_*_inference_matches_safe_inference→test_*_inference_local_invariantsto verify SE-derivation wiring (t_stat = beta/se, symmetricconf_int,p_valuein[0, 1]) without back-derivingn_params.benchmarks/data/dcdh_dynr_golden_values.json, regenerable viaRscript benchmarks/R/generate_dcdh_dynr_test_values.R.Security / privacy
🤖 Generated with Claude Code