iframe-proxy

igerber · 2026-05-15T22:52:07Z

Summary

Closes TODO #422 (per-path placebo predict_het emission) and pilot-412 (heterogeneity OLS df threading) in a single bundle. Same surface, same R fixture, single clean R-parity story.

R-fixture probe (scenario 22) verified that did_multiplegt_dyn(by_path, predict_het, placebo) emits per-by_level heterogeneity OLS rows on backward (placebo) horizons via R's per-by_level dispatcher (DIDmultiplegtDYN:::did_multiplegt_main placebo block at the effect = matrix(-i, ...) rbind site). Scenario 23 confirmed the same emission on the global (non-by_path) surface. Both surfaces now mirror in Python.

R syntax note: did_multiplegt_dyn(predict_het=list(\"X\", c(...)), placebo=N) rejects positive-only horizons (c(1, 2, 3) errors with "specified numbers in predict_het that exceed the number of placebos" because R reuses the same horizon vector for both forward and backward indexing). The c(-1) sentinel triggers "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — both new fixtures use this syntax.

Survey gate is warn + skip: when survey_design + placebo + heterogeneity are co-set, fit() emits a UserWarning and falls back to forward-horizon-only heterogeneity. The Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, which is a separate library-extension claim deferred to a follow-up methodology PR. Forward-horizon predict_het + survey_design continues to work unchanged.

Phase 2 df threading: _compute_heterogeneity_test now passes df = n_obs - n_params to safe_inference on the non-survey OLS path (matches R qt(0.975, df.residual(model))); pre-PR Python used df=None (Z critical), producing 0.1-2% rtol gaps on p_value/conf_int vs R. Forward parity tests tightened from "unpinned" to INFERENCE_RTOL=1e-4. Rank-deficient designs use design.shape[1] as df denominator (pre-drop column count); fully rank-deficient is NaN-short-circuited by the existing guard. Near-rank-deficient edge case tracked as a Low TODO follow-up.

Methodology references (required if estimator / math changes)

Method name(s): ChaisemartinDHaultfoeuille.predict_het × placebo × by_path/paths_of_interest
Paper / source link(s): de Chaisemartin & D'Haultfœuille (NBER WP 29873) Web Appendix Section 1.5 (Lemma 7); R DIDmultiplegtDYN 2.3.3 did_multiplegt_main placebo + predict_het block
Any intentional deviations from the source (and why):
- Survey + backward-horizon predict_het: warn + skip rather than compute. Pre-period Binder TSL allocator derivation is deferred (tracked in TODO.md). Forward + survey continues to work.
- Joint Wald F-test across all predict_het rows: R aggregates; Python emits per-horizon inference only. Documented in REGISTRY heterogeneity Note.
- Rank-deficient df: n_obs - n_params (pre-drop column count) vs R's df.residual = n - rank post-drop. Affects only near-rank-deficient designs that solve_ols retains rather than NaN-out. Tracked as Low TODO.

Validation

Tests added/updated:
- R-parity (new): tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo (scenario 23, global) + ::TestDCDHDynRParityByPathHeterogeneityWithPlacebo (scenario 22, per-path × 3 paths) pinning all 6 inference fields at BETA_RTOL=1e-6/SE_RTOL=1e-5 for beta/se/t_stat/n_obs and INFERENCE_RTOL=1e-4 for p_value/conf_int.
- Forward-horizon parity tightened: existing TestDCDHDynRParityHeterogeneity and TestDCDHDynRParityByPathHeterogeneity extended to assert p_value/conf_int at the new INFERENCE_RTOL=1e-4 (was unpinned pre-PR; replaced the Z-vs-t deviation note with the positive parity claim).
- Cross-surface (new): tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo covers placebo het column population, survey-gate warn+skip, forward+survey anti-regression, out_idx<0 eligibility guard, single-path telescope (path_heterogeneity_effects[(only_path,)] == heterogeneity_effects bit-exactly), summary rendering, and direct-call NotImplementedError backstop.
- Local-invariant tests refactored: test_*_inference_matches_safe_inference → test_*_inference_local_invariants to verify SE-derivation wiring (t_stat = beta/se, symmetric conf_int, p_value in [0, 1]) without back-deriving n_params.
- 314 dCDH tests pass (was 312 pre-PR).
Backtest / simulation / notebook evidence: scenarios 22 + 23 in benchmarks/data/dcdh_dynr_golden_values.json, regenerable via Rscript benchmarks/R/generate_dcdh_dynr_test_values.R.

Security / privacy

Confirm no secrets/PII in this PR: yes (canonical secret-pattern scan in pre-merge-check returned no hits).

🤖 Generated with Claude Code

Closes TODO #422 + pilot-412 in a single PR (same surface, same R fixture, same parity story). Phase 0 probe verified R behavior: did_multiplegt_dyn(by_path, predict_het, placebo) emits per-path heterogeneity OLS results on backward (placebo) horizons via R's per-by_level dispatcher (DIDmultiplegtDYN:::did_multiplegt_main placebo block at the `effect = matrix(-i, ...)` rbind site). New scenario 22 in benchmarks/R/generate_dcdh_dynr_test_values.R captures this with predict_het=list("het_x", c(-1)) — the c(-1) sentinel triggers "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" per the R source path read at script time. Phase 1A implementation (non-survey): _compute_heterogeneity_test gains a placebo: int = 0 parameter and iterates forward (1..L_max) and backward (-1..-placebo) horizons in a single loop. Explicit `if out_idx < 0: continue` eligibility guard prevents numpy negative-index silent wrap on N_mat[g, out_idx] when F_g - 1 + l_h < 0. _compute_path_heterogeneity_test forwards the param; fit() passes placebo=L_max if self.placebo else 0 to both global and per-path call sites. to_dataframe(level="by_path") placebo rows now read het_* values from path_heterogeneity_effects negative-int keys (mirroring the existing path_placebo_event_study negative-key convention) instead of the pre-PR hardcoded NaN-fill. Survey gate: when survey_design is active AND placebo > 0 + heterogeneity is requested, _compute_heterogeneity_test raises NotImplementedError eagerly with a documented message. The Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, which is a separate library-extension claim that needs its own derivation. Forward-horizon predict_het + survey continues to work unchanged. Pre-period allocator derivation tracked as a new follow-up TODO row. Phase 2 (df threading): _compute_heterogeneity_test now passes df = n_obs - n_params to safe_inference on the non-survey OLS path, matching R did_multiplegt_dyn(predict_het=...)'s t-distribution inference (qt(0.975, df.residual(model)) site). Pre-PR Python used df=None (Z critical), producing 0.1-2% rtol gaps on p_value/conf_int vs R. Existing forward-horizon parity tests now pin t/p/CI at INFERENCE_RTOL=1e-4 (was unpinned). Rank- deficient designs use design.shape[1] as df denominator (pre-drop column count); fully rank-deficient is NaN-short-circuited by the existing guard. Near-rank-deficient edge case tracked as a new Low TODO follow-up. R parity: scenario 22 (multi_path_reversible_predict_het_with_placebo, placebo=2, effects=3, by_path=3) pinned at BETA_RTOL=1e-6/SE_RTOL=1e-5 for beta/se/t_stat/n_obs and INFERENCE_RTOL=1e-4 for p_value/conf_int across 3 paths × (3 forward + 2 placebo) = 15 horizons. Cross-surface tests (TestByPathPredictHetPlacebo): placebo het column population, survey-gate NotImplementedError, forward+survey anti-regression, out_idx<0 eligibility guard, single-path telescope (path_heterogeneity_effects[(only_path,)] == heterogeneity_effects bit- exactly), summary rendering. The two existing local-invariant tests (test_*_inference_matches_safe_inference) refactored to verify SE-derivation wiring (t_stat=beta/se, conf_int symmetric around beta, p_value in [0,1]) without back-deriving n_params. REGISTRY: heterogeneity Z-vs-t deviation note replaced with positive "R parity (post-2026-05-15 df threading)" framing including the rank- deficient caveat. New "Per-path placebo heterogeneity" Note documents the R parity, syntax requirements (c(-1) sentinel), survey gate, and test anchors. CHANGELOG entry under [Unreleased]. llms-full.txt by_path entry extended with placebo het composition + survey-gate mention. API rst extended with the same. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…_het parity Two changes that resolve scope/contract issues with the previous commit on this branch: 1. Survey gate (scope correctness). The eager `if use_survey and placebo > 0: raise NotImplementedError` at function entry of `_compute_heterogeneity_test` fired on EVERY survey + heterogeneity call when fit() passed `placebo=L_max if self.placebo else 0` with default `placebo=True`. That broke the previously-supported forward-horizon survey + predict_het path, contradicting the changelog's "Forward-horizon predict_het + survey_design is supported" claim. Refactored to a per-iteration backstop inside the horizon loop (raises only when actually about to compute a backward iteration under survey). fit() at both global and per-path heterogeneity call sites now wraps the placebo arg with a warn-and-skip: if `survey_design + placebo + heterogeneity` are all active, emit a UserWarning explaining backward-horizon predict_het is deferred under survey designs and pass `placebo=0` so only forward-horizon heterogeneity is computed. The function-level NotImplementedError remains as a defensive backstop for direct callers that bypass fit(). 2. Global placebo predict_het parity (scope completeness). The previous commit extended `_compute_heterogeneity_test` to compute backward horizons, which means `results.heterogeneity_effects` (global, non-by_path surface) also gains negative-int keys when `placebo=True + heterogeneity`. The R- parity coverage was scoped to per-path only (scenario 22). Added scenario 23 (`multi_path_reversible_predict_het_with_placebo_global`) that calls `did_multiplegt_dyn(predict_het=list("het_x", c(-1)), placebo=2, effects=3)` WITHOUT by_path; verified R emits forward (effect=1,2,3) AND placebo (effect=-1,-2) rows on the global surface with 3 paths × 30 switchers each = n_obs=90 per row. New parity test class `TestDCDHDynRParityHeterogeneityWithPlacebo` pins all 6 inference fields across 5 horizons at the same tolerances as the per-path version. Test changes: - `test_predict_het_placebo_survey_design_raises` renamed to `test_predict_het_placebo_survey_design_warns_and_skips_backward`; asserts UserWarning is emitted, no exception raised, and both `heterogeneity_effects` and `path_heterogeneity_effects` contain only positive-int keys (forward-only emission). - New `test_compute_heterogeneity_test_direct_call_raises_on_backward_survey` exercises the per-iteration backstop directly; locks the API contract for any future internal call site that bypasses fit(). - New `test_parity_multi_path_reversible_predict_het_with_placebo_global` pins scenario 23 R-parity. Doc updates: REGISTRY heterogeneity Note clarifies "warn + skip" semantics (was "raises NotImplementedError"), references both global and per-path parity classes. CHANGELOG entry rewritten to reflect global + per-path scope. llms-full.txt and API rst updated similarly. Test count: 312 -> 314 (added 2; the renamed test still passes under its new contract). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The top-level heterogeneity Note in REGISTRY.md still said "This implementation provides only post-treatment regressions" — stale after the placebo predict_het additions in this branch's earlier commits. Updated to reflect: 1. Python now matches R on per-horizon placebo regressions (when `placebo=True` + `heterogeneity=` are co-set) — refers readers to the "Placebo predict_het" sub-note for the full contract (global + per-path scope, survey warn+skip behavior, R-parity classes). 2. The remaining gap from R is the joint null Wald F-test across all predict_het rows — Python emits per-horizon inference only. 3. The `controls` mutex stays unchanged. Cross-references the existing render-time note in `_render_heterogeneity_section` ("Per-horizon regressions only (no joint F-test)") so the registry text matches the user-facing summary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T22:58:13Z

Overall Assessment

✅ Looks good

Executive Summary

No unmitigated P0/P1 issues in the estimator changes. The new placebo predict_het path, negative-index guard, and df threading are consistent with the updated methodology contract.
The remaining methodology gaps are explicitly documented/tracked rather than silent: survey backward-horizon heterogeneity is gated, and near-rank-deficient df still uses pre-drop column count.
Minor user-facing docs drift remains: to_dataframe(level="by_path") still says placebo het_* columns are NaN, but the implementation now fills them.
The regenerated golden-value schema is type-unstable for empty placebo_* mappings ([] when empty, object when populated), which is a maintainability/test-asset issue.
I could not run pytest in this environment because pytest is not installed.

Methodology

No unmitigated P0/P1 findings. The changed estimator path in diff_diff/chaisemartin_dhaultfoeuille.py:L5092-L5215 matches the updated registry note in docs/methodology/REGISTRY.md:L637-L643, and the remaining deviations are documented rather than silent.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3. Impact: the golden-value contract for placebo heterogeneity is type-unstable: empty placebo_predict_het / placebo_horizons serialize as arrays, while populated cases serialize as objects, which makes generic consumers fragile. See benchmarks/R/generate_dcdh_dynr_test_values.R:L631-L658, benchmarks/R/generate_dcdh_dynr_test_values.R:L665-L699, and the emitted examples at benchmarks/data/dcdh_dynr_golden_values.json:L1492-L1492, benchmarks/data/dcdh_dynr_golden_values.json:L1551-L1551. Concrete fix: normalize empty cases to object-shaped mappings before serialization, or normalize them on load in the parity helpers.

Tech Debt

Severity: P3. Impact: the PR intentionally leaves two edge cases deferred rather than silently wrong: survey placebo + predict_het is gated off, and near-rank-deficient non-survey heterogeneity still uses n_obs - n_params instead of n_obs - rank for t df. See diff_diff/chaisemartin_dhaultfoeuille.py:L3885-L3899, diff_diff/chaisemartin_dhaultfoeuille.py:L3936-L3951, diff_diff/chaisemartin_dhaultfoeuille.py:L5182-L5215, and TODO.md:L63-L64. Concrete fix: follow the existing TODO items when the pre-period survey allocator and rank-threading work are picked up.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: to_dataframe(level="by_path") still documents het_* as populated only on positive horizons and NaN on placebo rows, but the implementation now fills placebo het_* fields from path_heterogeneity_effects. See the stale docstring at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1530-L1558 versus the new behavior at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1865-L1908. Concrete fix: update the docstring to match the new placebo-row contract and mention the survey forward-only fallback.
Verification note: I could not execute the added tests because pytest is not installed in this environment.

Two informational findings from the latest review: 1. Stale `to_dataframe(level="by_path")` docstring at `chaisemartin_dhaultfoeuille_results.py:1530-1558` still claimed placebo `het_*` columns are NaN. Updated to document the post-#422 contract: positive-horizon AND negative-horizon (placebo) rows are both populated when `placebo=True + heterogeneity=` are co-set; placebo rows under `survey_design` remain NaN with a fit-time UserWarning. 2. JSON golden-fixture type instability for empty `placebo_predict_het` / `placebo_horizons` slots. R's `jsonlite::toJSON` serializes plain `list()` as `[]` (array) but populated named lists as `{}` (object), so consumers iterating `.items()` on the slot saw different shapes across scenarios. Fixed both ends: - R-side: extractors initialize empty slots with `structure(list(), names = character(0))` which jsonlite serializes as `{}` even when empty. Verified across 4 scenarios (20, 21, 22, 23) — all `placebo_predict_het` / `placebo_horizons` slots now serialize as objects regardless of population. - Python-side: added `_as_dict` helper in the parity test module as a defensive backstop coercing any non-dict (None / [] / missing) to {}. Used at the two call sites that read optional placebo slots so consumers can call `.items()` uniformly. The golden JSON regenerated; type-stable across all scenarios (verified via `jq` on each scenario's predict_het type). 314 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T23:11:50Z

🔁 AI review rerun (requested by @igerber)

Head SHA: bcdb00a19d0d673d29a890a148ebc836633201db

Overall Assessment

✅ Looks good

Executive Summary

No unmitigated P0/P1 issues found in the changed estimator paths. The new backward-horizon predict_het support and finite-df threading are consistent with the updated methodology contract in docs/methodology/REGISTRY.md:L637-L643.
The prior re-review P3s appear addressed: the by-path to_dataframe() docs now describe placebo het_* population in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1546-L1566, and the R fixture extractor now stabilizes empty placebo mappings in benchmarks/R/generate_dcdh_dynr_test_values.R:L631-L707.
Remaining methodology caveats are explicitly documented/tracked rather than silent: survey-weighted backward-horizon heterogeneity is warn+skip, and near-rank-deficient non-survey df still uses n_obs - n_params; both are tracked in TODO.md:L63-L64.
One minor docs regression remains: the public fit() docstring still says heterogeneity has “no placebo regressions,” and the heterogeneity summary note does not mention the new survey forward-only fallback.
Verification note: I could not run pytest here because python -m pytest --version fails with No module named pytest.

Methodology

No unmitigated findings. The changed behavior is internally consistent across diff_diff/chaisemartin_dhaultfoeuille.py:L3878-L3915, diff_diff/chaisemartin_dhaultfoeuille.py:L5087-L5215, the updated registry note at docs/methodology/REGISTRY.md:L637-L643, and the new parity coverage in tests/test_chaisemartin_dhaultfoeuille_parity.py:L1558-L1811.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings. The previous golden-fixture shape instability appears resolved by the explicit empty-object serialization in benchmarks/R/generate_dcdh_dynr_test_values.R:L631-L707.

Tech Debt

Severity: P3-informational. Impact: survey_design + placebo + heterogeneity intentionally falls back to forward-horizon-only heterogeneity because the pre-period cell allocator is still underived; this is now explicit and tracked, not silent. Concrete fix: follow the tracked TODO to derive a pre-period survey allocator before enabling backward-horizon survey predict_het. References: diff_diff/chaisemartin_dhaultfoeuille.py:L3885-L3899, docs/methodology/REGISTRY.md:L637-L643, TODO.md:L63-L63.
Severity: P3-informational. Impact: near-rank-deficient non-survey heterogeneity still uses df = n_obs - n_params instead of post-drop rank, so R parity can still drift in that edge case; fully rank-deficient cases remain NaN-consistent. Concrete fix: thread effective rank from solve_ols() into _compute_heterogeneity_test, as already tracked. References: diff_diff/chaisemartin_dhaultfoeuille.py:L5195-L5215, TODO.md:L64-L64.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the public/in-code documentation is only partially updated. fit() still says heterogeneity is “post-treatment regressions only (no placebo regressions),” which now contradicts the implementation, API docs, registry, and new parity tests; the heterogeneity summary note also omits the survey forward-only fallback. This can mislead users and future reviewers who rely on in-code docstrings as methodology references. Concrete fix: update the heterogeneity parameter docstring in diff_diff/chaisemartin_dhaultfoeuille.py:L980-L989 and the summary note in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1279-L1283 to match the contract already documented in docs/api/chaisemartin_dhaultfoeuille.rst:L33-L41 and docs/methodology/REGISTRY.md:L637-L643.
No other findings. The previous by-path to_dataframe() doc drift flagged in the last review looks fixed in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1546-L1566, and the new global/per-path placebo parity tests are present in tests/test_chaisemartin_dhaultfoeuille_parity.py:L1558-L1811.

…tract Two stale in-code documentation surfaces from the latest review: 1. `chaisemartin_dhaultfoeuille.py:980-989` — the `heterogeneity` parameter docstring on `fit()` still said "post-treatment regressions only (no placebo regressions)". Updated to document the post-#422 contract: per-horizon OLS regressions on forward AND backward (placebo) horizons when `placebo=True`; survey_design composes with forward horizons but warns + skips backward horizons until the pre-period cell allocator is derived. 2. `chaisemartin_dhaultfoeuille_results.py:1279-1286` — the heterogeneity summary note didn't mention the survey forward-only fallback. Extended the note to cover the gating semantics so users reading `result.summary()` under `survey_design + heterogeneity` know what they're getting. Both surfaces now match the contract already documented in the API rst, REGISTRY, and CHANGELOG. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T23:23:00Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d20988da5d099dec24355c6b09e1afaf13469f5b

Overall Assessment

✅ Looks good

Executive Summary

No unmitigated P0/P1 issues found in the changed estimator paths.
The affected method is dCDH heterogeneity testing from Web Appendix Section 1.5 / Lemma 7; the new backward-horizon predict_het behavior and finite-df inference are aligned with the updated Methodology Registry and the added R-parity fixtures.
The prior re-review P3s appear addressed: the public fit() docstring, the heterogeneity summary note, the by-path dataframe docs, and the R fixture extractor’s empty-map handling are all updated.
The new out_idx < 0 guard closes the only obvious silent-indexing risk in the placebo heterogeneity loop at diff_diff/chaisemartin_dhaultfoeuille.py:L5136-L5148.
Remaining caveats are documented and tracked rather than silent: survey-weighted backward-horizon heterogeneity is intentionally deferred, and near-rank-deficient df still uses n_obs - n_params.
Verification note: pytest is not available in this workspace (python -m pytest --version fails with No module named pytest), so this is a diff-based re-review.

Methodology
No findings. The changed heterogeneity paths are consistent with the documented contract in docs/methodology/REGISTRY.md:L637-L647, the fit-time threading in diff_diff/chaisemartin_dhaultfoeuille.py:L3889-L3963, the core implementation in diff_diff/chaisemartin_dhaultfoeuille.py:L5098-L5354, and the new global/per-path R-parity coverage in tests/test_chaisemartin_dhaultfoeuille_parity.py:L1358-L1535 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L1558-L1795.

Code Quality
No findings.

Performance
No findings.

Maintainability
No findings. The earlier fixture-shape instability from empty predict_het maps appears fixed by the extractor changes in benchmarks/R/generate_dcdh_dynr_test_values.R:L631-L707.

Tech Debt

Severity: P3-informational. Impact: survey_design + placebo + heterogeneity intentionally falls back to forward-only heterogeneity because the pre-period Binder TSL allocator is not yet derived. Concrete fix: derive the pre-period allocator, or a covariance-aware alternative, before enabling backward-horizon survey predict_het. References: diff_diff/chaisemartin_dhaultfoeuille.py:L3889-L3920, docs/methodology/REGISTRY.md:L637-L647, TODO.md:L63-L63.
Severity: P3-informational. Impact: near-rank-deficient non-survey heterogeneity can still drift from R because the t-distribution df uses pre-drop n_obs - n_params instead of post-drop rank. Concrete fix: thread effective rank from solve_ols() into _compute_heterogeneity_test. References: diff_diff/chaisemartin_dhaultfoeuille.py:L5193-L5226, TODO.md:L64-L64.

Security
No findings.

Documentation/Tests
No findings. The previous documentation drift appears resolved at diff_diff/chaisemartin_dhaultfoeuille.py:L980-L1000, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1280-L1286, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1555-L1568, and docs/api/chaisemartin_dhaultfoeuille.rst:L33-L41. The new tests cover the changed paths well, including global/per-path placebo parity, survey warn+skip behavior, and the negative-index eligibility guard at tests/test_chaisemartin_dhaultfoeuille.py:L11140-L11380 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L1558-L1795.

…h test Two follow-ups closing the residual Low TODOs from PR igerber#449: 1. igerber#64 — Rank-deficient df threading. `_compute_heterogeneity_test`'s non-survey OLS path now computes `df = n_obs - rank(design)` via `_detect_rank_deficiency` (the same helper `solve_ols` calls internally), matching R's `df.residual = n - rank(design)` post-drop convention. For full-rank designs `rank == n_params` and behavior is bit-identical to the pre-PR `n_obs - n_params` path — all 4 forward-horizon parity tests still pass at the same `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` / `INFERENCE_RTOL=1e-4` tolerances. For near-rank-deficient designs that `solve_ols` retains rather than NaN-out (e.g., cohort-collinearity at high horizons), the post-drop rank is strictly lower than `n_params`, so the post-PR `df` is strictly larger, matching R's `lm()` convention. Locked by `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo::test_heterogeneity_df_uses_post_drop_rank` which pins `safe_inference(... df=n_obs - rank)` against the stored `t_stat` / `p_value` / `conf_int` at `atol=1e-12`. 2. igerber#62 — Negative-baseline path regression test. New `TestByPathNonBinary::test_negative_baseline_path_supported` exercises switchers with `D_{g,1} = -1` and asserts that `path_effects` correctly contains negative-baseline tuple keys (`(-1, 0, 0, 0)`, `(-1, 1, 1, 1)`). The existing `test_negative_integer_D_supported` only covered paths with negative values in non-baseline positions (e.g., `(0, -1, -1, -1)`), which does not trigger R's documented `substr(path, 1, 1)` baseline-extraction bug. Python's tuple-key matching is correct under any baseline value; this test pins the contract. No R-parity fixture is added because R is the buggy side on this regime — the deviation is already documented in the REGISTRY non-binary treatment Note. TODO.md: drops the two corresponding Low rows (igerber#419 negative-baseline test gap + follow-up rank-deficient df threading). REGISTRY heterogeneity caveat updated to drop the "rank-deficient gap" prose and replace with the positive "uses post-drop rank, matching R's lm() convention" claim. CHANGELOG `[Unreleased]` extended with both items. Test count: 314 -> 316. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two P3 informational findings from R2: 1. TODO.md Tier B backlog still listed the dCDH heterogeneity df- threading and by-path placebo predict_het items as open, but PR igerber#449 closed both. Replaced the two stale bullets with a single bullet for the remaining survey + backward-horizon allocator derivation (the one Medium follow-up explicitly tracked in the wrap-up commit). 2. Three test-prose comments still said `df = n_obs - n_params` while the implementation and REGISTRY now use `df = n_obs - rank(design)`. Updated each comment to the post-drop rank wording; full-rank designs continue to have `rank == n_params` so the SE-derivation invariants under test are unchanged. Comment-only drift; no behavior change. 317 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber and others added 3 commits May 15, 2026 17:49

igerber added the ready-for-ci Triggers CI test workflows label May 15, 2026

igerber merged commit 2cb8def into main May 16, 2026
33 of 34 checks passed

igerber deleted the dcdh-heterogeneity-placebo-and-df branch May 16, 2026 10:18

igerber mentioned this pull request May 16, 2026

dCDH heterogeneity wrap-up: post-drop rank df + negative-baseline path test #452

Merged

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dCDH heterogeneity: per-path + global placebo predict_het R-parity + df threading#449

dCDH heterogeneity: per-path + global placebo predict_het R-parity + df threading#449
igerber merged 5 commits into
mainfrom
dcdh-heterogeneity-placebo-and-df

igerber commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Sunbelt Computer Software

PL/B Language Development and Support

Conversation

igerber commented May 15, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant