Fix #408 holistic audit residuals: sibling-surface replicate-weight test gaps + stale by_path gate list by igerber · Pull Request #435 · igerber/diff-diff · GitHub
Skip to content

Fix #408 holistic audit residuals: sibling-surface replicate-weight test gaps + stale by_path gate list#435

Merged
igerber merged 2 commits into
mainfrom
fix-audit-408-r2
May 14, 2026
Merged

Fix #408 holistic audit residuals: sibling-surface replicate-weight test gaps + stale by_path gate list#435
igerber merged 2 commits into
mainfrom
fix-audit-408-r2

Conversation

@igerber

@igerber igerber commented May 14, 2026

Copy link
Copy Markdown
Owner

Summary

Holistic re-audit of merged #408 (compose by_path × survey_design Wave 4) + #424 (post-merge docs-drift cleanup). The per-PR CI cleanup review on #424 couldn't see the combined post-PR holistic state; local agentic codex review surfaced 2 sibling-surface test gaps + 1 real [Unreleased] CHANGELOG drift.

  • Sibling-surface test gapsCompose by_path / paths_of_interest with survey_design (Wave 4 #10) #408 shipped replicate-weight regressions for by_path but the parallel paths_of_interest selector only had analytical/gate tests under survey. Same _compute_path_effects / _compute_path_placebos IF code path; missing tests were a selector-symmetry oversight. Added test_paths_of_interest_replicate_weight_per_path_se_finite and test_paths_of_interest_survey_design_placebo_replicate_weight, both locking finite per-horizon SE under Rao-Wu (JK1) AND the _refresh_path_inference contract (every per-path entry's t_stat matches safe_inference at the FINAL df_survey).
  • CHANGELOG drift — the original [Unreleased] by_path entry said trends_linear, trends_nonparam, heterogeneity, design2, honest_did, survey_design all raise NotImplementedError. Each subsequent gate-lift PR added its own [Unreleased] entry but never updated this original list. Rewrote to reflect current state: only design2 + honest_did remain gated.

2 files, +152/-1. No methodology changes, no behavior changes. CHANGELOG fix is hygiene; tests add coverage on already-shipped surfaces.

Test plan

  • pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSurveyDesignAnalytical -m '' green (2 new + 16 existing)
  • CI AI review on the diff

🤖 Generated with Claude Code

…ale by_path gate list in CHANGELOG

Holistic re-audit of merged #408 (compose `by_path` × `survey_design` Wave 4) + #424 (post-merge docs-drift cleanup). Per-PR CI on #424 couldn't see the combined post-PR holistic state. Local agentic codex review surfaced 2 sibling-surface test gaps + 1 real `[Unreleased]` CHANGELOG drift.

**Sibling-surface coverage**: #408's Wave-4 PR shipped replicate-weight regressions for `by_path` (`test_per_path_replicate_se_finite`, `test_per_path_inference_refreshes_to_lower_final_df`) but the parallel `paths_of_interest` selector only had analytical / gate / unobserved-path tests under survey. Both selectors share the same `_compute_path_effects` and `_compute_path_placebos` IF code path, so the test gap was a selector-symmetry oversight, not a methodology gap. Added:

- `test_paths_of_interest_replicate_weight_per_path_se_finite` — pins finite per-horizon SE under Rao-Wu (JK1) AND the `_refresh_path_inference` contract (every per-path entry's `t_stat` matches `safe_inference` at the FINAL `df_survey`, not the per-path snapshot from before replicate-weight fits appended to the shared `_replicate_n_valid_list`).
- `test_paths_of_interest_survey_design_placebo_replicate_weight` — same invariants on the `_compute_path_placebos` branch.

**CHANGELOG drift**: the original `[Unreleased]` `by_path` entry (added when by_path first shipped) said `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` all raise `NotImplementedError`. Each subsequent gate-lift PR shipped its own `[Unreleased]` entry, but none of them went back to update this original entry's stale gated-features list. Users reading the changelog in order get contradictory upgrade guidance. Rewrote the gates list to reflect actual current state: only `design2` + `honest_did` remain gated. Pre-existing single-CHANGELOG-cycle hygiene gap, surfaced by the #408 holistic audit but applies independently of any specific subsequent PR.

Holistic pilot finding NOT addressed: phantom `heterogeneity was composed in` claim in #424's CHANGELOG. That's correct in real main (#412 lifted the heterogeneity gate), but appears as a code/doc mismatch in the pilot because pilot construction (per `feedback_holistic_pilot_true_merge_cherry_pick_pitfall`) is `#408 + #424 deltas only` and doesn't include intermediate sibling PRs like #412. This is a structural limitation of the strict-delta pilot pattern — no fix-PR action needed in main.
@github-actions

Copy link
Copy Markdown

CI AI review on #435:
- P2: exp_p / exp_ci returned from safe_inference() were unused (F841).
  Extended event-study test to assert p_value and conf_int match
  safe_inference output too, not just t_stat. The safe_inference
  contract is joint NaN propagation across the full inference tuple,
  so testing all three jointly is the right invariant.
- P3: placebo test only checked 'at least one finite entry', so a
  silent path-drop on the replicate-weight placebo branch could pass.
  Added explicit assertions that both (0,1,1,1) and (0,1,0,0) are
  present in res.path_placebo_event_study before checking SE/inference.
@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 14, 2026
@igerber igerber merged commit 5372e88 into main May 14, 2026
25 of 26 checks passed
@igerber igerber deleted the fix-audit-408-r2 branch May 14, 2026 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant