Fix #402 holistic audit residuals: HAD↔ContinuousDiD Step-4 + fit() Union annotation + REGISTRY contracts by igerber · Pull Request #431 · igerber/diff-diff · GitHub
Skip to content

Fix #402 holistic audit residuals: HAD↔ContinuousDiD Step-4 + fit() Union annotation + REGISTRY contracts#431

Merged
igerber merged 1 commit into
mainfrom
fix-audit-402-r2
May 14, 2026
Merged

Fix #402 holistic audit residuals: HAD↔ContinuousDiD Step-4 + fit() Union annotation + REGISTRY contracts#431
igerber merged 1 commit into
mainfrom
fix-audit-402-r2

Conversation

@igerber

@igerber igerber commented May 14, 2026

Copy link
Copy Markdown
Owner

Summary

Holistic re-audit of merged #402 (HAD Phase 5 agent surfaces) + #420 (cleanup) under the restored CI reviewer (gpt-5.4 + xhigh). The per-PR CI cleanup review on #420 couldn't see the combined post-PR state; 10 rounds of local agentic codex review against the combined diff surfaced 11 residuals across 3 P1, 4 P2, and 4 P3 categories.

  • HAD↔ContinuousDiD Step-4 handoff: bidirectional handoff rewritten estimand-first (WAS vs ATT(d)) with HAD's stricter panel-shape and first_treat encoding contract spelled out; explicit first_treat=inf -> 0 recode added before the example HAD fits.
  • HeterogeneousAdoptionDiD.fit() return Union: annotation widened to Union[HeterogeneousAdoptionDiDResults, HeterogeneousAdoptionDiDEventStudyResults] so the static type contract matches the documented runtime polymorphism on aggregate. Pinned by new TestFitReturnAnnotation regression.
  • REGISTRY contracts: did_had_pretest_workflow Phase 3 Note rewritten to match the actual aggregate-dispatched validator (overall = 2-period only; event_study = multi-period); stale "static return-type annotation reflects the common overall case" inline comment updated to describe the Union contract.
  • Agent guide mirrors: llms-full.txt HeterogeneousAdoptionDiDEventStudyResults table now enumerates the same four variance_formula labels and the same mass-point Wald-IV effective_dose_mean semantics as the single-period table.

10 files changed, 454 insertions(+), 80 deletions(-). No methodology changes, no behavior changes to fit() outputs — all changes are contract clarifications, guidance corrections, and test additions on surfaces #402 + #420 already established.

Test plan

  • CI green on tests/test_practitioner.py::TestHADDispatch, tests/test_guides.py::TestLLMsFullHADCoverage, tests/test_had.py::TestFitReturnAnnotation, tests/test_had_pretests.py::TestMultiPeriodWorkflow
  • CI AI review on the residual-fix diff

🤖 Generated with Claude Code

… fit() Union annotation + REGISTRY contract clarifications

Holistic codex re-audit of merged #402 (HAD Phase 5 agent surfaces) + #420 (cleanup) surfaced residuals that the per-PR CI review path could not see because the cleanup PR's diff scope hid the holistic state. 10 codex rounds against the combined post-PR state surfaced:

Methodology / contracts (REGISTRY):
- did_had_pretest_workflow Phase 3 Note still claimed the overall path "reduces a multi-period panel"; the validator actually requires exactly two periods. Updated to describe the aggregate-dispatched contract explicitly (overall = 2-period; event_study = multi-period).
- Phase 5 wave-1 notes still said the fit() return Union was "not test-enforced"; new regression closes the gap.

Code / API:
- HeterogeneousAdoptionDiD.fit() return annotation widened to Union[HeterogeneousAdoptionDiDResults, HeterogeneousAdoptionDiDEventStudyResults] so the static type contract matches the documented runtime polymorphism on aggregate. Updated the stale dispatch comment that claimed the annotation was single-period.
- ContinuousDiD -> HeterogeneousAdoptionDiD Step-4 handoff in practitioner.py now (a) explicitly recodes first_treat=inf -> 0 before both HAD example fits (HAD's _validate_had_panel rejects any first_treat outside {0, t_post}; ContinuousDiD's silent inf-normalization is HAD-incompatible), (b) frames the routing nudge around the WAS vs ATT(d) estimand difference rather than around the existence of untreated units, and (c) names HAD's stricter panel-shape and encoding requirements before showing the code.

Agent guides (llms-full.txt, llms-practitioner.txt):
- HeterogeneousAdoptionDiDEventStudyResults table now mirrors the single-period table's variance_formula and effective_dose_mean semantics (event-study path populates the same four pweight / survey_binder_tsl / pweight_2sls / survey_binder_tsl_2sls labels per had.py:639-648; effective_dose_mean inherits the same mass-point Wald-IV semantics per had.py:721-734).
- llms-practitioner.txt continuous-treatment decision tree rewritten estimand-first (WAS vs ATT(d)) with panel-shape contract spelled out for HAD (aggregate='overall' two-period vs 'event_study' multi-period; staggered last-cohort-only WAS warning).

Tests:
- tests/test_had.py::TestFitReturnAnnotation pins the Union return annotation via typing.get_type_hints — drift would have been silent before.
- tests/test_had_pretests.py::TestMultiPeriodWorkflow::test_overall_aggregate_rejects_multi_period locks the registry-described 2-period-only contract.
- tests/test_guides.py::TestLLMsFullHADCoverage::test_llms_full_had_event_study_mirrors_weighted_metadata_semantics asserts the event-study guide table covers all four variance_formula labels and the mass-point Wald-IV effective_dose_mean.
- tests/test_practitioner.py::TestHADDispatch::test_handle_continuous_step_4_recodes_first_treat_inf_for_had locks the inf->0 recode in the emitted snippet.
- Plus 5 additional regression tests for cross-surface alignment between the practitioner handler, llms-practitioner.txt routing, and the underlying estimator contracts.

CHANGELOG:
- Original #402 entry updated to credit the new return-annotation regression and clarify what the inspect.signature-based test does and does not pin.

No methodology changes, no behavior changes to fit() outputs on any existing surface — all changes are contract clarifications, guidance corrections, and test additions on surfaces that #402 + #420 already established.
@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 14, 2026
@igerber igerber merged commit 917a786 into main May 14, 2026
31 of 32 checks passed
@igerber igerber deleted the fix-audit-402-r2 branch May 14, 2026 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant