{{ message }}
SyntheticControl: in-space placebo permutation inference + reporting-stack integration#511
Merged
Merged
Conversation
…stack integration Implements ADH 2010 §2.4 in-space placebo inference for the classic SyntheticControl estimator (PR-1 / #501 shipped the estimator with no analytical SE), and wires SyntheticControlResults into the practitioner / DiagnosticReport / BusinessReport reporting stack. Methodology (diff_diff/synthetic_control.py, synthetic_control_results.py): - SyntheticControlResults.in_space_placebo(): reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the other J-1 donors (the real treated unit is excluded from every placebo pool -- its post-period is treatment-contaminated; matches SCtools::generate.placebos), and ranks the treated unit's post/pre RMSPE ratio among the J+1 units. - New fields placebo_p_value (= rank/(n_placebos+1); an upper-tail rank test on the unsigned RMSPE-ratio statistic, ties via >=), rmspe_ratio (treated statistic, set at fit), n_placebos/n_failed (effective reference-set sizes; non-converged placebos excluded from BOTH numerator and denominator). - placebo_p_value is a SEPARATE field from the always-NaN analytical p_value; it carries no SE/t-stat and does not flow through safe_inference; is_significant stays bound to p_value. - Fail-closed contract: a valid fit requires BOTH inner Frank-Wolfe AND outer-V convergence of the SELECTED incumbent; treated fit fails closed and placebos are excluded on either failure. The Powell polish validates the incumbent only when it converges back AT the incumbent's objective level (np.isclose), never on a success at a strictly-worse point. - Edge cases: scale-aware RMSPE-ratio floor (perfect pre-fit -> finite ratio), J<2 -> NaN+warn, J==2 -> degenerate+coarse warn, deterministic given seed. - get_placebo_df() returns the per-unit RMSPE-ratio table (incl. treated row + failed donors). Placebo compute is opt-in; each fit retains a _SyntheticControlFitSnapshot of the pivoted panel (excluded from pickling). Reporting (diagnostic_report.py, practitioner.py, business_report.py, _reporting_helpers.py): SCM routed like TROP (parallel-trends-free, fit-based). New scm_fit PT analogue (design_enforced_pt verdict reading pre_rmspe), a _scm_native diagnostic block surfacing pre_rmspe + donor-weight concentration + the placebo p-value when already computed (never triggering the refit implicitly), a practitioner _handle_synthetic_control with the placebo as the headline significance step, and a BusinessReport fit-based assumption block with ADH 2010 attribution. Also fixes a latent BR bug where headline is_significant was a non-JSON-serializable numpy bool_ when p_value is numpy NaN. Docs: REGISTRY.md §SyntheticControl (new **Note:** labels for donor-pool construction, failure handling, RMSPE-ratio floor, non-analytical-p-value split), REPORTING.md, docs/api/synthetic_control.rst, LLM guides, README, CHANGELOG. TODO.md tracks the ADH-2015 follow-up (in-time placebo / LOO) and a compact/lazy snapshot representation. Tests: ~19 placebo + convergence tests (infeasible donor counts, treated-fit non-convergence, Powell-succeeds-at-worse-point, failed-placebo exclusion, deterministic reruns, pickle behavior) + SCM reporting-routing tests across all three tools. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ty reason + n_starts validation CI review (PR #511) flagged two P2s: 1. summary() misdiagnosed the infeasible-placebo reason. The treated-fit fail-closed branch and the J<2 branch both leave n_placebos=0, n_failed=0, so reconstructing the cause from counts narrated a non-converged treated fit as "too few donors or all donor refits failed" (false for that path). Add an explicit `_placebo_status` recorded by in_space_placebo() ({treated_fit_nonconverged, too_few_donors, all_placebos_failed, ran}) and render the specific reason in both summary() and DiagnosticReport._scm_native instead of inferring it from counts. The status is a small string, so it survives pickling alongside _placebo_df / the scalar fields. 2. in_space_placebo(n_starts=...) skipped the positive-integer validation the estimator constructor enforces — int(0)/int(-1)/int(2.5) silently coerced into a degenerate or invalid permutation procedure. Mirror the constructor's check and fail fast with a ValueError. Also relabels the lingering "One-sided rank" code comment to "upper-tail rank on the unsigned RMSPE ratio" (matching the REGISTRY/CHANGELOG wording fix). New regressions: summary() names the treated-fit-failure reason (not the donor-side reason) under identical n_placebos/n_failed counts; invalid n_starts overrides raise and leave placebo state untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — this re-review resolves the prior P2s, and I did not find any unmitigated P0/P1 issues in the changed SCM estimator, placebo, or reporting paths. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…lobal filter mutation CI review (PR #511) P3: the new scm_fit fixture called warnings.filterwarnings("ignore"), which permanently mutates the process-wide warning filters and can silently weaken warning-regression coverage in later tests in the same worker. Wrap only the synthetic_control(...) call in warnings.catch_warnings() + simplefilter("ignore") so the suppression is scoped to the fixture's solver call. (Pre-existing sibling fixtures in this file use the same legacy global-filter pattern but are out of scope for this PR.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
SyntheticControlestimator (PR-1 / Add SyntheticControl estimator (classic SCM, Abadie-Diamond-Hainmueller 2010) — PR-1 core #501 shipped the estimator with no analytical SE) and wiresSyntheticControlResultsinto the practitioner / DiagnosticReport / BusinessReport reporting stack.SyntheticControlResults.in_space_placebo(): reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the otherJ−1donors (the real treated unit is excluded from every placebo pool — its post-period is treatment-contaminated; matchesSCtools::generate.placebos), and ranks the treated unit's post/pre RMSPE ratio among theJ+1units.placebo_p_value(= rank/(n_placebos+1); an upper-tail rank test on the unsigned RMSPE-ratio statistic — direction-agnostic, detects an effect of either sign; ties via≥),rmspe_ratio(treated statistic, set at fit),n_placebos/n_failed(effective reference-set sizes; non-converged placebos excluded from BOTH numerator and denominator).placebo_p_valueis a separate field from the always-NaN analyticalp_value— no SE/t-stat, does not flow throughsafe_inference;is_significantstays bound top_value.np.isclose), never on a success at a strictly-worse point.inf),J<2→ NaN+warn,J==2→ degenerate+coarse warn, deterministic givenseed.get_placebo_df()returns the per-unit RMSPE-ratio table (incl. treated row + failed donors). Placebo compute is opt-in; each fit retains a pickle-excluded_SyntheticControlFitSnapshotof the pivoted panel.scm_fitPT analogue (design_enforced_ptverdict readingpre_rmspe), a_scm_nativeblock surfacingpre_rmspe+ donor-weight concentration + the placebo p-value when already computed (never triggering the refit implicitly), a practitioner_handle_synthetic_controlwith the placebo as the headline significance step, and a BusinessReport fit-based assumption block with ADH 2010 attribution. Also fixes a latent BR bug where headlineis_significantwas a non-JSON-serializable numpybool_whenp_valueis numpyNaN.Methodology references (required if estimator / math changes)
SCtools::generate.placebos(CRAN). Documented indocs/methodology/REGISTRY.md§SyntheticControl.**Note:**(its post-period is treatment-contaminated; matchesSCtools). (2) The reportedrmspe_ratiois on the root scale; ADH §3.4 reports the MSPE ratio (the square) — monotone-equivalent, so rank/p-value are identical (documented). (3)placebo_p_valueis kept separate from the analyticalp_value/is_significant(no analytical SE exists for classic SCM) — documented as the non-analytical-p-value split. (4) Scale-aware RMSPE-ratio floor + non-converged-placebo exclusion from numerator AND denominator — documented**Note:**labels. In-time placebo / leave-one-out (ADH 2015) are out of scope and tracked inTODO.md.Validation
tests/test_methodology_synthetic_control.py(~19 placebo + convergence tests: infeasible donor counts, treated-fit non-convergence, Powell-succeeds-at-worse-point regression, failed-placebo exclusion, deterministic reruns, pickle behavior, custom-V path,get_placebo_dfshape/status);tests/test_diagnostic_report.py+tests/test_business_report.py(SCM-native routing,scm_fitprose, target-parameter, robustness, infeasible/precomputed-rejection);tests/test_practitioner.py(placebo step dispatch). Full local run:428 passed, 1 skipped(pure-Python) across the four affected files.Security / privacy
🤖 Generated with Claude Code