SyntheticControl: cv (out-of-sample) + inverse-variance V-selection (ADH 2015 / Abadie 2021) #523
Merged
…ADH 2015 / Abadie 2021)

Completes the ADH-2015 / Abadie-2021 predictor-importance V-selection menu with two new `v_method` values, each threaded through the in-space / leave-one-out / in-time placebo refits so that every diagnostic uses the same estimator as the headline fit.

`v_method="cv"`: out-of-sample cross-validation (ADH 2015 §; Abadie 2021 Eq. 9). The pre-period is split at `v_cv_t0` (new constructor param; default `len(pre)//2`) into a training window and a validation window. Each predictor spec is re-aggregated over each window (its op is recomputed over only that window's periods, with a separate `dataprep` and standardization per window), so the V-search is genuinely out-of-sample for every predictor type: V is selected to minimize the validation-window outcome MSPE of the training-window fit, then the final weights are re-estimated on the validation-window predictors (step 4). The same V* drives both fits with no zeroed coordinate, so `v_weights` reproduce `donor_weights` and `predictor_balance` is reported on the validation-window basis. `mspe_v` reports the held-out validation MSPE. Abadie 2021 fn.7 non-uniqueness is handled by a deterministic, convergence-aware flat-MSPE tie-break (prefer the densest V; never let a non-converged candidate displace a converged incumbent).

`v_method="inverse_variance"`: closed-form `v_h = 1/Var(X_h)` (Abadie 2021 §3.2(a)), with the variance taken over donors + treated on the unstandardized predictors and applied to the raw predictors (the `standardize` pre-scaling is intentionally bypassed, because inverse-variance weighting is itself the unit-variance rescaling). Exact for every positive variance (no flooring); zero-variance rows get 0 weight; an all-zero / overflow panel falls back to uniform weights with a warning.
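The inverse-variance rule described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in `diff_diff`: the function name `inverse_variance_v`, the population-variance convention, and the normalization of V to sum to 1 are all assumptions made for the example.

```python
import math
import warnings


def inverse_variance_v(rows):
    """Hypothetical helper: V weights proportional to 1 / Var(X_h).

    rows holds one list per predictor, each containing the raw
    (unstandardized) values for the treated unit and every donor.
    """
    raw = []
    for values in rows:
        mean = sum(values) / len(values)
        # Population variance (ddof = 0) is an assumption of this sketch.
        var = sum((x - mean) ** 2 for x in values) / len(values)
        # No flooring: any positive variance is inverted exactly,
        # and a zero-variance row gets weight 0.
        raw.append(1.0 / var if var > 0.0 else 0.0)

    total = sum(raw)
    if total == 0.0 or not math.isfinite(total):
        # All-zero or overflowing panel: fall back to uniform weights and warn.
        warnings.warn("inverse-variance V is degenerate; using uniform V")
        return [1.0 / len(rows)] * len(rows)

    # Normalizing to sum to 1 is an assumption; only relative sizes matter.
    return [v / total for v in raw]


# Usage: the third predictor is constant across units, so it gets weight 0.
weights = inverse_variance_v(
    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [5.0, 5.0, 5.0]]
)
```

Because the weights depend only on each predictor's own variance, rescaling a predictor by a constant changes its weight by the inverse square of that constant, which is why no separate standardization step is needed in this path.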
Fail-closed identification gates (cv): every predictor must span both windows (re-aggregation needs it to be measurable on each, so default single-period lags are rejected), and each window must have cross-donor variation (donor-indistinguishable windows leave X0·W constant in W → the weight solve is unidentified). Violations raise on the headline fit; in-space placebos drop the affected refit; in-time-truncated dates are marked `infeasible`. Single-donor fits force w=[1], so V is unidentified → uniform `v_weights` + `mspe_v=None` for all methods (documented).

Validation: R `Synth` has no built-in CV, so cv is anchored by deterministic equivalence to the R-anchored `custom_v` path on the per-window re-aggregated predictors (both the step-3 criterion and the step-4 weights) plus cv self-consistency (in_time_placebo == fresh backdated fit); inverse-variance is bit-for-bit vs `custom_v=1/Var(X)`. New tests cover config validation, exact inverse-variance, the spanning / donor-variation / convergence fail-closed gates, single-donor degeneracy, and placebo/LOO/in-time propagation for both methods.

Docs: REGISTRY §SyntheticControl Notes (per-window re-aggregation, fully-spanning + donor gates, tie-break, inverse-variance, single-donor), checklist tick, `docs/api`, the LLM guides, README, CHANGELOG. The remaining ADH-2015 tail (`W^reg` extrapolation, sparse-SC) and an in-space/LOO machine-readable cv-infeasible reason-code stay tracked in TODO.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
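The two cv identification gates from the commit message above (every predictor spans both windows; each window has cross-donor variation) might look roughly like the following. It is a sketch under stated assumptions: `split_pre_period`, `check_cv_gates`, the dictionary layout of the panel, the use of a mean as the aggregation op, and treating `v_cv_t0` as a positional index are all hypothetical choices for illustration, not the library's interface.

```python
def split_pre_period(pre_periods, v_cv_t0=None):
    """Split the pre-period into (training, validation) windows.

    Assumption: v_cv_t0 is a positional index; the default mirrors the
    documented len(pre) // 2.
    """
    t0 = len(pre_periods) // 2 if v_cv_t0 is None else v_cv_t0
    return pre_periods[:t0], pre_periods[t0:]


def check_cv_gates(specs, donors, pre_periods, v_cv_t0=None):
    """Raise ValueError if either gate fails; otherwise return the windows.

    specs:  {predictor_name: (variable, [periods it aggregates over])}
    donors: {donor_id: {variable: {period: value}}}
    """
    train, valid = split_pre_period(pre_periods, v_cv_t0)
    for label, window in (("training", train), ("validation", valid)):
        in_window = set(window)
        columns = {donor: [] for donor in donors}
        for name, (variable, periods) in specs.items():
            kept = [t for t in periods if t in in_window]
            if not kept:
                # Spanning gate: the spec has no period inside this window,
                # so it cannot be re-aggregated there.
                raise ValueError(
                    f"predictor {name!r} does not span the {label} window"
                )
            for donor, series in donors.items():
                vals = [series[variable][t] for t in kept]
                # Re-aggregate with a mean (the op is an assumption here).
                columns[donor].append(sum(vals) / len(vals))
        # Donor-variation gate: if every donor has the same predictor column,
        # X0 @ W is the same for every W on the simplex.
        if len({tuple(col) for col in columns.values()}) < 2:
            raise ValueError(
                f"donors are indistinguishable in the {label} window"
            )
    return train, valid
```

Raising here corresponds to the fail-closed behavior on the headline fit; a placebo loop could catch the same exception and drop or flag the affected refit instead.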
TDL77 pushed a commit to TDL77/diff-diff that referenced this pull request Jun 7, 2026
Promote the [Unreleased] CHANGELOG block to [3.5.1] (2026-06-02) and sync the version string across pyproject.toml, rust/Cargo.toml, diff_diff/__init__.py, diff_diff/guides/llms-full.txt, and CITATION.cff (date-released 2026-06-02).

Patch release since v3.5.0: enhancements, validations, and fixes to existing surfaces (no new estimator): SyntheticControl cross-validation + inverse-variance V-selection (igerber#523); Firpo & Possebom (2018) SCM-inference paper review (igerber#524); HeterogeneousAdoptionDiD fit() UX warnings (igerber#525); covariate-name collision now raises ValueError (igerber#518); EfficientDiD methodology-review promotion to Complete with sieve outcome-regression upgrade (igerber#521); AI PR-reviewer model gpt-5.4 -> gpt-5.5 (igerber#522).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Summary
- Adds two new `v_method` values to `SyntheticControl`, completing the ADH-2015 / Abadie-2021 predictor-importance V-selection menu, each threaded through the in-space / leave-one-out / in-time placebo refits (a diagnostic uses the same estimator as the headline fit).
- `v_method="cv"`: out-of-sample cross-validation (ADH 2015 §; Abadie 2021 Eq. 9). New `v_cv_t0` param splits the pre-period (default `len(pre)//2`). Each predictor spec is re-aggregated per window (a separate `dataprep` / standardization per window), so the V-search is genuinely out-of-sample for every predictor type: V minimizes the validation-window outcome MSPE of the training-window fit, then the final weights are re-estimated on the validation-window predictors (step 4). `v_weights` reproduce `donor_weights`; `predictor_balance` is reported on the validation-window basis; `mspe_v` is the held-out validation MSPE. Deterministic, convergence-aware flat-MSPE tie-break (fn. 7).
- `v_method="inverse_variance"`: closed-form `v_h = 1/Var(X_h)` (Abadie 2021 §3.2(a)) applied to the raw predictors (the `standardize` pre-scaling is intentionally bypassed, because inverse-variance weighting is itself the unit-variance rescaling). Exact for every positive variance; zero-variance rows → 0 weight; all-zero/overflow → uniform + warn.
- Fail-closed identification gates (cv): every predictor must span both windows, and each window must have cross-donor variation (donor-indistinguishable windows leave `X0·W` constant in `W` → unidentified). Violations raise on the headline fit; in-space placebos drop the affected refit; in-time-truncated dates → `infeasible`. Single-donor fits force `w=[1]` → `V` unidentified → uniform `v_weights` + `mspe_v=None` (documented).

Methodology references (required if estimator / math changes)
- `docs/methodology/papers/abadie-diamond-hainmueller-2015-review.md`, `abadie-2021-review.md`.
- Deviations (labeled `**Note:**` / `**Deviation from R:**`): R `Synth` has no built-in CV (ADH-2015's CV is a manual two-`dataprep` re-run); our per-window re-aggregation reproduces it for absolute-period spec aggregates; deterministic densest-V tie-break for fn.7 non-uniqueness; single-donor uniform-V degeneracy. All in `docs/methodology/REGISTRY.md` §SyntheticControl.

Validation
- `tests/test_methodology_synthetic_control.py`: config validation, exact inverse-variance (incl. tiny-positive-variance), the spanning / donor-variation / training-convergence fail-closed gates, single-donor degeneracy, and placebo/LOO/in-time propagation + self-consistency for both methods. R-anchored: cv by deterministic equivalence to the R-anchored `custom_v` path on the per-window re-aggregated predictors (step-3 criterion + step-4 weights) + cv self-consistency (in-time == fresh backdated fit, 1e-7); inverse-variance bit-for-bit vs `custom_v=1/Var(X)`.

Security / privacy
🤖 Generated with Claude Code
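
For readers unfamiliar with the flat-MSPE tie-break mentioned in the Summary, one possible reading of the rule (prefer the densest V; never let a non-converged candidate displace a converged incumbent) is sketched below. The function name `select_v`, the tolerance `tol`, the definition of density as the count of strictly positive entries, and the choice to let a converged candidate replace a non-converged incumbent on a tie are all assumptions of this sketch rather than details confirmed by the PR.

```python
def select_v(candidates, tol=1e-12):
    """Hypothetical selector over (v, mspe, converged) candidates.

    Candidates are scanned in a fixed order so the result is deterministic.
    """
    def density(v):
        # Assumption: "densest" means the most strictly positive entries.
        return sum(1 for x in v if x > 0.0)

    best = None
    for v, mspe, converged in candidates:
        if best is None:
            best = (v, mspe, converged)
            continue
        best_v, best_mspe, best_converged = best
        if mspe < best_mspe - tol:
            # Strict improvement outside the flat region (assumed to win
            # regardless of convergence status).
            best = (v, mspe, converged)
        elif abs(mspe - best_mspe) <= tol:
            # Flat MSPE: apply the tie-break rules.
            if best_converged and not converged:
                continue  # a non-converged candidate never displaces a converged incumbent
            if converged and not best_converged:
                best = (v, mspe, converged)
            elif density(v) > density(best_v):
                best = (v, mspe, converged)
    return best
```

Scanning in a fixed order and keeping the incumbent on exact ties is what makes the outcome reproducible across runs in this sketch.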