PowerAnalysis methodology review (PR-A): Bloom 1995 + Burlig 2020 source audits by igerber · Pull Request #506 · igerber/diff-diff · GitHub
Skip to content

PowerAnalysis methodology review (PR-A): Bloom 1995 + Burlig 2020 source audits#506

Merged
igerber merged 1 commit into
mainfrom
feature/power-analysis-paper-review
May 31, 2026
Merged

PowerAnalysis methodology review (PR-A): Bloom 1995 + Burlig 2020 source audits#506
igerber merged 1 commit into
mainfrom
feature/power-analysis-paper-review

Conversation

@igerber

@igerber igerber commented May 31, 2026

Copy link
Copy Markdown
Owner

Summary

  • PR-A (paper-review) of the PowerAnalysis methodology validation — the Step-1 fidelity artifact of the locked 2-PR sequence. Adds Bloom (1995) and Burlig, Preonas & Woerman (2020) source audits under docs/methodology/papers/, sourced only from the papers.
  • Documents (under review — does NOT yet fix) four discrepancies between the authoritative PowerAnalysis surfaces and the source material; reconciliation is deferred to PR-B and tracked in TODO.md.
  • No estimator logic, formula, or test changes. (One class-docstring panel-variance display was corrected from a self-canceling factor to the form _compute_variance already implements — docstring only.)

Methodology references (required if estimator / math changes)

  • Method name(s): PowerAnalysis (analytical MDE / power / sample size; simulation-based power)
  • Paper / source link(s): Bloom (1995) https://doi.org/10.1177/0193841X9501900504 ; Burlig, Preonas & Woerman (2020) https://doi.org/10.1016/j.jdeveco.2020.102458 (reviewed via the open NBER WP 26250)
  • Any intentional deviations from the source (and why): This PR documents (does not resolve) the deviations, via a reviewer-recognized REGISTRY **Note:** + power.py docstring notes + a references.rst clarification, with reconciliation tracked in TODO.md for PR-B:
    1. the analytical path uses the normal (z) approximation (faithful to Bloom, who derives the MDE multiplier from the normal), while Burlig's Eq. 1 uses the t distribution;
    2. the REGISTRY SE formula's 1/√(1−R²) and cluster-size m terms are not what power.py implements;
    3. the displayed sample-size formula omits the T(1−T) allocation factor the code applies;
    4. the panel (1+(T−1)ρ)/T factor is an equicorrelated/Moulton design effect, not Burlig's serial-correlation-robust (SCR) variance (Eq. 2) — an attribution overclaim pending PR-B re-attribution or implementation.

Validation

  • Tests added/updated: None — this is a documentation-only (paper-review) PR. Source validation, equation-anchored methodology tests, and R parity (pcpanel for the panel path; a normal-based reference for the analytical path) are deferred to PR-B.
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions

Copy link
Copy Markdown

@igerber igerber force-pushed the feature/power-analysis-paper-review branch from f70b9d6 to aa64e26 Compare May 31, 2026 15:25
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: aa64e2675b635304e55d779da41ba6b782ba5e3b


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The previous re-review P2 on PowerAnalysis is addressed: the methodology docs now explicitly mark the analytical path as under review, document the four source mismatches, and track the follow-up in TODO.md (docs/methodology/REGISTRY.md:L3207-L3220, docs/methodology/REGISTRY.md:L3364, diff_diff/power.py:L1232-L1236, docs/references.rst:L246-L248, TODO.md:L77-L77).
  • The diff is not limited to PowerAnalysis documentation; it also rolls back vcov_type="conley" support on SunAbraham and WooldridgeDiD, and rewrites the StackedDiD Conley deferral (diff_diff/sun_abraham.py:L520-L555, diff_diff/wooldridge.py:L429-L441, diff_diff/stacked_did.py:L203-L216).
  • [Newly identified] SunAbraham.set_params(vcov_type="conley") still accepts the now-forbidden value, so sklearn-style mutation bypasses the new constructor guard and falls into a lower-level Conley code path that this PR removed (diff_diff/sun_abraham.py:L543-L555, diff_diff/sun_abraham.py:L1441-L1450, diff_diff/sun_abraham.py:L2150-L2160).
  • WooldridgeDiD propagates the rollback correctly by revalidating vcov_type in set_params(), so the unmitigated contract regression is specific to SunAbraham (diff_diff/wooldridge.py:L429-L441, diff_diff/wooldridge.py:L487-L521).
  • I did not run the test suite in this environment; numpy is unavailable, so this review is from static inspection.

Methodology

  • Severity: P1 [Newly identified]. Impact: the new SunAbraham variance-family contract is inconsistent across public entry points. __init__ now rejects vcov_type="conley", but set_params() still blindly mutates self.vcov_type without revalidation. Because the old estimator-level Conley front-door and Conley argument threading were removed, a caller can still put the estimator into the forbidden "conley" state via sklearn-style mutation and then reach _fit_saturated_regression() / LinearRegression(vcov_type="conley") without any conley_* inputs. That is exactly the changed-surface propagation problem this repo flags as a P1 anti-pattern. Concrete fix: factor the constructor validation into a shared helper and call it from set_params() before mutation, mirroring WooldridgeDiD.set_params(). Add a regression test that failed mutation leaves vcov_type unchanged. References: diff_diff/sun_abraham.py:L543-L555, diff_diff/sun_abraham.py:L1441-L1450, diff_diff/sun_abraham.py:L2150-L2160, diff_diff/wooldridge.py:L487-L521, tests/test_sun_abraham.py:L1656-L1661.
  • Severity: P3-informational. Impact: the PowerAnalysis source mismatches are now explicitly documented and tracked, so under the review policy they are mitigated rather than blocking. Concrete fix: none required in this PR; PR-B can reconcile or re-attribute the formulas. References: docs/methodology/REGISTRY.md:L3207-L3220, diff_diff/power.py:L1232-L1236, TODO.md:L77-L77.

Code Quality

  • No additional findings beyond the SunAbraham.set_params() contract regression above.

Performance

  • No findings. The estimator changes are removals/deferrals, and the PowerAnalysis changes are documentation-only.

Maintainability

  • Severity: P3-informational. Impact: the StackedDiD Conley deferral is now described as a simple threading follow-up and “same shape as the SunAbraham conley follow-up,” which drops the prior methodology caveat about duplicated units across stacked sub-experiments at distance 0. The feature is still rejected, so this does not change runtime numbers, but it does understate the unresolved design issue for the eventual follow-up PR. Concrete fix: restore a short methodology caveat in the rejection message / REGISTRY / TODO row instead of framing the work as pure plumbing. References: diff_diff/stacked_did.py:L107-L109, diff_diff/stacked_did.py:L208-L216, docs/methodology/REGISTRY.md:L1473-L1478, TODO.md:L107-L109.

Tech Debt

  • No new untracked tech debt. The PowerAnalysis discrepancies are properly tracked in TODO.md, which is the correct mitigation for this docs-first audit (TODO.md:L77-L77).

Security

  • No findings.

Documentation/Tests

  • No separate blocker beyond the missing SunAbraham.set_params() regression noted above. The changed tests only assert constructor-time rejection for the rolled-back Conley surfaces, not mutation-time rejection (tests/test_sun_abraham.py:L1625-L1661, tests/test_wooldridge.py:L1755-L1757).
  • I did not execute the test suite in this environment because numpy is unavailable.

Path to Approval

  1. Make SunAbraham.set_params() re-run the same vcov_type validation as __init__ before mutating state, so vcov_type="conley" is rejected consistently across both public entry points.
  2. Add regression tests that SunAbraham().set_params(vcov_type="conley") and SunAbraham().set_params(vcov_type="foo") both raise and leave self.vcov_type unchanged.

…rce audits

Step-1 fidelity artifact for the PowerAnalysis methodology validation (the paper-review
PR of the 2-PR sequence; source validation + code reconciliation + tests are deferred to
PR-B). No estimator logic, formula, or test changes.

New source audits under docs/methodology/papers/ (sourced only from the papers):
- bloom-1995-review.md: the MDE-multiplier framing — MDE = M*SE with M from the NORMAL
  distribution (Bloom builds 1.65 + 0.84 = 2.49 from z-quantiles, p.548-549), the
  cross-sectional impact-estimator SE sigma*sqrt((1-R^2)/(n*T*(1-T))) (Eq. 1/2), the
  T(1-T) allocation factor (optimal at 50/50), one- and two-sided multipliers; documents
  that Bloom explicitly excludes clustering/multi-site design effects (Note 1).
- burlig-preonas-woerman-2020-review.md: the serial-correlation-robust (SCR) panel-DD
  variance (Eq. 2, verbatim — three covariance terms psi^B/psi^A/psi^X over m pre- and r
  post-periods, psi^X entering negatively), the McKenzie special case (Eq. 3), the
  increase-MDE condition (Eq. 4); Eq. 1 uses the t-distribution; pcpanel is the panel
  parity reference.

The audits surfaced discrepancies between the authoritative PowerAnalysis surfaces and the
source material. Per the agreed approach these are DOCUMENTED as under-review now (not yet
fixed — reconciliation is deferred to PR-B and tracked in TODO.md):
- REGISTRY.md ## PowerAnalysis: umbrella **Note:** enumerating the four discrepancies
  (t-vs-normal-z multiplier; SE R^2/cluster-m terms; missing T(1-T) allocation factor in the
  displayed sample-size formula; panel (1+(T-1)rho)/T is an equicorrelated/Moulton design
  effect, NOT Burlig SCR — an attribution overclaim).
- REGISTRY.md R-equivalents table: annotate the PowerAnalysis row as under-review (analytical
  path is normal-based, so pwr.t.test is not the faithful parity target; panel parity ref is
  Stata pcpanel) — resolves the cross-reference inconsistency the audits introduced.
- power.py: docstring notes on PowerAnalysis and simulate_power flagging the panel attribution
  and normal-vs-t approximation as under review; the class docstring panel-variance display
  corrected from a self-canceling factor to the implemented
  (1/N_treated + 1/N_control) * (1+(T-1)rho)/T (docstring-only; no logic change).
- references.rst: clarify the analytical panel path uses an equicorrelated approximation,
  not Burlig's SCR formula.
- TODO.md: tracker row (Methodology/Correctness) for the PR-B reconciliation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the feature/power-analysis-paper-review branch from aa64e26 to 2470be6 Compare May 31, 2026 15:32
@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 31, 2026
@igerber igerber merged commit fe20a9a into main May 31, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/power-analysis-paper-review branch May 31, 2026 16:55
igerber added a commit that referenced this pull request May 31, 2026
… Eq.2 equicorrelated; tracker -> Complete

Reconciles diff_diff/power.py with the Bloom (1995) + Burlig, Preonas & Woerman
(2020) source audits (paper reviews added in PR-A #506). Behavior change: the
analytical panel-DiD variance was the Moulton (1+(T-1)rho)/T factor (wrong
period-scaling, ~4x too small at rho=0/m=r=5, AND opposite rho-sign). Replaced
with the within-unit equicorrelated special case of Burlig Eq. 2,
sigma^2 (1/n_T+1/n_C)(1/m+1/r)(1-rho), so within-unit correlation now LOWERS the
MDE. The MDE multiplier stays the normal-z Bloom multiplier (documented as a
deliberate large-sample approximation to Burlig's t).

- power.py: equicorrelated variance in _compute_variance + _compute_required_n;
  input validation for ALL designs (n_pre>=1, n_post>=1, rho in [-1/(T-1), 1))
  enforced BEFORE the 2x2-vs-panel router, so invalid two-period shapes no longer
  fall through silently; the (1-rho) factor applies at T=2 too (Burlig footnote
  11, the m=r=1 case), so rho is never silently ignored and rho=0 recovers Bloom's
  2*sigma^2; docstrings rewritten; PR-A under-review notes removed.
- REGISTRY ## PowerAnalysis equation block rewritten (z not t; unified
  equicorrelated SE with the 2x2 as the m=r=1 special case; cluster-m and
  inverted-R^2 terms removed; both reference surfaces; checklist ticked).
- New tests/test_methodology_power.py (Bloom Table 1; 2x2 + panel closed forms;
  literal-equicorrelated Monte-Carlo; sample_size<->mde round-trip; input-guard +
  rho-at-T=2 + compute_* wrapper validation; base-R qnorm parity).
- benchmarks/R/generate_power_golden.R + benchmarks/data/r_power_golden.json.
- tests/test_power.py: inverted test_icc_effect + test_extreme_icc to Burlig's sign.
- references.rst: + Frison & Pocock (1992), McKenzie (2012) lineage.
- docs/tutorials/06_power_analysis.ipynb: corrected rho cells + summary.
- METHODOLOGY_REVIEW.md row -> Complete; TODO row removed; CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request May 31, 2026
… Eq.2 equicorrelated; tracker -> Complete

Reconciles diff_diff/power.py with the Bloom (1995) + Burlig, Preonas & Woerman
(2020) source audits (paper reviews added in PR-A #506). Behavior change: the
analytical panel-DiD variance was the Moulton (1+(T-1)rho)/T factor (wrong
period-scaling, ~4x too small at rho=0/m=r=5, AND opposite rho-sign). Replaced
with the within-unit equicorrelated special case of Burlig Eq. 2,
sigma^2 (1/n_T+1/n_C)(1/m+1/r)(1-rho), so within-unit correlation now LOWERS the
MDE. The MDE multiplier stays the normal-z Bloom multiplier (documented as a
deliberate large-sample approximation to Burlig's t).

- power.py: equicorrelated variance in _compute_variance + _compute_required_n;
  input validation for ALL designs (n_pre>=1, n_post>=1, rho in [-1/(T-1), 1))
  enforced BEFORE the 2x2-vs-panel router, so invalid two-period shapes no longer
  fall through silently; the (1-rho) factor applies at T=2 too (Burlig footnote
  11, the m=r=1 case), so rho is never silently ignored and rho=0 recovers Bloom's
  2*sigma^2; docstrings rewritten; PR-A under-review notes removed.
- REGISTRY ## PowerAnalysis equation block rewritten (z not t; unified
  equicorrelated SE with the 2x2 as the m=r=1 special case; cluster-m and
  inverted-R^2 terms removed; both reference surfaces; checklist ticked).
- New tests/test_methodology_power.py (Bloom Table 1; 2x2 + panel closed forms;
  literal-equicorrelated Monte-Carlo; sample_size<->mde round-trip; input-guard +
  rho-at-T=2 + compute_* wrapper validation; base-R qnorm parity).
- benchmarks/R/generate_power_golden.R + benchmarks/data/r_power_golden.json.
- tests/test_power.py: inverted test_icc_effect + test_extreme_icc to Burlig's sign.
- references.rst: + Frison & Pocock (1992), McKenzie (2012) lineage.
- docs/tutorials/06_power_analysis.ipynb: corrected rho cells + summary.
- METHODOLOGY_REVIEW.md row -> Complete; TODO row removed; CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber mentioned this pull request Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant