iframe-proxy

igerber · 2026-04-26T12:04:54Z

Summary

Closes the gaps left after PR #372 added HeterogeneousAdoptionDiD to the canonical surfaces. The narrative pages did not yet mention HAD, and the 12-symbol HAD pretest suite shipped in had_pretests.py was absent from the API page. Also refreshes the inference-contract block to use the survey_design= canonical kwarg consolidated in PR #376.

docs/api/had.rst — new HAD Pretests section covering all 12 public symbols (4 single-period tests + 4 result classes + 3 joint tests + 1 joint result), split into aggregate="overall" and aggregate="event_study" subsections matching the workflow's dispatch. Refreshes the inference-contract block to reference survey_design=make_pweight_design(weights) (pweight shortcut) and survey_design=SurveyDesign(...) (full TSL); notes survey= / weights= are deprecated aliases.
docs/choosing_estimator.rst — HAD entries in all 3 tables (Quick Reference, Standard Error Methods, Survey Design Support) plus a new "Universal Rollout / No Untreated Control" subsection in Detailed Guidance. SE Methods row uses survey_design= canonical naming.
docs/r_comparison.rst — HAD row in Feature Comparison Table, new "No-Untreated Designs (no R parallel)" subsection, Migration Tips bullet.
docs/troubleshooting.rst — new HAD Issues section with 4 subsections (estimand resolution / mass-point fallback / classical SE under survey_design= / panel-only event-study).
docs/practitioner_decision_tree.rst — Start Here option 7, At a Glance row, new "Universal Rollout" section with _section-no-untreated anchor.
docs/references.rst — Binder citation references the canonical survey_design= paths.
docs/doc-deps.yaml — extends had_pretests.py entry with llms.txt user-guide dep; adds new top-level local_linear.py entry.

Methodology references (required if estimator / math changes)

N/A - no methodology changes (docs-only PR; no source files modified).

Validation

Tests added/updated: No test changes (docs-only).
Sphinx build: succeeds (python -m sphinx -b html -q docs docs/_build/html); 0 new warnings from this PR (71 pre-existing in bacon.py / staggered_bootstrap.py autosummary unaffected).
Public API regression: all 12 HAD pretest symbols importable (HADPretestReport, qug_test, stute_test, yatchew_hr_test, did_had_pretest_workflow, QUGTestResults, StuteTestResults, YatchewTestResults, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test, StuteJointResult); make_pweight_design + SurveyDesign importable.
Rendered HTML coverage: api/had.html 276 hits, narrative pages 4-8 hits each.
Cross-reference resolution: _section-no-untreated anchor (definition + cross-ref).
Style: 0 em dashes in diff.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Closes the gaps left after PR #372 added HeterogeneousAdoptionDiD to the canonical surfaces. The narrative pages did not yet mention HAD, and the 12-symbol HAD pretest suite shipped in `had_pretests.py` was absent from the API page. Also refreshes the inference-contract block to use the `survey_design=` canonical kwarg consolidated in PR #376. - `docs/api/had.rst`: new HAD Pretests section covering all 12 public symbols (4 single-period tests + 4 result classes + 3 joint tests + 1 joint result), split into `aggregate="overall"` and `aggregate="event_study"` subsections matching the workflow's dispatch. Refreshes the existing inference-contract block to reference `survey_design=make_pweight_design(weights)` (pweight shortcut) and `survey_design=SurveyDesign(...)` (full TSL); notes `survey=` / `weights=` are deprecated aliases. - `docs/choosing_estimator.rst`: HAD entries in all 3 tables (Quick Reference, Standard Error Methods, Survey Design Support) plus a new "Universal Rollout / No Untreated Control" subsection in Detailed Guidance. SE Methods row uses `survey_design=` canonical naming. - `docs/r_comparison.rst`: HAD row in Feature Comparison Table, new "No-Untreated Designs (no R parallel)" subsection, Migration Tips bullet. - `docs/troubleshooting.rst`: new HAD Issues section with 4 subsections (estimand resolution / mass-point fallback / classical SE under survey_design / panel-only event-study). - `docs/practitioner_decision_tree.rst`: Start Here option 7, At a Glance row, new "Universal Rollout" section with `_section-no-untreated` anchor. - `docs/doc-deps.yaml`: extend had_pretests.py entry with llms.txt user-guide dep; add new top-level local_linear.py entry. Verification: all 12 HAD pretest symbols importable; `make_pweight_design` + `SurveyDesign` importable; sphinx build succeeds with 0 new warnings (71 pre-existing unaffected); HTML render contains expected HAD content (276 hits in had.html, 4-8 in narrative pages); 0 em dashes; `_section-no-untreated` anchor resolves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-26T12:10:55Z

Overall Assessment

⚠️ Needs changes — I found two unmitigated P1 issues on the newly documented HAD public surface.

Executive Summary

The new HAD Pretests API section itself looks aligned with the exported 12-symbol source surface.
P1: the PR rewrites the HAD estimator’s pweight/SE contract around survey_design=make_pweight_design(weights), but HeterogeneousAdoptionDiD.fit() explicitly rejects that object and still uses deprecated weights= for the pweight shortcut.
P1: several new HAD snippets are not runnable because they use nonexistent kwargs/results attributes (outcome, unit, time, dose, results.coef, results.fit_path) instead of the actual HAD API.
P2: the new R-comparison text is stale as of April 26, 2026; R now has HAD-oriented package entry points, so “no R parallel” is too strong. (cran.r-universe.dev)

Methodology

Severity: P1. Impact: affected method is HeterogeneousAdoptionDiD’s weighting/variance contract. docs/api/had.rst:L49-L60, docs/api/had.rst:L67-L70, docs/choosing_estimator.rst:L665-L667, and docs/troubleshooting.rst:L535-L540 now document survey_design=make_pweight_design(weights) as the estimator-side pweight route. That does not match the implementation or the Methodology Registry: data-in HAD surfaces take survey_design=SurveyDesign(...), the deprecated weights= path is still the pweight shortcut, and make_pweight_design(...) is reserved for array-in pretests. Worse, the in-code deprecation text says weights= and survey_design=SurveyDesign(...) currently produce different SE families on HeterogeneousAdoptionDiD.fit(), so the rewritten docs also blur a real variance-family distinction. Cross-check: docs/methodology/REGISTRY.md:L2352-L2353, diff_diff/had.py:L2853-L2891, diff_diff/had.py:L2946-L2967, diff_diff/survey.py:L681-L697, diff_diff/survey.py:L798-L808. Concrete fix: restore the estimator docs to the implemented contract: weights= remains the deprecated pweight shortcut on fit(), survey_design=SurveyDesign(weights='col_name', ...) is the data-in survey route, and make_pweight_design() should only be documented on array-in pretest helpers.

Code Quality

No findings in the changed files.

Performance

No findings in the changed files.

Maintainability

No findings in the changed files.

Tech Debt

No findings. I also did not see any changed-file issue that is already mitigated by TODO.md.

Security

No findings.

Documentation/Tests

Severity: P1. Impact: the new HAD examples are not runnable as written. The added snippets call did_had_pretest_workflow(..., outcome=..., unit=..., time=..., dose=...) and HeterogeneousAdoptionDiD.fit(..., outcome=..., unit=..., time=..., dose=...), but the actual signatures are outcome_col, dose_col, time_col, unit_col (diff_diff/had_pretests.py:L4015-L4030, diff_diff/had.py:L2786-L2804). They also print results.coef and results.fit_path, while the HAD result objects expose att and design (diff_diff/had.py:L315-L339, diff_diff/had.py:L667-L690). Affected docs: docs/choosing_estimator.rst:L393-L405, docs/practitioner_decision_tree.rst:L285-L299, docs/r_comparison.rst:L234-L238, docs/troubleshooting.rst:L500-L526. Concrete fix: update all new HAD snippets to use the real kwarg names and result attributes (outcome_col, dose_col, time_col, unit_col, results.att, results.design).
Severity: P2. Impact: docs/r_comparison.rst:L216-L230, docs/r_comparison.rst:L399-L415, and docs/r_comparison.rst:L433-L436 say HAD has “no R parallel” and that the paper has not been packaged in R. As of April 26, 2026, the current package manuals already show DIDHAD and DIDmultiplegt(mode="had"); the R surface appears narrower/QUG-oriented, but it is not accurate to say there is no R counterpart at all. Concrete fix: narrow the claim to the actual gap, e.g. that diff-diff covers a broader HAD surface than the current R packages rather than “no R parallel.” (cran.r-universe.dev)

Not run here: Sphinx build/tests, due the read-only sandbox.

Path to Approval

Revert the HAD estimator SE-contract docs to the implemented split between deprecated weights= and survey_design=SurveyDesign(...), and remove survey_design=make_pweight_design(weights) from estimator-surface guidance.
Fix every newly added HAD snippet to use the real call signature and result fields (*_col, att, design).
Update docs/r_comparison.rst to reflect the current R ecosystem and describe the remaining gap precisely instead of saying there is no R parallel.

P1 (methodology contract): the inference-contract block, choosing-estimator SE Methods row, and troubleshooting "classical SE under survey" subsection incorrectly described `survey_design=make_pweight_design(weights)` as the estimator-side pweight shortcut. Per `survey.py:681-697` and `had.py:2853-2891` that helper is reserved for ARRAY-IN HAD pretest helpers (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `qug_test`); on the data-in `HeterogeneousAdoptionDiD.fit` surface the deprecated `weights=np.ndarray` shortcut is the actual pweight route, and it currently yields a different SE family than `survey_design=SurveyDesign(...)`: `weights=` -> `variance_formula="pweight"` / `"pweight_2sls"` (CCT-2014 / 2SLS pweight-sandwich); `survey_design=SurveyDesign(...)` -> `"survey_binder_tsl"` / `"survey_binder_tsl_2sls"` (Binder-TSL). The unification onto a single SE contract is queued for the next minor release. - `docs/api/had.rst` inference-contract block: restore `weights=` shortcut (deprecated) and `survey_design=SurveyDesign(weights="col", ...)` as the two distinct weighted regimes; spell out the SE-family difference and the next-minor unification; add a separate paragraph that documents `make_pweight_design()` correctly as the pweight-only convenience for the array-in pretest helpers. - `docs/api/had.rst` mass-point classical deviation: cband+event_study `NotImplementedError` fires on the deprecated `weights=` shortcut, not on `survey_design=make_pweight_design(...)`. - `docs/choosing_estimator.rst` SE Methods row: same restoration; spells out the variance_formula values and notes the next-minor unification. - `docs/troubleshooting.rst` "classical SE under survey": subsection `NotImplementedError` description corrected to `survey_design=SurveyDesign(...)` + deprecated `weights=` shortcut. P1 (snippet correctness): the new HAD code snippets used `outcome=`, `unit=`, `time=`, `dose=`, `results.coef`, `results.fit_path` - the actual HAD signature uses `outcome_col`, `unit_col`, `time_col`, `dose_col` (`had.py:2786-2804`, `had_pretests.py:3741+`), and the result objects expose `att` and `design` (`had.py:316, 324, 669, 679`). Fixed across: - `docs/practitioner_decision_tree.rst` Universal Rollout snippet (workflow + fit + result-attr). - `docs/choosing_estimator.rst` Universal Rollout snippet (workflow + fit + result-attr). - `docs/r_comparison.rst` Heterogeneous Adoption snippet (fit kwargs). - `docs/troubleshooting.rst` HAD Issues snippets (fit kwargs + `results.fit_path` -> `results.design`). P2 (R-comparison overstatement): the dedicated R package `DIDHAD` (de Chaisemartin et al., August 2025) covers the QUG case (Design 1') of the same arXiv paper, so claiming "no R parallel" was too strong. Reframed to acknowledge `DIDHAD` and position diff-diff's broader surface (Design 1 no-QUG, multi-period event-study extension Appendix B.2, survey-design integration via Binder TSL): - `docs/r_comparison.rst`: section heading "No-Untreated Designs (no R parallel)" -> "Heterogeneous Adoption (HAD)"; body acknowledges `DIDHAD` and frames diff-diff's broader surface; Feature Comparison Table row label simplified; note about R equivalents updated; Migration Tips bullet narrowed and cross-reference target updated. Sphinx build clean (0 warnings in edited files); also fixes a title-underline-too-short warning on the "Resolved estimand..." subsection that was below threshold by one character. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T12:26:10Z

/ai-review

github-actions · 2026-04-26T12:34:16Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9a62e922816c659dc795b08517642b4d4b3a7426

Overall Assessment

⚠️ Needs changes

The prior P1s from the last review look resolved, but two new unmitigated P1 methodology/doc mismatches remain on the changed HAD public surface.

Executive Summary

The earlier blockers appear fixed: the new HAD examples now use the real *_col kwargs / result fields, and the estimator-side weighting docs now distinguish deprecated weights= from canonical survey_design=SurveyDesign(...).
P1 [Newly identified]: docs/api/had.rst now tells users that qug_test uses survey_design=make_pweight_design(weights), but qug_test still permanently rejects all survey/weighted inputs.
P1 [Newly identified]: the new troubleshooting text says “no exact dose == 0 implies Design 1 / WAS_d_lower”, but design="auto" also resolves to Design 1' when d.min() < 0.01 * median(|d|).
P2: the R-comparison refresh is directionally better, but one claim in docs/r_comparison.rst still overstates the remaining gap versus DIDHAD.
The newly documented HAD SE-family split / covariance limitations are already tracked in TODO.md, so those remain P3 informational only.

Methodology

Severity: P1 [Newly identified]. Impact: docs/api/had.rst:L67-L72 says the array-in HAD pretest helpers, including qug_test, use survey_design=make_pweight_design(weights). That contradicts the actual QUG contract: qug_test documents survey_design= / survey= / weights= as permanently unsupported and raises NotImplementedError, and the Methodology Registry explicitly says make_pweight_design(arr) is not a valid QUG migration target (diff_diff/had_pretests.py:L1236-L1255, diff_diff/had_pretests.py:L1361-L1387, docs/methodology/REGISTRY.md:L2352-L2353). Concrete fix: remove qug_test from that list and say make_pweight_design() applies only to stute_test, yatchew_hr_test, and stute_joint_pretest; for weighted data, direct QUG remains unsupported and did_had_pretest_workflow(..., survey_design=...) intentionally skips it.
Severity: P1 [Newly identified]. Impact: docs/troubleshooting.rst:L486-L490 says HAD routes to Design 1 whenever there are no exact dose = 0 rows. That is not the implemented default behavior: _detect_design() still resolves to continuous_at_zero / WAS when d.min() == 0 or when d.min() < 0.01 * median(|d|) (diff_diff/had.py:L1932-L1987, docs/methodology/REGISTRY.md:L2524-L2525). This misstates the public estimand-resolution contract and can make users misread results.target_parameter. Concrete fix: rewrite the troubleshooting cause to match the actual dispatch rule, including the small-positive-d_min Design 1' case.

Code Quality

No findings in the changed files.

Performance

No findings in the changed files.

Maintainability

No findings in the changed files.

Tech Debt

Severity: P3 informational. Impact: docs/api/had.rst:L61-L77 documents the weighted SE-family split, next-minor API consolidation, and missing joint cross-horizon analytical covariance, and those limitations are already tracked in TODO.md:L93-L102. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: docs/r_comparison.rst:L227-L233, docs/r_comparison.rst:L420-L446 say diff-diff additionally covers the HAD multi-period event-study extension beyond DIDHAD. The current DIDHAD docs already expose dynamic effects, placebo, and event-study output in the QUG case, so the remaining gap is narrower than written. Concrete fix: narrow the comparison to the actual documented differences (e.g. Design 1 / no-QUG support and survey-design integration), or qualify the claim to the specific Appendix B.2 staggered/last-cohort surface if that is what you mean. (cran.r-universe.dev)

Not run here: local import/Sphinx smoke checks. This sandbox is missing numpy, and the workspace is read-only.

Path to Approval

In docs/api/had.rst, remove qug_test from the make_pweight_design() guidance and explicitly state that weighted/survey-aware QUG is unsupported.
In docs/troubleshooting.rst, replace the “no exact zero dose => Design 1” explanation with the implemented _detect_design() rule (d.min()==0 or d.min()<0.01*median(|d|) can still resolve to Design 1' / WAS).
In docs/r_comparison.rst, narrow the DIDHAD comparison so it does not imply that R lacks HAD event-study support altogether.

…le, DIDHAD claim P1 (qug_test in array-in pretest helper list): `docs/api/had.rst:67-72` listed `qug_test` alongside `stute_test` / `yatchew_hr_test` / `stute_joint_pretest` as accepting `survey_design=make_pweight_design(weights)`. Per `had_pretests.py:1236-1255` and the methodology REGISTRY (Phase 4.5 C0 decision gate), `qug_test` permanently raises `NotImplementedError` on any of `survey_design=` / `survey=` / `weights=` - there is no migration target for survey-aware QUG, and `make_pweight_design()` is explicitly NOT a valid QUG migration target. The composite workflow `did_had_pretest_workflow` handles weighted dispatch by skipping QUG with a `UserWarning`. Removed `qug_test` from the array-in helper list and added an explicit permanent-rejection note pointing to the workflow's skip behavior. P1 (estimand-resolution rule misstatement): `docs/troubleshooting.rst` "Resolved estimand" subsection said "no exact `dose == 0` => Design 1". Per `had.py:1932-1987` `_detect_design()` resolves to Design 1' when EITHER `d.min() == 0` OR `d.min() < 0.01 * median(|d|)` (small-share-of-treated escape clause). Rewrote the cause to spell out both sub-cases and clarify that Design 1 only fires when `d.min()` is meaningfully positive relative to the dose scale. Updated the inspection snippet to compute and print the `0.01 * median(|d|)` threshold instead of just counting `dose == 0` rows. P2 (DIDHAD event-study overstatement): `docs/r_comparison.rst` Heterogeneous Adoption section, R-equivalents note, and Migration Tips bullet claimed diff-diff additionally covers "the multi-period event-study extension (paper Appendix B.2)" beyond `DIDHAD`. The `DIDHAD` package already exposes dynamic effects / placebo / event-study output in the QUG case, so this overstates the gap. Narrowed all three locations to the documented differences: Design 1 (no QUG, `WAS_{d_lower}`) and survey-design integration via Binder TSL. Sphinx build clean (0 warnings in edited files; the unrelated `tutorials/18_geo_experiments.ipynb:61` "File not found: practitioner_decision_tree.html#few-test-markets" warning is pre-existing on origin/main and not introduced here). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T12:44:38Z

/ai-review

github-actions · 2026-04-26T12:49:21Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f63f40be6c80e458cc7398c3b83effb988ae7fd4

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope: the prior P1s appear resolved. docs/api/had.rst now correctly treats qug_test as the exception to survey-aware migration, and the estimand-resolution troubleshooting text now matches _detect_design().
The refreshed DIDHAD comparison is narrower than before and no longer makes the earlier overclaim about diff-diff being the only HAD event-study surface.
One non-blocking docs mismatch remains in the new HAD troubleshooting section: it overstates when staggered multi-cohort event-study inputs raise versus when the estimator auto-filters.
No code quality, performance, maintainability, or security findings in the changed files.

Methodology

Severity: P2. Impact: docs/troubleshooting.rst:L572-L597 says HeterogeneousAdoptionDiD.fit(..., aggregate="event_study") raises when staggered timing violates the last-cohort restriction and recommends manual subsetting. The actual contract is narrower: with first_treat_col provided, the estimator auto-filters to the last-treatment cohort plus never-treated and emits a warning; it raises only when the panel is staggered and first_treat_col is missing. That behavior is implemented in diff_diff/had.py:L1230-L1366 and diff_diff/had.py:L1470-L1499, and documented in docs/methodology/REGISTRY.md:L2408-L2408 and docs/methodology/REGISTRY.md:L2533-L2533. Concrete fix: rewrite this troubleshooting entry to split the two cases explicitly and make first_treat_col the primary remedy; keep manual subsetting as optional equivalent guidance.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 informational. Impact: the weighted SE-family split and missing joint cross-horizon analytical covariance are now clearly documented in docs/api/had.rst:L61-L95 and already tracked in TODO.md:L93-L102, so they are non-blocking per the review policy. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

No additional findings beyond the methodology/doc-contract issue above.
Validation note: I could not run the Sphinx build here because sphinx is not installed in this environment.

The HAD "Panel-only event-study restriction" subsection in `docs/troubleshooting.rst` overstated when staggered multi-cohort event-study inputs raise. Per `had.py:1230-1366` and `had.py:1470-1499` (also documented in `docs/methodology/REGISTRY.md:2408, 2533`): - Common-adoption panel (single first-treat period): `first_treat_col` optional; the period is auto-inferred from the dose invariant. - Staggered panel WITH `first_treat_col`: estimator auto-filters to the last-treatment cohort + never-treated and emits a UserWarning. - Staggered panel WITHOUT `first_treat_col`: estimator raises (the only actual failure mode for this restriction). Rewrote the cause to spell out the dispatch and made `first_treat_col` the primary remedy; kept manual cohort subsetting as an equivalent that skips the UserWarning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T13:01:04Z

/ai-review

github-actions · 2026-04-26T13:07:12Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 987bdcd42118bef761c202796cf0089671d706f3

Overall Assessment

⚠️ Needs changes

Executive Summary

Re-review scope: the prior HAD event-study troubleshooting mismatch is resolved. docs/troubleshooting.rst:L572-L602 now matches the last-cohort auto-filter / raise split in diff_diff/had.py:L1233-L1360 and diff_diff/had.py:L1470-L1499.
The new HAD API/pretest coverage is otherwise aligned with the exported public surface in diff_diff/__init__.py:L65-L81 and the registry notes in docs/methodology/REGISTRY.md:L2352-L2353 and docs/methodology/REGISTRY.md:L2488-L2505.
One P1 remains in the new troubleshooting guidance: it says HeterogeneousAdoptionDiD() defaults to robust=True, but the constructor default is robust=False, so the suggested workaround would reproduce the same mass-point + survey vcov_type='classical' failure.
One minor P3 docs mismatch remains: the weighted mass-point inference summaries omit supported CR1 clustered SEs.
No code-quality, performance, maintainability, or security issues in the changed files.
I could not rerun the Sphinx build here because python -m sphinx fails with No module named sphinx.

Methodology

Severity: P1. Impact: docs/troubleshooting.rst:L559-L563 recommends HeterogeneousAdoptionDiD() as the fix for the mass-point + survey + classical-VCOV NotImplementedError and says the default is robust=True. The implementation default is robust=False at diff_diff/had.py:L2643-L2649, and the weighted mass-point survey guard explicitly treats that default mapping as classical and raises at diff_diff/had.py:L3418-L3447. A user following the docs would hit the same error instead of the advertised workaround. Concrete fix: replace the snippet with a working override such as HeterogeneousAdoptionDiD(vcov_type='hc1') or HeterogeneousAdoptionDiD(robust=True), and remove the incorrect “robust=True by default” comment.
Severity: P3. Impact: docs/api/had.rst:L49-L54 and docs/choosing_estimator.rst:L665-L667 describe the deprecated weighted shortcut mass-point path as classical / hc1 only. The implementation and registry also support CR1 when cluster= is supplied on the weighted mass-point path, per diff_diff/had.py:L2276-L2284, diff_diff/had.py:L2433-L2448, and docs/methodology/REGISTRY.md:L2356-L2358 / docs/methodology/REGISTRY.md:L2527. This understates the supported inference surface. Concrete fix: clarify those summaries to say classical / hc1, plus CR1 when cluster= is supplied, or separate clustering from the vcov_type discussion.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3. Impact: the weighted-family split and missing joint cross-horizon analytical covariance called out in docs/api/had.rst:L61-L82 are already tracked in TODO.md:L93-L102, so they are non-blocking under the review policy. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

No additional findings beyond the methodology/doc-contract issues above.
Validation note: I could not rerun the docs build because sphinx is not installed in this environment.

Path to Approval

Fix docs/troubleshooting.rst:L559-L563 so the recommended workaround actually selects HC1 on the mass-point survey path. The simplest correction is to show HeterogeneousAdoptionDiD(vcov_type='hc1') or HeterogeneousAdoptionDiD(robust=True), and state explicitly that plain HeterogeneousAdoptionDiD() keeps the default robust=False -> vcov_type='classical' mapping.

…lusion P1 (default robust value): the troubleshooting "classical SE under survey" solution and the api/had.rst inference-contract block both said the default `robust=True` maps to `vcov_type='hc1'`. Per `had.py:2643-2649` the constructor default is `robust=False`, which the survey + mass-point guard at `had.py:3418-3447` treats as classical and raises - so the suggested workaround `HeterogeneousAdoptionDiD()` reproduces the same NotImplementedError. Replaced both with explicit working overrides (`HeterogeneousAdoptionDiD(vcov_type='hc1')` or `HeterogeneousAdoptionDiD(robust=True)`) and called out the wrong-default trap explicitly. P3 (CR1 omitted from weighted mass-point inference summary): both `docs/api/had.rst:L49-L54` and `docs/choosing_estimator.rst:L665-L667` described the deprecated `weights=` shortcut mass-point path as "`classical` / `hc1` only". Per `had.py:2276-2284, 2433-2448` and `docs/methodology/REGISTRY.md:2356-2358, 2527`, CR1 (Liang-Zeger) is also supported when `cluster=` is supplied on the weighted mass-point path. Added "CR1 when `cluster=` is supplied" to both summaries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T13:12:49Z

/ai-review

github-actions · 2026-04-26T13:18:25Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ff6974104d2a25e43f0ca15445df440bca453d08

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope: the prior P1 on the HAD mass-point vcov_type="classical" workaround is resolved. docs/troubleshooting.rst:L541-L569 now matches the actual default mapping and guard in diff_diff/had.py:L2577-L2580 and diff_diff/had.py:L3415-L3447.
The prior P3 on weighted mass-point CR1 support is also resolved. docs/api/had.rst:L49-L55 and docs/choosing_estimator.rst:L665-L667 now match the weighted 2SLS contract documented in docs/methodology/REGISTRY.md:L2355-L2360.
The new HAD API/pretest coverage aligns with the exported public surface in diff_diff/__init__.py:L64-L82 and the registry notes for survey_design=, QUG-under-survey deferral, and variance-formula labels.
One P2 doc-contract issue remains: the new prose overstates did_had_pretest_workflow as adjudicating the HAD design / dispatching by panel structure, but the actual API requires explicit aggregate= selection and the estimator, not the workflow, resolves continuous_at_zero vs continuous_near_d_lower vs mass_point.
No P0/P1 methodology issues, no code-quality/performance/maintainability/security blockers in the changed files.
I could not rerun the Sphinx build here because python -m sphinx fails with ModuleNotFoundError: No module named 'sphinx'.

Methodology

Severity: P2. Impact: docs/api/had.rst:L130-L136, docs/practitioner_decision_tree.rst:L283-L301, and docs/r_comparison.rst:L227-L235 describe did_had_pretest_workflow as if it chooses the workflow shape from panel structure and/or adjudicates which HAD design path applies. In the implementation, the workflow has two explicit modes via aggregate= and requires callers to pass aggregate="event_study" on multi-period panels (diff_diff/had_pretests.py:L4033-L4045), while the HAD design itself is auto-resolved separately by _detect_design() from the dose support (diff_diff/had.py:L1932-L1987). A reader can reasonably expect automatic event-study dispatch or a pretest report that decides between continuous_at_zero, continuous_near_d_lower, and mass_point, but the API does neither. Concrete fix: rephrase the workflow as a diagnostic battery only; state explicitly that callers choose aggregate="overall" vs aggregate="event_study", and that HeterogeneousAdoptionDiD.fit() auto-detects the HAD design from the dose support.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked tech-debt findings. The remaining HAD limitations newly documented in docs/api/had.rst:L62-L83 are already tracked in TODO.md:L93-L102, so they are non-blocking under the review policy.

Security

No findings.

Documentation/Tests

No additional findings beyond the P2 documentation-contract issue above.
Validation note: I could not rerun the docs build in this environment because sphinx is not installed.

…y, not adjudicator) The new HAD prose overstated `did_had_pretest_workflow` as adjudicating the HAD design path or auto-dispatching by panel structure. Per `had_pretests.py:4033-4045` the workflow has two explicit modes via the `aggregate=` kwarg ("overall" vs "event_study") that the caller picks; the HAD design (`continuous_at_zero` / `continuous_near_d_lower` / `mass_point`) is resolved separately inside `HeterogeneousAdoptionDiD.fit` by `_detect_design()` from the dose support (`had.py:1932-1987`). - `docs/api/had.rst` HAD Pretests intro: rephrased as a diagnostic battery; spells out the two `aggregate=` modes selected by the caller and notes the design path is auto-detected inside the estimator. - `docs/practitioner_decision_tree.rst` Universal Rollout snippet: comment no longer claims the workflow "adjudicates which design path"; clarifies that the estimator picks the design from the dose support. - `docs/r_comparison.rst` Heterogeneous Adoption section: dropped "adjudicates the design path" claim; describes the workflow as a diagnostic battery and points the design-path resolution at the estimator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T13:23:55Z

/ai-review

github-actions · 2026-04-26T13:30:05Z

🔁 AI review rerun (requested by @igerber)

Head SHA: db00262f7189f941f346c77021b534bb5f0c0f47

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope: the prior doc-contract issue is resolved. The updated prose now correctly separates pretest workflow dispatch from HAD design resolution in docs/api/had.rst:L130-L140, docs/practitioner_decision_tree.rst:L287-L310, and docs/r_comparison.rst:L233-L236, matching diff_diff/had_pretests.py:L4033-L4050 and diff_diff/had.py:L1932-L1987.
The new HAD pretest/API coverage matches the exported public surface in docs/api/had.rst:L127-L183 and diff_diff/__init__.py:L69-L81, diff_diff/__init__.py:L468-L488.
No unmitigated P0/P1 issues found. The refreshed survey/weighted, QUG-under-survey, mass-point-classical, and last-cohort event-study language aligns with docs/methodology/REGISTRY.md:L2352-L2382, diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4147-L4181, diff_diff/had_pretests.py:L4081-L4160, and diff_diff/had.py:L1116-L1499.
One P3 informational item remains: the new SE summaries still read as if weighted mass-point CR1 is unconditional when cluster= is supplied, but design="mass_point" + weights= + aggregate="event_study" + cband=True is still a documented exception (docs/api/had.rst:L49-L55, docs/choosing_estimator.rst:L665-L667, docs/methodology/REGISTRY.md:L2380-L2382, diff_diff/had.py:L4065-L4181).
I could not rerun the docs build or runtime import smoke tests here because the environment is missing sphinx and numpy.

Methodology

Severity: P3 (informational; deviation already documented in REGISTRY.md). Impact: docs/api/had.rst:L49-L55 and docs/choosing_estimator.rst:L665-L667 summarize the deprecated weights= mass-point path as supporting CR1 whenever cluster= is supplied. In implementation, the default weighted event-study path still rejects design="mass_point" + weights= + cluster= + cband=True (diff_diff/had.py:L4065-L4181), and that exception is explicitly documented in docs/methodology/REGISTRY.md:L2380-L2382. Concrete fix: add a short qualifier to the CR1 wording, e.g. “except aggregate="event_study", cband=True on the mass-point path,” or link that sentence to the existing exception note.
No other findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked tech-debt findings. The remaining HAD limitations referenced by these docs are already tracked in TODO.md:L93-L102.

Security

No findings.

Documentation/Tests

No additional findings.
Validation note: static cross-check only. I could not rerun Sphinx or import-based validation in this environment because sphinx and numpy are unavailable.

…eption The R4 CR1 addition wrote "CR1 when `cluster=` is supplied" without noting the documented mass-point + `aggregate="event_study"` + `cband=True` carve-out: that sub-path rejects an effective `classical` vcov (per `had.py:4147-4181` and REGISTRY:2380-2382), so plain `cluster=` with the default `robust=False` hits the classical-default trap. Both `docs/api/had.rst` and `docs/choosing_estimator.rst` now spell out the carve-out and point to the existing classical-default deviation note for the working override (`vcov_type="hc1"` or `robust=True`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T13:39:21Z

/ai-review

github-actions · 2026-04-26T13:43:59Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f43d737cb44dd090b2e6a1553b2cc26ba3ed4cdb

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope: the prior workflow/design-resolution documentation issue looks resolved. The new prose now correctly separates did_had_pretest_workflow() dispatch from HAD design auto-detection in docs/api/had.rst:L134-L144, matching diff_diff/had_pretests.py:L4033-L4050 and diff_diff/had.py:L1932-L1987.
The new HAD API page covers the expected 12 public pretest symbols, and that list matches the top-level exports in diff_diff/__init__.py:L69-L81 and diff_diff/__init__.py:L474-L488.
One residual informational issue remains in the SE contract docs: docs/api/had.rst:L49-L58 and docs/choosing_estimator.rst:L665-L667 still imply weights= + cluster= + aggregate="event_study" + cband=True can be made to work via vcov_type="hc1" / robust=True, but the implementation rejects that clustered combination outright, and also rejects mass-point survey_design + cluster= on the weighted path diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4059-L4095, docs/methodology/REGISTRY.md:L2380-L2382.
No unmitigated P0/P1 findings.
Validation here was static only; I could not rerun the Sphinx build or import smoke tests because sphinx and numpy are not installed in this environment.

Methodology

P3 Impact: docs/api/had.rst:L49-L58, docs/api/had.rst:L89-L100, and docs/choosing_estimator.rst:L665-L667 still overstate clustered weighted mass-point support. They frame the remaining blocker as the classical-default path, and suggest vcov_type="hc1" / robust=True as a workaround. In code, design="mass_point" + weights= + cluster= + aggregate="event_study" + cband=True is rejected regardless of vcov_type, and design="mass_point" + survey_design + cluster= is also rejected on the weighted path diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4059-L4095. This deviation is already documented in docs/methodology/REGISTRY.md:L2380-L2382, so it is informational rather than blocking. Concrete fix: revise the prose to say those clustered combinations are unsupported outright, and remove the implication that hc1/robust=True enables the clustered cband=True path.
No other findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked tech-debt findings. The remaining HAD limitations referenced by these docs are already tracked in TODO.md:L93-L102.

Security

No findings.

Documentation/Tests

No additional findings.
Validation note: static cross-check only. I could not rerun the docs build or import-based validation because sphinx and numpy are unavailable in this environment.

…ected outright The R6 wording said `cluster=` + `aggregate="event_study"` + `cband=True` on mass-point could be made to work via `vcov_type="hc1"` / `robust=True`. Per `had.py:4059-4095` and `had.py:3399-3447` (also documented in REGISTRY:2380-2382) that path is rejected outright regardless of `vcov_type`, and `survey_design=` + `cluster=` on weighted mass-point is similarly rejected. The error is about variance-family mixing in the sup-t bootstrap / Binder-TSL composition, not about the classical-default trap. - `docs/api/had.rst` inline weights= shortcut summary: narrowed the CR1 qualifier - "rejected outright regardless of `vcov_type`" + cross-link to a new sibling deviation note. - `docs/api/had.rst` new "Mass-point cluster-combination deviation" note beside the existing classical-default note: enumerates the two rejected combinations (survey_design= + cluster= static and event-study; weights= + cluster= + cband=True event-study) with the implementation's own workaround language (cband=False / drop cluster= / cluster= alone / weights= + cluster=). - `docs/choosing_estimator.rst` SE Methods row: dropped the misleading "requires explicit hc1/robust=True" implication; says rejected outright for both the weighted shortcut + cluster= + event_study cband and the survey_design= + cluster= mass-point combination. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T13:49:36Z

/ai-review

github-actions · 2026-04-26T13:56:38Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 96eb711ca6e59de4a17f596da2ece239893d88b5

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope: the prior weighted clustered-mass-point documentation gap is resolved. docs/api/had.rst:L49-L117 and docs/choosing_estimator.rst:L665-L667 now match the documented deviations in docs/methodology/REGISTRY.md:L2380-L2382 and the guards in diff_diff/had.py:L3399-L3447 and diff_diff/had.py:L4059-L4182.
The new HAD pretests API page covers the expected public surface and matches the top-level exports in diff_diff/__init__.py:L69-L81 and diff_diff/__init__.py:L474-L488.
The new narrative docs correctly separate pretest-workflow dispatch from estimator design auto-detection, consistent with docs/api/had.rst:L150-L160, docs/practitioner_decision_tree.rst:L287-L310, diff_diff/had_pretests.py:L4015-L4050, and diff_diff/had.py:L1932-L1987.
One residual informational issue remains in troubleshooting: the mass-point paragraph describes the path as a fallback that “just changes the SE regime,” but the registry and implementation treat mass_point as a distinct Wald-IV / 2SLS estimator path as well as a different SE family.
Validation here was static only; I could not rerun the Sphinx build or import smoke tests because sphinx and numpy are not installed in this environment.

Methodology

Affected methods reviewed: HeterogeneousAdoptionDiD and the HAD pretest surfaces did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, and joint_homogeneity_test.

P3 Impact: docs/troubleshooting.rst:L524-L539 frames mass-point resolution as a fallback to a “2SLS sandwich” that “just changes the SE regime.” In the registry and implementation, mass_point is a distinct Design 1 estimator path: the point estimate itself switches to the Wald-IV / 2SLS formula with instrument 1{D_2 > d_lower}, and the structural-residual sandwich is the corresponding inference path. The same paragraph also over-attributes that SE path to a paper appendix rather than to the library’s documented 2SLS implementation around the paper’s Section 3.2.4 design. Concrete fix: rephrase this subsection to say HAD auto-resolves to the mass-point design when the modal fraction at d_lower exceeds 2%; that design uses the mass-point Wald-IV / 2SLS estimator plus structural-residual sandwich inference. References: docs/troubleshooting.rst:L524-L539, docs/methodology/REGISTRY.md:L2400-L2402, docs/methodology/REGISTRY.md:L2523-L2527, diff_diff/had.py:L33-L39, diff_diff/had.py:L2260-L2272.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked tech-debt findings. The remaining HAD follow-ups referenced by the new docs are already tracked in TODO.md:L93-L102.

Security

No findings.

Documentation/Tests

No blocking documentation/test issues in the diff.
Validation note: static cross-check only. I could not rerun the docs build or import-based smoke tests because sphinx and numpy are unavailable in this environment.

… fallback The "Mass-point fit fallback" troubleshooting subsection framed `mass_point` as a fallback that "just changes the SE regime." Per `had.py:33-39, 2260-2272` and REGISTRY:2400-2402, 2523-2527, mass-point is a distinct Design 1 estimator path from dCDH 2026 paper Section 3.2.4 (not the Appendix): both the point estimate (Wald-IV sample-average ratio with binary instrument `Z_g = 1{D_{g,2} > d_lower}`) AND the SE (structural-residual 2SLS sandwich) differ from the continuous local-linear / CCT-2014 paths. Renamed the subsection to "Mass-point design selected", rewrote the cause to spell out the 2%-modal-fraction trigger, the Wald-IV ratio formula, and the structural-residual sandwich; corrected the paper-section attribution (Section 3.2.4, not Appendix); refined the solution snippet to point at re-binning to drop modal fraction below 2% if the user prefers continuous_near_d_lower. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T14:01:30Z

/ai-review

github-actions · 2026-04-26T14:07:01Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b0399ee390a45370a1b9a671bbff048758d32e19

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope: the prior informational mass-point wording issue is resolved. docs/troubleshooting.rst:L518-L546 now describes mass_point as a distinct Design 1 Wald-IV / 2SLS path, matching docs/methodology/REGISTRY.md:L2400-L2402, docs/methodology/REGISTRY.md:L2523-L2528, and diff_diff/had.py:L33-L39, diff_diff/had.py:L2250-L2272.
The refreshed inference-contract block in docs/api/had.rst:L43-L117 matches the documented HAD survey/weights deviations, including the weighted mass-point vcov_type="classical" guard and the cluster-combination rejections in docs/methodology/REGISTRY.md:L2352-L2382, diff_diff/had.py:L3399-L3447, and diff_diff/had.py:L4059-L4182.
The new HAD pretests API section covers the exported public surface and correctly separates aggregate="overall" vs aggregate="event_study" workflow dispatch from estimator design auto-detection: docs/api/had.rst:L147-L205, diff_diff/__init__.py:L69-L81, diff_diff/__init__.py:L475-L488, diff_diff/had_pretests.py:L4033-L4050, diff_diff/had.py:L1932-L1987.
The narrative additions in docs/choosing_estimator.rst:L364-L405 and docs/practitioner_decision_tree.rst:L266-L311 are consistent with the registry on Design 1' vs Design 1 targets and the non-testability of Assumptions 5/6 beyond the shipped pretests.
Validation here was static only; I could not rerun Sphinx in this environment because python -m sphinx --version fails with No module named sphinx.

Methodology

No findings. The changed HAD docs align with the registry and implementation, and the only prior review issue was informational and is now resolved (docs/troubleshooting.rst:L518-L546, docs/methodology/REGISTRY.md:L2400-L2402, docs/methodology/REGISTRY.md:L2523-L2528, diff_diff/had.py:L2250-L2272).

Code Quality

No findings. The new API documentation covers the 12 exported HAD pretest symbols without inventing non-public surfaces (docs/api/had.rst:L147-L205, diff_diff/__init__.py:L69-L81, diff_diff/__init__.py:L475-L488).

Performance

No findings. This is a docs-only diff and it does not change runtime code paths.

Maintainability

No findings. The new doc-deps mappings improve drift coverage for HAD documentation rather than adding untracked coupling (docs/doc-deps.yaml:L386-L406).

Tech Debt

No findings. The limitations referenced by these docs remain tracked in TODO.md and/or documented as registry deviations, including weighted-event-study joint covariance and pending alias removal (TODO.md:L93-L104, docs/methodology/REGISTRY.md:L2380-L2382).

Security

No findings.

Documentation/Tests

No blocking documentation issues found in the diff. Static review only: Sphinx is not installed in this environment, so I could not rerun the docs build.

…Intensity Practitioner walkthrough for HeterogeneousAdoptionDiD on the no-untreated-controls case: every market got the campaign at varying intensity and there is no clean comparison group. Fills the structural gap T14 (ContinuousDiD) cannot address. Notebook scope (23 cells, 13 markdown / 10 code, mirrors T19's structure): - Sections 1-3: framing the no-untreated-controls measurement problem, setup imports, synthetic 60-DMA / 8-week panel with Uniform[$5K, $50K] regional add-on spend (every DMA participates, no DMA at $0). DGP is internally consistent: outcomes are generated from the dose values HAD then sees, no post-hoc relabeling. - Section 4: overall WAS_d_lower fit on a 2-period (pre/post mean) collapse - HAD's overall mode requires exactly 2 periods (had.py:952-959). Locked headline: per-$1K marginal effect of 100 weekly visits per DMA above the boundary spend (95% CI [98.6, 101.4]) with design auto-detection landing on `continuous_near_d_lower` (Design 1) and target `WAS_d_lower`. Surfaces the Assumption 5/6 advisory the library fires for Design 1 and explains why it holds in this DGP (linear by construction). - Section 5: multi-week event-study fit on the 8-week panel, per-week WAS_d_lower for e=0..3 (~100 each, CIs cover truth) and pre-launch placebos at e=-2..-4 sitting on zero. - Section 6: stakeholder communication template (T18/T19 markdown blockquote pattern), per-DMA dollar-lift interpretation `(actual_dose - d_lower) * WAS_d_lower`, Assumption 6 caveat. - Section 7: extensions (population-weighted/survey path, composite pretest workflow described accurately as QUG support-infimum test + linearity tests, mass-point design path), related-tutorials cross-links (T01, T02, T14, T17, T18, T19), summary checklist. Drift detection: companion tests/test_t20_had_brand_campaign_drift.py (13 tests, 0.06s, mirrors T19's test-file-only pattern - T19's notebook itself has zero in-notebook asserts). Pins panel composition including sample median, design auto-detection / target / d_lower, overall WAS_d_lower / SE / CI endpoints to one-decimal display, dose mean, n_units, full event-study horizon presence (e=-4..-2, 0..3), per-week post-launch coverage of TRUE_SLOPE=100 and zero coverage at every placebo horizon (|placebo_att| < 0.1). Tight `round(_, 1) == X.X` pins throughout - HAD's analytical SE path is bit-identical regardless of backend env (no Rust kernel involved). Locked DGP seed: MAIN_SEED=87. Documentation integration: - docs/tutorials/README.md: new T20 entry following T18/T19's 5-bullet pattern. - docs/doc-deps.yaml: T20 added to the existing diff_diff/had.py entry; cross-link to docs/practitioner_decision_tree.rst added. - docs/practitioner_decision_tree.rst: `.. tip::` block at the end of `section-no-untreated` (Universal Rollout - landed on main via PR #389) cross-links to T20 for the full walkthrough. - CHANGELOG.md: new ### Added bullet under [Unreleased]. Out of scope (queued in project_had_followups.md memory): - _handle_had in practitioner.py:_HANDLERS map. - HAD entries in llms-full.txt / choosing_estimator.rst. - Pretest workflow tutorial, weighted/survey HAD tutorial, mass-point design demo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The HeterogeneousAdoptionDiD example snippet in choosing_estimator.rst failed the Pure Python Fallback CI job (test_doc_snippets.py ::test_doc_snippet[choosing_estimator:block7]) due to three latent drift bugs from PR #389 (docs-rtd-audit): 1. Missing aggregate='event_study' on both did_had_pretest_workflow and HAD.fit calls — default aggregate='overall' requires exactly 2 periods, but the doc-snippet test framework's namespace `data` (built via generate_staggered_data) has 10 periods. 2. Used the namespace's generic `data` variable, which has nonzero dose in every period (rng.choice from {0.0, 0.5, 1.0, 2.0}). HAD requires D=0 for all units in at least one pre-period. 3. `print(f"Estimate: {results.att:.3f}")` formatted att as a scalar, but under aggregate='event_study' results.att is a numpy array. Fix: rewrite the snippet to construct its own HAD-shape panel inline (mirrors how block6 handles ContinuousDiD with its own data generator); thread aggregate='event_study' through both calls; iterate the per-horizon att array for output. Pre-existing on origin/main; surfaced on this PR's CI re-run after the rebase. Other failing snippets (troubleshooting:block18, :block20, r_comparison:block6, :block7) are also pre-existing on main but are out of scope for this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fix latent doc-snippet bugs from PR #389 (HAD ecosystem)

…Intensity Practitioner walkthrough for HeterogeneousAdoptionDiD on the no-untreated-controls case: every market got the campaign at varying intensity and there is no clean comparison group. Fills the structural gap T14 (ContinuousDiD) cannot address. Notebook scope (23 cells, 13 markdown / 10 code, mirrors T19's structure): - Sections 1-3: framing the no-untreated-controls measurement problem, setup imports, synthetic 60-DMA / 8-week panel with Uniform[$5K, $50K] regional add-on spend (every DMA participates, no DMA at $0). DGP is internally consistent: outcomes are generated from the dose values HAD then sees, no post-hoc relabeling. - Section 4: overall WAS_d_lower fit on a 2-period (pre/post mean) collapse - HAD's overall mode requires exactly 2 periods (had.py:952-959). Locked headline: per-$1K marginal effect of 100 weekly visits per DMA above the boundary spend (95% CI [98.6, 101.4]) with design auto-detection landing on `continuous_near_d_lower` (Design 1) and target `WAS_d_lower`. Surfaces the Assumption 5/6 advisory the library fires for Design 1 and explains why it holds in this DGP (linear by construction). - Section 5: multi-week event-study fit on the 8-week panel, per-week WAS_d_lower for e=0..3 (~100 each, CIs cover truth) and pre-launch placebos at e=-2..-4 sitting on zero. - Section 6: stakeholder communication template (T18/T19 markdown blockquote pattern), per-DMA dollar-lift interpretation `(actual_dose - d_lower) * WAS_d_lower`, Assumption 6 caveat. - Section 7: extensions (population-weighted/survey path, composite pretest workflow described accurately as QUG support-infimum test + linearity tests, mass-point design path), related-tutorials cross-links (T01, T02, T14, T17, T18, T19), summary checklist. Drift detection: companion tests/test_t20_had_brand_campaign_drift.py (13 tests, 0.06s, mirrors T19's test-file-only pattern - T19's notebook itself has zero in-notebook asserts). Pins panel composition including sample median, design auto-detection / target / d_lower, overall WAS_d_lower / SE / CI endpoints to one-decimal display, dose mean, n_units, full event-study horizon presence (e=-4..-2, 0..3), per-week post-launch coverage of TRUE_SLOPE=100 and zero coverage at every placebo horizon (|placebo_att| < 0.1). Tight `round(_, 1) == X.X` pins throughout - HAD's analytical SE path is bit-identical regardless of backend env (no Rust kernel involved). Locked DGP seed: MAIN_SEED=87. Documentation integration: - docs/tutorials/README.md: new T20 entry following T18/T19's 5-bullet pattern. - docs/doc-deps.yaml: T20 added to the existing diff_diff/had.py entry; cross-link to docs/practitioner_decision_tree.rst added. - docs/practitioner_decision_tree.rst: `.. tip::` block at the end of `section-no-untreated` (Universal Rollout - landed on main via PR #389) cross-links to T20 for the full walkthrough. - CHANGELOG.md: new ### Added bullet under [Unreleased]. Out of scope (queued in project_had_followups.md memory): - _handle_had in practitioner.py:_HANDLERS map. - HAD entries in llms-full.txt / choosing_estimator.rst. - Pretest workflow tutorial, weighted/survey HAD tutorial, mass-point design demo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR igerber#389 added HAD code snippets to choosing_estimator.rst, troubleshooting.rst, and r_comparison.rst. None of those edits triggered rust-test.yml (which only runs on rust/, diff_diff/, tests/, pyproject.toml, and the workflow file), so tests/test_doc_snippets.py never executed and the snippets shipped with five latent bugs that now surface on every code PR via the Pure Python Fallback job. Bugs addressed: - r_comparison:block6 — bare HAD.fit(data, ...) with the generate_staggered_data fixture failed because the default aggregate='overall' requires exactly 2 periods and the namespace data has 10. Replaced with an inline HAD-shape panel construction (mirrors the upstream choosing_estimator:block7 fix in 55d7a27) plus aggregate='event_study'. - troubleshooting:block20 — the snippet demonstrates first_treat_col= auto-filtering on a staggered panel. The fixture's first_treat values disagree with the dose path (random per-row dose on never-treated units), tripping HAD's first_treat / dose-path consistency validator. Inlined a 120-unit / 10-period staggered HAD-shape panel (30 never + 30 cohort 5 + 60 cohort 8) so the validator passes and the boundary local-linear estimator has enough distinct dose values to fit. - troubleshooting:block17 / block18 / r_comparison:block7 — these are legitimately context-dependent snippets that reference est / results from prior text-flow context (inspection / output-format examples). Added them to _CONTEXT_DEPENDENT_SNIPPETS so the expected NameError is suppressed, matching the pattern already used for block8, the api_bacon blocks, and the existing r_comparison context-dependent set. choosing_estimator:block7 was the sixth failing snippet but was already fixed upstream in 55d7a27 with the inline-construction pattern; this branch rebases onto that. Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed on this branch (was 6 failed on origin/main before 55d7a27 and 5 failed after). Follow-up (separate PR queued): carve test_doc_snippets.py out into a dedicated docs-tests.yml workflow triggered on docs/** + diff_diff/** + the test file itself, and exclude it from rust-test.yml's pytest invocations so doc bugs are caught on doc PRs (not silently inherited by code PRs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…est.yml PR igerber#389 (Batch A) shipped six broken HAD doc snippets in 2026-04 that no CI run caught because rust-test.yml only triggers on rust/, diff_diff/, tests/, pyproject.toml, and the workflow file — none of which include docs/. PR igerber#396 patched the snippets but did not address the structural gap. This PR addresses it. Two changes: 1. New .github/workflows/docs-tests.yml — separate workflow that runs `pytest tests/test_doc_snippets.py -v` on a single ubuntu-latest / py3.14 / pure-Python runner. Triggers on docs/, diff_diff/, tests/test_doc_snippets.py, pyproject.toml, and the workflow file itself; same ready-for-ci label gate as rust-test.yml / notebooks.yml. Mirrors notebooks.yml's shape (the existing precedent for `pytest`-validated docs assets) so the two doc-validation workflows look like siblings. 2. .github/workflows/rust-test.yml: add --ignore=tests/test_doc_snippets.py to all three pytest invocations so doc snippets stop riding the code workflow. The Pure Python Fallback edit (line 193) is the only one that changes CI signal: that job runs from the repo root and was the ONLY place where test_doc_snippets.py actually executed. The two Rust-matrix edits (lines 158, 165) are defensive consistency: the matrix copies tests/ to /tmp/tests (rust-test.yml:138, 142) without docs/, so DOCS_DIR resolves to /tmp/docs/ which doesn't exist; the test collector silently skips every RST file via the guard at tests/test_doc_snippets.py:129. Adding --ignore there prevents the no-op from becoming a real run if anyone later adds `cp -r docs ...` to the copy steps. Each invocation now carries an in-YAML comment documenting which case it's the defensive vs behavior-changing edit. Verification: - python -c "import yaml; yaml.safe_load(open('.github/workflows/ docs-tests.yml')); yaml.safe_load(open('.github/workflows/ rust-test.yml'))" — both files well-formed. - pytest tests/ --ignore=tests/test_doc_snippets.py --ignore=tests/test_rust_backend.py --collect-only — 0 occurrences of test_doc_snippets in the collected set (was 115 cases collected when not ignored), confirming pytest accepts repeated --ignore flags as the existing line-193 pattern with --ignore=tests/ test_rust_backend.py already showed. After this PR opens, the workflow file itself triggers docs-tests.yml on its own change, providing the first end-to-end CI validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…backlog Wave 1 of multi-wave tech-debt paydown. Docs-only — no estimator code touched. Changes: - Delete 3 SHIPPED rows marked `Done`: HAD trends_lin (PR igerber#389), HAD yatchew_hr_test(null="mean_independence") (post-PR igerber#392), HAD Phase 5 follow-up tutorial T22 (PR igerber#440). - Fix stray blank line inside Methodology/Correctness table (was rendering as two fragments). - Correct dCDH survey cell-period allocator row's PR-column "PR 2" → "igerber#408" (actual dcdh-by-path-survey-design merge). - Refresh Known Limitations line refs to current state of estimators.py (`:778-784` → `:1647`; `:567-588` → `:890-911`). - Regenerate Large Module Files table from `wc -l diff_diff/*.py` ≥1000 lines. Add 11 newly-eligible modules (chaisemartin_dhaultfoeuille.py 8636, had_pretests.py 4951, had.py 4593, etc.). Update header sentence to make the 3000/2000/1000 thresholds explicit instead of silently relaxing the documented "< 1000 lines" target. - Rewrite Standard Error Consistency section — `vcov_type` has subsumed the proposed `se_type` knob; point readers at the existing methodology row for the remaining 8-standalone-estimator threading work. - Modernize Test Coverage section to reference `pytest.importorskip` rather than a stale "21 visualization tests" count. - Add new `### Prioritized Tech-Debt Backlog` subsection: Tier A/B/C/D ordering by effort × risk, anchoring back to existing source-of-truth table rows so the tables remain canonical. Seven Tier A quick wins, twelve Tier B mid-size methodology items, nine Tier C derivation items, eight Tier D deferred/research items. The new tier structure exists so subsequent waves can be picked off without re-litigating priority each turn. Wave 2 candidate is the clubsandwich_cr2_golden.json regeneration from R (row 87 / Tier A). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…prose L194 checklist update (this PR) said the Eq. 18 detrending variant shipped in PR #389; the explanatory prose immediately below at L200 still said "Phase 4 extends it with the Eq (18) detrending" as if it were future work. Rewritten to past tense matching the L194 closure and the REGISTRY § "Note (Phase 4 — Eq 17 / Eq 18 linear-trend detrending shipped)" framing. Only the Pierce-Schott numerical replication remains waived (REGISTRY Deviations Note #3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- P3 (Methodology): the promoted HAD materials described the Eq. 17/18 `trends_lin=True` linear-trend-detrended variant as "deferred per Phase 4". This conflated TWO different things: (a) the FEATURE — which is shipped via the `trends_lin: bool = False` keyword-only kwarg on HAD.fit(), joint_pretrends_test, and joint_homogeneity_test (PR igerber#389; R-parity locked against DIDHAD::did_had(trends_lin=TRUE) v2.0.0 in test_did_had_parity.py); and (b) the PIERCE-SCHOTT NUMERICAL REPLICATION against the published p=0.51 anchor on the LBD-restricted panel, which IS waived per REGISTRY Deviations Note igerber#3. Updated 3 surfaces (paper-review L194, METHODOLOGY_REVIEW Eq. 18 Verified-Components row, test_methodology_had.py module docstring + TestHADJointStute class docstring) to distinguish "feature shipped + R-parity locked elsewhere" from "Pierce-Schott numerical replication waived". - P3 (Documentation/Tests): TestHADJointStute promotion narrative overstated H1 coverage as "H0 fail-to-reject and H1 reject on linear vs nonlinear DGPs" for both joint_pretrends_test and joint_homogeneity_test. Reality: H1 rejection is tested only on joint_homogeneity_test via a quadratic post- DGP; joint_pretrends_test gets H0-only coverage in this file (H1 would require a violating-pretrends fixture that re-verifies bootstrap calibration covered by test_had_pretests.py). Narrowed wording in METHODOLOGY_REVIEW Verified-Components row + TestHADJointStute class docstring; CHANGELOG entry unchanged (the H1 reject claim in CHANGELOG explicitly cites the homogeneity side via "H1 reject under nonlinear DGP", which is accurate). All 35 methodology tests pass; lint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Apr 26, 2026

igerber merged commit a7b8454 into main Apr 26, 2026
4 of 5 checks passed

igerber deleted the docs-rtd-audit branch April 26, 2026 14:15

igerber mentioned this pull request Apr 26, 2026

Fix latent doc-snippet bugs from PR #389 (HAD ecosystem) #396

Merged

igerber added a commit that referenced this pull request Apr 26, 2026

Merge pull request #396 from igerber/fix/doc-snippet-had-bugs

8e1282b

Fix latent doc-snippet bugs from PR #389 (HAD ecosystem)

igerber mentioned this pull request Apr 26, 2026

Add docs-tests.yml; remove test_doc_snippets.py from rust-test.yml #399

Merged

igerber mentioned this pull request May 15, 2026

TODO.md cleanup: drop SHIPPED rows, refresh stale refs, add Tier A/B/C/D backlog #447

Merged

6 tasks

Sunbelt Computer Software

PL/B Language Development and Support

Conversation

igerber commented Apr 26, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance