docs: HAD ecosystem completion (RTD audit Batch A) by igerber · Pull Request #389 · igerber/diff-diff · GitHub
Skip to content

docs: HAD ecosystem completion (RTD audit Batch A)#389

Merged
igerber merged 9 commits into
mainfrom
docs-rtd-audit
Apr 26, 2026
Merged

docs: HAD ecosystem completion (RTD audit Batch A)#389
igerber merged 9 commits into
mainfrom
docs-rtd-audit

Conversation

@igerber

@igerber igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner

Summary

Closes the gaps left after PR #372 added HeterogeneousAdoptionDiD to the canonical surfaces. The narrative pages did not yet mention HAD, and the 12-symbol HAD pretest suite shipped in had_pretests.py was absent from the API page. Also refreshes the inference-contract block to use the survey_design= canonical kwarg consolidated in PR #376.

  • docs/api/had.rst — new HAD Pretests section covering all 12 public symbols (4 single-period tests + 4 result classes + 3 joint tests + 1 joint result), split into aggregate="overall" and aggregate="event_study" subsections matching the workflow's dispatch. Refreshes the inference-contract block to reference survey_design=make_pweight_design(weights) (pweight shortcut) and survey_design=SurveyDesign(...) (full TSL); notes survey= / weights= are deprecated aliases.
  • docs/choosing_estimator.rst — HAD entries in all 3 tables (Quick Reference, Standard Error Methods, Survey Design Support) plus a new "Universal Rollout / No Untreated Control" subsection in Detailed Guidance. SE Methods row uses survey_design= canonical naming.
  • docs/r_comparison.rst — HAD row in Feature Comparison Table, new "No-Untreated Designs (no R parallel)" subsection, Migration Tips bullet.
  • docs/troubleshooting.rst — new HAD Issues section with 4 subsections (estimand resolution / mass-point fallback / classical SE under survey_design= / panel-only event-study).
  • docs/practitioner_decision_tree.rst — Start Here option 7, At a Glance row, new "Universal Rollout" section with _section-no-untreated anchor.
  • docs/references.rst — Binder citation references the canonical survey_design= paths.
  • docs/doc-deps.yaml — extends had_pretests.py entry with llms.txt user-guide dep; adds new top-level local_linear.py entry.

Methodology references (required if estimator / math changes)

  • N/A - no methodology changes (docs-only PR; no source files modified).

Validation

  • Tests added/updated: No test changes (docs-only).
  • Sphinx build: succeeds (python -m sphinx -b html -q docs docs/_build/html); 0 new warnings from this PR (71 pre-existing in bacon.py / staggered_bootstrap.py autosummary unaffected).
  • Public API regression: all 12 HAD pretest symbols importable (HADPretestReport, qug_test, stute_test, yatchew_hr_test, did_had_pretest_workflow, QUGTestResults, StuteTestResults, YatchewTestResults, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test, StuteJointResult); make_pweight_design + SurveyDesign importable.
  • Rendered HTML coverage: api/had.html 276 hits, narrative pages 4-8 hits each.
  • Cross-reference resolution: _section-no-untreated anchor (definition + cross-ref).
  • Style: 0 em dashes in diff.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Closes the gaps left after PR #372 added HeterogeneousAdoptionDiD to the
canonical surfaces. The narrative pages did not yet mention HAD, and the
12-symbol HAD pretest suite shipped in `had_pretests.py` was absent from
the API page. Also refreshes the inference-contract block to use the
`survey_design=` canonical kwarg consolidated in PR #376.

- `docs/api/had.rst`: new HAD Pretests section covering all 12 public
  symbols (4 single-period tests + 4 result classes + 3 joint tests + 1
  joint result), split into `aggregate="overall"` and
  `aggregate="event_study"` subsections matching the workflow's dispatch.
  Refreshes the existing inference-contract block to reference
  `survey_design=make_pweight_design(weights)` (pweight shortcut) and
  `survey_design=SurveyDesign(...)` (full TSL); notes `survey=` /
  `weights=` are deprecated aliases.
- `docs/choosing_estimator.rst`: HAD entries in all 3 tables (Quick
  Reference, Standard Error Methods, Survey Design Support) plus a new
  "Universal Rollout / No Untreated Control" subsection in Detailed
  Guidance. SE Methods row uses `survey_design=` canonical naming.
- `docs/r_comparison.rst`: HAD row in Feature Comparison Table, new
  "No-Untreated Designs (no R parallel)" subsection, Migration Tips
  bullet.
- `docs/troubleshooting.rst`: new HAD Issues section with 4 subsections
  (estimand resolution / mass-point fallback / classical SE under
  survey_design / panel-only event-study).
- `docs/practitioner_decision_tree.rst`: Start Here option 7, At a
  Glance row, new "Universal Rollout" section with `_section-no-untreated`
  anchor.
- `docs/doc-deps.yaml`: extend had_pretests.py entry with llms.txt
  user-guide dep; add new top-level local_linear.py entry.

Verification: all 12 HAD pretest symbols importable; `make_pweight_design`
+ `SurveyDesign` importable; sphinx build succeeds with 0 new warnings
(71 pre-existing unaffected); HTML render contains expected HAD content
(276 hits in had.html, 4-8 in narrative pages); 0 em dashes;
`_section-no-untreated` anchor resolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

P1 (methodology contract): the inference-contract block, choosing-estimator
SE Methods row, and troubleshooting "classical SE under survey" subsection
incorrectly described `survey_design=make_pweight_design(weights)` as the
estimator-side pweight shortcut. Per `survey.py:681-697` and
`had.py:2853-2891` that helper is reserved for ARRAY-IN HAD pretest helpers
(`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `qug_test`); on
the data-in `HeterogeneousAdoptionDiD.fit` surface the deprecated
`weights=np.ndarray` shortcut is the actual pweight route, and it currently
yields a different SE family than `survey_design=SurveyDesign(...)`:
`weights=` -> `variance_formula="pweight"` / `"pweight_2sls"` (CCT-2014 /
2SLS pweight-sandwich); `survey_design=SurveyDesign(...)` ->
`"survey_binder_tsl"` / `"survey_binder_tsl_2sls"` (Binder-TSL). The
unification onto a single SE contract is queued for the next minor release.

- `docs/api/had.rst` inference-contract block: restore `weights=` shortcut
  (deprecated) and `survey_design=SurveyDesign(weights="col", ...)` as the
  two distinct weighted regimes; spell out the SE-family difference and
  the next-minor unification; add a separate paragraph that documents
  `make_pweight_design()` correctly as the pweight-only convenience for
  the array-in pretest helpers.
- `docs/api/had.rst` mass-point classical deviation: cband+event_study
  `NotImplementedError` fires on the deprecated `weights=` shortcut, not
  on `survey_design=make_pweight_design(...)`.
- `docs/choosing_estimator.rst` SE Methods row: same restoration; spells
  out the variance_formula values and notes the next-minor unification.
- `docs/troubleshooting.rst` "classical SE under survey": subsection
  `NotImplementedError` description corrected to `survey_design=SurveyDesign(...)`
  + deprecated `weights=` shortcut.

P1 (snippet correctness): the new HAD code snippets used `outcome=`,
`unit=`, `time=`, `dose=`, `results.coef`, `results.fit_path` - the actual
HAD signature uses `outcome_col`, `unit_col`, `time_col`, `dose_col`
(`had.py:2786-2804`, `had_pretests.py:3741+`), and the result objects
expose `att` and `design` (`had.py:316, 324, 669, 679`). Fixed across:

- `docs/practitioner_decision_tree.rst` Universal Rollout snippet
  (workflow + fit + result-attr).
- `docs/choosing_estimator.rst` Universal Rollout snippet (workflow + fit
  + result-attr).
- `docs/r_comparison.rst` Heterogeneous Adoption snippet (fit kwargs).
- `docs/troubleshooting.rst` HAD Issues snippets (fit kwargs +
  `results.fit_path` -> `results.design`).

P2 (R-comparison overstatement): the dedicated R package `DIDHAD`
(de Chaisemartin et al., August 2025) covers the QUG case (Design 1') of
the same arXiv paper, so claiming "no R parallel" was too strong. Reframed
to acknowledge `DIDHAD` and position diff-diff's broader surface (Design 1
no-QUG, multi-period event-study extension Appendix B.2, survey-design
integration via Binder TSL):

- `docs/r_comparison.rst`: section heading "No-Untreated Designs (no R
  parallel)" -> "Heterogeneous Adoption (HAD)"; body acknowledges `DIDHAD`
  and frames diff-diff's broader surface; Feature Comparison Table row
  label simplified; note about R equivalents updated; Migration Tips
  bullet narrowed and cross-reference target updated.

Sphinx build clean (0 warnings in edited files); also fixes a
title-underline-too-short warning on the "Resolved estimand..." subsection
that was below threshold by one character.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 9a62e922816c659dc795b08517642b4d4b3a7426


Overall Assessment

⚠️ Needs changes

The prior P1s from the last review look resolved, but two new unmitigated P1 methodology/doc mismatches remain on the changed HAD public surface.

Executive Summary

  • The earlier blockers appear fixed: the new HAD examples now use the real *_col kwargs / result fields, and the estimator-side weighting docs now distinguish deprecated weights= from canonical survey_design=SurveyDesign(...).
  • P1 [Newly identified]: docs/api/had.rst now tells users that qug_test uses survey_design=make_pweight_design(weights), but qug_test still permanently rejects all survey/weighted inputs.
  • P1 [Newly identified]: the new troubleshooting text says “no exact dose == 0 implies Design 1 / WAS_d_lower”, but design="auto" also resolves to Design 1' when d.min() < 0.01 * median(|d|).
  • P2: the R-comparison refresh is directionally better, but one claim in docs/r_comparison.rst still overstates the remaining gap versus DIDHAD.
  • The newly documented HAD SE-family split / covariance limitations are already tracked in TODO.md, so those remain P3 informational only.

Methodology

  • Severity: P1 [Newly identified]. Impact: docs/api/had.rst:L67-L72 says the array-in HAD pretest helpers, including qug_test, use survey_design=make_pweight_design(weights). That contradicts the actual QUG contract: qug_test documents survey_design= / survey= / weights= as permanently unsupported and raises NotImplementedError, and the Methodology Registry explicitly says make_pweight_design(arr) is not a valid QUG migration target (diff_diff/had_pretests.py:L1236-L1255, diff_diff/had_pretests.py:L1361-L1387, docs/methodology/REGISTRY.md:L2352-L2353). Concrete fix: remove qug_test from that list and say make_pweight_design() applies only to stute_test, yatchew_hr_test, and stute_joint_pretest; for weighted data, direct QUG remains unsupported and did_had_pretest_workflow(..., survey_design=...) intentionally skips it.
  • Severity: P1 [Newly identified]. Impact: docs/troubleshooting.rst:L486-L490 says HAD routes to Design 1 whenever there are no exact dose = 0 rows. That is not the implemented default behavior: _detect_design() still resolves to continuous_at_zero / WAS when d.min() == 0 or when d.min() < 0.01 * median(|d|) (diff_diff/had.py:L1932-L1987, docs/methodology/REGISTRY.md:L2524-L2525). This misstates the public estimand-resolution contract and can make users misread results.target_parameter. Concrete fix: rewrite the troubleshooting cause to match the actual dispatch rule, including the small-positive-d_min Design 1' case.

Code Quality

No findings in the changed files.

Performance

No findings in the changed files.

Maintainability

No findings in the changed files.

Tech Debt

  • Severity: P3 informational. Impact: docs/api/had.rst:L61-L77 documents the weighted SE-family split, next-minor API consolidation, and missing joint cross-horizon analytical covariance, and those limitations are already tracked in TODO.md:L93-L102. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: docs/r_comparison.rst:L227-L233, docs/r_comparison.rst:L420-L446 say diff-diff additionally covers the HAD multi-period event-study extension beyond DIDHAD. The current DIDHAD docs already expose dynamic effects, placebo, and event-study output in the QUG case, so the remaining gap is narrower than written. Concrete fix: narrow the comparison to the actual documented differences (e.g. Design 1 / no-QUG support and survey-design integration), or qualify the claim to the specific Appendix B.2 staggered/last-cohort surface if that is what you mean. (cran.r-universe.dev)

Not run here: local import/Sphinx smoke checks. This sandbox is missing numpy, and the workspace is read-only.

Path to Approval

  1. In docs/api/had.rst, remove qug_test from the make_pweight_design() guidance and explicitly state that weighted/survey-aware QUG is unsupported.
  2. In docs/troubleshooting.rst, replace the “no exact zero dose => Design 1” explanation with the implemented _detect_design() rule (d.min()==0 or d.min()<0.01*median(|d|) can still resolve to Design 1' / WAS).
  3. In docs/r_comparison.rst, narrow the DIDHAD comparison so it does not imply that R lacks HAD event-study support altogether.

…le, DIDHAD claim

P1 (qug_test in array-in pretest helper list): `docs/api/had.rst:67-72`
listed `qug_test` alongside `stute_test` / `yatchew_hr_test` /
`stute_joint_pretest` as accepting `survey_design=make_pweight_design(weights)`.
Per `had_pretests.py:1236-1255` and the methodology REGISTRY (Phase 4.5 C0
decision gate), `qug_test` permanently raises `NotImplementedError` on any
of `survey_design=` / `survey=` / `weights=` - there is no migration target
for survey-aware QUG, and `make_pweight_design()` is explicitly NOT a valid
QUG migration target. The composite workflow `did_had_pretest_workflow`
handles weighted dispatch by skipping QUG with a `UserWarning`. Removed
`qug_test` from the array-in helper list and added an explicit
permanent-rejection note pointing to the workflow's skip behavior.

P1 (estimand-resolution rule misstatement): `docs/troubleshooting.rst`
"Resolved estimand" subsection said "no exact `dose == 0` => Design 1".
Per `had.py:1932-1987` `_detect_design()` resolves to Design 1' when EITHER
`d.min() == 0` OR `d.min() < 0.01 * median(|d|)` (small-share-of-treated
escape clause). Rewrote the cause to spell out both sub-cases and clarify
that Design 1 only fires when `d.min()` is meaningfully positive relative
to the dose scale. Updated the inspection snippet to compute and print the
`0.01 * median(|d|)` threshold instead of just counting `dose == 0` rows.

P2 (DIDHAD event-study overstatement): `docs/r_comparison.rst` Heterogeneous
Adoption section, R-equivalents note, and Migration Tips bullet claimed
diff-diff additionally covers "the multi-period event-study extension
(paper Appendix B.2)" beyond `DIDHAD`. The `DIDHAD` package already
exposes dynamic effects / placebo / event-study output in the QUG case, so
this overstates the gap. Narrowed all three locations to the documented
differences: Design 1 (no QUG, `WAS_{d_lower}`) and survey-design
integration via Binder TSL.

Sphinx build clean (0 warnings in edited files; the unrelated
`tutorials/18_geo_experiments.ipynb:61` "File not found:
practitioner_decision_tree.html#few-test-markets" warning is pre-existing
on origin/main and not introduced here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: f63f40be6c80e458cc7398c3b83effb988ae7fd4


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior P1s appear resolved. docs/api/had.rst now correctly treats qug_test as the exception to survey-aware migration, and the estimand-resolution troubleshooting text now matches _detect_design().
  • The refreshed DIDHAD comparison is narrower than before and no longer makes the earlier overclaim about diff-diff being the only HAD event-study surface.
  • One non-blocking docs mismatch remains in the new HAD troubleshooting section: it overstates when staggered multi-cohort event-study inputs raise versus when the estimator auto-filters.
  • No code quality, performance, maintainability, or security findings in the changed files.

Methodology

  • Severity: P2. Impact: docs/troubleshooting.rst:L572-L597 says HeterogeneousAdoptionDiD.fit(..., aggregate="event_study") raises when staggered timing violates the last-cohort restriction and recommends manual subsetting. The actual contract is narrower: with first_treat_col provided, the estimator auto-filters to the last-treatment cohort plus never-treated and emits a warning; it raises only when the panel is staggered and first_treat_col is missing. That behavior is implemented in diff_diff/had.py:L1230-L1366 and diff_diff/had.py:L1470-L1499, and documented in docs/methodology/REGISTRY.md:L2408-L2408 and docs/methodology/REGISTRY.md:L2533-L2533. Concrete fix: rewrite this troubleshooting entry to split the two cases explicitly and make first_treat_col the primary remedy; keep manual subsetting as optional equivalent guidance.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3 informational. Impact: the weighted SE-family split and missing joint cross-horizon analytical covariance are now clearly documented in docs/api/had.rst:L61-L95 and already tracked in TODO.md:L93-L102, so they are non-blocking per the review policy. Concrete fix: none required in this PR.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the methodology/doc-contract issue above.
  • Validation note: I could not run the Sphinx build here because sphinx is not installed in this environment.

The HAD "Panel-only event-study restriction" subsection in
`docs/troubleshooting.rst` overstated when staggered multi-cohort
event-study inputs raise. Per `had.py:1230-1366` and `had.py:1470-1499`
(also documented in `docs/methodology/REGISTRY.md:2408, 2533`):

- Common-adoption panel (single first-treat period): `first_treat_col`
  optional; the period is auto-inferred from the dose invariant.
- Staggered panel WITH `first_treat_col`: estimator auto-filters to the
  last-treatment cohort + never-treated and emits a UserWarning.
- Staggered panel WITHOUT `first_treat_col`: estimator raises (the only
  actual failure mode for this restriction).

Rewrote the cause to spell out the dispatch and made `first_treat_col`
the primary remedy; kept manual cohort subsetting as an equivalent that
skips the UserWarning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 987bdcd42118bef761c202796cf0089671d706f3


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Re-review scope: the prior HAD event-study troubleshooting mismatch is resolved. docs/troubleshooting.rst:L572-L602 now matches the last-cohort auto-filter / raise split in diff_diff/had.py:L1233-L1360 and diff_diff/had.py:L1470-L1499.
  • The new HAD API/pretest coverage is otherwise aligned with the exported public surface in diff_diff/__init__.py:L65-L81 and the registry notes in docs/methodology/REGISTRY.md:L2352-L2353 and docs/methodology/REGISTRY.md:L2488-L2505.
  • One P1 remains in the new troubleshooting guidance: it says HeterogeneousAdoptionDiD() defaults to robust=True, but the constructor default is robust=False, so the suggested workaround would reproduce the same mass-point + survey vcov_type='classical' failure.
  • One minor P3 docs mismatch remains: the weighted mass-point inference summaries omit supported CR1 clustered SEs.
  • No code-quality, performance, maintainability, or security issues in the changed files.
  • I could not rerun the Sphinx build here because python -m sphinx fails with No module named sphinx.

Methodology

  • Severity: P1. Impact: docs/troubleshooting.rst:L559-L563 recommends HeterogeneousAdoptionDiD() as the fix for the mass-point + survey + classical-VCOV NotImplementedError and says the default is robust=True. The implementation default is robust=False at diff_diff/had.py:L2643-L2649, and the weighted mass-point survey guard explicitly treats that default mapping as classical and raises at diff_diff/had.py:L3418-L3447. A user following the docs would hit the same error instead of the advertised workaround. Concrete fix: replace the snippet with a working override such as HeterogeneousAdoptionDiD(vcov_type='hc1') or HeterogeneousAdoptionDiD(robust=True), and remove the incorrect “robust=True by default” comment.
  • Severity: P3. Impact: docs/api/had.rst:L49-L54 and docs/choosing_estimator.rst:L665-L667 describe the deprecated weighted shortcut mass-point path as classical / hc1 only. The implementation and registry also support CR1 when cluster= is supplied on the weighted mass-point path, per diff_diff/had.py:L2276-L2284, diff_diff/had.py:L2433-L2448, and docs/methodology/REGISTRY.md:L2356-L2358 / docs/methodology/REGISTRY.md:L2527. This understates the supported inference surface. Concrete fix: clarify those summaries to say classical / hc1, plus CR1 when cluster= is supplied, or separate clustering from the vcov_type discussion.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3. Impact: the weighted-family split and missing joint cross-horizon analytical covariance called out in docs/api/had.rst:L61-L82 are already tracked in TODO.md:L93-L102, so they are non-blocking under the review policy. Concrete fix: none required in this PR.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the methodology/doc-contract issues above.
  • Validation note: I could not rerun the docs build because sphinx is not installed in this environment.

Path to Approval

  1. Fix docs/troubleshooting.rst:L559-L563 so the recommended workaround actually selects HC1 on the mass-point survey path. The simplest correction is to show HeterogeneousAdoptionDiD(vcov_type='hc1') or HeterogeneousAdoptionDiD(robust=True), and state explicitly that plain HeterogeneousAdoptionDiD() keeps the default robust=False -> vcov_type='classical' mapping.

…lusion

P1 (default robust value): the troubleshooting "classical SE under survey"
solution and the api/had.rst inference-contract block both said the
default `robust=True` maps to `vcov_type='hc1'`. Per `had.py:2643-2649`
the constructor default is `robust=False`, which the survey + mass-point
guard at `had.py:3418-3447` treats as classical and raises - so the
suggested workaround `HeterogeneousAdoptionDiD()` reproduces the same
NotImplementedError. Replaced both with explicit working overrides
(`HeterogeneousAdoptionDiD(vcov_type='hc1')` or
`HeterogeneousAdoptionDiD(robust=True)`) and called out the wrong-default
trap explicitly.

P3 (CR1 omitted from weighted mass-point inference summary): both
`docs/api/had.rst:L49-L54` and `docs/choosing_estimator.rst:L665-L667`
described the deprecated `weights=` shortcut mass-point path as
"`classical` / `hc1` only". Per `had.py:2276-2284, 2433-2448` and
`docs/methodology/REGISTRY.md:2356-2358, 2527`, CR1 (Liang-Zeger) is also
supported when `cluster=` is supplied on the weighted mass-point path.
Added "CR1 when `cluster=` is supplied" to both summaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ff6974104d2a25e43f0ca15445df440bca453d08


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior P1 on the HAD mass-point vcov_type="classical" workaround is resolved. docs/troubleshooting.rst:L541-L569 now matches the actual default mapping and guard in diff_diff/had.py:L2577-L2580 and diff_diff/had.py:L3415-L3447.
  • The prior P3 on weighted mass-point CR1 support is also resolved. docs/api/had.rst:L49-L55 and docs/choosing_estimator.rst:L665-L667 now match the weighted 2SLS contract documented in docs/methodology/REGISTRY.md:L2355-L2360.
  • The new HAD API/pretest coverage aligns with the exported public surface in diff_diff/__init__.py:L64-L82 and the registry notes for survey_design=, QUG-under-survey deferral, and variance-formula labels.
  • One P2 doc-contract issue remains: the new prose overstates did_had_pretest_workflow as adjudicating the HAD design / dispatching by panel structure, but the actual API requires explicit aggregate= selection and the estimator, not the workflow, resolves continuous_at_zero vs continuous_near_d_lower vs mass_point.
  • No P0/P1 methodology issues, no code-quality/performance/maintainability/security blockers in the changed files.
  • I could not rerun the Sphinx build here because python -m sphinx fails with ModuleNotFoundError: No module named 'sphinx'.

Methodology

  • Severity: P2. Impact: docs/api/had.rst:L130-L136, docs/practitioner_decision_tree.rst:L283-L301, and docs/r_comparison.rst:L227-L235 describe did_had_pretest_workflow as if it chooses the workflow shape from panel structure and/or adjudicates which HAD design path applies. In the implementation, the workflow has two explicit modes via aggregate= and requires callers to pass aggregate="event_study" on multi-period panels (diff_diff/had_pretests.py:L4033-L4045), while the HAD design itself is auto-resolved separately by _detect_design() from the dose support (diff_diff/had.py:L1932-L1987). A reader can reasonably expect automatic event-study dispatch or a pretest report that decides between continuous_at_zero, continuous_near_d_lower, and mass_point, but the API does neither. Concrete fix: rephrase the workflow as a diagnostic battery only; state explicitly that callers choose aggregate="overall" vs aggregate="event_study", and that HeterogeneousAdoptionDiD.fit() auto-detects the HAD design from the dose support.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech-debt findings. The remaining HAD limitations newly documented in docs/api/had.rst:L62-L83 are already tracked in TODO.md:L93-L102, so they are non-blocking under the review policy.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the P2 documentation-contract issue above.
  • Validation note: I could not rerun the docs build in this environment because sphinx is not installed.

…y, not adjudicator)

The new HAD prose overstated `did_had_pretest_workflow` as adjudicating the
HAD design path or auto-dispatching by panel structure. Per
`had_pretests.py:4033-4045` the workflow has two explicit modes via the
`aggregate=` kwarg ("overall" vs "event_study") that the caller picks; the
HAD design (`continuous_at_zero` / `continuous_near_d_lower` / `mass_point`)
is resolved separately inside `HeterogeneousAdoptionDiD.fit` by
`_detect_design()` from the dose support (`had.py:1932-1987`).

- `docs/api/had.rst` HAD Pretests intro: rephrased as a diagnostic battery;
  spells out the two `aggregate=` modes selected by the caller and notes
  the design path is auto-detected inside the estimator.
- `docs/practitioner_decision_tree.rst` Universal Rollout snippet: comment
  no longer claims the workflow "adjudicates which design path"; clarifies
  that the estimator picks the design from the dose support.
- `docs/r_comparison.rst` Heterogeneous Adoption section: dropped
  "adjudicates the design path" claim; describes the workflow as a
  diagnostic battery and points the design-path resolution at the estimator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: db00262f7189f941f346c77021b534bb5f0c0f47


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior doc-contract issue is resolved. The updated prose now correctly separates pretest workflow dispatch from HAD design resolution in docs/api/had.rst:L130-L140, docs/practitioner_decision_tree.rst:L287-L310, and docs/r_comparison.rst:L233-L236, matching diff_diff/had_pretests.py:L4033-L4050 and diff_diff/had.py:L1932-L1987.
  • The new HAD pretest/API coverage matches the exported public surface in docs/api/had.rst:L127-L183 and diff_diff/__init__.py:L69-L81, diff_diff/__init__.py:L468-L488.
  • No unmitigated P0/P1 issues found. The refreshed survey/weighted, QUG-under-survey, mass-point-classical, and last-cohort event-study language aligns with docs/methodology/REGISTRY.md:L2352-L2382, diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4147-L4181, diff_diff/had_pretests.py:L4081-L4160, and diff_diff/had.py:L1116-L1499.
  • One P3 informational item remains: the new SE summaries still read as if weighted mass-point CR1 is unconditional when cluster= is supplied, but design="mass_point" + weights= + aggregate="event_study" + cband=True is still a documented exception (docs/api/had.rst:L49-L55, docs/choosing_estimator.rst:L665-L667, docs/methodology/REGISTRY.md:L2380-L2382, diff_diff/had.py:L4065-L4181).
  • I could not rerun the docs build or runtime import smoke tests here because the environment is missing sphinx and numpy.

Methodology

  • Severity: P3 (informational; deviation already documented in REGISTRY.md). Impact: docs/api/had.rst:L49-L55 and docs/choosing_estimator.rst:L665-L667 summarize the deprecated weights= mass-point path as supporting CR1 whenever cluster= is supplied. In implementation, the default weighted event-study path still rejects design="mass_point" + weights= + cluster= + cband=True (diff_diff/had.py:L4065-L4181), and that exception is explicitly documented in docs/methodology/REGISTRY.md:L2380-L2382. Concrete fix: add a short qualifier to the CR1 wording, e.g. “except aggregate="event_study", cband=True on the mass-point path,” or link that sentence to the existing exception note.
  • No other findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech-debt findings. The remaining HAD limitations referenced by these docs are already tracked in TODO.md:L93-L102.

Security

  • No findings.

Documentation/Tests

  • No additional findings.
  • Validation note: static cross-check only. I could not rerun Sphinx or import-based validation in this environment because sphinx and numpy are unavailable.

…eption

The R4 CR1 addition wrote "CR1 when `cluster=` is supplied" without noting
the documented mass-point + `aggregate="event_study"` + `cband=True` carve-out:
that sub-path rejects an effective `classical` vcov (per `had.py:4147-4181`
and REGISTRY:2380-2382), so plain `cluster=` with the default `robust=False`
hits the classical-default trap. Both `docs/api/had.rst` and
`docs/choosing_estimator.rst` now spell out the carve-out and point to the
existing classical-default deviation note for the working override
(`vcov_type="hc1"` or `robust=True`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: f43d737cb44dd090b2e6a1553b2cc26ba3ed4cdb


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior workflow/design-resolution documentation issue looks resolved. The new prose now correctly separates did_had_pretest_workflow() dispatch from HAD design auto-detection in docs/api/had.rst:L134-L144, matching diff_diff/had_pretests.py:L4033-L4050 and diff_diff/had.py:L1932-L1987.
  • The new HAD API page covers the expected 12 public pretest symbols, and that list matches the top-level exports in diff_diff/__init__.py:L69-L81 and diff_diff/__init__.py:L474-L488.
  • One residual informational issue remains in the SE contract docs: docs/api/had.rst:L49-L58 and docs/choosing_estimator.rst:L665-L667 still imply weights= + cluster= + aggregate="event_study" + cband=True can be made to work via vcov_type="hc1" / robust=True, but the implementation rejects that clustered combination outright, and also rejects mass-point survey_design + cluster= on the weighted path diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4059-L4095, docs/methodology/REGISTRY.md:L2380-L2382.
  • No unmitigated P0/P1 findings.
  • Validation here was static only; I could not rerun the Sphinx build or import smoke tests because sphinx and numpy are not installed in this environment.

Methodology

  • P3 Impact: docs/api/had.rst:L49-L58, docs/api/had.rst:L89-L100, and docs/choosing_estimator.rst:L665-L667 still overstate clustered weighted mass-point support. They frame the remaining blocker as the classical-default path, and suggest vcov_type="hc1" / robust=True as a workaround. In code, design="mass_point" + weights= + cluster= + aggregate="event_study" + cband=True is rejected regardless of vcov_type, and design="mass_point" + survey_design + cluster= is also rejected on the weighted path diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4059-L4095. This deviation is already documented in docs/methodology/REGISTRY.md:L2380-L2382, so it is informational rather than blocking. Concrete fix: revise the prose to say those clustered combinations are unsupported outright, and remove the implication that hc1/robust=True enables the clustered cband=True path.
  • No other findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech-debt findings. The remaining HAD limitations referenced by these docs are already tracked in TODO.md:L93-L102.

Security

  • No findings.

Documentation/Tests

  • No additional findings.
  • Validation note: static cross-check only. I could not rerun the docs build or import-based validation because sphinx and numpy are unavailable in this environment.

…ected outright

The R6 wording said `cluster=` + `aggregate="event_study"` + `cband=True`
on mass-point could be made to work via `vcov_type="hc1"` / `robust=True`.
Per `had.py:4059-4095` and `had.py:3399-3447` (also documented in
REGISTRY:2380-2382) that path is rejected outright regardless of `vcov_type`,
and `survey_design=` + `cluster=` on weighted mass-point is similarly
rejected. The error is about variance-family mixing in the sup-t
bootstrap / Binder-TSL composition, not about the classical-default trap.

- `docs/api/had.rst` inline weights= shortcut summary: narrowed the CR1
  qualifier - "rejected outright regardless of `vcov_type`" + cross-link
  to a new sibling deviation note.
- `docs/api/had.rst` new "Mass-point cluster-combination deviation" note
  beside the existing classical-default note: enumerates the two rejected
  combinations (survey_design= + cluster= static and event-study;
  weights= + cluster= + cband=True event-study) with the implementation's
  own workaround language (cband=False / drop cluster= / cluster= alone /
  weights= + cluster=).
- `docs/choosing_estimator.rst` SE Methods row: dropped the misleading
  "requires explicit hc1/robust=True" implication; says rejected outright
  for both the weighted shortcut + cluster= + event_study cband and the
  survey_design= + cluster= mass-point combination.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 96eb711ca6e59de4a17f596da2ece239893d88b5


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior weighted clustered-mass-point documentation gap is resolved. docs/api/had.rst:L49-L117 and docs/choosing_estimator.rst:L665-L667 now match the documented deviations in docs/methodology/REGISTRY.md:L2380-L2382 and the guards in diff_diff/had.py:L3399-L3447 and diff_diff/had.py:L4059-L4182.
  • The new HAD pretests API page covers the expected public surface and matches the top-level exports in diff_diff/__init__.py:L69-L81 and diff_diff/__init__.py:L474-L488.
  • The new narrative docs correctly separate pretest-workflow dispatch from estimator design auto-detection, consistent with docs/api/had.rst:L150-L160, docs/practitioner_decision_tree.rst:L287-L310, diff_diff/had_pretests.py:L4015-L4050, and diff_diff/had.py:L1932-L1987.
  • One residual informational issue remains in troubleshooting: the mass-point paragraph describes the path as a fallback that “just changes the SE regime,” but the registry and implementation treat mass_point as a distinct Wald-IV / 2SLS estimator path as well as a different SE family.
  • Validation here was static only; I could not rerun the Sphinx build or import smoke tests because sphinx and numpy are not installed in this environment.

Methodology

Affected methods reviewed: HeterogeneousAdoptionDiD and the HAD pretest surfaces did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, and joint_homogeneity_test.

  • P3 Impact: docs/troubleshooting.rst:L524-L539 frames mass-point resolution as a fallback to a “2SLS sandwich” that “just changes the SE regime.” In the registry and implementation, mass_point is a distinct Design 1 estimator path: the point estimate itself switches to the Wald-IV / 2SLS formula with instrument 1{D_2 > d_lower}, and the structural-residual sandwich is the corresponding inference path. The same paragraph also over-attributes that SE path to a paper appendix rather than to the library’s documented 2SLS implementation around the paper’s Section 3.2.4 design. Concrete fix: rephrase this subsection to say HAD auto-resolves to the mass-point design when the modal fraction at d_lower exceeds 2%; that design uses the mass-point Wald-IV / 2SLS estimator plus structural-residual sandwich inference. References: docs/troubleshooting.rst:L524-L539, docs/methodology/REGISTRY.md:L2400-L2402, docs/methodology/REGISTRY.md:L2523-L2527, diff_diff/had.py:L33-L39, diff_diff/had.py:L2260-L2272.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech-debt findings. The remaining HAD follow-ups referenced by the new docs are already tracked in TODO.md:L93-L102.

Security

  • No findings.

Documentation/Tests

  • No blocking documentation/test issues in the diff.
  • Validation note: static cross-check only. I could not rerun the docs build or import-based smoke tests because sphinx and numpy are unavailable in this environment.

… fallback

The "Mass-point fit fallback" troubleshooting subsection framed
`mass_point` as a fallback that "just changes the SE regime." Per
`had.py:33-39, 2260-2272` and REGISTRY:2400-2402, 2523-2527, mass-point is
a distinct Design 1 estimator path from dCDH 2026 paper Section 3.2.4
(not the Appendix): both the point estimate (Wald-IV sample-average ratio
with binary instrument `Z_g = 1{D_{g,2} > d_lower}`) AND the SE
(structural-residual 2SLS sandwich) differ from the continuous local-linear
/ CCT-2014 paths.

Renamed the subsection to "Mass-point design selected", rewrote the cause
to spell out the 2%-modal-fraction trigger, the Wald-IV ratio formula, and
the structural-residual sandwich; corrected the paper-section attribution
(Section 3.2.4, not Appendix); refined the solution snippet to point at
re-binning to drop modal fraction below 2% if the user prefers
continuous_near_d_lower.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

Copy link
Copy Markdown
Owner Author

/ai-review

@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 26, 2026
@igerber igerber merged commit a7b8454 into main Apr 26, 2026
4 of 5 checks passed
@igerber igerber deleted the docs-rtd-audit branch April 26, 2026 14:15
igerber added a commit that referenced this pull request Apr 26, 2026
…Intensity

Practitioner walkthrough for HeterogeneousAdoptionDiD on the
no-untreated-controls case: every market got the campaign at varying
intensity and there is no clean comparison group. Fills the structural
gap T14 (ContinuousDiD) cannot address.

Notebook scope (23 cells, 13 markdown / 10 code, mirrors T19's
structure):
- Sections 1-3: framing the no-untreated-controls measurement problem,
  setup imports, synthetic 60-DMA / 8-week panel with Uniform[$5K, $50K]
  regional add-on spend (every DMA participates, no DMA at $0). DGP
  is internally consistent: outcomes are generated from the dose
  values HAD then sees, no post-hoc relabeling.
- Section 4: overall WAS_d_lower fit on a 2-period (pre/post mean)
  collapse - HAD's overall mode requires exactly 2 periods
  (had.py:952-959). Locked headline: per-$1K marginal effect of 100
  weekly visits per DMA above the boundary spend (95% CI [98.6,
  101.4]) with design auto-detection landing on
  `continuous_near_d_lower` (Design 1) and target `WAS_d_lower`.
  Surfaces the Assumption 5/6 advisory the library fires for Design 1
  and explains why it holds in this DGP (linear by construction).
- Section 5: multi-week event-study fit on the 8-week panel,
  per-week WAS_d_lower for e=0..3 (~100 each, CIs cover truth) and
  pre-launch placebos at e=-2..-4 sitting on zero.
- Section 6: stakeholder communication template (T18/T19 markdown
  blockquote pattern), per-DMA dollar-lift interpretation
  `(actual_dose - d_lower) * WAS_d_lower`, Assumption 6 caveat.
- Section 7: extensions (population-weighted/survey path, composite
  pretest workflow described accurately as QUG support-infimum test
  + linearity tests, mass-point design path), related-tutorials
  cross-links (T01, T02, T14, T17, T18, T19), summary checklist.

Drift detection: companion tests/test_t20_had_brand_campaign_drift.py
(13 tests, 0.06s, mirrors T19's test-file-only pattern - T19's
notebook itself has zero in-notebook asserts). Pins panel composition
including sample median, design auto-detection / target / d_lower,
overall WAS_d_lower / SE / CI endpoints to one-decimal display, dose
mean, n_units, full event-study horizon presence (e=-4..-2, 0..3),
per-week post-launch coverage of TRUE_SLOPE=100 and zero coverage at
every placebo horizon (|placebo_att| < 0.1). Tight `round(_, 1) == X.X`
pins throughout - HAD's analytical SE path is bit-identical regardless
of backend env (no Rust kernel involved). Locked DGP seed: MAIN_SEED=87.

Documentation integration:
- docs/tutorials/README.md: new T20 entry following T18/T19's
  5-bullet pattern.
- docs/doc-deps.yaml: T20 added to the existing diff_diff/had.py
  entry; cross-link to docs/practitioner_decision_tree.rst added.
- docs/practitioner_decision_tree.rst: `.. tip::` block at the end
  of `section-no-untreated` (Universal Rollout - landed on main via
  PR #389) cross-links to T20 for the full walkthrough.
- CHANGELOG.md: new ### Added bullet under [Unreleased].

Out of scope (queued in project_had_followups.md memory):
- _handle_had in practitioner.py:_HANDLERS map.
- HAD entries in llms-full.txt / choosing_estimator.rst.
- Pretest workflow tutorial, weighted/survey HAD tutorial,
  mass-point design demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
The HeterogeneousAdoptionDiD example snippet in choosing_estimator.rst
failed the Pure Python Fallback CI job (test_doc_snippets.py
::test_doc_snippet[choosing_estimator:block7]) due to three latent
drift bugs from PR #389 (docs-rtd-audit):

1. Missing aggregate='event_study' on both did_had_pretest_workflow
   and HAD.fit calls — default aggregate='overall' requires exactly 2
   periods, but the doc-snippet test framework's namespace `data`
   (built via generate_staggered_data) has 10 periods.

2. Used the namespace's generic `data` variable, which has nonzero
   dose in every period (rng.choice from {0.0, 0.5, 1.0, 2.0}). HAD
   requires D=0 for all units in at least one pre-period.

3. `print(f"Estimate: {results.att:.3f}")` formatted att as a scalar,
   but under aggregate='event_study' results.att is a numpy array.

Fix: rewrite the snippet to construct its own HAD-shape panel inline
(mirrors how block6 handles ContinuousDiD with its own data
generator); thread aggregate='event_study' through both calls;
iterate the per-horizon att array for output.

Pre-existing on origin/main; surfaced on this PR's CI re-run after
the rebase. Other failing snippets (troubleshooting:block18, :block20,
r_comparison:block6, :block7) are also pre-existing on main but are
out of scope for this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
Fix latent doc-snippet bugs from PR #389 (HAD ecosystem)
igerber added a commit that referenced this pull request Apr 26, 2026
…Intensity

Practitioner walkthrough for HeterogeneousAdoptionDiD on the
no-untreated-controls case: every market got the campaign at varying
intensity and there is no clean comparison group. Fills the structural
gap T14 (ContinuousDiD) cannot address.

Notebook scope (23 cells, 13 markdown / 10 code, mirrors T19's
structure):
- Sections 1-3: framing the no-untreated-controls measurement problem,
  setup imports, synthetic 60-DMA / 8-week panel with Uniform[$5K, $50K]
  regional add-on spend (every DMA participates, no DMA at $0). DGP
  is internally consistent: outcomes are generated from the dose
  values HAD then sees, no post-hoc relabeling.
- Section 4: overall WAS_d_lower fit on a 2-period (pre/post mean)
  collapse - HAD's overall mode requires exactly 2 periods
  (had.py:952-959). Locked headline: per-$1K marginal effect of 100
  weekly visits per DMA above the boundary spend (95% CI [98.6,
  101.4]) with design auto-detection landing on
  `continuous_near_d_lower` (Design 1) and target `WAS_d_lower`.
  Surfaces the Assumption 5/6 advisory the library fires for Design 1
  and explains why it holds in this DGP (linear by construction).
- Section 5: multi-week event-study fit on the 8-week panel,
  per-week WAS_d_lower for e=0..3 (~100 each, CIs cover truth) and
  pre-launch placebos at e=-2..-4 sitting on zero.
- Section 6: stakeholder communication template (T18/T19 markdown
  blockquote pattern), per-DMA dollar-lift interpretation
  `(actual_dose - d_lower) * WAS_d_lower`, Assumption 6 caveat.
- Section 7: extensions (population-weighted/survey path, composite
  pretest workflow described accurately as QUG support-infimum test
  + linearity tests, mass-point design path), related-tutorials
  cross-links (T01, T02, T14, T17, T18, T19), summary checklist.

Drift detection: companion tests/test_t20_had_brand_campaign_drift.py
(13 tests, 0.06s, mirrors T19's test-file-only pattern - T19's
notebook itself has zero in-notebook asserts). Pins panel composition
including sample median, design auto-detection / target / d_lower,
overall WAS_d_lower / SE / CI endpoints to one-decimal display, dose
mean, n_units, full event-study horizon presence (e=-4..-2, 0..3),
per-week post-launch coverage of TRUE_SLOPE=100 and zero coverage at
every placebo horizon (|placebo_att| < 0.1). Tight `round(_, 1) == X.X`
pins throughout - HAD's analytical SE path is bit-identical regardless
of backend env (no Rust kernel involved). Locked DGP seed: MAIN_SEED=87.

Documentation integration:
- docs/tutorials/README.md: new T20 entry following T18/T19's
  5-bullet pattern.
- docs/doc-deps.yaml: T20 added to the existing diff_diff/had.py
  entry; cross-link to docs/practitioner_decision_tree.rst added.
- docs/practitioner_decision_tree.rst: `.. tip::` block at the end
  of `section-no-untreated` (Universal Rollout - landed on main via
  PR #389) cross-links to T20 for the full walkthrough.
- CHANGELOG.md: new ### Added bullet under [Unreleased].

Out of scope (queued in project_had_followups.md memory):
- _handle_had in practitioner.py:_HANDLERS map.
- HAD entries in llms-full.txt / choosing_estimator.rst.
- Pretest workflow tutorial, weighted/survey HAD tutorial,
  mass-point design demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request Apr 27, 2026
PR igerber#389 added HAD code snippets to choosing_estimator.rst,
troubleshooting.rst, and r_comparison.rst. None of those edits
triggered rust-test.yml (which only runs on rust/, diff_diff/, tests/,
pyproject.toml, and the workflow file), so tests/test_doc_snippets.py
never executed and the snippets shipped with five latent bugs that now
surface on every code PR via the Pure Python Fallback job.

Bugs addressed:

- r_comparison:block6 — bare HAD.fit(data, ...) with the
  generate_staggered_data fixture failed because the default
  aggregate='overall' requires exactly 2 periods and the namespace
  data has 10. Replaced with an inline HAD-shape panel construction
  (mirrors the upstream choosing_estimator:block7 fix in 55d7a27)
  plus aggregate='event_study'.

- troubleshooting:block20 — the snippet demonstrates
  first_treat_col= auto-filtering on a staggered panel. The fixture's
  first_treat values disagree with the dose path (random per-row dose
  on never-treated units), tripping HAD's first_treat / dose-path
  consistency validator. Inlined a 120-unit / 10-period staggered
  HAD-shape panel (30 never + 30 cohort 5 + 60 cohort 8) so the
  validator passes and the boundary local-linear estimator has
  enough distinct dose values to fit.

- troubleshooting:block17 / block18 / r_comparison:block7 — these are
  legitimately context-dependent snippets that reference est /
  results from prior text-flow context (inspection / output-format
  examples). Added them to _CONTEXT_DEPENDENT_SNIPPETS so the
  expected NameError is suppressed, matching the pattern already
  used for block8, the api_bacon blocks, and the existing
  r_comparison context-dependent set.

choosing_estimator:block7 was the sixth failing snippet but was
already fixed upstream in 55d7a27 with the inline-construction
pattern; this branch rebases onto that.

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest
tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed
on this branch (was 6 failed on origin/main before 55d7a27 and 5
failed after).

Follow-up (separate PR queued): carve test_doc_snippets.py out into a
dedicated docs-tests.yml workflow triggered on docs/** + diff_diff/**
+ the test file itself, and exclude it from rust-test.yml's pytest
invocations so doc bugs are caught on doc PRs (not silently inherited
by code PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request Apr 27, 2026
…est.yml

PR igerber#389 (Batch A) shipped six broken HAD doc snippets in 2026-04 that
no CI run caught because rust-test.yml only triggers on
rust/, diff_diff/, tests/, pyproject.toml, and the workflow file —
none of which include docs/. PR igerber#396 patched the snippets but did not
address the structural gap. This PR addresses it.

Two changes:

1. New .github/workflows/docs-tests.yml — separate workflow that
   runs `pytest tests/test_doc_snippets.py -v` on a single
   ubuntu-latest / py3.14 / pure-Python runner. Triggers on
   docs/, diff_diff/, tests/test_doc_snippets.py, pyproject.toml,
   and the workflow file itself; same ready-for-ci label gate as
   rust-test.yml / notebooks.yml. Mirrors notebooks.yml's shape
   (the existing precedent for `pytest`-validated docs assets) so
   the two doc-validation workflows look like siblings.

2. .github/workflows/rust-test.yml: add
   --ignore=tests/test_doc_snippets.py to all three pytest
   invocations so doc snippets stop riding the code workflow.

   The Pure Python Fallback edit (line 193) is the only one that
   changes CI signal: that job runs from the repo root and was the
   ONLY place where test_doc_snippets.py actually executed. The two
   Rust-matrix edits (lines 158, 165) are defensive consistency: the
   matrix copies tests/ to /tmp/tests (rust-test.yml:138, 142)
   without docs/, so DOCS_DIR resolves to /tmp/docs/ which doesn't
   exist; the test collector silently skips every RST file via the
   guard at tests/test_doc_snippets.py:129. Adding --ignore there
   prevents the no-op from becoming a real run if anyone later adds
   `cp -r docs ...` to the copy steps. Each invocation now carries
   an in-YAML comment documenting which case it's the defensive vs
   behavior-changing edit.

Verification:
- python -c "import yaml; yaml.safe_load(open('.github/workflows/
  docs-tests.yml')); yaml.safe_load(open('.github/workflows/
  rust-test.yml'))" — both files well-formed.
- pytest tests/ --ignore=tests/test_doc_snippets.py
  --ignore=tests/test_rust_backend.py --collect-only — 0 occurrences
  of test_doc_snippets in the collected set (was 115 cases collected
  when not ignored), confirming pytest accepts repeated --ignore
  flags as the existing line-193 pattern with --ignore=tests/
  test_rust_backend.py already showed.

After this PR opens, the workflow file itself triggers docs-tests.yml
on its own change, providing the first end-to-end CI validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TDL77 pushed a commit to TDL77/diff-diff that referenced this pull request May 18, 2026
…backlog

Wave 1 of multi-wave tech-debt paydown. Docs-only — no estimator code touched.

Changes:
- Delete 3 SHIPPED rows marked `Done`: HAD trends_lin (PR igerber#389),
  HAD yatchew_hr_test(null="mean_independence") (post-PR igerber#392),
  HAD Phase 5 follow-up tutorial T22 (PR igerber#440).
- Fix stray blank line inside Methodology/Correctness table (was rendering
  as two fragments).
- Correct dCDH survey cell-period allocator row's PR-column "PR 2" → "igerber#408"
  (actual dcdh-by-path-survey-design merge).
- Refresh Known Limitations line refs to current state of estimators.py
  (`:778-784` → `:1647`; `:567-588` → `:890-911`).
- Regenerate Large Module Files table from `wc -l diff_diff/*.py` ≥1000 lines.
  Add 11 newly-eligible modules (chaisemartin_dhaultfoeuille.py 8636,
  had_pretests.py 4951, had.py 4593, etc.). Update header sentence to make
  the 3000/2000/1000 thresholds explicit instead of silently relaxing the
  documented "< 1000 lines" target.
- Rewrite Standard Error Consistency section — `vcov_type` has subsumed the
  proposed `se_type` knob; point readers at the existing methodology row
  for the remaining 8-standalone-estimator threading work.
- Modernize Test Coverage section to reference `pytest.importorskip` rather
  than a stale "21 visualization tests" count.
- Add new `### Prioritized Tech-Debt Backlog` subsection: Tier A/B/C/D
  ordering by effort × risk, anchoring back to existing source-of-truth
  table rows so the tables remain canonical. Seven Tier A quick wins,
  twelve Tier B mid-size methodology items, nine Tier C derivation items,
  eight Tier D deferred/research items.

The new tier structure exists so subsequent waves can be picked off
without re-litigating priority each turn. Wave 2 candidate is the
clubsandwich_cr2_golden.json regeneration from R (row 87 / Tier A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request May 20, 2026
…prose

L194 checklist update (this PR) said the Eq. 18 detrending variant
shipped in PR #389; the explanatory prose immediately below at L200
still said "Phase 4 extends it with the Eq (18) detrending" as if it
were future work. Rewritten to past tense matching the L194 closure
and the REGISTRY § "Note (Phase 4 — Eq 17 / Eq 18 linear-trend
detrending shipped)" framing. Only the Pierce-Schott numerical
replication remains waived (REGISTRY Deviations Note #3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
- P3 (Methodology): the promoted HAD materials described the Eq. 17/18
  `trends_lin=True` linear-trend-detrended variant as "deferred per Phase 4".
  This conflated TWO different things: (a) the FEATURE — which is shipped
  via the `trends_lin: bool = False` keyword-only kwarg on HAD.fit(),
  joint_pretrends_test, and joint_homogeneity_test (PR igerber#389; R-parity locked
  against DIDHAD::did_had(trends_lin=TRUE) v2.0.0 in test_did_had_parity.py);
  and (b) the PIERCE-SCHOTT NUMERICAL REPLICATION against the published
  p=0.51 anchor on the LBD-restricted panel, which IS waived per REGISTRY
  Deviations Note igerber#3. Updated 3 surfaces (paper-review L194, METHODOLOGY_REVIEW
  Eq. 18 Verified-Components row, test_methodology_had.py module docstring +
  TestHADJointStute class docstring) to distinguish "feature shipped + R-parity
  locked elsewhere" from "Pierce-Schott numerical replication waived".

- P3 (Documentation/Tests): TestHADJointStute promotion narrative overstated
  H1 coverage as "H0 fail-to-reject and H1 reject on linear vs nonlinear DGPs"
  for both joint_pretrends_test and joint_homogeneity_test. Reality: H1
  rejection is tested only on joint_homogeneity_test via a quadratic post-
  DGP; joint_pretrends_test gets H0-only coverage in this file (H1 would
  require a violating-pretrends fixture that re-verifies bootstrap calibration
  covered by test_had_pretests.py). Narrowed wording in METHODOLOGY_REVIEW
  Verified-Components row + TestHADJointStute class docstring; CHANGELOG entry
  unchanged (the H1 reject claim in CHANGELOG explicitly cites the homogeneity
  side via "H1 reject under nonlinear DGP", which is accurate).

All 35 methodology tests pass; lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant