docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention by igerber · Pull Request #589 · igerber/diff-diff · GitHub
Skip to content

docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention#589

Merged
igerber merged 3 commits into
mainfrom
docs/survey-se-zero-weight-waiver
Jun 30, 2026
Merged

docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention#589
igerber merged 3 commits into
mainfrom
docs/survey-se-zero-weight-waiver

Conversation

@igerber

@igerber igerber commented Jun 30, 2026

Copy link
Copy Markdown
Owner

Summary

  • Re-examined the Actionable Backlog row proposing that the survey TSL finite-sample correction "count only positive-weight PSUs" so the SE is invariant to zero-weight (subpopulation / padded) rows. The premise conflicts with the library's documented, R-validated convention, so the item is waived with no estimator behavior change.
  • _compute_stratified_psu_meat's per-stratum correction (1 - f_h)*n_PSU_h/(n_PSU_h - 1) and PSU-mean centering intentionally keep genuine-subpopulation zero-weight PSUs — the full-design domain estimator of Lumley (2004 §3.4) / R survey::svyrecvar(subset()). The ATT is exactly invariant; the survey SE is deliberately not invariant to genuine-subpopulation zeroing (it should differ from a naive physical subset — the whole point of subpopulation()). R produces the matching SE (only df differs).
  • The only invariance-violating shape — appending synthetic new all-zero PSUs — arises in no estimator path (domain padding goes through prep.py's zero-padded full-design cell variance, which retains the real PSU layout; zero-weight rows reusing an existing PSU label are already bit-invariant). Forcing positive-weight-only counting would break the documented Lumley/R parity.
  • TODO.md: moved the row from Actionable Backlog → "Won't-fix / waived (decisions on the record)" with the Lumley/R justification.
  • docs/methodology/REGISTRY.md: added a Note in § "Subpopulation Analysis" making explicit that the TSL meat finite-sample correction counts zero-weight PSUs by design.
  • tests/test_survey.py: added TestZeroWeightPsuConventionWaiver to lock the decision (inert existing-PSU padding is bit-invariant; subpopulation zeroing keeps the full PSU structure so its SE differs from a naive subset). A future positive-weight-only change would collapse the two and trip the test.

Methodology references

  • Method name(s): Survey Taylor-linearization (TSL) variance; subpopulation / domain estimation.
  • Paper / source link(s): Lumley (2004) §3.4 "Analysis of complex survey samples"; R survey::svyrecvar(subset()). Existing documentation: REGISTRY § "Subpopulation Analysis".
  • Any intentional deviations from the source (and why): None new. This documents an existing intentional convention (full-design domain estimation, matching R) and declines a change that would have introduced a divergence. No estimator behavior changes.

Validation

  • Tests added/updated: tests/test_survey.py::TestZeroWeightPsuConventionWaiver (2 tests). Targeted survey zero-weight / subpopulation / invariance subset: 20 passed.
  • Backtest / simulation / notebook evidence (if applicable): N/A.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

…ull-design convention

Re-examined the TODO row proposing the survey TSL finite-sample correction
count only positive-weight PSUs so the SE is invariant to zero-weight
(subpopulation / padded) rows. Investigation shows the premise conflicts with
the library's documented, R-validated convention:

- `_compute_stratified_psu_meat`'s per-stratum correction
  (1 - f_h)*n_PSU_h/(n_PSU_h - 1) and PSU-mean centering intentionally keep
  genuine-subpopulation zero-weight PSUs. This is the full-design domain
  estimator of Lumley (2004 Section 3.4) / R survey::svyrecvar(subset()),
  already documented in REGISTRY section "Subpopulation Analysis".
- The ATT is exactly invariant; the survey SE is deliberately NOT invariant to
  genuine-subpopulation zeroing (it should differ from a naive physical subset
  -- that is the whole point of subpopulation()). R produces the matching SE
  (only df differs).
- Zero-weight rows that reuse an existing PSU label are already bit-invariant.
  The only invariance-violating shape -- appending synthetic new all-zero PSUs
  -- arises in no estimator path (domain padding goes through prep.py's
  zero-padded full-design cell variance, which retains the real PSU layout).

Forcing the meat to positive-weight-only counting would break the documented
Lumley/R parity, so the item is waived (no estimator behavior change):

- TODO.md: move the row from Actionable Backlog to "Won't-fix / waived
  (decisions on the record)" with the Lumley/R justification.
- REGISTRY.md: add a Note in section "Subpopulation Analysis" making explicit
  that the TSL meat finite-sample correction counts zero-weight PSUs by design.
- tests/test_survey.py: add TestZeroWeightPsuConventionWaiver regression-lock
  (inert existing-PSU padding is bit-invariant; subpopulation zeroing keeps the
  full PSU structure so its SE differs from a naive subset). A future
  positive-weight-only change would collapse the two and trip the test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

…t test

Addresses the CI review's actionable P3 (Documentation/Tests): the SE-level
test only asserts that subpopulation zeroing differs from a physical subset,
which catches a full positive-weight-only rewrite but could miss a partial
edit (e.g. changing only the finite-sample denominator while still centering
over the zero PSU).

Adds a direct unit test on _compute_stratified_psu_meat with crafted PSU
scores including one all-zero-score PSU (a fully zeroed subpopulation PSU),
asserting the exact full-design meat formula (n_PSU_h including the zero
PSU) and that it is NOT the positive-weight-only meat. Any change to the
centering OR the denominator now trips the lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 444c8333dd0073ceffe2e32652454f0c89c16299


Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

  • Affected method: Survey Taylor-linearization variance for subpopulation/domain estimation.
  • This PR changes docs/TODO/tests only; no estimator runtime behavior changes.
  • The new REGISTRY note matches the current full-design PSU meat implementation at diff_diff/survey.py:L1447-L1487.
  • Source check supports the convention: R subset.survey.design keeps original cluster/stratum information, and R svyCprod/onestage adds zero PSU rows for correct subpopulation variance. (rdrr.io)
  • Prior optional test coverage was improved with a direct _compute_stratified_psu_meat regression test.
  • I could not run tests here because pytest and numpy are not installed.

Methodology

Finding: P3 informational, no action required
Impact: The documented survey SE non-invariance to genuine subpopulation zeroing is consistent with the Methodology Registry note at docs/methodology/REGISTRY.md:L4539-L4557, the waived TODO entry at TODO.md:L147-L155, and the local implementation’s PSU-score grouping, stratum centering, and n_PSU_h/(n_PSU_h - 1) correction at diff_diff/survey.py:L1447-L1487. The R survey docs also support preserving original design information and zero-padding missing domain PSUs for subpopulation variance. (rdrr.io)
Concrete fix: None required.

Code Quality

No findings.

Performance

No findings. The PR does not change runtime code.

Maintainability

No findings. Moving the former backlog item into the waived-decision table makes the convention easier to preserve.

Tech Debt

No findings. The former action item is now explicitly tracked as waived in TODO.md:L147-L155.

Security

No findings. Changed files are docs/tests only; no secrets or unsafe execution paths observed.

Documentation/Tests

Finding: P3 informational
Impact: The new direct meat test at tests/test_survey.py:L2258-L2311 improves coverage, but its crafted “zeroed PSU” uses zero score rows while ResolvedSurveyDesign.weights remains all ones at tests/test_survey.py:L2280-L2288. That means it would not catch a future denominator-only edit that counts positive-weight PSUs using resolved.weights while still centering over full PSU scores. The end-to-end test still catches the main full positive-weight-only regression.
Concrete fix: Optional: set the two rows for PSU 2 to zero weights in the direct meat fixture, e.g. weights=np.array([1, 1, 0, 0, 1, 1], dtype=float), so the unit test models a true fully zero-weight subpopulation PSU.

Addresses the re-review's actionable P3: the direct meat test represented its
all-zero PSU via zero score rows only, with weights left all ones. A future
denominator-only edit that reads resolved.weights to drop positive-weight PSUs
would not have been caught. Set PSU 2's weights to 0 so the fixture models a
true fully zero-weight subpopulation PSU. The current meat ignores weights
(it operates on scores), so the expected value is unchanged; the change only
hardens the regression lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jun 30, 2026
@igerber igerber merged commit 6b052e6 into main Jun 30, 2026
31 of 32 checks passed
@igerber igerber deleted the docs/survey-se-zero-weight-waiver branch June 30, 2026 16:11
@igerber igerber mentioned this pull request Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant