igerber · 2026-05-16T12:38:46Z

Summary

Wave 3 of multi-wave tech-debt paydown. Three Tier-A items bundled under a coherent "estimator observability" theme — each commit is independently rollback-able.

bc0bf39a HonestDiD test_m0_short_circuit: replace wall-clock elapsed < 0.5s proxy with mock.patch on scipy.optimize.linprog + assert_not_called(). CI-safe (no timing dependency), instantaneous, and a direct correctness signal — the M=0 fast path in _compute_worst_case_bias is supposed to skip the LP solver, so verifying the mock was never called is exactly right.
cf363048 WooldridgeDiD canonical-link warning: module-level _warn_if_canonical_link_violated(method, y, stacklevel) helper called from _fit_ols, _fit_logit, _fit_poisson at stacklevel=3 (matching the existing _warn_and_fill_nan_cohort convention). Four mismatches detected: ols+binary → recommend logit; ols+count → recommend poisson; logit+fractional → note QMLE consistent but not canonical; poisson+binary → recommend logit. Estimator remains consistent under QMLE; warning surfaces the canonical-link violation (W2023 Prop 3.1) that breaks numerical equivalence between QMLE-direct and OLS-by-imputation paths. REGISTRY.md gains a one-bullet note.
350ce841 HonestDiD ARP vertex-rejection diagnostic: _enumerate_vertices now tracks n_total / n_linalg_error / n_infeasible counters and emits a RuntimeWarning when (a) enumeration exhausts without feasible vertices, or (b) ≥ 50% of bases were rejected for LinAlgError. The previous silent-skip behavior is fully preserved (continue on LinAlgError, return List[np.ndarray]); the warning is purely additive observability. No caller modifications.
a6c562c4 TODO.md: remove 3 Tier-A bullets + 2 Methodology/Correctness rows (item 1: Tier-A only; items 5 + 6: both surfaces).

Net diff: 6 files, +284/-28.

Tests

HonestDiD: 30/30 pass in tests/test_methodology_honest_did.py, including new TestARPVertexEnumeration (3 cases: exhausted enumeration, heavy rejection, healthy enumeration). Old test_m0_short_circuit rewritten — drops the @skipif(CI=="true") decorator (CI-safe now).
WooldridgeDiD: 112/112 pass in tests/test_wooldridge.py, including new TestCanonicalLinkWarning (6 cases: 3 positive + 3 negative warning paths).
HonestDiD-touching: 138/138 across the whole repo.

Test plan

pytest tests/test_methodology_honest_did.py -v — 30 pass
pytest tests/test_wooldridge.py -v — 112 pass
pytest tests/test_methodology_honest_did.py::TestOptimalFLCI::test_m0_short_circuit -v — passes (no skip on CI)
pytest tests/test_wooldridge.py::TestCanonicalLinkWarning -v — 6 pass
pytest tests/test_methodology_honest_did.py::TestARPVertexEnumeration -v — 3 pass

🤖 Generated with Claude Code

github-actions · 2026-05-16T12:44:23Z

Overall Assessment

⚠️ Needs changes

Executive Summary

P1: the new Wooldridge canonical-link warning matrix is not source-faithful. Wooldridge (2023) states pooled QMLE and imputation coincide when the chosen LEF density uses its canonical link, lists linear-normal/OLS for “any response,” and lists logistic-Bernoulli for “binary or fractional response.” The PR therefore warns on source-supported uses such as fractional logit, and its OLS-on-binary/count warnings are not justified by the cited proposition. (academic.oup.com)
That incorrect interpretation is wired into all three fit paths in diff_diff/wooldridge.py and is repeated in docs/methodology/REGISTRY.md, so this is not mitigated by documentation; the documentation is part of the problem.
The new Wooldridge tests codify the same paper-inconsistent behavior, especially test_logit_on_fractional_warns, and they still do not directly cover the newly added poisson-on-binary branch. (academic.oup.com)
The HonestDiD changes look fine from a review standpoint: the M=0 test is materially better than the old timing proxy, and the ARP vertex diagnostics are additive only.
I could not execute the test suite in this environment because pytest and project Python deps such as numpy are unavailable.

Methodology

Severity P1 | Impact: diff_diff/wooldridge.py:L116-L180 introduces a “canonical-link violation” helper that warns on ols for binary/count outcomes and on logit for fractional outcomes, and it is invoked from all three fit paths at diff_diff/wooldridge.py:L717-L719, diff_diff/wooldridge.py:L944-L944, and diff_diff/wooldridge.py:L1181-L1181. This conflicts with the estimator’s own method contract in diff_diff/wooldridge.py:L340-L342 and with Wooldridge (2023): Proposition 3.1 is about using the canonical link for the chosen LEF density, Table 1 lists linear-normal/OLS for “any response,” and the paper explicitly treats logistic/Bernoulli as appropriate for binary or fractional responses. The REGISTRY note at docs/methodology/REGISTRY.md:L1389-L1389 does not mitigate this, because it repeats the same unsupported interpretation rather than documenting an intentional deviation. Concrete fix: remove this outcome-support warning matrix as a source-backed “canonical-link check,” or narrow it to an explicitly project-specific heuristic that is not attributed to Prop. 3.1; in either case, update the helper text, REGISTRY note, and tests to match the actual paper contract. (academic.oup.com)

Code Quality

No material findings beyond the methodology issue above.

Performance

No findings. The HonestDiD counter/warning additions are O(1) per enumerated basis and do not change the search complexity.

Maintainability

No additional findings. The extracted Wooldridge helper would be a maintainability improvement once its methodology contract is corrected.

Tech Debt

Severity P3 | Impact: the PR removes the Wooldridge canonical-link TODO from TODO.md, but the landed behavior is not yet source-faithful. Concrete fix: if the warning feature is reduced, reverted, or deferred, restore a TODO entry until the paper-faithful behavior is what ships.

Security

No findings.

Documentation/Tests

Severity P2 | Impact: tests/test_wooldridge.py:L618-L676 locks in the same incorrect source interpretation as production code, especially test_logit_on_fractional_warns, and does not directly exercise the added poisson-on-binary warning branch. That means future fixes toward source-faithful behavior will look like regressions, and one live branch remains uncovered. Concrete fix: rewrite the matrix from the paper/Table 1 contract; at minimum, remove or invert the fractional-logit warning expectation and make branch coverage match whatever warning behavior remains after the methodology fix. (academic.oup.com)

Path to Approval

In diff_diff/wooldridge.py:L116-L180, remove or rewrite _warn_if_canonical_link_violated() so it no longer cites Prop. 3.1/Table 1 for outcome-support rules the paper does not state.
Update docs/methodology/REGISTRY.md:L1389-L1389 and the helper/docstring text so they describe the paper-faithful rule: equivalence depends on the chosen LEF density using its canonical link, not on a unique method-by-outcome mapping.
Rewrite tests/test_wooldridge.py:L618-L676 to match the corrected contract; specifically, drop the fractional-logit warning expectation and ensure any remaining warning branches are covered exactly once.
If the warning feature is deferred or materially narrowed, restore the removed TODO.md tracking entry until the intended source-faithful behavior is actually merged.

…g mock

The test was using `time.time() - t0 < 0.5s` as a proxy for "M=0 took the fast path." Wall-clock proxies on shared CI runners are flaky, so the test was `skipif(CI=="true")` and only ran locally.

The fast path is `_compute_worst_case_bias(... M=0) -> 0.0` at honest_did.py:1650, which means `scipy.optimize.linprog` is never reached. Direct correctness signal: `mock.patch` on `diff_diff.honest_did.optimize.linprog` plus `assert_not_called()`. CI-safe, instantaneous, and verifies the actual short-circuit path rather than a timing proxy.

Drops the CI skipif decorator and the unused `import os`.

Tier A row 1 of post-Wave-2 backlog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`_enumerate_vertices` was swallowing `np.linalg.LinAlgError` on every basis with `try / except / continue`, and `_compute_arp_test` returned False (conservative non-rejection) when the returned list was empty. Users had no way to tell whether the test "did not reject" because the data didn't support rejection or because the vertex search was numerically pathological (singular A_sys on every basis, degenerate moment-inequality system).

Instrument `_enumerate_vertices` with three counters (`n_total`, `n_linalg_error`, `n_infeasible`) and emit a `RuntimeWarning` at function exit when:

1. `vertices == []` after `n_total > 0` bases tried: enumeration exhausted; caller will fall back to conservative non-rejection.
2. `vertices != []` but `n_linalg_error / n_total >= 0.5`: enumeration heavily constrained; recovered vertices may be numerically fragile.

`RuntimeWarning` (not `UserWarning`) marks this as a numerical / algorithmic signal rather than a user-input issue. `stacklevel=3` so the warning surfaces at `_compute_arp_test`'s caller, matching the codebase convention for one-level-deep helper warnings.

No changes to the return type, the caller (`_compute_arp_test`), or the algorithm semantics. The previous silent-skip behavior is fully preserved; the diagnostic is purely additive.

Tier-A row in the post-Wave-2 backlog (TODO.md, item 6, PR #334 reference).

Adds new `TestARPVertexEnumeration` class in test_methodology_honest_did.py with three cases: exhausted enumeration, heavy rejection, healthy enumeration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
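The counter-and-warn pattern described above can be illustrated on a toy problem. The sketch below is not the project's `_enumerate_vertices`: it enumerates vertices of a 2-D polygon `{p : rows[i] . p <= rhs[i]}` in pure Python, uses a hypothetical `SingularBasis` exception in place of `np.linalg.LinAlgError`, and defaults to `stacklevel=2` rather than the `stacklevel=3` the commit uses. Only the two warning conditions follow the commit message.

```python
import warnings
from itertools import combinations


class SingularBasis(Exception):
    """Stand-in for np.linalg.LinAlgError in this dependency-free sketch."""


def solve_2x2(a, b):
    """Solve a 2x2 linear system by Cramer's rule; raise if singular."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise SingularBasis("singular basis")
    x = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    y = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return (x, y)


def enumerate_vertices(rows, rhs, tol=1e-9, stacklevel=2):
    """Return feasible vertices; warn when the search looks pathological."""
    n_total = n_linalg_error = n_infeasible = 0
    vertices = []
    for i, j in combinations(range(len(rows)), 2):
        n_total += 1
        try:
            p = solve_2x2((rows[i], rows[j]), (rhs[i], rhs[j]))
        except SingularBasis:
            n_linalg_error += 1
            continue  # the silent skip is preserved; only the count is new
        if all(r[0] * p[0] + r[1] * p[1] <= h + tol for r, h in zip(rows, rhs)):
            vertices.append(p)
        else:
            n_infeasible += 1

    if n_total > 0 and not vertices:
        # Condition 1: every basis was rejected.
        warnings.warn(
            f"vertex enumeration exhausted: {n_total} bases tried, "
            f"{n_linalg_error} singular, {n_infeasible} infeasible",
            RuntimeWarning,
            stacklevel=stacklevel,
        )
    elif vertices and n_linalg_error / n_total >= 0.5:
        # Condition 2: some vertices found, but at least half the bases were singular.
        warnings.warn(
            f"vertex enumeration heavily constrained: "
            f"{n_linalg_error}/{n_total} bases singular",
            RuntimeWarning,
            stacklevel=stacklevel,
        )
    return vertices
```

Because the function still returns a plain list and still skips singular bases, a caller that treats an empty list as "do not reject" behaves exactly as before. The warning is the only new output.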

Items 1 and 6 from the post-Wave-2 backlog are now shipped:

- HonestDiD test_m0_short_circuit (mock-based fix, commit bc0bf39)
- HonestDiD ARP vertex-rejection diagnostic (commit 0bfaabba)

Removals:

- Tier-A bullets for items 1 and 6
- Methodology/Correctness row for HonestDiD ARP vertex-rejection (PR #334)

Item 5 (WooldridgeDiD method/outcome pairing) reframed in place: PR #453 R1 review pointed out that the original "canonical link requirement (W2023 Prop 3.1) not enforced" framing misrepresented Wooldridge (2023): Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional", so neither OLS-on-binary nor logit-on-fractional is a Prop 3.1 violation. A useful hint still exists at the *efficiency* level (OLS on binary is consistent but logit is typically more efficient for inference), but should not be framed as a methodology violation. Both the Methodology/Correctness row and the Tier-A bullet for item 5 are rewritten to reflect this. Item 5 stays open with the corrected framing for a future wave to address.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
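One way the reframed item 5 could look is a hint that never fires for pairings the commit message says Table 1 supports. The sketch below is purely illustrative and is not the project's code: `classify_outcome`, `outcome_fit_hint`, the outcome-type heuristics, and the message wording are all assumptions, and the count-outcome hint goes beyond what the commit message itself states.

```python
def classify_outcome(y):
    """Crude outcome-type heuristic (an assumption of this sketch)."""
    values = list(y)
    if not values:
        return "unknown"
    if all(v in (0, 1) for v in values):
        return "binary"
    if all(0 <= v <= 1 for v in values):
        return "fractional"
    if all(v >= 0 and float(v).is_integer() for v in values):
        return "count"
    return "continuous"


def outcome_fit_hint(method, y):
    """Return an optional efficiency hint, or None when nothing is worth saying.

    OLS is treated as valid for any response and logit as valid for binary
    or fractional responses, so neither pairing is reported as a violation.
    """
    kind = classify_outcome(y)
    if method == "ols" and kind == "binary":
        return ("OLS on a binary outcome is consistent; "
                "method='logit' is typically more efficient.")
    if method == "ols" and kind == "count":
        # Assumption: the analogous hint for counts points at poisson.
        return ("OLS on a count outcome is consistent; "
                "method='poisson' may be a better fit.")
    return None


# Logit on a fractional outcome produces no hint at all.
print(outcome_fit_hint("logit", [0.2, 0.5, 0.9]))
print(outcome_fit_hint("ols", [0, 1, 1, 0]))
```

Returning a string rather than emitting a warning keeps the sketch neutral on how such a hint would be surfaced, which the commit message leaves to a future wave.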

github-actions · 2026-05-16T12:54:15Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7c0bdb26e368bfa3d54b0ea04658d9bb4faaf6b0

Overall Assessment

✅ Looks good

Executive Summary

No unmitigated P0/P1 issues in the current diff.
HonestDiD’s code change is observability-only: _enumerate_vertices() still returns [] on vertex-search failure, and _compute_arp_test() still treats that as conservative non-rejection (diff_diff/honest_did.py:L1935-L1957, diff_diff/honest_did.py:L2005-L2010).
The M=0 regression is materially better: it now directly asserts the LP solver is skipped instead of relying on wall-clock timing (tests/test_methodology_honest_did.py:L249-L279, diff_diff/honest_did.py:L1651-L1667).
Re-review scope: the prior Wooldridge canonical-link methodology issue no longer appears in the current branch. The remaining TODO correctly reframes it as an optional efficiency hint, and the public method contract still allows logit for binary or fractional outcomes (TODO.md:L99-L99, diff_diff/wooldridge.py:L273-L275).
P3: the new “heavy rejection” test does not actually isolate the heavily constrained branch; as written it can pass via the exhausted warning, so that branch is still effectively unpinned (tests/test_methodology_honest_did.py:L519-L546).
I could not execute the tests locally because pytest is unavailable and the Python environment here is missing numpy.

Methodology

No findings. The HonestDiD change does not alter the estimator, identified set, or inference surface. Delta^RM still uses the documented conservative naive FLCI path because ARP remains disabled pending calibration (diff_diff/honest_did.py:L2469-L2486, docs/methodology/REGISTRY.md:L2692-L2695).

Code Quality

No findings. Replacing the timing proxy with a direct mock-based assertion is a better correctness signal for the M=0 fast path (tests/test_methodology_honest_did.py:L249-L279).

Performance

No findings. The added counters and warning checks are O(1) per enumerated basis and do not change the combinatorial search complexity (diff_diff/honest_did.py:L1894-L1957).

Maintainability

No findings. The TODO cleanup matches the actual state of the branch, including the corrected Wooldridge framing (TODO.md:L95-L100, TODO.md:L175-L178).

Tech Debt

No findings. Removing the resolved HonestDiD TODOs is appropriate, and the remaining HonestDiD ARP calibration debt stays tracked separately (TODO.md:L95-L95, docs/methodology/REGISTRY.md:L2693-L2694).

Security

No findings.

Documentation/Tests

Severity P3 | Impact: tests/test_methodology_honest_did.py:L519-L546 is intended to cover the new heavily constrained warning in diff_diff/honest_did.py:L1947-L1957, but the fixture only has two unique nuisance-row types across 5 moments. That makes every 3-row basis singular, so the test can satisfy its assertion through the exhausted warning alone and will not catch regressions specific to the ≥50%-rejected-with-some-feasible-vertices branch. | Concrete fix: replace the fixture with one that yields both feasible and singular bases, then assert specifically on the heavily constrained message. Keep the pure exhaustion case separate as already covered at tests/test_methodology_honest_did.py:L505-L517.

P3: `test_enumerate_vertices_warns_on_heavy_rejection` previously used a fixture (X_tilde with 2 unique row types, 5 moments → all 3-row bases singular) where the assertion could pass via the `exhausted` warning instead of the intended `heavily constrained` branch. The branch was effectively untested.

Rewrite the fixture: 5 moments × 1 nuisance column, C(5,2)=10 bases. By design, 6 bases trip LinAlgError (pairs among the singular-X_tilde indices) and 4 bases produce feasible vertices (each pairs a positive X_tilde with the unique negative X_tilde at index 4). 60% rejection rate hits the heavily-constrained branch specifically, not exhaustion.

Switched to `pytest.warns(RuntimeWarning, match="heavily constrained")` so the test now fails if the message changes or if the wrong branch fires. Added a `len(vertices) >= 1` assertion to guard against the fixture inadvertently producing the exhausted-branch outcome.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
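The basis counts claimed in this commit message can be checked with a few lines of arithmetic. The sketch takes the description at face value (any pair that excludes index 4 is singular, any pair that includes it is feasible). It does not build the real `X_tilde` fixture, and the variable names are illustrative.

```python
from itertools import combinations

N_MOMENTS = 5
NEGATIVE_INDEX = 4  # the unique negative X_tilde entry, per the commit message

# Every 2-row basis drawn from the 5 moments: C(5, 2) of them.
bases = list(combinations(range(N_MOMENTS), 2))

# Pairs drawn only from indices 0..3 are the ones described as singular.
singular = [b for b in bases if NEGATIVE_INDEX not in b]

# Pairs that include index 4 are the ones described as feasible.
feasible = [b for b in bases if NEGATIVE_INDEX in b]

rejection_rate = len(singular) / len(bases)
print(len(bases), len(singular), len(feasible), rejection_rate)
```

This gives 10 bases, 6 singular and 4 feasible, for a rejection rate of 0.6. That is above the 0.5 threshold while leaving at least one vertex, which is the combination needed to reach the heavily constrained branch rather than the exhausted one.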

github-actions · 2026-05-16T13:00:24Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 54efaea6d0b2dc95b5bc2ff192fe4684df67fe5b

Overall Assessment

✅ Looks good

Executive Summary

No unmitigated P0/P1 findings in the current diff.
Re-review scope: the prior P3 on the ARP “heavy rejection” test is resolved. The revised fixture now creates both singular and feasible bases, so it specifically exercises the heavily constrained branch rather than the exhausted fallback (tests/test_methodology_honest_did.py:L519-L537, diff_diff/honest_did.py:L1947-L1957).
HonestDiD methodology is unchanged. _enumerate_vertices() now emits diagnostics, but _compute_arp_test() still maps empty vertex sets to conservative non-rejection, and public Δ^RM inference remains the documented naive FLCI rather than ARP (diff_diff/honest_did.py:L1935-L1959, diff_diff/honest_did.py:L2005-L2010, diff_diff/honest_did.py:L2463-L2493, docs/methodology/REGISTRY.md:L2692-L2695).
The M=0 regression is materially stronger: it directly verifies that optimize.linprog is never reached instead of using a wall-clock proxy (tests/test_methodology_honest_did.py:L249-L279, diff_diff/honest_did.py:L1651-L1674).
TODO cleanup is consistent with the landed work. The resolved HonestDiD items were removed, and the remaining Δ^RM/ARP limitation stays tracked (TODO.md:L95-L100, TODO.md:L173-L179, docs/methodology/REGISTRY.md:L2693-L2695).
I could not execute the HonestDiD tests locally because the provided Python environment is missing numpy.

Methodology

No findings. The HonestDiD code change is observability-only: it adds diagnostics around private ARP vertex enumeration without changing estimator formulas, weighting, SEs, or identification behavior (diff_diff/honest_did.py:L1935-L1959, docs/methodology/REGISTRY.md:L2692-L2695).

Code Quality

No findings. Replacing the timing-based M=0 proxy with a direct mock of optimize.linprog is a better correctness signal and removes CI-timing brittleness (tests/test_methodology_honest_did.py:L249-L279).

Performance

No findings. The added counters and warning checks are O(1) per enumerated basis and do not change the combinatorial search complexity of _enumerate_vertices() (diff_diff/honest_did.py:L1894-L1959).

Maintainability

No findings. The prior re-review P3 is addressed: the new heavy-rejection fixture now isolates the intended warning branch with a mixed singular/feasible construction and explicit combinatoric rationale (tests/test_methodology_honest_did.py:L519-L537).

Tech Debt

No findings. TODO cleanup matches the implemented HonestDiD work, and the substantive unresolved Δ^RM ARP limitation remains explicitly tracked rather than disappearing from project debt accounting (TODO.md:L95-L100, docs/methodology/REGISTRY.md:L2693-L2695).

Security

No findings.

Documentation/Tests

No findings in the diff. The new tests align with the changed behavior, and the previous branch-isolation gap is closed (tests/test_methodology_honest_did.py:L249-L279, tests/test_methodology_honest_did.py:L502-L560). Local execution was not possible here because numpy is unavailable in the provided environment.

igerber and others added 3 commits May 16, 2026 08:49

igerber force-pushed the wave-3-observability branch from a6c562c to 7c0bdb2 May 16, 2026 12:50

igerber added the ready-for-ci label (Triggers CI test workflows) May 16, 2026

igerber merged commit 176174c into main May 16, 2026
33 of 34 checks passed

igerber deleted the wave-3-observability branch May 16, 2026 14:58

igerber mentioned this pull request May 31, 2026

WooldridgeDiD: outcome-fit hint for OLS on binary/count outcomes #513

Merged

Wave 3 estimator observability: HonestDiD M=0 test, Wooldridge canonical-link warning, ARP vertex diagnostic #453

igerber merged 4 commits into main from wave-3-observability
