feat(estimators): iterative alternating-projection demeaning for N-way absorbed FE by igerber · Pull Request #586 · igerber/diff-diff · GitHub
Skip to content

feat(estimators): iterative alternating-projection demeaning for N-way absorbed FE#586

Merged
igerber merged 1 commit into
mainfrom
feature/multi-absorb-iterative-demean
Jun 29, 2026
Merged

feat(estimators): iterative alternating-projection demeaning for N-way absorbed FE#586
igerber merged 1 commit into
mainfrom
feature/multi-absorb-iterative-demean

Conversation

@igerber

@igerber igerber commented Jun 29, 2026

Copy link
Copy Markdown
Owner

Summary

  • Fixes an unbalanced-panel correctness bug: N>1 absorbed fixed effects (absorb=[a, b, ...] in DifferenceInDifferences / MultiPeriodDiD) and the shared unweighted two-way within_transform (used by TwoWayFixedEffects, SunAbraham, BaconDecomposition) used single-pass sequential demeaning, which is the exact (weighted) Frisch-Waugh-Lovell residualization only when the FE subspaces are orthogonal (balanced fully-crossed panels). On unbalanced panels it was a biased approximation (coefficients off by ~1e-2 in tested cases).
  • New N-way method-of-alternating-projections engine diff_diff.utils.demean_by_groups(): each variable is demeaned by each FE dimension in turn until convergence — the exact (weighted) FWL residual onto the combined column space of all absorbed dummies, matching R fixest / reghdfe / lfe. The two-way within_transform() now delegates to it.
  • The four DiD/MPD absorb= call sites (analytical + replicate-refit, ×2 estimators) route through the engine; the weighted-multi-absorb ValueError rejection is lifted (now supported via weighted MAP).
  • Byte-stability preserved: single-absorb delegates to the unchanged one-way demean_by_group; the weighted within_transform output is bit-identical (Wooldridge golden at atol=1e-14 stays green); balanced multi-way matches the prior closed-form demean to machine precision. The unweighted two-way path now also emits the non-convergence UserWarning (previously only the weighted path could).
  • DOF accounting (n_absorbed_effects = sum_d(nunique_d - 1)) is preserved unchanged; multi-way-FE DOF correctness is out of scope.

Methodology references (required if estimator / math changes)

  • Method name(s): Frisch-Waugh-Lovell within transformation for multi-way fixed effects via the method of alternating projections (von Neumann / Halperin; Guimarães-Portugal; Gaure lfe; Correia reghdfe).
  • Paper / source link(s): documented in docs/methodology/REGISTRY.md (TwoWayFixedEffects within-transform Note; "Absorbed Fixed Effects with Survey Weights" Note).
  • Any intentional deviations from the source (and why): None. This brings diff-diff into agreement with fixest/reghdfe/lfe for N>1 absorbed FE on unbalanced panels (previously it deviated via single-pass).

Validation

  • Tests added/updated: tests/test_utils.py (new TestDemeanByGroups: engine vs full-dummy OLS for unbalanced 2-way and 3-way, len==1 byte-identity, weighted byte-identity guard vs a frozen copy of the old loop, orthogonality, non-convergence warning); tests/test_within_transform.py (relaxed 5 balanced closed-form assertions to assert_allclose ~1e-12, added unbalanced-correctness + convergence-warning tests); tests/test_methodology_did.py (new TestMultiAbsorbIterativeDemean: unbalanced unweighted and non-uniform-weighted DiD/MPD absorb= vs fixed_effects= parity at atol≈1e-8, FE-collinear regressor reports NaN coef); tests/test_survey.py (flipped the two multi-absorb+survey rejection tests to now-supported + full-dummy parity).
  • R-parity goldens unchanged: test_methodology_twfe.py / test_methodology_sun_abraham.py / test_methodology_bacon.py / test_methodology_wooldridge.py all green (balanced panels: MAP == closed-form to ~1 ULP; weighted byte-identical).
  • Backtest / simulation / notebook evidence (if applicable): N/A.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

@github-actions

Copy link
Copy Markdown

@igerber igerber force-pushed the feature/multi-absorb-iterative-demean branch from 93f8586 to 7af6c6e Compare June 29, 2026 21:20
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 7af6c6eb4450ca2c37843632dbcf30e8eb9f60b3


PR Review: ✅ Looks good

Overall Assessment

Looks good — no unmitigated P0 or P1 findings. The previous P1 zero-total-weight absorbed-group issue and previous P2 within_transform() docstring issue are addressed.

Executive Summary

  • The affected methods are DifferenceInDifferences(absorb=...), MultiPeriodDiD(absorb=...), and shared two-way within_transform() consumers.
  • The PR now aligns the implementation with the Methodology Registry’s MAP demeaning notes for unbalanced multi-way absorbed FE.
  • The prior zero-weight survey/domain edge case is guarded with np.divide(..., where=w_sum > 0) in both one-way and N-way weighted demeaning.
  • Replicate absorb refits still drop zero-weight rows before re-demeaning, consistent with the registry.
  • Only one P3 informational cleanup remains: TODO.md still contains a stale cross-reference saying N>1 absorbed-FE + weights is gated.

Methodology

No methodology blockers found.

The new MAP path is documented in docs/methodology/REGISTRY.md:L4270-L4292 and implemented in the DiD/MPD absorb paths at diff_diff/estimators.py:L449-L468 and diff_diff/estimators.py:L1604-L1620. This matches the documented FWL residualization requirement for multiple absorbed dimensions.

The previous zero-weight-group concern is resolved by guarded weighted means in diff_diff/utils.py:L2621-L2632 and diff_diff/utils.py:L2736-L2745, with DiD/MPD zero-total-weight absorbed-category tests at tests/test_methodology_did.py:L1685-L1704 and tests/test_methodology_did.py:L1706-L1738.

Code Quality

No findings.

demean_by_groups() centralizes the MAP implementation and preserves one-way delegation to demean_by_group() at diff_diff/utils.py:L2709-L2719.

Performance

No findings.

The iterative groupby loop is more expensive than the old closed-form unweighted two-way transform, but that cost follows from switching to exact MAP on unbalanced panels.

Maintainability

No findings.

The within_transform() docstring now describes MAP and identifies the additive formula as the balanced-panel special case at diff_diff/utils.py:L2810-L2856.

Tech Debt

Finding 1 — P3: Stale TODO cross-reference still says weighted N>1 absorb is gated

Location: TODO.md:L220-L226

Impact:
TODO.md still says “N>1 absorbed-FE + weights composition remains gated by the open multi-absorb row,” but this PR removes that gate and the open row. This is documentation/tracker drift only; runtime behavior and methodology are unaffected.

Concrete fix:
Update or remove that parenthetical so TODO.md no longer points to a removed Actionable item.

Security

No findings.

Documentation/Tests

No blocking findings.

The registry and docstring updates cover the methodology change. Focused test execution could not be run in this environment because pytest is not installed (python -m pytest: No module named pytest).

…y absorbed FE

N>1 absorbed fixed effects used single-pass sequential demeaning, which is the
exact (weighted) Frisch-Waugh-Lovell residualization only on balanced
orthogonal-FE panels; on unbalanced panels it was a biased approximation
(coefficients off by ~1e-2 in tested cases).

Add an N-way method-of-alternating-projections engine demean_by_groups() in
utils.py; route the DiD/MultiPeriodDiD absorb= paths and the shared two-way
within_transform() through it, fixing TWFE / SunAbraham / BaconDecomposition on
unbalanced unweighted panels too. Lift the weighted-multi-absorb rejection
(now supported via weighted MAP).

Single-absorb and balanced-panel results are byte-stable; the weighted
within_transform output is bit-identical; R-parity goldens unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the feature/multi-absorb-iterative-demean branch from 7af6c6e to 91c6381 Compare June 29, 2026 21:35
@github-actions

Copy link
Copy Markdown

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jun 29, 2026
@igerber igerber merged commit 6126f9b into main Jun 29, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/multi-absorb-iterative-demean branch June 29, 2026 23:25
@igerber igerber mentioned this pull request Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant