{{ message }}
linalg: scale-invariant rank detection + solve; fix rank-0 IndexError (both backends)#500
Merged
Conversation
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
… (both backends) Repairs scale-sensitive rank handling in the shared OLS backend. A covariate on a large raw scale (~1e8) inflated the pivoted-QR rank threshold (anchored to the largest pivot diagonal) and false-dropped the intercept/treatment/interaction to NaN on a full-rank design; the scipy.lstsq(cond=1e-7) solve likewise truncated the small-scale direction, returning finite-but-wrong coefficients. _detect_rank_deficiency now runs a raw pivoted QR first (preserving the established drop-column selection for genuinely collinear designs) and only adopts the equilibrated rank when the raw drop was scale-induced. The solve equilibrates columns to unit 2-norm and unscales the coefficients. Mirrored in rust/src/linalg.rs (unscale before fitted/vcov). Rank-0 designs now return all-NaN cleanly (solve_ols) or raise a clear ValueError (solve_logit/poisson, routed through CS pscore_fallback) instead of a cryptic IndexError. No drop-order change for collinear designs; no-op for full-rank well-conditioned designs (R-parity goldens unaffected). New regression tests in tests/test_linalg.py::TestNumericalStability. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
diff_diff/linalg.py+rust/src/linalg.rs) invariant to per-column scaling. A covariate on a large raw scale (e.g. population, income in cents, market cap) previously inflated the pivoted-QR rank threshold — which is anchored to the largest pivot diagonal — and false-dropped the intercept/treatment/interaction columns to NaN on a full-rank model (aDifferenceInDifferencesfit returned the correct ATT at covariate ×1/×1e4 butATT=NaNat ×1e8); thegelsdsolve likewise truncated the small-scale direction, returning finite-but-wrong coefficients. Detection now runs a raw pivoted QR first and only re-checks on column-equilibrated (unit 2-norm) columns when the raw pass reports a deficiency; the solve equilibrates and unscales. Applied in both the Python and Rust backends.IndexError: arrays used as indices must be of integer typewhen a design collapses to rank 0 (e.g. a constant FE-collinear covariate inImputationDiD/TwoStageDiD):solve_olsnow returns all-NaN coefficients cleanly andsolve_poissonraises a clearValueError.Methodology references (required if estimator / math changes)
DifferenceInDifferences,TwoWayFixedEffects,MultiPeriodDiD).docs/methodology/REGISTRY.mdrank-deficiency notes (Rlm()drop convention, tolerance1e-7).CallawaySantAnna/TripleDifference/StaggeredTripleDifferenceperform covariate outcome-regression / doubly-robust nuisance solves locally (not viasolve_ols) that are not yet equilibrated — tracked inTODO.mdas a follow-up bundled with the DR/OR rank-guard.Validation
tests/test_linalg.py— scale-invariance of fitted values / t-stats; finite ATT through the public DiD estimator with a 1e8-scale covariate; rank-0 NaN handling (OLS) and clearValueError(solve_poisson, incl. positive-weight subset); mixed scale+collinearity retains an identified full-rank subset with valid kept-coefficient VCV; drop-selection preserved for genuine collinearity.tests/test_rust_backend.py); full default test suite green; R-parity suites (csdid, chaisemartin) unaffected.Security / privacy
🤖 Generated with Claude Code