perf(efficient-did): cache polynomial sieve basis across DR nuisance fits #556
Merged
…fits

The EfficientDiD doubly-robust (covariate) path runs three sieve nuisance estimators (outcome regression, propensity ratio, inverse propensity) that each loop K=1..k_max and rebuild the full polynomial sieve basis at every degree. All three receive the same fit-level covariate_matrix, so for any degree reached by more than one helper the identical (n x n_basis) array was rebuilt from scratch each time, across every (g,t) cell.

Add a per-fit memoization (_sieve_basis_cached, keyed (id(X), degree)) that the orchestrator threads into the three helpers so each distinct degree's basis is built once and shared.

_polynomial_sieve_basis is a pure function of (X, degree) and the helpers only read basis_all (no in-place mutation), so this is bit-identical: verified by an exact (atol=0) match of overall ATT plus all 18 group_time effect/se on a fixed-seed covariate fit before vs after the change.

basis_cache defaults to None (plain pass-through), so standalone callers are unchanged.

Resolves the EfficientDiD sieve-basis Performance row in TODO.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
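The memoization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's code: `polynomial_sieve_basis`, `sieve_basis_cached`, `nuisance_helper`, and `fit` are hypothetical stand-ins for `_polynomial_sieve_basis`, `_sieve_basis_cached`, the three nuisance helpers, and the orchestrator, and the basis is assumed to be an intercept plus every monomial up to total degree K. A build counter shows how often each degree is constructed with and without a per-fit cache.

```python
from collections import Counter
from itertools import combinations_with_replacement

import numpy as np

# Illustration only: counts how often each degree's basis is actually built.
BUILDS = Counter()


def polynomial_sieve_basis(X, degree):
    """Hypothetical stand-in for _polynomial_sieve_basis: an intercept plus every
    monomial in the columns of X up to total degree `degree`. The returned array
    depends only on (X, degree)."""
    BUILDS[degree] += 1
    n, d = X.shape
    cols = [np.ones(n)]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


def sieve_basis_cached(X, degree, basis_cache=None):
    """Hypothetical analogue of _sieve_basis_cached, keyed (id(X), degree).
    basis_cache=None means plain pass-through (no memoization)."""
    if basis_cache is None:
        return polynomial_sieve_basis(X, degree)
    key = (id(X), degree)
    if key not in basis_cache:
        basis_cache[key] = polynomial_sieve_basis(X, degree)
    return basis_cache[key]


def nuisance_helper(X, k_max, basis_cache=None):
    """Hypothetical nuisance estimator: loops K=1..k_max and only reads the basis."""
    return [sieve_basis_cached(X, K, basis_cache).shape[1] for K in range(1, k_max + 1)]


def fit(X, n_cells, k_max, use_cache):
    """Hypothetical orchestrator: one cache per fit, shared by all helpers and cells."""
    basis_cache = {} if use_cache else None
    for _cell in range(n_cells):  # stands in for the (g, t) cells
        for _helper in range(3):  # outcome regression, propensity ratio, inverse propensity
            nuisance_helper(X, k_max, basis_cache)
    return basis_cache


X = np.random.default_rng(0).normal(size=(200, 2))

fit(X, n_cells=4, k_max=3, use_cache=False)
print(dict(BUILDS))  # {1: 12, 2: 12, 3: 12}

BUILDS.clear()
fit(X, n_cells=4, k_max=3, use_cache=True)
print(dict(BUILDS))  # {1: 1, 2: 1, 3: 1}
```

Keying on `id(X)` is safe in this sketch only because the cache lives for a single fit, during which the covariate matrix stays alive and unmodified. CPython can reuse an `id` once an object has been garbage-collected, so a cache that outlived the fit could hand back a stale basis.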

Summary
- The three sieve nuisance estimators (outcome regression, propensity ratio, inverse propensity) each loop `K=1..k_max`, rebuilding `_polynomial_sieve_basis(covariate_matrix, K)` from scratch. Since all three receive the same fit-level `covariate_matrix`, every shared degree's `(n × n_basis)` basis was recomputed once per helper, per `(g,t)` cell.
- Adds a per-fit memoization (`_sieve_basis_cached`, keyed `(id(X), degree)`) that the orchestrator threads into the three helpers, so each distinct degree's basis is built once and shared.
- `basis_cache` defaults to `None` (plain pass-through), so any standalone caller is unchanged.

Methodology references (required if estimator / math changes)
- `docs/methodology/REGISTRY.md` §EfficientDiD (polynomial sieve nuisances)
- `_polynomial_sieve_basis(X, K)` is a referentially transparent function of `(X, degree)`, and the nuisance helpers only read its output (no in-place mutation), so reusing one object is bit-identical to rebuilding it. Sieve degree selection, weighting, normal equations, Ω* construction, ATT, EIF, and SE logic are all unchanged.

Validation
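The bit-identity argument reduces to two checks on the basis itself: reads must leave the cached array unmodified, and the cached array must equal a fresh build with zero tolerance. A minimal NumPy sketch of both, assuming a hypothetical `polynomial_sieve_basis` (intercept plus all monomials up to total degree K) and a hypothetical read-only consumer `normal_equations` in place of the real helpers:

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_sieve_basis(X, degree):
    """Hypothetical stand-in for _polynomial_sieve_basis: an intercept plus every
    monomial in the columns of X up to total degree `degree`."""
    n, d = X.shape
    cols = [np.ones(n)]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


def normal_equations(basis, y):
    """Hypothetical read-only consumer: forms B'B and B'y without writing to B."""
    return basis.T @ basis, basis.T @ y


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = rng.normal(size=100)

cached = polynomial_sieve_basis(X, 3)
snapshot = cached.copy()

# Several consumers read the same cached object, as the three helpers would.
for _ in range(3):
    gram, rhs = normal_equations(cached, y)

# Reads did not mutate the cached basis, and it matches a fresh build exactly (atol=0).
print(np.array_equal(cached, snapshot))                      # True
print(np.array_equal(cached, polynomial_sieve_basis(X, 3)))  # True

# Optional hardening, not part of this PR: freeze the cached array so an
# accidental in-place write raises instead of silently corrupting later reads.
cached.flags.writeable = False
try:
    cached[0, 0] = 0.0
except ValueError:
    print("in-place write rejected")
```

`np.array_equal` compares element-wise with no tolerance, which matches the `atol=0` standard used for the before/after comparison.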
- `tests/test_efficient_did.py`: new `TestSieveBasisCache` (a cache hit returns the same object and equals a fresh build bit-for-bit; `cache=None` passes through; reads do not mutate the cached basis; an end-to-end fit builds each distinct degree exactly once across the three helpers).
- Overall ATT plus all 18 `group_time` effect/se values compared on a fixed-seed covariate DR fit before vs after the change: exact match (`atol=0`).
- Full `tests/test_efficient_did.py` (176) and `tests/test_methodology_efficient_did.py` (27 + slow covariate) suites pass.
- No methodology/behavior change, so no REGISTRY/CHANGELOG edit.

Security / privacy
🤖 Generated with Claude Code