Crate ordvec
Expand description
Training-free ordinal & sign quantization for vector retrieval.
ordvec is a training-free ordinal/sign retrieval substrate. It was
developed using the early
turbovec project context as a
rapid-development scaffold, with thanks to that lineage; ordvec’s
implementation history and active upstream live in this repository. It
carries no system dependencies — no BLAS, no ndarray, no faer — and
needs no training, rotation, or codebook. Norms are analytical.
Four substrate families, all data-oblivious:
Rankstores full-precision rank vectors (u16per coordinate,2 * dimbytes per document).RankQuantbuckets each rank into1 << bitsequal-width bins and packsbitsbits per coordinate (dim * bits / 8bytes per document).bits ∈ {1, 2, 4}are the stable retrieval widths;b = 8is a capability-gated evidence/refinement width — asymmetric scoring and code/projection generation at any dim, analytical-norm symmetric scoring (viaRankQuant::search) only whendim % 256 == 0(seeRankQuant::new_asymmetric). The standalonerankquant_eval_searchcomputes its norm empirically, so it scores anybits ∈ 1..=8at any dim (includingb = 8off the 256 grid) and carries no such restriction.Bitmapstores a top-bucket bitmap per document (one bit per coordinate) and scores viapopcount(Q AND D).SignBitmapstores a sign bitmap per document (one bit per coordinate, set when the coordinate is positive) for sign-cosine candidate generation.
For b=2 specifically, RankQuantFastscan is a specialized companion to
RankQuant — a block-32 FastScan kernel (nibble LUT; AVX-512 → scalar
dispatch) for absolute-minimum stage-1 scan latency, trading 2× the
b=2 storage and 8-bit LUT scoring noise. Reach for it only when scan latency
is the binding constraint.
These four families are the headline retrieval surface, with
RankQuantFastscan as the specialized b=2 latency companion above. The
experimental
MultiBucketBitmap indexed contingency / projection API is a niche
research/analysis substrate for the bilinear bucket-overlap decomposition —
it is not a default single-score retrieval path and was never
kernel-optimized for that role. For primary nearest-neighbour retrieval use
RankQuant, Bitmap, or the two-stage candidate-generation → rerank
flow instead.
The Bitmap candidate score is the implementation surface with the
strongest formal story: in the companion Lean formalization, literal
constant-weight bitmap overlap is the query-preserving quotient statistic,
an overlap threshold is Bayes-optimal under an explicit finite
monotone-overlap signal model, and the idealized uniform constant-weight null
calibrates that threshold by the hypergeometric upper tail. This is a finite
in-model theorem, not a claim that real encoders automatically satisfy the
quotient, symmetry, or null assumptions.
use ordvec::{Rank, RankQuant};
let mut idx = RankQuant::new(1024, 2);
let docs: Vec<f32> = vec![0.0; 1024 * 10_000];
idx.add(&docs);
let queries: Vec<f32> = vec![0.0; 1024 * 4];
let res = idx.search_asymmetric(&queries, 10);
assert_eq!(res.k, 10);Re-exports§
pub use rank::Rank;pub use rank_io::probe_index_metadata;pub use rank_io::IndexKind;pub use rank_io::IndexMetadata;pub use rank_io::IndexParams;pub use sign_bitmap::CandidateBatch;pub use sign_bitmap::SignBitmap;
Modules§
- rank
- Rank math primitives and the
Rankindex type. Rank-cosine math primitives and the full-precisionRankindex. - rank_io
- Read/write ordinal/sign index files.
- sign_
bitmap - Sign-cosine bitmap retrieval substrate.
Structs§
- Bitmap
- Top-bucket bitmap index for constant-composition coarse scoring.
- Rank
Quant B-bit RankQuant index.- Rank
Quant Fastscan - FastScan b=2 RankQuant index.
- Search
Results - Top-k search results, laid out as
nqcontiguous blocks ofk. - Subset
Scratch - Reusable scratch for the serial subset-rerank primitives. Grows to the maximum shape seen, then reuses capacity — so a caller’s bounded-pool worker runs allocation-free after warmup. Opaque: fields are an implementation detail.
- TwoStage
Candidate Policy
Enums§
- Ordvec
Error - Rank
Quant Capability - Which scoring modes a
RankQuantinstance supports.
Functions§
- rankquant_
eval_ search - Standalone symmetric RankQuant-style eval search for arbitrary bit widths.
- validate_
candidate_ ids - validate_
flat_ vectors_ len
Type Aliases§
- Bitmap
Index Deprecated - Rank
Index Deprecated - Rank
Quant Index Deprecated - Sign
Bitmap Index Deprecated
