ordvec - Rust
Skip to main content

Crate ordvec

Crate ordvec

Source
Expand description

Training-free ordinal & sign quantization for vector retrieval.

ordvec is a training-free ordinal/sign retrieval substrate. It was developed using the early turbovec project context as a rapid-development scaffold, with thanks to that lineage; ordvec’s implementation history and active upstream live in this repository. It carries no system dependencies — no BLAS, no ndarray, no faer — and needs no training, rotation, or codebook. Norms are analytical.

Four substrate families, all data-oblivious:

  • Rank stores full-precision rank vectors (u16 per coordinate, 2 * dim bytes per document).
  • RankQuant buckets each rank into 1 << bits equal-width bins and packs bits bits per coordinate (dim * bits / 8 bytes per document). bits ∈ {1, 2, 4} are the stable retrieval widths; b = 8 is a capability-gated evidence/refinement width — asymmetric scoring and code/projection generation at any dim, analytical-norm symmetric scoring (via RankQuant::search) only when dim % 256 == 0 (see RankQuant::new_asymmetric). The standalone rankquant_eval_search computes its norm empirically, so it scores any bits ∈ 1..=8 at any dim (including b = 8 off the 256 grid) and carries no such restriction.
  • Bitmap stores a top-bucket bitmap per document (one bit per coordinate) and scores via popcount(Q AND D).
  • SignBitmap stores a sign bitmap per document (one bit per coordinate, set when the coordinate is positive) for sign-cosine candidate generation.

For b=2 specifically, RankQuantFastscan is a specialized companion to RankQuant — a block-32 FastScan kernel (nibble LUT; AVX-512 → scalar dispatch) for absolute-minimum stage-1 scan latency, trading 2× the b=2 storage and 8-bit LUT scoring noise. Reach for it only when scan latency is the binding constraint.

These four families are the headline retrieval surface, with RankQuantFastscan as the specialized b=2 latency companion above. The experimental MultiBucketBitmap indexed contingency / projection API is a niche research/analysis substrate for the bilinear bucket-overlap decomposition — it is not a default single-score retrieval path and was never kernel-optimized for that role. For primary nearest-neighbour retrieval use RankQuant, Bitmap, or the two-stage candidate-generation → rerank flow instead.

The Bitmap candidate score is the implementation surface with the strongest formal story: in the companion Lean formalization, literal constant-weight bitmap overlap is the query-preserving quotient statistic, an overlap threshold is Bayes-optimal under an explicit finite monotone-overlap signal model, and the idealized uniform constant-weight null calibrates that threshold by the hypergeometric upper tail. This is a finite in-model theorem, not a claim that real encoders automatically satisfy the quotient, symmetry, or null assumptions.

use ordvec::{Rank, RankQuant};

let mut idx = RankQuant::new(1024, 2);
let docs: Vec<f32> = vec![0.0; 1024 * 10_000];
idx.add(&docs);

let queries: Vec<f32> = vec![0.0; 1024 * 4];
let res = idx.search_asymmetric(&queries, 10);
assert_eq!(res.k, 10);

Re-exports§

pub use rank::Rank;
pub use rank_io::probe_index_metadata;
pub use rank_io::IndexKind;
pub use rank_io::IndexMetadata;
pub use rank_io::IndexParams;
pub use sign_bitmap::CandidateBatch;
pub use sign_bitmap::SignBitmap;

Modules§

rank
Rank math primitives and the Rank index type. Rank-cosine math primitives and the full-precision Rank index.
rank_io
Read/write ordinal/sign index files.
sign_bitmap
Sign-cosine bitmap retrieval substrate.

Structs§

Bitmap
Top-bucket bitmap index for constant-composition coarse scoring.
RankQuant
B-bit RankQuant index.
RankQuantFastscan
FastScan b=2 RankQuant index.
SearchResults
Top-k search results, laid out as nq contiguous blocks of k.
SubsetScratch
Reusable scratch for the serial subset-rerank primitives. Grows to the maximum shape seen, then reuses capacity — so a caller’s bounded-pool worker runs allocation-free after warmup. Opaque: fields are an implementation detail.
TwoStageCandidatePolicy

Enums§

OrdvecError
RankQuantCapability
Which scoring modes a RankQuant instance supports.

Functions§

rankquant_eval_search
Standalone symmetric RankQuant-style eval search for arbitrary bit widths.
validate_candidate_ids
validate_flat_vectors_len

Type Aliases§

BitmapIndexDeprecated
RankIndexDeprecated
RankQuantIndexDeprecated
SignBitmapIndexDeprecated