Add experimental cuckoo_filter and binary_fuse_filter skip indexes#101796
Add experimental cuckoo_filter and binary_fuse_filter skip indexes#101796zex-hyd wants to merge 20 commits into
Conversation
|
Hi @alexey-milovidov @rschu1ze, This PR is related to my feature request #101795 (from Intern Tasks 2025/2026). Would appreciate feedback on the approach! Happy to adjust based on your suggestions. |
|
Workflow [PR], commit [05515d1] Summary: ❌
|
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
79e4994 to
a513b9c
Compare
@alexey-milovidov The stress test is failing again, I have verified that this PR is up to date with master! So it looks like a separate MSan issue (or flake) uncovered by stress, unrelated to the cuckoo_filter changes on our side! |
|
The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158 |
@alexey-milovidov Thanks again for the clarafication! This PR is also ready for review! |
|
@alexey-milovidov @rschu1ze This PR has passed all the checks and is ready to be review! Please feel free to add comments on the PR! Thank you! |
|
@zex-hyd We must only introduce new index types to ClickHouse if they are strictly better than the current ones in terms of performance and storage space. Please evaluate the Cuckoo filter indexes vs. Bloom filter indexes (ideally as systematically as possible) along these dimensions, you can use functional tests ( |
|
Hi @alexey-milovidov @rschu1ze! Could you add the pr-performance label to this PR? Without the pr-performance label we don’t get the amd performance report / all-query-metrics.tsv. We need that to compare our new cuckoo_filter_.xml runs against the existing bloom_filter_.xml and to document the time + secondary index size follow-up to the earlier review! |
@alexey-milovidov @rschu1ze Could you add the pr-performance label to this PR? Without the pr-performance label we don’t get the amd performance report / all-query-metrics.tsv. |
b2a3847 to
01abec6
Compare
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 1278/1854 (68.93%) | Lost baseline coverage: none · Uncovered code |
…y_fuse_filter Addresses the review request that the `JSONAllPaths` predicate contract be covered for the two new experimental skip-index types. The existing `04319_experimental_skip_index_json_isnotnull_nested` test only exercises the `isNotNull(json.path)` path-presence predicate; this adds the equality predicate (`json.path = value`) that the docs promise has the same coverage as `bloom_filter`. The test pins `index_granularity = 1` and `GRANULARITY 1` and, for each of `cuckoo_filter` and `binary_fuse_filter`, checks that a `force_data_skipping_indices` result equals the corresponding full scan for: - a present path with a non-default value (index is applicable), - an absent path (all granules can be skipped), - a present path via CAST to a non-nullable type, and - the default-value case `json.path::Int64 = 0`, where a missing path yields the type default and the index must NOT prune the granule. This is a false-negative-sensitive branch: a wrong decision would make `force_data_skipping_indices` drop granules where the path is present or absent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged current
The only remaining non-green check, The other AI-review item — refreshing the |
…ndex interface The recent merge of `master` into this branch introduced a build failure: `master` added a `StorageMetadataPtr metadata_snapshot` parameter as the first argument of both the `IMergeTreeIndex` constructor and the index factory `Creator`, but the new experimental `cuckoo_filter` and `binary_fuse_filter` index types still used the old signatures, so `MergeTreeIndexCuckooFilter.cpp`, `MergeTreeIndexBinaryFuseFilter.cpp` and `MergeTreeIndices.cpp` failed to compile (`no matching constructor for initialization of 'IMergeTreeIndex'` and `no viable conversion ... to 'Creator'`). Thread `metadata_snapshot` through the constructors and creators of both index types, mirroring the existing `MergeTreeIndexBloomFilter` / `bloomFilterIndexCreator`. This is a purely mechanical adaptation to the changed interface; no behavior change. Build failure: https://s3.amazonaws.com/clickhouse-test-reports/PRs/101796/1d4cea41ba2bfba9c879ac0cc56c74637741a01c/fast_test/build_clickhouse/build_clickhouse.log Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Changelog category (leave one):
Related Issue:
#101795 (Cuckoo Filter)
#104466 (Binary Fuse Filter)
Changelog entry:
Added experimental
cuckoo_filterandbinary_fuse_filterskip index types for MergeTree tables. Both are gated by experimental settings and documented as workload-dependent alternatives tobloom_filter, with stateless size-benchmark coverage; the binary-fuse CI benchmark also includes a tiny pruning sanity check.Documentation entry for user-facing changes
Summary
This PR implements and benchmarks two experimental probabilistic filter structures for MergeTree skip indexes:
cuckoo_filterandbinary_fuse_filter.Both structures are based on published research on compact approximate membership filters for large static sets. Benchmark results in the stateless tests show that they are not general replacements for
bloom_filterin ClickHouse's current skip-index model. In particular, at skip-indexGRANULARITY = 1and a common target FPR such as0.025, both filters are often larger thanbloom_filterbecause of fixed structural overhead on small per-index-granule key sets.The feature remains experimental and gated.
bloom_filterremains the recommended default skip index unless workload-specific benchmarks show an advantage for one of these filters.Implemented Index Types
1.
cuckoo_filterBased on Fan et al., "Cuckoo Filter: Practically Better Than Bloom" (ACM CoNEXT 2014), with secondary-index motivation from Kipf et al., "Cuckoo Index" (PVLDB 2020).
Parameters: Single argument — target false positive rate, for example
0.01.Experimental setting: Requires
allow_experimental_cuckoo_filter_index = 1.2.
binary_fuse_filterBased on Graf & Lemire, "Binary Fuse Filters: Fast and Smaller Than Xor Filters" (arXiv:2201.01174, 2022).
Parameters: Single argument — target false positive rate, for example
0.025.Experimental setting: Requires
allow_experimental_binary_fuse_filter_index = 1.Benchmark Results
Both filters were evaluated against
bloom_filterin04104_cuckoo_filter_benchmark.sqland04204_binary_fuse_filter_benchmark.sql.Methodology: 10,000 rows per table, table
index_granularity = 8192, skip-indexGRANULARITY = 1,OPTIMIZE FINAL, thensum(secondary_indices_uncompressed_bytes)fromsystem.parts. Each file covers 18 core scenarios: 6 cardinality regimes × 3 FPR values.04204adds representative CI-friendly extensions (lower FPR snapshot, skip-indexGRANULARITYsensitivity, distribution-sensitive cases, and a tiny pruning sanity check).Cuckoo Filter vs Bloom
At skip-index
GRANULARITY = 1andfpr = 0.025,cuckoo_filterwas larger thanbloom_filterin all six tested cardinality regimes (04104reference).Observed range at
fpr = 0.025: 1.4x–7.5x larger than Bloom.At the lower
fpr = 0.001(where the requested FPR is rounded up to the nearest supported fingerprint width),cuckoo_filterremains larger thanbloom_filterin every tested cardinality regime as well (04104reference):Binary Fuse Filter vs Bloom
At skip-index
GRANULARITY = 1andfpr = 0.025,binary_fuse_filterwas also larger thanbloom_filterin all six core cardinality regimes (04204reference).Observed range at
fpr = 0.025: 1.2x–6.0x larger than Bloom.However,
binary_fuse_filtercan be smaller in specific FPR regimes where its fingerprint width is favorable. For example, atfpr = 0.001:Additional representative CI-friendly coverage in
04204_binary_fuse_filter_benchmark.sql: lower-FPR snapshot (fpr = 0.005), skip-indexGRANULARITY = 1 / 8 / 64, distribution-sensitive scenarios, and a tiny hit/miss pruning sanity check.04104covers the full 18-scenario size matrix for cuckoo but does not include those extensions.Root Cause Analysis
The common issue is structural overhead in small-set regimes:
cuckoo_filter: per-index-granule bucket count rounds up to a power of two; fixed overhead from slot structure, load-factor requirements, and fingerprint-width choices cannot amortize well for small distinct-key counts.binary_fuse_filter: whole-byte fingerprints and power-of-two segment alignment introduce step changes in size. This can be favorable in some FPR regimes, but unfavorable at common settings such asfpr = 0.025.Both structures are designed for larger static sets where structural overhead is easier to amortize. ClickHouse skip indexes often operate on much smaller per-index-granule key sets, so the benefit is workload- and parameter-dependent.
Why keep this experimental implementation?
Although the MergeTree skip-index results are workload-dependent, these implementations provide reusable probabilistic filter structures that may be useful beyond the current skip-index model, for example for runtime filters or future index layouts with larger key sets.
The user-facing skip-index integrations remain experimental and gated, and the documentation describes the measured limitations instead of presenting these filters as general
bloom_filterreplacements.Test Files
04077_cuckoo_filter_skip_index.sql— focused cuckoo predicate coverage vsbloom_filter: scalar equality/!=/NOT IN, scalarIN(including a forced miss set),hasAny/hasAll, tuple expression indexes, and map key paths; EXPLAIN sanity for scalar equality04299_binary_fuse_filter_skip_index.sql— same focused predicate coverage forbinary_fuse_filter04319_experimental_skip_index_json_isnotnull_nested.sql— JSON path-presence correctness: bareisNotNull(json.path)uses the index; nested forms such asisNotNull(json.path) = 0do not04078_cuckoo_filter_skip_index_bad_args.sql— cuckoo bad-argument validation, experimental-setting gates, and unchanged-index ALTER behavior04205_binary_fuse_filter_bad_args.sql— same forbinary_fuse_filter04104_cuckoo_filter_benchmark.sql— full 18-scenario Cuckoo vs Bloom size benchmark (04104reference)04204_binary_fuse_filter_benchmark.sql— 18 core scenarios plus representative CI-friendly extensions for Binary Fuse vs Bloom04103_cuckoo_filter_vs_bloom_filter_equivalence.sql— PREWHERE scalar-equality count equivalence vsbloom_filter04203_binary_fuse_filter_vs_bloom_filter_equivalence.sql— same forbinary_fuse_filterBoth benchmark files use the same core methodology: same schema/data pattern,
OPTIMIZE FINAL, then readingsecondary_indices_uncompressed_bytesfromsystem.parts.Predicates Covered
The implemented index-condition paths follow existing
bloom_filterskip-index semantics where applicable, while unsafe or unsupported forms fall back conservatively to avoid false negatives.Covered paths exercised by the stateless tests include:
=)IN(including a forced miss set in04077/04299)hasAnyandhasAllx = … AND y = …on a tuple index)m['key'] = …with amapKeys(m)index)isNotNull(json.path)(04319)Not claimed as index-backed in tests:
isNotNull(json.path) = 0!=,NOT IN— compared againstbloom_filterfor result parity, but not forced through the skip indexMore complex boolean forms should fall back to normal scanning rather than being forced through the skip index.
Example Usage