Implemented SIEVE eviction in cache framework#62756
Conversation
|
This is an automatic comment. The PR descriptions does not match the template. Please, edit it accordingly. The error is: More than one changelog category specified: 'Improvement', 'Performance Improvement' |
1 similar comment
|
This is an automatic comment. The PR descriptions does not match the template. Please, edit it accordingly. The error is: More than one changelog category specified: 'Improvement', 'Performance Improvement' |
|
This is an automated comment for commit 21fce56 with description of existing statuses. It's updated for the latest CI running ✅ Click here to open a full report in a separate page Successful checks
|
|
Let's add the cache policy to randomization in our functional tests. |
kssenii
left a comment
There was a problem hiding this comment.
You can add randomization of cache policies in CI similar to
ClickHouse/docker/test/stress/run.sh
Lines 73 to 88 in 38a0edd
cache_policy setting in storage_conf.xml you should change the mark_cache_policy in another server config. We do not yet have it explicitly enabled in CI, so you'll also need to add a file with this setting to ClickHouse/tests/configs/config.d/. Also do not forget to add an according file installation into ClickHouse/tests/config/install.sh
|
reopen |
|
@Vectorshine, what happened? |
|
@Vectorshine Thanks for checking. There is no need to close and re-open this PR over and over 😄 I just started CI to see how the tests respond. |
|
Dear @antonio2368, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
|
Dear @antonio2368, you haven't been active on this PR for 30 days. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
# Conflicts: # src/Common/CacheBase.h # tests/config/install.sh # tests/docker_scripts/stateless_runner.sh # tests/docker_scripts/stress_runner.sh
Mark the `assertQueue` test helper as `static` so it does not trigger `-Werror,-Wmissing-prototypes`, and take `expected_keys` by const reference. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Workflow [PR], commit [5f8cf6a] Summary: ✅
AI ReviewSummaryThis PR adds Missing context / blind spots
Final VerdictStatus: ✅ Approve |
List the supported cache eviction policies (`LRU`, `SLRU`, `SIEVE`) in the descriptions of the `uncompressed_cache_policy`, `mark_cache_policy`, `index_uncompressed_cache_policy` and `index_mark_cache_policy` server settings, and in the documentation of the filesystem cache `cache_policy` setting. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The cache-policy randomization joined two `s|...|...|` substitutions into a single `sed` expression with `;`, but dropped the closing `|` of the first substitution, producing `sed: -e expression ClickHouse#1, char 94: unknown option to 's'` whenever a non-default policy (`LRU` or `SIEVE`) was selected for the mark and uncompressed caches. Split them into two `-e` expressions in both `ci/jobs/functional_tests.py` and `tests/docker_scripts/stress_runner.sh`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The `CacheBase` constructor ran `LOG_TRACE(getLogger("CacheBase"), ...)`.
Some caches are function-local `static` objects, so the constructor runs
during their lazy initialization; logging from there can re-enter the same
initialization, and with `send_logs_level=trace` `clickhouse-local` aborted
with `libc++abi: __cxa_guard_acquire detected recursive initialization`,
hanging the process. This caused `04240_80087_clickhouse_local_logs_on_exception`
to time out in the Fast test.
Drop the log line (and the now-unused `logger_useful.h` include); `master`
never logged the cache policy here.
Reproduced and verified locally: before, `clickhouse local --send_logs_level=trace
-q "SELECT * FROM nonexistent_table"` hung; after, it prints the buffered logs
and the exception and exits.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The filesystem cache (`storage_conf*.xml` `cache_policy`) is backed by `FileCachePolicy`, which only supports `LRU` and `SLRU` (and their overcommit variants) — not `SIEVE`. Setting it to `SIEVE` made the server fail to start with `Code: 36 ... Unexpected value of FileCachePolicy: 'SIEVE' (BAD_ARGUMENTS)`, which surfaced as a `Start ClickHouse Server` failure in Bugfix validation. `SIEVE` is only implemented for the `CacheBase`-backed caches (mark and uncompressed caches in `cache_policy.xml`). Randomize the filesystem cache policy only across `LRU`/`SLRU`, using a separate variable from the mark and uncompressed cache policy (which keeps `LRU`/`SLRU`/`SIEVE`), in both `ci/jobs/functional_tests.py` and `tests/docker_scripts/stress_runner.sh`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ix validation The install-time cache-policy randomization rewrote every `<cache_policy>LRU</cache_policy>` in `storage_conf*.xml` to the randomly chosen filesystem cache policy. This broke filesystem-cache tests that pin a specific policy: `02944_dynamically_change_filesystem_cache_size` (which has a dedicated `s3_cache_02944_lru` disk and tests both `LRU` and `SLRU` behaviour) and `03032_dynamically_resize_filesystem_cache_2` (which assumes `LRU` eviction sizes). Both fail with "result differs with reference" only on this PR and never on master. `SIEVE` is only implemented for the `CacheBase`-backed caches (mark, uncompressed and the index/page/query-condition caches in `cache_policy.xml`), not for the filesystem cache (`storage_conf*.xml`, backed by `FileCachePolicy`). Randomizing the filesystem cache policy therefore exercises nothing related to this feature while breaking the tests above. Stop randomizing the filesystem cache policy entirely and randomize only the mark and uncompressed cache policy across `LRU`/`SLRU`/`SIEVE`. In addition, during bugfix validation the install configs are used with downloaded master-side binaries that do not contain the `SIEVE` branch in `CacheBase`, so writing `<mark_cache_policy>SIEVE</mark_cache_policy>` would make the server fail to start before the validation test runs. Exclude `SIEVE` from the randomization when running bugfix validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adding `SIEVE` to the shared `CacheBase` constructor makes it accepted by every `CacheBase`-backed server setting, but only `mark_cache_policy`, `uncompressed_cache_policy`, `index_mark_cache_policy` and `index_uncompressed_cache_policy` listed the new value. The remaining `CacheBase`-backed policy settings (`unique_key_index_cache_policy`, `primary_index_cache_policy`, `iceberg_metadata_files_cache_policy`, `parquet_metadata_cache_policy`, `vector_similarity_index_cache_policy`, `text_index_tokens_cache_policy`, `text_index_header_cache_policy`, `text_index_postings_cache_policy`, `page_cache_policy` and `query_condition_cache_policy`) now also accept `SIEVE` but did not document it; `unique_key_index_cache_policy` even stated "(SLRU or LRU)". Update their descriptions to list the actual accepted values `LRU`, `SLRU`, `SIEVE`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`system.filesystem_cache_settings.cache_policy` describes the filesystem cache, which is backed by `FileCachePolicy`/`FileCache` and accepts only `LRU`/`SLRU` (and their overcommit variants), not `SIEVE`. Listing `SIEVE` here documented a value the server rejects, so revert the row to its previous wording. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed fixes for the CI failures and the AI-review "Request changes" items (commits CI failures ( Bugfix validation — Documentation surface — Earlier review threads — the algorithm correctness and the requested step-by-step eviction test are addressed by the reworked |
# Conflicts: # src/Core/ServerSettings.cpp
In `SIEVECachePolicy::set`, `removeOverflow` was always called with
`ignore_last_element = true`, but `set` reaches this call for updates as
well as insertions. The `ignore_last_element` flag protects the
just-inserted tail entry from immediate eviction by rewinding the hand to
`queue.begin`. On an update no new tail is appended, so passing `true`
makes the policy skip whatever entry the hand currently points at (even an
unrelated tail) and grant it the new-entry exemption, deviating from the
canonical `SIEVE` algorithm.
For example, with queue `{1, 3, 5, 7, 11}` and the hand at the tail `11`,
an update of key `1` that overflows the cache evicted key `3`, while the
canonical `SIEVE` hand inspects and evicts key `11` first.
Pass `inserted` instead, so the tail is protected only when a new entry was
actually appended. Add a unit test reproducing the scenario.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged Addressed the open review finding on The only red on the previous run was |
# Conflicts: # src/Core/ServerSettings.cpp
LLVM Coverage Report |

Implements the
SIEVEcache eviction policy in the cache framework (CacheBase), available alongside the existingLRUandSLRUpolicies.SIEVEis a simple, scan-resistant eviction algorithm: a single "hand" walks a FIFO queue of entries and evicts the first entry whosevisitedflag is clear, clearing the flag as it passes set entries. See the paper for details: https://junchengyang.com/publication/nsdi24-SIEVE.pdfThis continues the original work by @Vectorshine, brought up to date with
master:SIEVECachePolicyto match the currentICachePolicyinterface (CurrentMetricstracking,maxCount,contains, per-entryOnRemoveEntryFunction), mirroringLRUCachePolicy.removewhere the stored queue iterator was advanced in place and the wrong node was erased.gtest_sieve_cache.cpp, including a step-by-step test of the eviction logic.LRU,SLRUandSIEVEin stateless functional tests (ci/jobs/functional_tests.py) and the stress test (tests/docker_scripts/stress_runner.sh).SIEVEis only implemented for theCacheBase-backed caches, so the filesystem cache (backed byFileCachePolicy) is not randomized to it, and it is excluded during bugfix validation, which runs downloaded master-side binaries.Closes: #58910
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Added a new cache eviction policy
SIEVE. It can be selected for the in-memory caches backed byCacheBase, such as the mark cache and the uncompressed cache, via the corresponding*_cache_policyserver settings (for example,mark_cache_policyanduncompressed_cache_policy). It is not available for the filesystem cache.Documentation entry for user-facing changes