Finer-grained subquery cache propagation control by vvo · Pull Request #99804 · ClickHouse/ClickHouse · GitHub
Skip to content

Finer-grained subquery cache propagation control#99804

Merged
alexey-milovidov merged 1 commit into
ClickHouse:masterfrom
vvo:subquery-cache-propagation
May 19, 2026
Merged

Finer-grained subquery cache propagation control#99804
alexey-milovidov merged 1 commit into
ClickHouse:masterfrom
vvo:subquery-cache-propagation

Conversation

@vvo

@vvo vvo commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

Implements finer-grained subquery cache propagation control, as requested in the review of #76252 by @alexey-milovidov.

use_query_cache on the outer query no longer auto-propagates to subqueries. Three rules implemented in shouldUseQueryCacheForSubquery:

  1. No auto-propagation by defaultuse_query_cache on the outer query does NOT propagate to subqueries
  2. Explicit per-subquery opt-inSELECT * FROM (SELECT ... SETTINGS use_query_cache = true) enables caching for that specific subquery
  3. Bulk propagationquery_cache_for_subqueries = true enables propagation into all subqueries

Also fixes two issues from #76252:

  • Config key mismatch: the rename from QueryCacheQueryResultCache changed keys in Context.cpp but not in config files, breaking SYSTEM RELOAD CONFIG for cache settings
  • All 03381_* tests now explicitly SET enable_analyzer = 1 since Planner-level subquery caching is new-analyzer only

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Individual subquery results can now be cached independently using SETTINGS use_query_cache = true on specific subqueries, without caching the entire outer query. A new setting query_cache_for_subqueries = true enables bulk propagation of use_query_cache into all subqueries. Note: use_query_cache on the outer query no longer auto-propagates to subqueries.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

A new "Subquery Caching" section has been added to the query cache documentation explaining the three propagation modes with examples.

Version info

  • Merged into: 26.5.1.844

@vvo vvo marked this pull request as ready for review March 17, 2026 20:26
@vvo vvo mentioned this pull request Mar 17, 2026
1 task
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Mar 18, 2026
@clickhouse-gh

clickhouse-gh Bot commented Mar 18, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Mar 18, 2026
@alexey-milovidov

Copy link
Copy Markdown
Member

Thanks! This is exactly what we need!
Please check test failures and update accordingly.

@vvo vvo force-pushed the subquery-cache-propagation branch from d6f0b0a to 107a40c Compare March 18, 2026 10:49
@vvo

vvo commented Mar 18, 2026

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov Here's the status on implementing your feedback for finer-grained subquery
cache propagation.

The three rules from your feedback are implemented in
shouldUseQueryCacheForSubquery:

  1. No auto-propagation by default
  2. Explicit SETTINGS use_query_cache = true on a subquery enables caching for that subquery
  3. query_cache_for_subqueries = true enables bulk propagation

Along the way we also fixed two issues from the base PR (#76252):

  • The rename from QueryCacheQueryResultCache changed config keys in
    Context.cpp from query_cache.* to query_result_cache.*, but config files still use query_cache.*
    — so SYSTEM RELOAD CONFIG was silently ignored. This broke test_query_cache_size_is_runtime_configurable.
  • All 03381_* tests now explicitly SET enable_analyzer = 1 since the Planner-level subquery caching is new-analyzer only.

Does the overall approach look right to you?

Comment thread src/Interpreters/PreparedSets.cpp Outdated
Comment thread src/Core/SettingsChangesHistory.cpp
Comment thread src/Interpreters/ExecuteScalarSubqueriesVisitor.cpp Outdated
Comment thread src/Planner/Planner.cpp Outdated
Comment thread src/Core/Settings.cpp
Comment thread src/Interpreters/Cache/QueryResultCache.h
@clickhouse-gh clickhouse-gh Bot added the manual approve Manual approve required to run CI label Mar 19, 2026
@rschu1ze rschu1ze self-assigned this Mar 19, 2026
@vvo

vvo commented Mar 20, 2026

Copy link
Copy Markdown
Contributor Author

@rschu1ze @alexey-milovidov I think we're done here, let me know!

Comment thread src/Interpreters/Cache/QueryResultCache.cpp
@vvo

vvo commented Apr 1, 2026

Copy link
Copy Markdown
Contributor Author

@rschu1ze @alexey-milovidov do we still want this?

@rschu1ze

rschu1ze commented Apr 1, 2026

Copy link
Copy Markdown
Member

Yes. Haven't had time to check yet, sorry.

Comment thread src/Interpreters/Cache/QueryResultCache.cpp Outdated
@alexey-milovidov

Copy link
Copy Markdown
Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

@alexey-milovidov

Copy link
Copy Markdown
Member

The failures of "Flaky check" in "functions_bad_arguments" will be fixed by #101994.

Comment thread src/Planner/Planner.cpp Outdated
Comment thread src/Interpreters/Cache/QueryResultCache.cpp Outdated
alexey-milovidov added a commit to vvo/ClickHouse that referenced this pull request May 11, 2026
Address review feedback on PR ClickHouse#99804. The error messages thrown by
`checkCanWriteQueryResultCache` for queries containing non-deterministic
functions or system tables had the awkward "regardless or to omit caching"
phrasing. Reword to "regardless, or omit caching" for clarity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexey-milovidov added a commit to vvo/ClickHouse that referenced this pull request May 11, 2026
Address review feedback on PR ClickHouse#99804. `removeTableAliases` used
`starts_with("__table")` to strip planner-generated table aliases from
the AST before computing the subquery cache key. Any user-visible
identifier starting with the same prefix (e.g. `__table_prod`) was
incorrectly normalized away, so a query like
`SELECT x FROM __table_prod` could collide with `SELECT x FROM prod` in
the subquery cache and return wrong results.

The analyzer's `createUniqueAliasesIfNecessary` generates aliases of the
exact form `__table<N>` where `N` is a non-empty sequence of digits
(`__table1`, `__table2`, etc.). Restrict the alias-stripping check to
that exact pattern via a new `isPlannerGeneratedTableAlias` helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexey-milovidov added a commit to vvo/ClickHouse that referenced this pull request May 17, 2026
…vior

After commit abf0a27 gated `query_cache_for_subqueries` Planner-level
caching on `is_subquery`, the top-level query is no longer recorded under the
`is_subquery = 1` namespace — only actual subqueries are. The 03381_* test
references and SQL comments were still asserting the old "outer + inner" count
and so the Fast test was failing on all 10 affected tests.

- Update test references to match the new actual outputs.
- Update misleading SQL comments that referred to the outer node as a subquery.
- Move `query_cache_for_subqueries` from the `26.4` to the `26.5` section of
  `SettingsChangesHistory.cpp` so the upgrade-from-26.4 check in
  `02995_new_settings_history` recognises it.

CI report: ClickHouse#99804

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov removed the manual approve Manual approve required to run CI label May 18, 2026
Comment thread src/Core/SettingsChangesHistory.cpp Outdated
Comment thread src/Interpreters/PreparedSets.cpp
Comment thread src/Core/SettingsChangesHistory.cpp Outdated
}

IASTHash calculateASTHash(ASTPtr ast, const String & current_database, const Settings & settings)
IASTHash calculateAstHash(ASTPtr ast, const String & current_database, const Settings & settings, const bool is_subquery, const bool already_cloned = false)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our code style demands capital letters for every abbreviated word. AST, not Ast.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in 47aecf5calculateAstHashcalculateASTHash and calculateAstHashAndQueryStringcalculateASTHashAndQueryString.

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few rough edges, but mostly good.

@alexey-milovidov alexey-milovidov self-assigned this May 18, 2026
Comment thread src/Planner/Planner.cpp
Comment thread src/Interpreters/Cache/QueryResultCache.cpp
@CLAassistant

CLAassistant commented May 19, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

alexey-milovidov added a commit to vvo/ClickHouse that referenced this pull request May 19, 2026
Address review feedback on PR ClickHouse#99804. The error messages thrown by
`checkCanWriteQueryResultCache` for queries containing non-deterministic
functions or system tables had the awkward "regardless or to omit caching"
phrasing. Reword to "regardless, or omit caching" for clarity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexey-milovidov added a commit to vvo/ClickHouse that referenced this pull request May 19, 2026
Address review feedback on PR ClickHouse#99804. `removeTableAliases` used
`starts_with("__table")` to strip planner-generated table aliases from
the AST before computing the subquery cache key. Any user-visible
identifier starting with the same prefix (e.g. `__table_prod`) was
incorrectly normalized away, so a query like
`SELECT x FROM __table_prod` could collide with `SELECT x FROM prod` in
the subquery cache and return wrong results.

The analyzer's `createUniqueAliasesIfNecessary` generates aliases of the
exact form `__table<N>` where `N` is a non-empty sequence of digits
(`__table1`, `__table2`, etc.). Restrict the alias-stripping check to
that exact pattern via a new `isPlannerGeneratedTableAlias` helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexey-milovidov added a commit to vvo/ClickHouse that referenced this pull request May 19, 2026
Address review feedback on PR ClickHouse#99804. `shouldUseQueryCacheForSubquery`
was returning `true` for the top-level query when both
`use_query_cache = 1` and `query_cache_for_subqueries = 1` were set.
That caused the top-level query to be cached twice: once by
`executeQuery` under the `is_subquery = false` key, and once again by
the Planner under the `is_subquery = true` key.

Gate the `query_cache_for_subqueries` propagation branch on
`is_subquery` so only real subqueries enter the Planner-level cache
namespace, matching the setting's contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alexey-milovidov added a commit to vvo/ClickHouse that referenced this pull request May 19, 2026
…vior

After commit abf0a27 gated `query_cache_for_subqueries` Planner-level
caching on `is_subquery`, the top-level query is no longer recorded under the
`is_subquery = 1` namespace — only actual subqueries are. The 03381_* test
references and SQL comments were still asserting the old "outer + inner" count
and so the Fast test was failing on all 10 affected tests.

- Update test references to match the new actual outputs.
- Update misleading SQL comments that referred to the outer node as a subquery.
- Move `query_cache_for_subqueries` from the `26.4` to the `26.5` section of
  `SettingsChangesHistory.cpp` so the upgrade-from-26.4 check in
  `02995_new_settings_history` recognises it.

CI report: ClickHouse#99804

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov force-pushed the subquery-cache-propagation branch from 4de2595 to ca58e56 Compare May 19, 2026 04:28
Implements finer-grained subquery cache propagation control, as requested
in the review of ClickHouse#76252.

`use_query_cache` on the outer query no longer auto-propagates to subqueries.
Three rules implemented in `shouldUseQueryCacheForSubquery`:

1. No auto-propagation by default: `use_query_cache` on the outer query does
   not propagate to subqueries.
2. Explicit per-subquery opt-in:
   `SELECT * FROM (SELECT ... SETTINGS use_query_cache = true)` enables caching
   for that specific subquery.
3. Bulk propagation: `query_cache_for_subqueries = true` enables propagation
   into all subqueries.

Also fixes two issues from ClickHouse#76252:
- Config key mismatch: the rename `QueryCache` -> `QueryResultCache` changed
  keys in `Context.cpp` but not in config files, breaking
  `SYSTEM RELOAD CONFIG` for cache settings.
- All `03381_*` tests now explicitly `SET enable_analyzer = 1` since
  Planner-level subquery caching is new-analyzer only.

Co-authored-by: Баранник Никита Павлович <npbarannik@edu.hse.ru>
Co-authored-by: Robert Schulze <robert@clickhouse.com>
Co-authored-by: Alexey Milovidov <milovidov@clickhouse.com>
@alexey-milovidov alexey-milovidov force-pushed the subquery-cache-propagation branch from ca58e56 to c494ad1 Compare May 19, 2026 04:39
Comment thread src/Planner/Planner.cpp
/// When should_cache is true but the outer query didn't set use_query_cache (explicit subquery opt-in),
/// skip the context flag check in checkCanWriteQueryResultCache while still respecting safety checks.
bool skip_context_check = should_cache && !can_use_query_result_cache;
if (should_cache && checkCanWriteQueryResultCache(ast, query_context, skip_context_check))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This write path can cache subquery results even when the outer query has use_query_cache = 0 (explicit per-subquery opt-in uses skip_context_check = true). In that case we bypass the non-throw overflow-mode guard that only exists in executeQuery.

That reintroduces the truncated-result risk from bug 67476 for subqueries: a subquery can run with *_overflow_mode != 'throw', produce partial output, and still be written into the cache.

Could we apply the same overflow-mode precondition here (or inside checkCanWriteQueryResultCache) before adding StreamInQueryResultCacheStep?

@clickhouse-gh

clickhouse-gh Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.20% +0.10%
Functions 91.40% 91.40% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 94.38% (302/320) | lost baseline coverage: 9 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov disabled auto-merge May 19, 2026 15:59
@alexey-milovidov alexey-milovidov merged commit 439aa63 into ClickHouse:master May 19, 2026
164 of 165 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 19, 2026
@clickgapai

Copy link
Copy Markdown
Contributor

Hi @vvo @alexey-milovidov — while reviewing this PR I found the following:

Happy to discuss — close anything that's wrong or already addressed.

@clickgapai

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants