Ensure query_log entry is written even if pipeline finalization fails by alexey-milovidov · Pull Request #102165 · ClickHouse/ClickHouse · GitHub
Skip to content

Ensure query_log entry is written even if pipeline finalization fails#102165

Merged
alexey-milovidov merged 17 commits into
masterfrom
fix-02494-query-cache-http-introspection
May 1, 2026
Merged

Ensure query_log entry is written even if pipeline finalization fails#102165
alexey-milovidov merged 17 commits into
masterfrom
fix-02494-query-cache-http-introspection

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Apr 9, 2026

Copy link
Copy Markdown
Member

In BlockIO::onFinish, finalize_query_pipeline is called before finish_callbacks (which write the QueryFinish entry to query_log). If finalize_query_pipeline throws (e.g. during query result cache write finalization in finalizeWriteInQueryResultCache), the finish_callbacks were never called, silently losing the query_log entry.

This explains the most common failure pattern for flaky test 02494_query_cache_http_introspection: the cache-miss query (_q1) writes to the query result cache during finalization, and if that throws, its QueryFinish entry is lost. The cache-hit query (_q2) has no cache write during finalization and succeeds.

Analysis of the CI database (180 days, 40+ failures) confirmed that in 10+ out of 22 real failures, only _q1 was missing while _q2 was present — matching this theory exactly.

The fix catches exceptions from finalize_query_pipeline, ensures finish_callbacks are always called (with a default-constructed QueryPipelineFinalizedInfo if the original threw), and then re-throws the exception.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102022&sha=e0736f2b5221c7470a2381247890943ae675122e
PR: #102022

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.5.1.217

alexey-milovidov and others added 3 commits April 9, 2026 06:33
The test failed in CI (Stateless tests, amd_asan_ubsan, distributed plan)
because the wait loop that polls `system.query_log` filtered by
`Settings['query_cache_tag']`, and this filter intermittently returned
no rows despite the entries existing. The exact cause is unclear but
may be related to how the large number of randomized settings interacts
with the `Settings` Map column access under ASan.

The `query_cache_tag` filter is redundant for isolation — the `query_id`
values already include the unique per-run database name. The tag is still
set in the HTTP URLs so that cache entries are properly isolated between
test runs; it is only removed from the `query_log` WHERE clauses.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102022&sha=e0736f2b5221c7470a2381247890943ae675122e
PR: #102022

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In `BlockIO::onFinish`, `finalize_query_pipeline` is called before
`finish_callbacks` (which write the `QueryFinish` entry to `query_log`).
If `finalize_query_pipeline` throws (e.g. during query result cache
write finalization in `finalizeWriteInQueryResultCache`), the
`finish_callbacks` were never called, silently losing the `query_log`
entry.

This explains the most common failure pattern for test
`02494_query_cache_http_introspection`: the cache-miss query (_q1)
writes to the query result cache during finalization, and if that
throws, its `QueryFinish` entry is lost. The cache-hit query (_q2)
has no cache write during finalization and succeeds. Analysis of the
CI database confirmed that in 10+ out of 22 real failures, only _q1
was missing while _q2 was present — matching this theory exactly.

The fix catches exceptions from `finalize_query_pipeline`, ensures
`finish_callbacks` are always called (with a default-constructed
`QueryPipelineFinalizedInfo` if the original threw), and then
re-throws the exception.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Apr 9, 2026
Comment thread src/QueryPipeline/BlockIO.cpp
@alexey-milovidov

Copy link
Copy Markdown
Member Author

The Fast test (arm_darwin) failure is unrelated to this PR — the fix is in #102184.

@Algunenano Algunenano self-assigned this Apr 9, 2026

@Algunenano Algunenano left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102022&sha=e0736f2b5221c7470a2381247890943ae675122e&name_0=PR and I don't see an exception being logged during query result cache write finalization, maybe I've missed it but I don't see it under test_g66i2hca_q1 or test_g66i2hca_q2.

But even if it was the case where the result cache write failed, this seems like the wrong solution to the problem. If finalize_query_pipeline then query_pipeline_finalized_info won't be populated and we'll log incorrect (empty) profile data.

Wouldn't it make more sense to capture narrower exceptions in finalizeWriteInQueryResultCache and similar places instead? That way the query_pipeline will still be reset and we'll get the true profile counters to log

@alexey-milovidov

Copy link
Copy Markdown
Member Author

The flaky check failure is fixed in #102148, let's update the branch.

Per review feedback, handle exceptions from `finalizeWriteInQueryResultCache`
inside `finalizeQueryPipelineBeforeLogging` itself so that profile counters
are still collected and the pipeline is still reset (yielding a proper
`QueryFinish` entry) even if the cache write finalization fails.

Also harden `BlockIO::onFinish` so that an exception thrown by a
`finish_callbacks` callback does not shadow the original exception from
`finalize_query_pipeline`.

#102165
Comment thread src/QueryPipeline/BlockIO.cpp
Per review feedback, revert the exception handling added around
`finalize_query_pipeline` in `BlockIO::onFinish`. The narrow fix in
`finalizeQueryPipelineBeforeLogging` (commit 21ba598) already ensures
that a failure of `finalizeWriteInQueryResultCache` does not lose the
`QueryFinish` entry in `query_log`, so the broader change in
`BlockIO::onFinish` is no longer needed.

The broader change also created a failure-atomicity concern: running
all `finish_callbacks` after a finalization exception would run
success-only side effects such as `implicit_tcl_executor->commit` on
a query that has failed.

#102165
Comment thread src/Interpreters/executeQuery.cpp
… fails

When `finalizeWriteInQueryResultCache` throws, the exception is caught
and logged so that the `QueryFinish` entry is still written to
`query_log`. However, `query_result_cache_usage` was still set to
`Write` in `query_log`, falsely reporting that the failed write
succeeded.

Propagate a `query_result_cache_write_failed` flag through
`QueryPipelineFinalizedInfo` so `logQueryFinishImpl` can downgrade the
logged usage from `Write` to `None` when the cache write actually
failed, keeping `query_log` accurate for cache observability.

#102165
@clickhouse-gh

clickhouse-gh Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 83.33% (15/18) | lost baseline coverage: 1 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov

Copy link
Copy Markdown
Member Author

@alexey-milovidov alexey-milovidov dismissed Algunenano’s stale review April 29, 2026 16:53

Addressed the review.

@alexey-milovidov alexey-milovidov added this pull request to the merge queue May 1, 2026
Merged via the queue into master with commit dbb85b7 May 1, 2026
325 of 326 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-02494-query-cache-http-introspection branch May 1, 2026 05:22
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants