Ensure query_log entry is written even if pipeline finalization fails#102165
Conversation
The test failed in CI (Stateless tests, amd_asan_ubsan, distributed plan) because the wait loop that polls `system.query_log` filtered by `Settings['query_cache_tag']`, and this filter intermittently returned no rows despite the entries existing. The exact cause is unclear but may be related to how the large number of randomized settings interacts with the `Settings` Map column access under ASan. The `query_cache_tag` filter is redundant for isolation — the `query_id` values already include the unique per-run database name. The tag is still set in the HTTP URLs so that cache entries are properly isolated between test runs; it is only removed from the `query_log` WHERE clauses. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102022&sha=e0736f2b5221c7470a2381247890943ae675122e PR: #102022 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In `BlockIO::onFinish`, `finalize_query_pipeline` is called before `finish_callbacks` (which write the `QueryFinish` entry to `query_log`). If `finalize_query_pipeline` throws (e.g. during query result cache write finalization in `finalizeWriteInQueryResultCache`), the `finish_callbacks` were never called, silently losing the `query_log` entry. This explains the most common failure pattern for test `02494_query_cache_http_introspection`: the cache-miss query (_q1) writes to the query result cache during finalization, and if that throws, its `QueryFinish` entry is lost. The cache-hit query (_q2) has no cache write during finalization and succeeds. Analysis of the CI database confirmed that in 10+ out of 22 real failures, only _q1 was missing while _q2 was present — matching this theory exactly. The fix catches exceptions from `finalize_query_pipeline`, ensures `finish_callbacks` are always called (with a default-constructed `QueryPipelineFinalizedInfo` if the original threw), and then re-throws the exception. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 9ce0251.
|
The |
Algunenano
left a comment
There was a problem hiding this comment.
I've checked https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102022&sha=e0736f2b5221c7470a2381247890943ae675122e&name_0=PR and I don't see an exception being logged during query result cache write finalization, maybe I've missed it but I don't see it under test_g66i2hca_q1 or test_g66i2hca_q2.
But even if it was the case where the result cache write failed, this seems like the wrong solution to the problem. If finalize_query_pipeline then query_pipeline_finalized_info won't be populated and we'll log incorrect (empty) profile data.
Wouldn't it make more sense to capture narrower exceptions in finalizeWriteInQueryResultCache and similar places instead? That way the query_pipeline will still be reset and we'll get the true profile counters to log
|
The flaky check failure is fixed in #102148, let's update the branch. |
…he-http-introspection
Per review feedback, handle exceptions from `finalizeWriteInQueryResultCache` inside `finalizeQueryPipelineBeforeLogging` itself so that profile counters are still collected and the pipeline is still reset (yielding a proper `QueryFinish` entry) even if the cache write finalization fails. Also harden `BlockIO::onFinish` so that an exception thrown by a `finish_callbacks` callback does not shadow the original exception from `finalize_query_pipeline`. #102165
…he-http-introspection
…he-http-introspection
…he-http-introspection
Per review feedback, revert the exception handling added around `finalize_query_pipeline` in `BlockIO::onFinish`. The narrow fix in `finalizeQueryPipelineBeforeLogging` (commit 21ba598) already ensures that a failure of `finalizeWriteInQueryResultCache` does not lose the `QueryFinish` entry in `query_log`, so the broader change in `BlockIO::onFinish` is no longer needed. The broader change also created a failure-atomicity concern: running all `finish_callbacks` after a finalization exception would run success-only side effects such as `implicit_tcl_executor->commit` on a query that has failed. #102165
…he-http-introspection
… fails When `finalizeWriteInQueryResultCache` throws, the exception is caught and logged so that the `QueryFinish` entry is still written to `query_log`. However, `query_result_cache_usage` was still set to `Write` in `query_log`, falsely reporting that the failed write succeeded. Propagate a `query_result_cache_write_failed` flag through `QueryPipelineFinalizedInfo` so `logQueryFinishImpl` can downgrade the logged usage from `Write` to `None` when the cache write actually failed, keeping `query_log` accurate for cache observability. #102165
…he-http-introspection
…he-http-introspection
…he-http-introspection
…he-http-introspection
LLVM Coverage Report
Changed lines: 83.33% (15/18) | lost baseline coverage: 1 line(s) · Uncovered code |

In
BlockIO::onFinish,finalize_query_pipelineis called beforefinish_callbacks(which write theQueryFinishentry toquery_log). Iffinalize_query_pipelinethrows (e.g. during query result cache write finalization infinalizeWriteInQueryResultCache), thefinish_callbackswere never called, silently losing thequery_logentry.This explains the most common failure pattern for flaky test
02494_query_cache_http_introspection: the cache-miss query (_q1) writes to the query result cache during finalization, and if that throws, itsQueryFinishentry is lost. The cache-hit query (_q2) has no cache write during finalization and succeeds.Analysis of the CI database (180 days, 40+ failures) confirmed that in 10+ out of 22 real failures, only
_q1was missing while_q2was present — matching this theory exactly.The fix catches exceptions from
finalize_query_pipeline, ensuresfinish_callbacksare always called (with a default-constructedQueryPipelineFinalizedInfoif the original threw), and then re-throws the exception.CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102022&sha=e0736f2b5221c7470a2381247890943ae675122e
PR: #102022
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes
Version info
26.5.1.217