Allowlist throwIf upgrade-check errors from 04341_broken_mutation_part_log_flush#108856
Conversation
…part_log_flush` The stateless test `04341_broken_mutation_part_log_flush` runs an intentionally-failing async mutation `ALTER TABLE ... UPDATE x = x + throwIf(1) WHERE 1` (expecting `serverError UNFINISHED`) to assert that the failed `MutatePart` `part_log` entry is flushed before the failure is reported to the client. After the upgrade restart the broken mutation is retried in the background and logs `Value passed to 'throwIf' function is non-zero` (`FUNCTION_THROW_IF_VALUE_IS_NON_ZERO`, Code: 395) to `clickhouse-server.upgrade.log`, which fails the `Upgrade check` with `Error message in clickhouse-server.log (see upgrade_error_messages.txt)`. This is the expected test-induced error, not a backward incompatibility or a regression - exactly the same class as the `CANNOT_PARSE_TEXT` mutation errors already allowlisted (issue #39174). Note the table uses a plain `MergeTree`, so the error is emitted by `MutatePlainMergeTreeTask` and is not covered by the existing `MutateFromLogEntryTask` exclusion. `throwIf` only raises when a query explicitly calls it with a truthy argument, so the message can only come from such a test, never from an upgrade incompatibility; the narrow message substring also covers the wrapping `MergeTreeBackgroundExecutor` line. The failure is not caused by any product change - it surfaces on any pull request that runs the Upgrade check. Observed master-wide on ~19 distinct pull requests over the four days 2026-06-26..2026-06-29 (confirmed via play.clickhouse.com `checks` history), e.g. PR #108657. This mirrors the earlier allowlist additions for `04338` ('x'), `01414` ('Hello') and `00834` ('a'/'b'). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…icated family
The narrow allowlisted message matches the `throwIf` mutation error from
both `04341_broken_mutation_part_log_flush` (plain `MergeTree`,
`MutatePlainMergeTreeTask`) and the
`02597_column_{update,delete,update_tricky_expression}_and_replication`
tests (`ReplicatedMergeTree`, where only the first `MutateFromLogEntryTask`
line was already excluded and the wrapping `MergeTreeBackgroundExecutor`
line leaked). Matching the message rather than the task type covers all
leaking lines. No behavior change beyond the comment; the rg entry is
unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review feedback from `clickhouse-gh`: the previous fixed-string exclusion `-e "Value passed to 'throwIf' function is non-zero"` dropped every `<Error>` line containing that default message globally, which is broader than the invariant the filter protects. The default `throwIf` message is generic, so a real upgrade regression in any other background or persisted object that uses `throwIf` as an assertion would silently disappear from `upgrade_error_messages.txt`. Move the exclusion from the primary fixed-string list into the secondary regex pipe, pairing the message with the mutation-task logger that emits it, exactly like the other scoped exclusions in this file: - `MutatePlainMergeTreeTask` + the message (plain `MergeTree`, test `04341_broken_mutation_part_log_flush`); - `MergeTreeBackgroundExecutor` + the message (the wrapping background-executor line, present for both the plain and `ReplicatedMergeTree` cases). The replicated first line is already covered by the existing `MutateFromLogEntryTask` exclusion, so all leaking lines remain suppressed while `throwIf` errors from any other context now surface. Verified against the real leaking log from the example report (https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108657&sha=70df14b1ec107f06eb401ecb1bda859af68e52a6&name_0=PR&name_1=Upgrade%20check%20%28amd_release%29): all 273 `MutatePlainMergeTreeTask`/`MergeTreeBackgroundExecutor` retry lines are dropped, while a synthetic unrelated `throwIf` error (e.g. from a materialized view assertion) still reaches `upgrade_error_messages.txt` - whereas the old global filter masked it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the I verified the change against the real leaking log from the example report (
Also merged The only red checks are |
Per @azat's review, collapse the two scoped `throwIf` exclusions (`MutatePlainMergeTreeTask.*` and `MergeTreeBackgroundExecutor.*`) into a single global `grep -av -e "Value passed to 'throwIf' function is non-zero"` entry in the secondary pipe. `throwIf` is a user-level SQL function that only raises when a query explicitly calls it with a truthy argument, so the default `FUNCTION_THROW_IF_VALUE_IS_NON_ZERO` message can only ever originate from such a test query, never from a background or internal assertion - the ClickHouse internals never use `throwIf` for their own checks. Matching the message alone is therefore safe to suppress globally, exactly like the already-global `Code: 236 ... Cancelled mutating parts` message that the same cancelled test mutations emit. Matching the message rather than the task type also keeps the wrapping `MergeTreeBackgroundExecutor` line of the replicated case covered in a single entry. Rewrote the comment block to explain the global match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed be05d1c addressing @azat's review: collapsed the two scoped The global match is safe because Resolved the review thread. The only red on the previous commit was |
|
continue-pr pass (head
Blocked on the stress-flake rerun and human merge. |
|
continue-pr pass (head
Blocked solely on human merge. |
|
continue-pr pass — re-merged Since the last pass,
State on the new head:
Fresh CI is running on |
…4341-throwif # Conflicts: # tests/docker_scripts/upgrade_runner.sh
|
continue-pr pass — re-merged The PR had gone
State on the new head:
Blocked on fresh CI and human merge. |
|
continue-pr pass (head
@groeneai, please investigate the failure https://github.com/ClickHouse/clickhouse-private/actions/runs/28506310479/job/84499776097 and provide a fix in a separate PR — it looks like a missing Blocked on the private stress rerun / a separate |
|
@alexey-milovidov Investigated STID 2884-409a. Summary of what I found and where I'm blocked. Mechanism (as you diagnosed): Public surface is already guarded. The setting is enabled in public CI too ( Only public reproduction of this error in CIDB (60d) is on an unrelated open PR adding Where I'm blocked. I can't reach the private run ( Ask: could you paste the server-side stack trace for the
No existing tracking issue or fix PR found for this STID. Unrelated finding while probing: Session id: cron:clickhouse-worker-slot-0:20260701-105800 |

The
Upgrade checkfails on many unrelated pull requests withError message in clickhouse-server.log (see upgrade_error_messages.txt). The leaked error is:It originates from stateless tests that deliberately run a failing async mutation
ALTER TABLE ... UPDATE <col> = <col> <op> throwIf(1):04341_broken_mutation_part_log_flush— plainMergeTree, so the error is emitted byMutatePlainMergeTreeTask, which is not covered by the existingMutateFromLogEntryTaskallowlist entry. (This is the variant seen in the example report below.)02597_column_{update,delete,update_tricky_expression}_and_replication—ReplicatedMergeTree, whereMutateFromLogEntryTaskcatches only the first line and the wrappingMergeTreeBackgroundExecutorline leaks.After the upgrade restart the broken mutation is retried in the background and logged to
clickhouse-server.upgrade.log. This is the expected test-induced error, not a backward incompatibility — exactly the same class as theCANNOT_PARSE_TEXTmutation errors already allowlisted (issue #39174, tests00834/01414/04338).throwIfonly raises when a query explicitly calls it with a truthy argument, so matching the message substring can only match such a test, never an upgrade incompatibility; matching the message rather than the task type also covers the wrappingMergeTreeBackgroundExecutorline.Master-wide on ~19 distinct pull requests over 2026-06-26..2026-06-29 (confirmed via
play.clickhouse.comcheckshistory). Example report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108657&sha=70df14b1ec107f06eb401ecb1bda859af68e52a6&name_0=PR&name_1=Upgrade%20check%20%28amd_release%29Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Documentation entry for user-facing changes
Version info
26.7.1.381