Mark caught LOGICAL_ERROR exception as logged in BackupDataFileNameGenerator gtest by groeneai · Pull Request #106668 · ClickHouse/ClickHouse · GitHub
Skip to content

Mark caught LOGICAL_ERROR exception as logged in BackupDataFileNameGenerator gtest#106668

Merged
jkartseva merged 1 commit into
ClickHouse:masterfrom
groeneai:groeneai/fix-mark-as-logged-backup-data-file-name-gtest
Jul 2, 2026
Merged

Mark caught LOGICAL_ERROR exception as logged in BackupDataFileNameGenerator gtest#106668
jkartseva merged 1 commit into
ClickHouse:masterfrom
groeneai:groeneai/fix-mark-as-logged-backup-data-file-name-gtest

Conversation

@groeneai

@groeneai groeneai commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Related: #100177

BackupDataFileNameGeneratorTest.ThrowsOnZeroChecksum deliberately throws DB::Exception(LOGICAL_ERROR, ...) from getBackupDataFileName to validate the zero-checksum guard. On non-debug, non-sanitizer builds (e.g. Unit tests (amd_llvm_coverage)) the test passes, but Exception::~Exception then logs the exception with a stack trace via ForcedCriticalErrorsLogger because LOGICAL_ERROR is "important" unless markAsLogged was called. The trace ends up on the gtest binary's stderr, after the test's [ OK ] line.

When an unrelated test in the same unit_tests_dbms binary later hangs and the gdb -batch ... -ex run runner times out and kills the binary mid-suite, gtest.json is never written. Praktika's from_gtest parser falls back to log-scraping, which then surfaces the stray Backup checksum should not be zero (test.bin). (LOGICAL_ERROR) trace as a fake -FunctionsStress failure in CIDB. 16 such hits across Unit tests (amd_llvm_coverage) on PRs #100177, #105784, #96100 in the last 60 days — all attributable to the underlying SchedulerWorkloadResourceManager / FunctionsStress chronic hang family, not the backup test.

Replace EXPECT_THROW(..., Exception) with an equivalent try block that calls markAsLogged on the caught exception, mirroring the pattern from commit 64a4caa39242 ("Mark fuzzer exceptions as logged to prevent ForcedCriticalErrorsLogger") in src/Interpreters/executeQuery.cpp. Verified locally with a release-mode unit_tests_dbms (-O3 -DNDEBUG, ENABLE_TESTS=1, no sanitizer): the pre-fix binary prints a 17-frame stack trace to stderr on this test, while the post-fix binary runs the test silently with [ OK ] and no trace.

CI report containing the trace: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100177&sha=6e1a7cdc1170bb75bbcc76d5c46ba23c6cc38855&name_0=PR&name_1=Unit%20tests%20%28amd_llvm_coverage%29

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Version info

  • Merged into: 26.7.1.406

…rowsOnZeroChecksum

The test deliberately exercises the zero-checksum guard in
`getBackupDataFileName`, which throws `DB::Exception(LOGICAL_ERROR, ...)`.
On non-debug, non-sanitizer builds (e.g. `Unit tests (amd_llvm_coverage)`)
the test passes, but `Exception::~Exception()` then logs the exception
through `ForcedCriticalErrorsLogger` because `LOGICAL_ERROR` is treated
as an "important" error code unless `markAsLogged` has been called. The
resulting `Code: 49. DB::Exception: Backup checksum should not be zero
(test.bin). (LOGICAL_ERROR)` line plus its full stack trace ends up on
the gtest binary's stderr after the test's `[ OK ]` line.

When an unrelated test in the same `unit_tests_dbms` binary later hangs
(e.g. `SchedulerWorkloadResourceManager.DropNotEmptyQueueLong` per the
report linked below), the wrapping `gdb -batch ... -ex run` runner times
out, kills the binary mid-suite, and `gtest.json` is never written. The
Praktika `from_gtest` parser then has nothing to parse and falls back to
log-scraping, which surfaces the leaked `Backup checksum should not be
zero` stack trace as a fake `-FunctionsStress` failure in CIDB. Sixteen
hits across `Unit tests (amd_llvm_coverage)` on PRs ClickHouse#100177, ClickHouse#105784,
ClickHouse#96100 over the past 60 days follow this pattern; the underlying
contributor is the chronic `SchedulerWorkloadResourceManager` /
`FunctionsStress` family hang, not the backup test.

Replace `EXPECT_THROW(..., Exception)` with an equivalent `try` block
that calls `markAsLogged` on the caught exception, mirroring the pattern
already used in `executeQuery.cpp` (commit 64a4caa "Mark fuzzer
exceptions as logged to prevent ForcedCriticalErrorsLogger") to suppress
the stray stack trace. Verified locally: built `unit_tests_dbms` in
release mode (`-O3 -DNDEBUG`, `ENABLE_TESTS=1`, no sanitizer) and
confirmed the pre-fix binary prints a 17-frame stack trace to stderr on
this test, while the post-fix binary runs the test silently with
`[ OK ]` and no trace.

Related: ClickHouse#100177

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

@groeneai

groeneai commented Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

cc @jkartseva — could you review this? Test-only change in src/Backups/tests/gtest_backup_data_file_name_generator.cpp that calls markAsLogged on the caught LOGICAL_ERROR exception in BackupDataFileNameGeneratorTest.ThrowsOnZeroChecksum so its destructor no longer prints a stray stack trace via ForcedCriticalErrorsLogger. Spun out from PR #100177 (@ alexey-milovidov directive).

@clickhouse-gh

clickhouse-gh Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [f428aac]

Summary:

job_name test_name status info comment
Stateless tests (amd_debug, parallel) FAIL
Server died FAIL cidb, issue ISSUE EXISTS
Logical error: Virtual row boundary violated in MergingSortedAlgorithm for input A on sort column 'B' C: the virtual row announced D but the source then produced E, which would mis-order the merge (STID: 2651-3359) FAIL cidb IGNORED

AI Review

Summary

This PR replaces EXPECT_THROW in BackupDataFileNameGeneratorTest.ThrowsOnZeroChecksum with an equivalent manual try/catch so the caught DB::Exception can be marked via markAsLogged. That preserves the old type-level assertion while preventing ForcedCriticalErrorsLogger from emitting the known misleading LOGICAL_ERROR stack trace. I did not find any current-code findings.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Jun 7, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.60% 84.50% -0.10%
Functions 92.30% 92.30% +0.00%
Branches 77.20% 77.20% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 7/7 (100.00%) | Lost baseline coverage: none · Uncovered code

Newly covered by added/modified tests: 324 line(s), 20 function(s) across 105 file(s) · Details

Top files
  • src/IO/S3/copyS3File.cpp: 18 line(s)
  • src/Disks/IO/CachedOnDiskWriteBufferFromFile.cpp: 15 line(s)
  • src/Storages/StorageReplicatedMergeTree.cpp: 14 line(s)
  • src/Coordination/KeeperServer.cpp: 12 line(s), 1 function(s)
  • src/Storages/Statistics/ConditionSelectivityEstimator.cpp: 11 line(s), 2 function(s)

Full report · Diff report

@groeneai

groeneai commented Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

CI ledger for HEAD f428aacf2933 (BackupDataFileNameGenerator gtest markAsLogged fix):

check status classification disposition
All builds, FT (except 1), IT, Unit, Performance, ClickBench, Install, Compatibility, Docker, AST/BuzzHouse fuzzer pass clean ---
Stateless tests (amd_debug, parallel) "Logical error: Virtual row boundary violated in MergingSortedAlgorithm ... STID 2651-3359" + Server died fail chronic / not PR-caused Tracked in our umbrella for the MergingSortedAlgorithm virtual-row debug assertion (vdimir territory; assertion added in #106565). 7-day cross-PR: 34 distinct PRs / 36 hits. PR diff: 1 file (src/Common/tests/gtest_BackupDataFileNameGenerator.cpp), entirely unrelated to MergingSortedAlgorithm. Our fix PR for STID 2651-3359 is #106664.

CI fully finished at 2026-06-07 12:13:43Z. Mergeable Check + PR aggregator failures are downstream of the chronic STID 2651-3359 stateless test crash.

@jkartseva jkartseva self-assigned this Jun 8, 2026
@vdimir

vdimir commented Jun 16, 2026

Copy link
Copy Markdown
Member

Stateless tests (amd_debug, parallel) FAIL
Server died FAIL cidb
Logical error: Virtual row boundary violated in MergingSortedAlgorithm for input A on sort column 'B' C: the virtual row announced D but the source then produced E, which would mis-order the merge (STID: 2651-3359) FAIL cidb

Fix #107053 (need review)

@groeneai

Copy link
Copy Markdown
Contributor Author

@jkartseva jkartseva enabled auto-merge June 16, 2026 20:03
@jkartseva jkartseva added this pull request to the merge queue Jul 2, 2026
Merged via the queue into ClickHouse:master with commit 7be48ac Jul 2, 2026
164 of 166 checks passed
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants