iframe-proxy

alexey-milovidov · 2026-06-13T23:52:55Z

When a column is added to or dropped from a replicated PostgreSQL table while the change is not seen through the replication stream — e.g. the column changes and then the ClickHouse (or PostgreSQL) server restarts — MaterializedPostgreSQL stopped replicating all tables in the database, with:

LOGICAL_ERROR	Columns number mismatch. Attributes: 17, buffer: 20

Root cause. On startup the consumer builds a StorageData for every table; its constructor compares the freshly fetched PostgreSQL column count with the existing nested ClickHouse table and throws Columns number mismatch on a mismatch. That throw happened in the loop that constructs the consumer for the whole database, so one out-of-sync table aborted startup of the entire database, and it could not be recovered without dropping and recreating it.

Fix. Catch the error per table while constructing the consumer. The affected table is skipped (with a clear log message asking the user to DETACH/ATTACH it to reload its structure), while the other tables keep replicating. This mirrors how structure changes that arrive through the replication stream are already handled (the table is marked as skipped, not fatal).

The added integration test changes the structure of one table in PostgreSQL, restarts ClickHouse, and verifies that (1) the other tables keep replicating and (2) the affected table is recoverable with DETACH/ATTACH.

Closes: #66273
Related: #49045

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix MaterializedPostgreSQL stopping replication of an entire database (with LOGICAL_ERROR: Columns number mismatch) when a single replicated table's structure changed while the server was down. Now only the affected table is skipped and can be recovered with DETACH/ATTACH.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

🤖 Generated with Claude Code

Version info

Merged into: 26.7.1.382

…e's structure mismatch When the structure of a replicated PostgreSQL table changes while it is not observed through the replication stream (for example, a column is added or dropped in PostgreSQL and then the server restarts), the consumer's `StorageData` constructor throws `LOGICAL_ERROR: Columns number mismatch`. This exception was thrown while building the consumer for the whole database, so a single out-of-sync table stopped replication of *all* tables, and the database could not recover without being dropped and recreated. Catch the error per table when constructing the consumer: the affected table is skipped (with a clear message asking to DETACH/ATTACH it) while the rest of the tables keep replicating. This mirrors how structure changes detected through the replication stream are already handled. Closes: #66273 Related: #49045 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

clickhouse-gh · 2026-06-13T23:53:28Z

… the catch The "Columns number mismatch" error is an expected condition (the PostgreSQL table structure no longer matches the nested ClickHouse table), not an internal logic error. Throwing it as `LOGICAL_ERROR` aborted the server in debug/sanitizer builds at exception-construction time, before the catch in the consumer constructor could skip the table — which is why the integration test crashed the server in CI. Throw `POSTGRESQL_REPLICATION_INTERNAL_ERROR` instead, and narrow the catch in the consumer constructor to that specific error so unrelated initialization failures still propagate (as suggested in review). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov

Ok.

alexey-milovidov

Nice.

Resolve conflict in tests/integration/test_postgresql_replica_database_engine/test_3.py: both branches appended new test functions at the same location. Keep all three — this PR's test_table_schema_changed_while_server_down regression test plus master's test_numeric_to_int256 and test_numeric_int256_validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-19T17:51:47Z

Re-merged the latest master into the branch (31efaf43ffb..4673641859c) to refresh CI. The merge was clean with no conflicts, and the net diff is unchanged — only src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp and tests/integration/test_postgresql_replica_database_engine/test_3.py (byte-faithful to the prior tip).

The only red check was Stress test (arm_debug) failing with Logical error: 'Bad cast from type DB::ColumnString to DB::ColumnLowCardinality' (STID: 4525-5f05). This is a pre-existing, unrelated stress/fuzzer flake, not caused by this PR:

It fires on master directly (pull_request_number = 0) and across many unrelated PRs over the last 90 days (2026-03-29/30/31, 2026-05-12, 2026-06-13, 2026-06-16), not only here. Since STID is a stack-trace fingerprint, those older occurrences are the same code location, so it long predates this change.
The exception is a ColumnString → ColumnLowCardinality bad cast with no relation to MaterializedPostgreSQL; this PR touches only the PostgreSQL replication consumer.
It belongs to the auto-filed "Bad cast from type A to B" fuzzer family (testing, fuzz), cf. Logical error: Bad cast from type A to B (STID: 4054-600e) #107598 and Logical error: Bad cast from type A to B (STID: 2241-3a03) #103814.

AI Review verdict is ✅ Approve, @kssenii approved, and there are no unresolved review threads.

…smatch-restart # Conflicts: # tests/integration/test_postgresql_replica_database_engine/test_3.py

…test_3.py The 06-24 merge of `master` into this branch rewrote the import block of `tests/integration/test_postgresql_replica_database_engine/test_3.py` (master had refactored those imports) and dropped the `assert_number_of_columns` import, while the test body added by this PR still calls it. `ruff` failed with: test_3.py:1072:5: F821 Undefined name `assert_number_of_columns` Re-add the import. `assert_number_of_columns` is defined in `tests/integration/helpers/postgres_utility.py`. Style check report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=107427&sha=de16eaaf5e360e25dffab6b674e458bee1c8c7b7&name_0=PR&name_1=Style%20check Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…recovery When a table is skipped from replication (for example, skipped at startup because its structure no longer matched the nested ClickHouse table, as added in this PR, or skipped after a structure change observed in the replication stream), and then receives WAL before the user repairs it, the consumer processes the table's `Relation` message, finds no storage for it, and calls `markTableAsSkipped`, which records an empty `skip_list` entry keyed by the PostgreSQL relation id. The promised `DETACH`/`ATTACH` recovery went through `removeNested`/`addNested`, neither of which cleared that `skip_list` entry. After the `waiting_list` entry set by `addNested` was consumed by the first post-attach message, `isSyncAllowed` saw the stale empty `skip_list` entry and returned `false`, so all further changes to the reattached table were skipped indefinitely and replication never resumed. Clear `skip_list` entries whose relation id maps to the table being brought back in `addNested`, so the recovery actually works. The regression test now writes to the affected table before `DETACH`/`ATTACH` (waiting until it is really marked as skipped in the replication stream) and again after `ATTACH`, proving that replication resumes instead of only that the snapshot reload matches once.

…smatch-restart

alexey-milovidov · 2026-06-29T17:25:26Z

Addressed the AI Review "Request changes" finding (stale skip_list entry on DETACH/ATTACH recovery).

Fix. When a startup-skipped table later receives WAL, the consumer's Relation handler finds no storage for it and markTableAsSkipped records an empty skip_list entry keyed by the PostgreSQL relation id. The DETACH/ATTACH recovery went through removeNested/addNested, neither of which cleared that entry, so once the waiting_list entry set by addNested was consumed, isSyncAllowed saw the stale empty skip_list entry and skipped all further changes to the reattached table indefinitely. addNested now drops any skip_list entry whose relation id maps (via relation_id_to_name) to the table being brought back, so the recovery actually resumes replication. This also fixes the same latent issue for tables skipped after a structure change observed in the replication stream.

Test. The regression test now writes to the affected table before DETACH/ATTACH (waiting on the is skipped from replication stream log line so the stale-skip_list path is really exercised, not raced) and again after ATTACH, asserting the table re-synchronizes — which fails without the fix and passes with it.

The modified MaterializedPostgreSQLConsumer.cpp translation unit compiles cleanly. Latest master was merged into the branch (clean, no conflicts).

…smatch-restart

alexey-milovidov · 2026-06-30T00:37:16Z

Merged the latest master into the branch to refresh CI (809e8fb9ae1..449c5fcbe36). The merge was clean with no conflicts, and the net diff is unchanged — only src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp and tests/integration/test_postgresql_replica_database_engine/test_3.py (+111/-3). The changed translation unit compiles cleanly (-fsyntax-only). AI Review is ✅ Approve, @kssenii approved, and there are no unresolved review threads.

The three previously red checks are all unrelated to this change:

Stress test (amd_tsan) and Stress test (arm_tsan) — both Hung check failed, possible deadlock found (STID for the generic TCPHandler/executeQuery path), tracked as the chronic master-wide flake Hung check failed, possible deadlock found #107941. The stack trace has no relation to MaterializedPostgreSQL.
CH Inc sync (tests failed, clickhouse-private#60882) — the only failing private job was Integration tests (amd_asan_ubsan, db disk, old analyzer, 5/7), and its single failing test was test_refreshable_mv_no_multi_read/test.py::test_refreshable_mv_attach_without_multi_read (700 passed, 1 failed). That test is unrelated to this PR (refreshable MV / Keeper MULTI_READ), and its flakiness fix cdabef4635c ("Wait for the scheduled pass before asserting view state in the no-multi-read test", Stop refreshable MV gracefully on Keeper without MULTI_READ on ATTACH/restore #108441) was not in the synced commit 809e8fb9 but is included by this master merge. The push re-syncs the private PR to a commit that contains the fix and re-runs the sync from scratch. This PR's own test (test_postgresql_replica_database_engine/test_3.py) passed.

…smatch-restart

alexey-milovidov · 2026-06-30T06:47:02Z

Merged the latest master into the branch to refresh CI (449c5fcbe36..2325d95ef20). The merge was clean with no conflicts, and the net diff is unchanged — only src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp and tests/integration/test_postgresql_replica_database_engine/test_3.py (+111/-3, byte-identical to the prior tip 449c5fcbe36).

The two previously red checks are both the same unrelated, chronic master-wide flake:

Stress test (amd_tsan) and Stress test (arm_tsan) — both Hung check failed, possible deadlock found, tracked as Hung check failed, possible deadlock found #107941 (open). The stack trace is the generic AST-fuzzer path (executeASTFuzzerQueries → TCPHandler::runImpl → executeQuery) with no relation to MaterializedPostgreSQL. CIDB confirms this fingerprint fires on master directly (pull_request_number = 0) every single day and across 150+ unrelated PRs per day, so it long predates and is independent of this change.

This PR's own integration test (test_postgresql_replica_database_engine/test_3.py::test_table_schema_changed_while_server_down) passed. AI Review is ✅ Approve, @kssenii approved, and there are no unresolved review threads.

alexey-milovidov · 2026-06-30T08:16:12Z

The latest CI cycle (commit 2325d95e) is still mostly in progress (stateless, integration, stress, AST fuzzer and BuzzHouse jobs are queued). The only completed red is Unit tests (tsan, function_prop_fuzzer) (FunctionsStress.stress / AllTests), and it is unrelated to this change.

The failure is the function property fuzzer flagging reinterpret of a Decimal/DateTime64 value to the same physical type with a different scale — e.g. SELECT reinterpret(CAST('1970-01-01 00:59:59.87239' AS DateTime64(5)), CAST('DateTime64(1)' AS String)) → function returned column of unexpected type or scale: Const(DateTime64) instead of DateTime64(1). In the FromType == ToType branch of reinterpretAs.cpp the source column is returned verbatim, so its physical scale diverges from the declared result type.
It started firing reliably only after the new scale-aware fuzzer gate d3202d63ef8 ("Fix function property stress test scale-blind validity gate and changeDate result scale", writeSlice LOGICAL_ERROR: arrayPushFront/arrayPushBack over Array(DateTime64/Time64) reaches GatherUtils generic path with mismatched column types #108517), which now rejects scale-divergent columns at the producing function's gate (the same hardening that earlier caught and fixed changeYear/changeMonth/…).
CIDB confirms this is master-wide and not specific to this PR: today (2026-06-30) FunctionsStress.stress failed 23 times across 20 PRs plus master directly (pull_request_number = 0), all on the same reinterpret-scale family of queries. This PR touches only src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp and tests/integration/test_postgresql_replica_database_engine/test_3.py, with no relation to reinterpret.
A fix is already open: Fix reinterpret to Decimal/DateTime64 with a different scale #108878 ("Fix reinterpret to Decimal/DateTime64 with a different scale"), which rebuilds the result column with the destination scale (IsDataTypeDecimal<ToType> covers Decimal, DateTime64 and Time64). Tracked under the auto-filed fuzzer issue FunctionsStress #107242.

Not re-merging master (the branch is only ~6 commits / a few hours behind) — that would just cancel the in-progress CI and re-roll the same master-wide flake. AI Review is ✅ Approve, @kssenii approved, and there are no unresolved review threads.

alexey-milovidov · 2026-06-30T09:59:47Z

The latest completed red is now Upgrade check (amd_release) (Error message in clickhouse-server.log), and it is unrelated to this change — it is a known master-wide flake with a fix already in flight.

The five flagged lines all come from a single stateless test (04210_show_remote_databases_in_system_tables) that intentionally creates two remote database engines pointing at the unreachable documentation address 192.0.2.1:

<Error> mysqlxx::Pool: Failed to connect to MySQL (fake_db@192.0.2.1:3306 as user user): mysqlxx::ConnectionFailed
<Error> Application: Connection to mysql failed 1 times
<Error> DatabaseMySQL: ... Connections to mysql failed: fake_db@192.0.2.1:3306 ... (ALL_CONNECTION_TRIES_FAILED)
<Error> PostgreSQLConnectionPool: Connection error: connection to server at "192.0.2.1", port 5432 failed: timeout expired
<Error> void DB::DatabasePostgreSQL::removeOutdatedTables(): ... Connection to `192.0.2.1:5432` failed ...

When the server is restarted during the upgrade test, those still-attached DatabaseMySQL/DatabasePostgreSQL engines re-probe the unreachable host and log <Error> connection failures, which the upgrade-check allowlist does not yet cover. There is no real MySQL/PostgreSQL server in the upgrade environment, so these connection failures are expected and benign; none of them touch MaterializedPostgreSQL, and this PR changes only src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp.

CIDB confirms it is a broad master-wide flake, not specific to this PR: Error message in clickhouse-server.log failed with the 192.0.2.1/fake_db messages across 8 PRs on 2026-06-30, 12 on 06-29, 6 on 06-28, 5 on 06-27, 9 on 06-26, starting 2026-06-25 — independently of this change. The fix is already open: #108560 (allowlist the remote-database-engine connection errors in upgrade_runner.sh).

The other completed reds are the same unrelated flakes noted above:

Unit tests (tsan, function_prop_fuzzer) — the reinterpret-to-DateTime64/Decimal same-physical-type-different-scale fuzzer family; fix in flight Fix reinterpret to Decimal/DateTime64 with a different scale #108878.
Stress test (amd_tsan) / Stress test (arm_tsan) — Hung check failed, possible deadlock found, the generic AST-fuzzer executeASTFuzzerQueries → TCPHandler path, tracked as Hung check failed, possible deadlock found #107941.

@kssenii approved, AI Review is ✅ Approve, and there are no unresolved review threads.

…smatch-restart

alexey-milovidov · 2026-06-30T17:02:05Z

continue-pr status — re-merged current master.

The branch had drifted 157 commits behind master (merge-base from 07:03Z this morning), so I merged origin/master into it (2325d95ef20..4aa7c094b04). The merge is clean — master did not touch either PR file, and the net diff vs master is still exactly the two PR files (MaterializedPostgreSQLConsumer.cpp + test_3.py, +111/-3, byte-identical to before). The changed TU compiles clean (-fsyntax-only) against the newer headers. CI has been re-fired.

Review state unchanged: @kssenii APPROVED, AI Review ✅ Approve, 0 unresolved review threads.

All CI reds remain the same tracked, master-wide flakes — none touches the PostgreSQL replication path, and re-merging cannot clear them until their fixers land:

Unit tests (tsan, function_prop_fuzzer) (FunctionsStress.stress/AllTests) — reinterpret to DateTime64/Decimal with a different scale; fixed by open Fix reinterpret to Decimal/DateTime64 with a different scale #108878. Tracked by FunctionsStress #107242.
Upgrade check (amd_release) — benign <Error> connection logs from 04210_show_remote_databases_in_system_tables (DatabaseMySQL/DatabasePostgreSQL at 192.0.2.1); allowlist fix is open Allowlist remote database engine connection errors in upgrade check #108560.
Stress test (amd_tsan) + Stress test (arm_tsan) — Hung check (AST-fuzzer executeASTFuzzerQueries→TCPHandler, 0 PG frames); chronic master-wide flake Hung check failed, possible deadlock found #107941.

Blocked solely on those fixers landing + a maintainer merge.

alexey-milovidov · 2026-06-30T19:52:35Z

continue-pr status — no code change, blocked on CI completion + merge.

Re-checked the post-merge head 4aa7c094b04 (today's master, merge-base 75154b382fe from 15:52Z, ~4h old). No re-merge: the branch already has today's master, the CI cycle for this head is still actively running (48 jobs in progress, including the Integration tests shards that run this PR's own test_postgresql_replica_database_engine/test_3.py), and re-merging would only cancel it and re-roll the same flake.

Review state: AI Review ✅ Approve (re-run on this head at 19:39Z), @kssenii APPROVED, 0 unresolved review threads (2/2 resolved). POSTGRESQL_REPLICATION_INTERNAL_ERROR (619) is registered in ErrorCodes.cpp and used correctly.

Independent correctness check of the trickiest path (stale skip_list recovery): a table skipped at startup that later receives WAL gets an empty-LSN skip_list entry via markTableAsSkipped. Without clearing it, once the waiting_list entry set by addNested is consumed (one true from isSyncAllowed), the next message falls through to the stale skip_list entry and isSyncAllowed returns false indefinitely. The fix in addNested drops the entry keyed (via relation_id_to_name) by the reattached table, making DETACH/ATTACH recovery behave identically to a fresh addNested (waiting_list-only). The test waits on the is skipped from replication stream log line before DETACH/ATTACH, so it deterministically exercises that path rather than racing it. Confirmed sound.

Only completed reds are the two Stress test (amd_tsan) / Stress test (arm_tsan) Hung check failed, possible deadlock found — the chronic master-wide flake #107941 (OPEN). Their stack traces are the generic query-execution path (PipelineExecutor -> ExecutionThreadContext -> Poco::Thread) with no MaterializedPostgreSQL/PostgreSQL frames, so they are unrelated to this change. They will need a rerun once the in-progress cycle finishes (a rerun now is rejected while the run is queued/in-progress).

alexey-milovidov · 2026-06-30T23:31:08Z

Re-ran the only two red jobs on 4aa7c094 — Stress test (amd_tsan) and Stress test (arm_tsan) (workflow run 28461893177, --failed); both are queued again.

Both failures were Hung check failed, possible deadlock found with generic Poco::ThreadImpl::runnableEntry / __tsan_thread_start_func stacks (no MaterializedPostgreSQL / PostgreSQL frames) — the chronic master-wide Hung-check flake #107941, not caused by this change.

Status otherwise: net diff still exactly the 2 PR files (MaterializedPostgreSQLConsumer.cpp +50, test_3.py +64; +111/-3); AI Review ✅ Approve, approved by @kssenii, 0 unresolved threads; MERGEABLE. CH Inc sync is still in progress (private #60882). Ready to merge once the Stress rerun is green.

alexey-milovidov · 2026-07-01T02:48:40Z

continue-pr status — no code change, re-triggered the flaky Stress reruns.

The previous rerun (workflow run 28461893177) completed and both jobs failed again with the same Hung check failed, possible deadlock found — the chronic master-wide flake #107941 (which the CI bot itself links). The stacks are the generic AST-fuzzer path (executeASTFuzzerQueries → TCPHandler::runImpl → executeQuery) with no MaterializedPostgreSQL/PostgreSQL frames, so they are unrelated to this change.

This is TSan-specific: on the same head 4aa7c094 all 9 other Stress variants are green — amd_debug, amd_asan_ubsan, amd_msan, arm_debug, arm_asan_ubsan, arm_asan_ubsan (s3), arm_msan, arm_ubsan, arm_release — only amd_tsan + arm_tsan hit the Hung check.

Re-ran only those two jobs again (gh run rerun 28461893177 --failed, now queued). No re-merge: the merge-base (75154b382fe, 06-30 15:52Z) is ~11h old and the branch is MERGEABLE; re-merging would only cancel the in-progress CH Inc sync and re-roll the same flake.

Net diff unchanged — exactly the two PR files (MaterializedPostgreSQLConsumer.cpp +50, test_3.py +64; +111/-3). AI Review ✅ Approve, @kssenii APPROVED, 0 unresolved review threads. Blocked solely on a green Stress rerun (or a merge-past the tracked flake) + maintainer merge.

alexey-milovidov · 2026-07-01T05:53:53Z

continue-pr status — no code change; reruns resolved, CH Inc sync now green, ready to merge past the tracked flake.

The two Stress reruns queued in the previous pass (workflow run 28461893177, --failed) have completed (~03:29–03:30Z) and failed again with the same Hung check failed, possible deadlock found — the chronic master-wide flake #107941 (still OPEN, CI bot itself links it). The stacks are the generic Poco::ThreadImpl::runnableEntry / AST-fuzzer analyzer path (QueryAnalyzer::resolveFunction → resolveExpressionNode → IExecutableFunction::execute) with no MaterializedPostgreSQL/PostgreSQL frames, so unrelated to this change. On the same head all other Stress variants are green — only amd_tsan + arm_tsan hit the Hung check.

What changed since the last update:

CH Inc sync is now ✅ SUCCESS (was in progress on private Logical error: 'No responses set for MultiRead'. #60882).
Mergeable Check is ✅ SUCCESS — every required check is green; the PR is MERGEABLE/UNSTABLE, i.e. the only red (Stress test (amd_tsan)/(arm_tsan)) is non-required.

No re-merge: the merge-base (75154b382fe, 06-30 15:52Z) is <1 day old, the branch is MERGEABLE, and master has touched neither PR file since — re-merging would only cancel the freshly-green CH Inc sync and re-roll the same flake. Head unchanged 4aa7c094; net diff still exactly the two PR files (MaterializedPostgreSQLConsumer.cpp +47/-3, test_3.py +64; +111/-3).

AI Review ✅ Approve, @kssenii APPROVED, 0 unresolved review threads. Blocked solely on a maintainer merge — the remaining red is the non-required tracked flake, so this can be merged past it.

alexey-milovidov · 2026-07-01T06:48:57Z

@groeneai, fix failures of the Hung check.

groeneai · 2026-07-01T07:30:42Z

@alexey-milovidov Confirmed, and the fix is already merged: it is your own #108892.

Both Stress test (amd_tsan) and (arm_tsan) on 4aa7c094 have exactly one hung query each, identical: SELECT formatQuery($$SELECT [[[[ ... ~13k-deep nested array ... ]]]]$$) (40 KB literal), elapsed 1524s, is_cancelled 1, read_rows 0, no MaterializedPostgreSQL/PostgreSQL frames. Not caused by this PR (agreed), but it is the deep-nested-literal parse hang specifically, not a generic AST-fuzzer stack.

That is exactly what #108892 ("Parse nested literal collections in linear time") fixes: ParserCollectionOfLiterals::parseImpl was quadratic in nesting depth, so a ~13k-deep [[[...]]] took ~25 min to parse and the parser never checks cancellation mid-loop (hence is_cancelled 1 while still running).

The catch: this head predates the fix. #108892 merged 06-30 19:20Z; 4aa7c094 was cut 06-30 17:00Z (merge-base 75154b382fe, 15:52Z), so #108892 is not an ancestor of it (git compare = diverged). Every rerun --failed rebuilds the same pre-#108892 head, which is why it keeps hitting the hang.

Recommendation: merge past the non-required Hung check. #108892 is already in master, so post-merge master will not regress on this signature, and this avoids re-rolling the now-green CH Inc sync (your earlier concern). The master Stress-tsan Hung-check rate already collapsed from 48 on 06-30 to 1 on 07-01, right at the #108892 merge.

Alternative, if you want a green Stress run on record: re-merge master. Unlike the previous reruns, that now picks up #108892 and will clear it. Defense-in-depth for the same class (cap the recursion-controlling parser settings + honor cancellation) is in flight in #108863, tracked under #107941.

Reports:

…smatch-restart

alexey-milovidov · 2026-07-01T09:14:29Z

continue-pr — re-merged master to clear the tracked Stress Hung-check flake.

Following @groeneai's analysis: the only two red checks (Stress test (amd_tsan) and (arm_tsan), both Hung check failed, possible deadlock found) were the deep-nested-literal parse hang, not caused by this PR, and already fixed on master by #108892 ("Parse nested literal collections in linear time", merged 06-30 19:20Z). The previous head 4aa7c094 (cut 17:00Z, merge-base 15:52Z) predated that fix, so every rerun --failed kept rebuilding the pre-fix binary and re-hitting the hang.

Took the recommended "green run on record" path: merged origin/master into the branch (4aa7c094b04..d33693394c1). The merge is clean — master did not touch either PR file, and the net diff vs master is unchanged (only src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp +50 and tests/integration/test_postgresql_replica_database_engine/test_3.py +64). The new head now contains #108892 (c09dfca7068), so the fresh Stress-tsan cycle should clear this signature.

No code change. CH Inc sync, Mergeable Check, and Style check were already green; reviewDecision is still empty (awaiting a human approval). Blocked on fresh CI + human approve + merge.

alexey-milovidov · 2026-07-01T11:46:05Z

continue-pr — the re-merge worked: the Stress Hung check flake is cleared, all Stress variants are green.

Confirming the outcome of the previous re-merge (d3369339, which pulled in #108892). On this head all 12 Stress variants have now completed ✅ SUCCESS — including Stress test (amd_tsan) and Stress test (arm_tsan), which were the only red across the last ~10 CI cycles (Hung check failed, possible deadlock found, the master-wide flake #107941).

This confirms @groeneai's diagnosis: the Hung check was the quadratic nested-literal parse, fixed on master by #108892 (c09dfca7068). The prior head 4aa7c094 predated that fix, so every gh run rerun --failed rebuilt the pre-fix binary and re-hung — which is why the earlier reruns kept failing on the same two TSan jobs.

Current CI on d3369339: 0 failures. Style check ✅ SUCCESS; CH Inc sync in progress (private). The remaining 32 jobs (17 Stateless, 8 Integration, 6 Performance Comparison, 1 SQLLogic) are still routinely in progress. @kssenii APPROVED, AI Review ✅ Approve, 0 unresolved review threads; the net diff is still exactly the two PR files (MaterializedPostgreSQLConsumer.cpp +47/-3, test_3.py +64; +111/-3).

Ready to merge once the remaining routine shards finish green.

clickhouse-gh · 2026-07-01T12:53:50Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	85.40%	85.40%	+0.00%
Functions	92.60%	92.60%	+0.00%
Branches	77.60%	77.60%	+0.00%

Changed lines: Changed C/C++ lines covered: 16/29 (55.17%) · Uncovered code

Full report · Diff report

alexey-milovidov · 2026-07-01T16:42:04Z

continue-pr status — no code change; re-ran the two unrelated private-sync jobs.

Head d33693394c1 (today's master merge, merge-base 09f0358a934 from 07:59Z, ~8h old — no re-merge). The net diff vs master is still exactly the two PR files (MaterializedPostgreSQLConsumer.cpp + test_3.py, +111/-3).

Review state unchanged: @kssenii APPROVED, AI Review ✅ Approve (re-run on this head at 12:55Z), 0 unresolved review threads (2/2 resolved).

The only red is CH Inc sync (private clickhouse-private#60882, synced to this exact head d33693394c1). Its two failing jobs are both unrelated to the PostgreSQL replication path this PR touches:

Integration tests (amd_asan_ubsan, db disk, old analyzer, 1/7) — the single failing test is test_refreshable_mv_no_multi_read/test.py::test_refreshable_mv_attach_without_multi_read (refreshable MV / Keeper MULTI_READ). The report's own CIDB query confirms this fingerprint fires on master directly (pull_request_number = 0), i.e. it is a master-wide flake, not specific to this PR. Its earlier flakiness fix cdabef4635c (Stop refreshable MV gracefully on Keeper without MULTI_READ on ATTACH/restore #108441) is already in this head.
Stateless tests (amd_asan_ubsan, distributed cache, meta in keeper, s3 storage, parallel, 1/2) — the server failed to start in the Cloud-only distributed-cache / meta-in-keeper config (Code: 210. DB::NetException: Connection refused (localhost:9000), Server replica 0 not ready). This is a transient startup/infra failure, not a test assertion, and has no relation to MaterializedPostgreSQL.

This PR changes only src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp and tests/integration/test_postgresql_replica_database_engine/test_3.py; its own integration test (test_postgresql_replica_database_engine/test_3.py) passes — the public Integration tests shards and Bugfix validation (integration tests) are all green.

I re-ran the two failed private-sync jobs (fresh containers) to clear the transient infra failure and give the flake a clean pass. Blocked solely on the private sync re-run + a maintainer merge.

Resolve conflict in test_postgresql_replica_database_engine/test_3.py: both this PR (backup tests: test_backup_database, test_backup_table_engine, test_backup_table_partitions) and master's PR #107427 (test_table_schema_changed_while_server_down) added new test functions at the same location. Keep both sets. The three C++ files auto-merged cleanly; their changes occupy disjoint regions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 13, 2026

clickhouse-gh Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp Outdated

alexey-milovidov commented Jun 14, 2026

View reviewed changes

alexey-milovidov self-assigned this Jun 14, 2026

alexey-milovidov added this pull request to the merge queue Jun 14, 2026

github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Jun 14, 2026

Merge branch 'master' into fix-matpg-column-mismatch-restart

9186379

alexey-milovidov commented Jun 14, 2026

View reviewed changes

alexey-milovidov added this pull request to the merge queue Jun 14, 2026

github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Jun 14, 2026

kssenii approved these changes Jun 15, 2026

View reviewed changes

kssenii self-assigned this Jun 15, 2026

alexey-milovidov and others added 3 commits June 15, 2026 15:20

Merge branch 'master' into fix-matpg-column-mismatch-restart

31efaf4

Merge remote-tracking branch 'origin/master' into HEAD

4673641

alexey-milovidov and others added 3 commits June 24, 2026 16:25

Merge remote-tracking branch 'origin/master' into fix-matpg-column-mi…

de16eaa

…smatch-restart # Conflicts: # tests/integration/test_postgresql_replica_database_engine/test_3.py

Merge remote-tracking branch 'origin/master' into wt-107427-merge

66623a6

clickhouse-gh Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp

alexey-milovidov added 2 commits June 29, 2026 17:17

Merge remote-tracking branch 'origin/master' into fix-matpg-column-mi…

809e8fb

…smatch-restart

Merge remote-tracking branch 'origin/master' into fix-matpg-column-mi…

449c5fc

…smatch-restart

Merge remote-tracking branch 'origin/master' into fix-matpg-column-mi…

2325d95

…smatch-restart

Merge remote-tracking branch 'origin/master' into fix-matpg-column-mi…

4aa7c09

…smatch-restart

alexey-milovidov mentioned this pull request Jun 30, 2026

Add UTM and MGRS coordinate functions #108939

Open

1 task

Merge remote-tracking branch 'origin/master' into fix-matpg-column-mi…

d336933

…smatch-restart

alexey-milovidov merged commit b00c7d7 into master Jul 1, 2026
173 of 174 checks passed

alexey-milovidov deleted the fix-matpg-column-mismatch-restart branch July 1, 2026 21:05

robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 1, 2026

robot-ch-test-poll3 mentioned this pull request Jul 1, 2026

MaterializedPostgreSQL connection breaks on column changes in postgres after restart #66273

Closed

alexey-milovidov mentioned this pull request Jul 2, 2026

Compress hash join right-side blocks in memory before OOM / spilling #107667

Open

1 task

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

alexey-milovidov commented Jun 13, 2026 • edited by robot-ch-test-poll3 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Version info

Uh oh!

clickhouse-gh Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Missing context / blind spots

Final Verdict

Uh oh!

Uh oh!

alexey-milovidov left a comment

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov left a comment

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov commented Jun 19, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Jun 29, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jul 1, 2026

Uh oh!

alexey-milovidov commented Jul 1, 2026

Uh oh!

alexey-milovidov commented Jul 1, 2026

Uh oh!

groeneai commented Jul 1, 2026

Uh oh!

alexey-milovidov commented Jul 1, 2026

Uh oh!

alexey-milovidov commented Jul 1, 2026

Uh oh!

clickhouse-gh Bot commented Jul 1, 2026

LLVM Coverage Report

Uh oh!

alexey-milovidov commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

alexey-milovidov commented Jun 13, 2026 •

edited by robot-ch-test-poll3

Loading

clickhouse-gh Bot commented Jun 13, 2026 •

edited

Loading