Honor materialized_postgresql_schema in MaterializedPostgreSQL table engine#107425
Honor materialized_postgresql_schema in MaterializedPostgreSQL table engine#107425alexey-milovidov wants to merge 13 commits into
Conversation
…engine The standalone `MaterializedPostgreSQL` table engine ignored the `materialized_postgresql_schema` setting. Unlike the database engine, which schema-qualifies and quotes the tables list in `fetchRequiredTables`, the table engine puts the bare remote table name into `tables_list` (in the storage constructor) and never processes it. As a result `createPublicationIfNeeded` issued `CREATE PUBLICATION ... FOR TABLE ONLY <table>` without the schema, failing with `relation "<table>" does not exist` whenever the table lived in a non-default schema. Build the publication's tables list from the storages (which applies the schema via `doubleQuoteWithSchema`) for the table engine, mirroring the database engine behaviour. The rest of the flow (structure fetch, snapshot COPY, consumer mapping) already used the schema correctly. Closes: #59950 Related: #49045 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The default replication slot name is `<postgres_database>_<table>_ch_replication_slot`, which is checked against PostgreSQL's 63-character identifier limit in `checkReplicationSlot` at handler construction. With the previous names the slot name `postgres_database_test_table_engine_schema_table_ch_replication_slot` was 68 characters, so the test aborted with `Too big replication slot size` (a `LOGICAL_ERROR`) before the `materialized_postgresql_schema` fix was ever exercised. Under sanitizer builds this aborted the server, cascading into `Connection refused` failures across the other integration test shards. Shorten the table name so the slot name stays within the limit and the test actually validates the schema handling end-to-end. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CI triageIntegration tests (6 shards: The two remaining reds are unrelated to this change (a one-line PostgreSQL publication change):
@groeneai, could you take a look at the |
|
@alexey-milovidov The Root cause is reentrancy of the sanitizer death callback. At process exit the static The fix marks |
…ine-schema # Conflicts: # tests/integration/test_postgresql_replica_database_engine/test_3.py
…aware After merging master, the original publication tables-list change from this PR became redundant: master's #107423 ("Fix MaterializedPostgreSQL replication for upper-case database/table names", commit `190a65c1556`) added an `else if (!is_materialized_postgresql_database)` branch in `createPublicationIfNeeded` that quotes the single-table `tables_list` via `doubleQuoteWithSchema`, which already applies the `materialized_postgresql_schema` setting. So the `CREATE PUBLICATION ... FOR TABLE ONLY` query is schema-qualified on master already. Revert that part here so master's branch is no longer dead code. What was still missing — and what this commit fixes — is the replication *identity*. `getPublicationName` and the default `getReplicationSlotName` derived the name from `<postgres_database>_<table>` only, ignoring the schema. Two standalone `MaterializedPostgreSQL` tables replicating a table with the same name from two different PostgreSQL schemas of the same database therefore shared one publication and one default replication slot. The second `CREATE` dropped and recreated the shared publication, and because the publication carries only the bare relation name (`schema_as_a_part_of_table_name` stays false in this mode) the consumers cross-talked: one replica could stop receiving its schema's changes or ingest the other schema's rows. Include the schema in the publication name and the default replication slot name for the single-table engine when `materialized_postgresql_schema` is set, so the two configurations get distinct PostgreSQL objects and stay isolated. The database engine path (empty remote table name) and the default-schema table-engine path are unchanged, so existing deployments keep their current publication and slot names. Add an integration test `test_two_schemas_same_table_name_single_storage` that replicates the same table name from two schemas and asserts both the initial snapshot and ongoing replication stay isolated. Shorten the schema and table names in `test_single_table_engine_with_non_default_schema` so the now schema-aware default slot name stays within PostgreSQL's identifier length limit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/continue-pr session summaryMerged Key finding from the merge. Fix for the AI Block (the genuinely-remaining issue). Tests.
The unresolved AI thread on Verification. CI triage. The only red on the previous head was For the author: the basic |
…chema Making the standalone `MaterializedPostgreSQL` table-engine replication identity schema-aware (in the previous commit) changed the generated default publication and replication-slot names for *every* non-empty `materialized_postgresql_schema`, including the common explicit-default case `materialized_postgresql_schema = 'public'`. That breaks `ATTACH` of an existing standalone `MaterializedPostgreSQL` table created before the identity became schema-aware: the default slot was `postgres_database_<table>_ch_replication_slot`, but the new code looks for `postgres_database_<schema>_<table>_ch_replication_slot`. When that slot is absent, `startSynchronization` runs the initial sync, creates a new slot, and `loadFromSnapshot` inserts a fresh snapshot into the already-existing nested table (`createNestedIfNeeded` is a no-op when the nested table exists), duplicating data and leaving the old slot and publication behind. Treat PostgreSQL's default schema (`public`, or an empty setting) the same as no schema in `getPublicationName` and `getReplicationSlotName`: only a genuinely non-default schema is included in the generated names. This preserves the legacy identity for the default schema (so `ATTACH` keeps finding the existing objects) while still keeping the collision fix intact, because `public.t` and `schema1.t` still get distinct names (`postgres_database_t` vs `postgres_database_schema1_t`). Adds `test_default_schema_preserves_legacy_identity`, which creates the single-table engine with `materialized_postgresql_schema = 'public'` and asserts that the publication and replication slot use the legacy schema-unaware names, not the schema-aware ones. Addresses the AI review on #107425 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ine-schema Master's "Enable more Ruff checks (F401/F403/F405/F541/F841) and clean up tests" (`d6c076decf7`) rewrote the `from helpers.postgres_utility import (...)` block of `test_3.py`, dropping every name that master's own copy of the file did not use. The auto-merge took master's reduced import block, so the three helpers this PR's new tests call as bare functions — `create_postgres_schema`, `create_postgres_table`, and `create_postgres_table_with_schema` — were no longer imported, and `ruff` flagged them as `F821` undefined-name on the CI merge ref (the Style check failure on `f353f743778`). Re-add the three imports (they still exist in `postgres_utility.py`); `ruff check` now passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ine-schema # Conflicts: # tests/integration/test_postgresql_replica_database_engine/test_3.py
…tant The single-table MaterializedPostgreSQL engine derived its publication and default replication-slot names from a plain `postgres_database_postgres_schema_postgres_table` concatenation when `materialized_postgresql_schema` was set to a non-default schema. That is not injective: `schema = a_b`, `table = c` and `schema = a`, `table = b_c` both yield `postgres_database_a_b_c_*`, and the replication slot name is additionally folded by `normalizeReplicationSlot` (lower-cased, `-` mapped to `_`), so PostgreSQL-distinct schemas such as `Foo`/`foo` or `a-b`/`a_b` map to one slot. Two standalone tables built from such identities would then share a publication or a replication slot and their consumers would cross-talk — the very failure the schema-aware identity is meant to remove. Derive the schema-aware single-table identity from a bounded, collision-resistant hash of a length-prefixed (hence unambiguous) serialization of the full (database, schema, table) triple. The generated name is now injective in practice, fixed-length (so it stays within PostgreSQL's identifier limit regardless of the schema/table length) and within the replication-slot character set. The database-engine path and the default-schema (empty or `public`) table-engine path keep their legacy names unchanged. Adds focused integration coverage for both collision classes: `a_b.c` versus `a.b_c` (publication/slot separator collision) and `a-b.t` versus `a_b.t` (slot hyphen-folding collision), asserting that the two engines own distinct PostgreSQL objects and stay isolated. Addresses the AI Review "Request changes" findings on #107425 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/continue-pr session summaryNo code changes were needed — the PR is approved and has no unresolved review threads.
|
…ase names The schema-aware single-table publication and default replication-slot identities folded the schema and table into a fixed-length hash but kept the PostgreSQL database name as a full prefix, so the generated names were not actually bounded. With a moderately long database name (>= 28 characters) the default slot `<database>_<16 hex hash>_ch_replication_slot` exceeded PostgreSQL's 63-character identifier limit, and `checkReplicationSlot` threw `Too big replication slot size` in the constructor before the table could be created — a regression for non-default-schema tables that worked with a short, schema-blind slot name. The same unbounded prefix was used by `getPublicationName`. Cap the cosmetic, human-readable database prefix at 16 bytes in the new `getSchemaAwareIdentityName` helper, which both `getPublicationName` and the default `getReplicationSlotName` now use. The full `(database, schema, table)` identity is still carried by the hash, so capping the prefix cannot reintroduce a collision, and the generated names (slot, its `_tmp` variant, and the publication) stay within the limit regardless of the database name length. Add `test_schema_aware_identity_long_database_name`, which replicates a non-default-schema table under a 39-character PostgreSQL database name and asserts the initial snapshot and ongoing replication succeed and that the generated publication/slot names stay within the 63-character limit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/continue-pr session summaryAddressed the AI Review Fix. The schema-aware single-table identity folded the schema and table into a fixed-length hash but kept the full PostgreSQL database name as a prefix, so the generated names were not actually bounded. With a database name of ≥ 28 characters the default slot A new Test. Added Verification. CI triage (previous head, all reds unrelated to a PostgreSQL identity-string change):
The new commit re-triggers full CI. |
/continue-pr session summaryNo code changes were needed — the PR is approved with no unresolved review threads.
|
/continue-pr session summaryNo code changes — the PR is correct, AI Review verdict is No re-merge. The branch merged CI triage (this PR touches only
The failed jobs cannot be re-run while the "PR" workflow run is still in progress; they will be re-run once it finishes if still needed. Otherwise the PR is ready to merge. |
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 23/38 (60.53%) · Uncovered code |

The standalone
MaterializedPostgreSQLtable engine did not work for a relation that lives in a non-default PostgreSQL schema (selected via thematerialized_postgresql_schemasetting).Update after merging
master. The original failure of this PR —CREATE PUBLICATION ... FOR TABLE ONLY <table>issued without the schema, sopqxx::undefined_table: ERROR: relation "<table>" does not exist— is now fixed onmasterby #107423 (commit190a65c1556). That change quotes the single-table publication's tables list viadoubleQuoteWithSchema, which also schema-qualifies it. The redundant publication tables-list change originally made here has therefore been reverted, leavingcreatePublicationIfNeededidentical tomaster.What this PR fixes now: the replication identity.
getPublicationNameand the defaultgetReplicationSlotNamederived their names from<postgres_database>_<table>only, ignoring the schema. Two standaloneMaterializedPostgreSQLtables that replicate a table with the same name from two different schemas of the same PostgreSQL database therefore shared one publication and one default replication slot. The secondCREATEdropped and recreated the shared publication, and because the publication carried only the bare relation name (schema_as_a_part_of_table_namestaysfalsein this mode), the two consumers cross-talked — one replica could stop receiving its schema's changes or ingest the other schema's rows.Fix. For the single-table engine, derive both the publication name and the default replication slot name from a bounded, collision-resistant hash of the full
(database, schema, table)identity whenmaterialized_postgresql_schemais set to a non-default schema, so the two configurations get distinct PostgreSQL objects and stay isolated. A plaindatabase_schema_tableconcatenation is not injective (schema = a_b,table = candschema = a,table = b_cboth fold todatabase_a_b_c), and the replication-slot name is additionally lower-cased with-mapped to_, so even PostgreSQL-distinct schemas such asFoo/fooora-b/a_bwould otherwise collide. Hashing a length-prefixed (hence unambiguous) serialization of the identity keeps the generated name injective in practice and fixed-length, and the human-readable database prefix kept in front of the hash (purely for recognizability inpg_replication_slots/pg_publication) is itself capped, so the slot stays within PostgreSQL's identifier limit regardless of the database, schema and table length. The default schema is left out of the identity: an emptymaterialized_postgresql_schema, or the explicit defaultmaterialized_postgresql_schema = 'public', keeps the legacy schema-unaware names, so existing single-table deployments (and the database-engine path, which has an empty remote table name) keep their current publication and slot names andATTACHstill finds them.Tests. Adds
test_two_schemas_same_table_name_single_storage, which replicates the same table name from two schemas into two standalone engines and asserts that both the initial snapshot and ongoing replication stay isolated. Addstest_schema_aware_identity_publication_separator_collision(a_b.cversusa.b_c) andtest_schema_aware_identity_slot_hyphen_distinct(a-b.tversusa_b.t), which assert that the formerly-colliding identities now own distinct publications and replication slots and replicate in isolation. Keepstest_single_table_engine_with_non_default_schema(single non-default schema). Addstest_default_schema_preserves_legacy_identity, which creates the single-table engine with the explicit defaultmaterialized_postgresql_schema = 'public'and asserts (viapg_replication_slotsandpg_publication) that the legacy schema-unaware publication and slot names are used, soATTACHof tables created before the identity became schema-aware keeps working. Addstest_schema_aware_identity_long_database_name, which replicates a non-default-schema table under a long PostgreSQL database name (whose unbounded slot name would have exceeded the identifier limit) and asserts that the initial snapshot and ongoing replication succeed and that the generated publication and slot names stay within PostgreSQL's 63-character limit.Closes: #59950
Related: #49045
Related: #107423
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix the
MaterializedPostgreSQLtable engine so that a single table can be replicated from a non-default PostgreSQL schema (materialized_postgresql_schema), including the case where tables with the same name exist in several schemas of the same database (previously they would share a publication and replication slot and cross-talk).Documentation entry for user-facing changes
🤖 Generated with Claude Code