Support for ALTER TABLE ADD ENUM VALUES #93830
I have doubts about the syntax. We have never had a function in the expression of MODIFY COLUMN:
https://clickhouse.com/docs/sql-reference/statements/alter/column#modify-column.
Have you seen any examples of similar logic in other DBMS?
> Have you seen any examples of similar logic in other DBMS?
From a functional standpoint, not really. PostgreSQL has "ALTER TYPE enum_type_name ADD VALUE 'new_value'", which is probably the closest match.
From a syntax standpoint, I considered:
ALTER TABLE foo MODIFY COLUMN enumX
ADD ENUM VALUES ('VAL_101' = 101, 'VAL_102')
but addToEnum is probably slightly more natural.
From an implementation standpoint, addToEnum is a type, so it can be used in too many contexts; that is a drawback I am currently trying to address.
I would appreciate any suggestions or thoughts.
I actually think this example is invalid. The list contains either pairs or just values, so this will not work:
ADD ENUM VALUES ('VAL_101' = 101, 'VAL_102')
These should be fine:
ADD ENUM VALUES ('VAL_101', 'VAL_102')
ADD ENUM VALUES ('VAL_101' = 101, 'VAL_102' = 102)
IMHO it is closer to what Postgres and its derivatives have, and it makes more sense than an addToEnum function.
Hello @qoega, thanks for the response.
The example is not really a mistake: ClickHouse currently supports Enum('VAL_101' = 101, 'VAL_102'),
which is why addToEnum supports it as well.
:) CREATE TABLE default.enum(`x` Enum16('Zero' = 0, 'One' = 1, 'Two' = 2, 'Three' = 3, 'Four' = 4, 'Hundred' = 100, 'Thousand' = 1000, 'ThousandOne' = 1001)) ENGINE = Memory;
:) alter table enum modify column x addToEnum('ThousandTen' = 1010, 'ThousandEleven');
:) alter table enum modify column x addToEnum('NextValue');
:) alter table enum modify column x addToEnum('SpecificValue'=42);
:) show create table enum format Raw;
SHOW CREATE TABLE enum
FORMAT Raw
Query id: 15b475a4-159e-4067-b7ba-c959c90819fc
CREATE TABLE default.enum
(
`x` Enum16('Zero' = 0, 'One' = 1, 'Two' = 2, 'Three' = 3, 'Four' = 4, 'SpecificValue' = 42, 'Hundred' = 100, 'Thousand' = 1000, 'ThousandOne' = 1001, 'ThousandTen' = 1010, 'ThousandEleven' = 1011, 'NextValue' = 1012)
)
ENGINE = Memory
Of course, addToEnum('VAL_101', 'VAL_102') and addToEnum('VAL_101' = 101, 'VAL_102' = 102) are supported as well.
Anyway, if you believe that
ALTER TABLE foo MODIFY COLUMN enumX ADD ENUM VALUES ('VAL_101' = 101, 'VAL_102')
is better than
ALTER TABLE foo MODIFY COLUMN enumX addToEnum('VAL_101' = 101, 'VAL_102')
I will gladly switch to the new syntax. Thanks.
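To make the numbering in the session above concrete, here is a minimal C++ sketch of merging new names into an existing enum. It is not the PR's `mergeEnumTypes`: `EnumAddition` and `mergeEnumValues` are hypothetical names, and the rule that an unnumbered name takes the largest value present so far plus one is an assumption that happens to reproduce the `SHOW CREATE TABLE` output above (`'ThousandEleven' = 1011`, `'NextValue' = 1012`). The range check uses the Enum8/Enum16 storage limits, and the duplicate checks stand in for the conflicting name/value errors listed in the PR description.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// One entry of an ADD ENUM VALUES list: a name plus an optional explicit value.
// std::nullopt means "number this name automatically".
struct EnumAddition {
    std::string name;
    std::optional<std::int64_t> value;
};

// Hypothetical helper (not the PR's mergeEnumTypes). Storage is the underlying
// integer type: std::int8_t for Enum8, std::int16_t for Enum16.
// ASSUMPTION: an unnumbered name gets (largest value present so far) + 1.
template <typename Storage>
std::map<std::int64_t, std::string> mergeEnumValues(
    std::map<std::int64_t, std::string> values,  // value -> name, sorted by value
    const std::vector<EnumAddition>& additions)
{
    // An existing Enum column always has at least one member.
    assert(!values.empty());

    const std::int64_t lo = std::numeric_limits<Storage>::min();
    const std::int64_t hi = std::numeric_limits<Storage>::max();

    for (const auto& add : additions) {
        // Explicit value if given, otherwise one past the current maximum
        // (the map is sorted, so rbegin() holds the largest value).
        const std::int64_t v = add.value ? *add.value : values.rbegin()->first + 1;

        if (v < lo || v > hi)
            throw std::out_of_range("enum value out of range for '" + add.name + "'");
        if (values.count(v) != 0)
            throw std::invalid_argument("enum value already used, cannot add '" + add.name + "'");
        for (const auto& entry : values)
            if (entry.second == add.name)
                throw std::invalid_argument("enum name already exists: '" + add.name + "'");

        values.emplace(v, add.name);
    }
    return values;
}
```

One consequence of the max-based assumption: adding `('VAL_101' = 101, 'VAL_102')` to a column whose largest value already exceeds 101 would not give `VAL_102` the value 102. If the real rule is "previous value in the list plus one", only the auto-numbering expression changes.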
I thought we did not allow mixed lists, but I checked and it works, so OK.
https://fiddle.clickhouse.com/e9dd95bb-c305-4a5e-9992-b7ee50ad11e1
Hello @qoega, I am not 100% sure about this particular syntax, though I eventually got used to it, so it is probably not too bad. What do you think?
mkmkme left a comment:
Overall looks good. I've added a couple of comments in places that caught my eye.
Hello @qoega, could you please enable the tests and review?
Workflow [PR], commit [07520f5] Summary: ✅
AI Review
Summary: This PR adds …
Final Verdict
Status: ✅ Approve
The C++ style check believes that … is invalid. The related code is … I hesitated to improve it: we would need an extra layer there to exclude quoted strings from the check.
The Fast test failure is definitely not related: next_refresh_time still belongs to 2026 (the current time).
# Conflicts: # src/Parsers/ASTAlterQuery.cpp
The branch carried stale auto-generated `.github/workflows` YAML referencing the old `style-checker-aarch64` runner labels; reset them to master's current version so the PR diff stays scoped to the `ADD ENUM VALUES` feature. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The test was numbered `03774` which now collides with several tests added on master (e.g. `03774_join_pushdown_sharding_bug`, `03774_parquet_empty_tuple`). Renumber to `04307`, past the current highest test number on master. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Updated the branch (maintainer push, …).
I rebuilt from the merged tree and verified … The CI failures on the previous run are unrelated to this change: …
The open …
📊 Cloud Performance Report
🟢 AI verdict: This PR adds the ALTER TABLE ... MODIFY COLUMN ... ADD ENUM VALUES syntactic sugar, touching only the SQL parser, Enum type creation/merging, and ALTER command handling. None of that runs in the ClickBench or TPC-H SELECT execution path, and the flagged queries move in both directions, which points to run-to-run variance rather than a real effect, so those flags were downgraded. Three large improvements remain flagged purely by their magnitude (clickbench Q23 -31.8%, tpch Q7 -34.5%, tpch Q20 -47.2%); given the diff, these most likely reflect environment differences, and Q20 in particular sits on very noisy measurements. No actionable regression is attributable to this change.
clickbench 🟢 1 improved · Flagged queries (11 of 43)
q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10); this is the value the verdict is based on.
tpch_adapted_1_official 🟢 2 improved · Flagged queries (3 of 22)
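The q-value note in the performance report refers to the Benjamini-Hochberg (BH) false discovery rate adjustment. The report does not show how MIRAI computes it, so the following is only a textbook BH sketch in C++, and `benjaminiHochberg` and `flagged` are hypothetical names. It sorts the p-values, scales the p-value of rank j (1-based) by m / j, and takes a running minimum from the largest rank down so the q-values stay monotone and capped at 1. The flagging rule `q < fdr_q` with default 0.10 is taken from the report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Textbook Benjamini-Hochberg adjustment: for ranks j = 1..m in ascending p order,
// q_(j) = min over k >= j of p_(k) * m / k, capped at 1. Returns q in input order.
std::vector<double> benjaminiHochberg(const std::vector<double>& p)
{
    for (double x : p)
        assert(x >= 0.0 && x <= 1.0);  // p-values must be probabilities

    const std::size_t m = p.size();
    std::vector<std::size_t> order(m);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return p[a] < p[b]; });

    std::vector<double> q(m);
    double running_min = 1.0;  // also caps every q-value at 1
    for (std::size_t k = m; k-- > 0;) {  // walk from the largest p-value down
        const std::size_t idx = order[k];
        const double candidate = p[idx] * static_cast<double>(m) / static_cast<double>(k + 1);
        running_min = std::min(running_min, candidate);
        q[idx] = running_min;
    }
    return q;
}

// Flags each entry whose adjusted q-value is below fdr_q (report default: 0.10).
std::vector<bool> flagged(const std::vector<double>& p, double fdr_q = 0.10)
{
    std::vector<bool> out;
    for (double qv : benjaminiHochberg(p))
        out.push_back(qv < fdr_q);
    return out;
}
```

For p-values {0.01, 0.04, 0.03, 0.5} this gives q of about {0.04, 0.053, 0.053, 0.5}, so three of the four entries are flagged at 0.10 but only the first at 0.05.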
# Conflicts: # src/Storages/AlterCommands.cpp
`AlterCommands::prepare` derived the result of `ADD ENUM VALUES` by merging
against the `columns` snapshot, but never advanced that snapshot between
commands. As a result `MODIFY COLUMN x ADD ENUM VALUES('a'), MODIFY COLUMN x
ADD ENUM VALUES('b')` produced `base + b` and silently dropped `a`, because
both commands merged against the original type and were then applied
sequentially.
Write the merged type back into the working snapshot so subsequent commands
in the same statement see the already-extended type.
Additionally, `ADD ENUM VALUES` computes its type only when the column is
present in the snapshot. When the column is added by a preceding `ADD COLUMN`
in the same statement (which does not advance the snapshot), the modification
was silently dropped. Fail explicitly with `NO_SUCH_COLUMN_IN_TABLE` instead.
Addresses review feedback on `src/Storages/AlterCommands.cpp`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
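A toy C++ model of the bug and the fix described in the commit message above. It is not the real `AlterCommands::prepare`: a column type is reduced to a list of enum names, and `Snapshot`, `AddEnumValuesCommand`, and `prepareAddEnumValues` are hypothetical. With `advance_snapshot = false`, both commands read the original type and the last one wins, reproducing `base + b`; with `true`, the second command sees the already-extended type. A column missing from the snapshot raises an error that stands in for `NO_SUCH_COLUMN_IN_TABLE`.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model: column name -> list of enum names.
using Snapshot = std::map<std::string, std::vector<std::string>>;

// One MODIFY COLUMN ... ADD ENUM VALUES command.
struct AddEnumValuesCommand {
    std::string column;
    std::vector<std::string> names;
};

// Computes the type each column ends up with after applying `commands` in order.
Snapshot prepareAddEnumValues(const Snapshot& columns,
                              const std::vector<AddEnumValuesCommand>& commands,
                              bool advance_snapshot)
{
    Snapshot working = columns;  // snapshot that each command merges against
    Snapshot result = columns;   // types that will actually be applied
    for (const auto& cmd : commands) {
        auto it = working.find(cmd.column);
        if (it == working.end())
            // E.g. the column comes from an earlier ADD COLUMN in the same statement,
            // which this snapshot does not see: fail instead of silently dropping.
            throw std::runtime_error("NO_SUCH_COLUMN_IN_TABLE: " + cmd.column);

        std::vector<std::string> merged = it->second;
        merged.insert(merged.end(), cmd.names.begin(), cmd.names.end());
        result[cmd.column] = merged;  // later commands for this column overwrite it

        if (advance_snapshot)
            it->second = merged;  // the fix: later commands see the extended type
    }
    return result;
}
```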
Merged …
Addressed review feedback
Fixed the unresolved bot thread on …
CI analysis (all three failures unrelated to this PR)
@groeneai, the two flaky tests …
@alexey-milovidov, status on the two flaky tests:
The …
The test sorts a hand-built `BFloat16` array that contains five values which all round to the same `BFloat16` representation (`+0.0`, `+0.0`, `0.0`, `-0.0`, `0.0`). Equal sort keys have an implementation-defined relative order; under enough query-plan randomization the relative order of `-0` against the surrounding `0`s changes, and the result no longer matches the byte-exact `.reference` file. Pinning `max_threads = 1` makes the order produced by `MergeSortingTransform` deterministic and matches the committed reference. The test still exercises sorting of `BFloat16` values across the full input. Diagnosed by @alexey-milovidov on ClickHouse#93830 (comment) CI history: 4 hits across 4 unrelated PRs in 90 days (ClickHouse#93830, ClickHouse#100983, ClickHouse#100649, ClickHouse#96130), 0 master failures, 78957 OK runs / 1 FAIL in 30 days.
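A small C++ analogy for why the reference output became order-dependent. It uses `float` instead of `BFloat16` and `std::stable_sort` instead of `MergeSortingTransform`, so it only illustrates the principle: `-0.0` and `+0.0` compare equal, a comparator cannot order them, and only a stable (or otherwise deterministic) sort fixes their relative order. `hasSignBit` and `stableSortedCopy` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// True for -0.0f: std::signbit tells the two zeros apart, operator== does not.
bool hasSignBit(float x) { return std::signbit(x); }

// Sorts a copy with std::stable_sort and operator<. Since -0.0f < 0.0f and
// 0.0f < -0.0f are both false, the zeros form one run of equal keys, and a
// stable sort keeps that run in input order.
std::vector<float> stableSortedCopy(std::vector<float> v)
{
    std::stable_sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}
```

With `std::sort` in place of `std::stable_sort`, the relative order of the zeros is unspecified, which is the analogue of a randomized query plan producing output that no longer matches a byte-exact reference.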
# Conflicts: # src/DataTypes/EnumValues.cpp # src/DataTypes/EnumValues.h
Merged the latest `master` to resolve conflicts. Since the last merge, `master` replaced the hash-map-based `EnumValues` with the new compact storage (…). Re-threaded `ValidationMode` through the new `buildLookupStructures`: …
Built locally, and `04307_add_to_enum` passes against a fresh server.
I prohibited combining a plain MODIFY COLUMN with MODIFY COLUMN ... ADD ENUM VALUES in one statement, regardless of which columns they target, because advancing the snapshot after every MODIFY COLUMN seems too risky. Update: it is now possible to combine MODIFY and MODIFY ... ADD ENUM VALUES when they target different columns.
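A hypothetical C++ sketch of the rule as finally described: reject a statement only when the same column is targeted by both a plain `MODIFY COLUMN` and `MODIFY COLUMN ... ADD ENUM VALUES`. The names `ModifyKind`, `ModifyCommand`, and `validateModifyMix` are illustrative, and the sketch assumes the check ignores command order; the actual validation lives in `AlterCommands` and may differ. Several `ADD ENUM VALUES` commands on one column remain allowed, consistent with the snapshot fix earlier in this PR.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

enum class ModifyKind { PlainModify, AddEnumValues };

// One MODIFY COLUMN command, reduced to its target column and kind.
struct ModifyCommand {
    std::string column;
    ModifyKind kind;
};

// Throws if any column is targeted by both kinds of MODIFY in one statement.
void validateModifyMix(const std::vector<ModifyCommand>& commands)
{
    std::set<std::string> plain;
    std::set<std::string> add_enum;
    for (const auto& c : commands)
        (c.kind == ModifyKind::PlainModify ? plain : add_enum).insert(c.column);

    for (const auto& col : add_enum)
        if (plain.count(col) != 0)
            throw std::invalid_argument("MODIFY and ADD ENUM VALUES on the same column: " + col);
}
```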
Hello @alexey-milovidov, I [obviously] cannot sync with the private repo.
# Conflicts: # src/DataTypes/DataTypeEnum.cpp
LLVM Coverage Report
Changed C/C++ lines covered by tests: 291/303 (96.04%) | Lost baseline coverage: none

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Implement ADD ENUM VALUES in ALTER TABLE queries to simplify appending new values to an existing Enum type without having to specify all current Enum values again.
Documentation entry for user-facing changes
Note
Medium Risk
Adds a new ALTER TABLE syntax and plumbing to merge enum definitions, touching SQL parsing and schema-alter preparation logic; mistakes could break ALTER parsing or allow invalid enum remaps.
Overview
Adds ALTER TABLE ... MODIFY/ALTER COLUMN ... ADD ENUM VALUES (...) to append new members to existing Enum/Enum8/Enum16 columns without restating the full definition, including Nullable(Enum*).
Implements parsing/formatting support (ADD_ENUM_VALUES keyword, ASTAlterCommand::add_enum_values), adds enum-type merge/validation logic (mergeEnumTypes) with conflict and range checks (including optional relative numbering), and wires it into AlterCommands::prepare to produce the updated column type for the mutation.
Updates user docs for Enum and ALTER COLUMN, and adds stateless tests covering success cases and expected errors (non-enum columns, overflow, conflicting name/value, and syntax misuse).
Written by Cursor Bugbot for commit 9bf39da.
Version info
26.6.1.769