Support for ALTER TABLE ADD ENUM VALUES by ilejn · Pull Request #93830 · ClickHouse/ClickHouse · GitHub
Skip to content

Support for ALTER TABLE ADD ENUM VALUES#93830

Merged
alexey-milovidov merged 60 commits into
ClickHouse:masterfrom
ilejn:add_enum
Jun 13, 2026
Merged

Support for ALTER TABLE ADD ENUM VALUES#93830
alexey-milovidov merged 60 commits into
ClickHouse:masterfrom
ilejn:add_enum

Conversation

@ilejn

@ilejn ilejn commented Jan 9, 2026

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Implement ADD ENUM VALUES in ALTER TABLE queries to simplify appending new values to an existing Enum type without the need to specify all current Enum values again.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Adds a new ALTER TABLE syntax and plumbing to merge enum definitions, touching SQL parsing and schema-alter preparation logic; mistakes could break ALTER parsing or allow invalid enum remaps.

Overview
Adds ALTER TABLE ... MODIFY/ALTER COLUMN ... ADD ENUM VALUES (...) to append new members to existing Enum/Enum8/Enum16 columns without restating the full definition, including Nullable(Enum*).

Implements parsing/formatting support (ADD_ENUM_VALUES keyword, ASTAlterCommand::add_enum_values), adds enum-type merge/validation logic (mergeEnumTypes) with conflict and range checks (including optional relative numbering), and wires it into AlterCommands::prepare to produce the updated column type for the mutation.

Updates user docs for Enum and ALTER COLUMN, and adds stateless tests covering success cases and expected errors (non-enum columns, overflow, conflicting name/value, and syntax misuse).

Written by Cursor Bugbot for commit 9bf39da. This will update automatically on new commits. Configure here.

Version info

  • Merged into: 26.6.1.769

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think about syntax. We never had function in expression of MODIFY COLUMN
https://clickhouse.com/docs/sql-reference/statements/alter/column#modify-column.

Have you seen any examples of similar logic in other DBMS?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen any examples of similar logic in other DBMS?

From functional standpoint, not really, PostgreSQL has "ALTER TYPE enum_type_name ADD VALUE 'new_value'" which is probably the best match.

From syntax standpoint, I thought about

ALTER TABLE foo MODIFY COLUMN enumX
    ADD ENUM VALUES ('VAL_101' = 101, 'VAL_102')

but probably addToEnum is slightly more natural.

From implementation standpoint, addToEnum is a type, and it can be used in too many contexts, which is a drawback I am currently trying to address.

I appreciate any suggestion or thoughts.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think that this example is invalid. It is either pairs or just values. So this will not work

    ADD ENUM VALUES ('VAL_101' = 101, 'VAL_102')

This should be fine

    ADD ENUM VALUES ('VAL_101', 'VAL_102')
    ADD ENUM VALUES ('VAL_101' = 101, 'VAL_102' = 102) 

IMHO it is closer to what Postgres and derivatives have. And makes more sense than addToEnum function.

@ilejn ilejn Jan 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @qoega, thanks for the response.

The example is not really a mistake, currently ClickHouse supports Enum('VAL_101' = 101, 'VAL_102').
That's why it is supported by addToEnum as well.

:) CREATE TABLE default.enum(`x` Enum16('Zero' = 0, 'One' = 1, 'Two' = 2, 'Three' = 3, 'Four' = 4, 'Hundred' = 100, 'Thousand' = 1000, 'ThousandOne' = 1001)) ENGINE = Memory;
:) alter table enum modify column x addToEnum('ThousandTen' = 1010, 'ThousandEleven');
:) alter table enum modify column x addToEnum('NextValue');
:) alter table enum modify column x addToEnum('SpecificValue'=42);
:) show create table enum format Raw;

SHOW CREATE TABLE enum
FORMAT Raw

Query id: 15b475a4-159e-4067-b7ba-c959c90819fc

CREATE TABLE default.enum
(
    `x` Enum16('Zero' = 0, 'One' = 1, 'Two' = 2, 'Three' = 3, 'Four' = 4, 'SpecificValue' = 42, 'Hundred' = 100, 'Thousand' = 1000, 'ThousandOne' = 1001, 'ThousandTen' = 1010, 'ThousandEleven' = 1011, 'NextValue' = 1012)
)
ENGINE = Memory

Of course, addToEnum ('VAL_101', 'VAL_102') and ('VAL_101' = 101, 'VAL_102' = 102) are supported as well.

Anyway, if you believe that
ALTER TABLE foo MODIFY COLUMN enumX ADD ENUM VALUES ('VAL_101' = 101, 'VAL_102')
is better than
ALTER TABLE foo MODIFY COLUMN enumX AddToEnum('VAL_101' = 101, 'VAL_102')

I'll willingly switch to this new syntax, thanks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that we do not allow mixed ones, but checked that it works, ok.
https://fiddle.clickhouse.com/e9dd95bb-c305-4a5e-9992-b7ee50ad11e1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this feature is introduced by #36352
But there is some room for improvement #93691

@ilejn ilejn marked this pull request as ready for review January 11, 2026 23:24
@ilejn

ilejn commented Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

Hello @qoega ,
the reason for this development is the fact some ClickHouse users prefer Enums over LowCardinality (less controllable) or Dictionary (extra fuss, I think). And it is common to add one or two values to a long list.
So the feature seems to be useful and it is requested.

I am not 100% sure about particular syntax, though eventually I get used to it, probably it is not too bad.

What do you think?

@mkmkme mkmkme left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I've added a couple of comments in places that caught my eye.

Comment thread docs/en/sql-reference/data-types/enum.md Outdated
Comment thread src/Interpreters/InterpreterAlterQuery.cpp Outdated
Comment thread src/Storages/AlterCommands.cpp Outdated
Comment thread tests/queries/0_stateless/04307_add_to_enum.sql.j2
@ilejn

ilejn commented Feb 3, 2026

Copy link
Copy Markdown
Contributor Author

Hello @qoega , could you please enable tests and review ?

@clickhouse-gh

clickhouse-gh Bot commented Feb 3, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [07520f5]

Summary:


AI Review

Summary

This PR adds ALTER TABLE ... MODIFY COLUMN ... ADD ENUM VALUES for Enum, Enum8, Enum16, and nullable enum columns, with parser/AST support, enum merge validation, metadata-only alter handling, documentation, and focused stateless coverage. I did not find any remaining blocker or major issue in the current diff.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-feature Pull request with new product feature label Feb 3, 2026
@qoega qoega added the can be tested Allows running workflows for external contributors label Feb 6, 2026
@ilejn

ilejn commented Feb 9, 2026

Copy link
Copy Markdown
Contributor Author

cpp stylecheck believes that

ostr << " ADD ENUM VALUES ( ";

is invalid.

Related code is

    xargs grep $@ -n -P '((class|struct|namespace|enum|if|for|while|else|throw|switch).*|\)(\s*const)?(\s*override)?\s*)\{$|\s$|^ {1,3}[^\* ]\S|\t|^\s*(if|else if|if constexpr|else if constexpr|for|while|catch|switch)\(|\( [^\s\\]|\S \)' |
# a curly brace not in a new line, but not for the case of C++11 init or agg. initialization | trailing whitespace | number of ws not a multiple of 4, but not in the case of comment continuation | missing whitespace after for/if/while... before opening brace | whitespaces inside braces
    grep -v -P '//|\s+\*|\$\(\(| \)"' && echo "^ style error on this line"

I hesitated to improve it, we need an extra layer here to exclude quoted string from check.

@ilejn

ilejn commented May 20, 2026

Copy link
Copy Markdown
Contributor Author

Fast test failure is definitely not related.
After

$CLICKHOUSE_CLIENT -q "
    system test view a set fake time '2050-01-01 00:00:01';
    system wait view a;
    system refresh view a;
    system wait view a;
    select '<4.1: fake clock>', status, last_success_time, next_refresh_time, progress, read_rows, total_rows, written_rows, retry from refreshes;

next_refresh_time still belong to 2026 (current time).

Reason: result differs with reference: 
--- /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/tests/queries/0_stateless/02932_refreshable_materialized_views_1.reference    2026-05-20 13:42:05.398439153 +0200
+++ /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/tests/queries/0_stateless/02932_refreshable_materialized_views_1.213364.stdout    2026-05-20 13:53:19.095476701 +0200
@@ -1,15 +1,15 @@
 <1: created view>        a
 CREATE MATERIALIZED VIEW default.a\nREFRESH EVERY 2 SECOND\n(\n    `x` UInt64\n)\nENGINE = Memory\nDEFINER = default SQL SECURITY DEFINER\nAS SELECT number AS x\nFROM numbers(2)\nUNION ALL\nSELECT rand64() AS x
 <2: refreshed>    3    1    1
 <3: time difference at least>    1000
-<4.1: fake clock>    Scheduled    2050-01-01 00:00:01    2050-01-01 00:00:02    1    3    3    3    0 
+<4.1: fake clock>    Scheduled    2050-01-01 00:00:01    2026-05-20 11:53:16    1    3    3    3    0 

Test report https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=93830&sha=8fb42adc14b05bf0a419db37b04a17b6c15a2cf3&name_0=PR&name_1=Fast%20test

@clickhouse-gh clickhouse-gh Bot added the manual approve Manual approve required to run CI label May 29, 2026
alexey-milovidov and others added 3 commits June 3, 2026 05:46
The branch carried stale auto-generated `.github/workflows` YAML
referencing the old `style-checker-aarch64` runner labels; reset them
to master's current version so the PR diff stays scoped to the
`ADD ENUM VALUES` feature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The test was numbered `03774` which now collides with several tests
added on master (e.g. `03774_join_pushdown_sharding_bug`,
`03774_parquet_empty_tuple`). Renumber to `04307`, past the current
highest test number on master.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Updated the branch (maintainer push, maintainerCanModify):

  • Merged master (branch was 941 commits behind; CI was red only on unrelated flakes — see below). Zero conflicts in the feature files (DataTypeEnum, EnumValues, ASTAlterQuery, ParserAlterQuery, AlterCommands).
  • Reset the auto-generated .github/workflows/*.yml to master's version. The branch carried stale generated YAML still referencing the old style-checker-aarch64 / pr-style-checker-aarch64 runner labels, which is unrelated to this feature and could misroute CI.
  • Renumbered the test 03774_add_to_enum04307_add_to_enum. The 03774 prefix now collides with several tests added on master (e.g. 03774_join_pushdown_sharding_bug, 03774_parquet_empty_tuple); 04307 is past the current highest number.

I rebuilt from the merged tree and verified 04307_add_to_enum locally: all serverError/clientError directives match and the SELECT output is byte-identical to the committed reference (relative numbering, Nullable(Enum*), overflow/conflict errors, SHOW CREATE, syntax-error cases all pass).

The CI failures on the previous run are unrelated to this change:

  • test_mysql_protocol/test.py::test_mysql_dotnet_client.NET MySQL client SSL handshake flake (Received unknown response X for SSLRequest), independent of ALTER ... ADD ENUM VALUES.
  • check_report_messages.pyplay.clickhouse.com CIDB connection refused (infra).

The open forEachPointerToChild thread is addressed for this PR (add_enum_values is wired in); the remaining refresh / MODIFY REFRESH observation is a pre-existing issue and out of scope here, as noted.

Comment thread src/Storages/AlterCommands.cpp
@clickhouse-gh

clickhouse-gh Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

📊 Cloud Performance Report

🟢 AI verdict: improvement3 query(s) improved out of 38 analysed

This PR adds the ALTER TABLE ... MODIFY COLUMN ... ADD ENUM VALUES syntactic sugar, touching only the SQL parser, Enum type creation/merging, and ALTER command handling. None of that runs in the ClickBench or TPC-H SELECT execution path, and the flagged queries move in both directions, which points to run-to-run variance rather than a real effect, so those flags were downgraded. Three large improvements remain flagged purely by their magnitude (clickbench Q23 -31.8%, tpch Q7 -34.5%, tpch Q20 -47.2%); given the diff these most likely reflect environment differences, and Q20 in particular sits on very noisy measurements. No actionable regression is attributable to this change.

clickbench

🟢 1 improved · ⚠️ 10 inconclusive

Flagged queries (11 of 43)
Query Verdict Baseline med (ms) PR med (ms) Change q-value Hint
🟢 23 improvement 297 202 -31.8% <0.0001 Large, consistent -31.8% held by magnitude. PR only changes Enum DDL, so attribution is unclear.
⚠️ 4 not_sure 264 227 -14.0% <0.0001 PR adds Enum ADD-VALUES DDL (parser/DataTypeEnum/AlterCommands); SELECT path untouched. -14% is variance.
⚠️ 15 not_sure 246 213 -13.4% <0.0001 Enum DDL-only change cannot move this SELECT; -13% is variance despite trending history.
⚠️ 16 not_sure 796 887 +11.4% <0.0001 Off-path: PR touches only Enum ALTER/type creation. +11% does not match the diff.
⚠️ 17 not_sure 564 621 +10.0% <0.0001 Enum DDL-only PR; SELECT execution unchanged. +10% regression is run-to-run variance.
⚠️ 18 not_sure 1403 1548 +10.3% <0.0001 Off-path: nothing in the Enum DDL diff runs in this query. +10% is variance.
⚠️ 22 not_sure 334 251 -24.9% <0.0001 -24.85% is below the hard-trust threshold; base measurements very noisy and PR is Enum DDL only.
⚠️ 31 not_sure 310 342 +10.3% <0.0001 Off-path: PR touches only Enum ALTER/type creation. +10% is run-to-run variance.
⚠️ 32 not_sure 1367 1505 +10.1% 0.0002 Already not_sure; the two tests disagree. PR is Enum DDL only, off the query path.
⚠️ 33 not_sure 1488 1593 +7.1% 0.0003 Off-path Enum DDL change; +7% borderline regression is variance, not a PR effect.
⚠️ 34 not_sure 1486 1595 +7.3% <0.0001 Off-path Enum DDL change; +7% borderline regression is variance, not a PR effect.

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.

tpch_adapted_1_official

🟢 2 improved · ⚠️ 1 inconclusive

Flagged queries (3 of 22)
Query Verdict Baseline med (ms) PR med (ms) Change q-value Hint
🟢 7 improvement 113 74 -34.5% <0.0001 Large, consistent -34.5% held by magnitude. PR only changes Enum DDL, so attribution is unclear.
🟢 20 improvement 248 131 -47.2% <0.0001 -47% held only by magnitude; underlying runs very noisy. PR is Enum DDL, off the query path.
⚠️ 8 not_sure 156 182 +17.4% <0.0001 Enum DDL-only PR; TPC-H has no Enum columns. +17% regression is run-to-run variance.

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.

Debug info
  • StressHouse run: 4ac35320-3ee4-475d-8d62-5a7a2e22ecbc
  • MIRAI run: 387ea28a-3dd0-4c03-a748-94408c483401
  • PR check IDs:
    • clickbench_48872_1781381803
    • clickbench_48878_1781381803
    • clickbench_48884_1781381803
    • tpch_adapted_1_official_48887_1781381803
    • tpch_adapted_1_official_48894_1781381803
    • tpch_adapted_1_official_48905_1781381803

alexey-milovidov and others added 2 commits June 7, 2026 15:21
# Conflicts:
#	src/Storages/AlterCommands.cpp
`AlterCommands::prepare` derived the result of `ADD ENUM VALUES` by merging
against the `columns` snapshot, but never advanced that snapshot between
commands. As a result `MODIFY COLUMN x ADD ENUM VALUES('a'), MODIFY COLUMN x
ADD ENUM VALUES('b')` produced `base + b` and silently dropped `a`, because
both commands merged against the original type and were then applied
sequentially.

Write the merged type back into the working snapshot so subsequent commands
in the same statement see the already-extended type.

Additionally, `ADD ENUM VALUES` computes its type only when the column is
present in the snapshot. When the column is added by a preceding `ADD COLUMN`
in the same statement (which does not advance the snapshot), the modification
was silently dropped. Fail explicitly with `NO_SUCH_COLUMN_IN_TABLE` instead.

Addresses review feedback on `src/Storages/AlterCommands.cpp`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Merged master (the branch was ~1002 commits behind and red), resolving a conflict in src/Storages/AlterCommands.cpp (both sides added a helper to the same anonymous namespace — kept both).

Addressed review feedback

Fixed the unresolved bot thread on AlterCommands.cpp about chained ADD ENUM VALUES in a single statement (6b03661059a):

  • prepare now advances its working columns snapshot after each ADD ENUM VALUES, so MODIFY COLUMN x ADD ENUM VALUES('a'), MODIFY COLUMN x ADD ENUM VALUES('b') keeps both a and b (previously dropped a).
  • When the target column is absent from the snapshot (e.g. added by a preceding ADD COLUMN in the same statement), it now fails with NO_SUCH_COLUMN_IN_TABLE instead of silently dropping the modification (fail-close).
  • Added regression tests for both cases in 04307_add_to_enum; verified the full test passes against a local build.

CI analysis (all three failures unrelated to this PR)

@groeneai, the two flaky tests 00175_obfuscator_schema_inference and 03366_bfloat16_sorting are unrelated to this PR; please investigate and provide fixes in separate PRs if they are not already tracked. The UntrackedMemoryRegistry UBSan failure is already resolved by the revert 3930f8abb8a on master.

@groeneai

groeneai commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov, status on the two flaky tests:

The UntrackedMemoryRegistry UBSan failure is already resolved by 3930f8abb8a as you noted.

pull Bot pushed a commit to AnotherGenZ/ClickHouse that referenced this pull request Jun 7, 2026
The test sorts a hand-built `BFloat16` array that contains five values
which all round to the same `BFloat16` representation (`+0.0`, `+0.0`,
`0.0`, `-0.0`, `0.0`). Equal sort keys have an implementation-defined
relative order; under enough query-plan randomization the relative
order of `-0` against the surrounding `0`s changes, and the result no
longer matches the byte-exact `.reference` file. Pinning
`max_threads = 1` makes the order produced by `MergeSortingTransform`
deterministic and matches the committed reference. The test still
exercises sorting of `BFloat16` values across the full input.

Diagnosed by @alexey-milovidov on
ClickHouse#93830 (comment)

CI history: 4 hits across 4 unrelated PRs in 90 days (ClickHouse#93830, ClickHouse#100983,
ClickHouse#100649, ClickHouse#96130), 0 master failures, 78957 OK runs / 1 FAIL in 30 days.
# Conflicts:
#	src/DataTypes/EnumValues.cpp
#	src/DataTypes/EnumValues.h
@alexey-milovidov

Copy link
Copy Markdown
Member

Merged latest `master` to resolve conflicts. Since the last merge, `master` replaced the hash-map-based `EnumValues` with the new compact storage (Replace EnumValues hash maps with compact storage + follow-ups), which conflicted with this PR's `ValidationMode` plumbing through the old `fillMaps`.

Re-threaded `ValidationMode` through the new `buildLookupStructures`:

  • `ValidationMode::Normal` is unchanged (sort by value, full dup-name + dup-value validation, build value-to-name lookup).
  • `ValidationMode::TemporaryAdd` keeps parser order (per-element relative flags are positional) and skips dup-value validation, as before. Additionally it now skips building the value-to-name lookup structures: a `TemporaryAdd` instance is unsorted and may hold duplicate placeholder values, so the compact direct-lookup build would compute a negative range and overflow `value_to_index.resize`. This is safe because a `TemporaryAdd` instance is only ever read via `getValues()` / `generateName` / `getRelativeFlagsSize` (in `mergeEnumTypes` and the ctor) and is never queried by value. The dup-name check is preserved.

Built locally and `04307_add_to_enum` passes against a fresh server.

Comment thread src/Storages/AlterCommands.cpp
@ilejn

ilejn commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

I prohibited a combination of MODIFY ... MODIFY ADD ENUM VALUES without column restrictions.

Advancing snapshot after any MODIFY COLUMN seems too risky.
I tried to handle snapshot before ADD ENUM VALUES (not after), but the change looks too invasive.

Update: now it is possible to have MODIFY and MODIFY ADD ENUM VALUES for different columns.

Comment thread src/Storages/AlterCommands.cpp Outdated
@ilejn

ilejn commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Hello @alexey-milovidov , I [obviously] cannot sync with private repo.

# Conflicts:
#	src/DataTypes/DataTypeEnum.cpp
@clickhouse-gh

clickhouse-gh Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 291/303 (96.04%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov merged commit 1a5e24c into ClickHouse:master Jun 13, 2026
165 of 166 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors manual approve Manual approve required to run CI pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants