`ILIKE` multi column selection by niyue · Pull Request #104569 · ClickHouse/ClickHouse · GitHub
Skip to content

ILIKE multi column selection#104569

Merged
alexey-milovidov merged 13 commits into
ClickHouse:masterfrom
niyue:feat/ilke-columns
Jun 9, 2026
Merged

ILIKE multi column selection#104569
alexey-milovidov merged 13 commits into
ClickHouse:masterfrom
niyue:feat/ilke-columns

Conversation

@niyue

@niyue niyue commented May 11, 2026

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

  • Add support for selecting columns by name pattern with * LIKE '<pattern>' and * ILIKE '<pattern>', including qualified forms such as table.* LIKE '<pattern>' and table.* ILIKE '<pattern>'. LIKE matches column names case-sensitively, ILIKE case-insensitively.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

This PR also adds old-analyzer (enable_analyzer = 0) support for qualified COLUMNS regexp matchers, fixing a pre-existing inconsistency where a.COLUMNS('...') worked with the new analyzer but failed with the old analyzer through the multiple-join rewrite path.

Closes: #53163

Version info

  • Merged into: 26.6.1.527

niyue added 2 commits May 11, 2026 17:02
Add `* ILIKE` examples to the `Dynamic column selection` section and clarify that the pattern uses `LIKE` semantics rather than `re2` regular expression semantics.
@niyue niyue changed the title Feat/ilke columns ILIKE multi column selection May 11, 2026
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label May 11, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-feature Pull request with new product feature label May 11, 2026
niyue added 2 commits May 12, 2026 08:53
Expand `table.COLUMNS` regexp matchers in the old analyzer paths so `table.* ILIKE` works when `enable_analyzer` is disabled.

This fixes the reproducible failure in `04218_asterisk_ilike` for `SELECT t.* ILIKE %id ...`.
Add a regression query with a qualified `* ILIKE` matcher over multiple joins and `enable_analyzer = 0` so the old analyzer join-to-subquery rewrite path is exercised.
@niyue

niyue commented May 12, 2026

Copy link
Copy Markdown
Contributor Author

Note: this PR also fixes a related old-analyzer inconsistency discovered while implementing * ILIKE.

a.* ILIKE '<pattern>' is represented internally as a qualified COLUMNS regexp matcher, similar to a.COLUMNS(...). The new analyzer already supported that form, but the old analyzer did not fully expand qualified COLUMNS matchers, in particular through the multiple-join rewrite path.

To make a.* ILIKE ... work with enable_analyzer = 0, this PR also adds old-analyzer support for qualified COLUMNS regexp matchers. This fixes the new syntax and also resolves the pre-existing inconsistency for a.COLUMNS(...).

For example, before this PR, a query like this worked with the new analyzer but failed with enable_analyzer = 0:

SELECT a.COLUMNS('(?i)id$')
FROM (SELECT number AS UserID FROM numbers(1)) AS a
LEFT JOIN (SELECT number AS id FROM numbers(1)) AS b ON b.id = a.UserID
LEFT JOIN (SELECT number AS id FROM numbers(1)) AS c ON c.id = a.UserID
SETTINGS enable_analyzer = 0;

@niyue

niyue commented May 12, 2026

Copy link
Copy Markdown
Contributor Author

Hi @alexey-milovidov, this PR is ready for review, could you please help? Thanks.

@alexey-milovidov

Copy link
Copy Markdown
Member

This looks strange to me. First I thought it will apply the function to every column. For example:

SELECT count() FROM table WHERE * LIKE '%abc%'

But it does a different thing. How would we resolve this ambiguity later?
Is the Snowflake behavior widespread or just a random quirk? We shouldn't embrace random quirks.

@alexey-milovidov alexey-milovidov added the st-discussion When the implementation aspects are not clear or when the PR is on hold due to questions. label May 30, 2026
@alexey-milovidov alexey-milovidov self-assigned this May 30, 2026
@niyue

niyue commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Is the Snowflake behavior widespread or just a random quirk?

I checked support in other SQL dialects:

System Support Notes
Snowflake Exact syntax Supports SELECT * ILIKE '<pattern>' and qualified alias.* ILIKE '<pattern>'. Docs
DuckDB Broader star filtering support Supports filtering * by column-name patterns with LIKE, GLOB, and SIMILAR TO; it also has regex-based COLUMNS(...). Docs
BigQuery No exact support Supports SELECT * EXCEPT (...) and SELECT * REPLACE (...), but not pattern-based star filtering. Docs
Spark SQL No exact support The official star (*) clause supports [ name . ] * [ EXCEPT (...) ], but not * ILIKE. Spark can select columns by regex via quoted identifiers (using backticks) when spark.sql.parser.quotedRegexColumnNames = true. Docs
Amazon Redshift No exact support Supports SELECT * EXCLUDE ..., but not pattern-based star filtering. Docs

I also briefly checked Pinot/Druid/DataFusion/Trino and didn't find such capability.

So this is not Snowflake-specific (DuckDB has it as well) but it is not widespread either.

@alexey-milovidov please advise if we should proceed, and feel free to reject this PR if we consider this syntax too niche or not worth extending the parser for.

@alexey-milovidov

Copy link
Copy Markdown
Member

I'm ok with that, but let's also add LIKE support.

alexey-milovidov and others added 3 commits June 3, 2026 16:01
Per review feedback, support `* LIKE '<pattern>'` (and qualified
`table.* LIKE '<pattern>'`) in addition to the case-insensitive
`* ILIKE '<pattern>'`. The shared parser helper now accepts both
`LIKE` and `ILIKE` keywords; `LIKE` matches column names
case-sensitively while `ILIKE` remains case-insensitive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Added case-sensitive LIKE support alongside ILIKE, as requested.

  • * LIKE '<pattern>' and qualified table.* LIKE '<pattern>' now work, matching column names case-sensitively (ILIKE remains case-insensitive). The shared parser helper accepts both keywords; LIKE builds a plain likePatternToRegexp matcher while ILIKE prepends (?i).
  • Extended the test (now 04312_asterisk_like_ilike, renumbered to avoid a collision after merging master) with case-sensitive LIKE cases that demonstrate the difference from ILIKE (e.g. * LIKE 'foo%' matches foo but not FooBar), plus a qualified old-analyzer case and a no-match EMPTY_LIST_OF_COLUMNS_QUERIED case.
  • Updated the documentation section to cover both LIKE and ILIKE.
  • Merged master (the branch was ~3 weeks behind).

Verified locally on a freshly built binary: the test matches its reference, both serverError cases throw EMPTY_LIST_OF_COLUMNS_QUERIED, and a regression sweep (plain *, scalar/column LIKE, transformers, multiplication) is unaffected.

The only prior CI failure was CH Inc sync, which should be re-evaluated on this new commit now that master is merged.

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, investigate the failure: Upgrade check (amd_release) — "Error message in clickhouse-server.log" and provide a fix in a separate PR if needed. If the fix is already in progress, link it here.

The flagged error messages are all StorageKafka2 (... 03923_kafka2_keeper_offsets_...): Exception during get topic partitions from Kafka: Local: Broker transport failure — background polling errors emitted while the test Kafka broker is being torn down. They are unrelated to this PR, which only changes the SQL parser and old-analyzer expansion for * LIKE / * ILIKE column selection (no Kafka code touched).

I have merged the latest master into the branch and re-pushed, so CI will re-run on the fresh commit.

Comment thread src/Interpreters/JoinToSubqueryTransformVisitor.cpp
@groeneai

groeneai commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov, the fix is already merged: #106516 (merged 2026-06-06, master commit cfe19934d1).

The current 4e3bf67c head of this PR already includes that master merge, so the upgrade-check pipeline will re-run with the fix in place.

What the fix does: adds a regex allowlist line to tests/docker_scripts/upgrade_runner.sh that excludes the specific StorageKafka2 ... Exception during get topic partitions from Kafka: Local: Broker transport failure log emitted by KafkaConsumer2's background poll loop while the Redpanda broker is being torn down during the upgrade restart. The filter requires both the StorageKafka2 engine context AND the Broker transport failure symptom together, so real Kafka2 regressions (auth, timeouts, other broker errors) still surface.

Recurrence on this PR's fresh build with the fix in place would be a new flake, not the same one. CIDB shows this signature affected 53 unrelated PRs in 30d before the fix landed.

session: cron:clickhouse-ci-task-worker:20260607-171600

niyue added 3 commits June 8, 2026 13:38
Expand nested qualified `COLUMNS` regexp matchers before old analyzer join-to-subquery rewrite collects identifiers, and add coverage for nested `a.* ILIKE` with both analyzers.

Follow-up for ClickHouse#104569 (comment)
Add regression coverage for nested `a.* ILIKE` used as an argument to binary `plus`, including the multi-column expansion error path.
Comment thread src/Interpreters/JoinToSubqueryTransformVisitor.cpp Outdated
@clickhouse-gh

clickhouse-gh Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.50% 84.60% +0.10%
Functions 92.30% 92.30% +0.00%
Branches 77.20% 77.30% +0.10%

Changed lines: Changed C/C++ lines covered by tests: 156/175 (89.14%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 3 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 9, 2026
Merged via the queue into ClickHouse:master with commit 8ad6f74 Jun 9, 2026
166 checks passed
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo st-discussion When the implementation aspects are not clear or when the PR is on hold due to questions.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

COLUMNS('feature') alias * ILIKE 'feature%' (compatibility with SnowFlake)

4 participants