Recognize date_part as syntax sugar for EXTRACT by alexey-milovidov · Pull Request #105127 · ClickHouse/ClickHouse · GitHub
Skip to content

Recognize date_part as syntax sugar for EXTRACT#105127

Merged
alexey-milovidov merged 9 commits into
masterfrom
date-part-extract
May 25, 2026
Merged

Recognize date_part as syntax sugar for EXTRACT#105127
alexey-milovidov merged 9 commits into
masterfrom
date-part-extract

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented May 16, 2026

Copy link
Copy Markdown
Member

PostgreSQL's date_part('unit', expr) is documented as syntactic sugar for EXTRACT(unit FROM expr). Rather than introduce a new runtime function, this PR recognises date_part at parse time so it produces the same AST as EXTRACT and goes through the same per-unit dispatch (toYear, toMonth, toUnixTimestamp for epoch, toDayOfWeek for dow/isodow, etc.).

Supported units:

  • Standard interval kinds: year, quarter, month, week, day, hour, minute, second, millisecond, microsecond, nanosecond.
  • PostgreSQL-specific extra units: epoch, dow, doy, isodow, isoyear, century, decade, millennium.

The unit string is matched case-insensitively. datepart (no underscore) is accepted as well.

To avoid duplicating the per-unit dispatch logic that EXTRACT already uses, the ExtractUnit enum and the AST-building switch are extracted from ExtractLayer into file-local helpers shared with the new DatePartLayer.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added date_part('unit', expr) as syntactic sugar for EXTRACT(unit FROM expr). Standard interval kinds and the PostgreSQL extras (epoch, dow, doy, isodow, isoyear, century, decade, millennium) are all supported.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Changes SQL parsing for date_part/datepart and refactors EXTRACT AST construction, which could affect query parsing or error behavior for these function calls.

Overview
Adds PostgreSQL-compatible date_part('unit', expr) / datepart('unit', expr) parsing as syntactic sugar for EXTRACT(unit FROM expr), requiring the unit to be a constant string (case-insensitive) and supporting the same interval aliases plus extra units like epoch, dow, and isoyear.

Refactors EXTRACT parsing by moving the per-unit AST construction into shared helpers, and adds a stateless query test verifying date_part results, alias handling, and parser rejection of unknown units.

Reviewed by Cursor Bugbot for commit dd438e4. Bugbot is set up for automated code reviews on this repo. Configure here.

Version info

  • Merged into: 26.6.1.137

PostgreSQL's `date_part('unit', expr)` is documented as a syntactic
sugar form of `EXTRACT(unit FROM expr)`. Rather than introduce a new
runtime function, recognise `date_part` at parse time so it produces the
same AST as `EXTRACT` and goes through the same per-unit dispatch
(`toYear`, `toMonth`, `toUnixTimestamp` for `epoch`, `toDayOfWeek` for
`dow`/`isodow`, etc.).

Standard interval units (year/month/week/day/hour/minute/second/etc.)
and the PostgreSQL-specific extra units (epoch, dow, doy, isodow,
isoyear, century, decade, millennium) are all supported. The unit
string is matched case-insensitively. `datepart` (no underscore) is
accepted too.

To avoid duplicating the per-unit dispatch logic that `EXTRACT` already
uses, the `ExtractUnit` enum and the AST-building switch are extracted
from `ExtractLayer` into file-local helpers shared with the new
`DatePartLayer`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented May 16, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label May 16, 2026
Comment thread src/Parsers/ExpressionListParsers.cpp Outdated
alexey-milovidov added a commit that referenced this pull request May 16, 2026
Recent compatibility PRs added case-insensitive aliases and parser sugar
that make several of the SQLStorm rewrites unnecessary:

  - `STDDEV`            -> `stddevPop`            (#105120)
  - `array_to_string`   -> `arrayStringConcat`    (#105121)
  - `REGEXP_SUBSTR`     -> `regexpExtract`        (#105122)
  - `CARDINALITY`       -> `length`               (#105123)
  - `unnest()` function -> `arrayJoin()`          (#105124)
  - `STRING_AGG`        -> `groupConcat`          (#105125)
  - `date_part(unit,e)` -> `EXTRACT(unit FROM e)` (#105127)
  - `expr OP ANY/ALL(array_literal)`              (#105129)

`ARRAY_AGG`, `TRANSLATE`, and `EXTRACT(EPOCH|DOW|... FROM ...)` were
already supported by ClickHouse before these PRs.

Removed the corresponding rewrite calls and helper functions
(`rewrite_string_agg`, `rewrite_array_agg`, `rewrite_date_part`,
`rewrite_stddev`, `rewrite_extract_epoch`, the EXTRACT(DOW) inline
rewrite, `rewrite_any_comparison`, and the trailing
`unnest(...) -> arrayJoin(...)` substitution). Also dropped the
unreferenced no-op helpers (`rewrite_extract_unit`, `rewrite_fetch_offset`,
`rewrite_interval`, `rewrite_cast_timestamp`, `rewrite_current_timestamp`,
`rewrite_bool_literals`, `rewrite_ilike`, `rewrite_no_supertype`).

The PostgreSQL `LATERAL` / `CROSS JOIN UNNEST(...)` table-source forms,
`arrayJoin(...)` in JOIN position, PG-specific casts, `AT TIME ZONE`,
`STRING_AGGDistinct` (a mangled-name artifact), and the still-unsupported
function rewrites (`string_to_array`, `regexp_split_to_array`, `RANDOM`,
`TO_TIMESTAMP`, `ARRAY_LENGTH`, `SPLIT_PART`, `age`) are still rewritten.

Net change: -329 lines from rewrite_queries.py and -75 lines from the
tests (the `TestRewriteAnyComparison` class is removed since the rewrite
it covered no longer exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback: `date_part('unit', expr)` is documented as syntactic
sugar for `EXTRACT(unit FROM expr)`, but the unit string was only matched
against the canonical singular names accepted by `IntervalKind::tryParseString`
(`year`, `month`, etc.). `EXTRACT` itself, parsed via `parseIntervalKind`,
accepts plurals (`years`), short forms (`yy`, `yyyy`, `mm`, `hh`, `ms`, `ns`,
`s`, `n`, …) and the `SQL_TSI_*` family. This made equivalent forms diverge —
`EXTRACT(YEARS FROM ts)` worked but `date_part('years', ts)` did not.

Mirror the same alias set for `date_part` via a string-based helper that
extends `IntervalKind::tryParseString` with the additional names recognised
by `parseIntervalKind`, kept in sync via a comment.

Extend `04241_date_part_as_extract` with coverage for the alias forms and an
unknown-unit `SYNTAX_ERROR` check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

This was fixed by #105146. Let's update the branch.

alexey-milovidov and others added 7 commits May 17, 2026 20:51
The unknown-unit case raises `SYNTAX_ERROR` from the parser (`DatePartLayer`
returns `false` on an unrecognised unit, which causes the function-call
parser to fail). The Fast test runs queries through `clickhouse-local`,
where parsing happens client-side, so this should be tagged as
`clientError SYNTAX_ERROR` rather than `serverError SYNTAX_ERROR`.

The previous tag caused `ClientBase` to rethrow the parse failure (the
`hint.hasServerErrors() && !hint.hasClientErrors()` branch in
`ClientBase.cpp`), making the test fail in CI.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105127&sha=a3e678712ce9e4d8857e0012d8df5ba8a08b7dfa&name_0=PR&name_1=Fast%20test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The local gh token lacks 'workflow' scope, so this file (added in
master after this PR was last refreshed) is dropped from the PR
branch to allow the master-merge to be pushed. It will return
automatically when the PR is merged into master.
The prior commit deleted/modified workflow files in this branch as a
workaround for GitHub's workflow-scope check. Restoring them to match
`master` so the PR diff no longer contains unrelated changes.
@clickhouse-gh

clickhouse-gh Bot commented May 23, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 91.40% 91.40% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 92.98% (106/114) | lost baseline coverage: 3 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov self-assigned this May 25, 2026

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

@alexey-milovidov alexey-milovidov added this pull request to the merge queue May 25, 2026
Merged via the queue into master with commit 00e8df7 May 25, 2026
328 of 329 checks passed
@alexey-milovidov alexey-milovidov deleted the date-part-extract branch May 25, 2026 19:16
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 25, 2026
DavidHe-2008 pushed a commit to DavidHe-2008/ClickHouse that referenced this pull request Jun 1, 2026
Recognize date_part as syntax sugar for EXTRACT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants