Preserve parameter types in timeseries aggregate functions, add type annotations by vitlibar · Pull Request #104812 · ClickHouse/ClickHouse · GitHub
Skip to content

Preserve parameter types in timeseries aggregate functions, add type annotations#104812

Merged
nikitamikhaylov merged 5 commits into
masterfrom
revert-104672-revert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions
May 22, 2026
Merged

Preserve parameter types in timeseries aggregate functions, add type annotations#104812
nikitamikhaylov merged 5 commits into
masterfrom
revert-104672-revert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions

Conversation

@vitlibar

@vitlibar vitlibar commented May 13, 2026

Copy link
Copy Markdown
Member

Reverts #104672, with extra improvements.
This PR is an improved version of #103428

This PR adds:
1. Preserving parameter types in timeSeries* aggregate functions

We need to restore the initial invariant: For any aggregation function IAggregateFunction::getParameters() should return exactly the parameters used to create the aggregation function. Aggregation functions are not allowed to change or normalize these parameters.

Otherwise it would become a different DataTypeAggregateFunction (see DataTypeAggregateFunction::strictEquals),
and could fail to reattach a table because decodeDataType() would reconstruct a type different from the one
recorded in the column metadata; or fail on parallel replicas under serialize_query_plan=1 because the replica's
decodeDataType() would reconstruct a type different from the one the coordinator sent in the plan. And we actually saw a bunch of such issues already (#101407, #102345, #103079)

It's difficult to fix all these issues and it's better to just restore the invariant. By the way, for non-timeseries aggregate functions parameter types are always preserved.

2. Parameters are now printed with types in timeSeries* aggregate functions.

Another important change is that for the timeSeries* aggregate functions, the AggregateFunction(...) type name now prints parameters in a type-annotated form ('<value>'::TypeName) instead of a bare literal that could be ambiguous. For example this is how a data type is printed now

AggregateFunction(timeSeriesInstantRateToGrid('1734004810'::Decimal64(3), '1734004860'::Decimal64(3), 10, 60), Array(DateTime64(3, 'UTC')), Array(Float64))

and earlier it was printed this way:

AggregateFunction(timeSeriesInstantRateToGrid('1734004810', '1734004860', 10, 60), Array(DateTime64(3, 'UTC')), Array(Float64))

Bare literals were ambiguous because '1734004810' could mean a String or a Decimal with a different scale. Form '<value>'::TypeName allows to parse exactly the same parameters back.

Closes #104459

Version info

  • Merged into: 26.6.1.98

@clickhouse-gh

clickhouse-gh Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label May 13, 2026
/// and could fail to reattach a table because decodeDataType() would reconstruct a type different from the one
/// recorded in the column metadata; or fail on parallel replicas under serialize_query_plan=1 because the replica's
/// decodeDataType() would reconstruct a type different from the one the coordinator sent in the plan.)
/// NOTE: Check for function->getParameters().empty() is here because some aggregation functions (kolmogorovSmirnovTest, mannWhitneyUTest) drop their parameters completely.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new invariant check is intentionally bypassed for kolmogorovSmirnovTest/mannWhitneyUTest, but function->getParameters().empty() makes that bypass global.

If any other aggregate function accidentally starts dropping parameters entirely, this assertion will stay silent and we will miss exactly the regression this guard is meant to catch.

Can we narrow the exception to an explicit allowlist (for example by function name) instead of accepting every empty-parameter result?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still unresolved in the current code.

AggregateFunctionFactory::getImpl still accepts function->getParameters().empty() as a global bypass. That means any future parameterized aggregate that accidentally drops parameters will silently skip the invariant check.

Could we narrow this to a temporary explicit exception list (or remove the bypass once the known offenders are fixed) so new regressions are not masked?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still looks unresolved in the current code.

AggregateFunctionFactory::getImpl still allows a global bypass via function->getParameters().empty():

chassert(function && (function->getParameters().empty() || function->getParameters() == parameters), ...)

That makes the new invariant check silent for any parameterized aggregate that accidentally starts dropping parameters, not just the known temporary exceptions.

Can we narrow this to an explicit temporary exception list (or remove it once the remaining offenders are fixed) so the assertion keeps catching new regressions?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still looks unresolved on the current head.

In src/AggregateFunctions/AggregateFunctionFactory.cpp, the invariant check still accepts function->getParameters().empty():

chassert(function && (function->getParameters().empty() || function->getParameters() == parameters), ...)

Because that empty-parameters branch is global, any new parameterized aggregate that accidentally drops parameters will bypass the check and the regression will stay undetected.

Could we narrow this to explicit temporary exceptions only (or remove it once the remaining offenders are fixed)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #106245

Comment thread tests/queries/0_stateless/03254_timeseries_instant_value_aggregate_functions.sql Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR re-applies an earlier change (#103428 / #101083) that restores the invariant that IAggregateFunction::getParameters() must return exactly the parameters used to construct the function. Previously, the timeseries *ToGrid aggregate functions normalized their first four (and the predict-offset) parameters to Decimal64, which could cause DataTypeAggregateFunction mismatches between the column metadata and the reconstructed type — breaking table reattach and serialize_query_plan=1 parallel replicas. The PR threads the original parameters Array through to the base aggregate constructor and adds chasserts in AggregateFunctionFactory::getImpl to enforce the invariant for both direct functions and combinator-wrapped functions.

Changes:

  • Pass the original parameters Array through the timeseries aggregate factory and base/derived constructors instead of building a normalized Decimal64 array in the base.
  • Add chasserts in AggregateFunctionFactory::getImpl enforcing function->getParameters() == parameters, with an explicit hardcoded exception for kolmogorovSmirnovTest/mannWhitneyUTest (which still discard params; TODO comments added at their definitions).
  • Add a regression test (04109_…parameter_type_mismatch) showing that mismatched parameter spellings produce CANNOT_CONVERT_TYPE (matching non-timeseries behaviour), and stabilize two existing timeseries tests by disabling randomized parallel replicas.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/AggregateFunctions/AggregateFunctionFactory.cpp Adds invariant chasserts after creator/combinator dispatch; removes redundant AggregateFunctionNothing special case.
src/AggregateFunctions/IAggregateFunction.h Comment on getParameters() clarifying the new invariant.
src/AggregateFunctions/AggregateFunctionMannWhitney.cpp TODO noting parameters should be passed through (currently dropped).
src/AggregateFunctions/AggregateFunctionKolmogorovSmirnovTest.cpp Same TODO for KS test.
src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesBase.h Base constructor now takes parameters_ and forwards it instead of building a normalized array.
src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesLinearRegression.h Predict-variant constructor takes/forwards parameters_; clarifying comment.
src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesHelpers.cpp Factory passes the original parameters to all four constructor call sites.
tests/queries/0_stateless/04109_timeseries_aggregate_function_parameter_type_mismatch.{sql,reference} New regression test for the restored invariant (CANNOT_CONVERT_TYPE on mismatched spelling).
tests/queries/0_stateless/03254_timeseries_instant_value_aggregate_functions.{sql,reference} Disable randomized PR; reference now reflects un-normalized parameter spellings (e.g. 1734004810.1234, integer step/window).
tests/queries/0_stateless/03254_timeseries_to_grid_aggregate_function_sparse.sql Disable randomized parallel replicas to keep type comparisons stable.

Comment thread src/AggregateFunctions/AggregateFunctionFactory.cpp Outdated
Comment thread src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesLinearRegression.h Outdated
Comment thread src/AggregateFunctions/AggregateFunctionFactory.cpp Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

@vitlibar vitlibar force-pushed the revert-104672-revert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions branch from 8bc3475 to f1f9bc6 Compare May 15, 2026 22:56
@vitlibar vitlibar requested a review from Copilot May 15, 2026 22:57

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

Comment thread src/AggregateFunctions/AggregateFunctionFactory.cpp Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

@vitlibar vitlibar force-pushed the revert-104672-revert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions branch from 9ff74b5 to d6bf250 Compare May 16, 2026 16:35
@vitlibar vitlibar requested a review from Copilot May 16, 2026 16:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

@alexey-milovidov

Copy link
Copy Markdown
Member

This was fixed by #105146. Let's update the branch.

@vitlibar vitlibar force-pushed the revert-104672-revert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions branch from 6ad6eea to 44466ed Compare May 17, 2026 21:38
@PedroTadim

Copy link
Copy Markdown
Member

Hi! Please mention it closes #104459

@vitlibar vitlibar force-pushed the revert-104672-revert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions branch from b963749 to 6e4db42 Compare May 18, 2026 19:04
@vitlibar vitlibar changed the title Resubmit #2: Preserve parameter types in timeseries aggregate functions Preserve parameter types in timeseries aggregate functions, add type annotations May 18, 2026
@vitlibar vitlibar requested a review from Copilot May 18, 2026 20:10

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Comment thread src/Common/FieldVisitorToCastedLiteral.cpp
Comment thread src/AggregateFunctions/AggregateFunctionFactory.cpp
Comment thread src/Parsers/parseFieldFromCastedLiteral.cpp
Comment thread src/DataTypes/DataTypeAggregateFunction.cpp
Comment thread src/Common/FieldVisitorToCastedLiteral.cpp
Comment thread src/Common/FieldVisitorToCastedLiteral.cpp
Comment thread src/Common/FieldVisitorToCastedLiteral.cpp
@vitlibar vitlibar marked this pull request as ready for review May 18, 2026 21:45
@vitlibar vitlibar added the 🍃 green ci 🌿 Fixing flaky tests in CI label May 18, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 91.40% 91.40% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 52.50% (105/200) | lost baseline coverage: 9 line(s) · Uncovered code

Full report · Diff report

@vitlibar

Copy link
Copy Markdown
Member Author

@davenger Can you please take a look?

@vitlibar vitlibar requested a review from davenger May 20, 2026 12:55

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this break compatibility with the tables created by older version that have AggregateFunctionState columns?

@vitlibar vitlibar May 20, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old types are still completely parseable. There can be a problem if a new version writes a type like AggregateFunction(timeSeriesInstantRateToGrid('1734004810'::Decimal64(3), '1734004860'::Decimal64(3), 10, 60), Array(DateTime64(3, 'UTC')), Array(Float64)), and then an older version will try to read it (because older versions don't know how to parse parameters like '1734004810'::Decimal64(3). However timeSeries* aggregate functions are experimental, so I suppose that's OK

@nikitamikhaylov nikitamikhaylov self-assigned this May 22, 2026
@nikitamikhaylov nikitamikhaylov added this pull request to the merge queue May 22, 2026
Merged via the queue into master with commit 4a0750d May 22, 2026
170 checks passed
@nikitamikhaylov nikitamikhaylov deleted the revert-104672-revert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions branch May 22, 2026 16:10
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label May 22, 2026
DavidHe-2008 pushed a commit to DavidHe-2008/ClickHouse that referenced this pull request Jun 1, 2026
…vert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions

Preserve parameter types in timeseries aggregate functions, add type annotations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

7 participants