Preserve parameter types in timeseries aggregate functions, add type annotations#104812
Conversation
…egate functions""
| /// and could fail to reattach a table because decodeDataType() would reconstruct a type different from the one | ||
| /// recorded in the column metadata; or fail on parallel replicas under serialize_query_plan=1 because the replica's | ||
| /// decodeDataType() would reconstruct a type different from the one the coordinator sent in the plan.) | ||
| /// NOTE: Check for function->getParameters().empty() is here because some aggregation functions (kolmogorovSmirnovTest, mannWhitneyUTest) drop their parameters completely. |
There was a problem hiding this comment.
The new invariant check is intentionally bypassed for kolmogorovSmirnovTest/mannWhitneyUTest, but function->getParameters().empty() makes that bypass global.
If any other aggregate function accidentally starts dropping parameters entirely, this assertion will stay silent and we will miss exactly the regression this guard is meant to catch.
Can we narrow the exception to an explicit allowlist (for example by function name) instead of accepting every empty-parameter result?
There was a problem hiding this comment.
This is still unresolved in the current code.
AggregateFunctionFactory::getImpl still accepts function->getParameters().empty() as a global bypass. That means any future parameterized aggregate that accidentally drops parameters will silently skip the invariant check.
Could we narrow this to a temporary explicit exception list (or remove the bypass once the known offenders are fixed) so new regressions are not masked?
There was a problem hiding this comment.
This still looks unresolved in the current code.
AggregateFunctionFactory::getImpl still allows a global bypass via function->getParameters().empty():
chassert(function && (function->getParameters().empty() || function->getParameters() == parameters), ...)
That makes the new invariant check silent for any parameterized aggregate that accidentally starts dropping parameters, not just the known temporary exceptions.
Can we narrow this to an explicit temporary exception list (or remove it once the remaining offenders are fixed) so the assertion keeps catching new regressions?
There was a problem hiding this comment.
This still looks unresolved on the current head.
In src/AggregateFunctions/AggregateFunctionFactory.cpp, the invariant check still accepts function->getParameters().empty():
chassert(function && (function->getParameters().empty() || function->getParameters() == parameters), ...)
Because that empty-parameters branch is global, any new parameterized aggregate that accidentally drops parameters will bypass the check and the regression will stay undetected.
Could we narrow this to explicit temporary exceptions only (or remove it once the remaining offenders are fixed)?
There was a problem hiding this comment.
Pull request overview
This PR re-applies an earlier change (#103428 / #101083) that restores the invariant that IAggregateFunction::getParameters() must return exactly the parameters used to construct the function. Previously, the timeseries *ToGrid aggregate functions normalized their first four (and the predict-offset) parameters to Decimal64, which could cause DataTypeAggregateFunction mismatches between the column metadata and the reconstructed type — breaking table reattach and serialize_query_plan=1 parallel replicas. The PR threads the original parameters Array through to the base aggregate constructor and adds chasserts in AggregateFunctionFactory::getImpl to enforce the invariant for both direct functions and combinator-wrapped functions.
Changes:
- Pass the original
parametersArraythrough the timeseries aggregate factory and base/derived constructors instead of building a normalizedDecimal64array in the base. - Add chasserts in
AggregateFunctionFactory::getImplenforcingfunction->getParameters() == parameters, with an explicit hardcoded exception forkolmogorovSmirnovTest/mannWhitneyUTest(which still discard params; TODO comments added at their definitions). - Add a regression test (
04109_…parameter_type_mismatch) showing that mismatched parameter spellings produceCANNOT_CONVERT_TYPE(matching non-timeseries behaviour), and stabilize two existing timeseries tests by disabling randomized parallel replicas.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/AggregateFunctions/AggregateFunctionFactory.cpp | Adds invariant chasserts after creator/combinator dispatch; removes redundant AggregateFunctionNothing special case. |
| src/AggregateFunctions/IAggregateFunction.h | Comment on getParameters() clarifying the new invariant. |
| src/AggregateFunctions/AggregateFunctionMannWhitney.cpp | TODO noting parameters should be passed through (currently dropped). |
| src/AggregateFunctions/AggregateFunctionKolmogorovSmirnovTest.cpp | Same TODO for KS test. |
| src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesBase.h | Base constructor now takes parameters_ and forwards it instead of building a normalized array. |
| src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesLinearRegression.h | Predict-variant constructor takes/forwards parameters_; clarifying comment. |
| src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesHelpers.cpp | Factory passes the original parameters to all four constructor call sites. |
| tests/queries/0_stateless/04109_timeseries_aggregate_function_parameter_type_mismatch.{sql,reference} | New regression test for the restored invariant (CANNOT_CONVERT_TYPE on mismatched spelling). |
| tests/queries/0_stateless/03254_timeseries_instant_value_aggregate_functions.{sql,reference} | Disable randomized PR; reference now reflects un-normalized parameter spellings (e.g. 1734004810.1234, integer step/window). |
| tests/queries/0_stateless/03254_timeseries_to_grid_aggregate_function_sparse.sql | Disable randomized parallel replicas to keep type comparisons stable. |
8bc3475 to
f1f9bc6
Compare
9ff74b5 to
d6bf250
Compare
|
This was fixed by #105146. Let's update the branch. |
6ad6eea to
44466ed
Compare
|
Hi! Please mention it closes #104459 |
1967d6c to
b963749
Compare
b963749 to
6e4db42
Compare
LLVM Coverage Report
Changed lines: 52.50% (105/200) | lost baseline coverage: 9 line(s) · Uncovered code |
|
@davenger Can you please take a look? |
There was a problem hiding this comment.
Won't this break compatibility with the tables created by older version that have AggregateFunctionState columns?
There was a problem hiding this comment.
Old types are still completely parseable. There can be a problem if a new version writes a type like AggregateFunction(timeSeriesInstantRateToGrid('1734004810'::Decimal64(3), '1734004860'::Decimal64(3), 10, 60), Array(DateTime64(3, 'UTC')), Array(Float64)), and then an older version will try to read it (because older versions don't know how to parse parameters like '1734004810'::Decimal64(3). However timeSeries* aggregate functions are experimental, so I suppose that's OK
…vert-103428-resubmit-preserve-parameter-types-in-timeseries-agg-functions Preserve parameter types in timeseries aggregate functions, add type annotations

Reverts #104672, with extra improvements.
This PR is an improved version of #103428
This PR adds:
1. Preserving parameter types in
timeSeries*aggregate functionsWe need to restore the initial invariant: For any aggregation function
IAggregateFunction::getParameters()should return exactly the parameters used to create the aggregation function. Aggregation functions are not allowed to change or normalize these parameters.Otherwise it would become a different
DataTypeAggregateFunction(seeDataTypeAggregateFunction::strictEquals),and could fail to reattach a table because
decodeDataType()would reconstruct a type different from the onerecorded in the column metadata; or fail on parallel replicas under serialize_query_plan=1 because the replica's
decodeDataType()would reconstruct a type different from the one the coordinator sent in the plan. And we actually saw a bunch of such issues already (#101407, #102345, #103079)It's difficult to fix all these issues and it's better to just restore the invariant. By the way, for non-timeseries aggregate functions parameter types are always preserved.
2. Parameters are now printed with types in
timeSeries*aggregate functions.Another important change is that for the
timeSeries*aggregate functions, the AggregateFunction(...) type name now prints parameters in a type-annotated form ('<value>'::TypeName) instead of a bare literal that could be ambiguous. For example this is how a data type is printed nowand earlier it was printed this way:
Bare literals were ambiguous because '1734004810' could mean a String or a Decimal with a different scale. Form
'<value>'::TypeNameallows to parse exactly the same parameters back.Closes #104459
Version info
26.6.1.98