[WIP] perf(ts-engine): Add forwards-compatible metric_locality_id. by JTCunning · Pull Request #102719 · ClickHouse/ClickHouse · GitHub
Skip to content

[WIP] perf(ts-engine): Add forwards-compatible metric_locality_id.#102719

Draft
JTCunning wants to merge 15 commits into
ClickHouse:masterfrom
JTCunning:timeseries-metric-locality-id
Draft

[WIP] perf(ts-engine): Add forwards-compatible metric_locality_id.#102719
JTCunning wants to merge 15 commits into
ClickHouse:masterfrom
JTCunning:timeseries-metric-locality-id

Conversation

@JTCunning

@JTCunning JTCunning commented Apr 14, 2026

Copy link
Copy Markdown
Contributor

Adding to CI to see how this does. I believe I covered every case but I need more.

Closes #100906.

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add metric_locality_id to the TimeSeries table engine to increase metric density in range scans.

@CLAassistant

CLAassistant commented Apr 14, 2026

Copy link
Copy Markdown

@JTCunning JTCunning added the experimental feature Bug in the feature that should not be used in production label Apr 14, 2026
@clickhouse-gh

clickhouse-gh Bot commented Apr 14, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [102e465]

Summary:

job_name test_name status info comment
Style check failure
cpp failure cidb
Build (arm_tidy) failure
Build ClickHouse failure cidb
Fast test (arm_darwin) dropped
Build (amd_debug) dropped
Build (amd_asan_ubsan) dropped
Build (amd_tsan) dropped
Build (amd_msan) dropped
Build (amd_binary) dropped
Build (arm_debug) dropped
Build (arm_asan_ubsan) dropped

AI Review

Summary

This PR introduces metric_locality_id support for TimeSeries DATA layout, wires locality-aware selector filtering, and hardens startup/runtime handling of timeSeriesMetricLocalityId SQL UDF conflicts. I did not find new high-confidence correctness, safety, concurrency, or performance blockers at the current PR head beyond already-discussed inline bot items.

Missing context
  • ⚠️ I did not have full Praktika test logs for the latest commit; only high-level check status was available (Style check is currently failing in reported checks).
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility ⚠️ Legacy/external DATA fallback paths are implemented, but dedicated integration coverage for selector behavior on legacy DATA schemas remains limited.
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Tests
  • ⚠️ Add a dedicated integration test fixture for TimeSeries with external/legacy DATA table without metric_locality_id, then validate timeSeriesSelector behavior for both a literal __name__ matcher and a non-literal matcher (=~) to lock in the synthetic/zero-fill fallback semantics.
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Add focused integration coverage for the legacy/external DATA path in timeSeriesSelector (no physical metric_locality_id), including both literal and non-literal metric-name selector cases.

@JTCunning JTCunning force-pushed the timeseries-metric-locality-id branch 2 times, most recently from b6f42e9 to d3c5262 Compare April 14, 2026 22:53
- Add UInt32 metric_locality_id on inner DATA MergeTree before id; ORDER BY
  (metric_locality_id, id, timestamp). Shared locality helper and
  timeSeriesMetricLocalityId via built-in SQL UDF (registered after persisted
  UDFs in Server and LocalServer); remove native function from stress list.
- Prometheus remote write fills locality when the physical DATA table has the
  column; omit it when an external DATA table has only id/timestamp/value.
- StorageTimeSeriesSelector: tuple IN vs id IN + synthetic locality from
  literal __name__ when the data target has no metric_locality_id column;
  optional locality validation for external DATA targets.
- Integration: test_timeseries_metric_locality, prometheus external tables,
  remote_write_label_demographics (PRW v1 label ordering).

Made-with: Cursor
@JTCunning JTCunning force-pushed the timeseries-metric-locality-id branch from d3c5262 to 0734baa Compare April 14, 2026 22:59
@JTCunning JTCunning added the comp-promql PromQL / time-series subsystem: TimeSeries storage engine, PromQL parser, PromQL-to-SQL converter... label Apr 14, 2026
@JTCunning JTCunning changed the title [WIP] feat(time-series): Add forwards-compatible metric_locality_id. [WIP] feat(ts-engine): Add forwards-compatible metric_locality_id. Apr 15, 2026
@JTCunning JTCunning changed the title [WIP] feat(ts-engine): Add forwards-compatible metric_locality_id. [WIP] perf(ts-engine): Add forwards-compatible metric_locality_id. Apr 15, 2026
@clickhouse-gh clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label Apr 15, 2026
Comment thread src/Interpreters/registerBuiltinSQLUserDefinedFunctions.cpp Outdated
Comment thread src/Storages/StorageTimeSeriesSelector.cpp Outdated
Comment thread tests/integration/test_timeseries_metric_locality/test.py Outdated
…t DATA locality

- Register timeSeriesMetricLocalityId as a built-in function (same semantics as
  sipHash64→UInt32) so CREATE FUNCTION cannot override it (addresses review).
- Remove SQL UDF registration stub from registerBuiltinSQLUserDefinedFunctions.
- Add allow_time_series_selector_regex_without_metric_locality_id (default true):
  when DATA has no metric_locality_id, allow regex __name__ matchers and emit
  zero metric_locality_id; set false to require literal __name__ (addresses review).
- Extend gtest_functions_stress with the new function name.

Made-with: Cursor
Remove allow_time_series_selector_regex_without_metric_locality_id. When the DATA target has no metric_locality_id column, always allow non-literal __name__ matchers (id IN from tags; zero metric_locality_id when no literal).

Made-with: Cursor
Remove native REGISTER_FUNCTION implementation; register via
registerBuiltinSQLUserDefinedFunctions after persisted UDFs load.

Made-with: Cursor
If the function already exists (e.g. from disk), require normalized function_core to match x -> toUInt32(sipHash64(x)); otherwise throw BAD_ARGUMENTS.

Made-with: Cursor
Comment thread src/TableFunctions/TableFunctionTimeSeriesSelector.cpp Outdated
…est doc

- Declare metric_locality_id using config.metric_locality_id_data_type (matches readImpl casts).
- Reword integration test module doc for legacy DATA without physical locality column.

Made-with: Cursor
Comment thread src/Storages/TimeSeries/PrometheusRemoteWriteProtocol.cpp Outdated
… write

validateColumnForMetricLocalityId allows Nullable(UInt32); createColumn returns ColumnNullable. Store IColumn* and insert via ColumnUInt32 or ColumnNullable instead of typeid_cast to ColumnUInt32 only.

Made-with: Cursor
Startup registerBuiltinSQLUserDefinedFunctions only registers timeSeriesMetricLocalityId if the name is free (no validation of an existing UDF).

Add ensureTimeSeriesMetricLocalityIdUserDefinedFunction for full validate-or-register; call it from StorageTimeSeriesSelector::getConfiguration so conflicts fail when TimeSeries is actually used, not at boot.

Made-with: Cursor
Comment thread src/Storages/StorageTimeSeriesSelector.cpp Outdated
- Remove duplicate Parsers/ASTLiteral.h include (readability-duplicate-include / arm_tidy).
- Replace EXPLAIN-based PK assertion with system.tables.sorting_key: planner may use timestamp prefix for range scans while ORDER BY remains (metric_locality_id, id, timestamp).

Made-with: Cursor
Align with blocker review: enforce exact canonical normalized function_core AST for an existing definition or fail (BAD_ARGUMENTS).

registerBuiltinSQLUserDefinedFunctions now uses the same register-or-validate path as ensureTimeSeriesMetricLocalityIdUserDefinedFunction (kept for runtime CREATE OR REPLACE coverage).

Other listed blockers were already addressed: legacy selector id-IN fallback, Nullable locality in remote write, table-function metric_locality_id type, duplicate include.

Made-with: Cursor
@clickhouse-gh

clickhouse-gh Bot commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 87.45% (223/255) | lost baseline coverage: 3 line(s) · Uncovered code

Full report · Diff report

…unction

- Clarify integration tests need a branch-built binary (metric_locality_id on DATA).
- Replace (metric_locality_id, id) IN subquery: analyzer cannot resolve metric_locality_id inside tuple over timeSeriesData(); use id IN (...) AND d.metric_locality_id = sipHash literal for single-metric equivalence.
- Assert stored locality vs sipHash via join without requiring SQL UDF registration in the server.

Made-with: Cursor
@@ -2896,6 +2897,7 @@ try
database_catalog.assertDatabaseExists(default_database);
/// Load user-defined SQL functions.
global_context->getUserDefinedSQLObjectsStorage().loadObjects();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unconditional startup call can make upgrades fail for users who already have a persisted SQL UDF named timeSeriesMetricLocalityId with different semantics, even if they never use the TimeSeries engine.

That is a backward-incompatible validation on existing metadata at boot time. To keep rollout safe, please either gate this check behind a compatibility setting or limit strict validation to TimeSeries usage paths (where ensureTimeSeriesMetricLocalityIdUserDefinedFunction is already called).

PR feedback (AI review): unconditional startup validation of timeSeriesMetricLocalityId can break upgrades when a persisted UDF of the same name exists for non-TimeSeries use.

- registerBuiltinSQLUserDefinedFunctions: only register canonical UDF if absent (same as before strict startup check).
- ensureTimeSeriesMetricLocalityIdUserDefinedFunction: keep full validate-or-register for TimeSeries paths (timeSeriesSelector).

Integration test_timeseries_metric_locality requires a branch-built clickhouse binary (metric_locality_id on DATA).

Made-with: Cursor

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registerTimeSeriesMetricLocalityIdIfAbsent uses replace_if_exists = true after checking only SQL UDF storage via has(...). If a user already has a WASM function named timeSeriesMetricLocalityId, startup will silently drop it in UserDefinedSQLFunctionFactory::registerFunction (dropIfExists) and replace it with this SQL UDF.

That is a backward-incompatible startup side effect unrelated to TimeSeries usage and can destroy existing user-defined function behavior. Please avoid replacement here (e.g. use replace_if_exists = false and skip registration when any function kind with this name exists), and keep strict validation only on TimeSeries execution paths.

…t startup

Startup registration now skips if a WebAssembly or executable UDF already owns
the name, uses replace_if_exists=false, and TimeSeries paths throw BAD_ARGUMENTS
when the SQL UDF is missing but a non-SQL UDF blocks registration.

Adds an integration test that reloads the server with a same-name WASM function
and asserts it is not replaced by the builtin SQL UDF (WasmEdge engine in config
to match typical builds).

Made-with: Cursor
@vitlibar vitlibar self-assigned this May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp-promql PromQL / time-series subsystem: TimeSeries storage engine, PromQL parser, PromQL-to-SQL converter... experimental feature Bug in the feature that should not be used in production pr-performance Pull request with some performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(ts-engine): establish structured series IDs for increased density in primary key

3 participants