iframe-proxy

BadLiveware · 2026-05-09T18:43:22Z

Adds native lowering for the PromQL count_values aggregation operator over ClickHouse TimeSeries data.

The lowering formats sample values as Prometheus label strings, adds the generated value label into the output grouping, and packs sparse per-step counts back into the vector-grid shape used by the PromQL endpoint.

Split out from the integration draft #104271; shared base #104484.

Changelog category (leave one):

Experimental Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added native PromQL-to-SQL lowering for count_values over ClickHouse TimeSeries data.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Add the common PromQL compliance reporting and developer notes used by the follow-up TimeSeries PromQL lowering slices. Keep the parser compliance corpus expectation explicit for selectors Prometheus rejects.

Add focused Prometheus protocol tests for user-facing native lowering errors: unsupported-but-parseable functions and invalid date-function arity. These are semantic guardrails for clear errors, not broad coverage padding.

Implement native SQL lowering for PromQL count_values(label, vector) using the existing TimeSeries group/tag representation. The lowering formats sample values as Prometheus label strings, overwrites the destination label before aggregation, handles by/without grouping, and packs per-step counts back onto the vector grid. Also adds focused differential coverage for value formatting, destination-label overwrite, grouping variants, sparse subquery results, and empty inputs, plus compliance cases for count_values variants. Validation: git diff --check; Python py_compile for updated PromQL tests; Dockerized Build (amd_debug) passed; focused test_aggregation_operators passed; test_promql_compliance passed.

Avoid fork-local wording in the PromQL review base. These are ClickHouse PromQL regression cases and evidence requirements, not fork-specific artifacts.

Remove the standalone native histogram discovery Markdown note from the PromQL split branches.

Remove downstream promshim-specific evidence and non-goal language from the native PromQL lowering README so the note stays focused on ClickHouse review guidance.

Rewrite the native PromQL lowering notes around external SQL-transpiler measurements. Document measured and mixed SQL patterns with target PromQL shapes, pseudocode, and the ClickHouse-side validation expected for in-tree changes.

Replace external corpus row names with concrete PromQL expressions and query-range context in the native PromQL lowering notes.

clickhouse-gh · 2026-05-10T11:01:36Z

Match Prometheus parser behavior by requiring vector selectors to contain at least one matcher that cannot match an empty label value. This prevents broad selectors like `{job=~".*"}` or `{__name__=~".*"}` from silently selecting every metric when the user likely intended an explicit non-empty constraint. Users who intentionally want all real metrics can still use `{__name__=~".+"}`, and empty-matching label matchers remain valid when combined with that explicit metric-name matcher.

Deduplicate expanded compliance cases by query text before checking the `should_fail` expectation. A query listed once as expected success and once as expected failure is now reported as conflicting metadata instead of slipping through as two separate cases.

…plit/promql-count-values

Mirror the upstream PromQL compliance corpus for label transformation invalid label-name cases. These cases should remain should-fail entries so ClickHouse acceptance of invalid destination labels stays visible as a rejection mismatch rather than being counted as expected coverage.

`count_values` creates an output label from its string argument. Keep that argument constrained to PromQL label-name syntax instead of accepting every non-empty UTF-8 string, so invalid names such as `~invalid` are rejected before rendering a label key that selectors cannot address.

…on-base

Do not stop selector validation after finding the first matcher that cannot match an empty label value. Later regex matchers still need syntax validation so equivalent selector reorderings cannot change parse validity. Add parser regressions for invalid regex matchers after an implicit metric-name matcher and after explicit non-empty matchers.

…on-base

…plit/promql-count-values

Add parser and HTTP API coverage for Prometheus selector validation rules: negated non-empty matchers, invalid regex validation after non-empty matchers, and duplicate metric-name matchers. Rejecting duplicate outside/in-brace metric names keeps ClickHouse parsing aligned with Prometheus expression parsing.

Use Prometheus UTF-8 label-name validation for the generated value label, so current Prometheus-valid names such as ~value are accepted while empty labels still fail. Add focused and compliance coverage for generated-label insertion before grouping, by/without behavior, label overwrite, FormatFloat edge cases including -0, empty inputs, and scalar-grid range evaluation. Validation: - ninja -C build_debug clickhouse - test_evaluation.py::test_aggregation_operators - test_compliance.py::test_promql_compliance

Ignore string literals and selector matchers when routing compliance failures by query shape. This keeps report categories from treating label values or exponent signs as binary operators, and recognizes bare '<' and '>' comparisons.

Use the existing eps parameter when checking live Prometheus HTTP results against expected JSON, and compare ClickHouse HTTP output against the live Prometheus result. This keeps numeric drift bounded to eps while preserving exact shape, label, and timestamp checks.

`clickhouse_fuzzer` runs the full `clickhouse-local` binary (see `programs/local/fuzzers/clickhouse_fuzzer.cpp`) which spins up a complete clickhouse-server-grade runtime: JeMalloc arenas, OpenSSL initializer, static initializers, and full query execution. Baseline RSS is close to the libFuzzer default `rss_limit_mb=2048`, so fuzzed inputs that trigger even moderate query-time allocations push the process past the limit. CIDB shows 7 unrelated PRs in 7 days all hitting OOM at 2049-2056 Mb — just barely over the 2048 Mb default. Failure mode is uniform: libFuzzer out-of-memory (used: 2049-2056Mb; limit: 2048Mb) oom-<hash> Bumping to 4096 Mb follows existing precedent for fuzzers with similar workloads: - `execute_query_fuzzer.options`: rss_limit_mb = 4096 (also runs query execution; same author bumped this in commit bafb7d7) - `data_type_deserialization_fuzzer.options`: rss_limit_mb = 4096 - `create_parser_fuzzer.options`: rss_limit_mb = 6144 The limit still bounds the target (it does not disable the check via `rss_limit_mb=0`), so genuine memory regressions above 4 GB will still be caught. CIDB evidence (last 30 days, all 10 hits — 0 on master because libFuzzer does not run on master regularly): PR ClickHouse#99740 (4 hits, Apr 19-21) PR ClickHouse#104231, ClickHouse#104849, ClickHouse#104956, ClickHouse#96844, ClickHouse#104510, ClickHouse#104492 (1 hit each, May 13-15)

…on-base

Matcher validation intentionally turns malformed label regexes into PromQL parse errors. Construct RE2 with log_errors disabled so rejected selectors do not also emit internal RE2 error log lines on every parse attempt.

Prometheus excludes __name__ from without-grouping keys even when count_values writes its generated sample-value label to __name__. Cover that behavior explicitly so the ClickHouse SQL lowering keeps collapsing buckets by the remaining labels instead of preserving the generated metric name as a grouping key.

…-base

…values

Keep the PromQL parser, modifier, and selector fixes from the local matrix pass, but move the high-value regression coverage into the existing protocol and compliance suites instead of adding the standalone matrix harness to the PR branch.\n\nThis keeps fixed-time range evaluation covered through the existing evaluation tests and records the parser, matcher, timestamp-modifier, and sparse range cases in the shared compliance corpus.

Build (arm_tidy) flagged the vector-grid finalization predicate because one call cloned an AST pointer while another argument moved the same pointer. C++ argument evaluation order can make that a use-after-move.\n\nCreate separate AST nodes for the null and stale-marker checks so the generated predicate stays the same without relying on argument evaluation order.

Parse `@ start()` and `@ end()` as timestamp modifier variants instead of rewriting the raw query text before parsing. This keeps label values and regular expressions untouched while resolving symbolic bounds in the query range planner. Propagate symbolic `@` modifiers through the PromQL tree and SQL lowering so instant selectors and range-vector functions bind to the query start or end consistently with Prometheus.

The selector-matcher validation branch should not also introduce symbolic `@ start()` and `@ end()` modifier parsing. Remove that parser/evaluation feature from the base branch so the behavior can be reviewed independently. Move the symbolic timestamp-modifier compliance cases out with the feature. Keep the offset-only case here because it exercises existing modifier behavior.

Symbolic `@ start()` and `@ end()` modifiers need parser-level handling so selectors, string literals, and regular expressions are not rewritten as raw text. Remove the temporary text-substitution path from the base branch; the symbolic modifier implementation belongs with the dedicated parser change.

…-base

clickhouse-gh · 2026-05-30T09:23:41Z

+            timeSeriesScalarToAST(0, context.scalar_data_type))));
+        builder.select_list.back()->setAlias(value_label_column);
+
+        builder.where = makeASTFunction(


count_values needs to drop Prometheus stale markers before formatting sample values. Instant selectors are lowered with stale-marker filtering disabled, so a series whose latest sample is the stale marker reaches this isNotNull check as a non-null NaN; timeSeriesPrometheusFormatFloat then emits NaN and this path returns a {value="NaN"} bucket. Prometheus treats that instant vector as empty, and finalizeSQL.cpp already has the stale-marker predicate for plain instant-vector output, so this diverges for stale series. Please add the same reinterpretAsUInt64(...) != 0x7ff0000000000002 predicate before Step 3 and cover count_values("value", stale_series) with a stale-marker sample.

clickhouse-gh · 2026-05-30T13:01:27Z

LLVM Coverage Report

Changed lines: Changed C/C++ lines covered by tests: 413/457 (90.37%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 2 line(s) · Uncovered code

Full report · Diff report

BadLiveware added 13 commits May 9, 2026 13:52

test: add PromQL lowering review base

68c3317

Add the common PromQL compliance reporting and developer notes used by the follow-up TimeSeries PromQL lowering slices. Keep the parser compliance corpus expectation explicit for selectors Prometheus rejects.

test: cover native PromQL error paths

527db13

Add focused Prometheus protocol tests for user-facing native lowering errors: unsupported-but-parseable functions and invalid date-function arity. These are semantic guardrails for clear errors, not broad coverage padding.

docs: clarify PromQL regression corpus wording

15ee944

Avoid fork-local wording in the PromQL review base. These are ClickHouse PromQL regression cases and evidence requirements, not fork-specific artifacts.

docs: clarify PromQL regression corpus wording

f3d9fb5

Avoid fork-local wording in the PromQL review base. These are ClickHouse PromQL regression cases and evidence requirements, not fork-specific artifacts.

docs: remove native histogram discovery note

adc7887

Remove the standalone native histogram discovery Markdown note from the PromQL split branches.

docs: remove native histogram discovery note

24e7ebc

Remove the standalone native histogram discovery Markdown note from the PromQL split branches.

docs: trim PromQL lowering notes

e38f7eb

Remove downstream promshim-specific evidence and non-goal language from the native PromQL lowering README so the note stays focused on ClickHouse review guidance.

docs: trim PromQL lowering notes

c84dddf

Remove downstream promshim-specific evidence and non-goal language from the native PromQL lowering README so the note stays focused on ClickHouse review guidance.

docs: record PromQL lowering optimization evidence

fb83187

Rewrite the native PromQL lowering notes around external SQL-transpiler measurements. Document measured and mixed SQL patterns with target PromQL shapes, pseudocode, and the ClickHouse-side validation expected for in-tree changes.

docs: record PromQL lowering optimization evidence

8050260

Rewrite the native PromQL lowering notes around external SQL-transpiler measurements. Document measured and mixed SQL patterns with target PromQL shapes, pseudocode, and the ClickHouse-side validation expected for in-tree changes.

docs: use PromQL examples in optimization notes

593f4de

Replace external corpus row names with concrete PromQL expressions and query-range context in the native PromQL lowering notes.

docs: use PromQL examples in optimization notes

06bf50f

Replace external corpus row names with concrete PromQL expressions and query-range context in the native PromQL lowering notes.

alexey-milovidov added the can be tested Allows running workflows for external contributors label May 10, 2026

clickhouse-gh Bot added the pr-experimental Experimental Feature label May 10, 2026

clickhouse-gh Bot reviewed May 10, 2026

View reviewed changes

Comment thread src/Storages/TimeSeries/PrometheusQueryToSQL/applyAggregationOperatorCountValues.cpp Outdated

BadLiveware added 6 commits May 10, 2026 17:30

Merge remote-tracking branch 'origin/split/promql-common-base' into s…

b869f8b

…plit/promql-count-values

Merge branch 'split/promql-common-base' into split/promql-count-values

79a296c

clickhouse-gh Bot reviewed May 10, 2026

View reviewed changes

Comment thread src/Storages/TimeSeries/PrometheusQueryToSQL/applyAggregationOperatorCountValues.cpp

BadLiveware added 4 commits May 12, 2026 06:22

Merge remote-tracking branch 'upstream/master' into split/promql-comm…

f3a8f97

…on-base

Merge remote-tracking branch 'upstream/master' into split/promql-comm…

04fc742

…on-base

Merge branch 'split/promql-common-base' into split/promql-count-values

47005d7

vitlibar self-assigned this May 13, 2026

BadLiveware mentioned this pull request May 13, 2026

Draft: improve TimeSeries PromQL-to-SQL coverage #104271

Closed

1 task

BadLiveware added 2 commits May 13, 2026 23:37

Merge remote-tracking branch 'upstream/master' into split/promql-comm…

5bb0ee0

…on-base

Merge remote-tracking branch 'origin/split/promql-common-base' into s…

f6c6e6f

…plit/promql-count-values

clickhouse-gh Bot reviewed May 13, 2026

View reviewed changes

Comment thread src/Storages/TimeSeries/PrometheusQueryToSQL/applyAggregationOperatorCountValues.cpp Outdated

BadLiveware added 4 commits May 14, 2026 04:39

fix: classify PromQL binary operators by tokens

bd8eab8

Ignore string literals and selector matchers when routing compliance failures by query shape. This keeps report categories from treating label values or exponent signs as binary operators, and recognizes bare '<' and '>' comparisons.

groeneai mentioned this pull request May 16, 2026

CI: bump rss_limit_mb to 4096 for clickhouse_fuzzer to stop chronic OOMs at 2048 Mb default #105134

Merged

1 task

BadLiveware added 2 commits May 16, 2026 23:20

Merge remote-tracking branch 'upstream/master' into split/promql-comm…

f28c23e

…on-base

Merge branch 'split/promql-common-base' into split/promql-count-values

123db63

clickhouse-gh Bot reviewed May 16, 2026

View reviewed changes

Comment thread src/Storages/TimeSeries/PrometheusQueryToSQL/applyAggregationOperatorCountValues.cpp

BadLiveware added 3 commits May 17, 2026 09:15

fix: suppress RE2 logs for PromQL matcher checks

6cd88eb

Matcher validation intentionally turns malformed label regexes into PromQL parse errors. Construct RE2 with log_errors disabled so rejected selectors do not also emit internal RE2 error log lines on every parse attempt.

Merge branch 'split/promql-common-base' into split/promql-count-values

2f175e1

alexey-milovidov mentioned this pull request May 17, 2026

Stop the bleeding in function_prop_fuzzer #105146

Merged

1 task

alexey-milovidov and others added 11 commits May 17, 2026 21:05

Merge branch 'master' into split/promql-common-base

3cb90e8

Merge remote-tracking branch 'origin/master' into split/promql-common…

6bd66f9

…-base

Merge remote-tracking branch 'origin/master' into split/promql-count-…

b9ace76

…values

Merge remote-tracking branch 'origin/master' into split/promql-common…

0e181cf

…-base

Merge remote-tracking branch 'origin/master' into split/promql-common…

4c95154

…-base

Merge branch 'split/promql-common-base' into split/promql-count-values

5f27769

clickhouse-gh Bot reviewed May 30, 2026

View reviewed changes

alexey-milovidov mentioned this pull request Jun 30, 2026

Avoid copying data for the NONE compression codec when writing #108096

Open

1 task

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support PromQL count_values#104492

Support PromQL count_values#104492
BadLiveware wants to merge 45 commits into
ClickHouse:masterfrom
BadLiveware:split/promql-count-values

BadLiveware commented May 9, 2026 •

edited

Loading

Uh oh!

clickhouse-gh Bot commented May 10, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

clickhouse-gh Bot May 30, 2026

Uh oh!

clickhouse-gh Bot commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

BadLiveware commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Findings

Tests

Final Verdict

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

clickhouse-gh Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot commented May 30, 2026

LLVM Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

BadLiveware commented May 9, 2026 •

edited

Loading

clickhouse-gh Bot commented May 10, 2026 •

edited

Loading