Add /processors-profile web UI for per-query pipeline heatmap by nikitamikhaylov · Pull Request #104614 · ClickHouse/ClickHouse · GitHub
Skip to content

Add /processors-profile web UI for per-query pipeline heatmap#104614

Merged
nikitamikhaylov merged 10 commits into
masterfrom
processors-profile-web-ui
May 19, 2026
Merged

Add /processors-profile web UI for per-query pipeline heatmap#104614
nikitamikhaylov merged 10 commits into
masterfrom
processors-profile-web-ui

Conversation

@nikitamikhaylov

@nikitamikhaylov nikitamikhaylov commented May 11, 2026

Copy link
Copy Markdown
Member

Inspired by: https://github.com/nickitat/fake_explain_analyze

image

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added a new built-in web UI at /processors-profile that visualizes the pipeline of any past SELECT query as a heatmap, sourced from system.query_log and system.processors_profile_log. Each processor is colored by its elapsed_us and shows per-processor stats (rows, bytes, wait times) on hover.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.5.1.818

Serves a built-in HTML page at /processors_profile (registered next to
/jemalloc, same auth model: user/password URL params, inherits CORS from
existing WebUIRequestHandler).

The page lets the user pick a recent SELECT from system.query_log via an
editable WHERE clause, then renders the query's pipeline as a Graphviz DOT
SVG (via @viz-js/viz from CDN) with per-processor heatmap colors derived
from sum(elapsed_us) in system.processors_profile_log. Hovering a node
shows a tooltip with name, step, elapsed/wait times and input/output rows
and bytes joined by processor_uniq_id.

Inspired by:
  https://raw.githubusercontent.com/nickitat/fake_explain_analyze/refs/heads/master/explain.py

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-feature Pull request with new product feature label May 11, 2026
nikitamikhaylov and others added 2 commits May 11, 2026 19:12
So the page sees every replica's local log table and any rotated
query_log_N / processors_profile_log_N tables on Cloud / sharded setups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A `Cluster: on/off` button in the header flips the page between
`system.*_log` (single-node default) and
`clusterAllReplicas('default', merge('system', '^*_log'))` for both
`query_log` and `processors_profile_log`. The preference is persisted in
`localStorage` and applies to the dropdown list, the query-text lookup,
and the profile-data join.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread programs/server/processors_profile.html Outdated
Comment thread programs/server/processors_profile.html Outdated
The previous `'${queryId.replace(/'/g, "\\'")}'` interpolation only
escaped single quotes — a backslash in the input would slip through and
end the SQL string literal early, allowing injection. Both `query_id`
lookups (fetchQueryText and fetchProfileData) now bind the value as a
parameterized `{qid:String}`, so the server treats it as a literal of
the declared type and any quote/backslash/etc payload stays inert.

`runQuery` gained a `params` object argument that's appended to the
request URL as `param_<name>=…`, mirroring ClickHouse's standard query
parameter mechanism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread programs/server/processors_profile.html Outdated

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to processors-profile as these paths are more convenient for URLs.

@alexey-milovidov alexey-milovidov self-assigned this May 11, 2026
@alexey-milovidov alexey-milovidov changed the title Add /processors_profile web UI for per-query pipeline heatmap Add /processors-profile web UI for per-query pipeline heatmap May 12, 2026
alexey-milovidov and others added 3 commits May 17, 2026 12:39
Per review feedback: hyphenated paths are more URL-friendly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review feedback: the page must work in air-gapped or restricted
environments and should not depend on a third-party CDN at runtime.

The `@viz-js/viz@3.7.0` standalone bundle is now embedded into the
binary via `#embed` and served from `/js/viz-standalone.js`, alongside
the other built-in JS assets handled by `JavaScriptWebUIRequestHandler`.

`processors_profile.html` now loads the local copy.
};
constexpr unsigned char resource_viz_standalone_js[] =
{
#embed "../../programs/server/js/viz-standalone.js"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Embedding programs/server/js/viz-standalone.js pulls a ~1.38 MiB prebuilt dependency bundle (including Graphviz object code) into the repository.

That permanently inflates git history and violates the no-large/fat-binary-in-git rule. Please avoid committing this generated bundle and instead generate/package it from source during build/release (or fetch it in packaging/CI) so the artifact is not stored in-tree.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tolerable. Please proceed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue still holds in the current head.

programs/server/js/viz-standalone.js is a ~1.38 MiB prebuilt/minified bundle that includes third-party object code, so we are still checking a fat generated artifact into git. That conflicts with the repo rule against large/binary blobs in-tree, and it now also causes CI failure (Style check -> codespell on this file).

Please keep this asset out of the repository (generate/fetch it in build/release packaging, or vendor the source and build it as part of tooling) so we avoid permanent history bloat and style-check noise.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still present in the current head: programs/server/js/viz-standalone.js is a ~1.38 MiB prebuilt third-party bundle (including Graphviz object code) checked directly into the repository and embedded into the server binary. That remains a fat generated dependency artifact in git, with permanent history/binary-size cost.

Please switch to a source-based or build-time fetched artifact flow instead of committing the bundled output file.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked current head and this is still unresolved.

programs/server/js/viz-standalone.js remains a ~1.38 MiB prebuilt third-party bundle (including embedded Graphviz object code), checked directly into git and then embedded into the server binary via #embed.

The previous reply does not change the underlying invariant issue: this permanently increases repository history with a generated fat artifact instead of source-managed assets/build-time generation.

@alexey-milovidov

Copy link
Copy Markdown
Member

This was fixed by #105146. Let's update the branch.

}
picker.innerHTML = rows.map(r => {
const label = `${r.event_time} · ${r.user} · ${formatDuration(r.query_duration_ms)} · ${r.query_id} · ${r.query}`;
return `<option value="${escapeHtml(r.query_id)}">${escapeHtml(label)}</option>`;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query_id alone is not a stable identifier for a single query execution. It can be reused (for example via explicit query_id settings), and in cluster mode the same query_id can appear on multiple hosts/executions.

Because the picker stores only query_id in <option value=...>, subsequent calls (fetchQueryText / fetchProfileData) can pick up rows from a different execution and merge unrelated processor stats. That breaks the page contract of showing the selected query's pipeline.

Please carry a per-execution key from the picker (for example hostname + query_id + event_time_microseconds, or another unique tuple) and filter both follow-up queries by that key (or a precise time window) instead of query_id only.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not important, please disregard the comment above.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ambiguity is still present in the current code.

The picker stores only query_id (<option value=...>), then both follow-up queries use only WHERE query_id = {qid:String} (fetchQueryText and fetchProfileData). If query_id is reused (manual SET query_id, retries, or cross-host duplicates in cluster mode), the selected row can resolve to another execution and blend unrelated processor stats.

Please carry an execution key from the picker (for example query_id + event_time_microseconds + hostname) and filter both follow-up reads by that key instead of query_id alone.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still live in the current head. The picker stores only query_id as the option value, and both follow-up queries still filter only WHERE query_id = {qid:String}. That conflates distinct executions when query_id is reused (explicit SET query_id, retries, or collisions across hosts in cluster mode), and duplicate options with the same value are not independently selectable.

Please key selection and profile lookup by a tuple that uniquely identifies one log row (for example event_time_microseconds + host + query_id, or a stable row id), and use that same tuple in both query-text and profile-data fetches.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked current head and this concern is still live.

The picker still stores only query_id as <option value=...>, and both follow-up reads still filter only WHERE query_id = {qid:String} in fetchQueryText and fetchProfileData.

The prior "not important" reply does not address the correctness issue: when query_id is reused (explicit query_id settings, retries, or multi-host contexts), the selected row in the UI is not a stable key for one execution, so the rendered text/heatmap can describe a different execution than the one the user picked.

@alexey-milovidov

Copy link
Copy Markdown
Member

Also, please replace the usage of Dagre in the Web UI (play.html) with Viz as well.

The `EXPLAIN GRAPH` renderer in the Play web UI now uses the same
vendored `viz-standalone.js` bundle as `/processors_profile`, instead of
loading `dagre`, `graphlib-dot`, `dagre-d3`, and `d3` from external
CDNs. This removes the last runtime CDN dependency for diagram
rendering, so the page works in air-gapped or restricted environments.

Also skip `viz-standalone.js` in the codespell typo check — the
minified bundle contains the literal token `ofO` which codespell
flags as a typo of `of`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread programs/server/processors_profile.html
Overlapping `profileQuery` calls (auto-select on `loadQueries`, picker
change) could complete out of order, letting a slower earlier call
overwrite the graph with stale data for a no-longer-selected query.

Each call now captures a monotonically increasing generation token on
entry and bails after every `await` if a newer call has been started.
`renderDot` accepts the same predicate so the actual SVG mutation is
also skipped once the call is stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
factory.addPathToHints("/jemalloc");
factory.addHandler(jemalloc_handler);

auto processors_profile_handler = std::make_shared<HandlingRuleHTTPHandlerFactory<ProcessorsProfileWebUIRequestHandler>>(server);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/processors-profile is currently added only in addCommonDefaultHandlersFactory, but createHandlersFactoryFromConfig has no matching handler.type branch (unlike play, dashboard, binary, merges, and jemalloc).

That means any deployment that sets http_handlers cannot configure this endpoint at all, so the feature disappears outside default handlers. Please add a dedicated config handler type (for example processors_profile) in the config-driven switch and wire it to ProcessorsProfileWebUIRequestHandler.

@clickhouse-gh

clickhouse-gh Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.20% 84.10% -0.10%
Functions 91.40% 91.40% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 80.00% (16/20) · Uncovered code

Full report · Diff report

@nikitamikhaylov nikitamikhaylov added this pull request to the merge queue May 19, 2026
Merged via the queue into master with commit d3861de May 19, 2026
323 of 326 checks passed
@nikitamikhaylov nikitamikhaylov deleted the processors-profile-web-ui branch May 19, 2026 11:45
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants