fix(docs): correct h3Distance example output value by avoronov-explyt · Pull Request #106175 · ClickHouse/ClickHouse · GitHub
Skip to content

fix(docs): correct h3Distance example output value#106175

Merged
alexey-milovidov merged 7 commits into
ClickHouse:masterfrom
avoronov-explyt:fix-h3distance-output
Jun 13, 2026
Merged

fix(docs): correct h3Distance example output value#106175
alexey-milovidov merged 7 commits into
ClickHouse:masterfrom
avoronov-explyt:fix-h3distance-output

Conversation

@avoronov-explyt

@avoronov-explyt avoronov-explyt commented May 31, 2026

Copy link
Copy Markdown
Contributor

Several embedded function-documentation examples (shown in SHOW CREATE FUNCTION / docs and used by the docs generation) had incorrect outputs. This corrects them to match the actual server behavior:

  • h3Distance: the example distance between cells 590080540275638271 and 590103561300344831 is 6, not 7.
  • multiSearchFirstPositionUTF8: the leftmost matching offset is 4 (needle 'авст'), not 3.
  • multiSearchFirstPositionCaseInsensitiveUTF8: fixed the result header case to match the query and corrected the value to 1 (needle 'Здра' matches at position 1), it previously showed 3.
  • readWkt: the MultiLineString example used readWKTLineString instead of readWKTMultiLineString.

All values were verified against the server.

Changelog category (leave one):

  • Documentation (changelog entry is not required)

Version info

  • Merged into: 26.6.1.752

@CLAassistant

CLAassistant commented May 31, 2026

Copy link
Copy Markdown

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label May 31, 2026
@alexey-milovidov alexey-milovidov self-assigned this May 31, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [8d617d7]

Summary:


AI Review

Summary

This PR corrects embedded FunctionDocumentation examples for h3Distance, multiSearchFirstPositionUTF8, multiSearchFirstPositionCaseInsensitiveUTF8, and readWKTMultiLineString. I reviewed the current diff and prior review discussion against the checked-out source; the earlier Pretty block width issue is fixed, and I found no remaining high-signal correctness or documentation-contract issues.

Final Verdict

Status: ✅ Approve

alexey-milovidov and others added 2 commits June 7, 2026 14:13
…e output value

The example for `multiSearchFirstPositionCaseInsensitiveUTF8` showed `3`, but
the leftmost matching offset is `1`: the needle `'Здра'` matches the haystack
`'Здравствуй, мир'` at position 1. Verified against the server. The previous
commit fixed only the header case (`мир` -> `МИР`) but left the wrong value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Picked this up, merged master (was ~1688 behind, clean), and verified all four example outputs against a local server:

  • h3Distance(590080540275638271, 590103561300344831)6
  • multiSearchFirstPositionUTF8('Здравствуй, мир', ['мир', 'вст', 'авст'])4
  • readWKTMultiLineString('MULTILINESTRING ((1 1, 2 2, 3 3, 1 1))')MultiLineString
  • multiSearchFirstPositionCaseInsensitiveUTF8('Здравствуй, мир', ['МИР', 'вст', 'Здра'])1, not 3

The last one: the previous commit fixed only the result header case (мирМИР) but kept the value 3, which is also wrong — the needle 'Здра' matches at position 1, so I corrected the example value to 1 (commit a13d3ea1ca0).

Also filled in the PR body's changelog category (Documentation), which was the only CI failure (Change category is missing or invalid).

@clickhouse-gh clickhouse-gh Bot added the pr-documentation Documentation PRs for the specific code PR label Jun 7, 2026
Comment thread src/Functions/readWkt.cpp
alexey-milovidov and others added 2 commits June 9, 2026 03:09
…k width

Changing the example header from `readWKTLineString` to `readWKTMultiLineString`
widened the top border, but the value row and bottom border still used the old,
narrower width, leaving the Pretty block inconsistent. Pad the value row and
extend the bottom border so all three lines match the new header width.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, the only CI failure is Tests/04277_parallel_replicas_join_dp_reorder with MEMORY_LIMIT_EXCEEDED (Code 241, current RSS: 53.70 GiB, maximum: 53.69 GiB) on the arm_asan_ubsan, azure, parallel shard. This is a docs-only PR (it only edits string literals inside embedded function-documentation blocks), so it cannot affect parallel-replicas JOIN memory usage. CIDB shows this test flaking sporadically on unrelated PRs over the last two weeks (#106734, #101398, #105889). Please investigate and provide a fix in a separate PR if one is not already in progress.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106175&sha=a13d3ea1ca0a364a5f27aa68875f066007db2c63&name_0=PR&name_1=Stateless%20tests%20%28arm_asan_ubsan%2C%20azure%2C%20parallel%29

@groeneai

groeneai commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov investigated. Short version: 04277_parallel_replicas_join_dp_reorder is one of many incidental victims of host-level RSS pressure on the arm_asan_ubsan, azure, parallel shard, not a test-specific issue.

What the failure looks like on this PR

Code: 241. (total) memory limit exceeded:
  would use 4.13 GiB (attempt to allocate chunk of 1.00 MiB),
  current RSS: 53.70 GiB, maximum: 53.69 GiB: While executing Remote.
(query 11, line 55: SELECT count() FROM pr_dp_a JOIN pr_dp_b ... JOIN pr_dp_c ...)

3/3 reruns passed. The test data itself is trivial — three MergeTree tables of 100, 1000, and 100000 rows, single UInt32 column. It cannot legitimately use 53 GiB; it is the query that loses the OvercommitTracker arbitration when the runner's cumulative RSS already sits at the cap.

Shard-wide picture (CIDB, last 14d on Stateless tests (arm_asan_ubsan, azure, parallel), current RSS: 53.69 GiB pattern)

  • 50 unique tests hit, 60 total hits, 40 unique PRs
  • Steady cadence: 1–16 hits/day, no master gap
  • Top victims: 03742_nested_loop_join_long (5), 01603_read_with_backoff_bug (4), 03748_join_on_complex_condition_result_memory (3), 02455_one_row_from_csv_memory_usage (2), 04033_tpc_ds_q04/q11/q61 (2/1/1), 01661_extract_all_groups_throw_fast (2), and a long tail of single-hit incidental victims
  • 04277_parallel_replicas_join_dp_reorder itself: 1 hit in 14d on this shard, all 3 reruns passed (this PR's run)

CIDB on 04277_parallel_replicas_join_dp_reorder (30d, all shards)

So the only RSS-OOM occurrence for 04277 is this single 0/3-rerun hit on this shard, and it is not even close to the worst victim there.

Precedent

The closest sibling case I have open is PR #106797 (filed yesterday under your earlier directive on PR #10454504033_tpc_ds_q61): same shard, same OvercommitTracker arbitration, same current RSS: 53.7 GiB / 53.69 GiB shape. There I disabled the test mirroring the existing pattern for 04033_tpc_ds_q35. But q61 had multiple hits including blocking the parent PR's CI — 04277 does not (1 hit / 14d, 3/3 reruns clean).

Why I'm NOT speculatively pushing a 04277 disable / data shrink

Per past directives I'd be making a defensive change for a failure I can't reproduce locally and that has hit this test exactly once in 14 days. The test's data is already small; there is nothing to shrink. Disabling it just shifts the next victim of the same shard pressure to whichever query happens to lose the next OvercommitTracker arbitration. That doesn't address the systemic ~5 unique-test-hits/day issue.

What I'd like direction on

Two reasonable paths, both viable:

  1. Mirror PR Disable 04033_tpc_ds_q61 (MEMORY_LIMIT_EXCEEDED on arm_asan_ubsan azure parallel) #106797 for 04277. Symmetric with 04033_tpc_ds_q61/q35. I can file the same one-line SKIP disable + comment if you want a quick stop-the-bleed for this specific test.
  2. Treat as the chronic shard-wide host-RSS issue. Investigate why the runner's sanitizer + Azure + parallel triple is sitting at the 53.7 GiB cap (50 unique tests / 40 PRs over 14d argue for an environment fix or runner memory bump rather than per-test disables).

Please tell me which you'd prefer. Default would be (1) since you asked for a separate PR and that mirrors the precedent. CIDB report URL on this exact failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106175&sha=a13d3ea1ca0a364a5f27aa68875f066007db2c63&name_0=PR&name_1=Stateless%20tests%20%28arm_asan_ubsan%2C%20azure%2C%20parallel%29

Session: cron:clickhouse-ci-task-worker:20260609-040300

@clickhouse-gh

clickhouse-gh Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.60% 84.60% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.30% 77.20% -0.10%

Changed lines: Changed C/C++ lines covered by tests: 27/27 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, investigate the failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106175&sha=8d617d769b286ed421c0f5cb29b803bbf8c0ef94&name_0=PR&name_1=Integration%20tests%20%28amd_msan%2C%204%2F6%29 and provide a fix in a separate PR. If the fix is already in progress, link it here.

The failure is test_profile_max_sessions_for_user/test.py::test_profile_max_sessions_for_user_postgres — a pytest timeout (>900 s) in threaded_run_test([postgres_session] * 3) on the amd_msan shard. This is unrelated to this PR: the PR only edits string literals inside embedded FunctionDocumentation blocks, and per CIDB this test has timed out on many unrelated PRs and on master over the last week (e.g. #106538, #95659, #106137, #101789, #106020, #102308, #106177, #96130), so it looks like a recurring flaky timeout of the PostgreSQL session test under MSan. There is no open issue for it yet. I have rerun the failed job in the meantime.

@groeneai

Copy link
Copy Markdown
Contributor

@alexey-milovidov Thanks for the rerun. Confirmed this is unrelated to the docs PR.

CIDB over 30 days: the timeout is specific to the postgres variant and to MSan. 14 timeouts on test_profile_max_sessions_for_user_postgres across 13 distinct PRs plus master (11 of 14 on amd_msan), versus 1 each for the tcp/http/mysql/grpc variants. Recurring flaky timeout, not a regression. No open issue or fix PR exists yet.

Preliminary root cause (confirming with a local repro before I open the fix):

threaded_run_test starts 3 postgres sessions (max_sessions_for_user=2), waits for the overflown session count log, then issues a single KILL QUERY WHERE user='test_user' SYNC and thread.join() with no timeout. That log fires at session acquisition (Session::trackSession), not at query execution, so it only proves the excess session was rejected. The two accepted sessions' SELECT ..., COUNT(*) FROM system.numbers (infinite) queries may not have reached the process list when the one-shot KILL runs. If a query starts after the KILL, nothing kills it and it runs forever. The postgres worker is psycopg2 with no client-side query timeout, so the thread blocks in fetchall() until the 900s pytest limit. The native-protocol variants have a 600s client timeout (helpers/client.py) that masks the same race, which is why postgres dominates. MSan widens the connect-to-execute window and raises the hit rate.

Fix direction (separate PR, per your request): make the cleanup robust against a late-starting query by killing in a loop until no test_user queries remain in system.processes (instead of a single one-shot KILL), bounded well under the pytest limit. The overflow assertion is unchanged. I will link the PR here.

@groeneai

Copy link
Copy Markdown
Contributor

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 13, 2026
Merged via the queue into ClickHouse:master with commit 8029bad Jun 13, 2026
327 of 328 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-documentation Documentation PRs for the specific code PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants