Fix missing scalar issue when evaluating subqueries inside table functions by amosbird · Pull Request #56057 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix missing scalar issue when evaluating subqueries inside table functions#56057

Merged
robot-clickhouse merged 4 commits into
ClickHouse:masterfrom
amosbird:fix-56031
Oct 30, 2023
Merged

Fix missing scalar issue when evaluating subqueries inside table functions#56057
robot-clickhouse merged 4 commits into
ClickHouse:masterfrom
amosbird:fix-56031

Conversation

@amosbird

Copy link
Copy Markdown
Collaborator

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix missing scalar issue when evaluating subqueries inside table functions. This fixes #56031.

@Algunenano Algunenano self-assigned this Oct 26, 2023
@robot-ch-test-poll robot-ch-test-poll added the pr-bugfix Pull request with bugfix, not backported by default label Oct 26, 2023
@robot-ch-test-poll

robot-ch-test-poll commented Oct 26, 2023

Copy link
Copy Markdown
Contributor

@Algunenano

Copy link
Copy Markdown
Member

Failures:

  • Bugfix validate check — Changed tests don't reproduce the bug. It complains because the previous release passes the test, as you are only checking the ErrorCode and that doesn't change. You can either tweak a bit the test or change the PR to an improvement.
  • Stateless tests (release, wide parts enabled) -> 02263_lazy_mark_load -> Test 02263_lazy_mark_load is flaky #55973
  • Upgrade check (msan) -> A LOGICAL_ERROR due to using distributed tables and allow_experimental_partial_result together (23.9.2.56). The feature had to be removed because of these kind of problems in master.
  • Upgrade check (tsan) -> Cannot start the new CH server. The problem is that the previous one never finished shutting down and we didn't kill it.

@Algunenano Algunenano left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind adapting the test so that it doesn't pass with the previous release or changing the PR to an improvement , please?

@amosbird

Copy link
Copy Markdown
Collaborator Author

Let's see if this PR can also fix the error code.

Comment thread src/Interpreters/Context.cpp Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly this is not ok: SELECT __getScalar('')

@robot-clickhouse robot-clickhouse merged commit cb0cf67 into ClickHouse:master Oct 30, 2023
pull Bot pushed a commit to AKJUS/ClickHouse that referenced this pull request May 18, 2026
The test asserted a deterministic row-by-row output from:

  SELECT val, avg(toUInt32(val))
  FROM t_group_by_lowcardinality
  GROUP BY val
  LIMIT 10
  SETTINGS max_threads = 1, max_rows_to_group_by = 100, group_by_overflow_mode = 'any';

With `group_by_overflow_mode = 'any'`, only the first `max_rows_to_group_by` distinct
keys make it into the hash table — and which keys those are depends on the order in
which rows arrive at the aggregator. `LIMIT 10` without `ORDER BY` then selects ten of
those keys in hash-iteration order. The `max_threads = 1` clause does not help under
`ParallelReplicas` (where the table is sharded across replicas and partial aggregations
are merged) or under S3 storage (where reads can be split into independent ranges), so
the resulting row set varies between runs.

Originally tracked in ClickHouse#36069 (2022). The `-- Tags: no-random-settings` workaround masked
the issue for random-settings randomization but the underlying non-determinism returned
once `ParallelReplicas` was enabled by default in some CI configurations. Concretely the
test now fails ~10–20 times per day on master under
`Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel)` —
@alexey-milovidov flagged this on ClickHouse#102039.

The test is a crash-regression test for
"Avoid crash in case of GROUP BY LowCardinality(Nullable(String)) column and
group_by_overflow_mode='any'" (PR ClickHouse#29637 / ClickHouse#56057). The query is preserved verbatim,
including all the problematic settings; only the assertion is changed to
`SELECT count() FROM (...) `, which is always 10. If the underlying crash ever returns,
the test will detect it; the specific row values were never deterministic to begin with.

Verified locally that the old reference fails under `max_threads = 4` (extra null bucket
appears, last row drops — exact diff matches the CI report). The new assertion returns
`{"count()":10}` for all parallelism levels.

Closes ClickHouse#36069
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Scalar doesn't exist for format function

4 participants