Implements SQL truth-value predicates `IS TRUE`, `IS FALSE`, `IS UNKNOWN` and their `IS NOT` variants. Closes ClickHouse/ClickHouse#99597. by ibrahimkarimeddin · Pull Request #99997 · ClickHouse/ClickHouse · GitHub
Skip to content

Implements SQL truth-value predicates IS TRUE, IS FALSE, IS UNKNOWN and their IS NOT variants. Closes ClickHouse/ClickHouse#99597.#99997

Merged
alexey-milovidov merged 10 commits into
ClickHouse:masterfrom
ibrahimkarimeddin:fix-99597-is-true-false-unknown
May 19, 2026
Merged

Implements SQL truth-value predicates IS TRUE, IS FALSE, IS UNKNOWN and their IS NOT variants. Closes ClickHouse/ClickHouse#99597.#99997
alexey-milovidov merged 10 commits into
ClickHouse:masterfrom
ibrahimkarimeddin:fix-99597-is-true-false-unknown

Conversation

@ibrahimkarimeddin

@ibrahimkarimeddin ibrahimkarimeddin commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

Fixes #99597

Changelog category (leave one):

  • New Feature

Changelog entry:

Added support for SQL truth-value predicates IS TRUE, IS FALSE, IS UNKNOWN, IS NOT TRUE, IS NOT FALSE, and IS NOT UNKNOWN for nullable boolean expressions.

Documentation entry for user-facing changes

Motivation: Allows unambiguous testing of nullable boolean columns for all three SQL truth values (true, false, null/unknown). Unlike = true, the IS TRUE predicate returns false (not NULL) when the value is NULL.
Example use:

-- Scalar
SELECT 1 WHERE NULL IS UNKNOWN;  -- returns 1
SELECT 1 WHERE NULL IS TRUE;     -- returns nothing
-- On a nullable column
SELECT
    x IS TRUE,
    x IS FALSE,
    x IS UNKNOWN
FROM my_table;

<!-- ch-version-info:start -->
### Version info
- Merged into: `26.5.1.864`
<!-- ch-version-info:end -->

@CLAassistant

CLAassistant commented Mar 19, 2026

Copy link
Copy Markdown

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Mar 19, 2026
@clickhouse-gh

clickhouse-gh Bot commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [8adb439]

Summary:

job_name test_name status info comment
Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel) FAIL
01710_projection_additional_filters FAIL cidb

AI Review

Summary

This PR adds parser support for SQL truth-value predicates (IS TRUE, IS FALSE, IS UNKNOWN, and IS NOT variants), rewrites them to existing comparison/null-check functions, documents the new operators, and adds a focused stateless test. I reviewed the current diff and prior discussion threads against the latest code, and I do not see unresolved contract violations in correctness, safety, compatibility, or rollout behavior.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-feature Pull request with new product feature label Mar 19, 2026
Comment thread src/Parsers/ExpressionListParsers.cpp Outdated
Comment thread src/Parsers/ExpressionListParsers.cpp

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change, almost ready!

@alexey-milovidov alexey-milovidov self-assigned this Mar 19, 2026
Comment thread src/Parsers/ExpressionListParsers.cpp
@ibrahimkarimeddin

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov I checked the failing tests, and they do not seem to be caused by the new changes in this PR. Please let me know whether I should investigate them or treat them as unrelated CI failures.

@alexey-milovidov

Copy link
Copy Markdown
Member

Comment thread src/Parsers/ExpressionListParsers.cpp
@ibrahimkarimeddin

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov just a kind reminder on this PR. If any further updates are needed from my side, I’m ready to make them.

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Approved.

@ibrahimkarimeddin

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov sorry for the ping — quick question as this is my first time. Do I need to do anything after approval for the PR to be merged, or does it just take some time?

@alexey-milovidov

Copy link
Copy Markdown
Member

We will take it from here.

@alexey-milovidov

Copy link
Copy Markdown
Member

Fix for the integration test: #101897

@alexey-milovidov

Copy link
Copy Markdown
Member

Now we have to fix the hung queries in the Stress test...

@alexey-milovidov

Copy link
Copy Markdown
Member

The Hung check failure is fixed by #102008 and #102010, let's update the branch

@alexey-milovidov

Copy link
Copy Markdown
Member

I fixed the hung check, will update the branch.

@ibrahimkarimeddin

Copy link
Copy Markdown
Contributor Author

Hi @alexey-milovidov , gentle follow-up on this PR. Thanks again for the review and help here. Please let me know if anything else is needed from my side.

@alexey-milovidov

Copy link
Copy Markdown
Member

I will take it from here. The CI is not green, we have to fix the CI first.

@alexey-milovidov

alexey-milovidov commented May 16, 2026

Copy link
Copy Markdown
Member

Merged latest master into the branch and pushed (8adb439e4f5d). CI should re-run.

CI failures analysis (from the previous run)

The two failing checks are unrelated to this PR (which only touches the SQL parser):

  1. Stateless tests (amd_binary, flaky check) — timeout in 03207_json_read_subcolumns_2_memory. This is a JSON subcolumns test, no relation to truth-value predicates.
  2. Stress test (arm_msan)MemorySanitizer: use-of-uninitialized-value inside DB::MergeTreeReadersChain::executeActionsBeforePrewhere, with the uninitialized memory created by an allocation of patch_max_version in src/Storages/MergeTree/MergeTreeReadersChain.cpp:210. Entirely in MergeTree read code.

@groeneai, investigate both unrelated failures:

@groeneai

Copy link
Copy Markdown
Contributor

@alexey-milovidov — investigated both failures. Both are pre-existing trunk issues, unrelated to this PR's parser-only diff.

1. Stress test (arm_msan) — MemorySanitizer: use-of-uninitialized-value (STID: 1478-2063)

This is the same chronic MSan trunk bug tracked by #102241 (which was closed as completed on 2026-04-10, but the bug is still active — see below).

The stack shows the uninitialized value is reported through TextLog::appendToBlockOwnAsyncSplitChannel::runTextLog, and MSan attributes the allocation to patch_max_version in MergeTreeReadersChain::executeActionsBeforePrewhere at src/Storages/MergeTree/MergeTreeReadersChain.cpp:210. The actual variable is the std::optional<UInt64> returned by getMaxPatchVersionForStep (which just forwards prewhere_info->mutation_version). The poison is then propagated through patch-application machinery and eventually surfaces in a Poco::Message constructor when text logs are flushed asynchronously.

Cross-PR distribution (last 30 days, stress-test rows whose test_context_raw mentions patch_max_version or MergeTreeReadersChain): 24+ distinct PRs affected, all sanitizer/architecture variants:

pull_request_number  check_name                              cnt  last_day
100185               Stress test (amd_asan_ubsan)             1   2026-05-05
100185               Stress test (amd_msan)                   1   2026-05-05
100185               Stress test (arm_msan)                   1   2026-05-14
100185               Stress test (arm_tsan)                   1   2026-05-05
100185               Stress test (amd_debug)                  1   2026-05-14
104494               Stress test (arm_asan_ubsan, s3)         1   2026-05-15
104903               Stress test (amd_debug)                  1   2026-05-14
104120               Stress test (amd_msan)                   1   2026-05-06
... and ~16 more PRs across amd/arm × debug/asan/tsan/msan/ubsan

Master hits (pull_request_number = 0) in the last 60 days:

day          failures  checks
2026-05-14   1         Stress test (experimental, serverfuzz, azure, amd_tsan)
2026-05-11   1         Stress test (experimental, serverfuzz, arm_tsan)
2026-05-07   1         Stress test (experimental, serverfuzz, amd_tsan)
2026-04-25   1         Stress test (amd_debug)
2026-04-21   1         Stress test (experimental, serverfuzz, amd_debug)
2026-04-10   4         Stress test (arm_msan)

8 master hits in 60 days, latest 2026-05-14. #102241 should likely be reopened — the bug is still recurring on master (including the serverfuzz nightlies) more than a month after the issue was closed.

PR #102305 (Unpoison fiber stack on AArch64) and PR #102290 (Rollback jemalloc upgrade) were both opened as candidate fixes but closed without merging, so the root cause remains unaddressed.

2. Stateless tests (amd_binary, flaky check) — 03207_json_read_subcolumns_2_memory timeout

Only relevant to the flaky-check stage (which re-runs new/changed tests 50× under heavy randomization within a 600s budget). The test itself is intrinsically heavy: 800K rows of JSON data with max_dynamic_paths=2 and dozens of SELECT json.… paths-with-types queries.

Cross-PR distribution (last 60 days):

pull_request_number  cnt  first_day    last_day     checks
99513                5    2026-03-25   2026-03-26   amd_llvm_coverage variants
102269               2    2026-04-10   2026-04-13   amd_binary, flaky check
99997                2    2026-04-10   2026-04-10   amd_binary, flaky check
101645               2    2026-04-10   2026-04-10   amd_binary, flaky check
101918               1    2026-04-10   2026-04-10   amd_binary, flaky check
100535               1    2026-04-10   2026-04-10   amd_binary, flaky check
96130                1    2026-03-21   2026-03-21   arm_binary, parallel

Master hits in 60 days: 0.

5 of the 6 amd_binary, flaky check hits cluster on a single day (2026-04-10) across completely unrelated PRs (#102269, #99997, #101645, #101918, #100535) — same day the run we're looking at was performed. That looks like a CI infrastructure or shared-resource event on that runner, not a property of this PR.

Note: I previously merged #101765 for the sibling test 03469_json_read_subcolumns_combined_2_compact_merge_tree, which had a similar randomized-settings-induced timeout. The same approach (capping index_granularity / index_granularity_bytes and pinning max_threads) likely applies to 03207_json_read_subcolumns_2_memory if it surfaces again. I'll open a follow-up for that.

Conclusion

Both failures are pre-existing trunk issues that pre-date this PR. The new CI run on 8adb439e4f5d should be fine for the flaky-check timeout (the cluster of 2026-04-10 hits looks transient) and may also miss the MSan report (it's stochastic at ~one hit per few days per build variant), but the MSan signature itself is master-bug territory and not caused by the parser changes here. I'll file a follow-up to revisit #102241 / extend the timeout fix to 03207_json_read_subcolumns_2_memory.

@groeneai

Copy link
Copy Markdown
Contributor

Correction on the MSan analysis above — I was over-inclusive with my pattern matching.

The test_context_raw LIKE '%patch_max_version%' OR LIKE '%MergeTreeReadersChain%' query I ran above also matches stress runs whose attached server log contains other unrelated failures (Logical errors, Hung check, etc.) — the MergeTreeReadersChain::executeActionsBeforePrewhere frame appears in many normal query stack traces, so the broader query is noisy.

Filtering to the actual STID test_name:

SELECT toStartOfDay(check_start_time) AS day, count() AS failures, any(check_name) AS check
FROM default.checks
WHERE test_name = 'MemorySanitizer: use-of-uninitialized-value (STID: 1478-2063)'
  AND test_status IN ('FAIL', 'ERROR')
  AND pull_request_number = 0
  AND check_start_time > now() - INTERVAL 60 DAY
GROUP BY day ORDER BY day DESC
day          failures  check
2026-04-10   3         Stress test (arm_msan)
2026-04-09   12        Stress test (arm_msan)
2026-04-08   1         Stress test (arm_msan)
2026-04-03   1         Stress test (arm_msan)

Zero master hits since 2026-04-10, so STID 1478-2063 itself does appear to be resolved (likely consistent with #102241 being closed on that same day).

The CI failure linked from this PR is on SHA 1b7c5887c2f7… — checking the failure log timestamps, that run is from 2026-04-10 08:21, i.e. before the fix that closed #102241 landed. So this is a stale CI run, not a fresh MSan recurrence. The new CI on 8adb439e4f5d should not reproduce it.

Same story for the 03207_json_read_subcolumns_2_memory flaky-check timeout — also from the 2026-04-10 run cluster, also stale.

TL;DR: Both failures are stale CI from 2026-04-10. The new run on 8adb439e4f5d should be green; no further investigation needed on my side. Apologies for the noise on #102241 — that one looks legitimately fixed.

@clickhouse-gh

clickhouse-gh Bot commented May 16, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 90.80% 90.80% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 94.34% (50/53) | lost baseline coverage: 1 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov merged commit 97cbcf5 into ClickHouse:master May 19, 2026
163 of 167 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 19, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai if there is a set index on a merge tree boolean column and IS TRUE or IS UNKNOWN predicate (if the column is nullable) is used, is the index used? Add a test case, please.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #105865 with a regression test. Short answer: yes, the set skip index is already used in all six forms.

The new predicates lower to functions that the set index recognises in MergeTreeIndexConditionSet::atomFromDAG and then evaluates per-granule in mayBeTrueOnGranule:

  • b IS TRUE -> isNotDistinctFrom(b, true)
  • b IS FALSE -> isNotDistinctFrom(b, false)
  • b IS UNKNOWN -> isNull(b)
  • b IS NOT TRUE -> isDistinctFrom(b, true)
  • b IS NOT FALSE -> isDistinctFrom(b, false)
  • b IS NOT UNKNOWN -> isNotNull(b)

With three granules holding only true, only false, and only NULL respectively, the set index drops the two unrelated granules for each positive form and the matching granule for each IS NOT form. The new test asserts that via ProfileEvents['SelectedMarks'].

One small follow-up not addressed by this PR: EXPLAIN indexes = 1 prints Condition: true for the isNotDistinctFrom / isDistinctFrom forms because KeyCondition only special-cases isNull / isNotNull (for IS UNKNOWN / IS NOT UNKNOWN it prints (b isNull) / (b isNotNull)). The pruning still happens via the actions DAG path, but teaching KeyCondition about the new operators would make the EXPLAIN output more informative. Happy to send a separate PR for that if you'd like.

groeneai added a commit to groeneai/ClickHouse that referenced this pull request Jun 1, 2026
Sibling of `04290_set_index_is_true_false_unknown` covering the leading
primary-key path on a `Nullable(Bool)` column. `KeyCondition` recognises
the `isNull` / `isNotNull` lowered forms and drops one granule for
`IS UNKNOWN` and `IS NOT UNKNOWN`. The other four predicates lower to
`isNotDistinctFrom` / `isDistinctFrom`, which are not recognised by
`KeyCondition`, so the test pins this current behaviour.

Follow-up to ClickHouse#99997. Requested in ClickHouse#105865.
pull Bot pushed a commit to AKJUS/ClickHouse that referenced this pull request Jun 2, 2026
…NKNOWN

Verifies that the set skip index prunes granules for the new truth-value
predicates from ClickHouse#99997 on a `Nullable(Bool)` column. The new test populates
three granules of eight rows each (only true, only false, only NULL) and
checks via `ProfileEvents['SelectedMarks']` that the index drops the two
unrelated granules for each positive form and the matching granule for
each `IS NOT` form.

Follow-up to ClickHouse#99997.
alexey-milovidov added a commit that referenced this pull request Jun 29, 2026
The `swap-clickhouse-jdbc.py` module docstring claimed the NoREC oracle's
`IS TRUE` postfix is rewritten to `= TRUE`, but the implementation (and the
inline comment, and the PR description) correctly rewrite it to `!= 0`. The
`= TRUE` variant was tried and rejected; the docstring was left behind.

Also clarify the rationale in both the docstring and the Dockerfile comment.
`IS TRUE`/`IS FALSE` are now implemented in ClickHouse's parser (since
#99997), but `IS TRUE` parses as
`<expr> <=> true`, i.e. strict equality with `1`, which does not match
`WHERE <expr>` truthiness (any non-zero numeric). Verified locally that
`5 IS TRUE` = 0 while `WHERE 5` matches and `5 != 0` = 1, so the `!= 0`
rewrite remains correct and necessary - using native `IS TRUE` would silently
produce false-positive oracle mismatches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support IS TRUE, IS FALSE, IS UNKNOWN boolean predicates

6 participants