Revert "Revert "Merge pull request #93542 from scanhex12/multistage_prewhere"" #95565

Merged: scanhex12 merged 22 commits into master from revert-95496-revert-multistage-prewhere-parquet on Feb 4, 2026

scanhex12 (Member) commented Jan 29, 2026

Reverts #95496

Version info

  • Merged into: 26.2.1.214

clickhouse-gh Bot (Contributor) commented Jan 29, 2026

clickhouse-gh Bot added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) Jan 29, 2026

alexey-milovidov (Member) commented:

@scanhex12, attach ClickBench results here before merging.

alexey-milovidov (Member) left a comment

.

alexey-milovidov (Member) commented:

@scanhex12, create isolated tests that reproduce the bug deterministically.

scanhex12 mentioned this pull request Jan 30, 2026
alexey-milovidov self-assigned this Jan 30, 2026

alexey-milovidov (Member) left a comment

Wow, those are seriously good speed-ups!

else
{
    size_t remaining = rows_total - row_subidx;
    const UInt8 * const first_one = static_cast<const UInt8 *>(memchr(filter_base + row_subidx, 1, remaining));

Inline comment (Member):

insert into function file('t.parquet') select if(number%10,2,0) as x from numbers(100) settings engine_file_truncate_on_insert=1;
select count() from file('t.parquet') prewhere x;

Reply (Member Author):

seems like it works:

:) insert into function file('t.parquet') select if(number%10,2,0) as x from numbers(100) settings engine_file_truncate_on_insert=1;
select count() from file('t.parquet') prewhere x;

INSERT INTO FUNCTION file('t.parquet')
SETTINGS engine_file_truncate_on_insert = 1
SELECT if(number % 10, 2, 0) AS x
FROM numbers(100)
SETTINGS engine_file_truncate_on_insert = 1

Query id: 8692db55-1924-460c-9015-b25fdca9180e

Ok.

0 rows in set. Elapsed: 0.005 sec. 


SELECT count()
FROM file('t.parquet')
PREWHERE x

Query id: f8bb1455-43fc-4f42-9174-9418a6278c94

   ┌─count()─┐
1. │      90 │
   └─────────┘

1 row in set. Elapsed: 0.004 sec. 
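
The repro in this thread writes a column whose truthy values are 2 rather than 1. The thread does not say why, but a plausible reading is that it checks whether searching a filter for the byte value 1 with memchr could skip rows whose filter byte is nonzero but not 1. A minimal sketch of that difference, assuming a raw byte filter; the helper names below are hypothetical and are not taken from Reader.cpp:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not ClickHouse code): first index >= start whose byte
// equals exactly 1, or filter.size() if there is none. This mirrors the shape
// of the memchr call quoted in the review.
size_t firstByteEqualToOne(const std::vector<uint8_t> & filter, size_t start)
{
    if (start >= filter.size())
        return filter.size();
    const void * hit = std::memchr(filter.data() + start, 1, filter.size() - start);
    if (!hit)
        return filter.size();
    return static_cast<size_t>(static_cast<const uint8_t *>(hit) - filter.data());
}

// Hypothetical helper: first index >= start whose byte is nonzero,
// i.e. "truthy" in the sense of PREWHERE x on an integer column.
size_t firstNonZeroByte(const std::vector<uint8_t> & filter, size_t start)
{
    for (size_t i = start; i < filter.size(); ++i)
        if (filter[i] != 0)
            return i;
    return filter.size();
}

// Hypothetical helper: map arbitrary values to a strict 0/1 filter, after
// which searching for the byte 1 is equivalent to searching for "nonzero".
std::vector<uint8_t> normalizeToZeroOne(const std::vector<uint8_t> & values)
{
    std::vector<uint8_t> filter(values.size());
    for (size_t i = 0; i < values.size(); ++i)
        filter[i] = static_cast<uint8_t>(values[i] != 0 ? 1 : 0);
    return filter;
}
```

On raw values such as {0, 2, 2, 0}, the memchr-style search finds nothing while the nonzero scan finds index 1; after normalization both agree. The count of 90 in the reply indicates the merged code returns the expected result for this case.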

Comment thread src/Processors/Formats/Impl/Parquet/Reader.cpp Outdated
Comment thread src/Processors/Formats/Impl/Parquet/Reader.cpp Outdated
Comment thread src/Processors/Formats/Impl/Parquet/Reader.cpp Outdated
scanhex12 force-pushed the revert-95496-revert-multistage-prewhere-parquet branch from a0a8eae to a68e002 February 3, 2026 00:03

scanhex12 (Member Author) commented:

Sorry, there were segfaults in the previous results. New benchmarks:

[0.108, 0.032, 0.033],
[0.311, 0.077, 0.084],
[0.395, 0.114, 0.113],
[1.005, 0.123, 0.126],
[1.106, 0.352, 0.345],
[1.592, 0.416, 0.421],
[0.161, 0.081, 0.077],
[0.145, 0.069, 0.069],
[1.429, 0.451, 0.469],
[2.230, 0.491, 0.498],
[1.245, 0.305, 0.281],
[1.332, 0.316, 0.318],
[1.641, 0.469, 0.465],
[3.515, 0.671, 0.674],
[1.746, 0.525, 0.522],
[1.090, 0.410, 0.418],
[3.790, 1.192, 1.217],
[3.356, 0.850, 0.837],
[6.873, 2.241, 2.235],
[0.716, 0.108, 0.114],
[13.859, 0.903, 0.920],
[16.193, 1.206, 1.202],
[31.063, 2.181, 2.221],
[77.281, 4.850, 4.944],
[4.056, 0.422, 0.417],
[1.506, 0.237, 0.246],
[4.007, 0.420, 0.419],
[14.100, 1.171, 1.205],
[12.082, 4.312, 4.295],
[0.350, 0.110, 0.098],
[3.800, 0.570, 0.577],
[8.773, 0.776, 0.791],
[9.951, 4.836, 5.090],
[14.385, 1.888, 1.897],
[14.242, 1.885, 1.873],
[0.583, 0.342, 0.353],
[0.464, 0.131, 0.145],
[0.354, 0.107, 0.105],
[0.425, 0.099, 0.100],
[0.470, 0.157, 0.159],
[0.406, 0.075, 0.070],
[0.296, 0.056, 0.075],
[0.295, 0.066, 0.063],

Master branch:

[0.122, 0.038, 0.038],
[0.209, 0.075, 0.072],
[0.363, 0.110, 0.108],
[0.919, 0.112, 0.116],
[1.066, 0.385, 0.384],
[1.571, 0.455, 0.453],
[0.228, 0.074, 0.077],
[0.334, 0.073, 0.070],
[1.455, 0.441, 0.448],
[2.136, 0.507, 0.504],
[1.238, 0.260, 0.253],
[1.276, 0.262, 0.248],
[1.738, 0.467, 0.473],
[3.492, 0.746, 0.706],
[1.858, 0.560, 0.570],
[1.117, 0.444, 0.436],
[4.103, 1.253, 1.274],
[3.323, 1.019, 0.996],
[6.914, 2.376, 2.429],
[0.735, 0.119, 0.118],
[13.961, 0.754, 0.765],
[16.228, 1.245, 1.209],
[30.963, 1.910, 1.941],
[77.794, 5.563, 16.375],
[3.988, 0.328, 0.317],
[1.535, 0.233, 0.236],
[4.013, 0.334, 0.333],
[14.234, 1.064, 0.982],
[11.707, 4.765, 4.711],
[0.322, 0.106, 0.099],
[3.796, 0.563, 0.575],
[9.547, 0.748, 0.724],
[10.159, 5.604, 5.113],
[14.420, 4.675, 4.663],
[14.472, 4.760, 4.777],
[0.861, 0.349, 0.346],
[0.485, 0.174, 0.160],
[0.375, 0.127, 0.137],
[0.501, 0.100, 0.104],
[0.507, 0.169, 0.175],
[0.323, 0.072, 0.076],
[0.422, 0.057, 0.058],
[0.242, 0.064, 0.065],

Speedups (master time divided by this branch's time; greater is better):

[1.130, 1.188, 1.152],
[0.672, 0.974, 0.857],
[0.919, 0.965, 0.956],
[0.914, 0.911, 0.921],
[0.964, 1.094, 1.113],
[0.987, 1.094, 1.076],
[1.416, 0.914, 1.000],
[2.303, 1.058, 1.014],
[1.018, 0.978, 0.955],
[0.958, 1.033, 1.012],
[0.994, 0.852, 0.900],
[0.958, 0.829, 0.780],
[1.059, 0.996, 1.017],
[0.993, 1.112, 1.047],
[1.064, 1.067, 1.092],
[1.025, 1.083, 1.043],
[1.083, 1.051, 1.047],
[0.990, 1.199, 1.190],
[1.006, 1.060, 1.087],
[1.027, 1.102, 1.035],
[1.007, 0.835, 0.832],
[1.002, 1.032, 1.006],
[0.997, 0.876, 0.874],
[1.007, 1.147, 3.312],
[0.983, 0.777, 0.760],
[1.019, 0.983, 0.959],
[1.001, 0.795, 0.795],
[1.010, 0.909, 0.815],
[0.969, 1.105, 1.097],
[0.920, 0.964, 1.010],
[0.999, 0.988, 0.997],
[1.088, 0.964, 0.915],
[1.021, 1.159, 1.005],
[1.002, 2.476, 2.458],
[1.016, 2.525, 2.550],
[1.477, 1.020, 0.980],
[1.045, 1.328, 1.103],
[1.059, 1.187, 1.305],
[1.179, 1.010, 1.040],
[1.079, 1.076, 1.101],
[0.796, 0.960, 1.086],
[1.426, 1.018, 0.773],
[0.820, 0.970, 1.032],

The geometric mean improves:

old: 1.8538961059026726
new: 1.7885948006744214
ratio new/old: 0.9647761786540592

old: 0.39366110244108643
new: 0.37386989623272876
ratio new/old: 0.9497252685479142

old: 0.4010315651650805
new: 0.37822098786461256
ratio new/old: 0.9431202446842851
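
For readers who want to recompute these figures, here is a minimal sketch of the two calculations involved. It assumes the per-query speedup is the master time divided by the branch time (which matches the first row: 0.122 / 0.108 ≈ 1.130) and that each geometric mean is taken over one column of timings; the function names are illustrative and do not come from the PR or from ClickBench tooling.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative helper: speedup of the branch relative to master for one
// timing. Values above 1 mean the branch is faster.
double speedup(double master_time, double branch_time)
{
    return master_time / branch_time;
}

// Illustrative helper: geometric mean computed as exp(mean(log(x))), which
// avoids overflow from multiplying many values directly.
// Precondition: values is non-empty and every element is positive.
double geometricMean(const std::vector<double> & values)
{
    double log_sum = 0.0;
    for (double v : values)
        log_sum += std::log(v);
    return std::exp(log_sum / static_cast<double>(values.size()));
}
```

Dividing the geometric mean of the branch timings by that of the master timings gives the "ratio new/old" figure; a ratio below 1 means the branch is faster on average.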

scanhex12 enabled auto-merge February 4, 2026 09:02
scanhex12 added this pull request to the merge queue Feb 4, 2026
Merged via the queue into master with commit a358daa Feb 4, 2026
125 of 131 checks passed
scanhex12 deleted the revert-95496-revert-multistage-prewhere-parquet branch February 4, 2026 09:22
robot-ch-test-poll1 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Feb 4, 2026
4 participants