Problem
ClickBench setup knowledge is currently scattered across multiple locations:
- HITS_VIEW_DDL constant in benchmarks/src/clickbench.rs with inline comments
- View creation SQL in datafusion/sqllogictest/test_files/clickbench.slt
- Brief mention in benchmarks/README.md (without critical setup details)
This makes it difficult for users to understand:
- Why the EventDate column needs special handling
- When and why to use the binary_as_string option
- How to set up ClickBench correctly for DataFusion
Background
Related to #19881. The fix introduces a view that transforms EventDate from UInt16 (days since epoch) to a proper DATE type. However, the knowledge needed to run ClickBench effectively is now duplicated across files.
"I worry that we are spreading the knowledge needed to run DataFusion on ClickBench effectively all over the place. For example, this view definition is now copied twice."
Proposed Solution
Add comprehensive documentation to the existing ClickBench section in benchmarks/README.md that serves as the single source of truth. This documentation should cover:
- EventDate UInt16 → DATE transformation: why it's needed and how it works
- binary_as_string option: when and why it's required
- Complete setup example: copy-pasteable SQL showing the full setup
- Clarifications: differences between the full dataset and test subsets
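
To make the EventDate point concrete, here is a minimal sketch of what "UInt16 days since epoch" means as a calendar date. It uses only the Rust standard library and a standard days-to-civil-date conversion for the proleptic Gregorian calendar. The function name `event_date_to_ymd` is hypothetical and this is not how DataFusion performs the conversion (the fix uses a SQL view); it only illustrates the arithmetic the view has to express.

```rust
/// Hypothetical helper: interpret a raw EventDate value (days since 1970-01-01)
/// as a (year, month, day) triple in the proleptic Gregorian calendar.
fn event_date_to_ymd(days_since_epoch: u16) -> (i64, u32, u32) {
    // Shift the origin from 1970-01-01 to 0000-03-01 so leap days fall at year end.
    let z = i64::from(days_since_epoch) + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month index, 0..=11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

fn main() {
    // Without the transformation, a query sees only the raw integer 15901.
    let (y, m, d) = event_date_to_ymd(15_901);
    println!("{y:04}-{m:02}-{d:02}"); // prints 2013-07-15
}
```

A raw value such as 15901 is meaningless to date predicates until it is reinterpreted this way, which is why the documentation should explain the view rather than leave the cast implicit.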