A reproducible benchmark comparing ClickHouse and Elasticsearch for full-text search (FTS) workloads over log data at scale (1B, 10B, 50B rows).
The goal is to produce a fair, reproducible benchmark that measures query latency and storage efficiency for FTS workloads typical of log analytics. The benchmark covers only capabilities available in both systems.
Included:
- Single-term queries
- AND / OR boolean queries
- FTS combined with structured predicates (service, severity, time range)
- Aggregations driven by FTS filters (faceting, GROUP BY, time histograms)
Excluded:
- Elasticsearch-only features (phrase matching, wildcards)
Synthetic OpenTelemetry logs generated by the OpenTelemetry demo application.
| Property | Value |
|---|---|
| Total rows | 50 billion |
| Time range | 2025-09-23 00:00:00 — 2025-09-23 23:59:59 (one day) |
| Format | Parquet (1B rows per file, sorted by Timestamp) |
| S3 bucket | s3://public-pme/text_bench/ (public, region: eu-west-3) |
| Files | part_000.parquet … part_049.parquet |
| Column | Type | Notes |
|---|---|---|
| Timestamp | DateTime | Primary key component |
| TraceId | String | |
| SpanId | String | |
| TraceFlags | UInt8 | |
| SeverityText | LowCardinality(String) | |
| SeverityNumber | UInt8 | |
| ServiceName | LowCardinality(String) | Primary key component |
| Body | String | Full-text indexed field |
| ResourceAttributes | Map(String, String) | |
| ScopeAttributes | Map(String, String) | |
| LogAttributes | Map(String, String) | |
| Scale | Files | Rows | Time range covered |
|---|---|---|---|
| 1B | part_000 | 1,000,000,000 | 00:00:00 – 00:29:24 |
| 10B | part_000 – part_009 | 10,000,000,000 | 00:00:00 – 04:53:57 |
| 50B | part_000 – part_049 | 50,000,000,000 | 00:00:00 – 23:59:59 |
- Cloud: AWS
- Instance type: m6i.8xlarge (32 vCPU, 128 GiB RAM)
- Storage: gp3 EBS, 16K IOPS
- Region: eu-west-3
- Topology: Single node (1× ClickHouse, 1× Elasticsearch — separate instances)
For each dataset scale:
- Ingest the dataset
- Restart the service (clears in-memory state)
- Run each query 3 times, recording:
- Cold run: run 1 (after restart, cold OS page cache)
- Hot run: best of runs 2–3 (warm OS page cache)
Metrics captured:
- Index size on disk (data size + text index size)
- Query elapsed time (cold and hot) for each query
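The queries below rely on ClickHouse's token-matching functions (`hasToken`, `hasAllTokens`, `hasAnyTokens`). As a rough mental model only, here is a Python sketch of their semantics, assuming tokens are maximal runs of alphanumeric ASCII characters (the default tokenization) and matching is case-sensitive with no partial matches:

```python
import re

def tokens(text: str) -> set[str]:
    # Tokens are maximal runs of alphanumeric characters.
    return set(re.findall(r"[0-9A-Za-z]+", text))

def has_token(body: str, tok: str) -> bool:
    return tok in tokens(body)

def has_all_tokens(body: str, toks: list[str]) -> bool:
    return all(t in tokens(body) for t in toks)

def has_any_tokens(body: str, toks: list[str]) -> bool:
    return any(t in tokens(body) for t in toks)

body = "failed to process order: connection reset by peer"
print(has_all_tokens(body, ["connection", "reset"]))  # True
print(has_token(body, "connect"))                     # False: no partial matches
```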
-- Q1: AND, time-bounded
SELECT Timestamp, ServiceName, SeverityText, Body FROM otel_logs
WHERE ServiceName = 'checkout'
AND hasAllTokens(Body, ['failed', 'order'])
AND Timestamp >= '2025-09-23 00:00:00' AND Timestamp < '2025-09-23 00:30:00'
ORDER BY Timestamp DESC LIMIT 100;
-- Q2: single token count
SELECT count() FROM otel_logs WHERE hasToken(Body, 'timeout');
-- Q3: OR 3 tokens, GROUP BY service
SELECT ServiceName, count() AS cnt FROM otel_logs
WHERE hasAnyTokens(Body, ['exception', 'timeout', 'failed'])
GROUP BY ServiceName ORDER BY cnt DESC;
-- Q4: date histogram, single token, service filter
SELECT toStartOfHour(Timestamp) AS hour, count() AS cnt FROM otel_logs
WHERE ServiceName = 'checkout' AND hasToken(Body, 'payment')
GROUP BY hour ORDER BY hour;
-- Q5: AND, GROUP BY service
SELECT ServiceName, count() AS cnt FROM otel_logs
WHERE hasAllTokens(Body, ['connection', 'reset'])
GROUP BY ServiceName ORDER BY cnt DESC;
-- Q6: complex boolean, time-bounded
SELECT Timestamp, ServiceName, SeverityText, Body FROM otel_logs
WHERE ServiceName = 'frontend'
AND (hasAllTokens(Body, ['connection', 'reset']) OR hasToken(Body, 'timeout'))
AND SeverityNumber >= 13
AND Timestamp >= '2025-09-23 00:00:00' AND Timestamp < '2025-09-23 00:30:00'
ORDER BY Timestamp DESC LIMIT 100;
-- Q7: OR 5 tokens count
SELECT count() FROM otel_logs
WHERE hasAnyTokens(Body, ['error', 'exception', 'failed', 'timeout', 'refused']);
-- Q8: OR 5 tokens fetch
SELECT Timestamp, ServiceName, SeverityText, Body FROM otel_logs
WHERE hasAnyTokens(Body, ['error', 'exception', 'failed', 'timeout', 'refused'])
ORDER BY Timestamp DESC LIMIT 100;
-- Q9: date histogram, AND, full corpus
SELECT toStartOfHour(Timestamp) AS hour, count() AS cnt FROM otel_logs
WHERE hasAllTokens(Body, ['connection', 'reset'])
GROUP BY hour ORDER BY hour;
cd clickhouse
# Download parquet files from S3 (example: 10B = parts 0–9)
mkdir -p /data/text_bench
for i in $(seq -w 0 9); do
wget -q -O /data/text_bench/part_0${i}.parquet \
https://public-pme.s3.eu-west-3.amazonaws.com/text_bench/part_0${i}.parquet
done
# Create table and load data (10B = 10 files, 4 parallel workers)
./create_and_load.sh text_bench_10B otel_logs /data/text_bench 10 success.log error.log 4
# Run benchmark — result written to results/results_10B.json
./run_queries.sh \
--query-file queries_10.sql \
--database text_bench_10B \
--dataset-size 10000000000 \
--machine "m6i.8xlarge, 10000gib gp3" \
> results/results_10B.json
cd elasticsearch
# Install and start Elasticsearch
./install.sh
./start.sh
# Create indices for 10B (parts 0–9)
./create_indexes.sh 0 9
# Download from S3 and ingest
./load_data.sh 0 9
# Run DSL benchmark — result written to results/m6i.8xlarge_otel_logs_10b.json
./benchmark.sh 10b
# Run ES|QL benchmark — result written to results/m6i.8xlarge_otel_logs_10b_esql.json
./benchmark_esql.sh 10b
To run all three scales:
for scale in 1b 10b 50b; do
./benchmark.sh $scale
./benchmark_esql.sh $scale
done
To capture detailed disk usage per field (slow, ~1 min/shard):
./disk_usage.sh 50b   # writes results/m6i.8xlarge_otel_logs_50b_disk_usage.json
The S3 bucket is public and accessible without credentials:
aws s3 cp s3://public-pme/text_bench/part_000.parquet . --no-sign-request
# or
wget https://public-pme.s3.eu-west-3.amazonaws.com/text_bench/part_000.parquet
Each benchmark run emits a single JSON result file:
{
"system": "ClickHouse",
"version": "26.3.1",
"date": "2026-04-20",
"machine": "m6i.8xlarge, 10000gib gp3",
"dataset_size": 10000000000,
"total_size": 995602681520,
"data_size": 990689862160,
"text_index_size": 4912812010,
"result": [
[0.306, 0.069, 0.065],
...
]
}
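Each entry in `result` holds the three run timings for one query, in the order of the query list. Following the protocol above (cold = run 1, hot = best of runs 2–3), a result file can be summarized with a short script like this sketch:

```python
import json

def summarize(result_json: dict) -> list[tuple[str, float, float]]:
    """Return (query, cold, hot) tuples: cold = run 1, hot = best of runs 2-3."""
    rows = []
    for i, runs in enumerate(result_json["result"], start=1):
        cold, hot = runs[0], min(runs[1:])
        rows.append((f"Q{i}", cold, hot))
    return rows

# Example with the shape shown above (timings in seconds).
example = {"result": [[0.306, 0.069, 0.065], [1.2, 0.4, 0.5]]}
for name, cold, hot in summarize(example):
    print(f"{name}: cold={cold:.3f}s hot={hot:.3f}s")
```

To compare actual files, load them with `json.load(open(path))` and pass the parsed object to `summarize`.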