Enable snappy compression in HTTP interface#100752
Conversation
The server already recognized `Accept-Encoding: snappy` and set the `Content-Encoding` response header, but the write path threw "Unsupported compression method" when actually trying to compress. Replace the throw in `createWriteCompressedWrapper` with creating a `SnappyWriteBuffer`. Add the necessary `WriteBuffer *` constructor overload and change the `unique_ptr` constructor to take by reference so it works with the forwarding template. Closes #44885 Reimplements #96442 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the fake fallback — require python3 snappy module and fail if it is not available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Taking `unique_ptr &` silently steals ownership from the caller. Use the same pattern as other compression buffers (Zlib, Brotli, etc.): take `unique_ptr` by value and use `std::move` at the call site. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous implementation accumulated all data in memory and only compressed at finalize — OOM risk for large HTTP responses. Rewrite to use the snappy framing format (framing_format.txt): each `nextImpl` call emits independently compressed chunks (up to 64 KB each) with masked CRC-32C checksums. This bounds memory usage to the buffer size and enables streaming (first bytes sent before query completes). The framing format is the standard wire format for streaming snappy, used by python-snappy StreamCompressor/StreamDecompressor, Go's snappy.NewWriter/NewReader, etc. It does not affect Parquet or Hadoop snappy codepaths, which use raw snappy and HadoopSnappyReadBuffer respectively. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HTTP response headers use CRLF line endings; grep output includes the \r character. Match the existing pattern in 00302_http_compression. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The error code was declared but never used after switching to `snappy::RawCompress`, which does not return a failure status. This was flagged by style CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test 04061 was taken by keeper_client_watch_commands on master. Renumber to 04063. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename test from 04063 to 04065 to avoid number collision with `04063_in_subquery_lowcardinality_tuple_key` merged to master - Rewrite test to verify snappy framing format stream identifier bytes instead of requiring python3 `snappy` module (not available in CI) - Remove unused raw `WriteBuffer *` constructor from `SnappyWriteBuffer` to address review concern about dangling pointer safety Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Refactor `SnappyWriteBuffer` to inherit from `WriteBufferWithOwnMemoryDecorator` instead of manually managing the underlying buffer. This fixes the arm_tidy build failure (missing `WriteBuffer *` constructor) and provides safe `cancelImpl`/`finalizeImpl` via the decorator base class. - Add `SnappyFramedReadBuffer` that reads the snappy framing format, aligning the read and write paths for `CompressionMethod::Snappy`. Previously the write path used snappy framing format but the read path used `HadoopSnappyReadBuffer` (Hadoop snappy format), creating an asymmetric contract. - Fix the test to validate the full 10-byte snappy stream identifier instead of only the first 5 bytes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix arm_tidy build failure: pass pointer (`&getOutputStream(response)`) instead of lvalue reference to `SnappyWriteBuffer` constructor in `PrometheusRequestHandler`, since `WriteBufferDecorator` only accepts `std::unique_ptr<WriteBuffer>` or `WriteBuffer *`. - Enforce the snappy framing format limit of 65536 bytes per chunk on `uncompressed_length` before allocating the decompress buffer, preventing untrusted input from forcing huge allocations. - Distinguish truly empty streams (clean EOF) from truncated stream identifiers in `readStreamIdentifier`: only return `false` on EOF before any bytes; throw `SNAPPY_UNCOMPRESS_FAILED` on partial reads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eus format - Fix `readExact` to distinguish partial reads (throw) from clean EOF at chunk boundaries (return false). Previously truncated chunk headers or payloads were silently accepted as EOF. - Add filename to all exception messages via `getExceptionEntryWithFileName` so errors include "While reading from: <path>" (fixes test `02724_decompress_filename_exception`). - Enforce the snappy framing format 65536-byte limit on uncompressed chunks, not only compressed chunks. - Keep Prometheus remote-read using raw snappy block compression (`snappy::Compress`) instead of the framing format written by `SnappyWriteBuffer`, preserving wire compatibility with Prometheus clients. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review: `snappy_mode = 'framed'` was still unreachable on several user-selectable snappy input APIs because they wrapped the input with `wrapReadBufferWithCompressionMethod` without passing the query setting, silently falling back to the `SnappyMode::Basic` (Hadoop block) default. This mirrors the earlier write-path fix and makes framed-snappy input behave consistently with the already-threaded `file()` / `url()` / object-storage paths. Thread the setting through: - `ClientBase::sendDataFromStdin` (`client_context`) — stdin input. - `GRPCServer` input and external-table read wrappers (`query_context`). - `getReadBufferFromASTInsertQuery` (new `snappy_mode` parameter), updating its callers in `getInputFormatFromASTInsertQuery` and `AsynchronousInsertQueue` to pass the query setting. Add `04401_snappy_stdin_compression.sh` covering the client stdin read path: a framed-snappy stream round-trips when `snappy_mode = 'framed'` and is rejected by the default `basic` reader. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Continued work on this PR. Addressed the unresolved review thread (
Added The remaining red checks are unrelated master-wide flakes outside this PR's scope (compression/HTTP/storage): |
The merge with master kept the `snappy_mode` entry in the `26.6` block, which has since been released (master opened the `26.7` development block in `Adjust settings after 26.6 release`). New settings introduced now belong in the current development version's block, so move it to `26.7`. The value is unchanged (`basic` -> `basic`), so this has no behavioral effect; it only keeps the settings-history version correct. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| {"reserve_memory", 0, 0, "New setting to reserve memory for specific workload before starting a query."}, | ||
| {"allow_delta_lake_writes", false, false, "Added an alias for setting `allow_experimental_delta_lake_writes`, which was moved to Beta."}, | ||
| {"allow_experimental_delta_lake_writes", false, false, "Delta Lake writes were moved to Beta."}, | ||
| {"snappy_mode", "basic", "basic", "New setting to control the wire format used for snappy compression in generic file/URL I/O. The default `basic` preserves backward-compatible Hadoop snappy block format reads; HTTP `Content-Encoding: snappy` always uses the framing format independently of this setting."}, |
There was a problem hiding this comment.
Can we switch the default to framed?
There was a problem hiding this comment.
I would keep the default at basic for this PR. Before this change, generic .snappy file/URL paths meant the Hadoop snappy block format, and the current SettingsChangesHistory entry preserves that compatibility. Switching the default to framed would make unchanged reads of existing Hadoop-snappy data use the framed decoder unless it is also guarded by compatibility or a more explicit auto-detection/migration rule.
Previously `snappy_mode = 'basic'` could only read the Hadoop snappy block format; any snappy write under the default mode threw `Snappy write compression is only supported in framed mode`. That made the diagnostic impossible to satisfy on every write path that does not thread the `snappy_mode` setting (e.g. `file()` writes that infer the method from the path). Add `HadoopSnappyWriteBuffer`, which emits the Hadoop snappy block format (`[uncompressed-length BE32][compressed-length BE32][snappy data]`, one subblock per block, blocks bounded to 256 KiB so they fit `HadoopSnappyReadBuffer`'s buffers). Route the `basic` branch of `createWriteCompressedWrapper` to it instead of throwing, so both modes now round-trip for reads and writes: `basic` ↔ Hadoop block format, `framed` ↔ snappy framing format. Update the `snappy_mode` setting docs accordingly, and extend the `file()` and `INTO OUTFILE` tests to cover the basic-mode write round-trip (the previous tests asserted the now-removed rejection). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`iceberg_metadata_compression_method` accepts `snappy`, but the metadata write in `writeMessageToFile` called `wrapWriteBufferWithCompressionMethod` without passing `snappy_mode`, so the setting was discarded at this API boundary. Now that `basic` mode writes the Hadoop block format the write no longer fails, but the user-selected wire format was still ignored here. Pass `settings[Setting::snappy_mode]` so Iceberg metadata snappy writes honor the setting like every other user-selectable snappy write path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit threaded `snappy_mode` through the Iceberg metadata writer in `writeMessageToFile`, but the matching read path in `getMetadataJSONObject` calls `wrapReadBufferWithCompressionMethod` without a `SnappyMode` argument, so it always decodes `.snappy.metadata.json` as the Hadoop snappy block format (`SnappyMode::Basic`). A table created with `SETTINGS iceberg_metadata_compression_method = 'snappy', snappy_mode = 'framed'` would therefore write framed metadata and then fail to read its own metadata back, because only `CompressionMethod::Snappy` is persisted/inferred from the `.snappy` suffix and the wire format is not encoded anywhere. Pin both the writer and the reader to `SnappyMode::Basic` so Iceberg metadata snappy is always the Hadoop block format. This keeps the format deterministic for round-trip reads and interoperable with other Iceberg engines, which expect the Hadoop block format for `.snappy` data. The user-facing `snappy_mode` setting still controls the generic `file`/`url`/HTTP snappy paths. Add a focused regression test that creates an Iceberg table with snappy metadata and inserts with both `snappy_mode = 'basic'` and `snappy_mode = 'framed'`, verifying the metadata round-trips in either case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mode docs Per the review on `Settings.cpp`, write `file` and `url` rather than `file()` and `url()` in the `snappy_mode` setting description. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
The public CI is green (170/170, AI Review: ✅ Approve). The only red required check is CH Inc sync (status
I re-ran the failed jobs of the sync run (https://github.com/ClickHouse/clickhouse-private/actions/runs/28395141680). The transient ones (hung-check, Keeper connection-loss, SharedMergeTree load) should clear on retry. @groeneai, the |
|
Continued work on this PR — no code change required; it is now fully green and
One review thread is still open — the |
|
Continued work on this PR — re-merged Since the last update the PR had gone To guard against semantic breakage from the large GitHub state: The one remaining open thread is still the |
|
Continued work on this PR — no code change required. Since the 1. @groeneai, please investigate the 2. I have re-run the two failed jobs (run 28477329368). The only remaining open item is the maintainer decision on the |
Done. Root cause and fix in #109011 (separate PR, as requested). Root cause: The default of Fix: the pre-existing analysis is legitimate, so #109011 drops the over-strict assertion and keeps |
In the gRPC `ExecuteQuery` external-table path, the decompression buffer was wrapped with `snappy_mode` read from the outer `query_context` before the external table's own `external_table.settings()` were applied to `external_table_context`. As a result, a per-table `settings['snappy_mode'] = 'framed'` (alongside `compression_type = 'snappy'`) was ignored, and the body was decoded with the outer/default `basic` (Hadoop block) mode. Move the `wrapReadBufferWithCompressionMethod` call after the per-table settings are applied and read `snappy_mode` from the resulting `settings`, so the per-table wire format is honored. This matches the code's evident intent (the same `external_table_context`/`settings` are already used to build the input format). Reported by the AI review on the internal sync of #100752
|
Continued work on this PR — pushed one fix ( The public CI on the previous head ( However, the AI review on the internal sync PR ( 1. gRPC external-table Ideal follow-up coverage would be a 2. I did not change this because it is a genuine design tension rather than a one-line fix: the
The one open review thread ( |
|
The gRPC fix ( gRPC test: I can add the StorageURL sink The underlying issue is two layers sharing one knob: |
|
Continued work on this PR — re-merged
Verified the snappy code survived intact: the core wiring and new buffers ( The only red on the previous head was the known-flaky No review feedback addressed in code — the remaining items are maintainer decisions for you: the open |
|
Continued work on this PR — no code change required; head The fresh CI I mentioned above has finished and the public PR report is green: 172 checks — 152 success, 0 failures, 20 skipped, with One thing worth flagging since the re-merge: the AI Review verdict on this head now reads So the only open items remain the two maintainer decisions you flagged — the |
|
Continued work on this PR — re-merged Since the last re-merge ( Verified the snappy work survived intact across every file
All six The two open items remain your maintainer calls, unchanged: the sole unresolved review thread ( |
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 384/456 (84.21%) · Uncovered code |

The server already recognized
Accept-Encoding: snappyand set theContent-Encodingresponse header, but the write path threw "Unsupported compression method" when actually trying to compress the response body.Replace the throw in
createWriteCompressedWrapperwith creating aSnappyWriteBuffer. Add the necessaryWriteBuffer *constructor overload and change theunique_ptrconstructor to take by reference so it works with the forwarding template.Closes #44885
Reimplements #96442
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Allow snappy compression in the HTTP interface (
Accept-Encoding: snappy) and add thesnappy_modesetting to choose the Hadoop snappy block format or the snappy framing format for genericfile/urlsnappy I/O.Documentation entry for user-facing changes
Note
Medium Risk
Touches core I/O compression wrappers and HTTP request/response handling, so regressions could impact data import/export paths across multiple storage backends. New snappy framing parser/writer adds additional parsing and checksum logic that must be correct for untrusted HTTP input.
Overview
Enables working
Accept-Encoding: snappy/Content-Encoding: snappyin the HTTP interface by implementing the standardized snappy framing format for both response compression and request-body decompression.Introduces a new
snappy_modesetting (basicvsframed) and threads it through generic compression wrapping sofile()/url()/object storage and dictionary HTTP sources can choose between legacy Hadoop snappy blocks and framed streaming; snappy writes are now supported only inframedmode.Adds new
SnappyFramedReadBufferand rewritesSnappyWriteBufferto emit framed streams with CRC32C checks, plus build/link changes to always build/linkcrc32c(including new generic/big-endian configs) and tests covering HTTP snappy framing and mode-mismatch failures.Reviewed by Cursor Bugbot for commit 8d27928. Bugbot is set up for automated code reviews on this repo. Configure here.