Add pcodec (PCO) compression codec #106222
pcodec (`pco`) is a lossless codec specialized for fixed-width numeric sequences. This adds a native C++ reimplementation as a new ClickHouse compression codec `PCO`, registered under method byte `0x9d`. The implementation is bit-for-bit wire-format compatible with the reference pcodec v1.0.2 (format major 4 / minor 1, standalone version 3): ClickHouse can decode `.pco` streams produced elsewhere, and streams produced by this codec are read back byte-identically by the reference Rust decoder. This compatibility was used as a cross-validation oracle in both directions during development. The codec supports all fixed-width numeric ClickHouse types via their underlying integer/float width (1/2/4/8 bytes): `Int8`..`Int64`, `UInt8`..`UInt64`, `Float32`/`Float64`, and the types backed by them (`Date`, `DateTime`, `Decimal32/64`, `IPv4`, `Enum`, etc.). It implements the full decode path (Classic / IntMult / FloatMult / FloatQuant / Dict modes; None / Consecutive / Lookback delta encodings; interleaved 4-way tANS) and an auto-selecting encode path (Classic / IntMult / FloatQuant modes and consecutive delta, chosen by cheap sampling so the ratio never regresses below the Classic guarantee). The codec is gated behind `allow_experimental_codecs` for now. The library lives in a self-contained `src/Compression/Pcodec/` sub-directory; the codec entry point is `CompressionCodecPco`. The on-disk block stores a 2-byte header (element width + partial-tail byte count) followed by a raw partial-value tail (as in `Gorilla`/`FPC`) and a complete standalone `.pco` payload, which the encoder writes directly into ClickHouse's destination buffer. A stateless test (`04303_pco_codec`) round-trips every supported numeric type (including per-element verification, edge cases, and codec chaining such as `CODEC(Delta, PCO)`), and the documentation section for `PCO` is added under `docs/en/sql-reference/statements/create/table.md`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Conflicts: # src/Compression/CompressionFactory.cpp
The `src/Compression` style lint forbids raw `std::vector` in favor of a memory-tracked alternative. Route the codec's internal buffers through a `PcoArray<T>` facade over `DB::PODArray` (so allocations use ClickHouse's memory-tracking allocator). The facade restores the `std::vector` semantics the codec relies on that plain `PODArray` lacks (copyable; value-initializing `resize` and sized constructor), so it is a drop-in. The standalone cross-validation harnesses compile these headers without the ClickHouse runtime, so under `PCODEC_STANDALONE` the container falls back to `std::vector`; because the facade mirrors `std::vector` semantics exactly, the production (`PODArray`) and oracle (`std::vector`) builds behave identically, keeping the reference-`.pco` cross-check faithful to the production path. The two `std::unordered_map` usages in IntMult detection (cold path) keep `std::vector` semantics via the sanctioned `STYLE_CHECK_ALLOW_STD_CONTAINERS` marker. Also add the `database = currentDatabase()` condition to the `system.parts` queries in `04303_pco_codec.sql` (required by the test style check). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`system.codecs` now lists the `PCO` codec, so the existing `01222_system_codecs` test reference needs the new row (method byte 157, experimental) and the updated codec count (14 -> 15). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses the review findings on the new PCO (pcodec) codec:

- `LatentDecoder.h`: fix `Conv1` delta inversion to output the reconstructed batch `residuals[order..order+len)` instead of the carried-in delta state `residuals[0..len)`, which produced shifted/stale values for reference `.pco` streams using `Conv1`.
- `StandaloneDecoder.h`: reject `Conv1` for 64-bit latent types (the signed accumulator widening is lossy and unsupported), bound the `Lookback` window against the chunk size before `initPage` allocates scratch, and reject `FloatQuant` `quant_k` larger than the float type precision so the join shifts can never reach undefined behavior on a malformed stream.
- `BitReader.h`: bounds-check `readUint`, `readBool`, and `readAlignedBytes` against `unpadded_bit_size` so truncated/malformed metadata fails closed instead of being decoded from the padding area. The hot batch loops are bounded once per batch via `checkInBounds` plus `DECODE_BATCH_OVERSHOOT` of readable slack in the source buffer.
- `Metadata.h`: bound the `Dict` length against the remaining stream size before resizing, avoiding a large allocation driven by a tiny malformed stream.
- `CompressionCodecPco.cpp`: decode straight into `dest` only when the output pointer is aligned for the element type (otherwise use aligned scratch), since `ICompressionCodec::decompress` does not guarantee alignment; enlarge the padded decode buffer to `DECODE_BATCH_OVERSHOOT`.
- `StandaloneEncoder.h`: split blocks larger than `MAX_ENTRIES` values into multiple chunks instead of truncating the 24-bit per-chunk count and writing an unreadable stream; update the size bounds accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
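For readers following the `StandaloneEncoder.h` item, the chunk split can be sketched roughly as below. This is only an illustration, not the PR's code: `splitIntoChunks` is a hypothetical helper, and the chunk limit is passed in as a parameter rather than taken from the real `MAX_ENTRIES` constant.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not from the PR): cover [0, n) with (offset, count)
// pairs so that no single chunk holds more than max_entries values.
std::vector<std::pair<size_t, size_t>> splitIntoChunks(size_t n, size_t max_entries)
{
    if (max_entries == 0)
        throw std::invalid_argument("max_entries must be positive");

    std::vector<std::pair<size_t, size_t>> chunks;
    for (size_t offset = 0; offset < n; offset += max_entries)
        chunks.emplace_back(offset, std::min(max_entries, n - offset));
    return chunks;
}
```

Each pair would then be encoded as its own chunk, so the per-chunk count field never has to represent more values than it can hold.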
… test

After merging master, two follow-ups:

- `src/Common/TargetSpecific.h` dropped the combined `MULTITARGET_FUNCTION_X86_V4_V3` multi-target macro (it now provides only the V4 variant). Define the V4+V3+default expansion the pcodec hot decode loops rely on locally in `LatentDecoder.h`, in terms of the per-arch attribute macros the header still exposes, so the codec keeps its AVX2 (`x86-64-v3`) and AVX-512 (`x86-64-v4`) specializations.
- Renumber the stateless test `04303_pco_codec` to `04306_pco_codec` (master added several `04303_*`/`04304_*` tests) and reduce its row counts so it no longer trips the `Test runs too long` limit on the MSan flaky check, while still spanning many decode batches across every supported numeric type.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The commit adapting the codec to master's `TargetSpecific.h` referenced `X86_64_V3_FUNCTION_SPECIFIC_ATTRIBUTE`, but that header only exposes a per-function attribute macro for the V4 target — there is no V3 one. On x86-64 (`ENABLE_MULTITARGET_CODE`), the `_x86_64_v3` specialization of the hot decode loops therefore failed to declare, while `dispatchReadOffsets` still called it, breaking the build with `use of undeclared identifier 'pcoReadOffsetsImpl_x86_64_v3'` (Fast test). Define `X86_64_V3_FUNCTION_SPECIFIC_ATTRIBUTE` locally in the codec header, matching the `arch=x86-64-v3` target that the header's `BEGIN_X86_64_V3_SPECIFIC_CODE` block uses, guarded by `#ifndef` so it composes if the shared header gains the macro later. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Several correctness and fail-close hardening fixes raised in review:

- `updateHash` now folds in `data_bytes_size` and the pcodec number-type byte. `PCO` is type-dependent, but the hash previously covered only the `PCO(level)` AST. Compact parts group substreams by codec hash, so a `UInt32`, `Int64` and `Float64` column all using `CODEC(PCO)` could share one codec object and encode later streams with the wrong width/type. (`Int64` and `Float64` share a width but differ in type byte, so both are needed; this mirrors `Gorilla`'s `data_bytes_size` hashing.)
- `getMaxCompressedDataSize` is computed in 64-bit and fails closed when the bound exceeds `UInt32`. The old `UInt32` expression could wrap for wide 1-byte blocks and under-reserve the destination buffer.
- `decodeStandalone` validates that each chunk's number-type width matches the codec element width. The decompress path picks the aligned/scratch output from the outer width, so a malformed block with a wider inner type (outer width 4 wrapping an `F64`/`U64` chunk) could write 8-byte stores through a 4-byte-aligned pointer.
- Reject newer wrapped format minors (`4.x` with `x > 1`); we only decode through `4.1`, and a newer minor may change metadata semantics.
- Reject corrupt `Lookback` distance `0` (would read the slot being written) and `Classic`/other bins whose `lower + max-offset` overflows the latent type (would wrap in the offset add instead of reporting corrupt data).
- Docs: `PCO` supports `Decimal32`/`Decimal64` only (not `Decimal128`/`Decimal256`), and it is the embedded standalone payload, not the whole compressed block, that is wire-compatible with reference `.pco`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
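The `getMaxCompressedDataSize` item boils down to widening before adding and refusing a bound that no longer fits. A minimal sketch, assuming a hypothetical `boundedMaxCompressedSize` helper and a single `framing_overhead` term (the actual bound formula is not given in this thread):

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not from the PR). `framing_overhead` stands in for
// whatever header and per-chunk slack the real bound adds.
uint32_t boundedMaxCompressedSize(uint32_t uncompressed_size, uint32_t framing_overhead)
{
    // Widen to 64 bits first so the sum cannot wrap in 32 bits.
    const uint64_t bound = static_cast<uint64_t>(uncompressed_size) + framing_overhead;

    // Fail closed instead of returning a truncated (too small) reservation.
    if (bound > std::numeric_limits<uint32_t>::max())
        throw std::length_error("compressed size bound does not fit in UInt32");

    return static_cast<uint32_t>(bound);
}
```

With a plain 32-bit addition, `0xFFFFFFFF + 16` would wrap to `15` and the caller would under-reserve the destination buffer; here it throws instead.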
Addressed the AI Review Request changes verdict (and the corresponding
Valid streams are unaffected: the encoder always writes a supported Also merged latest All CI reds on the previous commit are unrelated flakes:
The higher-level design questions in @rienath's review (whether the niche ratio win justifies a new codec, and the C++-port maintenance burden vs. linking the Rust crate) are left for the author's decision; they are not blocking and not actionable here.
rienath left a comment
Friendly reminder: #106222 (review)
A codec-only `ALTER TABLE ... MODIFY COLUMN x CODEC(PCO)` (where the column type is not restated) leaves `command.data_type` null. `AlterCommands::validate` passed that null type straight into `validateCodecAndGetPreprocessedAST`, which then took the untyped-codec path and rejected `PCO` with "requires the column type to compress and can only be specified for a column", even though `AlterCommand::apply` later falls back to the existing `column.type`. As a result, a numeric column created without `CODEC(PCO)` could not get `PCO` added later via the normal codec-only `ALTER`, even with `allow_experimental_codecs = 1`.

Mirror the apply-time fallback in the validate path: use the existing column type when `command.data_type` is absent. The experimental gate and the fixed-width-numeric requirement are still enforced (both surface as `BAD_ARGUMENTS`).

Add `04489_pco_codec_alter_modify_column` covering the codec-only ALTER, the data round-trip after re-encoding, the codec chain, the experimental gate, and the unsupported-type rejection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pushed
New test The sole remaining CI red — No |
`estimateCompressionRatio('<codec>')(column)` builds the requested codec
through the typed `CompressionCodecFactory::get` path, which does not
enforce the `allow_experimental_codecs` gate. This let an experimental
codec such as `PCO` be exercised to compress real data with the setting
off, bypassing the gate applied to column codecs and the untyped
MergeTree compression settings.
Resolve `allow_experimental_codecs` from the query settings at aggregate
creation time and enforce it in `getCodecOrDefault`, right where the
codec is constructed (and immediately before it is used to compress).
`CompressionCodecMultiple::isExperimental` already surfaces an inner
experimental codec, so codec chains wrapping `PCO` (e.g. `PCO, ZSTD` or
`Delta, PCO`) are rejected too. The check fails closed when settings are
unavailable.
The gate is enforced at construction rather than at parse time on
purpose: an empty input never constructs the codec, so the existing
behavior of `estimateCompressionRatio` on an empty table (it returns 0
without resolving the codec) is preserved.
New test `04490_estimate_compression_ratio_pco_experimental_gate`:
`allow_experimental_codecs = 0` rejects `PCO` (and chains wrapping it)
with `BAD_ARGUMENTS`, while non-experimental codecs and the default
codec keep working; `allow_experimental_codecs = 1` allows `PCO` on
supported numeric types.
Addresses the AI Review "Request changes" verdict on
#106222
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
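The "enforce at construction, fail closed without settings" behavior described in this commit message can be pictured with a small sketch. Everything here is hypothetical: `CodecInfo` and `checkExperimentalGate` are stand-ins, not ClickHouse types, and the real check lives in `getCodecOrDefault`.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a constructed codec (or codec chain).
struct CodecInfo
{
    std::string name;
    bool is_experimental; // for a chain: true if any inner codec is experimental
};

// `allow_experimental` is std::nullopt when query settings are unavailable.
// In that case the gate treats the setting as off (fails closed).
void checkExperimentalGate(const CodecInfo & codec, std::optional<bool> allow_experimental)
{
    if (!codec.is_experimental)
        return;
    if (!allow_experimental.value_or(false))
        throw std::invalid_argument("codec " + codec.name + " is experimental and not allowed");
}
```

Because the check runs only where the codec is built, a call that never builds a codec (the empty-input case mentioned above) never reaches it.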
Pushed Fix: resolve New test The two remaining CI reds — No |
…oder The `IntMult` join in the PCO standalone decoder reconstructs a value as `primary * base + secondary`. For a sub-32-bit latent (`U16`/`I16`, whose latent type `L` is `uint16_t`) both operands are promoted to signed `int` by the usual arithmetic conversions, so a malformed stream with `primary` and `base` both large (e.g. `65535 * 65535`) overflows `int` before the cast back to `L` — undefined behavior that a sanitizer build traps on, rather than the defined modular reconstruction the format specifies. A well-formed `IntMult` stream always satisfies `primary * base + secondary` equal to the in-range value, so the product never exceeds the latent width; only a malformed stream (whose metadata lies) reaches the overflow. This is the same integer-promotion hazard already handled in `decodeConv1`, which computes its products in a wider unsigned accumulator. Do the join in an explicitly unsigned accumulator at least 32 bits wide, then cast back to `L`. Valid streams produce identical results (their product fits); only malformed input differs, now wrapping modularly instead of invoking UB. Adds a regression that decodes a hand-built malformed `IntMult` stream with `primary = base = 65535`; under a sanitizer build the previous code trapped on the signed overflow. Addresses the AI Review Block verdict on commit 8a758cf. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
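To make the promotion hazard concrete, here is a minimal sketch of a join done in an unsigned accumulator at least 32 bits wide. `joinIntMultWrapping` is a hypothetical name, not the function in the PR, and the sketch assumes unsigned latent types.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical sketch (not the PR's code): compute primary * base + secondary
// without ever multiplying in signed int.
template <typename L>
L joinIntMultWrapping(L primary, L base, L secondary)
{
    static_assert(std::is_unsigned_v<L>, "this sketch assumes unsigned latent types");

    // For 8/16-bit latents use uint32_t; otherwise the latent type itself.
    // Unsigned arithmetic at this width wraps modularly instead of overflowing.
    using Acc = std::conditional_t<(sizeof(L) < sizeof(uint32_t)), uint32_t, L>;

    const Acc acc = static_cast<Acc>(primary) * static_cast<Acc>(base) + static_cast<Acc>(secondary);
    return static_cast<L>(acc); // truncation to the latent width is well defined
}
```

For `uint16_t`, `joinIntMultWrapping<uint16_t>(65535, 65535, 0)` computes `4294836225` in `uint32_t` and truncates to `1`, whereas the same product on `uint16_t` operands would be evaluated in signed `int` and overflow.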
`decodeStandalone` stops when it reads the `MAGIC_TERMINATION_BYTE` and returns only the number of output bytes. The `CompressionCodecPco` block wrapper then only checks that the produced byte count equals the expected size, so a block body such as `[W][B][raw][valid .pco][garbage]` was accepted as a valid `0x9d` frame: the trailing bytes after the terminator were silently ignored. For externally supplied compressed frames that weakens the fail-closed contract and leaves the block body non-canonical, even though the spec says the rest of the block is the standalone `.pco` stream. `readAlignedBytes` leaves the reader byte-aligned, so `byteIdx()` is the exact number of consumed bytes; a canonical stream (the reference encoder writes the terminator last) consumes exactly `src_len`. Reject any stream that consumes fewer bytes than that, so trailing bytes fail closed. Adds a malformed-block regression that appends bytes after a valid `PCO` body and expects decompression to fail. Addresses the AI Review Block verdict on commit 8a758cf. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
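The canonical-frame check described here amounts to comparing two counters after decoding. A rough sketch, with a hypothetical `DecodeResult` struct and `checkCanonicalFrame` helper (the real decoder reports consumption via `byteIdx()`):

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical summary of one standalone decode (not a type from the PR).
struct DecodeResult
{
    size_t bytes_written;  // output bytes produced
    size_t bytes_consumed; // input bytes read, including the terminator
};

// Accept the frame only if it produced the expected output AND used up the
// whole source span, so bytes after the terminator are rejected.
void checkCanonicalFrame(const DecodeResult & result, size_t expected_output, size_t src_len)
{
    if (result.bytes_written != expected_output)
        throw std::runtime_error("decoded size does not match the expected size");
    if (result.bytes_consumed != src_len)
        throw std::runtime_error("trailing bytes after the stream terminator");
}
```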
The new `PCO` default-codec restrictions missed the server-wide `<compression>` selector. With no table-level `default_compression_codec` or `RECOMPRESS` TTL codec, `MergeTreeData::getCompressionCodecForPart` falls back to `Context::chooseCompressionCodec`, and `CompressionCodecSelector::choose` constructs the configured method with `factory.get(family_name, level)` — a path that receives no column type and performs no `allow_experimental_codecs` check and no `requiresColumnTypeToCompress` rejection. The part writer later re-resolves the stored `CODEC(PCO)` description with each column type, so numeric columns could be written with the experimental `PCO` codec from server config, even though the SQL/untyped codec settings (`default_compression_codec`, `marks_compression_codec`, `primary_key_compression_codec`) reject the same codec at validation time. Validate each `<compression>` case at config load: reject a codec that requires the column type to compress, or is experimental, mirroring the rejection the untyped codec settings already apply. This never rejects an ordinary codec (`lz4`, `zstd`, ...); it fails closed at load rather than at the first part write. Adds a gtest covering `<method>pco</method>` (type-dependent), `<method>alp</method>` (experimental), and ordinary codecs. Addresses the AI Review Block verdict on commit 8a758cf. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addressed the AI Review Block verdict on
Focused regressions added: No |
…on ATTACH
The untyped MergeTree compression settings (`default_compression_codec`,
`marks_compression_codec`, `primary_key_compression_codec`) are validated
against the experimental / column-type-requiring codec gate only inside
`MergeTreeSettingsImpl::sanityCheck`, which `MergeTreeData` runs for
`LoadingStrictnessLevel::CREATE` only. A user could therefore run
ATTACH TABLE t ... ENGINE = MergeTree ... SETTINGS default_compression_codec = 'PCO'
with `allow_experimental_codecs = 0`: the setting was loaded, `sanityCheck`
was skipped, and `getCompressionCodecForPart` later resolved the string, so
writes used the typed experimental `PCO` codec with the gate off.
Extract the codec-settings check into `checkCompressionCodecSettings` and run
it for `mode <= LoadingStrictnessLevel::ATTACH` (user `ATTACH`, and
`SECONDARY_CREATE` for RESTORE / DatabaseReplicated internal queries), while
`FORCE_ATTACH` (server startup) and `FORCE_RESTORE` keep skipping it so
already-loaded tables are never rejected. Add ATTACH regression cases to
`04337_pco_codec_untyped_settings`.
Addresses the AI Review "Request changes" blocker on
#106222
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addressed the AI Review Request changes verdict on Blocker ( Test: added. Major ( |
… collides in the flaky check) The full-form `ATTACH TABLE ... UUID '<literal>' ... SETTINGS default_compression_codec = 'PCO'` cases use a fixed table UUID (required for the full-form `ATTACH` syntax under the Atomic database engine; cf. 04046, 04159). A table UUID is server-global, so when the flaky check runs the same test in several parallel workers, two concurrent `ATTACH`es with the same UUID race: one registers the UUID mapping and the other fails with `TABLE_ALREADY_EXISTS` before reaching the codec-setting gate, instead of the expected `BAD_ARGUMENTS`. The single-query failing path already releases the reserved UUID (via `TemporaryLockForUUIDDirectory`), so a sequential run is unaffected — only concurrent runs of this test collide. Mark the test `no-parallel`, matching the sibling test `04159_unique_key_ttl_rejected`, which uses a fixed `ATTACH` UUID for the same reason. Flaky-check failures (amd_tsan/amd_debug/amd_asan_ubsan): https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106222&sha=a7d83200a42f3edd541b595943980370165fb9bf&name_0=PR&name_1=Stateless%20tests%20%28amd_tsan%2C%20flaky%20check%29 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pushed The only real, PR-caused CI failure was that test failing the flaky check ( The remaining red — |
`getCodec` for `temporary_files_codec` (used for external sort/aggregation spill files) only rejected codecs that `requiresColumnTypeToCompress` (e.g. the experimental `PCO`). An experimental codec that does not require a column type, such as `ALP`, still passed through and was used for temporary files even with `allow_experimental_codecs = 0`. `temporary_files_codec` is an untyped compression setting, so it bypasses the per-column `allow_experimental_codecs` validation. Reject experimental codecs here as well, matching the gate already enforced for the `<compression>` config selector and the untyped MergeTree compression settings (`marks_compression_codec`, ...); experimental codecs can only be specified per column. Extends `04337_pco_codec_untyped_settings` with an `ALP` case (single codec and inside a chain). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pushed
The other finding — |
The standalone `.pco` decoder reconstructed `IntMult` and `FloatQuant` values
without validating the mode-specific decomposition invariants. Because
`CompressedReadBufferBase` dispatches external compressed frames by method byte
alone, a checksummed `0x9d` (`PCO`) frame reaches this decoder without
`allow_experimental_codecs`, so a malformed stream could silently decode to
fabricated values instead of raising a decompression exception.
Validate the decomposition in the join, matching the encoder in
`Modes.h::splitForMode`, and throw on violation:
- `IntMult`: `base != 0`, `secondary < base`, and `primary * base + secondary`
fits the latent width (checked with the overflow builtins so the check
itself stays defined for sub-word latents).
- `FloatQuant`: `primary` occupies at most `latentBits - k` bits and the
remainder `m` fits in `k` bits.
These checks never reject a valid stream: they are the exact inverse of the
encoder's split. Extend `gtest_pcodec_encode_bounds` and
`gtest_pcodec_malformed_block` (the latter reaching the guard through
`codec->decompress`, i.e. the shared compressed-frame reader) with malformed
`IntMult`/`FloatQuant` regressions, plus well-formed baselines that still
round-trip.
Addresses the AI Review finding on `StandaloneDecoder.h`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
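A sketch of what such join-time validation could look like, assuming unsigned latents and the GCC/Clang overflow builtins. The names `joinIntMultChecked`, `fitsInBits`, and `checkFloatQuantParts` are hypothetical, and the `k <= latent_bits` guard is an added assumption so the sketch stays well defined.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// IntMult: base != 0, secondary < base, and the joined value fits in L.
template <typename L>
L joinIntMultChecked(L primary, L base, L secondary)
{
    static_assert(std::is_unsigned_v<L>, "this sketch assumes unsigned latents");
    if (base == 0)
        throw std::runtime_error("IntMult: base is zero");
    if (secondary >= base)
        throw std::runtime_error("IntMult: secondary is not below base");

    // The builtins check the exact result against the width of L, so the
    // check is defined even for 8/16-bit latents.
    L product = 0;
    L value = 0;
    if (__builtin_mul_overflow(primary, base, &product) || __builtin_add_overflow(product, secondary, &value))
        throw std::runtime_error("IntMult: value does not fit the latent width");
    return value;
}

// True if v needs no more than `bits` bits; avoids shifting by the full width.
inline bool fitsInBits(uint64_t v, unsigned bits)
{
    return bits >= 64 || (v >> bits) == 0;
}

// FloatQuant: primary fits in latent_bits - k bits and m fits in k bits.
template <typename L>
void checkFloatQuantParts(L primary, L m, unsigned k)
{
    static_assert(std::is_unsigned_v<L>, "this sketch assumes unsigned latents");
    constexpr unsigned latent_bits = sizeof(L) * 8;
    if (k > latent_bits) // assumption of this sketch
        throw std::runtime_error("FloatQuant: k exceeds the latent width");
    if (!fitsInBits(primary, latent_bits - k))
        throw std::runtime_error("FloatQuant: primary is too wide");
    if (!fitsInBits(m, k))
        throw std::runtime_error("FloatQuant: remainder does not fit in k bits");
}
```

A well-formed split such as `primary = 6, base = 100, secondary = 7` joins to `607`, while `primary = base = 65535` on a 16-bit latent is rejected rather than wrapped.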
Pushed The bot's follow-up was correct that this is a current fail-closed boundary, not only a future raw- The join now validates the decomposition against the encoder's split (
These are the exact inverse of the encoder's split, so they never reject a valid stream. New regressions: The higher-level review points from @rienath (whether to add the codec at all, and the maintainability of the hand-ported C++ rewrite vs. linking the Rust crate) remain open for your decision. |
The `Build (arm_tidy)` check failed compiling the two gtest files added in the previous commit (`gtest_pcodec_malformed_block.cpp` and `gtest_pcodec_encode_bounds.cpp`): `clang-tidy`'s `bugprone-argument-comment`, run with `-warnings-as-errors`, rejected inline argument comments whose names did not match the callee's parameter names — `/*base=*/`/`/*primary=*/` on `buildIntMultStandaloneU32` (params `base_latent`/`primary_latent`) and `/*primary=*/`/`/*m=*/` on `buildFloatQuantStream` (params `primary_latent`/`secondary_latent`). `bugprone-argument-comment` is a `clang-tidy`-only check, not a compiler warning, so it is not caught by a plain `-Weverything -Werror` compile; it only runs in the tidy build. Rename the argument comments to match the parameter names exactly. This is a comment-only change with no behavioral effect. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106222&sha=a86a41f89c22f922a954f6049531af2a4ee3473d&name_0=PR&name_1=Build%20%28arm_tidy%29 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
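For anyone unfamiliar with the check, the mismatch looks roughly like this. `joinLatents` is a hypothetical stand-in for the test helpers named above; only the relationship between the comment text and the parameter names matters.

```cpp
#include <cstdint>

// Hypothetical helper; parameter names mirror the style of the real ones.
uint32_t joinLatents(uint32_t base_latent, uint32_t primary_latent)
{
    return primary_latent * base_latent;
}

uint32_t argumentCommentExample()
{
    // Flagged by bugprone-argument-comment, because `base` and `primary`
    // are not the callee's parameter names:
    //   joinLatents(/*base=*/10, /*primary=*/3);

    // Accepted: the comments match the parameter names exactly.
    return joinLatents(/*base_latent=*/10, /*primary_latent=*/3);
}
```

Both spellings compile identically, which is why only the tidy build notices the difference.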
Pushed The two gtest files added in
This is why the local |
    return std::make_shared<CompressionCodecPco>(data_bytes_size, pco_type_byte, level);
};
factory.registerCompressionCodecWithType("PCO", method_code, codec_builder);
Registering `PCO` here still leaves one user-reachable path outside the experimental gate. `CompressedReadBufferBase::readHeaderAndGetCodecAndSize` accepts external framed input by method byte alone, and both HTTP `decompress=1` and compressed TCP queries go through that path with `external_data = true`; there is no `allow_experimental_codecs` check there.
Because the `PCO` block wrapper is self-describing (`W` plus the inner standalone type byte) and `W = 1` / `U8` can represent arbitrary bytes, a client can already use `0x9d` as request-body transport compression with `allow_experimental_codecs = 0`. That breaks the rollout contract behind `isExperimental()` and exposes the new decoder on production servers by default.
Please reject experimental codecs in the external-data `CompressedReadBuffer` path (or otherwise keep `0x9d` out of externally accepted framed codecs), so the gate actually covers all user-reachable entry points.
|
The On the new AI Review blocker (
Scope note: |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered: 2005/2181 (91.93%) · Uncovered code |

Adds pcodec (`pco`) as a new ClickHouse compression codec `PCO`. pcodec is a lossless codec specialized for fixed-width numeric sequences (it typically beats `Gorilla`/`FPC`/`ZSTD` on ratio for numeric columns at comparable speed).

Rather than linking the Rust crate, this is a native C++ reimplementation, which fits ClickHouse's all-C++ codec tradition (`Gorilla`/`FPC`/`ALP`/`T64`) and allows future ClickHouse-style runtime CPU dispatch of the hot loops.

The codec is bit-for-bit wire-format compatible with the reference pcodec v1.0.2 (format major 4 / minor 1, standalone version 3). This gives `.pco` interop and a free cross-validation oracle: streams produced by this codec are decoded byte-identically by the reference Rust implementation, and vice versa. This was used in both directions during development.

Supported types are all fixed-width numerics via their underlying integer/float width: `Int8`..`Int64`, `UInt8`..`UInt64`, `Float32`/`Float64`, and the types backed by them (`Date`, `DateTime`, `Decimal32`/`Decimal64`, `IPv4`, `Enum`, ...). The full decode path is implemented (`Classic`/`IntMult`/`FloatMult`/`FloatQuant`/`Dict` modes; `None`/`Consecutive`/`Lookback`/`Conv1` delta; interleaved 4-way tANS). The encode path auto-selects `Classic`/`IntMult`/`FloatQuant` modes and a consecutive delta order via cheap sampling, only switching away from `Classic` when it reduces the estimated size. The encoder additionally guarantees no expansion: a chunk whose chosen configuration would exceed the raw data size (plus small framing) is re-encoded with a trivial single-bin configuration, so `getMaxCompressedDataSize` (the per-block buffer `CompressedWriteBuffer` reserves) is tight.

The codec is gated behind `allow_experimental_codecs` for the first release. Because it needs the column type, it can only be specified per column: the untyped MergeTree compression settings (`marks_compression_codec`, `primary_key_compression_codec`, `default_compression_codec`) reject it (and experimental codecs in general) at validation time.

A stateless test (`04306_pco_codec`) round-trips every supported numeric type (per-element verification, edge cases, and codec chaining such as `CODEC(Delta, PCO)`). A gtest (`gtest_pcodec_fixtures.cpp`) additionally decodes reference-generated `.pco` fixtures for the decoder-only modes (`FloatMult`, `Dict`, `Lookback`, `Conv1`), which the encoder round-trip cannot reach.

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added a new experimental compression codec `PCO` (a native port of pcodec), specialized for fixed-width numeric columns. It is wire-format compatible with pcodec `.pco` streams and is enabled with `allow_experimental_codecs`.

Documentation entry for user-facing changes:

`PCO` codec is documented in `docs/en/sql-reference/statements/create/table.md`).