Validate the wire address in ClientInfo::read to avoid a SIGILL by groeneai · Pull Request #108410 · ClickHouse/ClickHouse · GitHub
Skip to content

Validate the wire address in ClientInfo::read to avoid a SIGILL#108410

Merged
alexey-milovidov merged 5 commits into
ClickHouse:masterfrom
groeneai:fix-clientinfo-getservbyname-sigill
Jun 26, 2026
Merged

Validate the wire address in ClientInfo::read to avoid a SIGILL#108410
alexey-milovidov merged 5 commits into
ClickHouse:masterfrom
groeneai:fix-clientinfo-getservbyname-sigill

Conversation

@groeneai

@groeneai groeneai commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Related: #106189

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a server crash (Received signal 4, illegal instruction) that could be triggered by a malformed or desynchronized native TCP protocol stream. The initial_address field of ClientInfo is now validated to be a numeric host:port before it is parsed, instead of letting a non-numeric port reach the trapped getservbyname libc function.

Description

Signal 4 (SIGILL) originates in ClientInfo::read (src/Interpreters/ClientInfo.cpp), which deserializes initial_address from the native TCP wire and constructs Poco::Net::SocketAddress directly from that string. When the port part is non-numeric (or above 65535), Poco resolves it through getservbyname, a non-reentrant libc function that base/harmful replaces with __builtin_trap() in debug and sanitizer builds. That is an uncatchable hard crash, not a recoverable exception.

ClientInfo::write always serializes the address as a numeric host:port via SocketAddress::toString(), so a well-formed stream never reaches the getservbyname branch. A corrupted or desynchronized stream can feed an arbitrary string here (for example TCPHandler::processUnexpectedQuery -> skipData when a Data packet is mis-parsed under stress with query fuzzing and fault injection), turning untrusted wire input into a server crash.

The fix validates that the port substring is numeric and in range before handing the string to Poco, locating the port exactly as Poco::Net::SocketAddress does (UNIX path, [ipv6]:port, and host:port first-colon split). Malformed input now raises a catchable INCORRECT_DATA exception and never calls getservbyname; legitimate numeric addresses, including IPv6, still parse. A unit-test round-trip confirms valid addresses are preserved.

This is pre-existing on master and not specific to any PR. It was surfaced by the Stress test (amd_asan_ubsan) job, STID 1781-3414, and observed on more than one unrelated PR.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=ef34fc53dcda266cca006edea8332247405ff46c&name_0=PR&name_1=Stress%20test%20%28amd_asan_ubsan%29

Version info

  • Merged into: 26.7.1.136

ClientInfo::read deserializes initial_address from the native TCP protocol
and constructs Poco::Net::SocketAddress directly from that string. When the
port part is non-numeric (or out of the 0..65535 range), Poco resolves it
via getservbyname(), a non-reentrant libc function that base/harmful replaces
with __builtin_trap() in debug and sanitizer builds. The result is a hard,
uncatchable SIGILL (Received signal 4) rather than a recoverable error.

ClientInfo::write always serializes the address as a numeric "host:port" via
SocketAddress::toString(), so valid streams never hit the getservbyname()
branch. A corrupted or desynchronized native-protocol stream can, however,
feed an arbitrary string here (for example TCPHandler::processUnexpectedQuery
-> skipData when a Data packet is mis-parsed under stress with random query
fuzzing and fault injection), turning untrusted input into a server crash.

Validate that the port substring is numeric and within range before handing
the string to Poco, locating the port exactly as Poco::Net::SocketAddress
does (UNIX path, "[ipv6]:port", and "host:port" first-colon split). Malformed
input now raises a catchable INCORRECT_DATA exception and never reaches
getservbyname(). Legitimate numeric addresses still parse, including IPv6.

Surfaced by the Stress test (amd_asan_ubsan) on an unrelated PR
(STID 1781-3414); pre-existing on master and not specific to any PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @CheSema @vitlibar could you review? This hardens native-protocol deserialization in ClientInfo::read: a wire address with a non-numeric port reached Poco's getservbyname, which base/harmful traps to an uncatchable SIGILL in debug/sanitizer builds. The port is now validated as numeric before constructing the SocketAddress, so a desynced or corrupted stream raises a catchable INCORRECT_DATA instead of crashing the server.

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jun 24, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [5f004f1]

Summary:


AI Review

Summary

This PR hardens ClientInfo::read for native-protocol initial_address parsing by avoiding Poco resolver paths, rejecting malformed retained SECONDARY_QUERY addresses, and preserving INITIAL_QUERY compatibility by accepting non-ip:port values without resolving them before Session::makeQueryContextImpl overwrites the address. I found no remaining correctness, compatibility, or spec issue in the current code; the previous inline findings are addressed and resolved.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 24, 2026
Comment thread src/Interpreters/ClientInfo.cpp Outdated
The first version of the fix only validated the port, so a non-IP host
plus a numeric port (e.g. "host:9000", or ":9000") still passed and was
handed to Poco::Net::SocketAddress(String). That constructor calls
DNS::hostByName() for a non-IP host (and gethostbyname() on the
non-getaddrinfo branch), reaching the same base/harmful-trapped resolver
family the port check was meant to avoid.

ClientInfo::write only ever emits an IP literal plus a numeric port, so
parse the host substring here too (unbracketing [ipv6]:port), require
Poco::Net::IPAddress::tryParse, and construct SocketAddress(ip, port)
directly. Malformed values like "host:9000" and ":9000" are now rejected
as INCORRECT_DATA and never reach any resolver. Extends the round-trip
unit test and adds a non-IP-host rejection test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Follow-up commit 22be05ecc2e addresses the host-resolver gap from the review.

Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. Unit test ClientInfoRead.NonIpHostThrows feeds host:9000, :9000, example.com:80, localhost:9000, [notipv6]:9000 through ClientInfo::read; each is rejected on demand.
b Root cause explained? SocketAddress(String) -> init(hostAndPort) -> init(host, port): when IPAddress::tryParse(host) is false it calls DNS::hostByName(host) (and gethostbyname on the non-getaddrinfo branch), the base/harmful-trapped resolver family. Our port-only check let a non-IP host reach it.
c Fix matches root cause? Yes. We validate the host with IPAddress::tryParse and build SocketAddress(ip, port_number) directly via init(IPAddress, port) -> newIPv4/newIPv6, which never resolves.
d Test intent preserved / new tests added? Yes. Added NonIpHostThrows; extended ValidNumericAddressRoundTrips with 0.0.0.0:65535 and [2001:db8::1]:443. Existing port tests unchanged.
e Both directions demonstrated? Yes. Pre-fix binary: NonIpHostThrows FAILS with a thrown "DNS error" (resolver reached). Fixed binary: all 5 ClientInfoRead.* pass (3x).
f Fix is general across code paths? Yes. ClientInfo::read is the only untrusted-wire-string -> SocketAddress(String) call site; both resolver entry points (getservbyname port, hostByName host) are now closed.
g Fix generalizes across inputs? Yes. Covers IPv4, bracketed IPv6 (unbracketed for parse), empty host, non-IP host, and the port boundary (0, 65535, >65535). tryParse rejects empty/non-IP; init(ip,port) handles both families.
h Backward compatible? Yes. No setting/format/wire change. Only tightens validation of corrupted input that previously crashed (SIGILL) or errored (DNS); legitimate IP-literal+port values still round-trip.
i Invariants and contracts preserved? Yes. SocketAddress(IPAddress, UInt16) is the documented direct constructor; port range enforced before the UInt16 cast; host token split exactly as Poco's init(hostAndPort) does.

Session id: cron:clickhouse-worker-slot-2:20260624-220900

Comment thread src/Interpreters/ClientInfo.cpp
…mmar

The new gtest fed ReadBufferFromString a temporary String returned by
makeClientInfoPrefix(). ReadBufferFromString is non-owning (it derives from
ReadBufferFromMemory and keeps only a pointer to the source bytes), so the
temporary was destroyed at the end of the statement and ClientInfo::read then
read freed memory via readNBytes. Non-sanitizer builds happened to pass, but
asan/tsan reported a heap-use-after-free and msan a use-of-uninitialized-value.
Use ReadBufferFromOwnString, which owns its backing string, for the cases that
pass a temporary. The production ClientInfo::read path was never affected: its
string_views point into the named local initial_address_string.

Also update the initial_address row in NativeProtocol.md: the SIGILL fix
narrowed the accepted wire value to an IP literal plus a numeric port (or a
UNIX socket path), with hostnames and service names no longer resolved, so the
spec must document the accepted grammar instead of the generic host:port form.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Follow-up fix (f62f94b): sanitizer heap-use-after-free in the new gtest + docs grammar

The new ClientInfoRead gtest failed deterministically under asan/tsan (heap-use-after-free at ReadHelpers.h:113 in readNBytes) and msan (use-of-uninitialized-value at ClientInfo.cpp:287), while the non-sanitizer build passed.

Root cause: the malformed-address cases bound a ReadBufferFromString to the temporary String returned by makeClientInfoPrefix(). ReadBufferFromString is non-owning (it derives from ReadBufferFromMemory and keeps only a pointer to the source bytes), so the temporary was destroyed at the end of the statement and ClientInfo::read then read freed memory. Non-sanitizer builds happened to read the not-yet-reused bytes and passed; the sanitizers correctly flagged it. The production ClientInfo::read path was never affected: its string_views point into the named local initial_address_string.

Fix: use ReadBufferFromOwnString (which owns its backing string) for the cases that pass a temporary. Also updated the initial_address row in NativeProtocol.md per the docs review.

Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. unit_tests_dbms --gtest_filter=ClientInfoRead.MalformedAddressNonNumericPortThrows under asan reports a heap-use-after-free 100% of the time on the pre-fix code.
b Root cause explained? Yes. ReadBufferFromString stores only a pointer to its argument's bytes; the test bound it to a temporary String from makeClientInfoPrefix() which is destroyed at the ;. ClientInfo::read then reads the freed buffer via readNBytes -> HUAF (asan/tsan), uninit (msan).
c Fix matches root cause? Yes. ReadBufferFromOwnString inherits String and binds the buffer to its own owned copy, so the backing bytes outlive the read. No symptom-masking.
d Test intent preserved / new tests added? Yes. The tests still assert the same INCORRECT_DATA behavior on malformed addresses; only the buffer ownership changed.
e Both directions demonstrated? Yes. On the same asan build: pre-fix code -> AddressSanitizer heap-use-after-free; fixed code -> all 5 ClientInfoRead.* pass clean (also verified under debug).
f Fix is general across code paths? Yes. All four malformed-address cases that pass a temporary use ReadBufferFromOwnString. The round-trip case binds to a named local buf that outlives the reader, so it is already safe and is left unchanged.
g Fix generalizes across inputs? N/A (test-lifetime fix, not an input-dependent code bug).
h Backward compatible? N/A (test-only + docs; no behavior, format, or setting change).
i Invariants and contracts preserved? Yes. Upholds the ReadBufferFromMemory contract that the backing memory must outlive the buffer. Production ClientInfo::read is unchanged.

Session id: cron:clickhouse-worker-slot-0:20260624-230800

Comment thread src/Interpreters/ClientInfo.cpp Outdated
A leading-'/' wire value made Poco build a UNIX_LOCAL SocketAddress, which
escaped the new IP-endpoint validation in makeSocketAddressFromWire. But
LocalSocketAddressImpl::host()/port() throw InvalidAccessException, and every
consumer of initial_address calls them unconditionally
(QueryLogElement::appendClientInfo at QueryLog.cpp, Session::authenticate in
interserver mode). So a corrupted/desynced stream carrying initial_address
like "/tmp/ch.sock" would pass ClientInfo::read and then throw downstream,
violating this PR's contract that non-generated wire values are rejected at
read with INCORRECT_DATA.

ClientInfo::write only ever serializes an IP literal plus a numeric port
(SocketAddress::toString of an IP endpoint), so the UNIX-local form is never
legitimate on the wire. Reject host_and_port.front() == '/' with INCORRECT_DATA
and drop the UNIX socket path branch from the NativeProtocol.md grammar.

Add a gtest case proving "/tmp/ch.sock", "/" and a longer socket path are
rejected with INCORRECT_DATA instead of parsed into a UNIX_LOCAL address.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Fixup 01a0a4e: reject UNIX-local addresses in ClientInfo::read

Addresses the review comment that a UNIX_LOCAL initial_address escapes validation. makeSocketAddressFromWire now rejects a leading-/ value with INCORRECT_DATA (the leading / is Poco's sole UNIX_LOCAL trigger), the / branch is removed from the NativeProtocol.md grammar, and a UnixLocalPathThrows gtest case is added.

Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. ClientInfoRead.UnixLocalPathThrows (/tmp/ch.sock, /, a longer socket path) fails on demand without the guard.
b Root cause explained? A leading-/ wire value makes Poco's SocketAddress::init call newLocal() -> UNIX_LOCAL address. LocalSocketAddressImpl::host()/port() throw InvalidAccessException, but QueryLogElement::appendClientInfo (QueryLog.cpp) and Session::authenticate call host()/port() unconditionally, so the value escaped read-time validation only to throw downstream, violating the PR's reject-at-read contract.
c Fix matches root cause? Yes. Rejects host_and_port.front() == '/' with INCORRECT_DATA at read, the exact and only UNIX_LOCAL trigger in Poco.
d Test intent preserved / new tests added? New UnixLocalPathThrows test; the existing 5 ClientInfoRead.* tests are unchanged and still pass.
e Both directions demonstrated? Yes. Rebuilt with the guard reverted to accept-UNIX-local: UnixLocalPathThrows FAILS. With the guard: PASSES.
f Fix is general across code paths? Yes. makeSocketAddressFromWire is the single wire-address entry point; the three resolver/escape gaps (non-numeric port, non-IP host, UNIX-local) are now all closed there.
g Fix generalizes across inputs? Yes. The leading / is the sole UNIX_LOCAL trigger in Poco::Net::SocketAddress::init; tested with /tmp/ch.sock, bare /, and a long path.
h Backward compatible? Yes. ClientInfo::write only ever emits an IP literal plus a numeric port, and QueryLog.cpp already dereferences ->host() unconditionally, so the codebase already required an IP endpoint. No setting/format change.
i Invariants and contracts preserved? Yes. Upholds the consumer contract that initial_address is always an IP socket address (its host()/port() are callable).

Validation: all 6 ClientInfoRead.* tests pass under debug and asan/ubsan (the asan/ubsan build is the one CI flagged for the earlier dangling-buffer HUAF, which stays green here). Build IDs verified to change across edits.

Session id: cron:clickhouse-worker-slot-0:20260625-000806

Comment thread src/Interpreters/ClientInfo.cpp
…o::read

The wire-address validation added in this PR rejected any non-"ip:port"
initial_address with INCORRECT_DATA for every query kind. That moved the
native-protocol compatibility boundary: a third-party client that sent a
generic host:port (e.g. localhost:9000) for an INITIAL_QUERY was rejected,
even though the server overwrites initial_address with the real peer address
in Session::makeQueryContextImpl, so the wire value is never consumed for
initial queries.

Only a SECONDARY_QUERY keeps and uses the wire initial_address (system.query_log,
StorageSystemProcesses, interserver authenticate). Make validation strict only
for that case: makeSocketAddressFromWire becomes the non-throwing,
non-resolving tryParseIpEndpointFromWire, and read() rejects an unparseable
address only for SECONDARY_QUERY, otherwise falls back to the default endpoint.
The SIGILL guard is unchanged: the parser never calls Poco's resolver, so no
query kind can reach the trapped getservbyname()/DNS family.

Extend the gtest with an INITIAL_QUERY lenient case and a SECONDARY_QUERY valid
round-trip, and reconcile the initial_address row in NativeProtocol.md.
@groeneai

Copy link
Copy Markdown
Contributor Author

Addressed the backward-compatibility concern (commit 5f004f1): ClientInfo::read now validates the wire initial_address strictly only for SECONDARY_QUERY (the only query kind whose value is kept and consumed), and accepts it leniently for INITIAL_QUERY (where Session::makeQueryContextImpl overwrites it with the real peer address). The SIGILL guard is unchanged for every path.

Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. gtest ClientInfoRead.InitialQueryAcceptsNonIpAddressLeniently feeds an INITIAL_QUERY ClientInfo whose initial_address is localhost:9000 / host:http / :9000 / /tmp/ch.sock / garbage; pre-fix it throws INCORRECT_DATA.
b Root cause explained? The PR's validation rejected any non-ip:port initial_address for every query kind. But for an INITIAL_QUERY the server discards the wire value: Session::makeQueryContextImpl calls setInitialAddress(*current_address) (Session.cpp:724-728), overwriting it with the real peer address. So rejecting localhost:9000 from an initiating client moved the native-protocol compatibility boundary for a value that is never consumed.
c Fix matches root cause? Yes. makeSocketAddressFromWire becomes the non-throwing, non-resolving tryParseIpEndpointFromWire (returns optional). read() rejects an unparseable address with INCORRECT_DATA only when query_kind == SECONDARY_QUERY; otherwise it falls back to the default endpoint. Targets exactly the per-query-kind consumption asymmetry from (b).
d Test intent preserved / new tests added? Yes. All reject tests (non-numeric port, missing port, out-of-range, non-IP host, UNIX-local) now use SECONDARY_QUERY and still assert INCORRECT_DATA (the consumed path, where the SIGILL guard matters). Added InitialQueryAcceptsNonIpAddressLeniently and ValidNumericAddressRoundTripsSecondaryQuery; kept the round-trip and the ReadBufferFromOwnString HUAF fix.
e Both directions demonstrated? Yes. Temporarily forcing strict-always (pre-fix behavior): InitialQueryAcceptsNonIpAddressLeniently FAILS (localhost:9000 throws); reject tests still pass. Restored fix: all 8 ClientInfoRead.* pass x3 (debug) and x2 under ASan+UBSan, no heap-use-after-free, no UBSan errors. Build IDs verified to change across edits.
f Fix is general across code paths? Yes. Every consumer of the wire initial_address (system.query_log QueryLog.cpp:362-363, system.processes StorageSystemProcesses.cpp:110-111, interserver Session::authenticate TCPHandler.cpp:2439) runs only for SECONDARY_QUERY: the interserver hash is sent only for !INITIAL_QUERY (Connection.cpp:998) and an invalid hash is rejected before authenticate. All consumed paths are covered by the strict gate; no symmetric path is left unguarded.
g Fix generalizes across inputs? Yes. tryParseIpEndpointFromWire handles IPv4, bracketed IPv6, empty, leading-/ UNIX-local, non-numeric/out-of-range port, and non-IP host uniformly via one parse path. For SECONDARY_QUERY all malformed forms are rejected; for INITIAL_QUERY all fall back to the default endpoint. Tests cover each form for both query kinds.
h Backward compatible? Yes - this change RESTORES compatibility that the PR had narrowed. An INITIAL_QUERY client may again send a generic host:port (it is overwritten anyway), so no client breaks. Validation remains strict only where the value is actually used (SECONDARY_QUERY), which is server-to-server traffic that always emits an IP literal. No setting/format/on-disk change; no SettingsChangesHistory entry needed. Not a deliberate breaking change, so no maintainer-exception category required.
i Invariants and contracts preserved? Yes. The downstream invariant "initial_address is a real IP endpoint after read(), safe to call .host()/.port()" holds on all paths: the value is now always either a parsed ip:port or the default 0.0.0.0:0, never a UNIX_LOCAL address (whose .host()/.port() throw InvalidAccessException). The wire-format read sequence is unchanged.

Session id: cron:clickhouse-worker-slot-2:20260625-040500

@clickhouse-gh

clickhouse-gh Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.40% 85.30% -0.10%
Functions 92.60% 92.60% +0.00%
Branches 77.60% 77.60% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 129/130 (99.23%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finish ledger - 5f004f1

CI fully finished on this head. Every real check passed (0 FAIL/ERROR rows). The only red check is the private upstream sync, which is exempt.

Check / test Reason Owner / fixing PR
CH Inc sync - CH Inc sync (private, not actionable)

All Stateless / Integration / Unit / Stress / AST-fuzzer / Style / Fast-test checks plus Mergeable Check / Finish Workflow / Config Workflow: pass. The earlier deterministic sanitizer failure in the new ClientInfoRead gtest (heap-use-after-free / use-of-uninitialized-value, fixed in f62f94b by switching the malformed-address cases to ReadBufferFromOwnString) does not recur, and the prior-head amd_tsan, s3 storage flake is absent on this head (all amd_tsan jobs green).

Session id: cron:our-pr-ci-monitor:20260625-093000

@alexey-milovidov alexey-milovidov self-assigned this Jun 26, 2026
@alexey-milovidov alexey-milovidov merged commit a115e22 into ClickHouse:master Jun 26, 2026
166 of 167 checks passed
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants