Validate the wire address in ClientInfo::read to avoid a SIGILL#108410
Conversation
ClientInfo::read deserializes initial_address from the native TCP protocol and constructs Poco::Net::SocketAddress directly from that string. When the port part is non-numeric (or out of the 0..65535 range), Poco resolves it via getservbyname(), a non-reentrant libc function that base/harmful replaces with __builtin_trap() in debug and sanitizer builds. The result is a hard, uncatchable SIGILL (Received signal 4) rather than a recoverable error. ClientInfo::write always serializes the address as a numeric "host:port" via SocketAddress::toString(), so valid streams never hit the getservbyname() branch. A corrupted or desynchronized native-protocol stream can, however, feed an arbitrary string here (for example TCPHandler::processUnexpectedQuery -> skipData when a Data packet is mis-parsed under stress with random query fuzzing and fault injection), turning untrusted input into a server crash. Validate that the port substring is numeric and within range before handing the string to Poco, locating the port exactly as Poco::Net::SocketAddress does (UNIX path, "[ipv6]:port", and "host:port" first-colon split). Malformed input now raises a catchable INCORRECT_DATA exception and never reaches getservbyname(). Legitimate numeric addresses still parse, including IPv6. Surfaced by the Stress test (amd_asan_ubsan) on an unrelated PR (STID 1781-3414); pre-existing on master and not specific to any PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
cc @CheSema @vitlibar could you review? This hardens native-protocol deserialization in |
|
Workflow [PR], commit [5f004f1] Summary: ✅
AI ReviewSummaryThis PR hardens Final VerdictStatus: ✅ Approve |
The first version of the fix only validated the port, so a non-IP host plus a numeric port (e.g. "host:9000", or ":9000") still passed and was handed to Poco::Net::SocketAddress(String). That constructor calls DNS::hostByName() for a non-IP host (and gethostbyname() on the non-getaddrinfo branch), reaching the same base/harmful-trapped resolver family the port check was meant to avoid. ClientInfo::write only ever emits an IP literal plus a numeric port, so parse the host substring here too (unbracketing [ipv6]:port), require Poco::Net::IPAddress::tryParse, and construct SocketAddress(ip, port) directly. Malformed values like "host:9000" and ":9000" are now rejected as INCORRECT_DATA and never reach any resolver. Extends the round-trip unit test and adds a non-IP-host rejection test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Follow-up commit Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-2:20260624-220900 |
…mmar The new gtest fed ReadBufferFromString a temporary String returned by makeClientInfoPrefix(). ReadBufferFromString is non-owning (it derives from ReadBufferFromMemory and keeps only a pointer to the source bytes), so the temporary was destroyed at the end of the statement and ClientInfo::read then read freed memory via readNBytes. Non-sanitizer builds happened to pass, but asan/tsan reported a heap-use-after-free and msan a use-of-uninitialized-value. Use ReadBufferFromOwnString, which owns its backing string, for the cases that pass a temporary. The production ClientInfo::read path was never affected: its string_views point into the named local initial_address_string. Also update the initial_address row in NativeProtocol.md: the SIGILL fix narrowed the accepted wire value to an IP literal plus a numeric port (or a UNIX socket path), with hostnames and service names no longer resolved, so the spec must document the accepted grammar instead of the generic host:port form. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-up fix (f62f94b): sanitizer heap-use-after-free in the new gtest + docs grammarThe new Root cause: the malformed-address cases bound a Fix: use Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-0:20260624-230800 |
A leading-'/' wire value made Poco build a UNIX_LOCAL SocketAddress, which escaped the new IP-endpoint validation in makeSocketAddressFromWire. But LocalSocketAddressImpl::host()/port() throw InvalidAccessException, and every consumer of initial_address calls them unconditionally (QueryLogElement::appendClientInfo at QueryLog.cpp, Session::authenticate in interserver mode). So a corrupted/desynced stream carrying initial_address like "/tmp/ch.sock" would pass ClientInfo::read and then throw downstream, violating this PR's contract that non-generated wire values are rejected at read with INCORRECT_DATA. ClientInfo::write only ever serializes an IP literal plus a numeric port (SocketAddress::toString of an IP endpoint), so the UNIX-local form is never legitimate on the wire. Reject host_and_port.front() == '/' with INCORRECT_DATA and drop the UNIX socket path branch from the NativeProtocol.md grammar. Add a gtest case proving "/tmp/ch.sock", "/" and a longer socket path are rejected with INCORRECT_DATA instead of parsed into a UNIX_LOCAL address. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixup 01a0a4e: reject UNIX-local addresses in
|
| # | Question | Answer |
|---|---|---|
| a | Deterministic repro? | Yes. ClientInfoRead.UnixLocalPathThrows (/tmp/ch.sock, /, a longer socket path) fails on demand without the guard. |
| b | Root cause explained? | A leading-/ wire value makes Poco's SocketAddress::init call newLocal() -> UNIX_LOCAL address. LocalSocketAddressImpl::host()/port() throw InvalidAccessException, but QueryLogElement::appendClientInfo (QueryLog.cpp) and Session::authenticate call host()/port() unconditionally, so the value escaped read-time validation only to throw downstream, violating the PR's reject-at-read contract. |
| c | Fix matches root cause? | Yes. Rejects host_and_port.front() == '/' with INCORRECT_DATA at read, the exact and only UNIX_LOCAL trigger in Poco. |
| d | Test intent preserved / new tests added? | New UnixLocalPathThrows test; the existing 5 ClientInfoRead.* tests are unchanged and still pass. |
| e | Both directions demonstrated? | Yes. Rebuilt with the guard reverted to accept-UNIX-local: UnixLocalPathThrows FAILS. With the guard: PASSES. |
| f | Fix is general across code paths? | Yes. makeSocketAddressFromWire is the single wire-address entry point; the three resolver/escape gaps (non-numeric port, non-IP host, UNIX-local) are now all closed there. |
| g | Fix generalizes across inputs? | Yes. The leading / is the sole UNIX_LOCAL trigger in Poco::Net::SocketAddress::init; tested with /tmp/ch.sock, bare /, and a long path. |
| h | Backward compatible? | Yes. ClientInfo::write only ever emits an IP literal plus a numeric port, and QueryLog.cpp already dereferences ->host() unconditionally, so the codebase already required an IP endpoint. No setting/format change. |
| i | Invariants and contracts preserved? | Yes. Upholds the consumer contract that initial_address is always an IP socket address (its host()/port() are callable). |
Validation: all 6 ClientInfoRead.* tests pass under debug and asan/ubsan (the asan/ubsan build is the one CI flagged for the earlier dangling-buffer HUAF, which stays green here). Build IDs verified to change across edits.
Session id: cron:clickhouse-worker-slot-0:20260625-000806
…o::read The wire-address validation added in this PR rejected any non-"ip:port" initial_address with INCORRECT_DATA for every query kind. That moved the native-protocol compatibility boundary: a third-party client that sent a generic host:port (e.g. localhost:9000) for an INITIAL_QUERY was rejected, even though the server overwrites initial_address with the real peer address in Session::makeQueryContextImpl, so the wire value is never consumed for initial queries. Only a SECONDARY_QUERY keeps and uses the wire initial_address (system.query_log, StorageSystemProcesses, interserver authenticate). Make validation strict only for that case: makeSocketAddressFromWire becomes the non-throwing, non-resolving tryParseIpEndpointFromWire, and read() rejects an unparseable address only for SECONDARY_QUERY, otherwise falls back to the default endpoint. The SIGILL guard is unchanged: the parser never calls Poco's resolver, so no query kind can reach the trapped getservbyname()/DNS family. Extend the gtest with an INITIAL_QUERY lenient case and a SECONDARY_QUERY valid round-trip, and reconcile the initial_address row in NativeProtocol.md.
|
Addressed the backward-compatibility concern (commit 5f004f1): Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-2:20260625-040500 |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 129/130 (99.23%) | Lost baseline coverage: none · Uncovered code |
CI finish ledger - 5f004f1CI fully finished on this head. Every real check passed (0 FAIL/ERROR rows). The only red check is the private upstream sync, which is exempt. All Stateless / Integration / Unit / Stress / AST-fuzzer / Style / Fast-test checks plus Mergeable Check / Finish Workflow / Config Workflow: pass. The earlier deterministic sanitizer failure in the new Session id: cron:our-pr-ci-monitor:20260625-093000 |

Related: #106189
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed a server crash (
Received signal 4, illegal instruction) that could be triggered by a malformed or desynchronized native TCP protocol stream. Theinitial_addressfield ofClientInfois now validated to be a numerichost:portbefore it is parsed, instead of letting a non-numeric port reach the trappedgetservbynamelibc function.Description
Signal 4 (SIGILL) originates in
ClientInfo::read(src/Interpreters/ClientInfo.cpp), which deserializesinitial_addressfrom the native TCP wire and constructsPoco::Net::SocketAddressdirectly from that string. When the port part is non-numeric (or above 65535), Poco resolves it throughgetservbyname, a non-reentrant libc function thatbase/harmfulreplaces with__builtin_trap()in debug and sanitizer builds. That is an uncatchable hard crash, not a recoverable exception.ClientInfo::writealways serializes the address as a numerichost:portviaSocketAddress::toString(), so a well-formed stream never reaches thegetservbynamebranch. A corrupted or desynchronized stream can feed an arbitrary string here (for exampleTCPHandler::processUnexpectedQuery -> skipDatawhen aDatapacket is mis-parsed under stress with query fuzzing and fault injection), turning untrusted wire input into a server crash.The fix validates that the port substring is numeric and in range before handing the string to Poco, locating the port exactly as
Poco::Net::SocketAddressdoes (UNIX path,[ipv6]:port, andhost:portfirst-colon split). Malformed input now raises a catchableINCORRECT_DATAexception and never callsgetservbyname; legitimate numeric addresses, including IPv6, still parse. A unit-test round-trip confirms valid addresses are preserved.This is pre-existing on master and not specific to any PR. It was surfaced by the
Stress test (amd_asan_ubsan)job, STID 1781-3414, and observed on more than one unrelated PR.CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=ef34fc53dcda266cca006edea8332247405ff46c&name_0=PR&name_1=Stress%20test%20%28amd_asan_ubsan%29
Version info
26.7.1.136