Fix wrong shared library symbol resolution in stack traces#101435
Fix wrong shared library symbol resolution in stack traces#101435azat wants to merge 14 commits into
Conversation
`SymbolIndex` stored all symbols from all loaded objects (main binary + shared libs) in a single flat vector sorted by file-relative offset (`st_value`). Different shared libraries have overlapping file offset ranges, so a binary search could return a symbol from the wrong library. Fix by storing absolute virtual addresses (`info->addr + st_value`) instead of raw file offsets. This matches the approach already used on macOS (`n_value + slide`). For non-PIE builds `info->addr` is 0 for the main executable, so the change is a no-op for main binary symbols. Consistently store virtual addresses in all consumers: - `SymbolIndex`: store `info->addr + st_value` in both `collectSymbolsFromProgramHeaders` and `collectSymbolsFromELFSymbolTable` - `findSymbol`: simplify to a direct lookup (no `findObject` + subtraction) - `StackTrace`: pass `virtual_addr` instead of `physical_addr` to `findSymbol` - `StorageSystemStackTrace`: store virtual addresses in the `trace` column (remove `findObject` + subtraction) - `TraceCollector`: remove address normalization (was subtracting `address_begin`, which breaks PIE builds where `address_begin` != 0) - `addressToLine`, `TraceLog`: convert virtual address to file offset before DWARF lookups (`addr - address_begin`), since DWARF uses file-relative addresses v2: fix system.symbols.symbol_demangled Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
12c1975 to
1ded14c
Compare
|
The problem with XRAY was that the symbols has relative offsets, fixed in the new patch version |
|
Actually now we have a virtual addresses in trace_log, and after restart they cannot be resolved anymore |
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
|
Merged latest
Note for reviewers: this still leaves the open question you raised — with absolute virtual addresses stored in |
| #endif | ||
|
|
||
| trace.emplace_back(static_cast<UInt64>(addr) - offset); | ||
| trace.emplace_back(static_cast<UInt64>(addr)); |
There was a problem hiding this comment.
This stores ASLR-dependent virtual addresses in durable system.trace_log.trace rows, but the row does not keep the load base needed to rebase them later. After a server restart, addressToSymbol, addressToLine, and flameGraph search the current process ranges, so rows collected before the restart can no longer symbolize even when build_id matches; before this change, main-binary frames were normalized to stable file-relative offsets.
Please keep a stable persisted representation for trace (for example, object identity plus file-relative offset, or at least the previous main-binary file offsets) or store enough load-base metadata to rebase old rows before symbolization.
There was a problem hiding this comment.
This is a real trade-off worth a maintainer decision, so leaving it open for @azat / @al13n321 rather than resolving it here.
The concern is valid: storing absolute (ASLR-dependent) virtual addresses in the durable system.trace_log.trace means rows collected before a restart can no longer be symbolized after a PIE binary is reloaded at a different base, whereas the previous code normalized main-binary frames to stable file-relative offsets.
However, the previous representation was not stable/correct either: it only normalized main-binary frames and stored raw virtual addresses for shared-library frames, and — more importantly — file offsets overlap between objects, which is exactly the wrong-symbol bug this PR fixes. There is no single UInt64 representation that is both unambiguous across objects and stable across restarts.
A proper persisted fix would need object identity (e.g. build-id) plus a file-relative offset per frame, which is a larger trace_log schema change and out of scope for this PR. The live consumers (system.stack_trace, exception traces) are unaffected since they symbolize against the current process. Flagging for a decision on whether to accept the cross-restart limitation here or follow up with a schema change.
There was a problem hiding this comment.
Pushed 73a65a9163a taking the "explicitly document" path the bot offered as an alternative to a schema change, since the proper fix (object identity + file-relative offset per frame) is a trace_log schema change that is a maintainer call:
- Added an explanatory comment in
TraceCollector::runnext to where the address is stored, spelling out that we now persist the absolute (ASLR-dependent) virtual address — unambiguous across objects (file offsets overlap between the main binary and shared libraries), which is the bug this PR fixes — and the trade-off that such rows symbolize only while the collecting process is alive and may not re-symbolize after a restart of a PIE binary. - Documented the same on the
tracecolumn indocs/en/operations/system-tables/trace_log.md.
The cross-object correctness fix is the point of this PR and the live consumers (system.stack_trace, exception traces, in-process system.trace_log symbolization) are unaffected. Leaving this thread open for @azat / @al13n321 to decide whether the cross-restart limitation is acceptable here or warrants the follow-up schema change.
After switching SymbolIndex to absolute virtual addresses, a trace frame may belong to a shared library rather than the main binary. `addressToLine`, `addressToLineWithInlines` and `TraceLog`'s line resolution still hardcoded `thisObject()` (the main binary) for every DWARF lookup, so a `libc` frame would be looked up in ClickHouse DWARF and return the ClickHouse binary path or an empty result, even though `findSymbol` already resolved the correct symbol. Use `findObject` to resolve the object that actually contains the address and use that object's `elf`/`address_begin` (or `dsym`/`slide` on macOS). For main-binary frames `findObject` returns the same object as `thisObject`, so behavior is unchanged there. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Picked this up to move it forward (CI was green, branch ~719 commits behind Addressed two of the three review-bot threads (pushed
Left the Verification (locally built
|
Address review feedback on src/Common/SymbolIndex.cpp: add a focused unit test that resolves a known address from a loaded shared library (the runtime address of `getpid`, obtained via `dlsym` to avoid a main-binary PLT stub) and verifies that `SymbolIndex::findSymbol` returns the symbol that actually contains the address and starts exactly at the C library symbol reported by `dladdr`. This exercises the cross-object lookup path that the fix repairs: before storing absolute virtual addresses, the file-relative offset ranges of the main binary and shared libraries overlapped, so the lookup could return a symbol from the wrong object. The test skips on fully static builds (for example, musl) where the C library is part of the main binary. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
This is very low priority - the only shared library we use is libc, and even that we will remove soon. |
…symbol-resolution
…symbol-resolution
…symbol-resolution
…symbol-resolution
…log.trace Switching `SymbolIndex` and its consumers to absolute virtual addresses fixes wrong cross-object symbol resolution, but it also changes the durable `system.trace_log.trace` representation: it now stores ASLR-dependent runtime addresses instead of the previous main-binary file-offset normalization. Document the trade-off explicitly (as raised in the review thread): such rows symbolize correctly only while the collecting process is alive and may no longer symbolize after a restart of a PIE binary. Add an explanatory comment in `TraceCollector::run` and a note on the `trace` column in the system table docs. A representation that is both unambiguous across objects and stable across restarts would require object identity plus a file-relative offset (a `trace_log` schema change), left as a follow-up for maintainers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…description The warning that `system.trace_log.trace` stores absolute (ASLR-dependent) virtual addresses was added only to the autogenerated block of `docs/en/operations/system-tables/trace_log.md`, while `TraceLogElement::getColumnsDescription` still carried the old shorter `trace` contract. That block is regenerated from `system.columns` comments by `utils/generate-system-tables-docs`, so the warning would be overwritten on the next regeneration, and `system.columns.comment` for `trace` would keep exposing the incomplete description. Move the exact warning text into the `trace` column description in `src/Interpreters/TraceLog.cpp` so the canonical source carries it. The text matches the existing `.md` block byte-for-byte, so regenerating the docs leaves the autogenerated block unchanged. Addresses the AI review finding on `docs/en/operations/system-tables/trace_log.md:37`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After switching `SymbolIndex` to store absolute virtual addresses instead of file-relative offsets, `system.symbols.address_begin`/`address_end` expose those absolute addresses. Update the `StorageSystemSymbols` column descriptions and the matching `docs/en/operations/system-tables/symbols.md` to say so, and spell out that for position-independent (PIE) builds these are ASLR-dependent runtime addresses (for the default non-PIE build they still match the address in the binary). Addresses the AI review finding on src/Common/SymbolIndex.h. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Now that `addressToLine`/`addressToLineWithInlines` resolve the object that contains the address via `findObject`, the argument must be an absolute virtual address from the running process (e.g. `system.trace_log.trace` or `system.stack_trace`). Note in the function documentation that the numeric addresses in the examples come from one specific build and will not match the reader's binary, and that for position-independent (PIE) builds they are ASLR-dependent and change between restarts. Addresses the AI review finding on src/Functions/addressToLine.h. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 95/104 (91.35%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 2 line(s) · Uncovered code |

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix wrong shared library symbol resolution in stack traces
SymbolIndexstored all symbols from all loaded objects (main binary + shared libs) in a single flat vector sorted by file-relative offset (st_value). Different shared libraries have overlapping file offset ranges, so a binary search could return a symbol from the wrong library.Fix by storing absolute virtual addresses (
info->addr + st_value) instead of raw file offsets. This matches the approach already used on macOS (n_value + slide). For non-PIE buildsinfo->addris 0 for the main executable, so the change is a no-op for main binary symbols.Consistently store virtual addresses in all consumers:
SymbolIndex: storeinfo->addr + st_valuein bothcollectSymbolsFromProgramHeadersandcollectSymbolsFromELFSymbolTablefindSymbol: simplify to a direct lookup (nofindObject+ subtraction)StackTrace: passvirtual_addrinstead ofphysical_addrtofindSymbolStorageSystemStackTrace: store virtual addresses in thetracecolumn (removefindObject+ subtraction)TraceCollector: remove address normalization (was subtractingaddress_begin, which breaks PIE builds whereaddress_begin!= 0)addressToLine,TraceLog: convert virtual address to file offset before DWARF lookups (addr - address_begin), since DWARF uses file-relative addressesNote, I've checked this and it looks valid I also manually tested it and it looks ok: