IGNITE-28747 GridToStringBuilder#handleRecursion may cause NPE. Colle…#13262
IGNITE-28747 GridToStringBuilder#handleRecursion may cause NPE. Colle…#13262EgorBaranovEnjoysTyping wants to merge 10 commits into
Conversation
…ct toString as tree, avoid extra allocations and problems in recursion resolution
…ide all SBLengthLimit's parent methods (wrong logic). Added new tests.
… up in case of exception
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Refactors Ignite’s toString infrastructure to a node-based implementation with improved recursion handling and enhanced bounded string building, plus adds/extends tests for the new behavior.
Changes:
- Replaced the previous thread-local
SBLimitedLength+ recursion map approach with aGridToStringNodetree, recursion monitors, and marker recovery. - Extended
SBLimitedLength/CircularStringBuildercapabilities (insert/substring, head/tail limits) to support the new toString flow. - Added new unit tests for
SBLimitedLengthand expanded existing toString/circular buffer tests.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 12 comments.
Show a summary per file
Comments suppressed due to low confidence (1)
modules/commons/src/main/java/org/apache/ignite/internal/util/tostring/SBLimitedLength.java:1
- These
i(...)overloads bypassSBLimitedLength's custom insert routing by callingsuper.i(...), which inserts directly into the head buffer and ignores head/tail limiting. They should delegate to this class’si(int, String)(or equivalent) so inserts beyond the head limit are correctly redirected/handled.
/*
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| boolean overflowed(SBLimitedLength sb) { | ||
| return sb.impl().length() > HEAD_LEN; | ||
| return sb.length() >= getHeadLengthLimit(); | ||
| } |
| @Override public GridStringBuilder i(int off, boolean b) { | ||
| return super.i(off, String.valueOf(b)); | ||
| } |
| if (childFieldCls.isPrimitive()) | ||
| return new GridToStringValueNode(childPropName, val); | ||
| else if (childFieldCls.isArray()) | ||
| return new GridToStringArrayNode(childPropName, (Object[])val, childFieldCls); |
| nodes = new GridToStringNode[Math.min(COLLECTION_LIMIT, arr.length)]; | ||
| for (int i = 0; i < nodes.length; i++) { | ||
| final int idx = i; | ||
| nodes[idx] = getGridToStringNode(null, () -> arr[idx], () -> arr[idx].getClass()); |
| Object obj = iter.next(); | ||
| GridToStringNode node = getGridToStringNode(null, () -> obj, obj::getClass); |
| void aqcuireRecursionMonitor(NodeRecursionMonitor node) { | ||
| OBJECT_REGISTRY.get().put(obj, node); | ||
| } |
| public static final ConcurrentHashMap<Thread, IdentityHashMap<String, GridToStringNode>> CATCHED_NODES | ||
| = new ConcurrentHashMap<>(); |
| public String substring(int beginIdx, int endIdx) { | ||
| if (beginIdx < 0 || endIdx < 0 || endIdx > skipped + length()) | ||
| throw new StringIndexOutOfBoundsException( | ||
| "Some of indexes is out of bounds. Begind index = " + beginIdx + " end index = " + endIdx); | ||
| if (beginIdx > endIdx) | ||
| throw new IllegalArgumentException( | ||
| "Begin index can not be greater then end index (begin = " + beginIdx + " end = " + endIdx + ")"); |
| @Override public void setLength(int len) { | ||
| throw new UnsupportedOperationException("setLength is not supported by this imlementation"); | ||
| } |
| @Test | ||
| public void testNPE() { | ||
| boolean success; | ||
| try { | ||
| SBLimitedLength sbLimitedLength = new SBLimitedLength(256); | ||
| sbLimitedLength.initLimit(new SBLengthLimit()); | ||
| sbLimitedLength.a("a".repeat(7999)); | ||
| sbLimitedLength.i(7000, "asd"); | ||
| sbLimitedLength.a("a".repeat(10)); | ||
| success = true; | ||
| } catch (NullPointerException ignored) { | ||
| success = false; | ||
| } | ||
| Assert.assertTrue(success); | ||
| } |
anton-vinogradov
left a comment
There was a problem hiding this comment.
Thanks for digging into this — the NPE is real and the diagnosis is correct. My concern is that a small, local defect is being fixed with a full rewrite of the toString engine, and the rewrite introduces problems that are worse than the original bug.
Root cause (small and local). SBLimitedLength overrides every a(...) (append) but none of the i(...) (insert) methods. handleRecursion() calls buf.i(pos, hash), which writes straight into impl() and bypasses onWrite() — the only place that allocates tail. The insert pushes the head past HEAD_LEN, so overflowed() flips to true while tail is still null, and the next a(savedName) takes the overflow branch -> tail.append(...) -> NPE. The RecursivePayload test reproduces exactly this.
The minimal fix is already in this PR. The i(...) overrides + lazy-tail in SBLimitedLength are the correct fix and are enough to close the ticket on their own (~2 files + the new test). Everything else — the GridToStringNode tree, NodeRecursionMonitor, the marker/recover mechanism, gutting GridToStringBuilder by ~685 lines — is an unrelated refactor.
Blocking concerns with the refactor (details inline):
- Marker/recover breaks the
toString()contract. A nestedS.toString()returns a marker string (new String("GridToStringNode")) instead of its value, recovered later by String identity. Any wrapping —"Wrapper{" + S.toString(...) + "}", very common in our code — defeats the identity lookup and leaks the literalGridToStringNodeinto the log.recoverObject's own javadoc concedes the design can't guarantee correctness. - Thread-keyed static map leaks.
static ConcurrentHashMap<Thread, ...> CATCHED_NODESpinsThreadkeys (and captured nodes) wheneverclear()is skipped on any path. The originalThreadLocalwas leak-free by construction; this replaces it with a slower, leak-prone map of identical semantics. toString()can now throw.markNode()ends withidentities().orElseThrow(); on a path whereinit()didn't run, an innertoString()throws — the exact failure mode the original blanket try/catch existed to prevent.
Also: the tree is built in full (reflection + recursion + each field's toString()) and only length-limited at render time, which defeats the whole point of SBLimitedLength (cap the cost of toString on large graphs). The PR claims fewer allocations but adds per-node objects, Optional/lambda chains, a per-call new SBLimitedLength, and a recoverObject lookup on every append — with no benchmark.
Proposal: reduce this PR to the SBLimitedLength insert overrides + lazy tail + the RecursivePayload test. If we genuinely want to cut toString allocations, let's do it as a separate ticket with JMH before/after and an explicit story for large graphs and wrapped toString().
| List<GridToStringNode> addNodes = | ||
| GridToStringNodeFactory.getNodes(addNames, addVals, addSens, addLen); | ||
| GridToStringNode node = GridToStringNode.getRootNode(obj, cls, addNodes); | ||
| return isNew ? node.toString() : GridToStringNode.markNode(node); |
There was a problem hiding this comment.
Core of the marker mechanism: every non-root S.toString() now returns a placeholder marker instead of its value, stitched back later by String identity. This only holds if the caller's toString() returns exactly that instance untouched. As soon as a class does "Foo{" + S.toString(Foo.class, this) + "}" (common across the codebase), the marker is embedded in a larger string, identity recovery fails, and the literal GridToStringNode leaks into the log. This silently corrupts output and is the main reason I'd not take the rewrite.
| * @return The unique marker string. | ||
| */ | ||
| static String markNode(GridToStringNode node) { | ||
| String result = new String(GridToStringNode.class.getSimpleName()); |
There was a problem hiding this comment.
Two problems on this method:
- Using
new String(...)identity as a map key is extremely fragile (see the wrapped-toStringcase). orElseThrow()makesmarkNodethrow if the thread state wasn't initialized (a nested call on a path whereinit()didn't run).toString()must never throw — the original guarded every path with try/catch; this reintroduces throwing fromtoString.
| * A thread-local cache for nodes, used to handle references of | ||
| * inner toString() calls by mapping temporary markers to actual nodes. | ||
| */ | ||
| static final ConcurrentHashMap<Thread, IdentityHashMap<String, GridToStringNode>> CATCHED_NODES |
There was a problem hiding this comment.
static ConcurrentHashMap<Thread, ...> keyed by Thread is a classic leak: in a pooled-thread server the keys never die, and any path that skips clear() (exception in a user toString, reentrancy) accumulates entries that pin the Thread and all captured nodes/objects. The value is a per-thread map touched only by its owner, so the ConcurrentHashMap buys nothing but overhead. This should be a ThreadLocal — which is what the original used and why it didn't leak. (nit: CATCHED -> CACHED.)
| * we should replace the substring | ||
| * @param obj Object. | ||
| */ | ||
| public static <T> T recoverObject(T obj) { |
There was a problem hiding this comment.
recoverObject is a band-aid for the marker design, and it only handles the case where the entire returned string is the marker — not the prefix + marker + suffix concatenation, which is exactly the dangerous case. The javadoc ('if someone decided to concat outside of default S.toString') is effectively an admission the mechanism can't be made correct. It's also now invoked on every a(...)/i(...) in SBLimitedLength, adding an identity-map lookup to every append on the render path.
| else if (val instanceof Map) | ||
| return new GridToStringMapNode(childPropName, (Map<?, ?>)val); | ||
|
|
||
| String toStrResult = val.toString(); |
There was a problem hiding this comment.
This is where wrapped/custom toString() breaks: if val.toString() returns anything other than the exact marker instance, recoverOrCreate can't match it and the marker text is treated as a literal value. Net effect for a class that wraps S.toString(): the inner object renders as GridToStringNode.
| Object[] addVals, | ||
| boolean[] addSens, | ||
| int addLen) { | ||
| List<GridToStringNode> result = new LinkedList<>(); |
There was a problem hiding this comment.
nit: ArrayList is a better default here (indexed iteration, less per-node overhead). LinkedList + Collections.unmodifiableList per object adds up on the hot path.
Suggested fix:
| List<GridToStringNode> result = new LinkedList<>(); | |
| List<GridToStringNode> result = new ArrayList<>(); |
Note: also add import java.util.ArrayList; — applying this single-line suggestion won't add the import.
| * Registers the current node in the thread-local registry for the given object. | ||
| * @param node The node to register. | ||
| */ | ||
| void aqcuireRecursionMonitor(NodeRecursionMonitor node) { |
There was a problem hiding this comment.
nit: typo aqcuireRecursionMonitor -> acquireRecursionMonitor.
Suggested fix (typo):
| void aqcuireRecursionMonitor(NodeRecursionMonitor node) { | |
| void acquireRecursionMonitor(NodeRecursionMonitor node) { |
Note: also rename the call site in GridToStringObjectNode — a single-line apply won't update it.
|
|
||
| /** {@inheritDoc} */ | ||
| @Override public void setLength(int len) { | ||
| throw new UnsupportedOperationException("setLength is not supported by this imlementation"); |
There was a problem hiding this comment.
nit: typo imlementation -> implementation.
Suggested fix (typo):
| throw new UnsupportedOperationException("setLength is not supported by this imlementation"); | |
| throw new UnsupportedOperationException("setLength is not supported by this implementation"); |
| public String substring(int beginIdx, int endIdx) { | ||
| if (beginIdx < 0 || endIdx < 0 || endIdx > skipped + length()) | ||
| throw new StringIndexOutOfBoundsException( | ||
| "Some of indexes is out of bounds. Begind index = " + beginIdx + " end index = " + endIdx); |
There was a problem hiding this comment.
nit: typo Begind index -> Begin index.
Suggested fix (typo):
| "Some of indexes is out of bounds. Begind index = " + beginIdx + " end index = " + endIdx); | |
| "Some of indexes is out of bounds. Begin index = " + beginIdx + " end index = " + endIdx); |
| } | ||
|
|
||
| /** {@inheritDoc} */ | ||
| @Override public GridStringBuilder i(int offset, String str) { |
There was a problem hiding this comment.
This is the correct fix for the NPE. My suggestion is to ship this (the insert overrides + lazy tail) plus the RecursivePayload test as the entire patch, drop recoverObject from it, and split the node-tree work into a separate ticket/RFC.

Collect toString as tree, avoid extra allocations and problems in recursion resolution
Problems solved:
Any recursive call in toString method would be handled
Allocate less memory
Memory wouldn't be kept for a whole thread life