IGNITE-28805 Fix flaky ContinuousQueryMarshallerTest#testRemoteFilterFactoryServer (#13265)
final Ignite node2 = "client".equals(node2Name) ? startClientGrid(node2Name) : startGrid(node2Name);

awaitPartitionMapExchange();
The test started server2 and immediately executed the continuous query with an initial scan query. At that moment partition map exchange (PME) could still be in progress, so the initial scan sometimes did not see all existing entries.
In the failed run the initial query returned only 2 entries instead of 5. All update events then arrived, but the latch was still not fully counted down, and the test failed with AssertionError: 4.
The fix is to wait for PME after starting server2 and before executing the continuous query.
LGTM — root cause and fix are both correct. A couple of notes for the record: Why only the server variant flaked: both Latch math checks out: 5 initial (even keys 0,2,4,6,8) + 10 updates (keys 10–19; Minor: the numbers in the explanation don't quite reconcile — a 2-entry initial scan leaves the latch at 3 (15−2−10), but the message was Residual: the 5s |
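To make the latch arithmetic concrete, here is a minimal standalone sketch. It is not the test's code: the class `LatchMath` and the method `remaining` are hypothetical names, and it assumes only what the thread states, namely that the latch expects 15 events (5 initial entries plus 10 updates) and counts down once per delivered event.

```java
import java.util.concurrent.CountDownLatch;

public class LatchMath {
    /**
     * Returns the latch count left after the given number of initial-scan
     * entries and update events have each counted the latch down once.
     * (Hypothetical helper for illustration, not part of the Ignite test.)
     */
    static long remaining(int expected, int initialSeen, int updatesSeen) {
        CountDownLatch latch = new CountDownLatch(expected);

        for (int i = 0; i < initialSeen + updatesSeen; i++)
            latch.countDown();

        return latch.getCount();
    }

    public static void main(String[] args) {
        int expected = 5 + 10; // 5 initial entries + 10 updates, per the thread

        // Complete initial scan: the latch reaches zero.
        System.out.println(remaining(expected, 5, 10)); // prints 0

        // Initial scan that saw only 2 entries: the latch stays above zero.
        System.out.println(remaining(expected, 2, 10)); // prints 3
    }
}
```

Any initial scan that returns fewer than 5 entries leaves the latch above zero, which is why waiting for PME before running the query removes the flakiness.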
Optional nit: a one-line note on why the wait is here would protect it from being removed during a future "cleanup", since the call looks redundant out of context. Up to you — most of the other awaitPartitionMapExchange() calls in this package aren't commented either.
I'd prefer not to add this comment. It mostly repeats the method name, because awaitPartitionMapExchange() already says that we wait for PME.
Also, "rebalance" is a bit too narrow here. The main point is that the initial scan should not start while the topology and partition map are still changing after server2 joins.
