Conversation
blp
left a comment
There was a problem hiding this comment.
Can not rebalancing on the initial step make that step take much longer?
|
PS. I need to add an integration test before marking this PR ready. |
Oh, I get it now, I think: without rebalancing, there's no work to do because there's no input; with rebalancing, the entire integral up to this point ends up being processed. Is that right? |
correct! |
The balancer used to trigger a rebalancing on a restart from a checkpoint. This was a bad idea in hindsight. Not only it slowed down failover, it also caused the initial step executed before the pipeline transitioned to the RUNNING state potentially take a very long time. During this time none of the monitoring features are available, making this a very long an awkward silence. This commit disables this behavior. It also introduces a new DBSP-level API that disables automatic rebalancing altogether. The plan is to expose this API externally, allowing the user to control when rebalancings are allowed to happen, e.g., a latency-sensitive workload may disable rebalancing after initial backfill. This will be combined with an API to request a rebalancing on demand. For now this API is only used to disable rebalancing during the initial step, making sure that a regular (non-forced) rebalancing doesn't trigger accidentally at that point. This commit additionally skips the initial step when the pipeline is bootstrapping. The reason is similar to the above: bootstrapping steps can be expensive, so we don't want to perform them while the pipeline is initializing. This does mean that the pipeline won't initialize output snapshots until bootstrapping completes, but that was already the case previously. Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Dedup a repetitive expression. Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Improve debug output in replay tests. Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Add a test that combines rebalancing, checkpointing, and bootstrapping. Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
This fixes a bug in adaptive joins that can manifest during bootstrapping. We did not correctly handle streams that did not participate in bootstrapping during bootstrapping. Such streams cannot change their balancing policy until bootstrapping completes; however we allowed the balancer to assign new policies to such streams. These policies were ignored leading to incorrect output and inconsistent internal state. As part of this fix, we also distinguish streams that haven't been assigned a policy yet instead of assigning a default policy (Shard) to them. Such default policies can be inconsistent during bootstrapping. This also improved debuggability as we can distinguish cases when the balancer when the balancer hasn't assigned a policy yet. Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
In replay mode, Z1Trace and AccumulateZ1Trace operators could continue producing outputs after returning `true` from `is_flush_complete`, which violates the transaction commit protocol. This commit fixes this behavior. It also addresses a related issue where strict operators (Z1) did not distinguish between `flush` being called on the input vs output half of the operator. Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
During bootstrapping, disabled operators do not participate in transactions and we shouldn't call start_transaction on such operators. This could cause a livelock in situations where some of the streams in a join cluster were inactive. Calling start_transaction on such streams set their flush_state to TransactionStarted. As a result RabalancingExchangeSender::ready_to_commit always returned false for all other streams in the cluster. With this commit, we no longer call `start_transaction` on disabled operator. Additionally, we fix the `ready_to_commit` implementation to ignore disabled operators when deciding whether to commit a transaction. Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Make the test more complicated to trigger the bug fixed in the previous commit. Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
There was a problem hiding this comment.
yes, I often uncomment it for debugging

The balancer used to trigger a rebalancing on a restart from a checkpoint. This was a bad idea in hindsight. Not only it slowed down failover, it also caused the initial step executed before the pipeline transitioned to the RUNNING state potentially take a very long time. During this time none of the monitoring features are available, making this a potentially very long and awkward silence.
This commit disables this behavior. It also introduces a new DBSP-level API that disables automatic rebalancing altogether. The plan is to expose this API externally, allowing the user to control when rebalancings are allowed to happen, e.g., a latency-sensitive workload may disable rebalancing after initial backfill. This will be combined with an API to request a rebalancing on demand.
For now this API is only used to disable rebalancing during the initial step, making sure that a regular (non-forced) rebalancing doesn't trigger accidentally at that point.
This commit additionally skips the initial step when the pipeline is bootstrapping. The reason is similar to the above: bootstrapping steps can be expensive, so we don't want to perform them while the pipeline is initializing. This does mean that the pipeline won't initialize output snapshots until bootstrapping completes, but that was already the case previously.
The above fix revealed a bug in the implementation of adaptive joins during bootstrapping. The PR includes bug fix commits.
Describe Manual Test Plan
Checklist
Breaking Changes?
Mark if you think the answer is yes for any of these components:
Describe Incompatible Changes