Fix catch-up subscription onCancelled on leader change by w1am · Pull Request #396 · kurrent-io/KurrentDB-Client-Java · GitHub
Skip to content

Fix catch-up subscription onCancelled on leader change#396

Merged
w1am merged 2 commits into
trunkfrom
w1am/dev-1817-java-client-notleader-redirect-npes-in-readresponseobserver
Jun 25, 2026
Merged

Fix catch-up subscription onCancelled on leader change#396
w1am merged 2 commits into
trunkfrom
w1am/dev-1817-java-client-notleader-redirect-npes-in-readresponseobserver

Conversation

@w1am

@w1am w1am commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Problem

When a cluster leader changes (failover or re-election), a catch-up subscription would silently stop delivering events with no onCancelled callback — the application was never told the subscription had dropped, so any reconnect logic built on onCancelled never ran and the subscription stayed dead until the process was restarted.

This hits the default configuration: nodePreference=leader is the client default, so subscriptions request the leader out of the box and are exposed on any leader change. Persistent subscriptions could hit the same silent failure.

Cause

On a leader change the server tells the client "I'm no longer the leader, reconnect here". The client tried to record that new leader but threw a NullPointerException internally, because the subscription path never set up the connection context the read path already uses. The NPE was thrown before onCancelled ran, so the error was swallowed instead of being reported.

Fix

  • Subscriptions now set up that connection context, so a leader change is handled exactly like reads already handle it: the client re-points to the new leader and onCancelled is always invoked with a NotLeaderException.
  • Hardened both the catch-up and persistent subscription error paths so a failure while handling the redirect can never again suppress onCancelled.

Closes DEV-1817

@linear-code

linear-code Bot commented Jun 25, 2026

Copy link
Copy Markdown

@qodo-code-review

Copy link
Copy Markdown

PR Summary by Qodo

Fix catch-up subscription onCancelled on leader change
🐞 Bug fix 🧪 Tests 🕐 20-40 Minutes

Grey Divider

Description

• Ensure catch-up subscriptions capture connection args for leader redirect handling.
• Harden leader-redirect error handling so onCancelled always fires.
• Add regression test covering NotLeader redirect without observer args set.
Diagram

graph TD
A["AbstractRegularSubscription"] --> B["GrpcClient.runWithArgs"] --> C["StreamsGrpc stub"] --> D["ReadResponseObserver"] --> E["SubscriptionStreamConsumer"] --> F["SubscriptionListener (app)"]
G{{"gRPC server"}} --> D
D --> H["NotLeaderException"] --> E
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Null-object args (always non-null)
  • ➕ Eliminates null checks throughout observer code
  • ➕ Makes it harder for future call sites to forget onConnected
  • ➖ Can hide lifecycle bugs where args should be set but isn't
  • ➖ Requires defining safe semantics for reportNewLeader when not connected
2. Move leader-redirect handling to a gRPC interceptor / client layer
  • ➕ Centralizes redirect parsing and leader reporting across all call types
  • ➕ Keeps observers focused on stream consumption
  • ➖ More invasive refactor; higher risk for a targeted regression fix
  • ➖ May be harder to test without broader integration scaffolding

Recommendation: Current approach is a good minimal, low-risk fix: (1) align subscription execution with the read path by using runWithArgs and calling observer.onConnected(args), and (2) make onError robust so onCancelled is still delivered even if leader parsing/reporting fails. Consider the null-object args approach only if more call paths continue to miss onConnected in the future.

Files changed (4) +68 / -7

Bug fix (2) +12 / -6
AbstractRegularSubscription.javaInitialize observer with connection args for subscriptions +4/-3

Initialize observer with connection args for subscriptions

• Switch subscription execution to GrpcClient.runWithArgs, build the stub from args.getChannel(), and call observer.onConnected(args) before starting the stream. This ensures subscription observers have args populated so leader redirects can be handled consistently.

src/main/java/io/kurrent/dbclient/AbstractRegularSubscription.java

ReadResponseObserver.javaGuard leader-redirect handling against null args and parsing failures +8/-3

Guard leader-redirect handling against null args and parsing failures

• Wrap leader redirect handling in try/catch and null-check args before calling reportNewLeader. This prevents exceptions in onError from short-circuiting cancellation propagation to consumers.

src/main/java/io/kurrent/dbclient/ReadResponseObserver.java

Tests (2) +56 / -1
LeaderRedirectUnitTest.javaAdd regression test for leader redirect without observer args +55/-0

Add regression test for leader redirect without observer args

• Adds a unit test that constructs ReadResponseObserver without calling onConnected (args remains null) and asserts a leader-redirect onError does not throw and still triggers listener onCancelled with a NotLeaderException.

src/test/java/io/kurrent/dbclient/LeaderRedirectUnitTest.java

MiscTests.javaRegister LeaderRedirectUnitTest in MiscTests suite +1/-1

Register LeaderRedirectUnitTest in MiscTests suite

• Updates the JUnit suite selection to include the new LeaderRedirectUnitTest alongside existing misc tests.

src/test/java/io/kurrent/dbclient/MiscTests.java

@qodo-code-review

Copy link
Copy Markdown

@w1am w1am force-pushed the w1am/dev-1817-java-client-notleader-redirect-npes-in-readresponseobserver branch from f056aab to f44165f Compare June 25, 2026 07:48
@w1am w1am force-pushed the w1am/dev-1817-java-client-notleader-redirect-npes-in-readresponseobserver branch from f44165f to 0498004 Compare June 25, 2026 10:06
@w1am w1am self-assigned this Jun 25, 2026
@w1am w1am force-pushed the w1am/dev-1817-java-client-notleader-redirect-npes-in-readresponseobserver branch from 0498004 to 88cc6c8 Compare June 25, 2026 11:17
@w1am w1am force-pushed the w1am/dev-1817-java-client-notleader-redirect-npes-in-readresponseobserver branch from 88cc6c8 to 0cf8f9f Compare June 25, 2026 11:24
@w1am w1am merged commit 595b069 into trunk Jun 25, 2026
42 checks passed
@w1am w1am deleted the w1am/dev-1817-java-client-notleader-redirect-npes-in-readresponseobserver branch June 25, 2026 12:45

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@w1am 👉 Created pull request targeting release/v1.2: #397

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant