feat: add opt-in periodic interface-change monitor by bdraco · Pull Request #1798 · python-zeroconf/python-zeroconf · GitHub
Skip to content

feat: add opt-in periodic interface-change monitor#1798

Closed
bdraco wants to merge 24 commits into
masterfrom
feat/interface-change-monitor
Closed

feat: add opt-in periodic interface-change monitor#1798
bdraco wants to merge 24 commits into
masterfrom
feat/interface-change-monitor

Conversation

@bdraco

@bdraco bdraco commented Jun 21, 2026

Copy link
Copy Markdown
Member

Summary

Adds an opt-in poller that watches for adapter changes and calls async_update_interfaces when the set of interface addresses changes, so a consumer with no native interface-change signal can self-heal without restarting Zeroconf. It is off by default; interface change detection stays platform specific and left to the consumer, this is just a convenience for callers that want it. Stacked on #1797.

Details

  • New InterfaceMonitor in _utils/interface_monitor.py polls ifaddr.get_adapters every interval seconds (default 5), diffs a hashable snapshot of adapter index plus address, and rescans only on change.
  • A rescan failure is logged and swallowed so a transient error does not kill the monitor; the next change still triggers a rescan.
  • Surfaced as async_start_interface_monitor / async_stop_interface_monitor on Zeroconf and AsyncZeroconf; the monitor task is cancelled and awaited on close.
  • Consumers that already get adapter-change events (for example Home Assistant) should keep calling async_update_interfaces directly and skip the monitor.

Test plan

  • tests/test_interface_monitor.py covers the snapshot helper, rescan on change, no rescan when unchanged, surviving a rescan error, start and stop idempotency, stop without start, and cancellation on close.
  • interface_monitor.py at 100 percent; full monitor, rescan, and engine suites green; pre-commit run clean.

@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

@codspeed-hq

codspeed-hq Bot commented Jun 21, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 21 untouched benchmarks


Comparing feat/interface-change-monitor (ca0e061) with feat/async-update-interfaces (c2df902)

Open in CodSpeed

@bdraco bdraco force-pushed the feat/async-update-interfaces branch from cc6dd24 to f624cb2 Compare June 21, 2026 23:31
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 9465fb1 to db40f9a Compare June 21, 2026 23:32
@bdraco bdraco force-pushed the feat/async-update-interfaces branch from f624cb2 to f7b6624 Compare June 21, 2026 23:50
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from db40f9a to 4699b53 Compare June 21, 2026 23:51
@bluetoothbot

Copy link
Copy Markdown
Contributor

PR Review — feat: add opt-in periodic interface-change monitor

Solid, well-scoped opt-in feature. No blocking issues; one behavior question to confirm before merge.

Strengths:

  • Clean separation: the monitor lives in its own module, is genuinely off-by-default, and the start/stop surface is mirrored correctly across Zeroconf and AsyncZeroconf.
  • Lifecycle is handled carefully — async_start is idempotent, async_stop nulls the reference before awaiting, cancellation is awaited, and _async_close tears the monitor down. except Exception in the poll loop correctly lets CancelledError (a BaseException) propagate.
  • Test coverage is thorough and hits the real edge cases: rescan-on-change, no-rescan-when-unchanged, surviving a rescan error, double-start, stop-without-start, and stop-on-close. 100% line coverage reported.

Needs attention:

  • Confirm no-arg async_update_interfaces() default — the monitor watches all host adapters but the instance may be bound to a configured subset; depending on the base PR's default this could expand interfaces unexpectedly (warning, unverified — method is in stacked PR feat: add async_update_interfaces to rescan network interfaces at runtime #1797).
  • ifaddr.get_adapters() runs synchronously on the event loop each poll (fine at 5s default).
  • test_monitor_survives_rescan_error asserts >= 2 calls off a 50ms wall-clock sleep — flake risk on loaded CI.
  • New public methods lack user-facing docs beyond docstrings.

🟡 Important

1. Snapshot watches every system adapter, not the configured interface subset
src/zeroconf/_utils/interface_monitor.py:26-28

_adapter_snapshot() diffs all adapters/addresses on the host, but a Zeroconf instance may have been constructed with a specific interfaces=[...] subset.

Two consequences worth confirming:

  • A change on an adapter the instance never bound to (a VPN coming up, an unrelated docker bridge, a link-local address churning) will still trip the diff and trigger a full async_update_interfaces() rescan — wasted work on a change the instance doesn't care about.
  • The monitor calls async_update_interfaces() with no arguments. If that no-arg form re-resolves to all interfaces (rather than reusing the originally-configured subset), the monitor would silently expand the instance beyond the user's chosen interfaces on the first unrelated adapter change. I could not verify the no-arg default here because that method comes from the stacked base PR (feat: add async_update_interfaces to rescan network interfaces at runtime #1797) and isn't in this checkout.

If the no-arg form preserves the original InterfaceChoice/subset, this is just minor redundant work (suggestion-level). If it re-detects all interfaces, it's a behavior change. Please confirm which, and consider scoping the snapshot to the addresses the instance actually uses.

return frozenset((adapter.index, str(ip.ip)) for adapter in ifaddr.get_adapters() for ip in adapter.ips)

🟢 Suggestions

1. ifaddr.get_adapters() blocks the event loop each poll
src/zeroconf/_utils/interface_monitor.py:26-27

_adapter_snapshot() runs ifaddr.get_adapters(), a synchronous syscall, and it is invoked directly on the event loop both at construction and inside _async_run every interval seconds.

At the default 5s interval this is negligible, but at the test-style sub-second intervals (or on a host with many adapters) it briefly stalls the loop. Worth a one-line acknowledgement that this is intentionally synchronous, or running it via loop.run_in_executor if it ever shows up in profiling. Not a blocker.

2. Timing-dependent assertion may flake under loaded CI
tests/test_interface_monitor.py:89-90

test_monitor_survives_rescan_error sleeps 0.05s at interval 0.001 and asserts len(calls) >= 2. The cycler yields exactly two changed snapshots (a→b, b→c) before sticking, so the test needs both iterations to complete within 50ms. On a loaded CI runner that window can compress.

The surrounding suite already learned this lesson (see the loaded-CI timing notes). Consider gating on an asyncio.Event set after the second call, or driving the snapshots through a controllable sequence rather than wall-clock sleep, so the assertion is deterministic.

await asyncio.sleep(0.05)
        await aiozc_loopback.async_stop_interface_monitor()
    assert len(calls) >= 2
3. New public API methods are undocumented outside docstrings
src/zeroconf/asyncio.py:237-251

async_start_interface_monitor / async_stop_interface_monitor are new public surface on both Zeroconf and AsyncZeroconf. The diff adds no README/docs mention of the opt-in monitor or the guidance that consumers with native adapter-change events (e.g. Home Assistant) should skip it.

The PR description captures this nuance well — moving a sentence of it into user-facing docs would keep the public API discoverable. semantic-release will handle the changelog from the feat: title, so this is just docs, not blocking.


Checklist

  • Error handling does not silently swallow unexpected errors
  • No resource leaks (task cancelled and awaited on close)
  • Public API behavior verified / documented — warning #1
  • Tests are deterministic (no timing-dependent assertions) — suggestion #3
  • No event-loop blocking on hot path — suggestion #2
  • Public API changes documented — suggestion #4

To rebase specific severity levels, mention me: @bluetoothbot rebase critical (fixes 🔴 only), @bluetoothbot rebase important (fixes 🔴 + 🟡), or just @bluetoothbot rebase for all.


Silent Failure Analysis

🟠 **HIGH** — fire-and-forget async failure
src/zeroconf/_utils/interface_monitor.py:60-61

Risk: The _adapter_snapshot() call sits outside the try/except block; if ifaddr.get_adapters() raises a platform-level exception, the exception propagates out of the while True loop and kills the asyncio task with no log message and no recovery — the monitor silently stops working while self._task holds a reference to the dead task, so the asyncio GC warning is also suppressed.

while True:
    await asyncio.sleep(self._interval)
    snapshot = _adapter_snapshot()   # <-- unprotected
    if snapshot == self._snapshot:
        continue

Fix: Wrap the _adapter_snapshot() call in the same try/except Exception block (or a nested one) and continue after logging, mirroring the treatment already given to async_update_interfaces().

🟡 **MEDIUM** — incomplete exception suppression on task join
src/zeroconf/_utils/interface_monitor.py:53-55

Risk: contextlib.suppress(asyncio.CancelledError) only suppresses CancelledError; if the monitor task already died with a different exception (e.g., an unguarded _adapter_snapshot() failure as noted above), await task re-raises that exception straight through async_stop_interface_monitor() and into _async_close(), potentially aborting the shutdown sequence.

task.cancel()
with contextlib.suppress(asyncio.CancelledError):
    await task

Fix: Broaden the suppress to contextlib.suppress(Exception) inside async_stop, or add a separate except Exception: log.exception(...) so any prior task failure is drained and logged rather than re-raised during teardown.


Automated review by Kōan (Claude) HEAD=4699b53 5 min 15s

@bluetoothbot bluetoothbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issues found.

  • Snapshot watches every system adapter, not the configured interface subset

@bdraco bdraco force-pushed the feat/async-update-interfaces branch from f7b6624 to 14d468f Compare June 22, 2026 01:19
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 4699b53 to 24c504e Compare June 22, 2026 01:20
@bdraco bdraco force-pushed the feat/async-update-interfaces branch from 14d468f to 9c65842 Compare June 22, 2026 01:28
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 24c504e to 5339f81 Compare June 22, 2026 01:28
@bdraco bdraco force-pushed the feat/async-update-interfaces branch from 9c65842 to 4029b73 Compare June 22, 2026 01:43
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 5339f81 to 622abc9 Compare June 22, 2026 01:43
@bdraco bdraco force-pushed the feat/async-update-interfaces branch from 4029b73 to b2f6c89 Compare June 22, 2026 02:00
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 622abc9 to f767f06 Compare June 22, 2026 02:00
@bdraco bdraco force-pushed the feat/async-update-interfaces branch from b2f6c89 to 320f800 Compare June 22, 2026 02:10
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from f767f06 to 00d724e Compare June 22, 2026 02:11
@bdraco bdraco force-pushed the feat/async-update-interfaces branch from 320f800 to f97422b Compare June 22, 2026 02:21
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 00d724e to e5c7779 Compare June 22, 2026 02:21
@bdraco bdraco force-pushed the feat/async-update-interfaces branch from f97422b to 96e57f9 Compare June 22, 2026 02:26
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from e5c7779 to e606f55 Compare June 22, 2026 02:27
@bdraco bdraco force-pushed the feat/async-update-interfaces branch from 96e57f9 to 74251cc Compare June 22, 2026 02:28
Instead of raising, moving a Default single-family instance to an explicit
interface set now demotes its dual-use listen/responder socket to a pure
listener and rebuilds it clean, then adds per-interface responders for the
whole desired set. Rebuilding releases the dual-use socket's existing group
memberships so the new joins do not collide (EADDRINUSE) when the desired
set overlaps the interface it served, and demoting it first prevents double
announcements.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from e406194 to 6a4feaa Compare June 22, 2026 05:40
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 6a4feaa to a5155c9 Compare June 22, 2026 05:41
bdraco added 2 commits June 22, 2026 00:49
… of recomputing

Add a _without_transport helper for the repeated 'filter wrappers whose
transport is not X' list comprehension, used by _async_remove_transport and
the dual-use demote. In the demote, pop the known listen key from current
instead of rebuilding the whole dict.
…sten socket

The rebuild re-join discarded add_multicast_member's result, so a staying
interface that could not re-join went silently send-only. Surface it with
log_warning_once, mirroring _async_add_interface.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from a5155c9 to e3e1ed0 Compare June 22, 2026 05:49
Add regression guards the diff structurally couldn't fail on: re-announce
over multiple registrations with one failing, an interface IP change handled
as remove+add in one rescan, and the IPv6 leave packing the join interface
index. Document that re-announcement is best-effort.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from e3e1ed0 to b1bfd3f Compare June 22, 2026 06:03
A bare except OSError in _listen_socket_supports masked a getsockopt failure
on any platform into 'dual-stack supported', which could suppress a needed
listen-socket rebuild and leave an added address family unreceivable. Narrow
the fallback to win32 (where the read legitimately fails) and re-raise
elsewhere, mirroring make_wrapped_transport.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from b1bfd3f to 30201c8 Compare June 22, 2026 13:57
The re-announce warning now identifies which registration failed (zipped back
to its ServiceInfo) so a partial failure is actionable. The sync
update_interfaces wrapper awaits the announce inline (to keep per-service
failure logging), so widen its timeout to cover the full announce window plus
reconcile overhead rather than leaving a thin margin under EventLoopBlocked.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 30201c8 to 4890fd8 Compare June 22, 2026 14:12
make_wrapped_transport runs on the startup/connection path; re-raising a
failed IPV6_MULTICAST_IF read there could abort instance startup to protect
multicast_index, which only selects the interface for a benign group leave
that drop_multicast_member already tolerates. Fall back to the default index
with a debug log instead. Apply the same graceful fallback to
_listen_socket_supports (assume dual-stack) so a getsockopt failure can't
abort a rescan, and the two paths agree. Soften the _core config-commit
comment, which overstated the invariant on a partial reconcile failure.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 4890fd8 to 05467e4 Compare June 22, 2026 14:38
…rt bring-up

A failed IPV6_V6ONLY read assumes dual-stack (returning False could loop
rebuilds when the rebuilt socket's read also fails), but if the socket really
were v6-only that skips a needed rebuild and strands an added IPv4 family, so
log it at warning rather than debug. Document that interface bring-up is
best-effort, and note the benign group leave on the rebuilt listen socket.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 05467e4 to 221daf7 Compare June 22, 2026 15:14
… bind failure

normalize_interface_choice raises RuntimeError for an All/Default instance
that transiently resolves to zero addresses during adapter churn; catch it as
a logged no-op so a momentary down state doesn't crash a caller's
adapter-change handler. Make a per-interface endpoint-creation failure roll
back and skip (log_warning_once) rather than abort the whole reconcile, so
other interfaces still come up and get re-announced, matching the documented
best-effort contract. Log a non-Windows IPV6_MULTICAST_IF read failure at
warning (Windows WSAEINVAL stays debug) since a wrong-index leave leaks the
membership.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 221daf7 to 963e1fe Compare June 22, 2026 15:45
Best-effort bring-up should downgrade an expected socket-level failure, not a
real bug. Roll back on any wrap failure, but re-raise anything that is not an
OSError so a TypeError/AttributeError surfaces instead of being deduped into a
one-time 'interface not added' warning.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 963e1fe to 1cad66f Compare June 22, 2026 16:19
The Default dual-use conversion demoted the socket from senders before the
fallible rebuild, so a rebuild failure left it pulled from senders but not
replaced (instance stops responding) and propagated into the caller's handler.
The explicit demote was redundant; the rebuild removes the old transport from
senders itself, but only after the new socket succeeds. Drop the pre-demote
(keep only the diff-view pop) so a failed rebuild leaves senders intact, and
catch the rebuild failure as a logged no-op to honor the best-effort contract.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 1cad66f to 9e14e0f Compare June 22, 2026 17:00
bdraco added 2 commits June 22, 2026 12:12
gather(return_exceptions=True) also captures BaseExceptions like
CancelledError, which the isinstance(result, Exception) check skipped, so a
cancelled re-announce vanished silently; re-raise a captured BaseException
instead of swallowing it. Document on the public update_interfaces methods
that apple_p2p on a non-Apple platform raises RuntimeError.
@bdraco bdraco force-pushed the feat/interface-change-monitor branch from 9e14e0f to ca0e061 Compare June 22, 2026 17:12
Base automatically changed from feat/async-update-interfaces to master June 22, 2026 17:45
@bdraco

bdraco commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

@bdraco bdraco closed this Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants