chore: CI/Windows hygiene — Node 24 action bumps + write-file-atomic EPERM retry#350
Conversation
Review verdict: APPROVE (with one minor observability suggestion)Mergeable: ✅ Both changes (Node-24 action bumps + write-file-atomic EPERM retry wrapper) land cleanly and the actual Windows EPERM flake reproduced live during the smoke run on this Windows box — the retry wrapper absorbed it and the pre-existing Per-task results
Live evidence of the fix workingDuring That's the dev's BEFORE/AFTER claim landing in a third-party environment — exactly what the brief flagged as a praise candidate. Observability gap (minor, non-blocking)The brief asked for "debug log: fires on every retry" and the PR body itself claims "a new console.debug line fires on every retry so production flakes are observable", but the wrapper actually emits a single summary log on success after retries and nothing at all on budget exhaustion. The tests pin this exact behaviour:
Information-content-wise the success path is fine (one line per recovered call with if (!retryable || attempt >= MAX_ATTEMPTS - 1) {
if (retryable) {
console.debug(`[runtime-file] giving-up path=${p} attempts=${attempt + 1} code=${code}`);
}
throw err;
}Closes the observability hole without changing the retry contract or test count. Not blocking — happy to merge as-is and chase this in a follow-up. Praise
— review automated; verdict APPROVE. |
LangSensei
left a comment
There was a problem hiding this comment.
Verdict: APPROVE (submitted as a COMMENT review because GitHub forbids self-approval on PRs authored by the reviewing actor; the full APPROVE per-task breakdown is in the issue comment posted alongside this review.)
8 of 8 tasks pass on this Windows smoke run, including live reproduction of the EPERM flake ([runtime-file] retried-write … attempts=2 fired during runtime-file.test.ts > "write is atomic", and the test passed deterministically).
One minor non-blocking observability suggestion left inline — the budget-exhaustion path is silent, which diverges from the brief's wording ("fires on every retry") and from the PR body's own claim. Happy to merge as-is and chase in a follow-up.
Praise:
- Genuine BEFORE/AFTER — the flake reproduced live on a third-party Windows box and the wrapper absorbed it.
- Deterministic mocked retry tests (no real fs races).
- Clean file-tree disjointness between the two bundled changes.
- Decision notes in PR body justify v5-not-v6, dedup, and log-tag choices up front.
| } catch (err) { | ||
| const code = (err as NodeJS.ErrnoException).code; | ||
| const retryable = code === "EPERM" || code === "EBUSY" || code === "EACCES"; | ||
| if (!retryable || attempt >= MAX_ATTEMPTS - 1) throw err; |
There was a problem hiding this comment.
Minor (non-blocking) — observability gap on budget-exhaustion path.
The brief asked for "debug log fires on every retry" and the PR description itself says "a new console.debug line fires on every retry so production flakes are observable." But this wrapper only emits a single summary log on success after retries — the exhaustion path is completely silent, as the new test asserts at runtime-file-retry.test.ts:97-98 (expect(debugSpy).not.toHaveBeenCalled() when persistent EPERM blows the budget).
Information-content-wise the success path is fine, but exhaustion is the case operators most need to see. Suggested fix:
if (!retryable || attempt >= MAX_ATTEMPTS - 1) {
if (retryable) {
console.debug(`[runtime-file] giving-up path=${p} attempts=${attempt + 1} code=${code}`);
}
throw err;
}Same change in packages/runtime/src/copilot/trust.ts for symmetry. Happy to merge as-is and chase in a follow-up.
| } catch (err) { | ||
| const code = (err as NodeJS.ErrnoException).code; | ||
| const retryable = code === "EPERM" || code === "EBUSY" || code === "EACCES"; | ||
| if (!retryable || attempt >= MAX_ATTEMPTS - 1) throw err; |
There was a problem hiding this comment.
Minor (non-blocking) — same observability gap as in packages/cli/src/runtime-file.ts:78: the budget-exhaustion branch is silent. trust-retry.test.ts:105-106 pins that behaviour (expect(debugSpy).not.toHaveBeenCalled() on 8-attempt persistent EPERM). Operators only see the resulting TrustRegistrationFailed, never the retry trail that preceded it. See the matching comment on runtime-file.ts:78 for a suggested one-liner.
| return err; | ||
| }; | ||
|
|
||
| describe("ensureDirTrusted retry behaviour", () => { |
There was a problem hiding this comment.
Nit (non-blocking) — symmetry with runtime-file-retry.test.ts. The cli copy has an explicit "does not log on a first-try success (zero-overhead happy path)" assertion (runtime-file-retry.test.ts:101); this file doesn't. The happy path is covered implicitly by the unchanged trust.test.ts suite, so it isn't a gap — just an asymmetry that's cheap to close if you do another pass over this file.
…EPERM retry Bundles two small CI/Windows hygiene items with zero file-tree intersection (workflow YAML vs runtime/cli write paths) so reviewers see one focused PR instead of two. GitHub Actions Node 24 deprecation cuts over on 2026-06-16. Bump the three named actions to the minimum Node-24-compatible major (v5): - actions/checkout@v4 -> @v5 (Node 24 runtime; no input changes) - actions/setup-node@v4 -> @v5 (Node 24 runtime; auto-cache changes are gated on omitting `cache:`, so our explicit `cache: 'pnpm'` preserves behaviour) - pnpm/action-setup@v4 -> @v5 (Node 24 runtime; reads `packageManager` from package.json same as v4) Add a bounded-retry wrapper around `write-file-atomic` at the three known wrap sites (cli runtime-file plus the two trust.ts calls) to fix the Windows EPERM/EBUSY flake where `MoveFileEx` loses to AV scans or concurrent renames. Budget is 8 attempts with exponential backoff capped at 500 ms; only EPERM/EBUSY/EACCES are retried, all other errors propagate immediately. A new `console.debug` line fires on every retry so production flakes are observable. Closes #269. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
000db2e to
f212424
Compare
|
Amended in Addresses the observability-gap minor: Tests in Verification: |
Re-review — observability amend (focused) — APPROVEAmended head:
Per-task verdictObservability gapClosed. A persistent retryable failure now leaves exactly one breadcrumb ( LGTM. |

What
Two small CI/Windows hygiene items bundled into one PR.
actions/checkout,actions/setup-node, andpnpm/action-setupbumped from@v4to@v5across all three workflow files, ahead of the 2026-06-16 Node-20 deprecation/forced-upgrade cutover.writeFileAtomicEPERM retry. Closes write-file-atomic: bounded retry on Windows EPERM/EBUSY (fix concurrent-write flakes) #269. Add a bounded-retry wrapper aroundwrite-file-atomicat the three known wrap sites (packages/cli/src/runtime-file.tsonce +packages/runtime/src/copilot/trust.tstwice) to fix Windows EPERM/EBUSY flakes where the singleMoveFileExcall inside the library loses to Defender scans or concurrent renames.Why bundle the two
Both are small CI/Windows hygiene items with zero file-tree intersection (workflow YAML vs runtime/cli write paths). Splitting would double reviewer load with no isolation benefit.
Why these specific versions for Change A
actions/checkout@v4→@v5actions/setup-node@v4→@v5packageManageris present; we already passcache: 'pnpm'explicitly so behaviour is unchanged.pnpm/action-setup@v4→@v5versionkeeps reading from rootpackage.json'spackageManager: pnpm@10.33.4.Decision: v5, not v6. All three actions still receive maintenance on v5.x (e.g.
actions/checkout@v5.0.1, 2025-11-17). v6 is not unmaintained EOL territory for v5; per pilot brief default, pick the minimum Node-24-compatible major to minimise churn and blast radius.Out of scope per brief:
actions/configure-pages@v5(already on v5),actions/upload-pages-artifact@v3,actions/deploy-pages@v4, andlycheeverse/lychee-action@v2are NOT in the three named actions and were not touched. The two still on v3/v4 (upload-pages-artifact,deploy-pages) are candidates for a follow-up but are not part of this PR.Why this specific shape for Change B
write-file-atomic@5.0.1does a singlefs.renamewith zero retries; on Windows that maps toMoveFileEx(MOVEFILE_REPLACE_EXISTING), which returnsEPERM/EBUSYwhile Defender scans the temp file or a sibling rename is in flight. Adds a 15-line wrapper:runtime-file.test.ts > "write is atomic"test that already assumed this budget).2^attempt + jitter, capped at 500 ms. Worst-case total under 1 second.EPERM,EBUSY,EACCES. Everything else (ENOSPC,EROFS, …) throws immediately — never paper over a real disk problem.console.debug(...)fires on success-after-retry with the path and final attempt count. We do not pull in a logging dep just for this;@emploke/cliand@emploke/runtimealready use plainconsole.*(the structured pino logger lives in@emploke/server).runtime-file.tsandtrust.tsrather than shared. There is no clean shared location —@emploke/cli(T_top) and@emploke/runtime(T0) don't depend on each other and adding a cross-tier dep for ~15 lines is overkill. Acknowledged trade-off; rolled with "duplicate" per brief's fallback ("verify there is a sensible shared location, otherwise duplicate").Kept the pre-existing concurrent-write tests
Per brief default: kept, not deleted.
packages/cli/test/runtime-file.test.ts > "write is atomic"packages/runtime/test/copilot/trust.test.ts > "serialises concurrent calls"(+ the lost-update sibling)They still document the on-POSIX behaviour end-to-end and (on Windows) are now actually deterministic because the retry wrapper absorbs the very EPERM that used to flake them — verified locally on Windows: before this PR
runtime-file.test.ts > "write is atomic"fails with EPERM ~50% of the time; after, it passes consistently. The new mocked-rename tests pin the retry decision tree deterministically on every OS.Changes
.github/workflows/ci.ymlactions/checkout, 2×actions/setup-node, 2×pnpm/action-setupto@v5.github/workflows/pages.ymlactions/checkoutto@v5.github/workflows/release.ymlactions/checkout, 1×actions/setup-node, 1×pnpm/action-setupto@v5packages/cli/src/runtime-file.tswriteFileAtomicWithRetry; routewriteRuntimeFilethrough itpackages/runtime/src/copilot/trust.tswriteFileAtomicWithRetry; route both wrap sites through it (touch + final config write)packages/cli/test/runtime-file-retry.test.tspackages/runtime/test/copilot/trust-retry.test.tsTrustRegistrationFailedpropagationHow to Test
On Windows specifically,
pnpm --filter @emploke/cli testshould pass consistently — prior to this PR it flaked ~50% onruntime-file.test.ts > "write is atomic"withEPERM: operation not permitted, rename.Cross-cutting compliance
@v4references for the three named actions removed.node-version: '22'left unchanged everywhere (the bump is the action's own runtime, not the toolchain Node it sets up).writeFileAtomicWithRetryused at all three wrap sites.git diff origin/main | grep -E '^\+' | grep -Ei '(#\d{2,4}\b|see issue|per the brief|per the spec|TODO\(.*#)'returns zero hits inside the code diff.setTimeoutis fromnode:timers/promises, not the global.write-file-atomicversion bump; no cross-process locking added; no new logging dependency.Note on the Node 24 action deprecation
There is no GitHub issue for the Node 24 action bump (it is calendared by GitHub's deprecation schedule, not tracked in this repo's issue tracker). PR body therefore does NOT close any issue for Change A; only
Closes #269for Change B.