ci: replace ttl.sh with artifacts and reuse prebuilt binaries for Docker by slayerjain · Pull Request #4097 · keploy/keploy · GitHub

ci: replace ttl.sh with artifacts and reuse prebuilt binaries for Docker #4097

Merged
slayerjain merged 12 commits into main from feat/docker-artifacts-reuse-binaries
Apr 21, 2026


@slayerjain
Member

@slayerjain slayerjain commented Apr 21, 2026

Describe the changes that are made

Three structural changes to the CI + release Docker pipeline, plus a correctness fix for a data race that this PR's stricter CI surfaced:

  1. GitHub artifacts replace ttl.sh for the PR-build docker image. The new composite action .github/actions/download-image/action.yml runs actions/download-artifact + docker load instead of docker pull ttl.sh/keploy/keploy:<tag>. Images never leave GitHub — no external registry dependency.

  2. One docker image build per architecture, across all platforms. prepare_and_run_macos.yml and prepare_and_run_windows.yml are merged into prepare_and_run.yml. The consolidated workflow builds docker-image-linux-amd64 once (consumed by linux tests + self-hosted Windows) and docker-image-linux-arm64 once (consumed by self-hosted macOS), saves each as a workflow artifact, and fans out to every downstream test.

  3. Dockerfile reuses prebuilt binaries for both PR CI and release. A new Dockerfile.runtime COPYs dist/keploy-linux-$TARGETARCH/keploy from the build context and skips the go build stage entirely. The main Dockerfile is unchanged — external users building from source still get the self-contained multi-stage image. For release, goreleaser.yaml gains a keploy-docker build entry that emits CGO_ENABLED=0 linux/{amd64,arm64} binaries with SENTRY_DSN_DOCKER baked into ldflags (preserving the existing separation from the public SENTRY_DSN_BINARY). release.yml's build-docker job now downloads those staged binaries and runs a pure COPY-only multi-arch docker buildx build --push. The old cp Dockerfile Dockerfile.release && sed ... SSH-mount dance is gone.

  4. Data race in protocol matchers — fixed. This PR's switch away from the ttl.sh pipeline made the previously-broken update-docker.sh add_race_flag regex irrelevant (that regex silently failed to match RUN GOMAXPROCS=2 go build …, so historic linux docker images were never actually race-enabled). Under the new flow the -race binary surfaced a real concurrency bug in HTTP / MySQL / generic matchers: updateMock() mutated matchedMock.TestModeInfo.IsFiltered and SortOrder on a pointer handed out of the shared MockMemDb, racing against the struct-copy read on the line above when two concurrent requests matched the same session-lifetime mock. Fix builds a fresh shallow copy, mutates the copy, and passes (old=matchedMock, new=&updatedMock) to UpdateUnFilteredMock — the DB already serialises the swap under treesMu. Applied to pkg/agent/proxy/integrations/{http,mysql/replayer,generic}/match.go for consistency; pkg/util.go:filterByTimeStamp already did DeepCopy() up front for this exact reason and is untouched.
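To see why the old add_race_flag edit never fired, here is a small Go sketch using the standard regexp package. It re-expresses the pattern in RE2 syntax (so the parentheses act as a group; the original sed expression may have been parsed differently, but the GOMAXPROCS= prefix defeats it either way). The part of the Dockerfile line after `go build` is elided in the source, so the `-o keploy .` tail below is illustrative, and tolerantPattern is a hypothetical fix that became unnecessary once update-docker.sh was deleted.

```go
package main

import (
	"fmt"
	"regexp"
)

// oldPattern is the add_race_flag match from update-docker.sh, re-expressed
// as an RE2 regexp. It requires "go" to follow RUN immediately.
var oldPattern = regexp.MustCompile(`^(RUN\s+go\s+build)\s`)

// tolerantPattern is a hypothetical alternative that also accepts
// VAR=value environment assignments between RUN and "go build".
var tolerantPattern = regexp.MustCompile(`^RUN\s+(?:\S+=\S*\s+)*go\s+build\s`)

func main() {
	// Illustrative Dockerfile line; the real tail after "go build" is elided.
	line := "RUN GOMAXPROCS=2 go build -o keploy ."
	fmt.Println("old pattern matches:", oldPattern.MatchString(line))
	fmt.Println("tolerant pattern matches:", tolerantPattern.MatchString(line))
}
```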

Performance impact

  • PR docker build: ~5 min (go build + qemu arm64) → ~10 s (COPY + apt install) per arch, amortised across all platform consumers
  • Release docker build: ~15 min (arm64 cross-compile under qemu on amd64 runner) → ~1 min (pure COPY of goreleaser binaries)
  • No qemu emulation anywhere: arm64 is either built natively (ubuntu-24.04-arm for CI) or cross-compiled without cgo by goreleaser (release)

Branch protection

The three required checks keep their exact names — no branch-protection update needed on merge:

  • CI Gate (Linux)
  • CI Gate (macOS)
  • CI Gate (Windows)

Note: artifact name renames

So the consolidated workflow can hold all platform binaries side-by-side without collision:

  • macOS consumers pull build-darwin (was build in the old macOS-only workflow)
  • Windows consumers pull build-windows (was build in the old Windows-only workflow)

Linux consumers keep build / build-no-race / latest — important because prepare_and_run_integrations.yml is called from external repos and still publishes those names.

Docker images use the non-race build-no-race / build-no-race-linux-arm64 artifacts (race runtime roughly doubles binary size and inflates baseline memory enough to trip go-memory-load's 250 MiB in-container threshold; race coverage is preserved at the native-linux level where run_golang_linux consumes the race-enabled build artifact).
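The binary routing above can be summarised as a lookup. This Go sketch is a hypothetical helper (the workflows hardcode these names in YAML); the artifact names come from this PR, and the error for an arm64 race build reflects the later commit that made build-linux-arm64 non-race only.

```go
package main

import "fmt"

// binaryArtifact returns the linux binary artifact name for an architecture
// and race setting, using the names listed in this PR. Hypothetical helper:
// the workflows hardcode these names in YAML.
func binaryArtifact(arch string, race bool) (string, error) {
	switch {
	case arch == "amd64" && race:
		return "build", nil // native run_golang_linux keeps race coverage
	case arch == "amd64":
		return "build-no-race", nil
	case arch == "arm64" && !race:
		return "build-no-race-linux-arm64", nil
	}
	return "", fmt.Errorf("no linux binary artifact for arch=%q race=%v", arch, race)
}

// dockerImageBinary always picks the non-race build: the race runtime
// inflates memory enough to trip go-memory-load's 250 MiB threshold.
func dockerImageBinary(arch string) (string, error) {
	return binaryArtifact(arch, false)
}

func main() {
	for _, arch := range []string{"amd64", "arm64"} {
		name, err := dockerImageBinary(arch)
		fmt.Println(arch, "->", name, err)
	}
}
```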

Links & References

Closes: NA

  • NA

🔗 Related PRs

  • NA

🐞 Related Issues

  • NA

📄 Related Documents

  • .github/CI_CONTRIBUTING.md (updated in this PR)

What type of PR is this? (check all applicable)

  • 📦 Chore
  • 🍕 Feature
  • 🐞 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • 🧑‍💻 Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🔁 CI
  • ⏩ Revert

Added e2e test pipeline?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help

This PR is the pipeline. The existing matrix of golang / python / node / java / fuzzer / grpc / schema-match jobs now runs against the reworked builder — they exercise the new artifact flow end-to-end on every PR run. The data-race fix is validated by go test -race ./pkg/agent/proxy/integrations/http/ locally and by the run_golang_linux jobs (race-enabled build artifact) plus the docker replay matrix in CI.

Added comments for hard-to-understand areas?

  • 👍 yes
  • 🙅 no, because the code is self-explanatory

Added to documentation?

  • 📜 README.md
  • 📓 Wiki
  • 🙅 no documentation needed

.github/CI_CONTRIBUTING.md updated in-tree — ttl.sh references swapped for artifact-based flow, and the "Adding a new workflow" checklist now shows the artifact_name input.

Are there any sample code or steps to test the changes?

  • 👍 yes, mentioned below
  • 🙅 no, because it is not needed

Watch this PR's own CI run — it exercises every path:

  • build-docker-image-amd64 / build-docker-image-arm64 produce the image tarballs
  • Every run_*_docker* job runs docker load on its image artifact before running samples
  • run_golang_linux / proxy-stress-test and run_golang_docker / go-memory-load validate the race-free matcher under load
  • The release path is not exercised by PR CI; it runs on tag push. First post-merge v*.*.* tag will validate Dockerfile.runtime + keploy-docker goreleaser builds end-to-end.

Self Review done?

  • ✅ yes
  • ❌ no, because I need help

Any relevant screenshots, recordings or logs?

  • NA

Additional checklist:

Dockerfile.runtime is a thin runtime-only image (debian:trixie-slim +
ca-certificates + the keploy binary + entrypoint.sh). It expects a
prebuilt binary at dist/keploy-linux-$TARGETARCH/keploy in the build
context, so builds are COPY-only (~10s) instead of going through an
in-container 'go build'. Multi-arch uses BuildKit's TARGETARCH.

The main Dockerfile is left alone so external users who want to build
from source still get a self-contained image.

update-docker.sh and update-docker-ci.sh existed only to sed the main
Dockerfile for -race / SSH mounts / GOPRIVATE at CI time. Those
mutations move upstream into the binary-build step (where -race /
CGO / the private module access already live), so the scripts become
dead code.

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
…image

The composite action previously pulled a ttl.sh-hosted stand-in for
ghcr.io/keploy/keploy:v3-dev. It now downloads a workflow artifact named
docker-image-linux-<arch> (produced by the new build-docker-image-*
jobs), runs 'docker load', and re-tags the loaded image to
ghcr.io/keploy/keploy:v<version> so samples find it at the expected
name.
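The retag target is parameterised by the action's version input. A minimal Go sketch of the naming rule, assuming the 3-dev default mentioned later in review; retagReference is a hypothetical helper, and treating an empty string as "use the default" is a simplification of how GitHub Actions applies input defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultVersion mirrors the download-image `version` input default.
const defaultVersion = "3-dev"

// retagReference returns ghcr.io/keploy/keploy:v<version>, the name samples
// look for after `docker load`.
func retagReference(version string) string {
	v := strings.TrimSpace(version)
	if v == "" {
		// Simplification: Actions applies the default only when the input
		// is omitted, not when it is passed as an empty string.
		v = defaultVersion
	}
	return "ghcr.io/keploy/keploy:v" + v
}

func main() {
	fmt.Println(retagReference(""))
	fmt.Println(retagReference("0123abcd"))
}
```

Workflows that pass `version: ${{ github.sha }}` therefore end up with a `v<sha>` tag rather than `v3-dev`.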

Input renamed from image_tag to artifact_name. Default points at
docker-image-linux-amd64 (the artifact most callers need). Caller workflows
(golang_docker, python_docker, node_docker, *_macos, *_windows) are
updated in a follow-up commit.

Drops the external ttl.sh dependency — the image stays inside
GitHub's artifact store and is scoped to the workflow run.

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
Each docker consumer now points download-image at the image artifact
for its runner's architecture:

  * golang_docker / python_docker / node_docker (linux amd64) and
    golang_docker_windows (self-hosted Windows runs linux/amd64
    containers via Docker Desktop + WSL2) -> docker-image-linux-amd64
  * golang_docker_macos / python_docker_macos (self-hosted Apple
    Silicon) -> docker-image-linux-arm64
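The per-runner choice reduces to a small mapping on GitHub's runner.os / runner.arch context values. imageArtifact is a hypothetical Go helper for illustration only; the real workflows pass artifact_name explicitly.

```go
package main

import "fmt"

// imageArtifact maps GitHub's runner.os / runner.arch values to the docker
// image artifact a consumer loads. Hypothetical helper for illustration.
func imageArtifact(runnerOS, runnerArch string) (string, error) {
	switch {
	case runnerOS == "Linux" && runnerArch == "X64":
		return "docker-image-linux-amd64", nil
	case runnerOS == "Windows" && runnerArch == "X64":
		// Docker Desktop + WSL2 runs linux/amd64 containers.
		return "docker-image-linux-amd64", nil
	case runnerOS == "macOS" && runnerArch == "ARM64":
		// Self-hosted Apple Silicon runs linux/arm64 containers.
		return "docker-image-linux-arm64", nil
	}
	return "", fmt.Errorf("no docker image artifact for %s/%s", runnerOS, runnerArch)
}

func main() {
	for _, p := range [][2]string{{"Linux", "X64"}, {"Windows", "X64"}, {"macOS", "ARM64"}} {
		name, err := imageArtifact(p[0], p[1])
		fmt.Println(p[0], p[1], "->", name, err)
	}
}
```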

The now-unused image_tag workflow_call input is removed from each
file.

The binary artifact names also split per-platform so a single
workflow run can hold them side-by-side without collision: the
macOS consumers now pull build-darwin, Windows consumers pull
build-windows. The linux consumers keep 'build' / 'build-no-race'
/ 'latest' unchanged, which is the name the shared
prepare_and_run_integrations reusable workflow still publishes.

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
…and_run

Three workflows (prepare_and_run.yml, prepare_and_run_macos.yml,
prepare_and_run_windows.yml) triggered on every push/PR to main and
each built their own copy of the keploy docker image. Merging them
lets one docker-image-build job per architecture feed every downstream
test job in the same workflow run via GitHub artifacts.

Layout:

  binary builds
    build-and-upload        linux/amd64 race + non-race (artifact: build, build-no-race)
    build-linux-arm64       linux/arm64 race (artifact: build-linux-arm64)
    build-darwin-arm64      darwin/arm64 cross (artifact: build-darwin)
    build-windows-amd64     windows/amd64 cgo native (artifact: build-windows)
    upload-latest           last published release amd64 tarball (artifact: latest)

  docker image builds
    build-docker-image-amd64  ubuntu-latest, downloads 'build', runs
                              docker buildx build --file Dockerfile.runtime
                              --output type=docker,dest=image.tar, uploads
                              docker-image-linux-amd64
    build-docker-image-arm64  ubuntu-24.04-arm, downloads build-linux-arm64,
                              same buildx flow, uploads docker-image-linux-arm64

  linux tests: run_* (ubuntu-latest) consume docker-image-linux-amd64
  macOS tests: self-hosted Apple Silicon consume docker-image-linux-arm64
  Windows tests: self-hosted Windows consume docker-image-linux-amd64 (Docker
    Desktop on Windows runs linux/amd64 containers via WSL2)

Three gate jobs preserved for branch protection: 'CI Gate (Linux)',
'CI Gate (macOS)', 'CI Gate (Windows)'. Names match the previous
per-workflow gates so the required-check set does not need to change
on merge.

The consolidation also lets the macOS and Windows image builds run on
GitHub-hosted ubuntu runners (ubuntu-latest, ubuntu-24.04-arm) instead
of the macOS / Windows test runners: the docker tarball is a cheap
COPY-only build thanks to Dockerfile.runtime's prebuilt binary.

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
…lding

The release docker build previously copied Dockerfile to
Dockerfile.release, sed-injected SSH mounts + GOPRIVATE, then ran
'docker buildx build --platform linux/amd64,linux/arm64' which
recompiled keploy inside the image — cross-compiling arm64 on the
amd64 runner under qemu was the slow path (~15 min).

Now goreleaser.yaml gains a 'keploy-docker' build entry, run by build-go,
that produces CGO_ENABLED=0 linux/{amd64,arm64} binaries with
SENTRY_DSN_DOCKER baked into -X main.dsn (separate from the public
SENTRY_DSN_BINARY used by the distributable archive builds so
telemetry from containerised users stays disentangled). goreleaser
cross-compiles these in seconds because there is no cgo.
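For readers unfamiliar with the -X main.dsn mechanism: the Go linker's -X flag overwrites a package-level string variable at link time, which is how a per-build DSN can be baked in without source changes. A minimal sketch; the telemetryEnabled guard is hypothetical, and keploy's actual telemetry wiring is not shown in this PR.

```go
package main

import "fmt"

// dsn stays empty under a plain `go build`. A release build can set it with,
// for example:
//
//	CGO_ENABLED=0 GOOS=linux GOARCH=arm64 \
//	  go build -ldflags "-X main.dsn=$SENTRY_DSN_DOCKER" .
//
// -X only applies to package-level string variables, not constants.
var dsn string

// telemetryEnabled is a hypothetical guard: report errors only when a DSN
// was injected at link time.
func telemetryEnabled() bool {
	return dsn != ""
}

func main() {
	fmt.Println("telemetry enabled:", telemetryEnabled())
}
```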

build-go uploads each as a workflow artifact
(release-keploy-linux-{amd64,arm64}). build-docker downloads both,
stages them under dist/keploy-linux-<arch>/, and runs a multi-arch
'docker buildx build --file Dockerfile.runtime --push' — a pure
COPY + apt install with no compilation at all. No more Dockerfile
mutation, no more ssh forwarding inside docker, no more qemu.

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
@slayerjain slayerjain requested a review from gouravkrosx as a code owner April 21, 2026 18:00
Copilot AI review requested due to automatic review settings April 21, 2026 18:00
@github-actions

Principal-review follow-ups on #4097. No behavioural change on the
happy path; hardening for edge cases.

* download-image/action.yml (Unix): anchor the 'Loaded image: '
  regex with a leading caret and trailing space so the
  'Loaded image ID: sha256:...' variant (emitted when the tarball
  has no RepoTags) can't silently retag a digest. NR==1 + exit
  keeps selection deterministic without a 'head -n1' that could
  SIGPIPE awk under pipefail. On miss, log the first lines of
  docker load output so failures are debuggable.

* download-image/action.yml (Windows): anchor ^Loaded image: and
  .Trim() the capture so the trailing CR from docker CLI's CRLF
  line endings doesn't end up in the retag reference (docker tag
  rejects 'keploy/keploy:ci\r' as invalid format). Also log full
  docker load output on miss.

* release.yml: replace 'ls -d ... | head -n1' with an explicit
  shell nullglob array so an empty dist/ layout after a
  goreleaser change fails loudly instead of 'cp'-ing from an
  empty string under set -e.

* prepare_and_run.yml: bump image-artifact retention from 1 day
  to 7 days. Re-running a failed platform gate more than 24h
  after the original workflow run hit 'artifact not found',
  because the upstream build jobs kept their earlier success and
  were not re-run to regenerate the expired artifacts. Two
  tarballs per run are cheap to keep.

* prepare_and_run.yml: document the shared-concurrency-group
  trade-off so a future reader understands why cleanup jobs
  can be concurrency-cancelled across all three OS legs
  simultaneously rather than per-workflow.

* download-binary/action.yml: broaden the 'src' input description
  — 'build | latest' was stale after the build-darwin rename.
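The parsing rules above (anchor on "Loaded image: ", ignore the "Loaded image ID:" form, drop the CR, and show the captured output on a miss) can be mirrored in Go for clarity. parseLoadedImage is a hypothetical helper, not part of the action; unlike the awk one-liner, it scans every line and returns the first tagged reference, and it reuses one captured output instead of running docker load twice.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const loadedPrefix = "Loaded image: "

// parseLoadedImage returns the first tagged reference in captured
// `docker load` output. Untagged tarballs print "Loaded image ID: sha256:...",
// which does not start with loadedPrefix and is therefore skipped.
func parseLoadedImage(output string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		// bufio.ScanLines drops the CR of a CRLF line ending, which covers
		// the Windows case the .Trim() fix addresses.
		if ref, ok := strings.CutPrefix(sc.Text(), loadedPrefix); ok {
			if ref = strings.TrimSpace(ref); ref != "" {
				return ref, nil
			}
		}
	}
	// No tag found: fold the captured output onto one line so it fits in a
	// single ::error:: annotation, without re-running docker load.
	var lines []string
	for _, l := range strings.Split(output, "\n") {
		if l = strings.TrimSpace(l); l != "" {
			lines = append(lines, l)
		}
	}
	return "", fmt.Errorf("docker load did not report a tagged image (got: %s)", strings.Join(lines, "; "))
}

func main() {
	ref, err := parseLoadedImage("Loaded image: keploy/keploy:ci\r\n")
	fmt.Printf("%q %v\n", ref, err)
}
```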

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the CI and release Docker pipelines to remove the external ttl.sh dependency by shipping Docker images as GitHub Actions artifacts, and to speed up Docker image builds by reusing prebuilt binaries (CI + release) via a new runtime-only Dockerfile.

Changes:

  • Replace ttl.sh pull/push flow with image.tar workflow artifacts loaded via a composite action (download-image).
  • Consolidate macOS/Windows orchestrator workflows into a single prepare_and_run.yml that builds once per arch and fans out.
  • Add a keploy-docker GoReleaser build + Dockerfile.runtime to build/push release images by COPY’ing staged binaries (no in-container go build).

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Summary per file:

  • goreleaser.yaml: Adds keploy-docker build (linux/{amd64,arm64}) with separate Sentry DSN for Docker images.
  • Dockerfile.runtime: New runtime-only Dockerfile that copies prebuilt dist/keploy-linux-$TARGETARCH/keploy.
  • .github/workflows/release.yml: Stages GoReleaser docker binaries as artifacts and builds/pushes multi-arch image using Dockerfile.runtime.
  • .github/workflows/prepare_and_run.yml: Consolidates orchestration; builds binaries + docker image tar artifacts per arch; updates fan-out + gates.
  • .github/actions/download-image/action.yml: Switches from docker pull ttl.sh/... to download-artifact + docker load + retag.
  • .github/workflows/node_docker.yml: Loads docker image from artifact instead of pulling by tag input.
  • .github/workflows/python_docker.yml: Loads docker image from artifact instead of pulling by tag input.
  • .github/workflows/python_docker_macos.yml: Uses arch-specific image artifact + updated darwin build artifact name.
  • .github/workflows/golang_docker.yml: Loads docker image from artifact instead of pulling by tag input.
  • .github/workflows/golang_docker_macos.yml: Uses arch-specific image artifact + updated darwin build artifact name.
  • .github/workflows/golang_docker_windows.yml: Uses image artifact + updates Windows build artifact name.
  • .github/workflows/golang_native_windows.yml: Updates Windows build artifact name (build-windows).
  • .github/CI_CONTRIBUTING.md: Updates contributor documentation for artifact-based docker image flow.
  • .github/workflows/prepare_and_run_macos.yml: Removed (replaced by consolidated prepare_and_run.yml).
  • .github/workflows/prepare_and_run_windows.yml: Removed (replaced by consolidated prepare_and_run.yml).
  • .github/workflows/test_workflow_scripts/update-docker.sh: Removed (ttl.sh/buildkit SSH rewrite flow no longer needed).
  • .github/workflows/test_workflow_scripts/update-docker-ci.sh: Removed (ttl.sh single-platform CI build helper no longer needed).


Comment thread .github/CI_CONTRIBUTING.md Outdated
Comment on lines +41 to +45
@@ -42,7 +42,7 @@ Advantages are identical to the binary‑artifact strategy – plus we keep our
| -------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `.github/workflows/prepare_and_run.yml` | The *aggregator* – builds the PR binary, downloads `latest`, uploads both as artifacts **and** builds + pushes the one Docker image. Then it fans‑out to language/sample workflows. |
| `.github/actions/download-binary/action.yml` | Composite action – downloads **one** of those two binary artifacts and outputs its absolute path. |
| `.github/actions/download-image/action.yml` | Composite action – pulls the temporary image from `ttl.sh`, re‑tags it to `ghcr.io/keploy/keploy:v3-dev`, and makes it available for the sample. |
| `.github/actions/download-image/action.yml` | Composite action – downloads the image artifact, `docker load`s it, and re‑tags to `ghcr.io/keploy/keploy:v3-dev` so samples find it at the expected name. |

Copilot AI Apr 21, 2026


The Key files table still says prepare_and_run.yml “builds + pushes the one Docker image”, but the workflow has moved to uploading image.tar artifacts and loading them downstream (no registry push). This line should be updated to avoid sending contributors to the old mental model.

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +66
docker tag "$LOADED" "ghcr.io/keploy/keploy:v${{ inputs.version }}"
docker images | awk 'NR==1 || /keploy\/keploy/ {print}'

# Verify the image architecture matches the host
echo "Image architecture:"
docker image inspect "${IMAGE_REF}" --format '{{.Os}}/{{.Architecture}}' || true
echo "Docker server architecture:"
docker version --format '{{.Server.Os}}/{{.Server.Arch}}' || true
docker image inspect "$LOADED" --format '{{.Os}}/{{.Architecture}}' || true

- name: Pull Image (Windows)
- name: Load image (Windows)
if: runner.os == 'Windows'
shell: pwsh
run: |
$ErrorActionPreference = 'Stop'
$IMAGE_REF = "${{ inputs.image_tag }}"
Write-Host "Pulling $IMAGE_REF ..."
docker pull $IMAGE_REF

# Retag to the name your stack expects (works for tags and digests)
docker tag $IMAGE_REF ghcr.io/keploy/keploy:v${{ inputs.version }}
docker images --format "{{.Repository}}:{{.Tag}}" | Select-String -Pattern "keploy/keploy" -SimpleMatch
\ No newline at end of file
$tar = Join-Path "${{ runner.temp }}" "keploy-image\image.tar"
if (-not (Test-Path $tar)) {
Write-Error "expected image tarball at $tar"
Get-ChildItem (Join-Path "${{ runner.temp }}" "keploy-image") -Force -ErrorAction SilentlyContinue
exit 1
}
Write-Host "Loading image from $tar ..."
$loadOutput = docker load -i $tar
# Anchor at line start and trim the capture — docker CLI on
# Windows emits CRLF, which `(.+)$` would swallow into the
# capture group, yielding a trailing \r that makes

Copilot AI Apr 21, 2026


On Windows, this composite action runs the load step with shell: pwsh. Several Windows jobs that call download-image (e.g. golang_docker_windows.yml) only assume Windows PowerShell 5.1 is present, and your own workflow notes that some self-hosted runners don’t ship pwsh. This will make those jobs fail at the image-load step on such runners. Consider switching the Windows step to shell: powershell (PS 5.1) or adding an explicit fallback/bootstrapping inside the action before invoking pwsh.

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run  P50     P90     P99     RPS     Error Rate  Status
1    2.61ms  3.36ms  4.96ms  100.02  0.00%       ✅ PASS
2    2.48ms  3.18ms  4.51ms  100.00  0.00%       ✅ PASS
3    2.41ms  3.09ms  4.42ms  100.00  0.00%       ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers
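The multi-run rule in these reports can be sketched in Go. The thresholds are copied from the report; reading "RPS >= 100 (±1% tolerance)" as RPS >= 99 and treating each latency and error bound as strict are assumptions, and this is not the benchmark harness's actual code.

```go
package main

import "fmt"

// run holds one load-test pass as printed in the report.
type run struct {
	P50, P90, P99 float64 // latency percentiles in milliseconds
	RPS           float64 // requests per second
	ErrorRate     float64 // percent
}

// minRPS reads "RPS >= 100 (±1% tolerance)" as RPS >= 99; this
// interpretation is an assumption.
const minRPS = 99.0

// regressed reports whether a run breaks any threshold from the report.
func regressed(r run) bool {
	return r.P50 >= 5 || r.P90 >= 15 || r.P99 >= 70 || r.RPS < minRPS || r.ErrorRate >= 1
}

// evaluate counts regressed runs; the pipeline fails only when at least
// failThreshold runs regress.
func evaluate(runs []run, failThreshold int) (passed bool, failed int) {
	for _, r := range runs {
		if regressed(r) {
			failed++
		}
	}
	return failed < failThreshold, failed
}

func main() {
	// Numbers from the first report in this thread.
	runs := []run{
		{2.61, 3.36, 4.96, 100.02, 0.00},
		{2.48, 3.18, 4.51, 100.00, 0.00},
		{2.41, 3.09, 4.42, 100.00, 0.00},
	}
	passed, failed := evaluate(runs, 2)
	fmt.Printf("passed=%v, %d of %d runs regressed\n", passed, failed, len(runs))
}
```

Requiring two of three runs to regress means a single noisy run cannot fail the pipeline on its own.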

…review)

Addresses Copilot review on b6db664 and the go-memory-load /
proxy-stress-test regressions that surfaced in the same run.

1. Docker images are now built from non-race binaries.

   Investigation: on the PR's CI run, 'run_golang_docker / go-memory-load'
   failed with 'Keploy container exceeded 250.00 MiB during record',
   and 'run_golang_docker / proxy-stress-test' failed with
   'WARNING: DATA RACE / FAIL: Data race during replay'. Both are
   caused by the refactor switching the linux/amd64 docker image
   from the historically non-race binary to the 'build' race
   artifact. The old update-docker.sh had a broken add_race_flag
   BRE regex ('^(RUN\s+go\s+build)\s' did not match
   'RUN GOMAXPROCS=2 go build ...') that silently failed to insert
   -race into the in-container go build; the effective prior
   behaviour was a non-race linux docker image. Restore parity:

     * build-linux-arm64 now builds a non-race binary, uploaded
       as build-no-race-linux-arm64 (makes arm64 consistent with
       amd64 — the arm64 docker image is the only consumer).
     * build-docker-image-amd64 downloads build-no-race instead
       of build.
     * build-docker-image-arm64 downloads build-no-race-linux-arm64.

   Race-detection coverage is preserved at the native-linux level
   (run_golang_linux consumes the race-enabled 'build' artifact).
   Docker coverage is about integration semantics, not data-race
   scrutiny, and inflating the agent's memory baseline by the
   race runtime broke go-memory-load's calibrated threshold.

   Note: the 'DATA RACE during replay' surfaced by this PR is a
   real pre-existing bug in keploy's replay path that was masked
   by the broken CI script. Filing as a follow-up issue — out of
   scope for this CI refactor.

2. download-image Windows step switches shell: pwsh -> shell: powershell.

   Windows PowerShell 5.1 is always preinstalled on Windows runners;
   pwsh (7) is not. The build-windows-amd64 job has a Bootstrap
   PowerShell step that repairs missing pwsh, but
   pull-docker-image-windows (which invokes this composite action)
   has no such bootstrap, so a fleet runner without pwsh would
   fail the image-load step with 'pwsh: command not found'. All
   cmdlets and .NET calls used in the Windows branch (Join-Path,
   Test-Path, Get-ChildItem, Select-String, [Environment]::NewLine,
   .Trim(), [string]::IsNullOrWhiteSpace) are 5.1-compatible, so
   targeting the 5.1 floor costs nothing.

3. CI_CONTRIBUTING.md: Key files table row for prepare_and_run.yml
   updated — said 'builds + pushes the one Docker image' but the
   workflow now uploads image.tar artifacts (no registry push).

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run  P50     P90     P99     RPS     Error Rate  Status
1    2.67ms  3.55ms  5.18ms  100.02  0.00%       ✅ PASS
2    2.48ms  3.33ms  4.62ms  100.00  0.00%       ✅ PASS
3    2.42ms  3.21ms  4.52ms  100.02  0.00%       ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Fixes the data race surfaced by this PR's -race-enabled docker image
under proxy-stress-test (record_build_replay_build):

  WARNING: DATA RACE
  Read/Write at 0x... by goroutine <N>:
    go.keploy.io/server/v3/pkg/agent/proxy/integrations/http.(*HTTP).updateMock()
    pkg/agent/proxy/integrations/http/match.go:723 / :724 / :725
    pkg/agent/proxy/integrations/http.(*HTTP).decodeHTTP.func1()
    pkg/agent/proxy/integrations/http.(*HTTP).MockOutgoing()
    pkg/agent/proxy.(*Proxy).handleConnection()
    pkg/agent/proxy.(*Proxy).start.func4()

Root cause: updateMock() received matchedMock as a pointer handed
out of the shared MockMemDb. Two concurrent HTTP requests that
match the same session-lifetime mock receive the SAME pointer, so
the in-place mutations

    matchedMock.TestModeInfo.IsFiltered = false
    matchedMock.TestModeInfo.SortOrder  = pkg.GetNextSortNum()

race against another goroutine's struct-copy read of the same
memory (the 'originalMatchedMock := *matchedMock' line just
above). Multiple distinct DATA RACE reports in the CI log all
point at these three consecutive lines under decode.go:165's
per-request goroutine fan-out.

Fix: build a fresh shallow copy of the matched mock, mutate the
copy, and pass (old=matchedMock, new=&updatedMock) to
MockManager.UpdateUnFilteredMock. The mock DB already serialises
the tree swap under treesMu; the only missing piece was callers
not racing on the caller-visible pointer before handing it in.
Escape analysis heap-allocates the copy when taking its address,
so pool-stored pointers remain valid after return.
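The copy-then-update pattern can be shown in isolation. This is a self-contained Go sketch, not keploy's code: Mock, TestModeInfo, memDB and nextSortNum are simplified hypothetical stand-ins for the real mock type, MockMemDb and pkg.GetNextSortNum, and the boolean compare-and-swap result of UpdateUnFilteredMock is an assumption.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type TestModeInfo struct {
	IsFiltered bool
	SortOrder  int64
}

type Mock struct {
	Name         string
	TestModeInfo TestModeInfo
}

// memDB is a hypothetical stand-in for MockMemDb: readers get shared
// pointers, and swaps happen under mu (playing the role of treesMu).
type memDB struct {
	mu    sync.Mutex
	mocks map[string]*Mock
}

func (db *memDB) Get(name string) *Mock {
	db.mu.Lock()
	defer db.mu.Unlock()
	return db.mocks[name]
}

// UpdateUnFilteredMock replaces old with updated only if old is still the
// stored entry. The compare-and-swap return value is an assumption of this
// sketch, not keploy's real signature.
func (db *memDB) UpdateUnFilteredMock(old, updated *Mock) bool {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.mocks[old.Name] != old {
		return false
	}
	db.mocks[old.Name] = updated
	return true
}

var sortSeq atomic.Int64

// nextSortNum stands in for pkg.GetNextSortNum.
func nextSortNum() int64 { return sortSeq.Add(1) }

// updateMock applies the copy-then-update fix: the shared *Mock is only
// read, and all writes go to a private shallow copy. Taking &updated makes
// the copy escape to the heap, so the stored pointer stays valid.
func updateMock(db *memDB, matched *Mock) bool {
	updated := *matched
	updated.TestModeInfo.IsFiltered = false
	updated.TestModeInfo.SortOrder = nextSortNum()
	return db.UpdateUnFilteredMock(matched, &updated)
}

func main() {
	db := &memDB{mocks: map[string]*Mock{
		"m": {Name: "m", TestModeInfo: TestModeInfo{IsFiltered: true}},
	}}
	shared := db.Get("m")

	// Concurrent matches of the same mock; run with `go run -race` to check.
	var wg sync.WaitGroup
	var wins atomic.Int64
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if updateMock(db, shared) {
				wins.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("swaps applied:", wins.Load(), "stored IsFiltered:", db.Get("m").TestModeInfo.IsFiltered)
}
```

Because every goroutine only reads `*shared` and the swap itself happens under the lock, the race detector has nothing to report; exactly one goroutine's copy wins the swap.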

Applied to every site with this anti-pattern:
  * pkg/agent/proxy/integrations/http/match.go         (the one CI caught)
  * pkg/agent/proxy/integrations/mysql/replayer/match.go
  * pkg/agent/proxy/integrations/generic/match.go      (two call sites)

pkg/util.go:filterByTimeStamp already does mock.DeepCopy() up front
specifically to avoid this class of race (see the comment at line
2577) — left untouched.

Local verification: go test -race ./pkg/agent/proxy/integrations/http/
passes (1.8s). The full CI exercises the replay path under -race
via the race-enabled 'build' artifact consumed by the native
golang_linux job; if the race resurfaces elsewhere, the pattern
above is the one to look for.

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run  P50     P90     P99     RPS     Error Rate  Status
1    2.21ms  2.85ms  4.26ms  100.00  0.00%       ✅ PASS
2    2.12ms  2.67ms  3.96ms  100.02  0.00%       ✅ PASS
3    2.1ms   2.64ms  3.97ms  100.02  0.00%       ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.



Comment on lines +30 to +42
echo "::error::expected image tarball at $TAR"
ls -la "${{ runner.temp }}/keploy-image" || true
exit 1
fi
echo "Loading image from $TAR ..."
# Anchor to "Loaded image: " (with the trailing space) so the
# `Loaded image ID: sha256:…` variant emitted when the tarball
# has no RepoTags doesn't match and silently yield `sha256` as
# the tag target. NR==1 guarantees a single value without
# risking SIGPIPE from a downstream `head -n1` under pipefail.
LOADED=$(docker load -i "$TAR" | awk -F'Loaded image: ' '/^Loaded image: / && NR==1 {print $2; exit}')
if [ -z "$LOADED" ]; then
echo "::error::docker load did not report a tagged image (got: $(docker load -i "$TAR" 2>&1 | head -n5))"

Copilot AI Apr 21, 2026


In the Unix branch, the failure path runs docker load a second time inside the error message ($(docker load ... | head -n5)). Under set -euo pipefail, piping into head can cause docker load to terminate with SIGPIPE (non-zero), which can abort the step before the intended ::error:: message is printed; it also needlessly re-loads the image. Capture docker load output once, parse the loaded tag from that output, and reuse the captured output for diagnostics (without a head pipeline under pipefail).

Suggested change
echo "::error::expected image tarball at $TAR"
ls -la "${{ runner.temp }}/keploy-image" || true
exit 1
fi
echo "Loading image from $TAR ..."
# Anchor to "Loaded image: " (with the trailing space) so the
# `Loaded image ID: sha256:…` variant emitted when the tarball
# has no RepoTags doesn't match and silently yield `sha256` as
# the tag target. NR==1 guarantees a single value without
# risking SIGPIPE from a downstream `head -n1` under pipefail.
LOADED=$(docker load -i "$TAR" | awk -F'Loaded image: ' '/^Loaded image: / && NR==1 {print $2; exit}')
if [ -z "$LOADED" ]; then
echo "::error::docker load did not report a tagged image (got: $(docker load -i "$TAR" 2>&1 | head -n5))"
echo "::error::expected image tarball at $TAR. Verify the artifact download step produced image.tar in ${{ runner.temp }}/keploy-image."
ls -la "${{ runner.temp }}/keploy-image" || true
exit 1
fi
echo "Loading image from $TAR ..."
if ! LOAD_OUTPUT=$(docker load -i "$TAR" 2>&1); then
LOAD_OUTPUT_SINGLE_LINE=$(printf '%s' "$LOAD_OUTPUT" | tr '\n' '; ')
echo "::error::docker load failed for $TAR. Review the docker load output and confirm the tarball is valid: $LOAD_OUTPUT_SINGLE_LINE"
exit 1
fi
# Anchor to "Loaded image: " (with the trailing space) so the
# `Loaded image ID: sha256:…` variant emitted when the tarball
# has no RepoTags doesn't match and silently yield `sha256` as
# the tag target. Parse from the captured output so docker load
# only runs once and diagnostics can reuse the same output.
LOADED=$(printf '%s\n' "$LOAD_OUTPUT" | awk -F'Loaded image: ' '/^Loaded image: / && NR==1 {print $2; exit}')
if [ -z "$LOADED" ]; then
LOAD_OUTPUT_SINGLE_LINE=$(printf '%s' "$LOAD_OUTPUT" | tr '\n' '; ')
echo "::error::docker load completed but did not report a tagged image. Review the docker load output and confirm the tarball includes RepoTags: $LOAD_OUTPUT_SINGLE_LINE"

Comment thread .github/CI_CONTRIBUTING.md Outdated
Comment on lines +31 to +33
1. **Build once** per architecture in `prepare_and_run.yml` → jobs `build‑docker‑image‑amd64` and `build‑docker‑image‑arm64`. They download the matching prebuilt binary artifact (no in-container `go build`) and run `docker buildx build --output type=docker,dest=image.tar` against `Dockerfile.runtime`.
2. **Upload** each `image.tar` as a workflow artifact (`docker-image-linux-amd64`, `docker-image-linux-arm64`).
3. **Load & re‑tag** inside downstream jobs via the composite action `download‑image`, which calls `actions/download-artifact` + `docker load` and renames the image to `ghcr.io/keploy/keploy:v3-dev` so samples find it at the expected name.

Copilot AI Apr 21, 2026

The doc states that download-image “renames the image to ghcr.io/keploy/keploy:v3-dev”, but the composite action actually tags to ghcr.io/keploy/keploy:v${{ inputs.version }} (default 3-dev). Several workflows pass version: ${{ github.sha }}, so the resulting tag is v<sha>, not v3-dev. Please update this sentence to reflect that the tag is parameterized by the version input (and only defaults to v3-dev when version isn’t provided).
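A minimal sketch of the tagging rule this comment describes, assuming the action simply prefixes the `version` input with `v` and falls back to `3-dev` when no value is given. `target_image_ref` is a hypothetical name, not the action's actual code:

```shell
# Hypothetical helper: the tag the download-image action is described as
# applying, i.e. "v" + the version input, defaulting to 3-dev.
target_image_ref() {
  # ${1:-3-dev} also falls back when the input is passed but empty.
  local version="${1:-3-dev}"
  printf 'ghcr.io/keploy/keploy:v%s\n' "$version"
}

target_image_ref             # caller omits version
target_image_ref "abc1234"   # caller passes e.g. a commit SHA
```

This is why a workflow that passes `version: ${{ github.sha }}` ends up with a `v<sha>` tag rather than `v3-dev`.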

Comment on lines +43 to 46
| `.github/workflows/prepare_and_run.yml` | The *aggregator* – builds the PR binary, downloads `latest`, uploads both as artifacts **and** builds one Docker image per architecture (saved as `image.tar` workflow artifacts, no registry push). Then it fans‑out to language/sample workflows. |
| `.github/actions/download-binary/action.yml` | Composite action – downloads **one** of those two binary artifacts and outputs its absolute path. |
| `.github/actions/download-image/action.yml` | Composite action – pulls the temporary image from `ttl.sh`, re‑tags it to `ghcr.io/keploy/keploy:v3-dev`, and makes it available for the sample. |
| `.github/actions/download-image/action.yml` | Composite action – downloads the image artifact, `docker load`s it, and re‑tags to `ghcr.io/keploy/keploy:v3-dev` so samples find it at the expected name. |
| `.github/workflows/*_linux.yml`, `*_docker.yml`, … | Language/sample workflows. They declare the 3‑row matrix and obtain the two binaries (and, for Docker flows, the image) via the composite actions. |

Copilot AI Apr 21, 2026

This table entry says the action re-tags to ghcr.io/keploy/keploy:v3-dev, but the action tags to v${{ inputs.version }} (default 3-dev). To avoid confusion for workflows that set version (e.g. to ${{ github.sha }}), consider wording this as “re-tags to ghcr.io/keploy/keploy:v<version> (defaults to v3-dev)” or similar.

Copilot review follow-ups on 9a47372.

1. .github/actions/download-image/action.yml:
   Previously the failure path did `$(docker load -i "$TAR" 2>&1 | head -n5)`
   inside the ::error:: message — re-importing the tarball a second time
   AND piping into head, which under set -euo pipefail can SIGPIPE
   docker load and abort the step before the error line is emitted.
   Capture docker load output once into LOAD_OUTPUT, check the exit
   code up front (so a genuine load failure surfaces with the output
   as diagnostic), and reuse the same buffer to parse the loaded
   image tag and to populate the 'no RepoTags' diagnostic. tr '\n' '; '
   flattens multi-line output for the ::error:: marker.

2. .github/CI_CONTRIBUTING.md:
   The composite action tags with ghcr.io/keploy/keploy:v${{ inputs.version }},
   defaulting to 3-dev — several workflows (e.g. golang_docker_macos,
   golang_docker_windows, pull-docker-image-*) pass
   version: ${{ github.sha }}, producing v<sha> tags. Two mentions in
   the CI contributing guide hard-coded 'v3-dev' and would mislead
   contributors writing new docker-based samples. Both now say
   'v<version> (defaults to 3-dev; callers may override)'.

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
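The capture-once pattern from item 1 can be sketched as below. `run_and_report` and `fake_load` are hypothetical helpers; `fake_load` stands in for `docker load` so the snippet runs without a Docker daemon:

```shell
# Run a command once, keep its combined stdout/stderr, and flatten it into a
# single-line GitHub Actions ::error:: annotation on failure.
run_and_report() {
  local output flat
  if ! output=$("$@" 2>&1); then
    # tr maps each newline to ';' so the annotation stays on one line.
    flat=$(printf '%s' "$output" | tr '\n' ';')
    echo "::error::command failed: $flat"
    return 1
  fi
  printf '%s\n' "$output"
}

# Stand-in for a failing `docker load` (hypothetical messages).
fake_load() {
  echo "open image.tar: no such file"
  echo "unexpected EOF" >&2
  return 1
}

run_and_report fake_load || true
```

Because the output is captured a single time and nothing is piped into `head`, no early-closing reader can SIGPIPE the command, and the same buffer can feed both the error message and any later parsing.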
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.46ms | 3.16ms | 27.6ms | 100.01 | 0.00% | ✅ PASS |
| 2 | 2.39ms | 3.05ms | 4.52ms | 100.00 | 0.00% | ✅ PASS |
| 3 | 2.36ms | 2.97ms | 4.18ms | 100.03 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers
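The pass/fail rule stated in this report (per-run thresholds, with the pipeline failing only when 2 or more of 3 runs regress) can be sketched as follows. This is a hypothetical re-implementation, not the workflow's actual script, and reading "RPS >= 100 (±1% tolerance)" as RPS >= 99 is an assumption:

```shell
# Hypothetical re-implementation of the thresholds quoted in the bot comment.
# Args: p50_ms p90_ms p99_ms rps error_rate_pct. Returns 0 when the run passes.
run_passes() {
  awk -v p50="$1" -v p90="$2" -v p99="$3" -v rps="$4" -v err="$5" 'BEGIN {
    # "+ 0" forces numeric comparison; RPS >= 100 with 1% tolerance is
    # interpreted here (an assumption) as RPS >= 99.
    ok = (p50 + 0 < 5 && p90 + 0 < 15 && p99 + 0 < 70 && rps + 0 >= 99 && err + 0 < 1)
    exit (ok ? 0 : 1)
  }'
}

# Each argument is one run: "p50 p90 p99 rps err". The overall result fails
# only when 2 or more runs fail.
validate_runs() {
  local failed=0 run
  for run in "$@"; do
    # Unquoted on purpose: split the run string into its five fields.
    run_passes $run || failed=$((failed + 1))
  done
  if [ "$failed" -ge 2 ]; then
    echo "FAILED - $failed out of $# runs failed (threshold: 2)"
    return 1
  fi
  echo "PASSED - Only $failed out of $# runs failed (threshold: 2)"
}

# Numbers copied from the first results table in this thread.
validate_runs "2.46 3.16 27.6 100.01 0.00" \
              "2.39 3.05 4.52 100.00 0.00" \
              "2.36 2.97 4.18 100.03 0.00"
```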

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.



Comment on lines +49 to +51
# the tag target. NR==1 + exit keeps selection deterministic
# without a `head -n1` pipeline.
LOADED=$(printf '%s\n' "$LOAD_OUTPUT" | awk -F'Loaded image: ' '/^Loaded image: / && NR==1 {print $2; exit}')

Copilot AI Apr 21, 2026

In the Unix loader, the awk condition NR==1 restricts the match to the first output line rather than the first Loaded image: line. If docker load emits any leading line (or multiple Loaded image: lines), LOADED may stay empty and the action will fail even though the load succeeded. Adjust the parsing to select the first matching ^Loaded image: line regardless of its line number (and keep the guard against the Loaded image ID: variant).

Suggested change
# the tag target. NR==1 + exit keeps selection deterministic
# without a `head -n1` pipeline.
LOADED=$(printf '%s\n' "$LOAD_OUTPUT" | awk -F'Loaded image: ' '/^Loaded image: / && NR==1 {print $2; exit}')
# the tag target. `exit` keeps selection deterministic by
# choosing the first matching loaded image line without a
# `head -n1` pipeline.
LOADED=$(printf '%s\n' "$LOAD_OUTPUT" | awk -F'Loaded image: ' '/^Loaded image: / {print $2; exit}')
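The difference between the original and suggested awk programs is easy to reproduce on canned input. The sample text below is made up (a status line followed by the tag line, the ordering the comment warns about), and `with_nr1` / `first_match` are hypothetical wrappers around the two expressions:

```shell
# Illustrative `docker load`-style output: a status line, then the tag line.
sample_output='Loading layer  8.2MB/8.2MB
Loaded image: keploy:ci-build'

# Original program: NR==1 only ever inspects the first line.
with_nr1() {
  printf '%s\n' "$1" | awk -F'Loaded image: ' '/^Loaded image: / && NR==1 {print $2; exit}'
}

# Suggested program: first matching line wherever it appears; `exit` stops
# after that match. The trailing space in the pattern still rejects
# "Loaded image ID: sha256:...".
first_match() {
  printf '%s\n' "$1" | awk -F'Loaded image: ' '/^Loaded image: / {print $2; exit}'
}

echo "NR==1 version: '$(with_nr1 "$sample_output")'"
echo "fixed version: '$(first_match "$sample_output")'"
```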

Comment on lines +86 to +97
Write-Host "Loading image from $tar ..."
$loadOutput = docker load -i $tar
# Anchor at line start and trim the capture — docker CLI on
# Windows emits CRLF, which `(.+)$` would swallow into the
# capture group, yielding a trailing \r that makes
# `docker tag` reject the reference. Trim() strips both CR
# and any trailing whitespace.
$match = $loadOutput | Select-String -Pattern '^Loaded image:\s+(.+)$' | Select-Object -First 1
if ($null -eq $match) {
Write-Error "docker load did not report a tagged image. Output: $($loadOutput -join [Environment]::NewLine)"
exit 1
}

Copilot AI Apr 21, 2026

In the Windows loader, $loadOutput = docker load -i $tar doesn't capture stderr and doesn't check $LASTEXITCODE. If docker load fails, the step can end up reporting a misleading "did not report a tagged image" error with empty output, hiding the real failure reason. Capture 2>&1 and explicitly fail when $LASTEXITCODE is non-zero so diagnostics include the actual docker error output.
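The CRLF hazard described in the Windows loader's code comment can be shown with a shell analogue. This is a sketch only: the action handles it in PowerShell with `Trim()`, and the input line here is synthetic:

```shell
# Illustrative only: a CRLF-terminated line like the one the Windows loader
# comment describes, built with printf so it also works in POSIX sh.
crlf_line="$(printf 'Loaded image: keploy:ci-build\r')"

# Extract the reference as-is: the trailing carriage return survives.
naive=$(printf '%s\n' "$crlf_line" | awk -F'Loaded image: ' '/^Loaded image: / {print $2; exit}')

# Strip carriage returns first (playing the role of Trim() here).
trimmed=$(printf '%s\n' "$crlf_line" | tr -d '\r' | awk -F'Loaded image: ' '/^Loaded image: / {print $2; exit}')

printf 'naive length: %d, trimmed length: %d\n' "${#naive}" "${#trimmed}"
```

A reference that still ends in `\r` is what the code comment says makes `docker tag` reject it.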

src:
required: true
description: "build | latest"
description: "Name of the binary artifact to download (e.g. build, build-no-race, build-darwin, latest)."

Copilot AI Apr 21, 2026

The input description suggests this action can download artifacts like build-windows, but the implementation assumes a Unix binary at ${src}/keploy and runs chmod +x via bash. Using a Windows artifact (which contains keploy.exe) would fail. Either tighten the description to only mention Unix-style artifacts, or extend the action to handle .exe artifacts/Windows runners explicitly.

Suggested change
description: "Name of the binary artifact to download (e.g. build, build-no-race, build-darwin, latest)."
description: "Name of the Unix-style binary artifact to download; the artifact must contain a `keploy` binary (for example: `build`, `build-no-race`, `build-darwin`, or `latest`)."
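If the action were extended rather than just re-documented, one option is a guard that fails clearly on a Windows artifact. The sketch below assumes the artifact is already extracted into a directory and that the Unix binary is named `keploy` (as the comment states); `check_unix_binary` is hypothetical and not part of download-binary:

```shell
# Hypothetical guard: accept a Unix `keploy` binary, reject a Windows
# `keploy.exe` artifact with a clear error.
check_unix_binary() {
  local dir="$1"
  if [ -f "$dir/keploy" ]; then
    chmod +x "$dir/keploy"
    printf '%s\n' "$dir/keploy"
  elif [ -f "$dir/keploy.exe" ]; then
    echo "::error::$dir holds keploy.exe; on Windows use actions/download-artifact directly" >&2
    return 1
  else
    echo "::error::no keploy binary found in $dir" >&2
    return 1
  fi
}

# Demo with throwaway directories standing in for extracted artifacts.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/build" "$tmp/build-windows"
: > "$tmp/build/keploy"
: > "$tmp/build-windows/keploy.exe"

check_unix_binary "$tmp/build"
check_unix_binary "$tmp/build-windows" || echo "rejected Windows artifact"
```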

…ption

Copilot review follow-ups on 16d2f75.

1. download-image/action.yml (Unix loader):
   awk pattern was '/^Loaded image: / && NR==1' — restricted the match
   to the first *output line*, not the first 'Loaded image:' line.
   If docker load ever prints a status line before the tag line
   (some daemon versions do), LOADED ends up empty and the action
   fails despite a successful load. Drop the NR==1 guard; 'exit'
   still keeps selection deterministic by stopping on the first
   matching line regardless of where it appears in the output.

2. download-image/action.yml (Windows loader):
   'docker load -i $tar' did not capture stderr and did not check
   $LASTEXITCODE. On a genuine load failure, stderr was lost and
   $loadOutput stayed empty, so the step fell through to the 'no
   tagged image' branch with useless diagnostics. Capture 2>&1
   into $loadOutput and fail fast on non-zero $LASTEXITCODE so
   the real docker error output appears in the ::error::. Same
   pattern applied to the existing 'no RepoTags' and 'empty
   reference' branches so all three diagnostic paths use
   Out-String on the real captured output.

3. download-binary/action.yml:
   The src input description still read 'Name of a binary artifact
   ... (e.g. build, build-no-race, build-darwin, latest).' which
   suggested Windows artifacts like 'build-windows' were acceptable.
   The composite action hardcodes chmod +x on ${src}/keploy and
   assumes a Unix binary — passing a Windows artifact would fail
   because the file is keploy.exe and chmod +x isn't meaningful.
   Tighten the description to explicitly call out Unix-only and
   point Windows consumers at actions/download-artifact@v4 directly
   (which is what golang_docker_windows.yml and
   golang_native_windows.yml already do).

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.46ms | 3.1ms | 4.63ms | 100.03 | 0.00% | ✅ PASS |
| 2 | 2.41ms | 3ms | 4.36ms | 100.02 | 0.00% | ✅ PASS |
| 3 | 2.38ms | 2.95ms | 4.07ms | 100.02 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.



Comment thread .github/workflows/release.yml Outdated
Comment on lines 205 to 209
uses: docker/build-push-action@v6.19.2
with:
context: .
file: ./Dockerfile.release
ssh: default
file: ./Dockerfile.runtime
platforms: linux/amd64, linux/arm64

Copilot AI Apr 21, 2026

This multi-arch build runs on ubuntu-latest and targets linux/arm64 as well as linux/amd64, but Dockerfile.runtime contains RUN steps (apt-get, sed/chmod). Unless the runner has QEMU/binfmt set up or the buildx builder has a native arm64 node, the arm64 build may fail or silently fall back to QEMU—conflicting with the PR description’s “no qemu” / “pure COPY” release build claim. Consider using a native arm64 builder/node (or per-arch builds + manifest), or making the runtime Dockerfile RUN-free for release, or explicitly setting up the required emulation if that’s acceptable.

Comment on lines 731 to +734
func (h *HTTP) updateMock(_ context.Context, matchedMock *models.Mock, mockDb integrations.MockMemDb) bool {
originalMatchedMock := *matchedMock
matchedMock.TestModeInfo.IsFiltered = false
matchedMock.TestModeInfo.SortOrder = pkg.GetNextSortNum()
updatedMock := *matchedMock
updatedMock.TestModeInfo.IsFiltered = false
updatedMock.TestModeInfo.SortOrder = pkg.GetNextSortNum()

Copilot AI Apr 21, 2026

updateMock’s behavior changed to avoid mutating the shared matchedMock pointer and instead pass a copy into UpdateUnFilteredMock, but there doesn’t appear to be unit test coverage for this path (no updateMock-focused assertions in match_test.go). Adding a small test that verifies (a) matchedMock.TestModeInfo is unchanged after updateMock and (b) UpdateUnFilteredMock receives the expected old/new values would help prevent regressions and catch future reintroductions of in-place mutation.

# -------------------------------------------------------------------

build-docker-image-amd64:
needs: [build-and-upload]

Copilot AI Apr 21, 2026

build-docker-image-amd64 runs on forked PRs (no job-level if:), while build-docker-image-arm64 is gated. This will build/upload the amd64 image tarball even when all downstream docker/self-hosted jobs are skipped on forks, adding cost and time. Consider adding the same non-fork guard to build-docker-image-amd64 (or explicitly documenting why amd64 should run on forks).

Suggested change
needs: [build-and-upload]
needs: [build-and-upload]
if: ${{ github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork }}
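The suggested guard lets the job run for any non-`pull_request` event and for pull requests whose head repo is not a fork. A small truth table for that expression, where `should_run` is a hypothetical helper and the fork flag is modeled as the string `true`/`false`:

```shell
# Hypothetical truth table for the suggested job-level guard:
#   github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork
should_run() {
  local event_name="$1" head_is_fork="$2"
  if [ "$event_name" != "pull_request" ] || [ "$head_is_fork" != "true" ]; then
    echo true
  else
    echo false
  fi
}

for scenario in "push false" "pull_request false" "pull_request true"; do
  # Unquoted on purpose: split "event fork" into two positional args.
  set -- $scenario
  printf '%-13s fork=%-5s -> run=%s\n' "$1" "$2" "$(should_run "$1" "$2")"
done
```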

Comment thread .github/workflows/prepare_and_run.yml Outdated
Comment on lines +596 to +640
pull-docker-image-macos:
if: ${{ (github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork) || (github.event_name == 'push' && github.ref == 'refs/heads/main') }}
runs-on: [self-hosted, macOS, native]
needs: [build-docker-image-arm64]
timeout-minutes: 10
steps:
- name: Create workflow start lock
run: |
LOCK_DIR="$HOME/.github-workflow-locks"
mkdir -p "$LOCK_DIR"
echo "started-$(date +%s)" > "$LOCK_DIR/prepare-macos-workflow-${{ github.run_id }}.lock"

- name: Unlock keychain
if: runner.os == 'macOS'
run: |
security unlock-keychain -p $KEYCHAIN_PASSWORD
env:
KEYCHAIN_PASSWORD: ${{ secrets.MAC_RUNNER_USER_PASSWORD }}

- name: Verify Docker Desktop is running (macOS)
run: |
if ! docker info >/dev/null 2>&1; then
echo "ERROR: Docker Desktop is not running."
echo "Ensure Docker Desktop is configured as a Login Item on the self-hosted runner."
exit 1
fi
echo "Docker Desktop is running."

- uses: actions/checkout@v4

- name: Load docker image
timeout-minutes: 5
uses: ./.github/actions/download-image
with:
artifact_name: docker-image-linux-arm64
version: ${{ github.sha }}

run_python_docker_macos:
needs: [build-darwin-arm64, pull-docker-image-macos, build-docker-image-arm64]
uses: ./.github/workflows/python_docker_macos.yml
secrets:
MAC_RUNNER_USER_PASSWORD: ${{ secrets.MAC_RUNNER_USER_PASSWORD }}

run_golang_docker_macos:
needs: [build-darwin-arm64, pull-docker-image-macos, build-docker-image-arm64]

Copilot AI Apr 21, 2026

This job loads the arm64 image tarball, but the downstream macOS docker workflows also run the download-image composite action (so they’ll download + docker load again). If jobs can land on different self-hosted machines, this pre-load won’t be reused; if they land on the same machine, it’s redundant work. Consider removing this pre-load step (and corresponding needs:) or removing the per-workflow loads and enforcing single-runner/daemon reuse if that’s the intended model.

Suggested change
pull-docker-image-macos:
if: ${{ (github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork) || (github.event_name == 'push' && github.ref == 'refs/heads/main') }}
runs-on: [self-hosted, macOS, native]
needs: [build-docker-image-arm64]
timeout-minutes: 10
steps:
- name: Create workflow start lock
run: |
LOCK_DIR="$HOME/.github-workflow-locks"
mkdir -p "$LOCK_DIR"
echo "started-$(date +%s)" > "$LOCK_DIR/prepare-macos-workflow-${{ github.run_id }}.lock"
- name: Unlock keychain
if: runner.os == 'macOS'
run: |
security unlock-keychain -p $KEYCHAIN_PASSWORD
env:
KEYCHAIN_PASSWORD: ${{ secrets.MAC_RUNNER_USER_PASSWORD }}
- name: Verify Docker Desktop is running (macOS)
run: |
if ! docker info >/dev/null 2>&1; then
echo "ERROR: Docker Desktop is not running."
echo "Ensure Docker Desktop is configured as a Login Item on the self-hosted runner."
exit 1
fi
echo "Docker Desktop is running."
- uses: actions/checkout@v4
- name: Load docker image
timeout-minutes: 5
uses: ./.github/actions/download-image
with:
artifact_name: docker-image-linux-arm64
version: ${{ github.sha }}
run_python_docker_macos:
needs: [build-darwin-arm64, pull-docker-image-macos, build-docker-image-arm64]
uses: ./.github/workflows/python_docker_macos.yml
secrets:
MAC_RUNNER_USER_PASSWORD: ${{ secrets.MAC_RUNNER_USER_PASSWORD }}
run_golang_docker_macos:
needs: [build-darwin-arm64, pull-docker-image-macos, build-docker-image-arm64]
run_python_docker_macos:
needs: [build-darwin-arm64, build-docker-image-arm64]
uses: ./.github/workflows/python_docker_macos.yml
secrets:
MAC_RUNNER_USER_PASSWORD: ${{ secrets.MAC_RUNNER_USER_PASSWORD }}
run_golang_docker_macos:
needs: [build-darwin-arm64, build-docker-image-arm64]

Comment thread .github/workflows/prepare_and_run.yml Outdated
Comment on lines +803 to +807
- name: Load docker image
uses: ./.github/actions/download-image
with:
artifact_name: docker-image-linux-amd64
version: ${{ github.sha }}

Copilot AI Apr 21, 2026

This job loads the linux/amd64 image tarball, but golang_docker_windows.yml also loads the image via the download-image composite action. That duplicates artifact download + docker load on self-hosted runners. Consider dropping this pre-load step/job, or alternatively removing the per-job load and guaranteeing runner/daemon reuse if that’s the intent.

Five review items, all addressed with prod-grade solutions:

1. release.yml — Setup QEMU explicitly before buildx multi-arch build.
   Dockerfile.runtime has RUN steps (apt-get + sed/chmod), and the
   linux/arm64 leg runs under emulation on ubuntu-latest runners.
   Prior flow happened to work because ubuntu-latest preinstalls
   binfmt_misc handlers, but relying on implicit runner defaults is
   fragile. Add docker/setup-qemu-action@v3 with a header comment
   clarifying that qemu handles the base-image bootstrap only — the
   keploy binary is still a native prebuilt arm64 artifact, so qemu
   scope is bounded to one apt-get layer (not a go-build cross
   compile, which was the original performance goal).

2. http/match_test.go — Add unit coverage for updateMock's
   no-mutation invariant. Two tests:
     * TestUpdateMock_DoesNotMutatePoolPointer: session-lifetime
       path. Asserts matchedMock.TestModeInfo.IsFiltered /
       SortOrder are UNCHANGED after updateMock, that
       UpdateUnFilteredMock received the pool pointer as 'old',
       and a distinct (non-aliased) *Mock with updated
       TestModeInfo as 'new'.
     * TestUpdateMock_PerTestPrefersDelete: per-test path.
       Asserts DeleteFilteredMock is called with the original
       mock and UpdateUnFilteredMock is NOT invoked on the
       success path.
   Both guard against future regressions that would reintroduce
   the in-place mutation the race-fix commit removed. mockMemDb
   gets spy fields for old/new args + configurable return values.

3. prepare_and_run.yml — Fork-guard build-docker-image-amd64.
   Every linux docker consumer (run_*_docker) is already fork-
   guarded via the reusable workflow's inner 'if:'. Without a
   guard on the producer, fork PRs still run the docker image
   build + artifact upload even though no job will consume it.
   Add the same condition to match build-docker-image-arm64.

4-5. prepare_and_run.yml — Drop redundant docker-image pre-load
   from pull-docker-image-macos and pull-docker-image-windows.
   Each downstream docker test job already runs the download-image
   composite action (actions/download-artifact + docker load), so
   the pre-load step just duplicated artifact download + docker
   load on the self-hosted runner. Rename the jobs to precheck-*
   to reflect their remaining purpose: per-OS setup (lock file,
   keychain unlock on macOS; git isolation + Docker readiness
   check on Windows) that must fail fast BEFORE the parallel
   test matrix spins up. needs: chain for downstream tests
   (run_*_docker_*) still includes build-docker-image-* directly
   so the artifact is ready when tests load it. Gate jobs
   (gate_macos, gate_windows) updated to reference the new job
   ids.

Verified:
  * go test -race -run TestUpdateMock ./pkg/agent/proxy/integrations/http/
    → both tests PASS
  * go build ./... clean
  * YAML syntax valid for prepare_and_run.yml and release.yml
Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.83ms | 3.66ms | 5.18ms | 100.02 | 0.00% | ✅ PASS |
| 2 | 2.64ms | 3.46ms | 4.92ms | 100.00 | 0.00% | ✅ PASS |
| 3 | 2.57ms | 3.33ms | 4.83ms | 100.00 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.



Comment thread .github/workflows/prepare_and_run.yml Outdated

build-and-upload:
# Linux amd64 binaries (race + non-race). `build` = race, consumed
# by most linux docker tests and by the linux/amd64 docker image.

Copilot AI Apr 21, 2026

The comment on this job says the race-enabled build artifact is consumed by the linux/amd64 Docker image, but the Docker image build below actually downloads and embeds build-no-race. Please update this comment to reflect the current flow so future maintainers don’t assume the container image is race-enabled.

Suggested change
# by most linux docker tests and by the linux/amd64 docker image.
# by most linux docker tests. The linux/amd64 docker image embeds
# `build-no-race`, so the container image is not race-enabled.

Comment thread .github/workflows/release.yml Outdated
Comment on lines +168 to +177
# Register qemu-user-static binfmt handlers explicitly so the
# linux/arm64 leg of the multi-arch build below runs the base
# image's RUN steps (apt-get + sed/chmod in Dockerfile.runtime)
# under emulation when this job lands on an amd64 runner. The
# keploy binary itself is a pre-built native arm64 artifact —
# qemu is only doing the debian:trixie-slim bootstrap, not a
# go-build cross-compile, so the perf hit is bounded to a
# single apt-get layer.
- name: Setup QEMU (for linux/arm64 base-image bootstrap)
uses: docker/setup-qemu-action@v3

Copilot AI Apr 21, 2026

The PR description/performance notes mention “No qemu emulation anywhere”, but this workflow explicitly sets up QEMU to run the linux/arm64 leg’s RUN steps under emulation. Either update the PR description to clarify that QEMU is still used for base-image bootstrap, or adjust the release docker build to truly avoid QEMU (e.g., use a multi-node buildx builder with a native arm64 node or run build-docker on an arm64 runner).

Copilot review follow-ups on 75e8260.

1. release.yml — Refactor build-docker into per-arch matrix over
   [ubuntu-latest / ubuntu-24.04-arm] with push-by-digest, plus a
   downstream build-docker-manifest job that stitches the digests
   into the public multi-arch manifest list via docker buildx
   imagetools create and cosign-signs the resulting manifest.

   This replaces the prior single-job multi-arch build that required
   docker/setup-qemu-action for the linux/arm64 leg's
   Dockerfile.runtime RUN steps (apt-get + sed/chmod). Each arch
   now builds natively on its own runner, making the PR
   description's 'no qemu anywhere in the release pipeline' claim
   actually true — not just for the goreleaser binary cross
   compile, but for the base-image bootstrap too. Wall-clock is
   about the same (both legs run in parallel; amd64 was never
   under qemu; arm64 shaves ~1 min of emulated apt-get).

   docker/metadata-action runs in both the per-arch legs (for
   labels) and in the manifest job (for tags); setting tags on the
   per-arch build would push single-arch images by tag, defeating
   the push-by-digest flow. Metadata is deterministic across runs
   for a given ref so the tags the manifest job computes match
   what the legs would have produced.

   mark-latest's needs: list updated from build-docker to
   build-docker-manifest since that's the new final-tag job.

2. prepare_and_run.yml — Fix stale build-and-upload header comment.
   Said 'build = race, consumed by most linux docker tests AND by
   the linux/amd64 docker image' — but the linux/amd64 docker image
   now uses build-no-race (see build-docker-image-amd64 download
   step). Updated to reflect build = native linux tests, build-no-race
   = docker image, with a note about why (race-runtime memory
   baseline vs go-memory-load's calibrated threshold).

Signed-off-by: Shubham Jain <shubhamkjain@outlook.com>
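The push-by-digest flow in item 1 ends with a single `docker buildx imagetools create` call that references each per-arch digest. The dry-run sketch below only prints such a command; the tag and digests are placeholders and `manifest_cmd` is a hypothetical helper, not code from release.yml:

```shell
# Hypothetical dry-run helper: print the command that would combine
# per-arch image digests into one multi-arch tag. Nothing is pushed.
manifest_cmd() {
  local tag="$1" repo="$2"
  shift 2
  local cmd="docker buildx imagetools create -t $tag"
  local digest
  for digest in "$@"; do
    cmd="$cmd $repo@$digest"
  done
  printf '%s\n' "$cmd"
}

# Placeholder tag and digests; real values would come from the per-arch legs.
manifest_cmd ghcr.io/keploy/keploy:v0.0.0-example ghcr.io/keploy/keploy \
  sha256:aaaa sha256:bbbb
```

Computing tags only at this stitching step is what keeps the per-arch legs from publishing single-arch images under the public tag, which is the reason the commit gives for running metadata-action for tags only in the manifest job.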
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.51ms | 3.2ms | 4.69ms | 100.00 | 0.00% | ✅ PASS |
| 2 | 2.43ms | 3.04ms | 4.4ms | 100.03 | 0.00% | ✅ PASS |
| 3 | 2.42ms | 3.01ms | 4.13ms | 100.02 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated no new comments.



@slayerjain slayerjain merged commit 7065554 into main Apr 21, 2026
139 checks passed
@slayerjain slayerjain deleted the feat/docker-artifacts-reuse-binaries branch April 21, 2026 20:58
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 21, 2026