feat(cli): add concurrent images pulling by avallete · Pull Request #4394 · supabase/cli · GitHub

feat(cli): add concurrent images pulling#4394

Merged
sweatybridge merged 22 commits into develop from avallete/devwf-828-parallelize-container-image-pulls-in-supabase-cli-start on Nov 24, 2025

Conversation

@avallete

@avallete avallete commented Nov 1, 2025

Member

What kind of change does this PR introduce?

  • When running supabase start, each image is pulled at the moment its service starts, so each service has to wait for the previous one to start before its own image can be pulled. This PR changes the logic into two steps:
  1. List all the services/images that will be needed and pull them concurrently (with a 1s delay between each to reduce the likelihood of hitting the AWS ECR rate limit), prioritizing the largest images first.
  2. Actually start each service with Docker (once its image is already there).

On my machine/network, this reduces the "cold start" (no images at all) of supabase start from ~2:43.19 total to ~1:30.33 total.

Note that even with all images already present there is an incompressible ~30s delay to start all the containers. So this only brings the cold start download overhead down from ~2:13 to ~1:00.
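The two-step approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `image` type, its `SizeMB` estimate, `largestFirst`, `pullAll`, and the `pull` callback are all hypothetical names standing in for the real Docker pull call.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// image is a hypothetical descriptor; SizeMB is an assumed estimate
// used only to decide which pull to launch first.
type image struct {
	Name   string
	SizeMB int
}

// largestFirst returns a copy of images sorted by descending size.
func largestFirst(images []image) []image {
	ordered := append([]image(nil), images...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].SizeMB > ordered[j].SizeMB
	})
	return ordered
}

// pullAll launches one pull per image, largest first, sleeping for
// stagger between launches, then waits for every pull to finish.
// It returns the error (or nil) recorded for each image name.
func pullAll(images []image, stagger time.Duration, pull func(name string) error) map[string]error {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		errs = make(map[string]error, len(images))
	)
	for i, img := range largestFirst(images) {
		if i > 0 {
			time.Sleep(stagger) // spread out the initial registry requests
		}
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			err := pull(name)
			mu.Lock()
			errs[name] = err
			mu.Unlock()
		}(img.Name)
	}
	wg.Wait()
	return errs
}

func main() {
	images := []image{{"studio", 300}, {"postgres", 900}, {"realtime", 150}}
	// The callback is a stand-in: a real one would call the Docker API.
	errs := pullAll(images, 10*time.Millisecond, func(name string) error {
		fmt.Println("pulling", name)
		return nil
	})
	fmt.Println("images processed:", len(errs))
}
```

Only the launches are staggered; the downloads themselves still overlap, which is where the time saving would come from. Containers would be started only after `pullAll` returns.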

@avallete avallete requested a review from a team as a code owner November 1, 2025 10:42

@sweatybridge sweatybridge left a comment

Contributor


Can we check how docker compose downloads images and import that as a library instead? https://github.com/docker/compose/blob/main/pkg/compose/images.go

@avallete

avallete commented Nov 3, 2025

Member Author

@sweatybridge

Contributor

What about the console logs when pulling images concurrently? Docker compose has a nice display for each image and clears up when downloads complete.

@avallete

avallete commented Nov 3, 2025

Member Author

> What about the console logs when pulling images concurrently? Docker compose has a nice display for each image and clears up when downloads complete.

Handled this with a single spinner showing progress (number of images remaining to pull) and a checkmark at the end showing which images succeeded or failed.
Screenshot 2025-11-03 at 09 46 47

Screenshot 2025-11-03 at 09 47 49

I've looked at the docker-compose repo and it's not built as a library; the pull logic seems purely internal, so I think we need to come up with our own implementation here.

@sweatybridge

Contributor

Hmm ok, I will explore a bit more if you don't mind. Their progress logs are something I've always wanted to try.

@avallete

avallete commented Nov 3, 2025

Member Author

> Hmm ok, I will explore a bit more if you don't mind. Their progress logs is something I've always wanted to try.

If it's only the progress bar you're interested in, then maybe that's doable; it seems to be handled in its own separate package in their codebase.

@avallete

avallete commented Nov 3, 2025

Member Author

Made a new version keeping the concurrency logic in our code (since it's not exposed by compose), but improving the implementation by looking at how compose does it (errgroup, channel).

Also added a 1s delay between the start of each image pull. This greatly reduces the number of "api rate limit" errors encountered, which reduces the total pulling time overall.

And re-used part of compose for better logging (docker-compose style):

Also, thanks to this new, more detailed log (time for each image), we can see that the largest image (postgres) takes ~1min to download, which matches our total time: running from cached images (30s) + network speed (1min to download the largest).

@avallete avallete requested a review from sweatybridge November 3, 2025 10:20
@coveralls

coveralls commented Nov 3, 2025


Pull Request Test Coverage Report for Build 19627513060

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 115 of 145 (79.31%) changed or added relevant lines in 2 files are covered.
  • 7 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.3%) to 55.395%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| internal/utils/config.go | 86 | 92 | 93.48% |
| internal/start/start.go | 29 | 53 | 54.72% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| internal/storage/rm/rm.go | 2 | 80.61% |
| internal/gen/keys/keys.go | 5 | 12.9% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 19626009298 | 0.3% |
| Covered Lines | 6659 |
| Relevant Lines | 12021 |

💛 - Coveralls

@avallete

Member Author

Closing in favor of: #4447

Which branches from 3b6ecd4, before the attempt to use the docker-compose approach directly.

Reasons are:

  1. Couldn't figure out a way to use docker-compose as a library and still have atomic image pull retries (it only allows retrying the pull of all images of the service if one of them encounters an error while pulling).
  2. Couldn't distinguish between "fatal" and "retryable" errors, while I can with a custom implementation.
  3. Couldn't "pull the largest ones first". Even though this is just a nitpick, it still made things 5-10% faster than docker-compose's non-prioritized pulling during my testing.
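To illustrate points 1 and 2, here is a minimal sketch of a per-image retry that separates retryable from fatal errors. It is an assumption about what such a custom implementation could look like, not the code in this PR: `pullWithRetry`, `errRetryable`, and `errNotFound` are hypothetical names, and how a real pull error would be classified is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRetryable marks transient failures (assumed here: rate limits,
// network hiccups). Any other error is treated as fatal.
var errRetryable = errors.New("retryable pull error")

// errNotFound is an example of a fatal error used by the demo below.
var errNotFound = errors.New("image not found")

// pullWithRetry retries the pull of a single image while the error is
// retryable, and returns immediately on any other (fatal) error.
func pullWithRetry(name string, attempts int, backoff time.Duration, pull func(string) error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = pull(name); err == nil {
			return nil
		}
		if !errors.Is(err, errRetryable) {
			return fmt.Errorf("pull %s: %w", name, err) // fatal: do not retry
		}
		if i < attempts-1 {
			time.Sleep(backoff)
		}
	}
	return fmt.Errorf("pull %s: gave up after %d attempts: %w", name, attempts, err)
}

func main() {
	calls := 0
	// Simulated pull that is rate limited twice, then succeeds.
	flaky := func(string) error {
		calls++
		if calls < 3 {
			return errRetryable
		}
		return nil
	}
	fmt.Println(pullWithRetry("postgres", 5, time.Millisecond, flaky), calls)

	// A fatal error stops after the first attempt.
	fmt.Println(pullWithRetry("missing", 5, time.Millisecond, func(string) error {
		return errNotFound
	}))
}
```

Because the retry wraps one image at a time, a transient failure on one image does not force the others to be pulled again.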

@sweatybridge sweatybridge reopened this Nov 20, 2025
@sweatybridge

Contributor

@sweatybridge sweatybridge merged commit 2b7134a into develop Nov 24, 2025
20 checks passed
@sweatybridge sweatybridge deleted the avallete/devwf-828-parallelize-container-image-pulls-in-supabase-cli-start branch November 24, 2025 08:17
jgoux pushed a commit to ametel01/cli that referenced this pull request Jun 24, 2026
…se#5681)

## What changed

`supabase start` could start containers before their Docker images had
finished downloading.

The command ran two uncoordinated image-acquisition paths:

1. A best-effort concurrent pre-pull (`pullImagesUsingCompose`) using
docker-compose's `Pull` with `PullOptions{IgnoreFailures: true}`. It
only targets the primary registry and, by design, **silently swallows
per-image pull failures** (the `IgnoreFailures` flag is the hook that
lets the registry fallback recover).
2. An authoritative lazy per-container pull inside `utils.DockerStart` →
`DockerResolveImageIfNotCached` (multi-registry fallback: ECR → GHCR →
Docker Hub).

So any image the concurrent pre-pull failed to cache — a transient
registry/network/rate-limit hiccup, common on a fresh machine pulling
10+ images at once — was pulled **later**, during the `Starting
database… / Starting containers…` phase. That is the "start doesn't wait
for pulls" behaviour from the issue. The pre-pull was added in supabase#4394,
matching the reporter's "last few versions" regression window.

## The fix

Add `ensureImagesCached`, a completeness pass that runs immediately
after the best-effort pre-pull and before any container starts. It
resolves every project image through the **same** multi-registry
fallback resolver `DockerStart` already uses
(`DockerResolveImageIfNotCached`), fanned out concurrently via the
existing `utils.WaitAll` primitive.

After it returns, every required image is guaranteed present in the
local cache, so the per-container `DockerStart` calls become pure cache
hits and never pull mid-start. On the happy path it is just N cheap
image inspects; an image that genuinely cannot be pulled from any
registry now fails the start cleanly **before** any container is
created, instead of limping into a half-pulled start. The compose
pre-pull (and its `IgnoreFailures`) is kept as the fast concurrent
progress UI — it is simply no longer relied on for completeness.
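The completeness pass can be pictured with the sketch below, written under stated assumptions: it is not the actual `ensureImagesCached`, and `ensureAllCached`, `startStack`, and the `resolve`/`start` callbacks are hypothetical stand-ins for the resolver and for `DockerStart`. It uses a plain `sync.WaitGroup` rather than `utils.WaitAll`, whose signature is not shown in this description.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnavailable simulates an image that no registry can provide.
var errUnavailable = errors.New("image unavailable from every registry")

// ensureAllCached resolves every image concurrently and returns a
// joined error if any of them could not be resolved. It returns nil
// only when every image resolved successfully.
func ensureAllCached(images []string, resolve func(image string) error) error {
	errs := make([]error, len(images)) // one slot per goroutine, no lock needed
	var wg sync.WaitGroup
	for i, img := range images {
		wg.Add(1)
		go func(i int, img string) {
			defer wg.Done()
			if err := resolve(img); err != nil {
				errs[i] = fmt.Errorf("resolve %s: %w", img, err)
			}
		}(i, img)
	}
	wg.Wait()
	return errors.Join(errs...) // nil entries are discarded
}

// startStack starts containers only after every image is cached, so a
// missing image aborts the start before any container is created.
func startStack(images []string, resolve, start func(image string) error) error {
	if err := ensureAllCached(images, resolve); err != nil {
		return err
	}
	for _, img := range images {
		if err := start(img); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	images := []string{"postgres", "studio", "realtime"}
	started := 0
	err := startStack(images,
		func(img string) error {
			if img == "studio" {
				return errUnavailable
			}
			return nil
		},
		func(string) error { started++; return nil },
	)
	fmt.Println("containers started:", started, "error:", err)
}
```

In this sketch a single unresolved image leaves `started` at zero, which mirrors the "fail cleanly before any container is created" behaviour described above.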

## TypeScript port

The native-TS port already pulls in a preparation phase that is awaited
before startup and fails hard on pull errors, so it does not have this
bug. This PR adds regression guards locking that contract in:

- `Stack.unit.test.ts`: `stack.start()` aborts and starts zero
containers when a docker pull fails.
- `prefetch.unit.test.ts`: preparation fails with `DockerPullError` when
the whole registry fallback chain fails.

Fixes supabase#5068