Retry wget downloads in Ubuntu and distroless Dockerfiles #105139
Merged
alexey-milovidov merged 1 commit on May 19, 2026
The `Docker server image` job intermittently fails when fetching the prebuilt `.deb` packages from `clickhouse-builds.s3.amazonaws.com`:

```
ERROR 500: Internal Server Error.
ERROR 503: Service Unavailable.
```

Three unrelated PRs hit this S3 download flake over three days (`ClickHouse#100173` on 2026-05-15 with `500`, `ClickHouse#104694` on 2026-05-14 with `503`, `ClickHouse#104853` on 2026-05-13 with `503`). On master, the `Docker server image` job has 1 failure of the same shape in the last 14 days.

`docker/server/Dockerfile.alpine` already wraps `wget` with a retry helper (added in ClickHouse#100380 to absorb the same kind of transient errors during Alpine cross-architecture builds via QEMU). The Ubuntu and distroless flavours still call `wget ... || exit 1` on the very first attempt, so a single `500`/`503` from S3 kills the whole build.

This change ports the same `wget_with_retry` wrapper to `docker/server/Dockerfile.ubuntu` and `docker/server/Dockerfile.distroless` for the `DIRECT_DOWNLOAD_URLS` block (the path used by CI). `WGET_RETRIES` (default `5`) and `WGET_RETRY_DELAY` (default `1s`) match the Alpine version and remain overridable via `--build-arg`.

CI report (PR ClickHouse#100173, sha 3ceac71): https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100173&sha=3ceac71a43a5fd4fc1a7859937b23e71b8fe42ae&name_0=PR&name_1=Docker%20server%20image

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
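For readers who have not seen the Alpine helper, the retry wrapper described above could look roughly like the sketch below. This is an illustration only, not the code from the Dockerfiles: the names `wget_with_retry`, `WGET_RETRIES` and `WGET_RETRY_DELAY` and their defaults come from the description, while the loop body, the log message and the fixed (non-growing) delay are assumptions.

```shell
# Minimal sketch of a retry wrapper around wget (assumed shape, POSIX sh).
# Defaults follow the PR description; in a Dockerfile they would come from ARGs.
WGET_RETRIES="${WGET_RETRIES:-5}"
WGET_RETRY_DELAY="${WGET_RETRY_DELAY:-1s}"

wget_with_retry() {
    attempt=1
    while [ "$attempt" -le "$WGET_RETRIES" ]; do
        # Pass all arguments through to wget unchanged.
        if wget "$@"; then
            return 0
        fi
        echo "wget failed (attempt ${attempt}/${WGET_RETRIES}): $*" >&2
        # Sleep only when another attempt will follow.
        if [ "$attempt" -lt "$WGET_RETRIES" ]; then
            sleep "$WGET_RETRY_DELAY"
        fi
        attempt=$((attempt + 1))
    done
    # Every attempt failed: report failure so the caller can abort the build.
    return 1
}
```

With a helper of this shape, a call site that previously read `wget ... || exit 1` would become `wget_with_retry ... || exit 1`, so a persistent error still fails the build once the retries are exhausted.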
Pre-PR validation gate (per worker policy): This is an infrastructure / Dockerfile fix, not a flaky-test fix, so the deterministic-repro gate doesn't apply 1:1. The relevant checks:
Cross-PR check at PR-creation time: no competing open PR touching |
alexey-milovidov approved these changes on May 17, 2026
pull Bot pushed a commit to Spencerx/ClickHouse that referenced this pull request on Jun 28, 2026
The `Docker keeper image` job intermittently fails while fetching the prebuilt packages from `clickhouse-builds.s3.amazonaws.com`. Example from PR ClickHouse#108573 (`docker/keeper/Dockerfile.distroless`, amd64): the `clickhouse-common-static` deb downloads fine, then the `clickhouse-keeper` deb gets

    HTTP request sent, awaiting response... 503 Service Unavailable
    ERROR 503: Service Unavailable.

and the single-shot `wget ... || exit 1` kills the whole build, so the downstream `docker library image test` then reports `image does not exist!`.

The server Dockerfiles already wrap `wget` with a retry helper: `Dockerfile.alpine` since ClickHouse#100380 (tgz path) and `Dockerfile.ubuntu` / `Dockerfile.distroless` since ClickHouse#105139 (deb path). The Keeper Dockerfiles were left with the brittle first-attempt `wget`, so a single 500/503 from S3 fails the build.

This ports the same `wget_with_retry` wrapper to both Keeper files:

- `docker/keeper/Dockerfile.distroless` deb path (`DIRECT_DOWNLOAD_URLS`), mirroring `docker/server/Dockerfile.distroless`.
- `docker/keeper/Dockerfile` tgz path (`DIRECT_DOWNLOAD_URLS` and the repository fallback), mirroring `docker/server/Dockerfile.alpine`.

`WGET_RETRIES` (default 5) and `WGET_RETRY_DELAY` (default 1s) match the server Dockerfiles and stay overridable via `--build-arg`. A persistent error still fails the build after the retries are exhausted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pull Bot pushed a commit to admariner/ClickHouse that referenced this pull request on Jun 30, 2026
The "Docker server image" job builds the distroless/alpine/ubuntu images with `docker buildx build`. buildx resolves the SBOM scanner image (docker/buildkit-syft-scanner, pulled because of --sbom=true) and base images from docker.io. When docker.io has a transient outage, the build fails at:

    #2 resolve image config for docker-image://docker.io/docker/buildkit-syft-scanner:stable-1
    #2 ERROR: unexpected status from HEAD request to https://registry-1.docker.io/...

and the downstream "docker library image test" then reports "image does not exist!" as a secondary victim.

praktika's Shell.run / Result.from_commands_run already support retries with exponential backoff and an error allowlist, but the buildx build and the imagetools merge commands ran with no retries. Wire retries=5 and an allowlist of transient registry/network signatures into both. The allowlist keeps fail-fast behavior for genuine Dockerfile/build errors.

This mirrors the existing retry hardening for the in-Dockerfile wget downloads (ClickHouse#105139) and the keeper image S3 artifact fetch (ClickHouse#108675).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
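The combination of retries, exponential backoff and an error allowlist mentioned in that commit message can be sketched in shell as follows. This is not praktika's implementation (praktika is a Python tool and its API is not shown here); `run_with_retry`, `TRANSIENT_PATTERNS` and `RETRY_BASE_DELAY` are hypothetical names. Only the first pattern is taken from the log quoted above; the second is an assumed example of a transient signature.

```shell
# Hypothetical helper: retry a command only when its output matches an
# allowlisted transient-error pattern, otherwise fail fast.
# First pattern comes from the quoted buildx log; the second is an assumption.
TRANSIENT_PATTERNS='unexpected status from HEAD request|503 Service Unavailable'

run_with_retry() {
    max_attempts="$1"
    shift
    delay="${RETRY_BASE_DELAY:-1}"   # seconds before the first retry (assumed default)
    attempt=1
    while :; do
        # Capture stdout and stderr together so the failure text can be inspected.
        if output=$("$@" 2>&1); then
            printf '%s\n' "$output"
            return 0
        fi
        printf '%s\n' "$output" >&2
        # Errors not on the allowlist (e.g. a broken Dockerfile) stop immediately.
        if ! printf '%s\n' "$output" | grep -Eq "$TRANSIENT_PATTERNS"; then
            return 1
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))         # exponential backoff: double the wait each time
        attempt=$((attempt + 1))
    done
}
```

The point of the allowlist is that a registry hiccup is retried up to `max_attempts` times, while a genuine build error is reported after a single attempt instead of being repeated with growing delays.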

The `Docker server image` job intermittently fails when fetching the prebuilt `.deb` packages from `clickhouse-builds.s3.amazonaws.com` with `ERROR 500: Internal Server Error.` or `ERROR 503: Service Unavailable.`

Three unrelated PRs hit this S3 download flake over three days: #100173 on 2026-05-15 (`500`), #104694 on 2026-05-14 (`503`), #104853 on 2026-05-13 (`503`). The 14-day CIDB pattern is below.

(Most of 2026-05-04/2026-05-05 was a different, apt-ca-certificates failure mode; the recent `500`/`503` shape on May 13-15 is the one this PR addresses.)

`docker/server/Dockerfile.alpine` already wraps `wget` with a retry helper (added in #100380 to absorb the same kind of transient errors during Alpine cross-architecture builds via QEMU). The Ubuntu and distroless flavours still call `wget ... || exit 1` on the very first attempt, so a single `500`/`503` from S3 kills the whole build.

This change ports the same `wget_with_retry` wrapper to `docker/server/Dockerfile.ubuntu` and `docker/server/Dockerfile.distroless` for the `DIRECT_DOWNLOAD_URLS` block (the path used by CI). `WGET_RETRIES` (default `5`) and `WGET_RETRY_DELAY` (default `1s`) match the Alpine version and remain overridable via `--build-arg`.

Triggered by @alexey-milovidov's directive on #100173:

CI report (#100173, sha `3ceac71`): https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100173&sha=3ceac71a43a5fd4fc1a7859937b23e71b8fe42ae&name_0=PR&name_1=Docker%20server%20image

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes
Version info
26.5.1.797