Allow gateway to boot with only a Postgres connection #7361
Open
AntoineToussaint wants to merge 10 commits into `main` from
When `--config-file` and `--default-config` are both absent, the gateway now falls through to the DB-authoritative load path whenever `TENSORZERO_POSTGRES_URL` is set. Previously this required explicit opt-in via the `ENABLE_CONFIG_IN_DATABASE` feature flag.
An empty database is a valid starting point: every singleton falls back to its default and every collection is empty, so the gateway serves a functional runtime with zero user config. This is the first step toward a "zero-config deploy": the operator provides a database URL and populates functions, variants, and models through REST endpoints.
Also adds an empty-database smoke test for `load_config_from_db`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
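The fall-through order this commit describes can be sketched as a small decision function. This is a minimal illustration, not the gateway's code: `ConfigSource` and `choose_config_source` are hypothetical names, the feature flag is left out (the point of the change is that the URL alone is enough), and the sketch assumes the CLI already rejects passing `--config-file` and `--default-config` together.

```rust
/// Hypothetical enum naming where the gateway's startup config comes from.
#[derive(Debug, PartialEq)]
enum ConfigSource {
    /// `--config-file <path>` was passed.
    File(String),
    /// `--default-config` was passed.
    Default,
    /// Neither flag was passed, but a Postgres URL is available.
    Database,
    /// No config source at all: the gateway should refuse to boot.
    Missing,
}

/// Hypothetical helper: explicit CLI flags win; otherwise a Postgres URL
/// alone is enough to select the DB-authoritative load path.
fn choose_config_source(
    config_file: Option<&str>,
    default_config: bool,
    postgres_url: Option<&str>,
) -> ConfigSource {
    match (config_file, default_config, postgres_url) {
        (Some(path), _, _) => ConfigSource::File(path.to_string()),
        (None, true, _) => ConfigSource::Default,
        (None, false, Some(_)) => ConfigSource::Database,
        (None, false, None) => ConfigSource::Missing,
    }
}
```

Matching on the whole tuple keeps the precedence rules visible in one place, so adding a new source later forces a deliberate choice about where it ranks.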
This was referenced Apr 23, 2026
Apply review feedback on the startup-config-from-Postgres fallback:
- Treat an empty `TENSORZERO_POSTGRES_URL` as absent so a shell/compose misconfiguration produces the clear "no config source" error instead of an opaque sqlx dial failure.
- Read the env var once and thread the `Option<String>` into `load_startup_config_from_database`, eliminating the double read.
- Log a prominent `WARN` when falling through to the implicit DB path (env var set, no feature flag, no `--config-file`) so operators see the fallback in startup logs. Many deployments set the env var for observability/rate-limiting without intending DB-config boot.
- Replace the positional `(…, …, bool /* config_in_database */)` tuple with a `StartupConfig` struct so callers don't rely on an inline-comment-documented bool.
- Introduce a `TENSORZERO_POSTGRES_URL_ENV` constant for the two new call sites in this file.
- Rewrite the empty-DB smoke test with `expect_that!` + `matches_pattern!` per `AGENTS.md` guidance, giving per-field failure diagnostics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
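The empty-URL filtering, the `StartupConfig` struct, and the implicit-path `WARN` from this commit can be sketched together. Only the `StartupConfig` and `TENSORZERO_POSTGRES_URL_ENV` names come from the commit message; `normalize_postgres_url`, `resolve_startup`, and the `implicit_db_boot` field are hypothetical, the real struct's other fields are omitted, and `eprintln!` stands in for the gateway's actual logging.

```rust
/// Constant name taken from the commit message; the value is the env var the PR discusses.
const TENSORZERO_POSTGRES_URL_ENV: &str = "TENSORZERO_POSTGRES_URL";

/// Treat an unset *or empty* URL as absent, so `TENSORZERO_POSTGRES_URL=` in a
/// shell or compose file yields the "no config source" error rather than a
/// failed database dial.
fn normalize_postgres_url(raw: Option<String>) -> Option<String> {
    raw.filter(|url| !url.is_empty())
}

/// Read the environment variable in one place at startup.
fn read_postgres_url() -> Option<String> {
    normalize_postgres_url(std::env::var(TENSORZERO_POSTGRES_URL_ENV).ok())
}

/// Named fields instead of a positional `(_, _, bool)` tuple
/// (the real struct's other fields are omitted in this sketch).
#[derive(Debug)]
struct StartupConfig {
    config_in_database: bool,
    /// Hypothetical: true when the DB path was chosen without the feature flag.
    implicit_db_boot: bool,
}

/// Hypothetical resolver: `explicit_config` means `--config-file` or
/// `--default-config` was passed.
fn resolve_startup(
    explicit_config: bool,
    feature_flag: bool,
    postgres_url: Option<&str>,
) -> StartupConfig {
    let config_in_database = !explicit_config && (feature_flag || postgres_url.is_some());
    let implicit_db_boot = config_in_database && !feature_flag;
    if implicit_db_boot {
        // Stand-in for a WARN-level log; the message text is hypothetical.
        eprintln!(
            "WARN: no config file and no feature flag; loading config from Postgres because {} is set",
            TENSORZERO_POSTGRES_URL_ENV
        );
    }
    StartupConfig { config_in_database, implicit_db_boot }
}
```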
`expect_that!` needs a `#[gtest]` test context to collect failures; the `#[sqlx::test]` macro doesn't provide one, so using it here panics with "No test context found" instead of running the assertion. Switch to `assert_that!`, which works without the gtest context and matches the convention used by every other test in this file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spawns the actual gateway binary with `TENSORZERO_POSTGRES_URL` set against a migrated Postgres and nothing else (no `--config-file`, no `--default-config`, no `ENABLE_CONFIG_IN_DATABASE` feature flag) and verifies the gateway binds a port, serves a healthy `/health`, and returns a well-formed `StatusResponse` from `/status`.
This is the end-to-end counterpart to the unit-level empty-DB test on `load_config_from_db`: that one proves the loader returns defaults, this one proves the full binary actually reaches listening state and answers HTTP with that defaulted config.
Also factors the "wait for listening + parse bound addr + build ChildData" tail of `start_gateway_impl` into a shared `await_gateway_listening` helper so the new `start_gateway_from_db_url_on_random_port` helper doesn't duplicate it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
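The "wait for listening + parse bound addr" step can be sketched generically over any `BufRead`, which also makes it testable without spawning a process. This is a hypothetical stand-in for `await_gateway_listening`, not its real body: the `listening on ` marker is an assumed log format, building `ChildData` is left out, and a real harness would also want a timeout.

```rust
use std::io::{self, BufRead};
use std::net::SocketAddr;

/// Hypothetical marker; the real gateway's startup log line may differ.
const LISTENING_MARKER: &str = "listening on ";

/// Scan output lines until one announces the bound address, then parse it.
fn await_listening_addr<R: BufRead>(reader: R) -> io::Result<SocketAddr> {
    for line in reader.lines() {
        let line = line?;
        if let Some(idx) = line.find(LISTENING_MARKER) {
            // Take the first whitespace-delimited token after the marker.
            let rest = &line[idx + LISTENING_MARKER.len()..];
            let addr_str = rest.split_whitespace().next().unwrap_or("");
            return addr_str
                .parse::<SocketAddr>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e));
        }
    }
    // Output ended (e.g. the child exited) before any address was announced.
    Err(io::Error::new(
        io::ErrorKind::UnexpectedEof,
        "gateway exited before reporting a listening address",
    ))
}
```

Binding to port 0 and reading the address back this way is what lets each test run on a random free port without races over a fixed port number.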
Extend the new integration test from "does the gateway serve /health" to the full config-in-database scenario the UI will build on top of: migrated Postgres, no config rows, no `--config-file`, feature flag on. Then assert:
- `/health` returns 200
- `/status` returns `ok` + a non-empty `config_hash`
- `/internal/config_toml` returns a default editable TOML whose hash matches `/status`, and whose `path_contents` is empty (no user-provided templates)
- The TOML body parses as a valid TOML table
The helper `start_gateway_from_db_url_on_random_port` now takes an `extra_env` slice so callers can either exercise the implicit-opt-in path (env var only) or the full config-in-database scenario (feature flag on) without duplicating the subprocess plumbing.
Adds `toml` to gateway dev-dependencies for assertion-side parsing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
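The assertions in this commit boil down to one cross-endpoint contract: `/internal/config_toml` must report the same hash as `/status`, with no user-provided templates. A std-only sketch of that check follows. The struct shapes are assumptions (only `status`, `version`, `config_hash`, and `path_contents` are named in the PR; the `toml` field and the `HashMap<String, String>` type are hypothetical), and the TOML-is-a-table check is omitted because the PR does it with the `toml` crate.

```rust
use std::collections::HashMap;

/// Assumed shape of the `/status` body.
#[derive(Debug)]
struct StatusResponse {
    status: String,
    version: String,
    config_hash: String,
}

/// Assumed shape of the `/internal/config_toml` body (`toml` is hypothetical).
#[derive(Debug)]
struct ConfigTomlResponse {
    toml: String,
    hash: String,
    path_contents: HashMap<String, String>,
}

/// Returns the first violated expectation, if any.
fn check_default_config_contract(
    status: &StatusResponse,
    config: &ConfigTomlResponse,
) -> Result<(), String> {
    if status.status != "ok" {
        return Err(format!("unexpected status {:?}", status.status));
    }
    if status.config_hash.is_empty() {
        return Err("/status reported an empty config_hash".to_string());
    }
    // The load-bearing check: both endpoints must agree on the config hash.
    if config.hash != status.config_hash {
        return Err(format!(
            "hash mismatch: /status={} /internal/config_toml={}",
            status.config_hash, config.hash
        ));
    }
    if !config.path_contents.is_empty() {
        return Err("expected no user-provided templates in path_contents".to_string());
    }
    Ok(())
}
```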
Adds a dedicated e2e scenario, parallel to `live-tests`, `live-tests-config-in-database`, `evaluation-tests`, and the existing live flavors: gateway booting from a migrated-but-empty Postgres + ClickHouse stack with `ENABLE_CONFIG_IN_DATABASE=true` and no `--config-file`. This is the deploy shape the configure-via-UI story builds on: schema present, no config rows, no files on disk.
New pieces, all mirroring the existing config-in-database pattern:
- `crates/tensorzero-core/tests/e2e/docker-compose.db-only-boot.yml`: override of `docker-compose.live.yml` that drops `gateway-migrate-config`, flips the feature flag, clears `--config-file`, and uses `!override` on `volumes` to remove every inherited bind mount (config TOMLs, fixtures, credentials), so the gateway literally has nothing on disk to read.
- `crates/tensorzero-core/tests/e2e/db_only_boot/mod.rs`: two `#[gtest] #[tokio::test]` Rust tests that run inside the live-tests container and hit the gateway over the compose network. One asserts `/status` reports the default config and a non-empty hash; the other asserts `/internal/config_toml` returns the same hash with empty `path_contents` and a TOML body that parses back as a valid table.
- `crates/.config/nextest.toml`: new `db-only-boot` profile filtering to `db_only_boot::` tests; `e2e`'s `default-filter` excludes them so they only run in their own CI job.
- `.github/workflows/db-only-boot-e2e.yml`: new reusable workflow that stands up the stack, runs the profile inside `live-tests`, and asserts the gateway logs show the DB-authoritative boot banner.
- `.github/workflows/general.yml`: wires the new job behind `detect-changes.outputs.code`; `ci/check-all-general-jobs-passed.sh` adds it to `ALLOWED_SKIP` so the merge queue tolerates skipped runs.
Also drops the subprocess-spawning `crates/gateway/tests/boot_from_empty_db.rs` and its helper additions in `gateway/tests/common/mod.rs` and `gateway/Cargo.toml`, which are superseded by the in-container Rust test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four small cleanups from the branch review:
- `load_startup_config_from_database` takes `Option<&str>` instead of
`Option<String>` — the function never owned the url; caller now
passes `postgres_url.as_deref()`.
- Consolidate `UnwrittenConfig` import into the existing
`use tensorzero_core::config::{...}` block and drop the two inline
long-form paths, per AGENTS.md.
- Fold the three separate `expect_that!` calls on `StatusResponse`
into a single `matches_pattern!` — if the struct gains a field, the
test now makes a conscious choice instead of silently ignoring it.
- Replace `toml::from_str(...).unwrap_or_else(|e| panic!(...))` with
`assert_that!(parsed, ok(predicate(toml::Value::is_table)))` so
success + the "is-a-table" check collapse to one googletest
assertion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
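The first cleanup above follows a common Rust idiom: a function that only reads a value borrows it as `Option<&str>`, and a caller holding an `Option<String>` converts with `Option::as_deref` instead of cloning. In this sketch, `GatewayArgs`, `boot`, and `load_from_database_sketch` are hypothetical names; the real `load_startup_config_from_database` has a different return type and does actual loading.

```rust
/// Hypothetical CLI/env state owned by the caller.
struct GatewayArgs {
    postgres_url: Option<String>,
}

/// Stand-in for `load_startup_config_from_database`: it only inspects the URL,
/// so it borrows `Option<&str>` rather than taking ownership of a `String`.
fn load_from_database_sketch(postgres_url: Option<&str>) -> Result<String, String> {
    match postgres_url {
        Some(_) => Ok("would load config from Postgres".to_string()),
        None => Err("no config source".to_string()),
    }
}

fn boot(args: &GatewayArgs) -> Result<String, String> {
    // `as_deref` borrows the `Option<String>` as an `Option<&str>` without cloning.
    load_from_database_sketch(args.postgres_url.as_deref())
}
```

Because nothing is moved, the caller keeps `args.postgres_url` for later use, for example when the same URL is also needed for observability.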
Two mistakes in the initial push of the new job:
- The workflow pulls `tensorzero/live-tests:sha-$SHA` but only declared `build-gateway-container` in `needs:`. Adds `build-live-tests-container` and `build-fixtures-container` to the dependency list, matching `live-tests-config-in-database`. Also gates the job on the same fork/dependabot condition the sibling jobs use.
- `pre-commit`'s `check-yaml` can't parse Compose's `!override` custom tag, so `validate` failed on the new compose file. Excludes that single file from `check-yaml`; Docker Compose still validates it at stack-up time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CI job failed because `docker compose run live-tests` started its full `depends_on` graph, including `fixtures-postgres`, which exits 1 when loading fixtures against a migrated-but-empty DB. The whole point of this scenario is an empty DB, so fixture loading is a semantic mismatch.
Override `live-tests.depends_on` with `!override` to keep only the infra + gateway + migrations services and drop `fixtures` and `fixtures-postgres`.
The `up --wait gateway` and the subsequent `run --rm live-tests` both pass locally after this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run the zero-config boot scenario in both observability modes:
- Postgres-config + ClickHouse-data (default TOML-config deploy shape)
- Postgres-config + Postgres-data (single-datastore deploy)
Matches the `live-tests` workflow's `database: [clickhouse, postgres]`
matrix. When `matrix.database == postgres`, sets
`TENSORZERO_INTERNAL_TEST_OBSERVABILITY_BACKEND=postgres` so the gateway
uses Postgres as the primary observability backend and exercises its
pgcron/pgvector/trigram extension checks.
The `check-all-general-jobs-passed.sh` ALLOWED_SKIP entry
(`db-only-boot-e2e`) already covers matrix-suffixed job names via the
existing `"entry ("` prefix match — no change needed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
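The prefix rule the last paragraph relies on can be restated in Rust for clarity. This is a re-expression, not the script itself (`ci/check-all-general-jobs-passed.sh` is shell), and it assumes the script also accepts an exact name match and that matrix jobs are reported as `name (value)`; `is_allowed_skip` is a hypothetical name.

```rust
/// A job may be skipped if its name equals an `ALLOWED_SKIP` entry or is a
/// matrix-suffixed variant of it, i.e. starts with "<entry> (".
fn is_allowed_skip(job_name: &str, allowed_skip: &[&str]) -> bool {
    allowed_skip.iter().any(|entry| {
        job_name == *entry || job_name.starts_with(&format!("{entry} ("))
    })
}
```

Requiring the `" ("` suffix, rather than a bare prefix, keeps an unrelated job such as `db-only-boot-e2e-extra` from being silently allowed to skip.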

Summary
- When neither `--config-file` nor `--default-config` is passed, the gateway now boots from the database whenever `TENSORZERO_POSTGRES_URL` is set, even without `ENABLE_CONFIG_IN_DATABASE`. An empty, freshly-migrated database is a valid starting point: every singleton defaults, every collection is empty.
- Review feedback applied: a `StartupConfig` struct replaces a positional `(_, _, bool)` tuple; `Option<&str>` is threaded into `load_startup_config_from_database` so the env var is read once; an empty env var is filtered to `None` so a shell/compose misconfiguration produces the clear "no config source" error instead of an opaque sqlx dial failure; and a `WARN` log fires when the gateway takes the implicit DB-boot path (many deployments set the env var for observability/rate-limiting without intending DB config).
- New `db-only-boot-e2e` CI job, a matrix over `database: [clickhouse, postgres]`: migrated DB, no config rows, no files on disk; the gateway boots and the REST endpoints return the defaulted config.
Test plan
- `load_config_from_db_returns_defaults_on_empty_database` (uses `matches_pattern!` field-by-field against `UninitializedConfig`) passes locally.
- `db_only_boot::db_only_boot_serves_status_with_defaulted_config` asserts `/status` returns `StatusResponse { status: "ok", version: $VERSION, config_hash: non-empty }`.
- `db_only_boot::db_only_boot_returns_default_config_via_config_toml_endpoint` asserts `/internal/config_toml` returns the same hash `/status` reported, with empty `path_contents` and a TOML body that parses back as a valid table. The hash-equivalence check is the load-bearing contract (the UI depends on it).
- `db-only-boot-e2e (database: clickhouse)` and `db-only-boot-e2e (database: postgres)` are green.
- Existing `--config-file` and `live-tests-config-in-database` paths are unchanged.
What's NOT in this PR
- … `config-editing` profile.
🤖 Generated with Claude Code