Allow adding/editing files not referenced by config by shuyangli · Pull Request #7275 · tensorzero/tensorzero · GitHub
Skip to content

Allow adding/editing files not referenced by config#7275

Open
shuyangli wants to merge 8 commits intomainfrom
sl/allow-files-not-referenced-by-config
Open

Allow adding/editing files not referenced by config#7275
shuyangli wants to merge 8 commits intomainfrom
sl/allow-files-not-referenced-by-config

Conversation

@shuyangli
Copy link
Copy Markdown
Contributor

@shuyangli shuyangli commented Apr 13, 2026

We add "deleted_at" to stored files so we can show all latest version of the same file that are not deleted, regardless of whether it's referenced in the config toml. This allows persisting files that are created but not yet referenced in the config toml (users may want to write a file, save it, then change the config to reference it).


Note

Medium Risk
Touches Postgres schema and write semantics for stored_files, including new tombstoning behavior and editor reads; mistakes could hide files or break editor/config apply flows, but changes are scoped and covered by new e2e tests.

Overview
Config editor can now persist and reload “free” files not yet referenced by TOML. The editor GET/apply flow now merges all active stored_files into path_contents, computes CAS signatures over this merged view, and on apply persists unreferenced files while tombstoning removed ones.

stored_files gains soft-delete + “single active row per path” enforcement. Adds deleted_at with a migration that backfills by tombstoning superseded rows and creates a partial index for active lookups; the write path now only reuses active rows and tombstones other active versions per file_path. New load_editor_path_contents reads non-tombstoned rows and errors on duplicates, with expanded Rust + Playwright e2e coverage for free-file add/edit/remove and referenced-file rewrites.

Reviewed by Cursor Bugbot for commit a9a479e. Bugbot is set up for automated code reviews on this repo. Configure here.

@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from d6c5834 to 4a901d3 Compare April 13, 2026 18:34
@shuyangli shuyangli force-pushed the sl/ui-config-editor branch 2 times, most recently from 132b322 to 8a20cf5 Compare April 13, 2026 19:09
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from 4a901d3 to e68483f Compare April 13, 2026 19:09
@shuyangli shuyangli force-pushed the sl/ui-config-editor branch 2 times, most recently from 217dc60 to 65af36b Compare April 13, 2026 19:50
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch 2 times, most recently from 2bb0d77 to 234904b Compare April 13, 2026 21:00
@shuyangli shuyangli force-pushed the sl/ui-config-editor branch 2 times, most recently from 5ab0cc8 to de96668 Compare April 13, 2026 21:14
@shuyangli shuyangli force-pushed the sl/ui-config-editor branch from de96668 to 4d3a06b Compare April 13, 2026 21:18
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from 234904b to d383465 Compare April 13, 2026 21:18
Base automatically changed from sl/ui-config-editor to main April 13, 2026 22:35
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from 63ec7ab to cf0aad6 Compare April 14, 2026 14:44
@shuyangli shuyangli marked this pull request as ready for review April 14, 2026 14:44
@shuyangli
Copy link
Copy Markdown
Contributor Author

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf0aad6fb8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +383 to +387
qb.push_values(free_files, |mut b, (path, body)| {
let hash = blake3::hash(body.as_bytes()).as_bytes().to_vec();
b.push_bind(Uuid::now_v7())
.push_bind(path)
.push_bind(body)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Deduplicate free-file inserts before writing stored_files

write_free_files_in_tx unconditionally inserts every free file on every apply, but this path never reuses an existing (file_path, content_hash) row like write_collected_files does. Re-saving an unchanged editor state repeatedly will accumulate duplicate active rows for the same file path, and those rows are only tombstoned when the path is removed entirely, so table size and GET /internal/config_toml scan cost grow with save count.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CTE in write_free_files_in_tx already dedups on the editor's "content unchanged" path: its WHERE NOT EXISTS (SELECT 1 ... WHERE file_path = input.file_path AND content_hash = input.content_hash AND deleted_at IS NULL) clause skips the insert when an active row with the same (file_path, content_hash) already exists, so re-saving an unchanged editor state is a no-op on that path.

Content-change dedup is separately covered — the paired UPDATE ... FROM new_rows ... WHERE t.id != n.id tombstones any previously-active row for the same file_path, and write_collected_files (reached via extra_templates for free files too) now does the same tombstone step for referenced files (d177b54). Postgres regression test added in 0641cb8.

If there's a scenario you think is still doubled up, happy to look — but I don't see one in the current code.

Comment on lines +984 to +988
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add a deterministic tie-breaker to editor file selection

This DISTINCT ON (file_path) query orders only by created_at DESC, so when multiple active rows for the same path share a timestamp, Postgres may return either row nondeterministically. In this change, one apply can create multiple same-path rows in a single transaction, and NOW() is transaction-stable, so ties are realistic; if tied rows differ, editor path_contents (and therefore CAS signatures) can flip unpredictably across reads.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should reject multiple rows with the same path

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — load_editor_path_contents now returns an error (in b736742) instead of silently picking a row when stored_files has more than one active row for a file_path. Also dropped the DISTINCT ON (and the no-longer-needed created_at DESC column from idx_stored_files_editor_latest) — the write path is the single source of truth for maintaining the at-most-one-active-row invariant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the DISTINCT ON is gone entirely. load_editor_path_contents now does a plain SELECT file_path, source_body WHERE deleted_at IS NULL and errors if any file_path appears more than once (b736742). The write path maintains the "at most one active row per file_path" invariant: free files via the CTE's paired INSERT … RETURNING + UPDATE tombstone, referenced files via write_collected_files's new tombstone step (d177b54). Pre-migration duplicates are backfilled (tombstoned) in 1d20517 so the invariant holds from day one. Postgres regression test in 0641cb8.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit cf0aad6. Configure here.

@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from f7be467 to c3591e3 Compare April 16, 2026 13:19
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from c3591e3 to a1dcf7c Compare April 17, 2026 13:51
Per review: the write path maintains the invariant that at most one
active row exists per file_path, so we should surface a duplicate as a
data integrity error rather than silently picking a row via
`DISTINCT ON` + tie-breaker. Drops the `created_at DESC` column from
the partial index since the read no longer sorts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread crates/tensorzero-core/src/endpoints/internal/config_toml.rs Outdated
AntoineToussaint and others added 3 commits April 22, 2026 13:44
The `load_editor_path_contents` integrity check added in b736742
assumes the write path maintains "at most one active row per
`file_path`". The free-file CTE already maintains this, but
`write_collected_files` (used for referenced files) inserted new rows
on content change without tombstoning old ones, which is what caused
CI failures once the read path started rejecting duplicates.

Make `write_collected_files` maintain the invariant:
- Filter existing-row lookup on `deleted_at IS NULL` so we never reuse
  a tombstoned UUID (which would leave the path with zero active rows).
- After inserting new rows, tombstone any OTHER active rows for the
  processed `file_path`s. Config consumers read files by UUID without a
  `deleted_at` filter, so older function/tool/eval versions still
  resolve correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from the review:

1. `write_free_files_in_tx` tombstones any active row whose `file_path`
   is not in `all_new_paths`. Previously `all_new_paths` only contained
   what the client sent in `request.path_contents`, so an apply that
   omitted a canonical file from `path_contents` would tombstone its
   still-referenced active row. Union the canonical paths into
   `all_new_paths` so canonical files are always protected regardless
   of what the client includes.

2. Add a Postgres sqlx test asserting that editing a tool's referenced
   file body through `write_stored_config` twice leaves exactly one
   active row for the path (old version tombstoned, new version
   carries the updated content). This locks in the
   `write_collected_files` tombstone fix from d177b54.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this migration, `write_collected_files` could leave multiple
rows with the same `file_path` (when a referenced file's content was
edited, a new row was inserted without tombstoning the old one). Once
the `deleted_at` column is added all of those rows default to
`deleted_at IS NULL`, which would break the new editor read path: it
enforces "at most one active row per `file_path`" and now returns a
data-integrity error rather than silently picking a row.

Tombstone every non-latest row per `file_path` in the same migration,
using `(created_at DESC, id DESC)` so the row we keep active matches
what the previous `DISTINCT ON` read-time tie-breaker would have
returned. The invariant holds from migration time forward and the
editor GET doesn't 500 on any deployment that already had duplicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1d20517. Configure here.

Comment thread crates/tensorzero-core/src/db/postgres/stored_config_queries.rs
Cursor review flagged the error message for using single-quoted `{:?}`
output around the file path and leaving `stored_files` / `file_path`
unquoted. Project convention prefers backticks for technical terms in
error messages and user-visible strings. Switch to backtick-wrapped
identifiers and `{}` formatting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants