BIORESTORE · 2026-06-06T21:15:36Z

Adds OpenRouter as a provider and turns the bench into something you can point at many models — two ways — plus a UI redesign.

The bench could already run any of the 44 frameworks on a task, but only one model at a time, baked in via env var. To compare reasoning frameworks you want to vary the model too; to test agent-style coordination you want different models handing work to each other in one run. OpenRouter (one key, OpenAI-compatible) makes every model reachable, so neither needed a per-provider integration.

The core change is in the engine: it resolves a model per call instead of binding one provider for the whole run. A plain provider instance still works for single-model runs and existing callers, but a caller can pass a resolver so each step picks its own model. That's what makes both new modes work across every framework and orchestration, not just chains.
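
A minimal sketch of per-call resolution, assuming the engine accepts either a provider instance or a resolver function. The names `toResolver`, `makeEngine`, `providerFor`, and `fakeProvider` are hypothetical and are not taken from the repository; they only illustrate the shape of the idea.

```javascript
// Hypothetical sketch: none of these names come from the actual codebase.

// Normalise the argument: a function is treated as a resolver (model => provider),
// anything else is treated as a single provider that answers for every call.
function toResolver(providerOrResolver) {
  if (typeof providerOrResolver === "function") return providerOrResolver;
  return () => providerOrResolver;
}

function makeEngine(providerOrResolver) {
  const resolve = toResolver(providerOrResolver);
  return {
    // The provider is looked up per step instead of once per run.
    providerFor(step) {
      return resolve(step.model);
    },
    async runStep(step, input) {
      const provider = resolve(step.model);
      const output = await provider.complete(`${step.id}: ${input}`);
      // Record which model produced the step, as the trace does.
      return { id: step.id, model: provider.model, output };
    },
  };
}

// Stand-in provider so the sketch runs offline.
const fakeProvider = (model) => ({
  model,
  complete: async (prompt) => `[${model}] ${prompt}`,
});

// Single-model run: existing callers keep passing a provider instance.
const single = makeEngine(fakeProvider("model-a"));
// Multi-model run: a resolver lets each step pick its own model.
const multi = makeEngine((model) => fakeProvider(model || "model-a"));
```

In this sketch a plain provider ignores the step's requested model, which is one simple way to keep single-model callers unchanged; the real engine may handle that case differently.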

On top of it:

Compare (src/matrix.js): run the same framework or chain on several models in parallel, each in its own isolated column, so one model failing doesn't abort the others. Available via CLI --models a,b,c, or the web Models box.
Handoff: assign a model per chain stage (chain plan:claude,pot:gpt) or per orchestration role (--roles advocate,opponent,judge). The trace records which model produced each step.
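
As an illustration of the handoff syntax, here is a sketch of how a chain spec such as plan:claude,pot:gpt could be turned into per-stage assignments. `parseStages` is a hypothetical helper, not the project's parser, and the `{ id, model }` shape is an assumption.

```javascript
// Hypothetical helper: parses "plan:claude,pot:gpt" into [{ id, model }, ...].
function parseStages(spec) {
  return spec
    .split(",")
    .map((part) => part.trim())
    .filter(Boolean)
    .map((part) => {
      // Split on the first colon only, in case a model slug itself contains one.
      const i = part.indexOf(":");
      if (i === -1) return { id: part, model: null }; // no model: use the run default
      return { id: part.slice(0, i), model: part.slice(i + 1) || null };
    });
}

const stages = parseStages("plan:claude,pot:gpt");
```

A stage without a `:slug` suffix gets `model: null` here, which a caller could read as "use the run's default model".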

The provider talks to OpenRouter over plain fetch, so the zero-dependency promise holds — no SDK — and keys stay server-side.
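
For reference, a dependency-free call to OpenRouter's OpenAI-compatible chat completions endpoint can look roughly like this. It is a sketch rather than the provider shipped in this PR: `buildRequest` and `complete` are hypothetical names, and the single-user-message body is an assumption about how prompts are shaped.

```javascript
// Hypothetical sketch of a zero-dependency OpenRouter call; not the PR's provider.
function buildRequest(apiKey, model, prompt) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // OpenAI-compatible body: a model slug plus a list of chat messages.
      body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
    },
  };
}

// fetchImpl defaults to the global fetch (Node 18+); tests can pass a stub.
async function complete(apiKey, model, prompt, fetchImpl = fetch) {
  if (!apiKey) throw new Error("OPENROUTER_API_KEY not set");
  const { url, init } = buildRequest(apiKey, model, prompt);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`OpenRouter request failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Keeping request shaping in a pure function makes it testable offline, and calling `complete` only on the server keeps the key out of the browser.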

The UI moves from cramped dev styling to a light editorial layout: the catalog is a searchable index by category, the composer holds the mode and model controls, and results render per framework with the model trace. It also fails loudly with setup steps when the catalog can't load (it was a silent blank sidebar before) and shows a live framework count. System serif fonts only, so it stays offline.

Notes:

Compare and handoff assume an OpenRouter-style provider (one provider, many slugs). Under PROVIDER=anthropic/openai, slugs must be valid for that provider.
ReAct/ReWOO/Tool-chaining still simulate tool use in-prompt; real tool loops are a separate change.
Added a /health route for deploy checks.

Tests cover provider request shaping, model-per-call routing, compare isolation, and handoff ordering; CI smokes exercise compare and handoff. Everything runs offline against the mock provider.

OpenRouter joins anthropic/openai/mock (one key, many models) over the OpenAI-compatible REST API, so the zero-dependency story holds. The engine now resolves a model per call instead of binding one provider for the whole run. On top of the existing 44 frameworks that unlocks compare (run the same framework or chain on several models in parallel columns, via --models and the web Models box) and handoff (assign a model per chain stage with id:slug, or per orchestration role with --roles, so models hand work to each other within one run). CLI, server, tests, CI smokes, .env.example and the README cover all three.

Editorial light theme: warm paper, system serif display type (no web-font dependency), the catalog as a numbered index with section rules, a single red accent, and results as serif cards that show which model ran each step. The compare and handoff controls are wired into the composer. The catalog fetch now fails loudly with setup instructions instead of a blank sidebar when nothing answers /api/catalog, and the header shows a live framework count. Adds .claude/launch.json so the app can be previewed.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8fb2b8300

chatgpt-codex-connector · 2026-06-06T21:17:50Z

+      if (mode === "single")     result = await engine.runSingle(ids[0], inputs, opts);
+      else if (mode === "chain") result = await engine.runChain(ids, inputs, opts);
+      else                       result = { results: await engine.runAll(ids, inputs, opts) };


Keep per-run model overrides out of comparison columns

When opts.model or opts.roleModels is supplied together with models (possible through /api/run, and through CLI --models ... --roles ...), forwarding the unchanged options lets OpenRouter's complete() override the provider's column-bound model. Consequently, columns can execute on the same override or on role-specific models while still being labeled as their requested comparison model, corrupting comparison results. Comparison mode should remove these overrides or explicitly force every call to the current column model.
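
One way to apply the first remedy is to drop the override fields before the options reach a comparison column. This is a sketch under the assumption that the overrides live on `opts.model` and `opts.roleModels` as described above; `columnOpts` is a hypothetical helper and `temperature` is only a placeholder for any other option.

```javascript
// Hypothetical helper: returns a copy of opts without per-run model overrides,
// so every call in a comparison column uses the column's own model.
function columnOpts(opts = {}) {
  const { model, roleModels, ...rest } = opts;
  return rest;
}

const requested = { model: "override", roleModels: { judge: "other" }, temperature: 0 };
const forColumn = columnOpts(requested);
```

The destructuring copy leaves the caller's `opts` untouched, so the same request object can still drive handoff runs elsewhere.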

gemini-code-assist

Code Review

This pull request introduces multi-model comparison and handoff capabilities to the framework-lab test bench, adding OpenRouter support, a matrix runner for executing pipelines across multiple models, and a redesigned web UI to configure these options. The review feedback highlights a potential memory leak and state-sharing issue in server.js due to the global engine singleton caching providers, a bug in src/providers/index.js where missing API keys for explicitly requested providers silently fall back to the mock provider instead of throwing errors, and an opportunity to run model comparisons in parallel using Promise.all in src/matrix.js to improve performance.

gemini-code-assist · 2026-06-06T21:18:43Z

+      // Handoff / single: opts.model + opts.roleModels + per-stage models flow through the engine.
      if (mode === "single") return json(res, 200, await engine.runSingle(ids[0], inputs, opts));
      if (mode === "all")    return json(res, 200, { results: await engine.runAll(ids, inputs, opts) });
-      if (mode === "chain")  return json(res, 200, await engine.runChain(ids, inputs, opts));
+      if (mode === "chain")  return json(res, 200, await engine.runChain(stages || ids, inputs, opts));


The global engine singleton defined on line 20 caches resolved providers in a Map that is never cleared. If clients send requests with unique or arbitrary model names, this cache will grow indefinitely, causing a memory leak in the long-running server process. Additionally, sharing provider instances across concurrent requests means that stateful providers (like the mock provider with its call counter n) will share state, leading to non-deterministic behavior.

To fix this, instantiate the engine dynamically per-request inside the POST handler, and remove the global engine declaration on line 20.

Suggested change

      // Handoff / single: opts.model + opts.roleModels + per-stage models flow through the engine.
      const requestEngine = createEngine(model => getProvider(process.env, { model }));
      if (mode === "single") return json(res, 200, await requestEngine.runSingle(ids[0], inputs, opts));
      if (mode === "all")    return json(res, 200, { results: await requestEngine.runAll(ids, inputs, opts) });
      if (mode === "chain")  return json(res, 200, await requestEngine.runChain(stages || ids, inputs, opts));

gemini-code-assist · 2026-06-06T21:18:43Z

+  if (want === "openrouter" || (!want && env.OPENROUTER_API_KEY)) {
+    if (env.OPENROUTER_API_KEY) return openrouter(env, model);
+  }
  if (want === "anthropic" || (!want && env.ANTHROPIC_API_KEY)) {
-    if (env.ANTHROPIC_API_KEY) return anthropic(env);
+    if (env.ANTHROPIC_API_KEY) return anthropic(env, model);
  }
  if (want === "openai" || (!want && env.OPENAI_API_KEY)) {
-    if (env.OPENAI_API_KEY) return openai(env);
+    if (env.OPENAI_API_KEY) return openai(env, model);
  }


There is a bug in the provider resolution logic when a provider is explicitly requested via PROVIDER but its corresponding API key is missing.

For example, if PROVIDER=openrouter is set but OPENROUTER_API_KEY is undefined, the outer condition want === "openrouter" is true, but the inner condition if (env.OPENROUTER_API_KEY) is false. This causes the function to silently fall through to the next provider checks, eventually returning the mock provider. This completely bypasses the helpful configuration error checks defined inside the provider modules (e.g., throw new Error("OPENROUTER_API_KEY not set")).

Removing the inner key checks allows the requested provider to be returned directly, so it can throw a clear and actionable error message.

  if (want === "openrouter" || (!want && env.OPENROUTER_API_KEY)) {
    return openrouter(env, model);
  }
  if (want === "anthropic" || (!want && env.ANTHROPIC_API_KEY)) {
    return anthropic(env, model);
  }
  if (want === "openai" || (!want && env.OPENAI_API_KEY)) {
    return openai(env, model);
  }

gemini-code-assist · 2026-06-06T21:18:43Z

+  for (const model of models) {
+    onEvent({ type: "model:start", model });
+    const provider = getProvider(env, { model });
+    const engine = createEngine(provider, onEvent);
+    let result = null, error = null;
+    try {
+      if (mode === "single")     result = await engine.runSingle(ids[0], inputs, opts);
+      else if (mode === "chain") result = await engine.runChain(ids, inputs, opts);
+      else                       result = { results: await engine.runAll(ids, inputs, opts) };
+    } catch (e) {
+      error = e.message;
+    }
+    columns.push({ model, provider: provider.name, mode, result, error });
+    onEvent({ type: "model:end", model });
+  }


The comparison mode currently runs sequentially because of the await inside the for...of loop. This means if a user compares multiple models, they must wait for each model's pipeline to completely finish before the next one starts. For multi-call orchestrations or chains, this can be extremely slow.

Running the models in parallel using Promise.all would significantly improve performance (especially for the web UI). Note that if you run them in parallel, the onEvent console logs for the CLI might interleave, but this can be managed or accepted as a trade-off for the massive speedup.

  const tasks = models.map(async (model) => {
    onEvent({ type: "model:start", model });
    const provider = getProvider(env, { model });
    const engine = createEngine(provider, onEvent);
    let result = null, error = null;
    try {
      if (mode === "single")     result = await engine.runSingle(ids[0], inputs, opts);
      else if (mode === "chain") result = await engine.runChain(ids, inputs, opts);
      else                       result = { results: await engine.runAll(ids, inputs, opts) };
    } catch (e) {
      error = e.message;
    }
    onEvent({ type: "model:end", model });
    return { model, provider: provider.name, mode, result, error };
  });
  const columns = await Promise.all(tasks);
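
If interleaved CLI output turns out to be a problem, one possible way to manage it is to buffer events per model and replay them in column order once the parallel runs settle. The sketch below is only an illustration of that idea; `bufferedEvents`, `forModel`, and `flush` are hypothetical names and not part of the reviewed code.

```javascript
// Hypothetical sketch: collect events per model, then replay them grouped by model.
function bufferedEvents(onEvent) {
  const buffers = new Map();
  return {
    // Returns an onEvent-compatible callback that queues events for one model.
    forModel(model) {
      const queue = [];
      buffers.set(model, queue);
      return (event) => {
        queue.push(event);
      };
    },
    // Replays queued events to the real sink, one model at a time, in the given order.
    flush(models) {
      for (const model of models) {
        for (const event of buffers.get(model) || []) onEvent(event);
      }
    },
  };
}

// Usage: two "parallel" columns emit interleaved events, the sink sees them grouped.
const seen = [];
const sink = bufferedEvents((e) => seen.push(`${e.type}:${e.model}`));
const onA = sink.forModel("a");
const onB = sink.forModel("b");
onA({ type: "model:start", model: "a" });
onB({ type: "model:start", model: "b" });
onA({ type: "model:end", model: "a" });
onB({ type: "model:end", model: "b" });
sink.flush(["a", "b"]);
```

The cost of this approach is that nothing is reported until the flush, so it suits CLI logs better than a live web UI.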

Waji added 2 commits June 6, 2026 23:08

chatgpt-codex-connector Bot reviewed Jun 6, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 6, 2026

View reviewed changes

BIORESTORE merged commit 12de08c into main Jun 7, 2026
3 checks passed

OpenRouter provider, cross-model compare + handoff, and a redesigned UI#1

BIORESTORE merged 2 commits into main from feat/openrouter-multi-model-ui
