fix(runtime): make codex Responses shim work against non-OpenAI backends by yaozheng-fang · Pull Request #592 · volcengine/veadk-python · GitHub
Skip to content

fix(runtime): make codex Responses shim work against non-OpenAI backends#592

Merged
yaozheng-fang merged 3 commits into
mainfrom
fix/codex-shim-ark-compat
Jun 9, 2026
Merged

fix(runtime): make codex Responses shim work against non-OpenAI backends#592
yaozheng-fang merged 3 commits into
mainfrom
fix/codex-shim-ark-compat

Conversation

@yaozheng-fang

Copy link
Copy Markdown
Collaborator

Problem

Agent(runtime="codex") bridged onto a non-OpenAI chat backend (e.g. Volcengine Ark) failed every turn with a generic RuntimeError: We're currently experiencing high demand. Two distinct incompatibilities in the Responses shim were the cause:

  1. Codex injects built-in tools Ark can't parse. Alongside the standard function tools, Codex sends a web_search tool whose schema carries OpenAI-only fields like external_web_access. Ark's stricter Responses endpoint returns BadRequest: unknown field "external_web_access"; Codex retries it and then surfaces the generic "high demand" message.
  2. Degenerate streaming. litellm's chat→Responses bridge can only emit a single response.completed event when streaming a chat backend (no response.created, no output_text.delta). Codex's strict SSE parser rejects that and reports the same generic error.

(Follow-up to #591, which fixed the openai_codex import.)

Fix (veadk/runtime/codex/proxy.py)

  • Sanitize tools: keep only type == "function" tools in the inbound request; drop web_search and other built-in OpenAI tool types the bridged backend doesn't understand.
  • Synthesize the stream: call the backend non-streaming, then expand the completed result into the canonical Responses event sequence Codex expects — response.created → per output item (output_item.addedoutput_text/reasoning_summary deltas → output_item.done) → response.completed.

Verification

agent = Agent(name="Xiaoming", ..., runtime="codex")
runner = Runner(agent=agent, short_term_memory=ShortTermMemory())
await runner.run("你叫什么")
# -> 我叫 **Xiaoming**(小明)!有什么需要帮忙的吗?😊

Agent(runtime="codex") on Ark (deepseek-v4-flash, via pip install openai-codex) now returns a real answer end-to-end. ruff check + format pass.

The codex runtime bridges Codex (Responses API only) onto a chat backend
(e.g. Volcengine Ark) via the in-process shim. Two incompatibilities made
every turn fail with a generic "high demand" error:

- Codex injects built-in tools (e.g. `web_search`) whose schema carries
  OpenAI-only fields like `external_web_access`. Ark's stricter Responses
  endpoint rejects the unknown field (BadRequest), which Codex retries and
  then surfaces as "high demand". Sanitize the inbound request to keep only
  standard `function` tools.
- litellm's chat->Responses bridge can only emit a single degenerate
  `response.completed` event when streaming a chat backend; Codex's strict
  SSE parser rejects that. Call the backend non-streaming and synthesize the
  canonical Responses event sequence (response.created -> per-item
  output_item.added / text+reasoning deltas / output_item.done ->
  response.completed) ourselves.

Verified end-to-end: Agent(runtime="codex") on Ark now returns a real answer.
…exit

The shim's uvicorn server runs as a background task that is never stopped, so
the event loop cancels it at process exit, and uvicorn logs a CancelledError
traceback from its lifespan handler. The shim app has no startup/shutdown
hooks, so disable the lifespan protocol (lifespan="off"). Cosmetic only.
…inal message

A reasoning model (e.g. DeepSeek) sometimes ends a turn with only a reasoning
item and no final agentMessage, leaving TurnResult.final_response empty —
result_to_events then returned [] and the turn printed nothing. Surface the
reasoning summary (with a short note) in that case so a turn is never silently
empty. Verified across repeated runs: no more empty outputs.
@yaozheng-fang yaozheng-fang merged commit cfac404 into main Jun 9, 2026
16 checks passed
yaozheng-fang added a commit that referenced this pull request Jun 9, 2026
…idelity (#593)

Follow-up to #592. Makes the codex runtime work for multi-step tool turns and
forward the whole turn faithfully, by aligning the shim and the ADK mapping
with the Codex Responses protocol and the genai/ADK event shapes.

proxy.py (the Responses shim):
- Stream `function_call` output items (output_item.added ->
  function_call_arguments.delta/.done -> output_item.done). Previously only
  message/reasoning were streamed, so a tool call was dropped and the turn
  ended at the model's preamble.
- Backfill `status="completed"` on replayed assistant messages in `input`;
  Ark's Responses API requires it (MissingParameter: input.status) and Codex
  replays them without it on multi-step turns.

translate.py (result -> ADK events):
- Forward every Codex thread item in order instead of collapsing to
  final_response: reasoning -> thought text; commandExecution / mcpToolCall /
  dynamicToolCall / fileChange / webSearch -> function_call + function_response;
  agentMessage / plan / any text-bearing item -> text; userMessage skipped.
- Coerce tool-call arguments to a dict and normalize status enums; fall back to
  final_response so a turn is never silently empty.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants