Summary
When a provider API call fails (401/403/404/429/5xx), run_select_query
over MCP returns a successful tool result with {"rows": []}. The HTTP
failure is only visible on the server process's stderr. MCP clients cannot
distinguish "query ran, zero rows" from "query failed", and absence-based
checks (e.g. compliance controls of the form "this query must return no
rows") silently false-pass.
This is fixable entirely within stackql/stackql - any-sdk already returns
the erroneous response to the caller (see analysis below).
Reproduction
Exhaust the unauthenticated GitHub rate limit (60 req/h/IP), then via MCP
stdio with --auth='{"github": {"type": "null_auth"}}':
{"name": "run_select_query", "arguments": {"sql": "SELECT name FROM github.repos.repos WHERE org = 'stackql'"}}
Server stderr shows:
http response status code: 403, response body: {"message":"API rate limit exceeded for ..."}
but the tool result is isError: false with structuredContent.rows: []
and a "no results" markdown rendering. Observed on v0.10.500.
Analysis
-
any-sdk's HTTP layer (internal/anysdk/client.go,
parseReponseBodyIfErroneous) logs the error body to stderr but returns
the full non-2xx *http.Response to stackql with the body re-buffered -
the status code and body are available downstream. No any-sdk change is
required.
-
The swallow points are in
internal/stackql/execution/mono_valent_execution.go:
-
the plain-select path returns success with a nil payload on erroneous
responses:
if httpResponse.StatusCode >= 300 {
return newHTTPProcessorResponse(nil, reversalStream, false, nil)
}
-
similarly, res.HasError() routes the error text into the message
handler and returns a nil error.
By contrast the isMaterialiseResponse branch already generates an
error for StatusCode >= 400, so mutations/materialised ops surface
failures - it is specifically selects that degrade to empty results.
-
internal/stackql/mcpbackend.extractQueryResults already checks
resp.GetError() and would propagate a populated error; today it can
only ever produce the opaque failed to extract query results.
Proposed fix (two layers, both in this repo)
-
Execution: in the select path of mono_valent_execution.go, treat
StatusCode >= 400 as an error (mirroring the materialise branch),
carrying status code and response body into the ExecutorOutput error
instead of returning (nil payload, nil error).
-
MCP surface (pkg/mcp_server): map backend errors to
CallToolResult with isError: true and a structured error payload in
the text/structured content, including a retryability classification:
{
"error": {
"message": "GET https://api.github.com/... failed",
"http_status": 403,
"provider": "github",
"retryable": false
}
}
Suggested classification: 408/429/5xx retryable; 400/401/403/404 not
(with 403-rate-limit bodies ideally classified as retryable, since
GitHub uses 403 for rate limiting).
Impact
Any MCP client running absence-based checks gets false negatives today.
The interim mitigation we ship in auditron (stackql-mcp-rs demo) is a
canary control (SELECT that must return rows) whose failure flags a
broken run - workable, but it shouldn't be necessary, and it cannot
attribute the failure or advise on retry.
Summary
When a provider API call fails (401/403/404/429/5xx),
run_select_queryover MCP returns a successful tool result with
{"rows": []}. The HTTPfailure is only visible on the server process's stderr. MCP clients cannot
distinguish "query ran, zero rows" from "query failed", and absence-based
checks (e.g. compliance controls of the form "this query must return no
rows") silently false-pass.
This is fixable entirely within stackql/stackql - any-sdk already returns
the erroneous response to the caller (see analysis below).
Reproduction
Exhaust the unauthenticated GitHub rate limit (60 req/h/IP), then via MCP
stdio with
--auth='{"github": {"type": "null_auth"}}':{"name": "run_select_query", "arguments": {"sql": "SELECT name FROM github.repos.repos WHERE org = 'stackql'"}}Server stderr shows:
but the tool result is
isError: falsewithstructuredContent.rows: []and a "no results" markdown rendering. Observed on v0.10.500.
Analysis
any-sdk's HTTP layer (
internal/anysdk/client.go,parseReponseBodyIfErroneous) logs the error body to stderr but returnsthe full non-2xx
*http.Responseto stackql with the body re-buffered -the status code and body are available downstream. No any-sdk change is
required.
The swallow points are in
internal/stackql/execution/mono_valent_execution.go:the plain-select path returns success with a nil payload on erroneous
responses:
similarly,
res.HasError()routes the error text into the messagehandler and returns a nil error.
By contrast the
isMaterialiseResponsebranch already generates anerror for
StatusCode >= 400, so mutations/materialised ops surfacefailures - it is specifically selects that degrade to empty results.
internal/stackql/mcpbackend.extractQueryResultsalready checksresp.GetError()and would propagate a populated error; today it canonly ever produce the opaque
failed to extract query results.Proposed fix (two layers, both in this repo)
Execution: in the select path of
mono_valent_execution.go, treatStatusCode >= 400as an error (mirroring the materialise branch),carrying status code and response body into the
ExecutorOutputerrorinstead of returning
(nil payload, nil error).MCP surface (
pkg/mcp_server): map backend errors toCallToolResultwithisError: trueand a structured error payload inthe text/structured content, including a retryability classification:
{ "error": { "message": "GET https://api.github.com/... failed", "http_status": 403, "provider": "github", "retryable": false } }Suggested classification: 408/429/5xx retryable; 400/401/403/404 not
(with 403-rate-limit bodies ideally classified as retryable, since
GitHub uses 403 for rate limiting).
Impact
Any MCP client running absence-based checks gets false negatives today.
The interim mitigation we ship in auditron (stackql-mcp-rs demo) is a
canary control (
SELECTthat must return rows) whose failure flags abroken run - workable, but it shouldn't be necessary, and it cannot
attribute the failure or advise on retry.