[FEATURE] MCP tools return empty result sets on upstream HTTP errors instead of erroring · Issue #670 · stackql/stackql · GitHub
Skip to content

[FEATURE] MCP tools return empty result sets on upstream HTTP errors instead of erroring #670

Description

@jeffreyaven

Summary

When a provider API call fails (401/403/404/429/5xx), run_select_query
over MCP returns a successful tool result with {"rows": []}. The HTTP
failure is only visible on the server process's stderr. MCP clients cannot
distinguish "query ran, zero rows" from "query failed", and absence-based
checks (e.g. compliance controls of the form "this query must return no
rows") silently false-pass.

This is fixable entirely within stackql/stackql - any-sdk already returns
the erroneous response to the caller (see analysis below).

Reproduction

Exhaust the unauthenticated GitHub rate limit (60 req/h/IP), then via MCP
stdio with --auth='{"github": {"type": "null_auth"}}':

{"name": "run_select_query", "arguments": {"sql": "SELECT name FROM github.repos.repos WHERE org = 'stackql'"}}

Server stderr shows:

http response status code: 403, response body: {"message":"API rate limit exceeded for ..."}

but the tool result is isError: false with structuredContent.rows: []
and a "no results" markdown rendering. Observed on v0.10.500.

Analysis

  • any-sdk's HTTP layer (internal/anysdk/client.go,
    parseReponseBodyIfErroneous) logs the error body to stderr but returns
    the full non-2xx *http.Response to stackql with the body re-buffered -
    the status code and body are available downstream. No any-sdk change is
    required.

  • The swallow points are in
    internal/stackql/execution/mono_valent_execution.go:

    • the plain-select path returns success with a nil payload on erroneous
      responses:

      if httpResponse.StatusCode >= 300 {
          return newHTTPProcessorResponse(nil, reversalStream, false, nil)
      }
    • similarly, res.HasError() routes the error text into the message
      handler and returns a nil error.

    By contrast the isMaterialiseResponse branch already generates an
    error for StatusCode >= 400, so mutations/materialised ops surface
    failures - it is specifically selects that degrade to empty results.

  • internal/stackql/mcpbackend.extractQueryResults already checks
    resp.GetError() and would propagate a populated error; today it can
    only ever produce the opaque failed to extract query results.

Proposed fix (two layers, both in this repo)

  1. Execution: in the select path of mono_valent_execution.go, treat
    StatusCode >= 400 as an error (mirroring the materialise branch),
    carrying status code and response body into the ExecutorOutput error
    instead of returning (nil payload, nil error).

  2. MCP surface (pkg/mcp_server): map backend errors to
    CallToolResult with isError: true and a structured error payload in
    the text/structured content, including a retryability classification:

    {
      "error": {
        "message": "GET https://api.github.com/... failed",
        "http_status": 403,
        "provider": "github",
        "retryable": false
      }
    }

    Suggested classification: 408/429/5xx retryable; 400/401/403/404 not
    (with 403-rate-limit bodies ideally classified as retryable, since
    GitHub uses 403 for rate limiting).

Impact

Any MCP client running absence-based checks gets false negatives today.
The interim mitigation we ship in auditron (stackql-mcp-rs demo) is a
canary control (SELECT that must return rows) whose failure flags a
broken run - workable, but it shouldn't be necessary, and it cannot
attribute the failure or advise on retry.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions