Feature request: optional WFGY 16-problem RAG debugger for pandas-ai · Issue #1868 · sinaptik-ai/pandas-ai · GitHub

@onestardao

Description


🚀 The feature

I would like to propose an optional diagnostic add-on for pandas-ai that uses the open source WFGY 16 Problem Map as a failure taxonomy for RAG-style workflows.

Concretely, this could be any of the following (whichever fits your roadmap best):

  • a short “debugging RAG failures” recipe in the docs, showing how to send a failed pandas-ai query (prompt, generated code, error, retrieved context) to an external WFGY debugger script and get back a Problem Map number (No.1–No.16) plus a suggested fix, or
  • a small optional helper / callback that users can plug into their existing pandas-ai pipelines to dump a failed interaction into WFGY for classification.
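To make the second option concrete, here is a minimal sketch of what such an opt-in helper could look like. Everything in it is hypothetical: `FailureRecord`, its field names, and `dump_failure` are illustrative only, not an existing pandas-ai or WFGY API.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record of a failed pandas-ai interaction. The field names
# are illustrative; they mirror the trace pieces mentioned above
# (prompt, generated code, error, retrieved context).
@dataclass
class FailureRecord:
    prompt: str
    generated_code: str
    error: str
    retrieved_context: list[str]

def dump_failure(record: FailureRecord, path: str) -> None:
    """Serialize a failed interaction so it can be fed to an external
    WFGY debugger script for classification."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(record), f, indent=2)

# Example: a query that failed because the generated code used a
# misspelled column name.
record = FailureRecord(
    prompt="average revenue per region",
    generated_code="df.groupby('region')['revenu'].mean()",
    error="KeyError: 'revenu'",
    retrieved_context=["schema: region, revenue, year"],
)
dump_failure(record, "wfgy_failure.json")
```

Users would call something like `dump_failure` from an error handler around their pandas-ai pipeline and then run the external debugger on the resulting file.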

The WFGY debugger itself is plain Python that talks to any OpenAI-compatible endpoint (no extra infra). It reads the WFGY Problem Map text files, uses them as a 16-mode failure map (hallucination, retrieval drift, bootstrap ordering, config drift, etc.), and returns:

  • one primary Problem Map number (No.1–No.16)
  • an optional secondary candidate
  • a short explanation and a pointer to the corresponding WFGY doc.
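The round trip can be sketched roughly as below. This is an assumption about the shape of the interaction, not the actual WFGY prompt format: only the prompt-building and reply-parsing sides are shown, and the model reply here is a canned string standing in for whatever an OpenAI-compatible chat endpoint would return.

```python
import re

# Hypothetical prompt/response plumbing for the classification step;
# the real WFGY debugger's prompt format may differ.

def build_classification_prompt(problem_map_text: str, trace: str) -> str:
    """Ask the model to classify a failed trace against the 16-mode map."""
    return (
        "You are a RAG failure classifier. Using the 16-mode failure map "
        "below, answer with a line 'primary: No.<k>' (1-16), optionally a "
        "line 'secondary: No.<k>', and a short explanation.\n\n"
        f"{problem_map_text}\n\nFailed trace:\n{trace}"
    )

def parse_classification(reply: str) -> dict:
    """Extract the primary and optional secondary Problem Map number."""
    result = {"primary": None, "secondary": None}
    for key in result:
        m = re.search(rf"{key}:\s*No\.(\d+)", reply, re.IGNORECASE)
        if m:
            n = int(m.group(1))
            if 1 <= n <= 16:   # ignore out-of-range numbers
                result[key] = n
    return result

# In practice the reply would come from any OpenAI-compatible endpoint
# (e.g. a chat-completions call); here we parse a canned example.
reply = "primary: No.5\nsecondary: No.8\nEmbedding vs semantic mismatch."
print(parse_classification(reply))  # {'primary': 5, 'secondary': 8}
```

Keeping the parsing tolerant (regex over a free-form reply, with range checks) matters because different endpoints format answers differently.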

Motivation, pitch

A lot of pandas-ai users are effectively building lightweight RAG systems over tabular or mixed data: LLMs generate code, hit the database, call tools, and then answer in natural language. When something goes wrong, it is often hard to tell what kind of failure it is:

  • sometimes the retrieval is wrong (wrong file, wrong table, stale embedding),
  • sometimes the reasoning is wrong even though the data is fine,
  • sometimes the infra or config is wrong (missing secrets, startup races, version drift, etc.).

WFGY Problem Map is an open source taxonomy of 16 common AI system failure modes, originally built for RAG debugging. It focuses on things like:

  • No.1 hallucination / chunk drift,
  • No.2 interpretation collapse,
  • No.5 embedding vs semantic mismatch,
  • No.8 missing retrieval traceability,
  • No.14 bootstrap ordering issues,
  • No.16 pre-deploy / secrets drift,

and so on.
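As a shared vocabulary, even a tiny lookup table covers the modes listed above. This sketch includes only the six modes mentioned in this issue (names paraphrased), not all 16:

```python
# Partial mapping of the Problem Map modes mentioned above; names are
# paraphrased from the list in this issue, and the remaining ten modes
# are omitted for brevity.
PROBLEM_MAP_SUBSET = {
    1: "hallucination / chunk drift",
    2: "interpretation collapse",
    5: "embedding vs semantic mismatch",
    8: "missing retrieval traceability",
    14: "bootstrap ordering issues",
    16: "pre-deploy / secrets drift",
}

def describe(mode: int) -> str:
    """Human-readable name for a mode, with a fallback pointer."""
    return PROBLEM_MAP_SUBSET.get(mode, f"No.{mode}: see WFGY Problem Map docs")

print(describe(5))  # embedding vs semantic mismatch
```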

Right now pandas-ai already gives users a lot of power for building RAG-style workflows. A small, optional integration or recipe that says “when something weird happens, send the trace through this 16-mode debugger and see what kind of failure it is” could make debugging much easier, especially for less experienced users.

My goal here is not to change any core behaviour, but to offer a standard vocabulary for failures that many pandas-ai users could share, and to give them a concrete next step when things break.

Alternatives

Right now the main alternative is to run WFGY completely outside of pandas-ai:

  • when a pandas-ai call fails in a confusing way, the developer manually copies the prompt, generated code, logs, and context into a separate WFGY debugger notebook,
  • the notebook classifies the failure into one of the 16 Problem Map modes and suggests a fix.

This works in practice, but it is a bit clumsy and most users will never discover it unless there is some official hook or recipe in the pandas-ai ecosystem.

Another alternative is of course to build a completely separate, pandas-ai-specific taxonomy of failures. My suggestion here is to reuse an existing open source one (WFGY) so that different tools and repos can eventually speak the same language about RAG / LLM failures.

Additional context

If this sounds interesting and in-scope, I am happy to:

  • prepare a small example notebook that shows how to pipe a pandas-ai failure into the WFGY debugger and interpret the output, or
  • draft a short docs section / recipe PR that you can adjust to match your style.

Relevant links:

Either way, totally fine if you feel this is out of scope for pandas-ai. I mainly wanted to ask before sending any PR. Thanks a lot for maintaining this project and for considering the idea.
