A self-evolving multi-agent MVP generator that transforms business requirements into complete specifications.
The AI Factory uses collaborative AI agents to:
- Parse business requirements from documents (PDF, images, text)
- Generate user stories and architecture
- Draft backend and frontend specifications
- Review for quality issues
- Fix identified problems
- Evaluate final output with PM gatekeeper
- Learn from failures to improve over time
Agents read persistent "playbooks", Markdown files that hold learned rules. When output fails PM evaluation, a Coach agent extracts lessons and updates the playbooks, so each failure adds rules that later runs can apply.
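The playbook update can be pictured as a read-then-append on a Markdown file. The following is a minimal sketch, not the project's actual Coach code: the function names `read_rules` and `append_lesson` are hypothetical, and it assumes each rule is stored as a single `- ` bullet line.

```python
from pathlib import Path


def read_rules(playbook: Path) -> list[str]:
    """Return the bullet-point rules stored in a Markdown playbook."""
    if not playbook.exists():
        return []
    lines = playbook.read_text(encoding="utf-8").splitlines()
    # Assumption: one rule per "- " bullet; other lines are ignored.
    return [line[2:].strip() for line in lines if line.startswith("- ")]


def append_lesson(playbook: Path, lesson: str) -> bool:
    """Append a lesson as a new bullet; skip it if already present verbatim."""
    lesson = lesson.strip()
    if lesson in read_rules(playbook):
        return False
    playbook.parent.mkdir(parents=True, exist_ok=True)
    existing = playbook.read_text(encoding="utf-8") if playbook.exists() else ""
    # Make sure the new bullet starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with playbook.open("a", encoding="utf-8") as handle:
        handle.write(f"{separator}- {lesson}\n")
    return True
```

An agent would call `read_rules` before prompting, and the Coach would call `append_lesson` after a rejected run.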
- Install dependencies: `poetry install`
- Set up environment variables: `cp .env.example .env`, then edit `.env` with your API keys
- Run the application: `poetry run streamlit run app.py`

GOOGLE_API_KEY=<gemini_key> # For document vision
GROQ_API_KEY=<deepseek_key> # For agent reasoning
TAVILY_API_KEY=<tavily_key> # For library verification
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<key>

| Agent | Purpose | API |
|---|---|---|
| VisionAgent | Document parsing | Gemini 1.5 Pro |
| PMAgent | User stories, clarifications, evaluation | DeepSeek R1 (Groq) |
| TechLeadAgent | Architecture design | DeepSeek R1 (Groq) |
| DevTeamAgent | Backend/Frontend specs | DeepSeek R1 (Groq) |
| QAAgent | Quality analysis | DeepSeek R1 (Groq) |
| CoachAgent | Lesson extraction | DeepSeek R1 (Groq) |
- Ingestion - Parse uploaded document
- Foundation - Generate user stories + architecture
- Clarification - Resolve ambiguities
- Drafting - Create backend/frontend specs
- QA - Analyze for issues
- Fixing - Apply corrections
- Gatekeeper - PM evaluation (may loop back)
- Output - Generate final files
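The Gatekeeper stage's loop-back can be sketched as a bounded evaluate-and-revise loop. This is an illustrative sketch under stated assumptions: `run_gatekeeper_loop`, `evaluate`, and `revise` are hypothetical names, and the cap of 3 is taken from the retry limit this README states for its loop guard.

```python
from typing import Callable

MAX_RETRIES = 3  # retry cap stated in this README; the constant name is an assumption


def run_gatekeeper_loop(
    draft: str,
    evaluate: Callable[[str], bool],
    revise: Callable[[str], str],
    max_retries: int = MAX_RETRIES,
) -> tuple[str, bool, int]:
    """Evaluate the draft; on rejection, revise and retry up to max_retries.

    Returns (final_draft, approved, retries_used). When approved is False,
    the caller proceeds to the Output stage anyway, with warnings.
    """
    retries = 0
    while True:
        if evaluate(draft):
            return draft, True, retries
        if retries >= max_retries:
            # Guard against an endless reject/revise cycle.
            return draft, False, retries
        draft = revise(draft)
        retries += 1
```

Returning the approval flag rather than raising lets the pipeline continue to Output while still recording that the PM never accepted the draft.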
- ⏱️ Rate Limit Protection - Exponential backoff (1s → 16s)
- 🔄 Infinite Loop Guard - Max 3 retries then force proceed
- 📝 Memory Deduplication - 80% similarity threshold
- 💾 State Recovery - Graceful corruption handling
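Two of the safeguards above are easy to sketch: the 1s → 16s backoff schedule and the 80% similarity check. The function names here are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for whatever similarity measure `fuzzy.py` actually uses, so treat this as an approximation rather than the project's implementation.

```python
import time
from difflib import SequenceMatcher
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, cap: float = 16.0) -> list[float]:
    """Doubling delays from base up to cap: 1, 2, 4, 8, 16 by default."""
    delays = []
    delay = base
    while delay <= cap:
        delays.append(delay)
        delay *= 2
    return delays


def call_with_backoff(
    func: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, sleeping between failed attempts.

    Catches every Exception for brevity; real code would likely catch only
    the client's rate-limit error.
    """
    for delay in backoff_delays():
        try:
            return func()
        except Exception:
            sleep(delay)
    return func()  # final attempt; any exception propagates to the caller


def is_duplicate(candidate: str, existing: list[str], threshold: float = 0.8) -> bool:
    """True if candidate is at least `threshold` similar to any existing rule."""
    return any(
        SequenceMatcher(None, candidate.lower(), rule.lower()).ratio() >= threshold
        for rule in existing
    )
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.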
ai_factory/
├── app.py # Streamlit entry point
├── src/
│ ├── agents/ # AI agent implementations
│ │ ├── base.py # Abstract base + Groq wrapper
│ │ ├── vision.py # Gemini document parser
│ │ ├── pm.py # Product Manager
│ │ ├── tech_lead.py # Architect
│ │ ├── dev_team.py # Backend/Frontend
│ │ ├── qa.py # Quality checker
│ │ └── coach.py # Memory updater
│ ├── utils/ # Utilities
│ │ ├── state.py # Session management
│ │ ├── files.py # I/O operations
│ │ └── fuzzy.py # Text similarity
│ └── schemas.py # Pydantic models
├── memory/ # Persistent playbooks
│ ├── pm_playbook.md
│ ├── tech_lead_playbook.md
│ ├── backend_playbook.md
│ ├── frontend_playbook.md
│ └── qa_playbook.md
└── projects/ # Generated outputs
Each agent reads from learned rules to improve over time:
- PM Playbook - Requirements quality, evaluation criteria
- Tech Lead Playbook - Architecture decisions, tech choices
- Backend Playbook - API design, error handling
- Frontend Playbook - UX patterns, state management
- QA Playbook - Security checks, quality standards
- Upload clear requirements - The better the input, the better the output
- Review ambiguities - The system flags unclear requirements
- Check QA report - Critical issues must be addressed
- Trust the process - If rejected, the system learns and improves
Each project generates:

When a step fails, the system responds as follows:
- Gemini fails: Falls back to text input
- Groq fails: Retries with exponential backoff
- Tavily fails: Uses a fallback version list
- PM rejects: Extracts lessons and retries
- Max retries reached: Forces the pipeline to proceed, with warnings
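The Gemini and Tavily fallbacks share one pattern: try the primary source, and on failure record a warning and use the alternative. A minimal sketch of that pattern follows; `with_fallback` and its parameters are hypothetical names, not part of the project.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    warnings: list[str],
    label: str,
) -> T:
    """Return primary(); if it raises, log a warning and return fallback()."""
    try:
        return primary()
    except Exception as exc:  # broad catch for illustration only
        warnings.append(f"{label} failed ({exc}); using fallback")
        return fallback()
```

For document ingestion, `primary` would wrap the vision call and `fallback` would return the user's pasted text. The collected warnings can then be surfaced alongside the final output.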
MIT License - Feel free to modify and use.
