Proactive lead extraction and qualification using industrial-grade AI.
Social networks (Reddit, X/Twitter) are saturated with conversations about problems, needs, and tool searches, but 80% of that volume is noise that doesn't represent a real sales opportunity. A salesperson's time is limited, and manually reviewing thousands of posts is unsustainable.
Ghost Lead Hunter eliminates human intervention from the prospecting process:
- Monitors subreddits and Twitter in real-time
- Analyzes every post with AI to detect purchase intent
- Qualifies automatically (0-10) and generates outreach email
- Exports only "Grade A" leads (>8 points) to Notion
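The last two steps amount to a threshold filter over scored posts. The following is a minimal sketch of that idea, not the project's actual code: `ScoredPost` and `grade_a` are hypothetical names, and the comparison is assumed to be inclusive (a score of exactly 8.0 passes), which the text above does not settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPost:
    """Hypothetical record for a post after AI analysis."""

    url: str
    score: float  # purchase-intent rating on the 0-10 scale


def grade_a(posts: list[ScoredPost], threshold: float = 8.0) -> list[ScoredPost]:
    """Keep only posts at or above the threshold (inclusive by assumption)."""
    return [post for post in posts if post.score >= threshold]


posts = [
    ScoredPost("https://example.com/a", 9.1),
    ScoredPost("https://example.com/b", 4.0),
    ScoredPost("https://example.com/c", 8.0),
]
exported = grade_a(posts)
print([post.url for post in exported])
```

Keeping the threshold as a parameter means the same function could serve a stricter export rule without changes to the scoring step.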
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Reddit    │     │   Twitter    │     │     AI      │
│   (PRAW)    │     │ (Playwright) │     │ (Multi-LLM) │
└──────┬──────┘     └──────┬───────┘     └──────┬──────┘
       │                   │                    │
       └───────────────────┼────────────────────┘
                           ▼
                ┌─────────────────────┐
                │      SQLite DB      │
                │   (Deduplication)   │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │     Lead Scored     │
                │     Score > 8?      │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │       Notion        │
                │      (Export)       │
                └─────────────────────┘
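The deduplication stage in the diagram can be approximated with a primary key and `INSERT OR IGNORE`. This is only a sketch using the standard-library `sqlite3` module; the table name `seen_posts`, its columns, and the helper names are assumptions, not the project's actual schema.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) dedup table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_posts ("
        "source TEXT NOT NULL, "
        "post_id TEXT NOT NULL, "
        "PRIMARY KEY (source, post_id))"
    )
    return conn


def mark_if_new(conn: sqlite3.Connection, source: str, post_id: str) -> bool:
    """Record a post and return True only the first time it is seen."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO seen_posts (source, post_id) VALUES (?, ?)",
        (source, post_id),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cursor.rowcount == 1


conn = open_db()
print(mark_if_new(conn, "reddit", "abc123"))  # first sighting
print(mark_if_new(conn, "reddit", "abc123"))  # duplicate
```

Keying on `(source, post_id)` rather than the ID alone avoids collisions between a Reddit post and a tweet that happen to share an identifier.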
| Layer | Technology | Why |
|---|---|---|
| Language | Python 3.12+ | Type hints, async support, rich ecosystem |
| Scraping | PRAW (Reddit) + Playwright (X) | PRAW: Official API wrapper; Playwright: Browser automation resistant to platform changes |
| Data Validation | Pydantic v2 | Runtime type safety, serialization |
| LLM | Gemini 1.5 Flash / OpenAI GPT-4o / Claude 3 / MiniMax M2.5 | Agnostic design — switch providers via env var |
| Database | SQLite | Lightweight, zero-config, embedded |
| Orchestration | Custom loop + python-dotenv | Production-ready configuration management |
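One way to get the provider-agnostic behavior described in the LLM row is a small registry keyed by the `LLM_PROVIDER` value. The sketch below is illustrative only: the class names, the `score` method, and the stubbed return value are assumptions, and no real LLM SDK is called.

```python
import os
from abc import ABC, abstractmethod
from collections.abc import Mapping
from typing import Optional


class LLMProvider(ABC):
    """Hypothetical interface every backend would implement."""

    name: str

    @abstractmethod
    def score(self, text: str) -> float:
        """Return a 0-10 purchase-intent score for the text."""


class _StubProvider(LLMProvider):
    """Placeholder backend; a real one would call the vendor's API."""

    def score(self, text: str) -> float:
        return 0.0


class GeminiProvider(_StubProvider):
    name = "gemini"


class OpenAIProvider(_StubProvider):
    name = "openai"


class ClaudeProvider(_StubProvider):
    name = "claude"


class MiniMaxProvider(_StubProvider):
    name = "minimax"


PROVIDERS = {
    cls.name: cls
    for cls in (GeminiProvider, OpenAIProvider, ClaudeProvider, MiniMaxProvider)
}


def load_provider(env: Optional[Mapping[str, str]] = None) -> LLMProvider:
    """Instantiate the backend named by LLM_PROVIDER (default: gemini)."""
    env = os.environ if env is None else env
    key = env.get("LLM_PROVIDER", "gemini").strip().lower()
    if key not in PROVIDERS:
        raise ValueError(
            f"Unknown LLM_PROVIDER {key!r}; expected one of {sorted(PROVIDERS)}"
        )
    return PROVIDERS[key]()


print(load_provider({"LLM_PROVIDER": "claude"}).name)
```

Passing the environment as a mapping keeps the selection logic testable without mutating `os.environ`.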
- Multi-LLM Agnostic: Switch between Gemini, OpenAI, Claude, or MiniMax with a single environment variable
- Smart Deduplication: Never process the same lead twice
- Rate Limiting: Configurable delays to avoid bans
- Graceful Degradation: Works without Notion if not configured
- Docker-Ready: Optimized image build for production
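The rate-limiting feature above could be implemented as a generator that pauses between items. This is a sketch under assumptions: `throttled`, its `jitter` parameter, and the injectable `sleep` callable are hypothetical and not taken from the project.

```python
import random
import time
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    delay: float,
    jitter: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items, pausing before every item except the first.

    The pause is `delay` seconds plus a random extra of up to `jitter`
    seconds, so requests do not arrive at perfectly regular intervals.
    """
    for index, item in enumerate(items):
        if index:
            sleep(delay + random.uniform(0.0, jitter))
        yield item


# Demonstration with a fake sleep so the example runs instantly.
pauses: list[float] = []
for subreddit in throttled(["SaaS", "SideProject", "automation"], 2.0, sleep=pauses.append):
    print("scanning", subreddit)
print("pauses requested:", pauses)
```

In real use the default `time.sleep` would apply; the `sleep` parameter exists only so the delay can be swapped out in tests.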
git clone https://github.com/yourusername/ghost-lead-hunter.git
cd ghost-lead-hunter
cp .env.example .env

# Python dependencies
pip install -r requirements.txt
# Playwright browsers (required for Twitter/X scraping)
playwright install chromium

Edit .env with your API keys:
# LLM Provider (gemini | openai | claude | minimax)
LLM_PROVIDER=gemini
GEMINI_API_KEY=your_gemini_key
# Notion (optional)
NOTION_API_KEY=your_notion_key
NOTION_DATABASE_ID=your_database_id
# Reddit (required for Reddit scraping)
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=GhostLeadHunter/1.0
# Subreddits to monitor
REDDIT_SUBREDDITS=SaaS,SideProject,automation,startups

# Development
python -m src.main
# Or with Docker (see Dockerfile)
docker build -t ghost-lead-hunter .
docker run -d --env-file .env ghost-lead-hunter

| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | gemini | LLM backend: gemini, openai, claude, or minimax |
| SCAN_INTERVAL_SECONDS | 300 | Time between scan cycles |
| MIN_SCORE_THRESHOLD | 8.0 | Minimum score to export to Notion |
| REDDIT_SUBREDDITS | SaaS,SideProject,automation | Comma-separated list |
| TWITTER_KEYWORDS | need tool,looking for software | Search terms for Twitter |
| HEADLESS | true | Run Playwright in headless mode |
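The variables and defaults listed above could be parsed into a typed settings object along these lines. This is a sketch, not the project's loader: the `Settings` class and helper names are hypothetical, and it assumes `TWITTER_KEYWORDS` is comma-separated like `REDDIT_SUBREDDITS`, which the table only implies.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


def _as_bool(raw: str) -> bool:
    """Treat common truthy spellings as True, everything else as False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def _as_list(raw: str) -> list[str]:
    """Split a comma-separated value, dropping blanks and stray spaces."""
    return [part.strip() for part in raw.split(",") if part.strip()]


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    scan_interval_seconds: int
    min_score_threshold: float
    reddit_subreddits: list[str]
    twitter_keywords: list[str]
    headless: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, falling back to documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        llm_provider=env.get("LLM_PROVIDER", "gemini"),
        scan_interval_seconds=int(env.get("SCAN_INTERVAL_SECONDS", "300")),
        min_score_threshold=float(env.get("MIN_SCORE_THRESHOLD", "8.0")),
        reddit_subreddits=_as_list(
            env.get("REDDIT_SUBREDDITS", "SaaS,SideProject,automation")
        ),
        twitter_keywords=_as_list(
            env.get("TWITTER_KEYWORDS", "need tool,looking for software")
        ),
        headless=_as_bool(env.get("HEADLESS", "true")),
    )


print(load_settings({"HEADLESS": "false", "SCAN_INTERVAL_SECONDS": "60"}))
```

Converting values once at startup means a malformed number fails fast with a `ValueError` instead of surfacing mid-scan.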
Each exported lead includes:
- Score: 0-10 purchase intent rating
- Intent Level: high|medium|low
- Reasoning: Why this was scored this way
- Draft Email: Generated cold outreach (if score ≥ 7)
- Source Link: Direct URL to original post
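For illustration, the fields above might map onto a Notion `properties` payload as follows. The property value shapes (`title`, `number`, `select`, `rich_text`, `url`) follow the Notion API, but the column names, the column types chosen for each field, and the extra `Name` title column are assumptions about how the target database is set up.

```python
from typing import Any, Optional

# Notion limits each rich-text content string to 2000 characters.
NOTION_TEXT_LIMIT = 2000


def _rich_text(value: str) -> list[dict[str, Any]]:
    """Wrap a string in Notion's rich-text array, truncated to the limit."""
    return [{"type": "text", "text": {"content": value[:NOTION_TEXT_LIMIT]}}]


def lead_to_notion_properties(
    *,
    title: str,
    score: float,
    intent_level: str,
    reasoning: str,
    source_url: str,
    draft_email: Optional[str] = None,
) -> dict[str, Any]:
    """Build the `properties` object for a page in a hypothetical leads database."""
    properties: dict[str, Any] = {
        "Name": {"title": _rich_text(title)},
        "Score": {"number": score},
        "Intent Level": {"select": {"name": intent_level}},
        "Reasoning": {"rich_text": _rich_text(reasoning)},
        "Source Link": {"url": source_url},
    }
    # The draft email is optional, so the column is only set when one exists.
    if draft_email:
        properties["Draft Email"] = {"rich_text": _rich_text(draft_email)}
    return properties


example = lead_to_notion_properties(
    title="Looking for an invoicing tool",
    score=9.0,
    intent_level="high",
    reasoning="Author states a budget and a deadline.",
    source_url="https://example.com/post/1",
)
print(sorted(example))
```

The resulting dictionary would be passed as the `properties` argument when creating a page under the configured database.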
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN playwright install chromium --with-deps
CMD ["python", "-m", "src.main"]

docker build -t ghost-lead-hunter .
docker run -d --name ghost-hunter --env-file .env ghost-lead-hunter

ghost-lead-hunter/
├── src/
│ ├── brain/ # AI Analysis
│ │ ├── base.py # Abstract LLM provider
│ │ └── providers/ # Gemini, OpenAI, Claude
│ ├── integrations/ # External services
│ │ └── notion_handler.py
│ ├── scrapers/ # Data collection
│ │ ├── reddit.py
│ │ └── twitter.py
│ ├── models/ # Data schemas
│ │ └── lead.py
│ ├── db/ # Persistence
│ │ └── sqlite_manager.py
│ └── main.py # Orchestrator
├── tests/ # Unit tests
├── .env.example # Configuration template
├── requirements.txt
└── Dockerfile
# Run unit tests
pytest tests/ -v
# With coverage
pytest tests/ --cov=src --cov-report=html

MIT License. See LICENSE for details.
This project is for educational and portfolio purposes. Ensure you comply with:
- Reddit's API Terms of Use
- Twitter/X's Terms of Service
- Notion's API Terms of Use
Respect rate limits and don't abuse scraping.
