callampin/ghost-lead-hunter: Ghost Lead Hunter is an autonomous B2B lead-generation agent that uses AI to extract, qualify, and organize sales opportunities from Reddit and X without human intervention. Built in Python with an LLM-agnostic architecture.

Ghost Lead Hunter

Proactive lead extraction and qualification using industrial-grade AI.

Python 3.12+ | License: MIT | Docker Ready

The Problem

Social networks (Reddit, X/Twitter) are saturated with conversations about problems, needs, and tool searches. Roughly 80% of those posts are noise rather than real sales opportunities. A salesperson's time is limited, and manually reviewing thousands of posts is unsustainable.

Ghost Lead Hunter eliminates human intervention from the prospecting process:

  1. Monitors subreddits and Twitter/X on a recurring scan cycle (every 300 seconds by default)
  2. Analyzes every post with AI to detect purchase intent
  3. Scores each post from 0 to 10 and drafts an outreach email
  4. Exports only "Grade A" leads (score > 8) to Notion

Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Reddit    │     │   Twitter    │     │      AI      │
│    (PRAW)    │     │ (Playwright) │     │ (Multi-LLM)  │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       └────────────────────┼────────────────────┘
                            ▼
                 ┌─────────────────────┐
                 │      SQLite DB      │
                 │   (Deduplication)   │
                 └──────────┬──────────┘
                            │
                            ▼
                 ┌─────────────────────┐
                 │     Lead Scored     │
                 │     Score > 8?      │
                 └──────────┬──────────┘
                            │
                            ▼
                 ┌─────────────────────┐
                 │       Notion        │
                 │      (Export)       │
                 └─────────────────────┘
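
The flow in the diagram can be sketched as a single scan cycle. This is a minimal illustration, not the repository's code: `Post`, `run_cycle`, `score_fn`, and `export_fn` are hypothetical names, and an in-memory set stands in for the SQLite deduplication store.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Post:
    """Hypothetical minimal shape of a scraped post."""
    post_id: str
    source: str  # "reddit" or "twitter"
    text: str


def run_cycle(
    posts: Iterable[Post],
    seen_ids: Set[str],
    score_fn: Callable[[Post], float],
    export_fn: Callable[[Post, float], None],
    threshold: float = 8.0,
) -> List[Post]:
    """Run one scan cycle and return the posts that were exported."""
    exported = []
    for post in posts:
        # Deduplication: skip anything handled in an earlier cycle.
        if post.post_id in seen_ids:
            continue
        seen_ids.add(post.post_id)

        # Scoring: score_fn stands in for the LLM call (0-10 scale).
        score = score_fn(post)

        # Export gate: strictly greater than the threshold ("Score > 8?").
        if score > threshold:
            export_fn(post, score)
            exported.append(post)
    return exported
```

Passing the scorer and exporter in as callables keeps the loop independent of any particular LLM provider or of Notion being configured.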

Tech Stack

| Layer | Technology | Why |
|---|---|---|
| Language | Python 3.12+ | Type hints, async support, rich ecosystem |
| Scraping | PRAW (Reddit) + Playwright (X) | PRAW: official API wrapper; Playwright: browser automation resistant to platform changes |
| Data Validation | Pydantic v2 | Runtime type safety, serialization |
| LLM | Gemini 1.5 Flash / OpenAI GPT-4o / Claude 3 / MiniMax M2.5 | Agnostic design: switch providers via env var |
| Database | SQLite | Lightweight, zero-config, embedded |
| Orchestration | Custom loop + python-dotenv | Production-ready configuration management |

Key Features

  • Multi-LLM Agnostic: Switch between Gemini, OpenAI, Claude, or MiniMax with a single environment variable
  • Smart Deduplication: Never process the same lead twice
  • Rate Limiting: Configurable delays to avoid bans
  • Graceful Degradation: Works without Notion if not configured
  • Docker-Ready: Optimized image build for production
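
As an illustration of the deduplication feature, the sketch below shows one common way to guarantee a post is processed only once with SQLite. The table name `seen_posts`, its columns, and the helper names are assumptions made for this example; the repository's `sqlite_manager.py` may be organized differently.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) seen_posts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS seen_posts (
            source  TEXT NOT NULL,
            post_id TEXT NOT NULL,
            PRIMARY KEY (source, post_id)
        )
        """
    )
    return conn


def mark_if_new(conn: sqlite3.Connection, source: str, post_id: str) -> bool:
    """Record a post; return True only the first time it is seen."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_posts (source, post_id) VALUES (?, ?)",
        (source, post_id),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cur.rowcount == 1
```

Letting the primary key reject duplicates avoids a separate SELECT before each insert, so the check and the write happen in one statement.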

Installation

1. Clone & Setup

git clone https://github.com/callampin/ghost-lead-hunter.git
cd ghost-lead-hunter
cp .env.example .env

2. Install Dependencies

# Python dependencies
pip install -r requirements.txt

# Playwright browsers (required for Twitter/X scraping)
playwright install chromium

3. Configure Environment

Edit .env with your API keys:

# LLM Provider (gemini | openai | claude | minimax)
LLM_PROVIDER=gemini
GEMINI_API_KEY=your_gemini_key

# Notion (optional)
NOTION_API_KEY=your_notion_key
NOTION_DATABASE_ID=your_database_id

# Reddit (required for Reddit scraping)
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=GhostLeadHunter/1.0

# Subreddits to monitor
REDDIT_SUBREDDITS=SaaS,SideProject,automation,startups

4. Run

# Development
python -m src.main

# Or with Docker (see Dockerfile)
docker build -t ghost-lead-hunter .
docker run -d --env-file .env ghost-lead-hunter

Configuration

| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | gemini | LLM backend: gemini, openai, claude, or minimax |
| SCAN_INTERVAL_SECONDS | 300 | Time between scan cycles |
| MIN_SCORE_THRESHOLD | 8.0 | Minimum score to export to Notion |
| REDDIT_SUBREDDITS | SaaS,SideProject,automation | Comma-separated list |
| TWITTER_KEYWORDS | need tool,looking for software | Search terms for Twitter |
| HEADLESS | true | Run Playwright in headless mode |
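
A minimal sketch of how these variables could be read with their documented defaults, using only the standard library. The `Settings` class and `load_settings` function are hypothetical; the project itself lists Pydantic v2 and python-dotenv in its stack, so its actual loader likely differs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    scan_interval_seconds: int
    min_score_threshold: float
    reddit_subreddits: Tuple[str, ...]
    twitter_keywords: Tuple[str, ...]
    headless: bool


def _split_csv(value: str) -> Tuple[str, ...]:
    """Split a comma-separated value, dropping blanks and stray spaces."""
    return tuple(item.strip() for item in value.split(",") if item.strip())


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    return Settings(
        llm_provider=env.get("LLM_PROVIDER", "gemini").strip().lower(),
        scan_interval_seconds=int(env.get("SCAN_INTERVAL_SECONDS", "300")),
        min_score_threshold=float(env.get("MIN_SCORE_THRESHOLD", "8.0")),
        reddit_subreddits=_split_csv(
            env.get("REDDIT_SUBREDDITS", "SaaS,SideProject,automation")
        ),
        twitter_keywords=_split_csv(
            env.get("TWITTER_KEYWORDS", "need tool,looking for software")
        ),
        headless=env.get("HEADLESS", "true").strip().lower() in {"1", "true", "yes"},
    )
```

Accepting a plain mapping makes the loader easy to exercise in tests without touching the real process environment.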

Metrics & Success Criteria

Lead Qualification Pipeline

| Stage | Criteria | Output |
|---|---|---|
| Raw | Any post from monitored sources | ~100-500/day |
| Analyzed | Processed by LLM | Score 0-10 |
| Qualified | Score > 8.0 | Grade A Lead |
| Exported | Sent to Notion | Actionable leads |
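
The stages can be read as a simple classification of each post. The helper below is a hypothetical sketch, assuming the strict "greater than" comparison shown for the Qualified stage and a score of `None` for posts the LLM has not processed yet.

```python
from typing import Optional


def pipeline_stage(
    score: Optional[float],
    threshold: float = 8.0,
    exported: bool = False,
) -> str:
    """Return the pipeline stage name for a post."""
    if score is None:
        return "Raw"  # collected but not yet analyzed
    if not 0 <= score <= 10:
        raise ValueError(f"score must be within 0-10, got {score}")
    if score <= threshold:
        return "Analyzed"  # scored, but not a Grade A lead
    return "Exported" if exported else "Qualified"
```

Under this reading a post scoring exactly 8.0 stays at the Analyzed stage.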

Sample Notion Output

Each exported lead includes:

  • Score: 0-10 purchase intent rating
  • Intent Level: high | medium | low
  • Reasoning: Why this was scored this way
  • Draft Email: Generated cold outreach (if score ≥ 7)
  • Source Link: Direct URL to original post
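
For reference, a lead with these fields could be turned into a Notion "create page" request body roughly as follows. The property names ("Name", "Score", "Intent Level", "Reasoning", "Draft Email", "Source Link") are assumptions about the target database schema, and `build_notion_page` is a hypothetical helper; adjust both to match your own database.

```python
from typing import Any, Dict, Optional

# Assumption: Notion limits a single text object's content to 2000 characters.
MAX_TEXT = 2000


def _rich_text(content: str) -> Dict[str, Any]:
    """Wrap a string as a Notion rich_text property value."""
    return {"rich_text": [{"text": {"content": content[:MAX_TEXT]}}]}


def build_notion_page(
    database_id: str,
    title: str,
    score: float,
    intent_level: str,
    reasoning: str,
    source_url: str,
    draft_email: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for creating one lead page in a database."""
    if intent_level not in {"high", "medium", "low"}:
        raise ValueError(f"unexpected intent level: {intent_level!r}")

    properties: Dict[str, Any] = {
        "Name": {"title": [{"text": {"content": title[:MAX_TEXT]}}]},
        "Score": {"number": score},
        "Intent Level": {"select": {"name": intent_level}},
        "Reasoning": _rich_text(reasoning),
        "Source Link": {"url": source_url},
    }
    # The draft email is only attached for scores of 7 or higher.
    if draft_email is not None and score >= 7:
        properties["Draft Email"] = _rich_text(draft_email)

    return {"parent": {"database_id": database_id}, "properties": properties}
```

The function only builds the payload; sending it to the Notion API with your `NOTION_API_KEY` is left to whichever HTTP client or SDK you use.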

Docker Deployment

FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
RUN playwright install chromium --with-deps

CMD ["python", "-m", "src.main"]

Build and run the image:

docker build -t ghost-lead-hunter .
docker run -d --name ghost-hunter --env-file .env ghost-lead-hunter

Project Structure

ghost-lead-hunter/
├── src/
│   ├── brain/                    # AI Analysis
│   │   ├── base.py               # Abstract LLM provider
│   │   └── providers/            # Gemini, OpenAI, Claude
│   ├── integrations/             # External services
│   │   └── notion_handler.py
│   ├── scrapers/                 # Data collection
│   │   ├── reddit.py
│   │   └── twitter.py
│   ├── models/                   # Data schemas
│   │   └── lead.py
│   ├── db/                       # Persistence
│   │   └── sqlite_manager.py
│   └── main.py                   # Orchestrator
├── tests/                        # Unit tests
├── .env.example                  # Configuration template
├── requirements.txt
└── Dockerfile
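
The split between `brain/base.py` and `brain/providers/` suggests an abstract provider interface with one implementation per backend. The sketch below shows one way such a design could look; `LLMProvider`, `score_post`, `register`, and `get_provider` are hypothetical names, and the only registered backend is an offline keyword stub so the example runs without API keys. Real providers would register under `gemini`, `openai`, `claude`, or `minimax`.

```python
import os
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class LLMProvider(ABC):
    """Hypothetical interface every backend would implement."""

    @abstractmethod
    def score_post(self, text: str) -> float:
        """Return a purchase-intent score between 0 and 10."""


# Maps an LLM_PROVIDER value to a factory that builds the provider.
_REGISTRY: Dict[str, Callable[[], LLMProvider]] = {}


def register(name: str):
    """Class decorator that adds a provider to the registry."""
    def decorator(cls):
        _REGISTRY[name.lower()] = cls
        return cls
    return decorator


@register("stub")
class KeywordStubProvider(LLMProvider):
    """Offline stand-in; a real provider would call its vendor SDK here."""

    def score_post(self, text: str) -> float:
        lowered = text.lower()
        hits = sum(kw in lowered for kw in ("looking for", "need", "budget"))
        return min(10.0, 3.0 * hits)


def get_provider(name: Optional[str] = None) -> LLMProvider:
    """Pick a provider by name, falling back to the LLM_PROVIDER variable."""
    key = (name or os.environ.get("LLM_PROVIDER", "gemini")).strip().lower()
    try:
        return _REGISTRY[key]()
    except KeyError:
        raise ValueError(
            f"Unknown LLM_PROVIDER {key!r}; registered: {sorted(_REGISTRY)}"
        ) from None
```

With a registry like this, the orchestrator only ever depends on `LLMProvider`, which is what makes switching backends a one-variable change.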

Testing

# Run unit tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=html

License

MIT License. See LICENSE for details.


Disclaimer

This project is for educational and portfolio purposes. Ensure you comply with:

  • Reddit's API Terms of Use
  • Twitter/X's Terms of Service
  • Notion's API Terms of Use

Respect rate limits and don't abuse scraping.
