A full-stack search engine with a web crawler, an indexer, and a React frontend.
- 🕷️ Web crawler with robots.txt support
- 📊 BM25 + PageRank ranking algorithm
- ⚛️ React frontend with modern UI
- 🐳 Docker deployment ready
- 🔍 Fast search API
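To illustrate the robots.txt support listed above, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not taken from `crawler.py`; the user agent name `NexusBot`, the helper `make_checker`, and the sample rules are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download it
# from the target host before fetching any page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_txt: str, user_agent: str = "NexusBot"):
    """Return a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


allowed = make_checker(ROBOTS_TXT)
print(allowed("https://example.com/index.html"))         # public page
print(allowed("https://example.com/private/data.html"))  # disallowed path
```

A crawler loop would typically call such a checker before every request and skip any URL it rejects.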
- Backend: Python, Flask, SQLite
- Frontend: React, Vite
- Deployment: Docker, Docker Compose
NEXUS/
├── app.py # Flask API + React serving
├── crawler.py # Web crawler
├── indexer.py # BM25 + PageRank indexer
├── main.py # CLI interface
├── requirements.txt # Python dependencies
├── Dockerfile # Docker build
├── docker-compose.yml # Deployment
├── frontend/ # React app
│ ├── src/
│ ├── dist/ # Built files
│ └── ...
└── search.db # SQLite database (generated)
- Python 3.11+
- Node.js 18+
- Docker (optional)
git clone <repo>
cd NEXUS
pip install -r requirements.txt
# Crawl a website
python main.py crawl https://example.com --max 100
# Build search index
python main.py index
# Backend
python main.py serve
# Frontend (separate terminal)
cd frontend
npm install
npm run dev
Visit http://localhost:5000 for the full app.
# Build image
docker build -t pythonsearch .
# Run container
docker run -p 5000:5000 -v $(pwd)/search.db:/app/search.db pythonsearch
docker-compose up --build
The app will be available at http://localhost:5000.
# Crawl website
python main.py crawl <url> [--max N]
# Build index
python main.py index
# Start server
python main.py serve [--port 5000]
# Combined crawl + index + serve
python main.py run <url> [--max N] [--port 5000]
# Query from CLI
python main.py query "search term" [--n 10]
- GET /search?q=<query>&n=10 - Search API
- GET /stats - Index statistics
- GET / - React frontend
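As a rough client-side sketch for the search endpoint above, the following builds the request URL with the `q` and `n` parameters. The helper names are hypothetical, and `fetch_results` assumes the API responds with JSON, which this README does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, n: int = 10) -> str:
    """Build a GET /search URL with the q and n query parameters."""
    params = urlencode({"q": query, "n": n})
    return f"{base_url.rstrip('/')}/search?{params}"


def fetch_results(base_url: str, query: str, n: int = 10):
    """Call the search API (assumption: the response body is JSON)."""
    with urlopen(build_search_url(base_url, query, n)) as response:
        return json.load(response)


print(build_search_url("http://localhost:5000", "search term"))
# http://localhost:5000/search?q=search+term&n=10
```

`fetch_results` only works while the server started by `python main.py serve` is running.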
cd frontend
npm install
npm run dev # Runs on :5173 with proxy to :5000
npm run build # Production build
pip install -r requirements.txt
python main.py serve # With debug=True
Combines BM25 scoring with PageRank for relevance:
- BM25: Term frequency, document length normalization
- PageRank: Link-based importance scoring
- Combined Score: BM25 × (1 + α × PageRank), α=5.0
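The scoring described above can be sketched roughly as follows. This is not the code in `indexer.py`: the BM25 parameters `K1` and `B` are common defaults chosen for illustration, the PageRank damping factor and iteration count are assumptions, and the sample documents and link graph are made up. Only α = 5.0 and the combined formula come from this README.

```python
import math
from collections import Counter

K1, B = 1.5, 0.75  # assumed BM25 defaults, not taken from indexer.py
ALPHA = 5.0        # weight from the combined score formula above


def bm25_scores(query_terms, docs):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = K1 * (1 - B + B * len(doc) / avg_len)
            score += idf * tf[term] * (K1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def pagerank(links, damping=0.85, iterations=50):
    """Power iteration; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outs in links.items():
            targets = outs or pages  # dangling pages spread rank evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new[target] += share
        rank = new
    return rank


def combined_score(bm25, page_rank, alpha=ALPHA):
    """BM25 x (1 + alpha x PageRank)."""
    return bm25 * (1 + alpha * page_rank)


# Hypothetical three-page corpus and link graph.
docs = {
    "a": ["python", "search", "engine"],
    "b": ["python", "web", "crawler"],
    "c": ["cooking", "recipes"],
}
links = {"a": ["b"], "b": ["c"], "c": ["b"]}

ranks = pagerank(links)
bm25 = bm25_scores(["python", "crawler"], list(docs.values()))
final = {name: combined_score(s, ranks[name]) for name, s in zip(docs, bm25)}
print(sorted(final, key=final.get, reverse=True))
```

Because the PageRank term multiplies the BM25 score, a page that matches none of the query terms keeps a score of zero regardless of how well linked it is.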
- Fork the repository
- Create a feature branch
- Make changes
- Test thoroughly
- Submit a pull request
MIT License - see LICENSE file for details.
NEXUS/
├── backend/
│ ├── app.py
│ ├── search.py
│ └── data.csv
├── frontend/
│ ├── index.html
│ ├── style.css
│ └── script.js
├── static/
├── templates/
├── README.md
└── requirements.txt
- Dataset is loaded from CSV
- Data is preprocessed and indexed
- User enters search query
- Backend processes the query:
  - Tokenization
  - Matching
  - Scoring
- Results are ranked and returned
- UI displays results dynamically
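The flow above might look roughly like this in Python. This is a sketch, not the contents of `search.py`: the CSV columns (`id`, `title`, `text`), the sample rows, and every function name are assumptions, and scoring is a plain count of matching tokens.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical stand-in for data.csv; the column names are assumptions.
CSV_TEXT = """id,title,text
1,Python basics,Learn programming from scratch
2,Web crawling,Build a web crawler in Python
3,Cooking,Simple pasta recipes
"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_index(rows):
    """Map each token to {document id: occurrence count}."""
    index = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for token in tokenize(f"{row['title']} {row['text']}"):
            index[token][row["id"]] += 1
    return index


def search(index, query):
    """Match query tokens, sum their counts per document, sort by score."""
    scores = defaultdict(int)
    for token in tokenize(query):
        for doc_id, count in index.get(token, {}).items():
            scores[doc_id] += count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


rows = load_rows(CSV_TEXT)
index = build_index(rows)
print(search(index, "python crawler"))  # [('2', 2), ('1', 1)]
```

Building the index once at startup keeps each query down to a few dictionary lookups instead of a scan over every row.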
The ranking system is based on:
- 📌 Keyword frequency
- 📌 Relevance score
- 📌 Position of keywords
- 📌 Matching accuracy
Query → Tokenize → Match → Score → Sort → Display
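One way to combine keyword frequency with keyword position, two of the signals listed above, is sketched below. The weighting is an assumption chosen for illustration (each occurrence counts 1, plus a bonus that shrinks the later the term first appears) and is not the project's actual formula.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, document, position_weight=1.0):
    """Term frequency plus a bonus that decays with first-match position."""
    doc_tokens = tokenize(document)
    score = 0.0
    for term in set(tokenize(query)):
        count = doc_tokens.count(term)
        if count:
            first = doc_tokens.index(term)
            score += count + position_weight / (1 + first)
    return score


def rank(query, documents):
    """Return matching documents, highest relevance first."""
    scored = [(relevance(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: -pair[0])
    return [doc for score, doc in scored if score > 0]


documents = [
    "Search engines rank pages",
    "How pages are ranked by a search engine",
    "Pasta recipes",
]
print(rank("search", documents))
```

Both matching documents contain "search" once, so the earlier position in the first document is what places it on top.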
Contributions are welcome!
- Fork the repo
- Create a new branch
- Make your changes
- Submit a Pull Request
This project is licensed under the MIT License.
Anand Pandey
BTech Student | Developer | Data Enthusiast
If you like this project, give it a ⭐ on GitHub!
