GitHub - anas-bahi/python-sql-data-pipeline · GitHub
Skip to content

anas-bahi/python-sql-data-pipeline

Folders and files

Repository files navigation

🚀 Python SQL Data Pipeline 📌 Overview

This project builds a complete end-to-end data pipeline using Python and SQL. It demonstrates how raw data can be collected, cleaned, stored in a database, and analyzed using SQL queries.

The pipeline follows a real-world workflow used in data engineering and analytics:

Raw Data → Cleaning → Database → Query → Insights ⚙️ Tech Stack Python 3 Pandas – Data processing SQLAlchemy – Database connection SQLite – Relational database 📂 Project Structure python-sql-data-pipeline/ │ ├── data.csv # Raw dataset ├── clean.py # Data cleaning script ├── clean_data.csv # Cleaned dataset (generated) ├── database.py # Save data into SQLite database ├── sales.db # Database file (generated) ├── query.py # SQL queries ├── main.py # Run full pipeline └── README.md 🔄 Pipeline Steps

  1. Data Collection

Raw data is stored in a CSV file (data.csv).

  1. Data Cleaning Handles encoding issues (utf-8, latin1) Removes missing values Creates new column: total = price × quantity
  2. Database Storage Data is stored in SQLite Table name: sales
  3. Data Analysis (SQL) Uses SQL queries to analyze data Example: total sales per product ▶️ How to Run Install dependencies pip3 install pandas sqlalchemy Run full pipeline python3 main.py Or run step by step python3 clean.py python3 database.py python3 query.py 📊 Example Output Total sales per product:

Product Total Sales Apple 12.0 Banana 16.0 Orange 22.5 🎯 Key Features End-to-end data pipeline Real-world workflow SQL + Python integration Modular and scalable structure Handles messy CSV data 📈 Learning Outcomes Data cleaning with Pandas Working with SQL databases in Python Writing SQL queries for analysis Building automated pipelines 🚀 Future Improvements Connect to PostgreSQL / MySQL Add data visualization (Matplotlib / Seaborn) Use APIs instead of static CSV Automate with scheduling tools (Cron / Airflow) 💡 Why This Project?

This project demonstrates skills used in real jobs:

Data Engineering basics Data Analysis workflows SQL + Python integration Pipeline automation

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors