Module 00: Getting Started with Python

🎯 Learning Objectives

  • Set up a Python development environment
  • Understand Python's role in data engineering
  • Write and run your first Python scripts
  • Master the Python REPL and Jupyter notebooks

📊 Python in the Data Engineering Ecosystem

┌─────────────────────────────────────────────────────────────────────┐
│                      DATA ENGINEERING STACK                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    │
│   │ EXTRACT  │───▶│TRANSFORM │───▶│   LOAD   │───▶│ ANALYZE  │    │
│   └──────────┘    └──────────┘    └──────────┘    └──────────┘    │
│        │               │               │               │           │
│        ▼               ▼               ▼               ▼           │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    │
│   │  APIs    │    │  Pandas  │    │    S3    │    │ Jupyter  │    │
│   │  Files   │    │  PySpark │    │   DWH    │    │   BI     │    │
│   │   DBs    │    │  NumPy   │    │   Lake   │    │   ML     │    │
│   └──────────┘    └──────────┘    └──────────┘    └──────────┘    │
│                                                                     │
│                     🐍 ALL POWERED BY PYTHON 🐍                     │
└─────────────────────────────────────────────────────────────────────┘
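The four stages in the diagram can be sketched with the standard library alone. This is a minimal illustration rather than a production pipeline: the sample records, the function names (`extract`, `transform`, `load`, `analyze`), and the in-memory `warehouse` list are all assumptions made for this example.

```python
import csv
import io

# Hypothetical raw data standing in for an API response or a source file.
RAW_CSV = """order_id,amount
1,19.99
2,5.00
3,not_a_number
"""

def extract(raw_text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: convert column types and drop rows that fail to parse."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append(
                {"order_id": int(row["order_id"]), "amount": float(row["amount"])}
            )
        except ValueError:
            continue  # skip malformed rows
    return cleaned

def load(rows, target):
    """Load: append rows to a target (a plain list standing in for a warehouse)."""
    target.extend(rows)
    return len(rows)

def analyze(target):
    """Analyze: compute a simple aggregate over the loaded rows."""
    return round(sum(row["amount"] for row in target), 2)

warehouse = []
loaded = load(transform(extract(RAW_CSV)), warehouse)
total = analyze(warehouse)
print(f"Loaded {loaded} rows, total amount {total}")
```

In a real project each stage would typically use the tools named in the diagram (for example Pandas for the transform step), but the shape of the pipeline stays the same.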

🔧 Installation Guide

Windows

winget install Python.Python.3.11
python --version
pip --version

macOS

brew install python@3.11
python3 --version
pip3 --version

Linux (Ubuntu/Debian)

sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip
python3.11 --version
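
Once an interpreter is installed, a short script can confirm which version is actually running. The sketch below assumes Python 3.11 is the minimum, based on the install commands above; `REQUIRED` and `meets_requirement` are hypothetical names chosen for this example.

```python
import sys

# Assumption: the install commands above target Python 3.11, so treat it as the minimum.
REQUIRED = (3, 11)

def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True when the given interpreter version is at least `required`."""
    return tuple(version_info[:2]) >= required

print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}")
print("OK" if meets_requirement() else "Older than 3.11, consider upgrading")
```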

🛠️ Development Environment

Recommended VS Code Extensions

| Extension | Purpose |
|---|---|
| Python | IntelliSense, debugging, linting |
| Pylance | Fast, feature-rich language server |
| Jupyter | Notebook support in VS Code |
| autoDocstring | Generate docstrings automatically |
| GitLens | Enhanced Git integration |

Virtual Environment Setup

# Create virtual environment (use python3 instead of python on macOS/Linux)
python -m venv venv

# Activate (Linux/macOS)
source venv/bin/activate

# Activate (Windows)
venv\Scripts\activate

# Install packages
pip install pandas numpy jupyter

# Deactivate when done
deactivate
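
To check from inside Python whether the environment is active, one common approach is to compare `sys.prefix` with `sys.base_prefix`. The helper name `in_virtualenv` is an assumption for this sketch.

```python
import sys

def in_virtualenv():
    """Return True when the interpreter is running inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

Running this before and after activation should show the interpreter path switching to the one under `venv/`.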

📁 Recommended Project Structure

my_data_project/
│
├── 📁 src/                  # Source code
│   ├── __init__.py
│   ├── extract.py
│   ├── transform.py
│   └── load.py
│
├── 📁 tests/                # Unit tests
│   └── test_transform.py
│
├── 📁 notebooks/            # Jupyter notebooks
│   └── exploration.ipynb
│
├── 📁 data/                 # Data files
│   ├── raw/
│   └── processed/
│
├── 📁 config/               # Configuration
│   └── settings.yaml
│
├── 📄 requirements.txt      # Dependencies
├── 📄 README.md             # Documentation
└── 📄 .gitignore            # Git ignore rules
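
The layout above can be created by hand, or scaffolded with a few lines of `pathlib`. This is a sketch under stated assumptions: `create_skeleton` is a hypothetical helper, the files it creates are empty placeholders, and `notebooks/exploration.ipynb` is left out because an empty file is not a valid notebook.

```python
from pathlib import Path

# Directories and placeholder files taken from the layout above.
DIRECTORIES = ["src", "tests", "notebooks", "data/raw", "data/processed", "config"]
FILES = [
    "src/__init__.py",
    "src/extract.py",
    "src/transform.py",
    "src/load.py",
    "tests/test_transform.py",
    "config/settings.yaml",
    "requirements.txt",
    "README.md",
    ".gitignore",
]

def create_skeleton(root):
    """Create the project layout under `root` without overwriting existing files."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)  # leaves existing content untouched
    return root
```

Calling `create_skeleton("my_data_project")` from the directory where the project should live builds the tree in one step.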

📋 CHEATSHEET: Python Data Types

┌────────────────────────────────────────────────────────────────────┐
│                       PYTHON DATA TYPES                            │
├──────────────┬───────────────────┬─────────────────────────────────┤
│     TYPE     │      EXAMPLE      │           USE CASE              │
├──────────────┼───────────────────┼─────────────────────────────────┤
│ int          │ 42, -17, 0        │ Counts, IDs, indices            │
│ float        │ 3.14, -0.001      │ Measurements, calculations      │
│ str          │ "hello", 'world'  │ Text, names, messages           │
│ bool         │ True, False       │ Flags, conditions               │
│ NoneType     │ None              │ Missing/undefined values        │
├──────────────┼───────────────────┼─────────────────────────────────┤
│ list         │ [1, 2, 3]         │ Ordered, mutable collection     │
│ tuple        │ (1, 2, 3)         │ Ordered, immutable collection   │
│ dict         │ {"a": 1, "b": 2}  │ Key-value mapping               │
│ set          │ {1, 2, 3}         │ Unique values, fast lookup      │
└──────────────┴───────────────────┴─────────────────────────────────┘
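
The types in the table can be seen together in one small example. The `record` below is a hypothetical sensor reading invented for illustration; each field uses one of the types listed above.

```python
# A hypothetical sensor reading, using one value of each type from the table.
record = {
    "sensor_id": 42,                   # int
    "temperature": 21.5,               # float
    "location": "warehouse-a",         # str
    "active": True,                    # bool
    "last_error": None,                # NoneType (missing value)
    "samples": [21.4, 21.5, 21.6],     # list (mutable)
    "coordinates": (52.52, 13.40),     # tuple (immutable)
    "tags": {"indoor", "calibrated"},  # set (unique values)
}

# type() reports the class of each value; the record itself is a dict.
type_names = {key: type(value).__name__ for key, value in record.items()}
for key, name in type_names.items():
    print(f"{key:12} {name}")

# Lists can change in place; tuples raise TypeError on item assignment.
record["samples"].append(21.7)
tuple_is_immutable = False
try:
    record["coordinates"][0] = 0.0
except TypeError:
    tuple_is_immutable = True
print("Tuple rejected assignment:", tuple_is_immutable)
```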

📋 CHEATSHEET: Essential Commands

Running Python

| Command | Description |
|---|---|
| python script.py | Run a Python script |
| python -m module | Run a module as a script |
| python -c "print('hi')" | Run inline code |
| python -i script.py | Run a script, then stay in interactive mode |
| jupyter notebook | Start Jupyter Notebook |
| jupyter lab | Start JupyterLab |
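
For a concrete `script.py` to try these commands on, here is a minimal sketch. The `greet` and `main` functions are hypothetical names; the `if __name__ == "__main__":` guard is the standard idiom that lets the same file be run as a script or imported as a module.

```python
"""script.py: a minimal script that also works as an importable module."""
import sys

def greet(name):
    """Build the greeting so it can be tested separately from printing."""
    return f"Hello, {name}!"

def main(argv):
    # argv[0] is the script path; use the first argument if one was given.
    name = argv[1] if len(argv) > 1 else "world"
    print(greet(name))

if __name__ == "__main__":
    main(sys.argv)
```

Running `python script.py Ada` prints `Hello, Ada!`, and `python -i script.py` leaves `greet` available at the `>>>` prompt afterwards.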

Package Management (pip)

| Command | Description |
|---|---|
| pip install package | Install a package |
| pip install package==1.0.0 | Install specific version |
| pip install -r requirements.txt | Install from file |
| pip freeze > requirements.txt | Export installed packages |
| pip list | List installed packages |
| pip show package | Show package details |
| pip uninstall package | Remove a package |
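
Installed versions can also be checked from Python itself using the standard-library `importlib.metadata` module (Python 3.8+). The helper `installed_version` is a hypothetical name for this sketch; the package list matches the `pip install` example earlier in this module.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for name in ["pandas", "numpy", "jupyter"]:
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```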

🐍 Python REPL (Interactive Mode)

┌─────────────────────────────────────────────────────────────────┐
│  $ python                                                       │
│  Python 3.11.0                                                  │
│  >>> 2 + 2                    # Simple calculation              │
│  4                                                              │
│  >>> "Hello" * 3              # String repetition               │
│  'HelloHelloHello'                                              │
│  >>> help(str)                # Get help on a type              │
│  >>> dir(list)                # List attributes and methods     │
│  >>> exit()                   # Exit REPL                       │
└─────────────────────────────────────────────────────────────────┘
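
The output of `dir()` includes many double-underscore names. A small filter, shown here as a sketch that can be pasted into the REPL, narrows it to the public methods and pulls the first line of a docstring as a quick reminder.

```python
# Public list methods: names from dir() that do not start with an underscore.
public_methods = [name for name in dir(list) if not name.startswith("_")]
print(public_methods)

# The first line of a docstring is often enough to recall what a method does.
doc = (list.append.__doc__ or "").strip()
first_line = doc.splitlines()[0] if doc else ""
print(first_line)
```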

🎓 Key Takeaways

  1. Python is the lingua franca of data engineering - Used for ETL, APIs, ML, and more
  2. Always use virtual environments - Isolate project dependencies
  3. VS Code + extensions - A capable, free editor for Python development
  4. Jupyter for exploration - Interactive development and visualization
  5. Project structure matters - Organize code from day one

📚 Next Steps

→ Continue to Module 01: Python Fundamentals