- Set up a Python development environment
- Understand Python's role in data engineering
- Write and run your first Python scripts
- Master the Python REPL and Jupyter notebooks
┌─────────────────────────────────────────────────────────────────────┐
│                       DATA ENGINEERING STACK                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐     │
│    │ EXTRACT  │───▶│TRANSFORM │───▶│   LOAD   │───▶│ ANALYZE  │     │
│    └──────────┘    └──────────┘    └──────────┘    └──────────┘     │
│         │               │               │               │           │
│         ▼               ▼               ▼               ▼           │
│    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐     │
│    │   APIs   │    │  Pandas  │    │    S3    │    │ Jupyter  │     │
│    │  Files   │    │ PySpark  │    │   DWH    │    │    BI    │     │
│    │   DBs    │    │  NumPy   │    │   Lake   │    │    ML    │     │
│    └──────────┘    └──────────┘    └──────────┘    └──────────┘     │
│                                                                     │
│                     🐍 ALL POWERED BY PYTHON 🐍                     │
└─────────────────────────────────────────────────────────────────────┘
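The extract, transform, and load stages in the diagram can be sketched with nothing but the standard library. This is a minimal illustration, not a production pipeline: the `RAW_CSV` data, the column names, and the three function names are all hypothetical, and an in-memory string stands in for the APIs, files, or databases a real job would read from.

```python
import csv
import io
import json

# Hypothetical raw input; a real pipeline would read from an API, file, or database.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,5.00
3,alice,42.50
"""


def extract(source: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: list[dict]) -> dict[str, float]:
    """Transform: total the order amounts per customer."""
    totals: dict[str, float] = {}
    for row in rows:
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return {name: round(total, 2) for name, total in totals.items()}


def load(totals: dict[str, float]) -> str:
    """Load: serialize the result; a real pipeline might write to S3 or a warehouse."""
    return json.dumps(totals, sort_keys=True)


if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

Run as a script, this prints `{"alice": 62.49, "bob": 5.0}`. Keeping each stage in its own function mirrors the boxes in the diagram and makes each stage testable on its own.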
# Windows (winget)
winget install Python.Python.3.11
python --version
pip --version

# macOS (Homebrew)
brew install python@3.11
python3 --version
pip3 --version

# Linux (Debian/Ubuntu)
sudo apt update
sudo apt install python3.11 python3-pip python3-venv
python3 --version

| Extension | Purpose |
|---|---|
| Python | IntelliSense, debugging, linting |
| Pylance | Fast, feature-rich language server |
| Jupyter | Notebook support in VS Code |
| autoDocstring | Generate docstrings automatically |
| GitLens | Enhanced Git integration |
# Create virtual environment
python -m venv venv
# Activate (Linux/macOS)
source venv/bin/activate
# Activate (Windows)
venv\Scripts\activate
# Install packages
pip install pandas numpy jupyter
# Deactivate when done
deactivate

my_data_project/
│
├── 📁 src/                  # Source code
│   ├── __init__.py
│   ├── extract.py
│   ├── transform.py
│   └── load.py
│
├── 📁 tests/                # Unit tests
│   └── test_transform.py
│
├── 📁 notebooks/            # Jupyter notebooks
│   └── exploration.ipynb
│
├── 📁 data/                 # Data files
│   ├── raw/
│   └── processed/
│
├── 📁 config/               # Configuration
│   └── settings.yaml
│
├── 📄 requirements.txt      # Dependencies
├── 📄 README.md             # Documentation
└── 📄 .gitignore            # Git ignore rules
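One way to avoid creating this layout by hand is a short `pathlib` script. The sketch below is an assumption about how you might scaffold the tree, not part of any standard tool: `create_skeleton`, `DIRECTORIES`, and `FILES` are hypothetical names, the files are created empty, and the notebook is left out because an empty `.ipynb` file is not a valid notebook.

```python
import tempfile
from pathlib import Path

# Directories and files taken from the project tree above.
DIRECTORIES = ["src", "tests", "notebooks", "data/raw", "data/processed", "config"]
FILES = [
    "src/__init__.py",
    "src/extract.py",
    "src/transform.py",
    "src/load.py",
    "tests/test_transform.py",
    "config/settings.yaml",
    "requirements.txt",
    "README.md",
    ".gitignore",
]


def create_skeleton(root: Path) -> list[Path]:
    """Create the empty project layout under root and return the files created."""
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    created = []
    for name in FILES:
        path = root / name
        path.touch(exist_ok=True)  # creates an empty file; leaves existing files alone
        created.append(path)
    return created


if __name__ == "__main__":
    # A temporary directory keeps the demo from touching a real project.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_data_project"
        for path in create_skeleton(project):
            print(path.relative_to(project))
```

To scaffold a real project, call `create_skeleton(Path("my_data_project"))` from the directory where the project should live.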
┌────────────────────────────────────────────────────────────────────┐
│                         PYTHON DATA TYPES                          │
├──────────────┬───────────────────┬─────────────────────────────────┤
│ TYPE         │ EXAMPLE           │ USE CASE                        │
├──────────────┼───────────────────┼─────────────────────────────────┤
│ int          │ 42, -17, 0        │ Counts, IDs, indices            │
│ float        │ 3.14, -0.001      │ Measurements, calculations      │
│ str          │ "hello", 'world'  │ Text, names, messages           │
│ bool         │ True, False       │ Flags, conditions               │
│ NoneType     │ None              │ Missing/undefined values        │
├──────────────┼───────────────────┼─────────────────────────────────┤
│ list         │ [1, 2, 3]         │ Ordered, mutable collection     │
│ tuple        │ (1, 2, 3)         │ Ordered, immutable collection   │
│ dict         │ {"a": 1, "b": 2}  │ Key-value mapping               │
│ set          │ {1, 2, 3}         │ Unique values, fast lookup      │
└──────────────┴───────────────────┴─────────────────────────────────┘
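The types in the table are easier to remember once you have seen them together in one record. The following sketch uses a made-up sensor reading; the field names and values are purely illustrative.

```python
# A hypothetical sensor reading with one value of each scalar type.
reading = {
    "sensor_id": 42,          # int
    "temperature": 21.5,      # float
    "location": "warehouse",  # str
    "is_valid": True,         # bool
    "error": None,            # None: no error recorded
}

readings = [21.5, 22.0, 21.5, 23.1]  # list: ordered and mutable
readings.append(22.0)                # lists can grow in place

coordinates = (52.52, 13.40)         # tuple: ordered and immutable
unique_readings = set(readings)      # set: duplicates are dropped

# dict: look up each value by key and report its type.
for key, value in reading.items():
    print(f"{key}: {value!r} ({type(value).__name__})")

print(sorted(unique_readings))
```

The last line prints `[21.5, 22.0, 23.1]`: the list kept all five values, while the set kept only the distinct ones.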
| Command | Description |
|---|---|
| `python script.py` | Run a Python script |
| `python -m module` | Run a module as a script |
| `python -c "print('hi')"` | Run inline code |
| `python -i script.py` | Run a script, then stay in the interactive prompt |
| `jupyter notebook` | Start Jupyter Notebook |
| `jupyter lab` | Start JupyterLab |
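To try the `python` commands above you need a script to run. Here is a minimal first script, assuming you save it under a name of your choice (the filename `hello.py` and the `greet` function are hypothetical).

```python
import sys


def greet(name: str = "data engineer") -> str:
    """Build the greeting; kept separate from printing so it is easy to test."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # sys.argv[0] is the script path; anything after it is a user argument.
    args = sys.argv[1:]
    print(greet(args[0]) if args else greet())
```

With the file saved as `hello.py`, `python hello.py Ada` prints `Hello, Ada!`, and `python -i hello.py` runs it and then leaves `greet` available at the `>>>` prompt. The `if __name__ == "__main__":` guard keeps the print from firing when the file is imported by another module.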
┌─────────────────────────────────────────────────────────────────┐
│ $ python                                                        │
│ Python 3.11.0                                                   │
│ >>> 2 + 2          # Simple calculation                         │
│ 4                                                               │
│ >>> "Hello" * 3    # String repetition                          │
│ 'HelloHelloHello'                                               │
│ >>> help(str)      # Get help on a type                         │
│ >>> dir(list)      # List all methods                           │
│ >>> exit()         # Exit REPL                                  │
└─────────────────────────────────────────────────────────────────┘
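The output of `dir()` includes many double-underscore names that are rarely what you are looking for. As a small sketch, the hypothetical helper `public_methods` below filters `dir()` down to the callable names without a leading underscore; you can paste it into the REPL or a notebook cell.

```python
def public_methods(obj) -> list[str]:
    """Return the names from dir(obj) that are callable and do not start with '_'."""
    return [
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


if __name__ == "__main__":
    print(public_methods(list))  # the everyday list methods, e.g. 'append' and 'sort'
    print(str.upper.__doc__)     # the one-line docstring that help() would show
```

This is the same information `help(list)` presents, just reduced to a list of names you can scan quickly.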
- Python is the lingua franca of data engineering: it is used for ETL, APIs, ML, and more
- Always use virtual environments: they isolate each project's dependencies
- VS Code plus extensions: a capable free editor for Python development
- Jupyter for exploration: interactive development and visualization
- Project structure matters: organize code from day one
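A quick way to confirm that a virtual environment is actually active is to ask the interpreter itself. The sketch below relies on the fact that `venv` makes `sys.prefix` differ from `sys.base_prefix`; the function name `in_virtual_environment` is hypothetical, and environments created by other tools may behave differently.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the interpreter is running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print(f"Interpreter: {sys.executable}")
    print(f"Virtual environment active: {in_virtual_environment()}")
```

If the last line reports `False` after you have run the activate command, the shell is probably still picking up the system interpreter.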
→ Continue to Module 01: Python Fundamentals
