bala93kumar/bala93kumar

👋 Hi, I'm Balakumar

💼 Data Engineer | Spark & Databricks Specialist | Cloud Data Pipelines

Welcome to my GitHub! I love building scalable, optimized data pipelines that power analytics and business decisions.


🚀 Tech Stack & Expertise

🔹 Big Data & Distributed Processing

  • Apache Spark (PySpark, Spark SQL)
  • Databricks (Workflows, Delta Lake, Z-Ordering, Optimizations)
  • SAP HANA Data Extraction & Performance Tuning
  • Parallelism, Shuffle Optimization, Cluster Tuning

🔹 Data Engineering & ETL

  • Complex SQL Transformations & CTE Pipelines
  • Metadata-driven ETL Frameworks
  • Snapshot Validation, Partition Management
  • Incremental Loads & Rolling-window Logic
  • Job Monitoring Dashboards

🔹 Cloud & Storage

  • AWS Glue ETL
  • Delta Lake
  • Lakehouse Architectures
  • Snowflake (Community Edition)

🔹 Tools & Languages

  • Python (ETL frameworks, automation)
  • SQL (Analytical queries, join optimization)
  • REST APIs (Databricks Jobs API)

📊 What I Work On

  • Optimizing large-scale Spark SQL jobs
  • Improving slow ETL pipelines
  • Building data quality & monitoring frameworks
  • Snapshot comparison systems for B2B analytics
  • Designing scalable metadata-based ETL workflows

📚 Currently Learning

  • Kubernetes (K8s)
  • Docker
  • GitHub Actions
  • Spark on Kubernetes (future goal)

🛠️ Projects You'll Find Here

  • Automated Snapshot Validation System
  • Databricks Job Monitoring Dashboard
  • Metadata-driven PySpark ETL Framework
  • Delta Lake Optimization Scripts

📫 Contact

If you'd like to collaborate or discuss data engineering ideas, feel free to reach out on my LinkedIn profile: Bala.


Thanks for visiting my profile!
