soumitra9 (Soumitra Mehrotra) · GitHub
🎯
Focusing
  • Autodesk
  • San Francisco

soumitra9/README.md


About Me

Senior Data Scientist at Autodesk with 6+ years of experience building production ML systems that deliver measurable business impact. I specialize in the full ML lifecycle — from feature engineering and model development to deployment, monitoring, and stakeholder dashboards.

  • 🏢 Autodesk → ML for cloud optimization, anomaly detection, LLM-powered tooling
  • 📊 Expedia → Large-scale propensity modeling for 200M+ customers
  • 🤖 Tata Elxsi → CNN-based health monitoring & deep learning R&D
  • 📍 Bay Area, CA  |  🎓 MS Computer Science, Penn State

Impact Highlights

💰  ~$4M annual cloud compute savings via cost-aware ML model at Autodesk
📉  12% reduction in customer opt-out rates at Affine Analytics
👥  200M+ customers served through personalized recommendation systems
🚨  Real-time API anomaly detection preventing production incidents
🤖  Text-to-GraphQL interface via LLMs for streamlined developer experience

Tech Stack

Languages

Python · PySpark · SQL

ML / AI

PyTorch · TensorFlow · scikit-learn · LangChain · HuggingFace

Data & Cloud

Snowflake · AWS · Apache Spark · Airflow · dbt

MLOps & Tooling

SageMaker · Docker · Git · Looker


Featured Projects

🌟 AI / LLM Projects

Chat with Documents — Advanced RAG

A local RAG web app with two production-grade pipelines built on top of a swappable LLM/embedding/vectorstore backend.

Feature | Details
📄 Document Q&A | Upload PDFs → persistent vector store → grounded answers with citations (filename, page, excerpt)
🎯 Resume Tailor | 6-agent pipeline: Resume Understanding → Job Analysis → Gap Analysis → Suggestions → Tailoring → Judge
🔄 Runtime config | Swap LLM, embeddings & vectorstore from the UI without restarting
🤝 Model support | Ollama (local) · OpenAI · Anthropic

Python · LangChain · RAG · Vector Store · Agentic AI · Ollama · OpenAI · Anthropic
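
To make the "grounded answers with citations" idea concrete, here is a minimal sketch of the retrieval step. It is not the app's code: `Chunk`, `tokenize`, `cosine`, and `retrieve` are hypothetical names, the sample documents are invented, and a bag-of-words similarity stands in for the real embedding model and vector store so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One retrievable passage plus the metadata needed to cite it."""
    filename: str
    page: int
    text: str


def tokenize(text: str) -> Counter:
    """Lowercase word counts; an assumed stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[dict]:
    """Return the k most similar chunks, each shaped as a citation."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c.text)), reverse=True)
    return [
        {"filename": c.filename, "page": c.page, "excerpt": c.text[:80]}
        for c in ranked[:k]
    ]


# Invented sample data, for illustration only.
chunks = [
    Chunk("handbook.pdf", 3, "Employees accrue vacation days every month."),
    Chunk("handbook.pdf", 7, "Expense reports are due by the fifth of each month."),
    Chunk("policy.pdf", 1, "Remote work requires manager approval."),
]
citations = retrieve("When are expense reports due?", chunks, k=1)
```

Carrying `filename` and `page` on every chunk from ingestion onward is what lets the answer step cite its sources; the retrieved excerpts would then be passed to whichever LLM backend is configured.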


Snowflake MCP Server — Pure Async

A production-ready MCP (Model Context Protocol) server for Snowflake using the low-level async API — giving full control over server lifecycle, tool registration, and async execution.

Feature | Details
🔌 Tools exposed | execute_query, list_databases, list_schemas, list_tables, describe_table, check_database_exists
🛡️ Query safety | Read-only validation (SELECT, WITH, SHOW, DESCRIBE, EXPLAIN only)
⚙️ Production features | Persistent connection · health checks · timeout control · query tagging · cache control · row limiting
📐 Config | Pydantic models for type-safe, validated configuration with clear startup errors

Python · MCP · Snowflake · Async · LLM Tooling · Pydantic
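
The read-only validation could be sketched roughly as follows. This is an illustration under assumptions, not the server's implementation: `READ_ONLY_KEYWORDS`, `strip_sql_comments`, and `is_read_only` are hypothetical names, and the check only looks at the first keyword of a single statement.

```python
import re

# Allow-list taken from the "Query safety" row; anything else is rejected.
READ_ONLY_KEYWORDS = {"SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN"}


def strip_sql_comments(sql: str) -> str:
    """Remove /* block */ and -- line comments so they cannot hide a keyword.

    Naive on purpose: it does not understand string literals.
    """
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def is_read_only(sql: str) -> bool:
    """True if sql is a single statement that starts with an allowed keyword."""
    cleaned = strip_sql_comments(sql).strip().rstrip(";").strip()
    if not cleaned or ";" in cleaned:
        return False  # empty input, or several statements chained together
    first_word = cleaned.split(None, 1)[0].upper()
    return first_word in READ_ONLY_KEYWORDS


if __name__ == "__main__":
    for statement in ("SELECT * FROM orders;", "DROP TABLE orders"):
        print(statement, "->", is_read_only(statement))
```

The sketch errs on the side of rejecting: a semicolon inside a string literal or a query wrapped in parentheses is refused even though it may be harmless. A first-keyword check is best treated as one layer; pairing it with a database role that has only read privileges is the more robust guard.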


Computer Vision

Project | Description | Stack
🫁 COVID-19 Chest X-ray Detection | Transfer learning with VGG16 for binary medical image classification | PyTorch · VGG16 · OpenCV

Classical ML & NLP

Project | Description | Stack
🏷️ StackOverflow Tag Predictor | Multi-label classifier on 6M+ posts | scikit-learn · TF-IDF · Linear Models
🧮 Neural Network from Scratch | Two-layer NN with backprop, no frameworks | Python · NumPy
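
As a rough illustration of what a two-layer network with hand-written backprop looks like in NumPy, here is a minimal sketch. It is not the repository's code: the tanh hidden layer, sigmoid output, XOR toy data, and all hyperparameters are assumptions chosen to keep the example small.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used for the output layer."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    """Forward pass: tanh hidden layer, then sigmoid output."""
    W1, b1, W2, b2 = params
    A1 = np.tanh(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    return A1, A2


def train(X, y, hidden=4, lr=0.1, epochs=5000, seed=0):
    """Full-batch gradient descent with binary cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    m = X.shape[0]
    eps = 1e-12  # keeps log() away from zero
    losses = []
    for _ in range(epochs):
        A1, A2 = forward((W1, b1, W2, b2), X)
        loss = -np.mean(y * np.log(A2 + eps) + (1 - y) * np.log(1 - A2 + eps))
        losses.append(float(loss))
        # Backward pass: the chain rule applied layer by layer.
        dZ2 = (A2 - y) / m                    # gradient at the output pre-activation
        dW2 = A1.T @ dZ2
        db2 = dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (1.0 - A1 ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Output probabilities for each row of X."""
    return forward(params, X)[1]


# XOR toy problem, assumed here purely for illustration.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train(X, y)
```

Comparing `losses[0]` with `losses[-1]` gives a quick sanity check that the hand-derived gradients point downhill; a numerical gradient check is the more thorough test.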


What I'm Exploring

current_focus = {
    "LLMs":    ["Fine-tuning", "RAG systems", "Agentic workflows"],
    "MLOps":   ["Feature stores", "Model monitoring", "SageMaker pipelines"],
    "GenAI":   ["LangChain", "LangGraph", "Prompt engineering"],
}

Building ML systems that don't just work in notebooks — they work in production.

Popular repositories

  1. Predict-tags-on-StackOverflow-with-linear-models Public

    A multi-class classifier implemented on Stack Overflow data to predict tags.

    Jupyter Notebook 2 6

  2. Deep-Neural-Network-From-scratch Public

    A two-layer deep neural network programmed from scratch in Python.

    Python 1 1

  3. Covid-19-Detection-from-chest-Xrays Public

    VGG16 model for detecting Covid-19

    Python 1

  4. InteractiveStory Public

    An Android app with a seven-page story. It asks the user for a username and frames the story around that name.

    Java

  5. Hangman Public

    A Hangman game written in Java using the IntelliJ IDE. The game displays an empty word, represented by dashes, and prompts the user to guess its letters. It shows the remaini…

    Java 2

  6. Stormy Public

    Android Weather Forecast Application

    Java