vivek12345 (Vivek Nayyar) · GitHub
vivek12345/README.md

👋 Hi, I'm Vivek Nayyar

🚀 Engineering Leader | 🧠 AI Builder

I’m an engineering leader and hands-on AI/ML builder, focused on deeply understanding and implementing the nuts and bolts of modern LLMs. From training tokenizers and building BPE from scratch, to implementing transformers line-by-line, I love demystifying AI—one project and one workshop at a time.

🧠 Projects & Experiments

| Project | What I Did |
| --- | --- |
| 🧩 Byte Pair Encoding (BPE) from Scratch | Wrote a custom BPE tokenizer in Python with support for special tokens, regex splitting, and vocab merging. |
| 🧠 LLM from Scratch | Implemented a transformer-based LLM (embedding → attention → MLP → logits) using only PyTorch. Includes training loop, sampling, and inference. |
| 🦙 Agentic RAG Pipeline | Built end-to-end Retrieval-Augmented Generation workflows using LangChain, DuckDB, FAISS, and streaming token-by-token inference. |
| 📊 Text-to-SQL for CSVs | Built a system to parse natural language queries into SQL and run them over uploaded CSVs. Added Vespa-style search for LIKE queries. |
| 🇮🇳 Hindi Tokenizer (WIP) | Training a BPE tokenizer from scratch on Hindi corpora to enable better subword tokenization for Indian languages. |
| 🔐 Secure LLM Workflows | Integrated Cloudflare Zero Trust, IP whitelisting, and API key validation in LangChain-based pipelines. |
| 📦 SmartInvestReturns | A personal finance site to calculate SIP, retirement corpus, and mutual fund strategies. Built with Next.js & TypeScript. |
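
For readers new to BPE, here is a minimal sketch of the core merge loop that the tokenizer projects above build on. It is an illustration only, not the code from any repository listed here: the function names (`get_pair_counts`, `merge`, `train_bpe`) are hypothetical, and it leaves out the special-token handling and regex splitting mentioned in the table.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with `new_id`, scanning left to right."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Assumption for this sketch: the base vocabulary is the 256 byte values,
    so new token ids start at 256.
    """
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


ids, merges = train_bpe("aaabdaaabac", num_merges=3)
print(len(ids), merges)
```

Each step replaces the most frequent adjacent pair with a new id, so the sequence gets shorter while the vocabulary grows by one entry per merge.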

🎓 Workshops & Knowledge Sharing

  • 🎥 YouTube Channel: @locallobaat
    I create short explainers and tutorials on AI topics like tokenization, transformers, and building your own RAG pipeline.
    Recent videos include “No code whatsapp bot” and “Chat with any CSV using langchain”.

  • 🧠 RAG Beyond Basics Workshop
    Covers advanced topics like agentic workflows, text-to-SQL, streaming outputs, observability, and PII-safe deployments.
    Delivered at internal events, React Summit 2024, and community meetups.


🔗 Let's Connect


Pinned

  1. bpe-tokenizer (Public)

    A pure Python implementation of Byte Pair Encoding (BPE) tokenization, inspired by GPT-4's tokenization approach

    Python 1

  2. gpt2-from-scratch (Public)

    A clean, educational implementation of GPT-2 built from scratch using PyTorch. This project demonstrates the architecture and training of transformer-based language models.

    Python

  3. moe-with-gqa (Public)

    Implementation of mixture of experts with grouped query attention

    Python

  4. llama-with-gqa-and-rope (Public)

    Implementation of Llama models with grouped query attention (GQA) and rotary position embeddings (RoPE)

    Python

  5. fast-api-with-next-ai-sdk (Public)

    FastAPI server with AI SDK v15 from Next.js

    JavaScript 2 1

  6. mini-rag (Public)

    Lightweight RAG library with Milvus vector store

    Python 3 1