KirtiJha/code-historian: 🕰️ AI-powered VS Code extension for code history tracking with RAG-based natural language search. Capture changes in real time, search with natural language, and restore any version instantly.

Code Historian

AI-Powered Code History Tracking with RAG-Based Natural Language Search

Never lose track of your code changes again. Search, explore, and restore any version with natural language.


✨ Features

🔄 Automatic Change Capture

  • Real-time capture of all code changes as you work
  • Intelligent debouncing to avoid capturing every keystroke
  • Configurable exclusion patterns for node_modules, build files, etc.
  • Tracks file creations, modifications, deletions, and renames
  • Session-based organization for better context
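
The debouncing idea in the list above can be sketched in a few lines of TypeScript. This is an illustration only, not the extension's actual capture engine: `createDebouncer`, `record`, and `flush` are hypothetical names, and the sketch takes explicit timestamps instead of timers so the behavior is easy to follow.

```typescript
// Hypothetical sketch of debounced capture; not Code Historian's real implementation.
interface Debouncer {
  record(path: string, nowMs: number): void;
  flush(nowMs: number): string[];
}

function createDebouncer(debounceMs: number): Debouncer {
  // Time of the most recent edit for each file path.
  const lastEdit = new Map<string, number>();
  return {
    // Note an edit; repeated edits to the same file just move its timestamp forward.
    record(path: string, nowMs: number): void {
      lastEdit.set(path, nowMs);
    },
    // Return (and forget) every file that has been quiet for at least debounceMs.
    flush(nowMs: number): string[] {
      const ready: string[] = [];
      lastEdit.forEach((editedAt, path) => {
        if (nowMs - editedAt >= debounceMs) ready.push(path);
      });
      ready.forEach((path) => lastEdit.delete(path));
      return ready;
    },
  };
}

const debouncer = createDebouncer(2000);
debouncer.record("src/auth.ts", 0);
debouncer.record("src/auth.ts", 1500); // still typing, so the quiet period restarts
const early = debouncer.flush(2500);   // only 1000 ms of quiet: nothing to capture yet
const settled = debouncer.flush(3500); // 2000 ms of quiet: the file is ready
console.log(early, settled);
```

A burst of keystrokes therefore yields one captured change once the file has been idle for the debounce interval, rather than one change per keystroke.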

🧠 Semantic Search with RAG

  • Natural language queries: "What changes did I make to the authentication logic?"
  • Hybrid search combining vector similarity and keyword matching (BM25)
  • Context-aware results with relevant code snippets
  • Temporal filtering: "Changes from last week"
  • Works with zero configuration — a neural embedding model (all-MiniLM-L6-v2) runs locally out of the box; or plug in Ollama, HuggingFace, or OpenAI
  • Symbol time-travel: trace the history of a single function/class/variable

💬 Chat Integration

  • Use @historian in VS Code Chat to explore your code history
  • Conversational interface powered by your choice of LLM
  • Ask questions like:
    • "When did I last modify the User class?"
    • "Find all changes related to database queries"
    • "What did the login function look like before the refactor?"

Code Restoration

  • Restore any previous version of your code with one click
  • One-click Undo after a restore
  • Preview changes, or open them in VS Code's native diff editor (Compare with current / Open diff)
  • Automatic backup creation before restoration
  • Works seamlessly with your existing git workflow

🔎 Inline History (CodeLens)

  • "N changes in history" above a file and "k versions" above each function/class — click to jump straight to that file's or symbol's history
  • Toggle with codeHistorian.ui.showInlineHistory

📊 Visual Timeline

  • Beautiful, modern timeline view of all changes
  • Multiple view modes: Timeline, Cards, or Compact list
  • Stats dashboard with activity heatmap
  • Group by date, file, or folder
  • Inline diff preview with syntax highlighting
  • Filter by change type, date range, and more

🚀 Installation

VS Code Marketplace (Recommended)

  1. Open VS Code
  2. Go to Extensions (Cmd/Ctrl + Shift + X)
  3. Search for "Code Historian"
  4. Click Install

Or install directly from the extension's VS Code Marketplace page.

From Source

git clone https://github.com/KirtiJha/code-historian.git
cd code-historian
npm install
npm run build

Then press F5 in VS Code to launch the extension in development mode.


⚙️ Configuration

Open VS Code Settings (Cmd/Ctrl + ,) and search for "Code Historian".

Embedding Provider

Code Historian supports multiple embedding providers for semantic search:

| Provider | Model | Local | Cost | Dimensions | Setup |
| --- | --- | --- | --- | --- | --- |
| Built-in (default) | all-MiniLM-L6-v2 (neural) | Yes | Free | 384 | None (runs locally) |
| Ollama | nomic-embed-text | Yes | Free | 768 | Install Ollama |
| HuggingFace | BAAI/bge-large-en-v1.5 | No | Free tier | 1024 | API token |
| OpenAI | text-embedding-3-small | No | Paid | 1536 | API key |

The built-in provider needs no configuration and runs a small neural sentence-transformer (all-MiniLM-L6-v2) locally via Transformers.js — the model (~23 MB) is downloaded once and cached for offline use. If the local ONNX runtime can't load, it automatically falls back to a lightweight hashing embedding so search always works. Switch to Ollama / HuggingFace / OpenAI for other models:

{
  "codeHistorian.embedding.provider": "ollama",
  "codeHistorian.embedding.model": "nomic-embed-text"
}
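
To give a feel for what a "lightweight hashing embedding" can look like, here is a minimal feature-hashing sketch in TypeScript. It is an assumption about the general technique, not the extension's actual fallback: `hashEmbed`, `fnv1a`, and `cosine` are hypothetical names, and the 384-dimension default simply mirrors the built-in model's size.

```typescript
// Hypothetical hashing embedding; the extension's real fallback may differ.
const DIMS = 384; // assumed to match the built-in model's dimension

// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(token: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Map each token to one bucket with a +1/-1 sign, then L2-normalize the vector.
function hashEmbed(text: string, dims: number = DIMS): number[] {
  const vec: number[] = new Array<number>(dims).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
  for (const token of tokens) {
    const h = fnv1a(token);
    const bucket = h % dims;
    const sign = (h >>> 31) === 1 ? -1 : 1;
    vec[bucket] = (vec[bucket] ?? 0) + sign;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

// Cosine similarity; for unit-length vectors this is just the dot product.
function cosine(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

const queryVec = hashEmbed("restore login function");
console.log(queryVec.length, cosine(queryVec, queryVec));
```

A vector like this captures only token overlap, not meaning, which is why it suits a fallback rather than a replacement for the neural model.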

🔒 API keys are stored in VS Code's encrypted Secret Storage, not in settings.json.

LLM Provider

For the chat interface, configure your preferred LLM:

| Provider | Models | Local | Setup |
| --- | --- | --- | --- |
| Ollama | llama3.2, mistral, codellama | Yes | Free, local |
| OpenAI | gpt-4o, gpt-4-turbo, gpt-3.5-turbo | No | API key required |
| Anthropic | claude-sonnet-4-20250514, claude-3-haiku | No | API key required |
| Google Gemini | gemini-pro, gemini-1.5-flash | No | API key required |

{
  "codeHistorian.llm.provider": "openai",
  "codeHistorian.llm.model": "gpt-4o"
}

API keys are configured per provider (e.g. codeHistorian.llm.openaiApiKey, codeHistorian.llm.anthropicApiKey, codeHistorian.llm.googleApiKey) and are moved into encrypted Secret Storage automatically.

Capture Settings

{
  "codeHistorian.capture.enabled": true,
  "codeHistorian.capture.debounceMs": 2000,
  "codeHistorian.capture.excludePatterns": [
    "**/node_modules/**",
    "**/.git/**",
    "**/dist/**",
    "**/*.lock"
  ],
  "codeHistorian.capture.maxFileSizeKB": 1024
}
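
As a rough illustration of how these settings could gate capture, the sketch below checks a file against the enabled flag, the size limit, and the exclude patterns. Everything here is hypothetical (`shouldCapture`, `globToRegExp`, and the `CaptureSettings` shape are not taken from the extension's source). The tiny glob converter handles only `**` and `*`, and it assumes 1 KB means 1024 bytes.

```typescript
// Hypothetical capture filter; not the extension's actual code.
interface CaptureSettings {
  enabled: boolean;
  excludePatterns: string[];
  maxFileSizeKB: number;
}

// Convert a simple glob (supporting only "**" and "*") into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  let re = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      re += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      re += ".*"; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === "*") {
      re += "[^/]*"; // anything within one path segment
      i += 1;
    } else {
      re += glob.charAt(i).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + re + "$");
}

// Decide whether a change to relPath (workspace-relative, "/"-separated) is captured.
function shouldCapture(relPath: string, sizeBytes: number, settings: CaptureSettings): boolean {
  if (!settings.enabled) return false;
  if (sizeBytes > settings.maxFileSizeKB * 1024) return false;
  return !settings.excludePatterns.some((pattern) => globToRegExp(pattern).test(relPath));
}

const captureSettings: CaptureSettings = {
  enabled: true,
  excludePatterns: ["**/node_modules/**", "**/.git/**", "**/dist/**", "**/*.lock"],
  maxFileSizeKB: 1024,
};
console.log(shouldCapture("src/app.ts", 2048, captureSettings));
console.log(shouldCapture("node_modules/react/index.js", 2048, captureSettings));
```

A real implementation would more likely rely on an established glob library than on a hand-written converter.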

📖 Usage

Timeline View

  1. Click the Code Historian icon in the Activity Bar (sidebar)
  2. Browse your change history with multiple view options:
    • Timeline View: Classic vertical timeline with connecting lines
    • Cards View: Grid layout for visual scanning
    • Compact View: Dense list for maximum information
  3. Use filters to narrow down results:
    • Filter by change type (Created, Modified, Deleted)
    • Filter by date range
    • Search by filename or content
  4. Click any change to see a detailed diff view
  5. Restore any previous version with one click

Chat Commands

Open VS Code Chat (Cmd/Ctrl + Shift + I) and use @historian:

@historian What changes did I make to the authentication module?
@historian Show me the login function from last week
@historian Find all database-related changes
@historian When did I add the validation logic?

Keyboard Shortcuts

| Shortcut | Command |
| --- | --- |
| Ctrl+Shift+H / Cmd+Shift+H | Open Timeline |
| Ctrl+Alt+F / Cmd+Alt+F | Search History |

🏗️ Architecture

Code Historian runs four services (capture, embedding, search, and LLM orchestration) over a local database layer, with a React webview for the UI:

┌─────────────────────────────────────────────────────────────┐
│                     VS Code Extension                        │
├──────────────┬──────────────┬───────────────┬───────────────┤
│   Capture    │   Embedding  │    Search     │     LLM       │
│   Engine     │   Service    │    Engine     │  Orchestrator │
│              │              │               │               │
│  • Debounce  │  • HuggingFace│ • Hybrid     │  • OpenAI     │
│  • Diff Gen  │  • Ollama    │   Search     │  • Anthropic  │
│  • Sessions  │  • OpenAI    │ • BM25+Vector│  • Ollama     │
├──────────────┴──────────────┴───────────────┴───────────────┤
│                      Database Layer                          │
│     SQLite (sql.js)          │        LanceDB               │
│     • Metadata               │        • Vector embeddings   │
│     • BM25 keyword search    │        • Similarity search   │
├──────────────────────────────┴──────────────────────────────┤
│                     React Webview UI                         │
│  Timeline • Search • Settings • Diff Viewer • Chat          │
└─────────────────────────────────────────────────────────────┘

Note: the SQLite layer uses sql.js (WebAssembly), which is not built with the FTS5 extension. Keyword search therefore uses candidate retrieval plus a JavaScript BM25 ranker rather than SQLite FTS5.
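
For readers unfamiliar with BM25, the following TypeScript sketch scores a handful of candidate documents against a query using the standard Okapi BM25 formula. It is a generic illustration, not the extension's ranker: `tokenize` and `bm25Scores` are hypothetical names, and `k1 = 1.2` and `b = 0.75` are common defaults rather than values confirmed by the project.

```typescript
// Generic BM25 sketch; the extension's JavaScript ranker may be implemented differently.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Score every candidate document against the query; higher means more relevant.
function bm25Scores(query: string, docs: string[], k1: number = 1.2, b: number = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(tokenized.length, 1);
  const queryTerms = Array.from(new Set(tokenize(query)));

  // Document frequency: how many candidates contain each query term.
  const df = new Map<string, number>();
  for (const term of queryTerms) {
    df.set(term, tokenized.filter((d) => d.indexOf(term) !== -1).length);
  }

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term) ?? 0;
      const idf = Math.log(1 + (tokenized.length - n + 0.5) / (n + 0.5));
      const lengthNorm = 1 - b + b * (doc.length / avgLen);
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

const candidates = [
  "fix login bug in auth module",
  "update readme",
  "refactor auth login flow login",
];
const keywordScores = bm25Scores("login", candidates);
console.log(keywordScores);
```

In this example the third candidate scores highest because it mentions the query term twice in a shorter text, and the second scores zero because it never mentions it.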

Key Technologies

| Component | Technology | Purpose |
| --- | --- | --- |
| Metadata DB | SQLite (sql.js) | In-process metadata storage + BM25 keyword ranking |
| Vector DB | LanceDB | Embedded vector database with ANN search |
| Embeddings | Built-in/Ollama/HuggingFace/OpenAI | Semantic code understanding |
| UI Framework | React 18 | Modern, reactive webview interface |
| Build Tool | esbuild | Fast TypeScript bundling |
| Chat API | VS Code Chat | Native chat participant integration |

Search Pipeline

User Query → Embedding → Vector Search (top-k)
                     ↘
                       RRF Fusion → (optional rerank) → Ranked Results
                     ↗
            → BM25 Keyword Search

The hybrid search combines:

  • Vector similarity (default 60% weight): Semantic understanding of code
  • Keyword matching (default 40% weight): BM25 ranking over candidate matches
  • Reciprocal Rank Fusion: Combines both result sets with overlap boosting
  • Reranking: a free local BM25 reranker refines the top results (a cloud cross-encoder via Cohere/HuggingFace can be configured instead)
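
The fusion step can be illustrated with a small weighted Reciprocal Rank Fusion sketch in TypeScript. How the extension applies its 60/40 weights and overlap boost internally is not documented here, so treat this as one plausible reading: `fuseRankings` is a hypothetical name, and `k = 60` is the constant commonly used with RRF, not a confirmed setting.

```typescript
// Weighted RRF sketch; an assumption about how the 60/40 weights could be applied.
function fuseRankings(
  vectorIds: string[],
  keywordIds: string[],
  vectorWeight: number = 0.6,
  keywordWeight: number = 0.4,
  k: number = 60
): string[] {
  const scores = new Map<string, number>();

  // Each list contributes weight / (k + rank) to every item it contains.
  const accumulate = (ids: string[], weight: number): void => {
    ids.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  };
  accumulate(vectorIds, vectorWeight);
  accumulate(keywordIds, keywordWeight);

  // Items found by both searches accumulate two contributions and rise to the top.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const fused = fuseRankings(["a", "b", "c"], ["b", "d"]);
console.log(fused); // ["b", "a", "c", "d"]
```

Here `b` wins even though it is second in the vector list, because it also tops the keyword list; that summing of contributions is the overlap boost in its simplest form.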

📈 Performance

Approximate design targets (vary by machine, provider, and history size):

| Operation | Target | Notes |
| --- | --- | --- |
| Change capture | non-blocking | Debounced, batched, off the UI thread |
| Embedding generation | provider-dependent | Built-in runs locally with no network round trip; cloud providers add network latency |
| Vector search | low latency | LanceDB ANN over the local store |
| Hybrid search | interactive | Vector + BM25 keyword fusion |
| UI render | smooth | Virtualized timeline |

Large diffs are gzip-compressed at rest, and history older than maxHistoryDays is pruned automatically.
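
Age-based pruning amounts to a cutoff filter, as in the TypeScript sketch below. The `HistoryEntry` shape and the `pruneHistory` function are hypothetical; only the `maxHistoryDays` name comes from the text above.

```typescript
// Hypothetical pruning sketch; the extension's storage layer is not shown here.
interface HistoryEntry {
  id: string;
  timestampMs: number; // when the change was captured, in epoch milliseconds
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Keep only entries captured within the last maxHistoryDays days.
function pruneHistory(entries: HistoryEntry[], maxHistoryDays: number, nowMs: number): HistoryEntry[] {
  const cutoff = nowMs - maxHistoryDays * MS_PER_DAY;
  return entries.filter((entry) => entry.timestampMs >= cutoff);
}

const nowMs = 100 * MS_PER_DAY;
const kept = pruneHistory(
  [
    { id: "old", timestampMs: 5 * MS_PER_DAY },
    { id: "recent", timestampMs: 95 * MS_PER_DAY },
  ],
  30,
  nowMs
);
console.log(kept.map((entry) => entry.id)); // ["recent"]
```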


🔒 Privacy

Your data stays 100% local by default:

  • ✅ SQLite database in VS Code's global storage
  • ✅ LanceDB vectors stored locally
  • ✅ Optional Ollama for completely local AI
  • No telemetry or external data sharing
  • ✅ API keys stored in VS Code's encrypted Secret Storage, not in settings.json

When using cloud providers (such as OpenAI, HuggingFace, Anthropic, or Google Gemini), only embedding requests and chat queries are sent externally.


🛠️ Development

# Clone the repository
git clone https://github.com/KirtiJha/code-historian.git
cd code-historian

# Install dependencies
npm install

# Build the extension
npm run build

# Watch mode (auto-rebuild on changes)
npm run watch

# Type checking
npm run typecheck

# Linting
npm run lint

# Run tests
npm test

Project Structure

code-historian/
├── src/
│   ├── extension.ts        # Extension entry point
│   ├── constants.ts        # Configuration constants
│   ├── types/              # TypeScript type definitions
│   ├── database/           # SQLite & LanceDB wrappers
│   ├── services/           # Core services
│   │   ├── capture.ts      # Change capture engine
│   │   ├── embedding.ts    # Embedding service
│   │   ├── search.ts       # Hybrid search engine
│   │   ├── llm.ts          # LLM orchestrator
│   │   └── restoration.ts  # Code restoration
│   ├── chat/               # VS Code Chat participant
│   ├── webview/            # React webview UI
│   │   ├── ui/             # React components
│   │   └── provider.ts     # Webview provider
│   └── utils/              # Utilities
├── media/                  # Icons and assets
├── dist/                   # Build output
└── package.json            # Extension manifest

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please read our Contributing Guide for details on our code of conduct and development process.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • LanceDB - Excellent embedded vector database
  • Ollama - Local AI inference made easy
  • HuggingFace - State-of-the-art embeddings
  • VS Code - Amazing extension API
  • sql.js - SQLite compiled to WebAssembly

Made with ❤️ for developers who value their code history
