claude-code-nim

Use the Claude Code CLI for free with NVIDIA NIM's free API (rate-limited to 40 requests per minute). This lightweight proxy converts Claude Code's Anthropic API requests to NVIDIA NIM format. It also includes Telegram bot integration for remote control from your phone!
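
Conceptually, the proxy sits between Claude Code and NIM's OpenAI-style chat completions endpoint. The snippet below is a minimal sketch of the request-side translation, not the project's actual code; the function name and field handling are illustrative assumptions, and the real proxy also handles tools, streaming, and rich content blocks.

def anthropic_to_nim_request(anthropic_req: dict, model: str) -> dict:
    """Illustrative sketch: map an Anthropic Messages API request body
    to an OpenAI-style chat completions body as accepted by NVIDIA NIM."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-style APIs expect it as the first message.
    system = anthropic_req.get("system")
    if system:
        messages.append({"role": "system", "content": system})
    for m in anthropic_req.get("messages", []):
        content = m["content"]
        # Anthropic content may be a list of blocks; flatten text blocks here.
        if isinstance(content, list):
            content = "".join(b.get("text", "") for b in content if b.get("type") == "text")
        messages.append({"role": m["role"], "content": content})
    return {
        "model": model,
        "messages": messages,
        "max_tokens": anthropic_req.get("max_tokens", 1024),
    }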

Quick Start

1. Get Your Free NVIDIA API Key

  1. Get a new API key from build.nvidia.com/settings/api-keys
  2. Install claude-code
  3. Install uv

2. Clone & Configure

git clone https://github.com/Alishahryar1/cc-nim.git
cd cc-nim

cp .env.example .env

Edit .env:

NVIDIA_NIM_API_KEY=nvapi-your-key-here
MODEL=moonshotai/kimi-k2-thinking

Claude Code

Terminal 1 - Start the proxy:

uv run uvicorn server:app --host 0.0.0.0 --port 8082

Terminal 2 - Run Claude Code:

ANTHROPIC_AUTH_TOKEN=ccnim ANTHROPIC_BASE_URL=http://localhost:8082 claude

That's it! Claude Code now uses NVIDIA NIM for free.
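
If you want to sanity-check the proxy before pointing Claude Code at it, you can send it a minimal Anthropic-format request by hand. This is an optional sketch using only the standard library; it assumes the proxy serves the standard /v1/messages path (which is what Claude Code requests) and accepts any token, with ccnim used only as a placeholder. The MODEL from .env is used for all requests, so the model field here is illustrative.

import json
import urllib.request

payload = {
    "model": "moonshotai/kimi-k2-thinking",  # likely overridden by MODEL in .env
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
}
req = urllib.request.Request(
    "http://localhost:8082/v1/messages",
    data=json.dumps(payload).encode(),
    headers={
        "content-type": "application/json",
        "x-api-key": "ccnim",            # placeholder, mirroring ANTHROPIC_AUTH_TOKEN
        "anthropic-version": "2023-06-01",
    },
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"][0]["text"])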


Telegram Bot Integration

Control Claude Code remotely via Telegram! Set an allowed directory, send tasks from your phone, and watch Claude Code work autonomously on multiple tasks.

Setup

  1. Get a Bot Token:

    • Open Telegram and message @BotFather
    • Send /newbot and follow the prompts
    • Copy the HTTP API Token
  2. Add to .env:

TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrSTUvwxYZ
ALLOWED_TELEGRAM_USER_ID=your_telegram_user_id

💡 To find your Telegram user ID, message @userinfobot on Telegram.

  3. Configure the workspace (where Claude will operate):
CLAUDE_WORKSPACE=./agent_workspace
ALLOWED_DIR=C:/Users/yourname/projects
  4. Start the server:
uv run uvicorn server:app --host 0.0.0.0 --port 8082
  5. Usage:
    • Send /start to your bot
    • Send your bot a message describing a task
    • Claude will respond with:
      • 💭 Thinking tokens (reasoning steps)
      • 🔧 Tool calls as they execute
      • Final result when complete
    • Send /stop to cancel all running tasks
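
The single-user allowlist is the key safety control here. As a rough illustration of the idea (not the project's actual implementation), the check against ALLOWED_TELEGRAM_USER_ID can be expressed with the plain Telegram Bot HTTP API:

import json
import os
import urllib.request

TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
ALLOWED_ID = int(os.environ["ALLOWED_TELEGRAM_USER_ID"])
API = f"https://api.telegram.org/bot{TOKEN}"

def fetch_updates() -> list[dict]:
    """Long-poll the Bot API for new updates."""
    with urllib.request.urlopen(f"{API}/getUpdates?timeout=30") as resp:
        return json.loads(resp.read())["result"]

for update in fetch_updates():
    msg = update.get("message") or {}
    sender = (msg.get("from") or {}).get("id")
    # Ignore anyone who is not the configured user.
    if sender != ALLOWED_ID:
        continue
    print("task from allowed user:", msg.get("text"))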

Available Models

See nvidia_nim_models.json for the full list of supported models.

Popular choices:

  • stepfun-ai/step-3.5-flash
  • moonshotai/kimi-k2.5
  • z-ai/glm4.7
  • minimaxai/minimax-m2.1
  • mistralai/devstral-2-123b-instruct-2512

Browse all models at build.nvidia.com

Updating the Model List

To update nvidia_nim_models.json with the latest models from NVIDIA NIM, run the following command:

curl "https://integrate.api.nvidia.com/v1/models" > nvidia_nim_models.json

Configuration

| Variable | Description | Default |
| --- | --- | --- |
| NVIDIA_NIM_API_KEY | Your NVIDIA API key | required |
| MODEL | Model to use for all requests | moonshotai/kimi-k2-thinking |
| NVIDIA_NIM_BASE_URL | NIM endpoint | https://integrate.api.nvidia.com/v1 |
| CLAUDE_WORKSPACE | Directory for agent workspace | ./agent_workspace |
| ALLOWED_DIR | Allowed directories for agent | "" |
| MAX_CLI_SESSIONS | Max concurrent CLI sessions | 10 |
| FAST_PREFIX_DETECTION | Enable fast prefix detection | true |
| ENABLE_NETWORK_PROBE_MOCK | Enable network probe mock | true |
| ENABLE_TITLE_GENERATION_SKIP | Skip title generation | true |
| ENABLE_SUGGESTION_MODE_SKIP | Skip suggestion mode | true |
| ENABLE_FILEPATH_EXTRACTION_MOCK | Enable filepath extraction mock | true |
| TELEGRAM_BOT_TOKEN | Telegram bot token | "" |
| ALLOWED_TELEGRAM_USER_ID | Allowed Telegram user ID | "" |
| MESSAGING_RATE_LIMIT | Telegram messages per window | 1 |
| MESSAGING_RATE_WINDOW | Messaging window (seconds) | 1 |
| NVIDIA_NIM_RATE_LIMIT | API requests per window | 40 |
| NVIDIA_NIM_RATE_WINDOW | Rate limit window (seconds) | 60 |
| NVIDIA_NIM_TEMPERATURE | Model temperature | 1.0 |
| NVIDIA_NIM_TOP_P | Top-p sampling | 1.0 |
| NVIDIA_NIM_TOP_K | Top-k sampling | -1 |
| NVIDIA_NIM_MAX_TOKENS | Max tokens for generation | 81920 |

See .env.example for all supported parameters.
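
As a rough illustration of how these variables are typically consumed (not necessarily how server.py reads them, which may use python-dotenv or a settings class instead), each setting maps to an environment lookup with its documented default:

import os

NVIDIA_NIM_API_KEY = os.environ["NVIDIA_NIM_API_KEY"]          # required, no default
MODEL = os.getenv("MODEL", "moonshotai/kimi-k2-thinking")
NVIDIA_NIM_BASE_URL = os.getenv("NVIDIA_NIM_BASE_URL", "https://integrate.api.nvidia.com/v1")
MAX_CLI_SESSIONS = int(os.getenv("MAX_CLI_SESSIONS", "10"))
NVIDIA_NIM_RATE_LIMIT = int(os.getenv("NVIDIA_NIM_RATE_LIMIT", "40"))
NVIDIA_NIM_RATE_WINDOW = int(os.getenv("NVIDIA_NIM_RATE_WINDOW", "60"))
NVIDIA_NIM_TEMPERATURE = float(os.getenv("NVIDIA_NIM_TEMPERATURE", "1.0"))
FAST_PREFIX_DETECTION = os.getenv("FAST_PREFIX_DETECTION", "true").lower() == "true"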

Development

Running Tests

To run the test suite, use the following command:

uv run pytest

Adding Your Own Provider

Extend BaseProvider in providers/ to add support for other APIs:

from providers.base import BaseProvider, ProviderConfig

class MyProvider(BaseProvider):
    async def complete(self, request):
        # Make API call, return raw JSON
        pass

    async def stream_response(self, request, input_tokens=0):
        # Yield Anthropic SSE format events
        pass

    def convert_response(self, response_json, original_request):
        # Convert to Anthropic response format
        pass
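
For a concrete feel, here is a sketch of what convert_response might do for an OpenAI-compatible upstream. The exact BaseProvider contract and field names in this repo may differ, so treat this as an illustration of producing Anthropic-shaped output rather than the project's implementation.

import uuid

def convert_openai_to_anthropic(response_json: dict, original_request: dict) -> dict:
    """Map an OpenAI-style chat completion response onto the Anthropic
    Messages API response shape that Claude Code expects."""
    choice = response_json["choices"][0]
    usage = response_json.get("usage", {})
    return {
        "id": f"msg_{uuid.uuid4().hex}",
        "type": "message",
        "role": "assistant",
        "model": original_request.get("model", response_json.get("model", "")),
        "content": [{"type": "text", "text": choice["message"]["content"] or ""}],
        # OpenAI finish reasons map roughly onto Anthropic stop reasons.
        "stop_reason": "max_tokens" if choice.get("finish_reason") == "length" else "end_turn",
        "stop_sequence": None,
        "usage": {
            "input_tokens": usage.get("prompt_tokens", 0),
            "output_tokens": usage.get("completion_tokens", 0),
        },
    }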

Adding Your Own Messaging App

Extend MessagingPlatform in messaging/ to add support for other platforms (Discord, Slack, etc.):

from messaging.base import MessagingPlatform
from messaging.models import IncomingMessage

class MyPlatform(MessagingPlatform):
    async def start(self):
        # Initialize connection
        pass

    async def stop(self):
        # Cleanup
        pass

    async def queue_send_message(self, chat_id, text, **kwargs):
        # Send message to platform
        pass

    async def queue_edit_message(self, chat_id, message_id, text, **kwargs):
        # Edit existing message
        pass

    def on_message(self, handler):
        # Register callback for incoming messages
        # Handler expects an IncomingMessage object
        pass
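
As a small worked example of the interface above, a console-based platform (handy for local testing) could look roughly like this. The IncomingMessage fields used here (chat_id, text) and the async handler signature are assumptions about the repo's models and may not match them exactly.

import asyncio

from messaging.base import MessagingPlatform
from messaging.models import IncomingMessage

class ConsolePlatform(MessagingPlatform):
    """Toy platform that reads tasks from stdin and prints replies."""

    def __init__(self):
        self._handler = None
        self._running = False

    async def start(self):
        self._running = True
        while self._running:
            text = await asyncio.to_thread(input, "> ")
            if self._handler:
                # chat_id/text are assumed IncomingMessage fields.
                await self._handler(IncomingMessage(chat_id="console", text=text))

    async def stop(self):
        self._running = False

    async def queue_send_message(self, chat_id, text, **kwargs):
        print(f"[{chat_id}] {text}")

    async def queue_edit_message(self, chat_id, message_id, text, **kwargs):
        print(f"[{chat_id}] (edit {message_id}) {text}")

    def on_message(self, handler):
        self._handler = handler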

Voice Message Features

Automatic Context Retention

The bot now automatically maintains conversation context for consecutive voice messages! No need to use "Reply" every time.

How it works:

  • When you send a voice (or text) message, the bot checks for recent activity in the same chat
  • If there's a completed message within the configured time window, the new message is automatically associated
  • Claude continues the conversation with full context of previous messages
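
A minimal sketch of this time-window association logic is shown below (illustrative only, with hypothetical field names); the idea is simply "attach to the most recent completed node in the same chat if it finished within the window, otherwise start a new tree".

from datetime import datetime, timedelta

def find_parent(nodes: list[dict], chat_id: str, now: datetime, window_minutes: int):
    """Return the most recent completed node in this chat within the window,
    or None if the new message should start a fresh conversation tree.
    Node dicts are hypothetical: {"chat_id", "state", "completed_at", ...}."""
    if window_minutes <= 0:          # 0 disables automatic context retention
        return None
    cutoff = now - timedelta(minutes=window_minutes)
    candidates = [
        n for n in nodes
        if n["chat_id"] == chat_id
        and n["state"] == "completed"
        and n["completed_at"] >= cutoff
    ]
    return max(candidates, key=lambda n: n["completed_at"], default=None)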

Example Usage:

  1. Send: "Hazme un reporte" (creates tree1)
  2. Send 30 secs later: "Dónde está mi reporte" (automatically continues tree1)
  3. Claude responds with context, knowing you're asking about the report

Configuration:

| Variable | Description | Default |
| --- | --- | --- |
| VOICE_CONTEXT_WINDOW_MINUTES | Time window for automatic context detection (0 = disabled) | 10 |

Add to .env:

VOICE_CONTEXT_WINDOW_MINUTES=10  # Associate messages within 10 minutes

Voice Message Transcription

Send voice messages directly to the bot - they are automatically transcribed using Whisper (faster-whisper) and processed by Claude.

Setup:

# Install audio processing dependencies (automatic with uv sync)
# The service downloads the Whisper model on first use

How it works:

  1. Record and send a voice message to the bot
  2. Bot downloads the audio from Telegram
  3. Whisper transcribes the speech to text
  4. Claude processes the transcribed text
  5. Claude responds as if you typed the message
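
The transcription step itself reduces to a few lines of faster-whisper. This is a standalone sketch, not the repo's transcription service, and the model size and device settings are assumptions.

from faster_whisper import WhisperModel

# "base" is an illustrative model size; the project may choose a different one.
model = WhisperModel("base", device="cpu", compute_type="int8")

# faster-whisper decodes OGG/Opus, MP3, WAV, etc. via its audio backend.
segments, info = model.transcribe("voice_message.ogg")
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
print("".join(segment.text for segment in segments))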

Audio formats supported:

  • OGG (Opus) - Telegram's default format
  • MP3, WAV - Via conversion

Features:

  • Multi-language transcription (auto-detects language)
  • Retry logic for network failures (3 attempts)
  • Graceful fallback if transcription fails
  • Preserves original voice message alongside transcription

Testing:

# Test transcription only
uv run python test_whisper.py

# Run all voice-related tests
uv run pytest tests/messaging/test_voice_processor.py tests/services/test_transcription.py -v

Message Queue System

The bot uses a tree-based queuing system:

  • Tree structure: Each conversation is a tree, messages are nodes
  • Parent nodes: Replies create child nodes (explicit or automatic)
  • State tracking: Each node has state (pending → in_progress → completed/error)
  • Concurrent processing: Multiple conversations can run simultaneously (max 10 sessions)

Message States:

  • PENDING - Queued waiting for processing
  • IN_PROGRESS - Currently being processed by Claude
  • COMPLETED - Successfully completed with response
  • ERROR - Processing failed with error message

When a parent task fails, all pending children are automatically cancelled with error propagation.
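
To make the tree and state model concrete, here is an illustrative sketch (hypothetical class and field names, not the repo's types) of nodes, states, and the error propagation described above:

from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    ERROR = "error"

@dataclass
class Node:
    """One message in a conversation tree."""
    text: str
    state: State = State.PENDING
    children: list["Node"] = field(default_factory=list)

    def fail(self, reason: str):
        """Mark this node as failed and cancel all pending descendants."""
        self.state = State.ERROR
        for child in self.children:
            if child.state == State.PENDING:
                child.fail(f"parent failed: {reason}")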
