Use the Claude Code CLI for free with NVIDIA NIM's free API (40 requests/minute, no usage cap). This lightweight proxy translates Claude Code's Anthropic API requests into NVIDIA NIM format, and includes Telegram bot integration for remote control from your phone!
- Get a new API key from build.nvidia.com/settings/api-keys
- Install claude-code
- Install uv
```shell
git clone https://github.com/Alishahryar1/cc-nim.git
cd cc-nim
cp .env.example .env
```

Edit `.env`:

```shell
NVIDIA_NIM_API_KEY=nvapi-your-key-here
MODEL=moonshotai/kimi-k2-thinking
```

Terminal 1 - Start the proxy:

```shell
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```

Terminal 2 - Run Claude Code:

```shell
ANTHROPIC_AUTH_TOKEN=ccnim ANTHROPIC_BASE_URL=http://localhost:8082 claude
```

That's it! Claude Code now uses NVIDIA NIM for free.
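Under the hood, the proxy's core job is request translation: an Anthropic-style `/v1/messages` payload goes in, and an OpenAI-compatible chat-completions payload comes out for NIM. A minimal sketch of that mapping (an illustration with assumed field handling, not the project's actual code):

```python
# Hedged sketch: map an Anthropic Messages payload onto the
# OpenAI-compatible shape NVIDIA NIM accepts. Real code must also
# handle tool definitions, streaming, and non-text content blocks.
def anthropic_to_nim(request: dict, model: str) -> dict:
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-style APIs expect it as the first message.
    if request.get("system"):
        messages.append({"role": "system", "content": request["system"]})
    for msg in request.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; keep the text.
        if isinstance(content, list):
            content = "".join(b.get("text", "") for b in content
                              if b.get("type") == "text")
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": model,
        "messages": messages,
        "max_tokens": request.get("max_tokens", 81920),
        "stream": request.get("stream", False),
    }

req = {"system": "Be brief.",
       "messages": [{"role": "user",
                     "content": [{"type": "text", "text": "List the files."}]}]}
print(anthropic_to_nim(req, "moonshotai/kimi-k2-thinking"))
```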
Control Claude Code remotely via Telegram! Set an allowed directory, send tasks from your phone, and watch Claude Code work autonomously on multiple tasks.
- Get a Bot Token:
  - Open Telegram and message @BotFather
  - Send `/newbot` and follow the prompts
  - Copy the HTTP API Token
- Add to `.env`:

  ```shell
  TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrSTUvwxYZ
  ALLOWED_TELEGRAM_USER_ID=your_telegram_user_id
  ```

  💡 To find your Telegram user ID, message @userinfobot on Telegram.
- Configure the workspace (where Claude will operate):

  ```shell
  CLAUDE_WORKSPACE=./agent_workspace
  ALLOWED_DIR=C:/Users/yourname/projects
  ```

- Start the server:

  ```shell
  uv run uvicorn server:app --host 0.0.0.0 --port 8082
  ```

- Usage:
  - Send `/start` to your bot
  - Send your bot a message with a task
  - Claude will respond with:
    - 💭 Thinking tokens (reasoning steps)
    - 🔧 Tool calls as they execute
    - ✅ Final result when complete
  - Send `/stop` to cancel all running tasks
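For reference, the single-user gate that `ALLOWED_TELEGRAM_USER_ID` implies can be sketched as follows (hypothetical names; the real check lives in the bot code):

```python
# Hypothetical sketch of the allowlist gate implied by
# ALLOWED_TELEGRAM_USER_ID: only the configured user may drive the bot.
def is_authorized(user_id: int, allowed_id: str) -> bool:
    # The empty default means no Telegram user is accepted.
    return bool(allowed_id) and str(user_id) == allowed_id

print(is_authorized(123456789, "123456789"))
print(is_authorized(987654321, "123456789"))
```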
See `nvidia_nim_models.json` for the full list of supported models.
Popular choices:
- `stepfun-ai/step-3.5-flash`
- `moonshotai/kimi-k2.5`
- `z-ai/glm4.7`
- `minimaxai/minimax-m2.1`
- `mistralai/devstral-2-123b-instruct-2512`

Browse all models at build.nvidia.com
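To inspect the list programmatically, a small sketch (this assumes the file follows the OpenAI `/v1/models` response shape, since the NIM endpoint is OpenAI-compatible; adjust if the downloaded file differs):

```python
import json

# Assumed shape: {"data": [{"id": "..."}]}, per the OpenAI /v1/models
# convention. Here we parse an inline sample instead of the real file.
def list_model_ids(payload: dict) -> list[str]:
    return sorted(entry["id"] for entry in payload.get("data", []))

sample = json.loads('{"data": [{"id": "z-ai/glm4.7"},'
                    ' {"id": "moonshotai/kimi-k2.5"}]}')
print(list_model_ids(sample))
```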
To update nvidia_nim_models.json with the latest models from NVIDIA NIM, run the following command:
```shell
curl "https://integrate.api.nvidia.com/v1/models" > nvidia_nim_models.json
```

| Variable | Description | Default |
|---|---|---|
| `NVIDIA_NIM_API_KEY` | Your NVIDIA API key | required |
| `MODEL` | Model to use for all requests | `moonshotai/kimi-k2-thinking` |
| `NVIDIA_NIM_BASE_URL` | NIM endpoint | `https://integrate.api.nvidia.com/v1` |
| `CLAUDE_WORKSPACE` | Directory for agent workspace | `./agent_workspace` |
| `ALLOWED_DIR` | Allowed directories for agent | `""` |
| `MAX_CLI_SESSIONS` | Max concurrent CLI sessions | `10` |
| `FAST_PREFIX_DETECTION` | Enable fast prefix detection | `true` |
| `ENABLE_NETWORK_PROBE_MOCK` | Enable network probe mock | `true` |
| `ENABLE_TITLE_GENERATION_SKIP` | Skip title generation | `true` |
| `ENABLE_SUGGESTION_MODE_SKIP` | Skip suggestion mode | `true` |
| `ENABLE_FILEPATH_EXTRACTION_MOCK` | Enable filepath extraction mock | `true` |
| `TELEGRAM_BOT_TOKEN` | Telegram Bot Token | `""` |
| `ALLOWED_TELEGRAM_USER_ID` | Allowed Telegram User ID | `""` |
| `MESSAGING_RATE_LIMIT` | Telegram messages per window | `1` |
| `MESSAGING_RATE_WINDOW` | Messaging window (seconds) | `1` |
| `NVIDIA_NIM_RATE_LIMIT` | API requests per window | `40` |
| `NVIDIA_NIM_RATE_WINDOW` | Rate-limit window (seconds) | `60` |
| `NVIDIA_NIM_TEMPERATURE` | Model temperature | `1.0` |
| `NVIDIA_NIM_TOP_P` | Top-p sampling | `1.0` |
| `NVIDIA_NIM_TOP_K` | Top-k sampling | `-1` |
| `NVIDIA_NIM_MAX_TOKENS` | Max tokens for generation | `81920` |
See .env.example for all supported parameters.
To run the test suite, use the following command:
```shell
uv run pytest
```

Extend `BaseProvider` in `providers/` to add support for other APIs:

```python
from providers.base import BaseProvider, ProviderConfig

class MyProvider(BaseProvider):
    async def complete(self, request):
        # Make the API call and return the raw JSON
        pass

    async def stream_response(self, request, input_tokens=0):
        # Yield Anthropic SSE-format events
        pass

    def convert_response(self, response_json, original_request):
        # Convert to the Anthropic response format
        pass
```

Extend `MessagingPlatform` in `messaging/` to add support for other platforms (Discord, Slack, etc.):
```python
from messaging.base import MessagingPlatform
from messaging.models import IncomingMessage

class MyPlatform(MessagingPlatform):
    async def start(self):
        # Initialize the connection
        pass

    async def stop(self):
        # Clean up
        pass

    async def queue_send_message(self, chat_id, text, **kwargs):
        # Send a message to the platform
        pass

    async def queue_edit_message(self, chat_id, message_id, text, **kwargs):
        # Edit an existing message
        pass

    def on_message(self, handler):
        # Register a callback for incoming messages;
        # the handler receives an IncomingMessage object
        pass
```

The bot automatically maintains conversation context for consecutive voice messages! No need to use "Reply" every time.
How it works:
- When you send a voice (or text) message, the bot checks for recent activity in the same chat
- If there's a completed message within the configured time window, the new message is automatically associated
- Claude continues the conversation with full context of previous messages
Example Usage:
- Send: "Make me a report" (creates tree1)
- Send 30 seconds later: "Where is my report?" (automatically continues tree1)
- Claude responds with context, knowing you're asking about the report
Configuration:
Add to .env:
```shell
VOICE_CONTEXT_WINDOW_MINUTES=10  # Associate messages within 10 minutes
```

Send voice messages directly to the bot - they are automatically transcribed using Whisper (faster-whisper) and processed by Claude.
Setup:
```shell
# Install audio processing dependencies (automatic with uv sync)
# The service downloads the Whisper model on first use
```

How it works:
- Record and send a voice message to the bot
- Bot downloads the audio from Telegram
- Whisper transcribes the speech to text
- Claude processes the transcribed text
- Claude responds as if you typed the message
Audio formats supported:
- OGG (Opus) - Telegram's default format
- MP3, WAV - Via conversion
Features:
- Multi-language transcription (auto-detects language)
- Retry logic for network failures (3 attempts)
- Graceful fallback if transcription fails
- Preserves original voice message alongside transcription
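The retry behaviour for network failures can be sketched generically (an illustration of the "3 attempts" rule, not the project's actual code; the real download/transcription path may catch different errors or back off differently):

```python
import asyncio

# Generic sketch: retry an awaitable up to `attempts` times on
# connection errors, with exponential backoff between attempts.
async def with_retries(coro_factory, attempts=3, base_delay=0.01):
    last_exc = None
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except ConnectionError as exc:
            last_exc = exc
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_exc

calls = {"n": 0}

async def flaky_download():
    # Simulated transient failure: succeeds on the third attempt.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")
    return "transcribed text"

print(asyncio.run(with_retries(flaky_download)))
```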
Testing:
```shell
# Test transcription only
uv run python test_whisper.py

# Run all voice-related tests
uv run pytest tests/messaging/test_voice_processor.py tests/services/test_transcription.py -v
```

The bot uses a tree-based queuing system:
- Tree structure: Each conversation is a tree, messages are nodes
- Parent nodes: Replies create child nodes (explicit or automatic)
- State tracking: Each node has state (pending → in_progress → completed/error)
- Concurrent processing: Multiple conversations can run simultaneously (max 10 sessions)
Message States:
- `PENDING` - Queued, waiting for processing
- `IN_PROGRESS` - Currently being processed by Claude
- `COMPLETED` - Successfully completed with a response
- `ERROR` - Processing failed with an error message
When a parent task fails, all pending children are automatically cancelled with error propagation.
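The propagation rule can be sketched with a minimal node type (illustrative only; the real tree lives in the bot's queuing code and will differ in detail):

```python
from enum import Enum

class State(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    ERROR = "error"

# Minimal sketch of a conversation-tree node with the error propagation
# described above: when a node fails, pending descendants are cancelled.
class Node:
    def __init__(self, text, parent=None):
        self.text = text
        self.state = State.PENDING
        self.error = None
        self.children = []
        if parent:
            parent.children.append(self)

    def fail(self, reason):
        self.state = State.ERROR
        self.error = reason
        self._cancel_pending(reason)

    def _cancel_pending(self, reason):
        # Cancel every still-pending descendant, recording why.
        for child in self.children:
            if child.state is State.PENDING:
                child.state = State.ERROR
                child.error = reason
                child._cancel_pending(reason)

root = Node("make a report")
follow_up = Node("where is my report", parent=root)
root.fail("rate limit exceeded")
print(follow_up.state, follow_up.error)
```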
