Local dashboard for exploring the free models exposed through build.nvidia.com.
The app fetches the active model catalog, loads per-model metadata, flattens every metadata field into sortable table columns, and lets you probe live capabilities such as latency, context length, max output tokens, and tool calling support.
- Shows only models that appear active and usable.
- Fetches model metadata for every listed model and renders it as a sortable table.
- Keeps the most useful columns pinned on the left: Live Ping, Model ID, Publisher, Context Limit, Max Output, Latency (ms), Tool Support, and Tested At.
- Supports global search, Exclude Inactive/Error, and Tool Support filtering.
- Probes live model behavior from the UI: Ping re-tests one model. Test Displayed Models tests displayed models that do not already have a complete live result. Shift + Click on Test Displayed Models forces a full re-test of all displayed rows.
- Backend probe requests are globally paced and automatically back off on 429 Too Many Requests.
- Tool support probing tries multiple request variants, classifies explicit unsupported-tool responses, and retries accepted-but-truncated responses with a larger completion budget before giving up.
- Right-click any row to open a copyable cURL API example for that model.
- Force Refresh Data drops all saved test results, clears backend caches, and reloads the model list from NVIDIA with no cache reuse.
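The global pacing described above can be sketched as a single shared scheduler. This is a minimal sketch, not the dashboard's actual implementation; the names (`createPacer`, `reserve`) and the injectable clock are assumptions chosen to keep the logic testable:

```javascript
// Hypothetical sketch of a global probe pacer: one shared scheduler hands out
// send slots so probes never exceed the configured rate. The injectable clock
// (`now`) is an assumption made so the logic is testable without real timers.
function createPacer(minIntervalMs, now = Date.now) {
  let nextSlot = 0; // earliest timestamp the next probe may be sent
  return function reserve() {
    const t = now();
    const wait = Math.max(0, nextSlot - t);
    nextSlot = Math.max(t, nextSlot) + minIntervalMs;
    return wait; // caller sleeps this many ms before issuing the request
  };
}
```

On a 429 response, the backend additionally waits a fixed backoff before retrying, per the `PROBE_429_BACKOFF_MS` and `PROBE_MAX_429_RETRIES` defaults listed under configuration.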
- Install Node.js 18 or later.
- Export your NVIDIA key in the shell: `export NVIDIA_API_KEY="your_nvidia_api_key"`
- Start the app: `./start.sh`
- The server starts on `http://localhost:4920` by default and attempts to open the dashboard in your default browser.
Each live test can perform up to three NVIDIA API requests:
- A small chat completion request to confirm availability and measure latency.
- Metadata-aware token limit detection that prefers numeric metadata hints and falls back to an oversized `max_tokens` probe only when a live value is still missing.
- An adaptive tool-calling probe that tries multiple compatible request shapes and can retry truncated accepted responses with a larger `max_tokens` value.
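The first of these requests can be sketched as below. `sendChat` is an injected request function standing in for the HTTP call to the hosted chat-completions endpoint; it is an assumption, not the dashboard's real helper, used here so the timing logic is visible without network access:

```javascript
// Hypothetical sketch of the availability/latency probe. `sendChat` is an
// injected request function (an assumption, not the dashboard's real helper).
async function probeLatency(sendChat) {
  const t0 = Date.now();
  const res = await sendChat({
    messages: [{ role: "user", content: "ping" }],
    max_tokens: 8, // tiny completion: we only need proof of life plus timing
  });
  return { ok: Boolean(res.ok), latencyMs: Date.now() - t0 };
}
```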
Tool Support is intentionally three-state:
- blank: not tested yet
- true: tool calling was observed
- false: the probe completed and concluded either that tool fields are explicitly unsupported or that accepted requests still never emitted tool calls
If NVIDIA rate-limits a probe, the row shows Rate Limited instead of being cached as a normal failure. Inconclusive tool support probes stay blank so they can be retried later. Hover the Tool Support cell to inspect the saved reason summary for false or inconclusive rows.
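Put together, the three-state classification can be sketched as follows. The probe-result field names are illustrative assumptions; only the tri-state outcome mirrors the behavior described above:

```javascript
// Three-state Tool Support classifier (sketch). Field names are assumed:
// null = blank (retried later), true = tool call observed, false = concluded no.
function classifyToolSupport(probe) {
  if (probe.rateLimited || probe.inconclusive) return null; // stays blank
  if (probe.toolCallObserved) return true;
  if (probe.explicitlyUnsupported || probe.acceptedWithoutToolCalls) return false;
  return null; // anything else stays blank so the probe can be retried
}
```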
The right-click popover intentionally keeps only the hosted OpenAI-compatible cURL example. On 2026-04-14, https://integrate.api.nvidia.com/v1/messages returned 404, so the hosted endpoint used by this dashboard does not currently expose the Anthropic-compatible path that Claude Code requires.
The runtime reads the API key only from NVIDIA_API_KEY. It does not use .env.
Optional backend environment variables:
- `PORT` (default `4920`)
- `MAX_CONCURRENCY` (default `12`)
- `REQUEST_TIMEOUT_MS` (default `20000`)
- `CACHE_TTL_MS` (default `300000`)
- `PROBE_RATE_LIMIT_RPM` (default `36`)
- `PROBE_MIN_INTERVAL_MS` (default derived from `PROBE_RATE_LIMIT_RPM`)
- `PROBE_TIMEOUT_MS` (default `15000`)
- `TOOL_SUPPORT_TIMEOUT_MS` (default `25000`)
- `PROBE_MAX_429_RETRIES` (default `2`)
- `PROBE_429_BACKOFF_MS` (default `10000`)
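One plausible way `PROBE_MIN_INTERVAL_MS` could be derived from `PROBE_RATE_LIMIT_RPM` is an even split of one minute across the allowed requests; the backend's exact rounding is not documented here, so treat this as an assumption:

```javascript
// Plausible derivation of PROBE_MIN_INTERVAL_MS from PROBE_RATE_LIMIT_RPM.
// Assumes even spacing across one minute; the real formula may differ.
function deriveMinIntervalMs(rpm) {
  return Math.ceil(60000 / rpm);
}
```

At the default `PROBE_RATE_LIMIT_RPM` of 36, this spacing works out to roughly 1.7 s between probes.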
