Models
37 models
nv-mistralai/mistral-nemo-12b-instruct
— input
— output
NVIDIA: Nemotron 3 Nano 30B A3B (free)
— input
— output
NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model designed for high compute efficiency and accuracy, for developers building specialized agentic AI systems. The model is fully...
NVIDIA: Nemotron 3 Nano Omni (free)
— input
— output
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...
NVIDIA: Nemotron 3 Ultra (free)
— input
— output
NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
NVIDIA: Nemotron 3.5 Content Safety (free)
— input
— output
NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting...
NVIDIA: Nemotron Nano 12B 2 VL (free)
— input
— output
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...
NVIDIA: Nemotron Nano 9B V2 (free)
— input
— output
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...
nvidia/llama-3.1-nemoguard-8b-content-safety
— input
— output
nvidia/llama-3.1-nemoguard-8b-topic-control
— input
— output
nvidia/llama-3.1-nemotron-51b-instruct
— input
— output
nvidia/llama-3.1-nemotron-70b-instruct
— input
— output
nvidia/llama-3.1-nemotron-nano-8b-v1
— input
— output
nvidia/llama-3.1-nemotron-nano-vl-8b-v1
— input
— output
nvidia/llama-3.1-nemotron-safety-guard-8b-v3
— input
— output
nvidia/llama-3.1-nemotron-ultra-253b-v1
— input
— output
nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1
— input
— output
nvidia/llama-3.2-nv-embedqa-1b-v1
— input
— output
nvidia/llama-3.3-nemotron-super-49b-v1
— input
— output
nvidia/llama-3.3-nemotron-super-49b-v1.5
Max Output
16K
$0.400/M input
$0.400/M output
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...
nvidia/llama-nemotron-embed-1b-v2
— input
— output
nvidia/llama-nemotron-embed-vl-1b-v2
— input
— output
nvidia/llama3-chatqa-1.5-70b
— input
— output
nvidia/mistral-nemo-minitron-8b-8k-instruct
— input
— output
nvidia/nemotron-3-content-safety
— input
— output
nvidia/nemotron-3-nano-30b-a3b
Max Output
228K
$0.050/M input
$0.200/M output
NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model designed for high compute efficiency and accuracy, for developers building specialized agentic AI systems. The model is fully...
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning
— input
— output
nvidia/nemotron-3-ultra-550b-a55b
Max Output
16K
$0.500/M input
$2.20/M output
NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
nvidia/nemotron-3.5-content-safety
— input
— output
nvidia/nemotron-4-340b-instruct
— input
— output
nvidia/nemotron-4-340b-reward
— input
— output
nvidia/nemotron-content-safety-reasoning-4b
— input
— output
nvidia/nemotron-mini-4b-instruct
— input
— output
nvidia/nemotron-nano-12b-v2-vl
— input
— output
nvidia/nemotron-nano-3-30b-a3b
— input
— output
nvidia/nemotron-parse
— input
— output
nvidia/nv-embedqa-mistral-7b-v2
— input
— output
nvidia/nvidia-nemotron-nano-9b-v2
— input
— output
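Three entries in this listing carry per-million-token prices. As a rough illustration of how those rates translate into a per-request cost, here is a minimal sketch. The prices are copied from the listing. The `PRICES` table and the `estimate_cost` helper are hypothetical names introduced only for this example and are not part of any provider API. The sketch also assumes billing is strictly linear in token count, which the listing does not confirm.

```python
from decimal import Decimal

# Per-million-token prices in USD as (input, output), copied from the
# priced entries in the listing. Entries showing no price are omitted.
PRICES = {
    "nvidia/llama-3.3-nemotron-super-49b-v1.5": (Decimal("0.400"), Decimal("0.400")),
    "nvidia/nemotron-3-nano-30b-a3b": (Decimal("0.050"), Decimal("0.200")),
    "nvidia/nemotron-3-ultra-550b-a55b": (Decimal("0.500"), Decimal("2.20")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: USD cost of one request at the listed rates.

    Assumes linear per-token billing with no minimums or discounts.
    """
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens on the Ultra model.
    cost = estimate_cost("nvidia/nemotron-3-ultra-550b-a55b", 10_000, 2_000)
    print(f"${cost:.4f}")
```

Decimal arithmetic is used here so that small per-token amounts are not distorted by binary floating-point rounding.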