BazaarLink

AI Model Catalog

357 models — GPT-4o, Claude, Gemini, DeepSeek, Llama

- `Tongyi-MAI/Z-Image-Turbo` · text → image · $0.0055/image
- `thenlper/gte-base` · text → text · $0.01 in / free out
- `intfloat/e5-base-v2` · text → text · $0.01 in / free out
- `sentence-transformers/all-minilm-l6-v2` · text → text · $0.01 in / free out
- `sentence-transformers/paraphrase-minilm-l6-v2` · text → text · $0.01 in / free out
- `sentence-transformers/all-minilm-l12-v2` · text → text · $0.01 in / free out
- `sentence-transformers/multi-qa-mpnet-base-dot-v1` · text → text · $0.01 in / free out
- `baai/bge-base-en-v1.5` · text → text · $0.01 in / free out
- `sentence-transformers/all-mpnet-base-v2` · text → text · $0.01 in / free out
- `thenlper/gte-large` · text → text · $0.01 in / free out
- `intfloat/e5-large-v2` · text → text · $0.01 in / free out
- `intfloat/multilingual-e5-large` · text → text · $0.01 in / free out
- `baai/bge-large-en-v1.5` · text → text · $0.01 in / free out
- `baai/bge-m3` · text → text · $0.01 in / free out
- `qwen/qwen3-embedding-8b` · text → text · $0.01 in / free out
- `liquid/lfm-2.2-6b` · text → text · $0.01 in / $0.02 out
- `liquid/lfm2-8b-a1b` · text → text · $0.01 in / $0.02 out
- `ibm-granite/granite-4.0-h-micro` · text → text · $0.02 in / $0.11 out
- `openai/text-embedding-3-small` · text → embeddings · $0.02 in / free out · 8K context · Oct 2025
  text-embedding-3-small is OpenAI's improved, more performant version of the ada embedding model. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks.
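The "relatedness" the embedding descriptions refer to is usually computed as cosine similarity between embedding vectors. A minimal sketch — the vectors below are tiny made-up stand-ins, not real model output (text-embedding-3-small actually returns 1536-dimensional vectors):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors:
    # 1.0 = same direction (highly related), 0.0 = orthogonal (unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional stand-ins for real embeddings.
cat = [0.80, 0.10, 0.05, 0.05]
kitten = [0.75, 0.15, 0.05, 0.05]
invoice = [0.05, 0.10, 0.80, 0.05]

# Related texts score higher than unrelated ones.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice)
```

The same comparison works regardless of which embedding model in this catalog produced the vectors, as long as both vectors come from the same model.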
- `qwen/qwen3-embedding-4b` · text → text · $0.02 in / free out
- `meta-llama/llama-3.1-8b-instruct` · text → text · $0.02 in / $0.05 out
- `meta-llama/llama-3.2-3b-instruct` · text → text · $0.02 in / $0.02 out
- `meta-llama/llama-guard-3-8b` · text → text · $0.02 in / $0.06 out
- `mistralai/mistral-nemo` · text → text · $0.02 in / $0.04 out
- `google/gemma-3n-e4b-it` · text → text · $0.02 in / $0.04 out · 33K context · May 2025
  Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements. This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. [Read more in the blog post](https://developers.googleblog.com/en/introducing-gemma-3n/)
- `meta-llama/llama-3.2-1b-instruct` · text → text · $0.03 in / $0.20 out
- `perplexity/pplx-embed-v1-4b` · text → embeddings · $0.03 in / free out · 32K context · Mar 2026
  pplx-embed-v1-4B is one of Perplexity's state-of-the-art text embedding models built for real-world, web-scale retrieval. pplx-embed-v1 is optimized for standard dense text retrieval, with the 4B-parameter model maximizing retrieval quality.
- `google/gemma-2-9b-it` · text → text · $0.03 in / $0.09 out
- `meta-llama/llama-3-8b-instruct` · text → text · $0.03 in / $0.04 out
- `openai/gpt-oss-20b` · text → text · $0.03 in / $0.14 out
- `qwen/qwen2.5-coder-7b-instruct` · text → text · $0.03 in / $0.09 out
- `liquid/lfm-2-24b-a2b` · text → text · $0.03 in / $0.12 out · 33K context · Feb 2026
  LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per token, it delivers high-quality generation while maintaining low inference costs. The model fits within 32 GB of RAM, making it practical to run on consumer laptops and desktops without sacrificing capability.
- `amazon/nova-micro-v1` · text → text · $0.04 in / $0.14 out
- `cohere/command-r7b-12-2024` · text → text · $0.04 in / $0.15 out
- `openai/gpt-oss-120b:exacto` · text → text · $0.04 in / $0.19 out
- `openai/gpt-oss-120b` · text → text · $0.04 in / $0.19 out
- `google/gemma-3-12b-it` · text → text · $0.04 in / $0.13 out
- `google/gemma-3-27b-it` · text → text · $0.04 in / $0.15 out
- `google/gemma-3-4b-it` · text → text · $0.04 in / $0.08 out
- `nvidia/nemotron-nano-9b-v2` · text → text · $0.04 in / $0.16 out
- `qwen/qwen-2.5-7b-instruct` · text → text · $0.04 in / $0.10 out
- `sao10k/l3-lunaris-8b` · text → text · $0.04 in / $0.05 out
- `arcee-ai/trinity-mini` · text → text · $0.04 in / $0.15 out
- `meta-llama/llama-3.2-11b-vision-instruct` · text → text · $0.05 in / $0.05 out
- `mistralai/mistral-small-24b-instruct-2501` · text → text · $0.05 in / $0.08 out
- `nvidia/nemotron-3-nano-30b-a3b` · text → text · $0.05 in / $0.20 out
- `qwen/qwen-turbo` · text → text · $0.05 in / $0.20 out
- `qwen/qwen3-8b` · text → text · $0.05 in / $0.40 out
- `openai/gpt-5-nano` · text + image + file → text · $0.05 in / $0.40 out · 400K context · Aug 2025
  GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger counterparts, it retains key instruction-following and safety features. It is the successor to GPT-4.1-nano and offers a lightweight option for cost-sensitive or real-time applications.
- `qwen/qwen3-30b-a3b-thinking-2507` · text → text · $0.05 in / $0.34 out
- `mistralai/mistral-small-3.2-24b-instruct` · text → text · $0.06 in / $0.18 out
- `amazon/nova-lite-v1` · text → text · $0.06 in / $0.24 out
- `gryphe/mythomax-l2-13b` · text → text · $0.06 in / $0.06 out
- `qwen/qwen3-14b` · text → text · $0.06 in / $0.24 out
- `z-ai/glm-4.7-flash` · text → text · $0.06 in / $0.40 out
- `microsoft/phi-4` · text → text · $0.06 in / $0.14 out · 16K context · Jan 2025
  [Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. For more information, please see [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)
- `qwen/qwen3-coder-30b-a3b-instruct` · text → text · $0.07 in / $0.27 out
- `baidu/ernie-4.5-21b-a3b` · text → text · $0.07 in / $0.28 out
- `baidu/ernie-4.5-21b-a3b-thinking` · text → text · $0.07 in / $0.28 out
- `nvidia/nemotron-nano-12b-v2-vl` · text → text · $0.07 in / $0.20 out
- `qwen/qwen3-235b-a22b-2507` · text → text · $0.07 in / $0.10 out
- `google/gemini-2.0-flash-lite-001` · text → text · $0.07 in / $0.30 out
- `bytedance-seed/seed-1.6-flash` · text → text · $0.07 in / $0.30 out
- `openai/gpt-oss-safeguard-20b` · text → text · $0.07 in / $0.30 out
- `meta-llama/llama-4-scout` · text → text · $0.08 in / $0.30 out
- `qwen/qwen3-30b-a3b` · text → text · $0.08 in / $0.28 out
- `qwen/qwen3-32b` · text → text · $0.08 in / $0.24 out
- `qwen/qwen3-vl-8b-instruct` · text → text · $0.08 in / $0.50 out
- `alibaba/tongyi-deepresearch-30b-a3b` · text → text · $0.09 in / $0.45 out
- `neversleep/llama-3.1-lumimaid-8b` · text → text · $0.09 in / $0.60 out
- `qwen/qwen3-30b-a3b-instruct-2507` · text → text · $0.09 in / $0.30 out
- `qwen/qwen3-next-80b-a3b-instruct` · text → text · $0.09 in / $1.10 out
- `xiaomi/mimo-v2-flash` · text → text · $0.09 in / $0.29 out
- `allenai/olmo-3-7b-instruct` · text → text · $0.10 in / $0.20 out
- `bytedance/ui-tars-1.5-7b` · text → text · $0.10 in / $0.20 out
- `openai/text-embedding-ada-002` · text → embeddings · $0.10 in / free out · 8K context · Oct 2025
  text-embedding-ada-002 is OpenAI's legacy text embedding model.
- `qwen/qwen3.5-flash-02-23` · text + image + video → text · $0.10 in / $0.40 out · 1M context · Feb 2026
  The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.
- `mistralai/voxtral-small-24b-2507` · text → text · $0.10 in / $0.30 out
- `nvidia/nemotron-3-super-120b-a12b` · text → text · $0.10 in / $0.50 out · 262K context · Mar 2026
  NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.
- `mistralai/mistral-small-creative` · text → text · $0.10 in / $0.30 out
- `nvidia/llama-3.3-nemotron-super-49b-v1.5` · text → text · $0.10 in / $0.40 out
- `mistralai/mistral-embed-2312` · text → text · $0.10 in / free out
- `stepfun/step-3.5-flash` · text → text · $0.10 in / $0.30 out
- `z-ai/glm-4-32b` · text → text · $0.10 in / $0.10 out
- `google/gemini-2.0-flash-001` · text → text · $0.10 in / $0.40 out
- `google/gemini-2.5-flash-lite-preview-09-2025` · text → text · $0.10 in / $0.40 out
- `google/gemini-2.5-flash-lite` · text → text · $0.10 in / $0.40 out
- `meta-llama/llama-3.3-70b-instruct` · text → text · $0.10 in / $0.32 out
- `mistralai/ministral-3b-2512` · text → text · $0.10 in / $0.10 out
- `reka/reka-edge` · image + text + video → text · $0.10 in / $0.10 out · 16K context · Mar 2026
  Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use.
- `openai/gpt-4.1-nano` · image + text + file → text · $0.10 in / $0.40 out · 1M context · Apr 2025
  For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It's ideal for tasks like classification or autocompletion.
- `bytedance-seed/seed-2.0-mini` · text + image + video → text · $0.10 in / $0.40 out · 262K context · Feb 2026
  Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.
- `rekaai/reka-flash-3` · text → text · $0.10 in / $0.20 out · 66K context · Mar 2025
  Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a 32K context length and optimized through reinforcement learning (RLOO), it provides competitive performance comparable to proprietary models within a smaller parameter footprint. Ideal for low-latency, local, or on-device deployments, Reka Flash 3 is compact, supports efficient quantization (down to 11GB at 4-bit precision), and employs explicit reasoning tags ("<reasoning>") to indicate its internal thought process. Reka Flash 3 is primarily an English model with limited multilingual understanding capabilities. The model weights are released under the Apache 2.0 license.
- `qwen/qwen3.5-9b` · text + image + video → text · $0.10 in / $0.15 out · 262K context · Mar 2026
  Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.
- `rekaai/reka-edge` · image + text + video → text · $0.10 in / $0.10 out · 16K context
- `mistralai/devstral-small` · text → text · $0.10 in / $0.30 out · 131K context · Jul 2025
  Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k token context window and supports both Mistral-style function calling and XML output formats. Designed for agentic coding workflows, Devstral Small 1.1 is optimized for tasks such as codebase exploration, multi-file edits, and integration into autonomous development agents like OpenHands and Cline. It achieves 53.6% on SWE-Bench Verified, surpassing all other open models on this benchmark, while remaining lightweight enough to run on a single 4090 GPU or Apple silicon machine. The model uses a Tekken tokenizer with a 131k vocabulary and is deployable via vLLM, Transformers, Ollama, LM Studio, and other OpenAI-compatible runtimes.
- `qwen/qwen3-vl-32b-instruct` · text → text · $0.10 in / $0.42 out
- `mistralai/mistral-7b-instruct-v0.1` · text → text · $0.11 in / $0.19 out
- `qwen/qwen3-vl-8b-thinking` · text → text · $0.12 in / $1.36 out
- `allenai/olmo-3-7b-think` · text → text · $0.12 in / $0.20 out
- `qwen/qwen-2.5-72b-instruct` · text → text · $0.12 in / $0.39 out
- `qwen/qwen3-coder-next` · text → text · $0.12 in / $0.75 out
- `openai/text-embedding-3-large` · text → embeddings · $0.13 in / free out · 8K context · Oct 2025
  text-embedding-3-large is OpenAI's most capable embedding model for both English and non-English tasks. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks.
- `qwen/qwen3-vl-30b-a3b-instruct` · text → text · $0.13 in / $0.52 out
- `nousresearch/hermes-4-70b` · text → text · $0.13 in / $0.40 out
- `qwen/qwen3-vl-30b-a3b-thinking` · text → text · $0.13 in / $1.56 out
- `z-ai/glm-4.5-air` · text → text · $0.13 in / $0.85 out
- `google/gemma-4-26b-a4b-it` · image + text + video → text · $0.13 in / $0.40 out · 262K context
  Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at a fraction of the compute cost. Supports multimodal input including text, images, and video (up to 60s at 1fps). Features a 256K token context window, native function calling, configurable thinking/reasoning mode, and structured output support. Released under Apache 2.0.
- `baidu/ernie-4.5-vl-28b-a3b` · text → text · $0.14 in / $0.56 out
- `nousresearch/hermes-2-pro-llama-3-8b` · text → text · $0.14 in / $0.14 out
- `tencent/hunyuan-a13b-instruct` · text → text · $0.14 in / $0.57 out
- `google/gemma-4-31b-it` · image + text + video → text · $0.14 in / $0.40 out · 262K context
  Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. Strong on coding, reasoning, and document understanding tasks. Apache 2.0 license.
- `qwen/qwen3-235b-a22b-thinking-2507` · text → text · $0.15 in / $1.50 out
- `allenai/olmo-3.1-32b-think` · text → text · $0.15 in / $0.50 out
- `upstage/solar-pro-3` · text → text · $0.15 in / $0.60 out · 128K context · Jan 2026
  Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized for Korean with English and Japanese support.
- `openai/gpt-4o-mini-2024-07-18` · text → text · $0.15 in / $0.60 out
- `openai/gpt-4o-mini` · text → text · $0.15 in / $0.60 out
- `openai/gpt-4o-mini-search-preview` · text → text · $0.15 in / $0.60 out
- `google/gemini-embedding-001` · text → text · $0.15 in / free out
- `mistralai/codestral-embed-2505` · text → text · $0.15 in / free out
- `essentialai/rnj-1-instruct` · text → text · $0.15 in / $0.15 out
- `allenai/olmo-3-32b-think` · text → text · $0.15 in / $0.50 out
- `meta-llama/llama-4-maverick` · text → text · $0.15 in / $0.60 out
- `mistralai/ministral-8b-2512` · text → text · $0.15 in / $0.15 out
- `cohere/command-r-08-2024` · text → text · $0.15 in / $0.60 out
- `deepseek/deepseek-chat-v3.1` · text → text · $0.15 in / $0.75 out
- `qwen/qwen3-next-80b-a3b-thinking` · text → text · $0.15 in / $1.20 out
- `qwen/qwq-32b` · text → text · $0.15 in / $0.40 out
- `mistralai/mistral-small-2603` · text + image → text · $0.15 in / $0.60 out · 262K context · Mar 2026
  Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from Magistral, multimodal understanding from Pixtral, and agentic coding capabilities from Devstral, enabling one model to handle complex analysis, software development, and visual tasks within the same workflow.
- `qwen/qwen3.5-35b-a3b` · text + image + video → text · $0.16 in / $1.30 out · 262K context · Feb 2026
  The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.
- `thedrummer/rocinante-12b` · text → text · $0.17 in / $0.43 out
- `meta-llama/llama-guard-4-12b` · text → text · $0.18 in / $0.18 out
- `deepseek/deepseek-chat-v3-0324` · text → text · $0.19 in / $0.87 out
- `qwen/qwen3.5-27b` · text + image + video → text · $0.20 in / $1.56 out · 262K context · Feb 2026
  The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
- `mistralai/mistral-7b-instruct-v0.2` · text → text · $0.20 in / $0.20 out
- `openai/gpt-5.4-nano` · file + image + text → text · $0.20 in / $1.25 out · 400K context · Mar 2026
  GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency use cases such as classification, data extraction, ranking, and sub-agent execution. The model prioritizes responsiveness and efficiency over deep reasoning, making it ideal for pipelines that require fast, reliable outputs at scale. GPT-5.4 nano is well suited for background tasks, real-time systems, and distributed agent architectures where minimizing cost and latency is essential.
- `meituan/longcat-flash-chat` · text → text · $0.20 in / $0.80 out
- `allenai/molmo-2-8b` · text → text · $0.20 in / $0.20 out
- `allenai/olmo-3.1-32b-instruct` · text → text · $0.20 in / $0.60 out
- `meta-llama/llama-guard-2-8b` · text → text · $0.20 in / $0.20 out
- `minimax/minimax-01` · text → text · $0.20 in / $1.10 out
- `mistralai/ministral-14b-2512` · text → text · $0.20 in / $0.20 out
- `mistralai/mistral-7b-instruct` · text → text · $0.20 in / $0.20 out
- `mistralai/mistral-7b-instruct-v0.3` · text → text · $0.20 in / $0.20 out
- `mistralai/mistral-saba` · text → text · $0.20 in / $0.60 out
- `prime-intellect/intellect-3` · text → text · $0.20 in / $1.10 out
- `qwen/qwen-2.5-vl-7b-instruct` · text → text · $0.20 in / $0.20 out
- `qwen/qwen-2.5-coder-32b-instruct` · text → text · $0.20 in / $0.20 out
- `qwen/qwen2.5-vl-32b-instruct` · text → text · $0.20 in / $0.60 out
- `qwen/qwen3-vl-235b-a22b-instruct` · text → text · $0.20 in / $0.88 out
- `x-ai/grok-4-fast` · text → text · $0.20 in / $0.50 out
- `x-ai/grok-4.1-fast` · text → text · $0.20 in / $0.50 out
- `x-ai/grok-code-fast-1` · text → text · $0.20 in / $1.50 out
- `kwaipilot/kat-coder-pro` · text → text · $0.21 in / $0.83 out
- `deepseek/deepseek-v3.1-terminus:exacto` · text → text · $0.21 in / $0.79 out
- `deepseek/deepseek-v3.1-terminus` · text → text · $0.21 in / $0.79 out
- `qwen/qwen-vl-plus` · text → text · $0.21 in / $0.63 out
- `qwen/qwen3-coder` · text → text · $0.22 in / $1.00 out
- `qwen/qwen3-coder:exacto` · text → text · $0.22 in / $1.80 out
- `arcee-ai/trinity-large-thinking` · text → text · $0.22 in / $0.85 out · 262K context
  Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance on PinchBench, agentic workloads, and reasoning tasks. It is free in OpenClaw for the first five days. Launch video: https://youtu.be/Gc82AXLa0Rg?si=4RLn6WBz33qT--B7
- `google/gemini-3.1-flash-lite-preview` · text + image + video + file + audio → text · $0.25 in / $1.50 out · 1M context · Mar 2026
  Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.
- `openai/gpt-5.1-codex-mini` · image + text → text · $0.25 in / $2.00 out · 400K context · Nov 2025
  GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex.
- `openai/gpt-5-mini` · text + image + file → text · $0.25 in / $2.00 out · 400K context · Aug 2025
  GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost. GPT-5 Mini is the successor to OpenAI's o4-mini model.
- `inception/mercury` · text → text · $0.25 in / $1.00 out
- `inception/mercury-coder` · text → text · $0.25 in / $1.00 out
- `bytedance-seed/seed-1.6` · text → text · $0.25 in / $2.00 out
- `anthropic/claude-3-haiku` · text → text · $0.25 in / $1.25 out
- `inception/mercury-2` · text → text · $0.25 in / $0.75 out · 128K context · Mar 2026
  Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving >1,000 tokens/sec on standard GPUs. Mercury 2 is 5x+ faster than leading speed-optimized LLMs like Claude 4.5 Haiku and GPT 5 Mini, at a fraction of the cost. Mercury 2 supports tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output. Built for coding workflows where latency compounds, real-time voice/search, and agent loops. OpenAI API compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2).
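"Schema-aligned JSON output" is the structured-output pattern used by OpenAI-compatible APIs: the request carries a JSON Schema and the model's reply must conform to it. A sketch of what such a request body might look like — the `response_format` shape here follows the OpenAI convention and is an assumption, not Inception documentation, and the schema itself is hypothetical (no request is sent):

```python
import json

# Hypothetical schema: ask the model for a structured code-review verdict.
review_schema = {
    "name": "code_review",
    "schema": {
        "type": "object",
        "properties": {
            "verdict": {"type": "string", "enum": ["approve", "request_changes"]},
            "comments": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["verdict", "comments"],
    },
}

# Assemble an OpenAI-style chat-completions payload; sending it is left
# to whatever HTTP client the caller prefers.
payload = {
    "model": "inception/mercury-2",
    "messages": [{"role": "user", "content": "Review this diff: ..."}],
    "response_format": {"type": "json_schema", "json_schema": review_schema},
}

body = json.dumps(payload)
```

Because the reply is constrained to the schema, the caller can `json.loads` it and read `verdict` directly instead of parsing free-form prose.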
- `bytedance-seed/seed-2.0-lite` · text + image + video → text · $0.25 in / $2.00 out · 262K context · Mar 2026
  Seed-2.0-Lite is a balanced model designed for high-frequency enterprise workloads, optimizing for both capability and cost. Its overall performance surpasses the previous-generation Seed-1.8. It is well-suited for production tasks such as unstructured information processing, text content creation, search and recommendation, and data analysis. The model supports long-context processing, multi-source information fusion, multi-step instruction execution, and high-fidelity structured outputs—delivering stable quality while significantly reducing cost.
- `minimax/minimax-m2` · text → text · $0.26 in / $1.00 out
- `qwen/qwen3.5-122b-a10b` · text + image + video → text · $0.26 in / $2.08 out · 262K context · Feb 2026
  The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.
- `qwen/qwen3-vl-235b-a22b-thinking` · text → text · $0.26 in / $2.60 out
- `deepseek/deepseek-v3.2` · text → text · $0.26 in / $0.38 out
- `deepseek/deepseek-v3.2-exp` · text → text · $0.27 in / $0.41 out
- `minimax/minimax-m2.1` · text → text · $0.27 in / $0.95 out
- `nex-agi/deepseek-v3.1-nex-n1` · text → text · $0.27 in / $1.00 out
- `baidu/ernie-4.5-300b-a47b` · text → text · $0.28 in / $1.10 out
- `deepseek/deepseek-r1-distill-qwen-32b` · text → text · $0.29 in / $0.29 out
- `minimax/minimax-m2.7` · text → text · $0.30 in / $1.20 out · 205K context · Mar 2026
  MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent collaboration, enabling it to plan, execute, and refine complex tasks across dynamic environments. Trained for production-grade performance, M2.7 handles workflows such as live debugging, root cause analysis, financial modeling, and full document generation across Word, Excel, and PowerPoint. It delivers strong results on benchmarks including 56.2% on SWE-Pro and 57.0% on Terminal Bench 2, while achieving a 1495 ELO on GDPval-AA, setting a new standard for multi-agent systems operating in real-world digital workflows.
- `minimax/minimax-m2.5` · text → text · $0.30 in / $1.10 out
- `thedrummer/cydonia-24b-v4.1` · text → text · $0.30 in / $0.50 out
- `x-ai/grok-3-mini-beta` · text → text · $0.30 in / $0.50 out
- `x-ai/grok-3-mini` · text → text · $0.30 in / $0.50 out
- `kwaipilot/kat-coder-pro-v2` · text → text · $0.30 in / $1.20 out · 256K context · Mar 2026
  KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT's KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions, with a focus on large-scale production environments, multi-system coordination, and seamless integration across modern software stacks, while also supporting web aesthetics generation to produce production-grade landing pages and presentation decks.
- `google/gemini-2.5-flash` · text → text · $0.30 in / $2.50 out
- `minimax/minimax-m2-her` · text → text · $0.30 in / $1.20 out
- `mistralai/codestral-2508` · text → text · $0.30 in / $0.90 out
- `amazon/nova-2-lite-v1` · text → text · $0.30 in / $2.50 out
- `nousresearch/hermes-3-llama-3.1-70b` · text → text · $0.30 in / $0.30 out
- `z-ai/glm-4.6v` · text → text · $0.30 in / $0.90 out
- `qwen/qwen3-coder-flash` · text → text · $0.30 in / $1.50 out
- `deepseek/deepseek-chat` · text → text · $0.32 in / $0.89 out
- `qwen/qwen3.6-plus` · text + image + video → text · $0.33 in / $1.95 out · 1M context
  Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...
- `mistralai/mistral-small-3.1-24b-instruct` · text → text · $0.35 in / $0.56 out
- `z-ai/glm-4.6` · text → text · $0.35 in / $1.71 out
- `z-ai/glm-4.7` · text → text · $0.38 in / $1.70 out
- `xiaomi/mimo-v2-omni` · text + audio + image + video → text · $0.40 in / $2.00 out · 262K context · Mar 2026
  MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities. 256K context window.
- `moonshotai/kimi-k2-0905` · text → text · $0.40 in / $2.00 out
- `qwen/qwen-plus-2025-07-28` · text → text · $0.40 in / $1.20 out
- `qwen/qwen3.5-plus-02-15` · text → text · $0.40 in / $2.40 out
- `deepseek/deepseek-r1-0528` · text → text · $0.40 in / $1.75 out
- `deepseek/deepseek-v3.2-speciale` · text → text · $0.40 in / $1.20 out
- `meta-llama/llama-3.1-70b-instruct` · text → text · $0.40 in / $0.40 out
- `minimax/minimax-m1` · text → text · $0.40 in / $2.20 out
- `mistralai/devstral-2512` · text → text · $0.40 in / $2.00 out
- `mistralai/devstral-medium` · text → text · $0.40 in / $2.00 out
- `mistralai/mistral-medium-3` · text → text · $0.40 in / $2.00 out
- `qwen/qwen-plus` · text → text · $0.40 in / $1.20 out
- `qwen/qwen-plus-2025-07-28:thinking` · text → text · $0.40 in / $1.20 out
- `thedrummer/unslopnemo-12b` · text → text · $0.40 in / $0.40 out
- `mistralai/mistral-medium-3.1` · text + image → text · $0.40 in / $2.00 out · 131K context · Aug 2025
  Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.
- `openai/gpt-4.1-mini` · image + text + file → text · $0.40 in / $1.60 out · 1M context · Apr 2025
  GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider's polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints.
- `baidu/ernie-4.5-vl-424b-a47b` · text → text · $0.42 in / $1.25 out
- `z-ai/glm-4.6:exacto` · text → text · $0.44 in / $1.76 out
- `moonshotai/kimi-k2.5` · text → text · $0.45 in / $2.20 out
- `undi95/remm-slerp-l2-13b` · text → text · $0.45 in / $0.65 out
- `qwen/qwen3-235b-a22b` · text → text · $0.46 in / $1.82 out
- `moonshotai/kimi-k2-thinking` · text → text · $0.47 in / $2.00 out
- `moonshotai/kimi-k2` · text → text · $0.50 in / $2.40 out
- `google/gemini-3-flash-preview` · text → text · $0.50 in / $3.00 out
- `mistralai/mistral-large-2512` · text → text · $0.50 in / $1.50 out
- `openai/gpt-3.5-turbo` · text → text · $0.50 in / $1.50 out
- `meta-llama/llama-3-70b-instruct` · text → text · $0.51 in / $0.74 out
- `mistralai/mixtral-8x7b-instruct` · text → text · $0.54 in / $0.54 out
- `qwen/qwen3.5-397b-a17b` · text → text · $0.55 in / $3.50 out
- `thedrummer/skyfall-36b-v2` · text → text · $0.55 in / $0.80 out
- `z-ai/glm-4.5` · text → text · $0.55 in / $2.00 out
- `moonshotai/kimi-k2-0905:exacto` · text → text · $0.60 in / $2.50 out
- `nvidia/llama-3.1-nemotron-ultra-253b-v1` · text → text · $0.60 in / $1.80 out
- `writer/palmyra-x5` · text → text · $0.60 in / $6.00 out
- `z-ai/glm-4.5v` · text → text · $0.60 in / $1.80 out
- `microsoft/wizardlm-2-8x22b` · text → text · $0.62 in / $0.62 out
- `google/gemma-2-27b-it` · text → text · $0.65 in / $0.65 out
- `sao10k/l3.3-euryale-70b` · text → text · $0.65 in / $0.75 out
- `sao10k/l3.1-euryale-70b` · text → text · $0.65 in / $0.75 out
- `deepseek/deepseek-r1` · text → text · $0.70 in / $2.50 out
- `deepseek/deepseek-r1-distill-llama-70b` · text → text · $0.70 in / $0.80 out
- `aion-labs/aion-1.0-mini` · text → text · $0.70 in / $1.40 out · 131K context · Feb 2025
  Aion-1.0-Mini is a 32B-parameter distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview), independently replicated for verification.
- `openai/gpt-5.4-mini` · file + image + text → text · $0.75 in / $4.50 out · 400K context · Mar 2026
  GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding, and tool use, while reducing latency and cost for large-scale deployments. The model is designed for production environments that require a balance of capability and efficiency, making it well suited for chat applications, coding assistants, and agent workflows that operate at scale. GPT-5.4 mini delivers reliable instruction following, solid multi-step reasoning, and consistent performance across diverse tasks with improved cost efficiency.
- `mancer/weaver` · text → text · $0.75 in / $1.00 out
- `morph/morph-v3-fast` · text → text · $0.80 in / $1.20 out
- `qwen/qwen2.5-vl-72b-instruct` · text → text · $0.80 in / $0.80 out
- `eleutherai/llemma_7b` · text → text · $0.80 in / $1.20 out
- `aion-labs/aion-rp-llama-3.1-8b` · text → text · $0.80 in / $1.60 out · 33K context · Feb 2025
  Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other's responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing.
- `alfredpros/codellama-7b-instruct-solidity` · text → text · $0.80 in / $1.20 out
- `amazon/nova-pro-v1` · text → text · $0.80 in / $3.20 out
- `anthropic/claude-3.5-haiku` · text → text · $0.80 in / $4.00 out
- `qwen/qwen-vl-max` · text → text · $0.80 in / $3.20 out
- `aion-labs/aion-2.0` · text → text · $0.80 in / $1.60 out · 131K context · Feb 2026
  Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth.
- `switchpoint/router` · text → text · $0.85 in / $3.40 out
- `morph/morph-v3-large` · text → text · $0.90 in / $1.90 out
- `z-ai/glm-5` · text → text · $0.95 in / $2.55 out
- `z-ai/glm-5-turbo` · text → text · $0.96 in / $3.20 out · 203K context · Mar 2026
  GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows involving long execution chains, with improved complex instruction decomposition, tool use, scheduled and persistent execution, and overall stability across extended tasks.
- `neversleep/noromaid-20b` · text → text · $1.00 in / $1.75 out
- `xiaomi/mimo-v2-pro` · text → text · $1.00 in / $3.00 out · 1M context · Mar 2026
  MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like OpenClaw. It ranks among the global top tier in the standard PinchBench and ClawBench benchmarks, with perceived performance approaching that of Opus 4.6. MiMo-V2-Pro is designed to serve as the brain of agent systems, orchestrating complex workflows, driving production engineering tasks, and delivering results reliably.
- `anthropic/claude-haiku-4.5` · text → text · $1.00 in / $5.00 out
- `nousresearch/hermes-3-llama-3.1-405b` · text → text · $1.00 in / $1.00 out
- `nousresearch/hermes-4-405b` · text → text · $1.00 in / $3.00 out
- `openai/gpt-3.5-turbo-0613` · text → text · $1.00 in / $2.00 out
- `perplexity/sonar` · text → text · $1.00 in / $1.00 out
- `qwen/qwen3-coder-plus` · text → text · $1.00 in / $5.00 out
- `relace/relace-search` · text → text · $1.00 in / $3.00 out
- `openai/o3-mini-high` · text + file → text · $1.10 in / $4.40 out · 200K context · Feb 2025
  OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities. The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.
- `openai/o3-mini`
  OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. This model supports the `reasoning_effort` parameter, which can be set to "high", "medium", or "low" to control the thinking time of the model. The default is "medium". OpenRouter also offers the model slug `openai/o3-mini-high` to default the parameter to "high". The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities. The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.
textfiletext
$1.10$4.40200KJan 2025
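The o3-mini entries above describe an adjustable `reasoning_effort` parameter ("low", "medium", or "high", defaulting to "medium"). As a minimal sketch of how a request might set it — assuming an OpenAI-compatible chat completions payload in which `reasoning_effort` is a top-level field; the helper name and prompt are illustrative:

```python
import json

def build_o3_mini_request(prompt: str, effort: str = "medium") -> dict:
    # Assumption: an OpenAI-compatible chat completions payload where
    # reasoning_effort is a top-level field taking "low" | "medium" | "high".
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unsupported reasoning_effort: {effort!r}")
    return {
        "model": "openai/o3-mini",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

# Per the listing, the openai/o3-mini-high slug is equivalent to
# pinning effort to "high" on openai/o3-mini.
payload = build_o3_mini_request("Outline a proof that sqrt(2) is irrational.", effort="high")
print(json.dumps(payload, indent=2))
```

The same payload shape would be POSTed to the provider's chat completions endpoint; only the extra field changes between effort levels.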
openai/o4-mini
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute.
imagetextfiletext
$1.10$4.40200KApr 2025
openai/o4-mini-high
OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high. OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute.
imagetextfiletext
$1.10$4.40200KApr 2025
nvidia/llama-3.1-nemotron-70b-instruct
texttext
$1.20$1.20
qwen/qwen3-max
texttext
$1.20$6.00
qwen/qwen3-max-thinking
texttext
$1.20$6.00
Z
z-ai/glm-5v-turbo
GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding, and task execution, and works seamlessly with agents to complete the full loop of “perceive → plan → execute”.
imagetextvideotext
$1.20$4.00203K
openai/gpt-5.3-codex
GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work. Beyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction.
textimagetext
$1.75$1.22$14.00$9.80400KFeb 2026
google/gemini-2.5-pro-preview-05-06
texttext
$1.25$10.00
openai/gpt-5
GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.
textimagefiletext
$1.25$10.00400KAug 2025
openai/gpt-5-chat
GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.
fileimagetexttext
$1.25$10.00128KAug 2025
openai/gpt-5.1
GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems. Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5.
imagetextfiletext
$1.25$10.00400KNov 2025
openai/gpt-5.1-chat
GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
fileimagetexttext
$1.25$10.00128KNov 2025
openai/gpt-5-codex
GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level). Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.
textimagetext
$1.25$10.00400KSep 2025
openai/gpt-5.1-codex
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level). Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.
textimagetext
$1.25$10.00400KNov 2025
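The Codex entries note that reasoning effort is adjusted with the `reasoning.effort` parameter, i.e. nested under a `reasoning` object rather than a flat field. A sketch under that assumption — model slug taken from the listing, prompt illustrative:

```python
import json

def build_codex_request(prompt: str, effort: str = "medium") -> dict:
    # Assumption: an OpenRouter-style payload where effort lives under a
    # nested "reasoning" object, matching the `reasoning.effort` parameter
    # named in the Codex listings.
    return {
        "model": "openai/gpt-5.1-codex",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": effort},
    }

payload = build_codex_request("Refactor this module to remove global state.", effort="high")
print(json.dumps(payload, indent=2))
```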
google/gemini-2.5-pro
texttext
$1.25$10.00
google/gemini-2.5-pro-preview
texttext
$1.25$10.00
D
deepcogito/cogito-v2.1-671b
texttext
$1.25$1.25
openai/gpt-5.1-codex-max
GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version of the 5.1 reasoning stack and trained on agentic workflows spanning software engineering, mathematics, and research. GPT-5.1-Codex-Max delivers faster performance, improved reasoning, and higher token efficiency across the development lifecycle.
textimagetext
$1.25$10.00400KDec 2025
Z
z-ai/glm-5.1
GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...
texttext
$1.26$3.96203K
S
sao10k/l3-euryale-70b
texttext
$1.48$1.48
openai/gpt-3.5-turbo-instruct
texttext
$1.50$2.00
qwen/qwen-max
texttext
$1.60$6.40
openai/gpt-5.2-codex
GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1-Codex, 5.2-Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level). Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.
textimagetext
$1.75$14.00400KJan 2026
openai/gpt-5.2-chat
GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
fileimagetexttext
$1.75$14.00128KDec 2025
openai/gpt-5.2
GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, science, and tool calling workloads, with more coherent long-form answers and improved tool-use reliability.
fileimagetexttext
$1.75$14.00400KDec 2025
openai/gpt-5.3-chat
GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly reduces unnecessary refusals, caveats, and overly cautious phrasing that can interrupt conversational flow.
textimagefiletext
$1.75$14.00128KMar 2026
google/gemini-3.1-pro-preview
texttext
$2.00$12.00
A
ai21/jamba-large-1.7
Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions.
texttext
$2.00$8.00256KAug 2025
google/gemini-3.1-pro-preview-customtools
Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows. It retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance.
textaudioimagevideofiletext
$2.00$12.001MFeb 2026
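The Custom Tools variant above is described as preferring efficient user-defined functions over a generic bash tool. A minimal sketch of registering such a function, assuming the widely used OpenAI-compatible `tools` JSON-schema convention; `get_repo_stats` is a hypothetical tool name invented for illustration:

```python
import json

# Hypothetical user-defined tool; the schema below follows the common
# OpenAI-compatible function-calling "tools" format, which is an
# assumption about this catalog's API rather than documented behavior.
repo_stats_tool = {
    "type": "function",
    "function": {
        "name": "get_repo_stats",
        "description": "Return commit and contributor counts for a repository.",
        "parameters": {
            "type": "object",
            "properties": {"repo": {"type": "string"}},
            "required": ["repo"],
        },
    },
}

payload = {
    "model": "google/gemini-3.1-pro-preview-customtools",
    "messages": [{"role": "user", "content": "How active is the repo acme/widgets?"}],
    "tools": [repo_stats_tool],
}
print(json.dumps(payload, indent=2))
```

With a dedicated function like this registered, the variant's stated purpose is to call it directly instead of shelling out to a general bash tool.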
openai/gpt-4.1
GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.
imagetextfiletext
$2.00$8.001MApr 2025
mistralai/mixtral-8x22b-instruct
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:
- strong math, coding, and reasoning
- large context length (64k)
- fluency in English, French, Italian, German, and Spanish

See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/). #moe
texttext
$2.00$6.0066KApr 2024
mistralai/pixtral-large-2411
Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.
textimagetext
$2.00$6.00131KNov 2024
perplexity/sonar-deep-research
texttext
$2.00$8.00
google/gemini-3-pro-preview
texttext
$2.00$12.00
mistralai/mistral-large
texttext
$2.00$6.00
mistralai/mistral-large-2407
texttext
$2.00$6.00
mistralai/mistral-large-2411
texttext
$2.00$6.00
perplexity/sonar-reasoning-pro
texttext
$2.00$8.00
openai/o3
o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images.
imagetextfiletext
$2.00$8.00200KApr 2025
openai/o4-mini-deep-research
o4-mini-deep-research is OpenAI's faster, more affordable deep research model—ideal for tackling complex, multi-step research tasks. Note: This model always uses the 'web_search' tool which adds additional cost.
fileimagetexttext
$2.00$8.00200KOct 2025
x-ai/grok-4.20
Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently precise and truthful responses. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)
textimagetext
$2.00$6.002MMar 2026
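The Grok 4.20 entries note that reasoning can be toggled with the `reasoning` `enabled` parameter. A sketch, assuming the same nested `reasoning` object convention used by the doc links elsewhere in this catalog; the helper and prompt are illustrative:

```python
import json

def build_grok_request(prompt: str, reasoning_enabled: bool = True) -> dict:
    # Assumption: reasoning is toggled via {"reasoning": {"enabled": ...}},
    # per the `reasoning` `enabled` parameter named in the listing.
    return {
        "model": "x-ai/grok-4.20",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"enabled": reasoning_enabled},
    }

# Disable reasoning for a quick, low-latency completion.
payload = build_grok_request("Draft a one-line commit message.", reasoning_enabled=False)
print(json.dumps(payload, indent=2))
```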
x-ai/grok-4.20-multi-agent
Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks. Reasoning effort behavior:
- low / medium: 4 agents
- high / xhigh: 16 agents
textimagefiletext
$2.00$6.002MMar 2026
x-ai/grok-4.20-beta
Grok 4.20 Beta is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently precise and truthful responses. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)
textimagetext
$2.00$6.002MMar 2026
x-ai/grok-4.20-multi-agent-beta
Grok 4.20 Multi-Agent Beta is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks. Reasoning effort behavior:
- low / medium: 4 agents
- high / xhigh: 16 agents
textimagetext
$2.00$6.002MMar 2026
openai/gpt-4o-search-preview
texttext
$2.50$10.00
openai/gpt-4o
texttext
$2.50$10.00
I
inflection/inflection-3-productivity
texttext
$2.50$10.00
I
inflection/inflection-3-pi
texttext
$2.50$10.00
a
amazon/nova-premier-v1
texttext
$2.50$12.50
cohere/command-a
texttext
$2.50$10.00
cohere/command-r-plus-08-2024
texttext
$2.50$10.00
openai/gpt-4o-2024-11-20
texttext
$2.50$10.00
openai/gpt-4o-2024-08-06
texttext
$2.50$10.00
openai/gpt-5.4
GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling high-context reasoning, coding, and multimodal analysis within the same workflow. The model delivers improved performance in coding, document understanding, tool use, and instruction following. It is designed as a strong default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with fewer iterations and greater token efficiency.
textimagefiletext
$2.50$15.001MMar 2026
anthropic/claude-sonnet-4.6
Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.
textimagetext
$3.00$15.001MFeb 2026
A
anthracite-org/magnum-v4-72b
texttext
$3.00$5.00
anthropic/claude-3.7-sonnet
texttext
$3.00$15.00
anthropic/claude-sonnet-4.5
texttext
$3.00$15.00
anthropic/claude-sonnet-4
texttext
$3.00$15.00
openai/gpt-3.5-turbo-16k
texttext
$3.00$4.00
perplexity/sonar-pro-search
texttext
$3.00$15.00
perplexity/sonar-pro
texttext
$3.00$15.00
S
sao10k/l3.1-70b-hanami-x1
texttext
$3.00$3.00
x-ai/grok-3
texttext
$3.00$15.00
x-ai/grok-3-beta
texttext
$3.00$15.00
x-ai/grok-4
texttext
$3.00$15.00
anthropic/claude-3.7-sonnet:thinking
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks. Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet)
textimagefiletext
$3.00$15.00200KFeb 2025
A
alpindale/goliath-120b
texttext
$3.75$7.50
meta-llama/llama-3.1-405b-instruct
texttext
$4.00$4.00
meta-llama/llama-3.1-405b
texttext
$4.00$4.00
A
aion-labs/aion-1.0
Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model.
texttext
$4.00$8.00131KFeb 2025
R
raifle/sorcererlm-8x22b
texttext
$4.50$4.50
anthropic/claude-opus-4.6
texttext
$5.00$25.00
openai/gpt-4o-2024-05-13
texttext
$5.00$15.00
anthropic/claude-opus-4.5
texttext
$5.00$25.00
anthropic/claude-opus-4.7
Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on...
textimagetext
$5.00$25.001M
anthropic/claude-3.5-sonnet
texttext
$6.00$30.00
openai/gpt-4o:extended
texttext
$6.00$18.00
openai/gpt-4-turbo
texttext
$10.00$30.00
openai/gpt-4-turbo-preview
texttext
$10.00$30.00
openai/gpt-4-1106-preview
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to April 2023.
texttext
$10.00$30.00128KNov 2023
openai/o3-deep-research
o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks. Note: This model always uses the 'web_search' tool which adds additional cost.
imagetextfiletext
$10.00$40.00200KOct 2025
anthropic/claude-opus-4
texttext
$15.00$75.00
openai/gpt-5-pro
GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.
imagetextfiletext
$15.00$120.00400KOct 2025
openai/o1
The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).
textimagefiletext
$15.00$60.00200KDec 2024
anthropic/claude-opus-4.1
texttext
$15.00$75.00
openai/o3-pro
The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers. Note that BYOK is required for this model. Set up here: https://openrouter.ai/settings/integrations
textfileimagetext
$20.00$80.00200KJun 2025
openai/gpt-5.2-pro
GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.
imagetextfiletext
$21.00$168.00400KDec 2025
openai/gpt-4-0314
texttext
$30.00$60.00
openai/gpt-4
texttext
$30.00$60.00
anthropic/claude-opus-4.6-fast
Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6) - identical capabilities with higher output speed at premium 6x pricing. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode
textimagetext
$30.00$150.001M
openai/gpt-5.4-pro
GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs. Optimized for step-by-step reasoning, instruction following, and accuracy, GPT-5.4 Pro excels at agentic coding, long-context workflows, and multi-step problem solving.
textimagefiletext
$30.00$180.001MMar 2026
openai/o1-pro
The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.
textimagefiletext
$150.00$600.00200KMar 2025