Nova Micro
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Amazon
-
Max Context Tokens
128000
-
Max Generated Tokens
5000
-
Data Freshness
October 2023
$0.14 Per Million Generated Tokens
Nova Micro
Now Serving
-
Version
nova_micro
-
Description
Amazon Nova Micro is a text-only model that delivers the lowest-latency responses at very low cost. It is highly performant at language understanding, translation, reasoning, code completion, brainstorming, and mathematical problem-solving. With its generation speed of over 200 tokens per second, Amazon Nova Micro is ideal for applications that require fast responses. Amazon Nova is a new generation of state-of-the-art (SOTA) foundation models (FMs) that deliver frontier intelligence and industry-leading price-performance, available exclusively on Amazon Bedrock.
$0.035 Per Million Context Tokens
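The per-million-token rates listed for each model translate directly into per-request costs. A minimal sketch of that arithmetic, using Nova Micro's listed rates ($0.035 context, $0.14 generated); the `estimate_cost` helper and the example token counts are illustrative, not part of any provider API:

```python
def estimate_cost(context_tokens: int, generated_tokens: int,
                  context_rate: float, generated_rate: float) -> float:
    """Estimate request cost in USD from per-million-token rates."""
    return (context_tokens * context_rate +
            generated_tokens * generated_rate) / 1_000_000

# Nova Micro rates from the listing: $0.035 / $0.14 per million tokens.
cost = estimate_cost(context_tokens=10_000, generated_tokens=1_000,
                     context_rate=0.035, generated_rate=0.14)
print(f"${cost:.5f}")  # prints $0.00049
```

The same helper works for any model below by substituting its two listed rates.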
Nova Lite
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Amazon
-
Max Context Tokens
300000
-
Max Generated Tokens
5000
-
Data Freshness
October 2023
$0.24 Per Million Generated Tokens
Nova Lite
Now Serving
-
Version
nova_lite
-
Description
Amazon Nova Lite is a very low-cost multimodal model that is lightning fast for processing image, video, and text inputs. Amazon Nova Lite’s accuracy across a breadth of tasks, coupled with its lightning-fast speed, makes it suitable for a wide range of interactive and high-volume applications where cost is a key consideration. Amazon Nova is a new generation of state-of-the-art (SOTA) foundation models (FMs) that deliver frontier intelligence and industry-leading price-performance, available exclusively on Amazon Bedrock.
$0.06 Per Million Context Tokens
Nova Pro
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Amazon
-
Max Context Tokens
300000
-
Max Generated Tokens
5000
-
Data Freshness
October 2023
$4.0 Per Million Generated Tokens
Nova Pro
Now Serving
-
Version
nova_pro
-
Description
Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Pro’s capabilities, coupled with its industry-leading speed and cost efficiency, make it a compelling model for almost any task, including video summarization, Q&A, mathematical reasoning, software development, and AI agents that can execute multi-step workflows. In addition to state-of-the-art accuracy on text and visual intelligence benchmarks, Amazon Nova Pro excels at instruction following and agentic workflows as measured by the Comprehensive RAG Benchmark (CRAG), the Berkeley Function Calling Leaderboard, and Mind2Web. Amazon Nova is a new generation of state-of-the-art (SOTA) foundation models (FMs) that deliver frontier intelligence and industry-leading price-performance, available exclusively on Amazon Bedrock.
$1.0 Per Million Context Tokens
Nova Premier
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Amazon
-
Max Context Tokens
1000000
-
Max Generated Tokens
5000
-
Data Freshness
October 2023
$12.5 Per Million Generated Tokens
Nova Premier
Now Serving
-
Version
nova_premier
-
Description
Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models. Amazon Nova is a new generation of state-of-the-art (SOTA) foundation models (FMs) that deliver frontier intelligence and industry-leading price-performance, available exclusively on Amazon Bedrock.
$2.5 Per Million Context Tokens
Reasoner-R1
Now Serving
-
Provider
DeepSeek
-
Originator
DeepSeek
-
Max Context Tokens
64000
-
Max Generated Tokens
64000
-
Data Freshness
Unpublished
$2.19 Per Million Generated Tokens
Reasoner-R1
Now Serving
-
Version
deepseek_reasoner
-
Description
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. DeepSeek-V3 achieves a significant breakthrough in inference speed over previous models. It tops the leaderboard among open-source models and rivals the most advanced closed-source models globally.
$0.55 Per Million Context Tokens
Chat-V3
Now Serving
-
Provider
DeepSeek
-
Originator
DeepSeek
-
Max Context Tokens
64000
-
Max Generated Tokens
8192
-
Data Freshness
Unpublished
$1.1 Per Million Generated Tokens
Chat-V3
Now Serving
-
Version
deepseek_chat
-
Description
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. DeepSeek-V3 achieves a significant breakthrough in inference speed over previous models. It tops the leaderboard among open-source models and rivals the most advanced closed-source models globally.
$0.27 Per Million Context Tokens
GPT-3.5 Turbo
Deprecated
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
16385
-
Max Generated Tokens
4096
-
Data Freshness
Sep 2021
$1.5 Per Million Generated Tokens
GPT-3.5 Turbo
Deprecated
-
Version
gpt_3_5_turbo
-
Description
The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$0.5 Per Million Context Tokens
GPT-4
Deprecated
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
8192
-
Max Generated Tokens
8192
-
Data Freshness
Sep 2021
$60.0 Per Million Generated Tokens
GPT-4
Deprecated
-
Version
gpt_4
-
Description
Snapshot of gpt-4 from June 13th, 2023, with improved tool-calling support. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$30.0 Per Million Context Tokens
GPT-4.1
Now Serving
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
1047576
-
Max Generated Tokens
32768
-
Data Freshness
May 2024
$8.0 Per Million Generated Tokens
GPT-4.1
Now Serving
-
Version
gpt_4_1
-
Description
GPT-4.1 is our flagship model for complex tasks. It is well suited for problem solving across domains. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$2.0 Per Million Context Tokens
GPT-4o
Now Serving
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
128000
-
Max Generated Tokens
16384
-
Data Freshness
Sep 2023
$10.0 Per Million Generated Tokens
GPT-4o
Now Serving
-
Version
gpt_4o
-
Description
GPT-4o ('o' for 'omni') is our versatile, high-intelligence flagship model. It accepts both text and image inputs, and produces text outputs (including Structured Outputs). It is the best model for most tasks, and is our most capable model outside of our o-series models. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$2.5 Per Million Context Tokens
GPT-4o Mini
Now Serving
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
128000
-
Max Generated Tokens
16384
-
Data Freshness
Sep 2023
$0.6 Per Million Generated Tokens
GPT-4o Mini
Now Serving
-
Version
gpt_4o_mini
-
Description
GPT-4o mini ('o' for 'omni') is a fast, affordable small model for focused tasks. It accepts both text and image inputs, and produces text outputs (including Structured Outputs). It is ideal for fine-tuning, and model outputs from a larger model like GPT-4o can be distilled to GPT-4o-mini to produce similar results at lower cost and latency. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$0.15 Per Million Context Tokens
o1 Mini
Deprecated
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
128000
-
Max Generated Tokens
65536
-
Data Freshness
Sep 2023
$12.0 Per Million Generated Tokens
o1 Mini
Deprecated
-
Version
gpt_o1_mini
-
Description
The o1 reasoning model is designed to solve hard problems across domains. o1-mini is a faster and more affordable reasoning model, but we recommend using the newer o3-mini model that features higher intelligence at the same latency and price as o1-mini. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$3.0 Per Million Context Tokens
o1
Now Serving
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
200000
-
Max Generated Tokens
100000
-
Data Freshness
Sep 2023
$60.0 Per Million Generated Tokens
o1
Now Serving
-
Version
o1
-
Description
The o1 series of models are trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$15.0 Per Million Context Tokens
o1 Pro
Now Serving
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
200000
-
Max Generated Tokens
100000
-
Data Freshness
Sep 2023
$600.0 Per Million Generated Tokens
o1 Pro
Now Serving
-
Version
o1_pro
-
Description
The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$150.0 Per Million Context Tokens
o3 Mini
Now Serving
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
200000
-
Max Generated Tokens
100000
-
Data Freshness
Sep 2023
$4.4 Per Million Generated Tokens
o3 Mini
Now Serving
-
Version
o3_mini
-
Description
o3-mini is our newest small reasoning model, providing high intelligence at the same cost and latency targets as o1-mini. o3-mini supports key developer features, like Structured Outputs, function calling, and the Batch API. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$1.1 Per Million Context Tokens
o3
Preview Only
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
200000
-
Max Generated Tokens
100000
-
Data Freshness
May 2024
$8.0 Per Million Generated Tokens
o3
Preview Only
-
Version
o3
-
Description
o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$2.0 Per Million Context Tokens
o3-Pro
Preview Only
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
200000
-
Max Generated Tokens
100000
-
Data Freshness
May 2024
$80.0 Per Million Generated Tokens
o3-Pro
Preview Only
-
Version
o3_pro
-
Description
The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$20.0 Per Million Context Tokens
o4-Mini
Now Serving
-
Provider
OpenAI
-
Originator
OpenAI
-
Max Context Tokens
200000
-
Max Generated Tokens
100000
-
Data Freshness
May 2024
$2.2 Per Million Generated Tokens
o4-Mini
Now Serving
-
Version
o4_mini
-
Description
o4-mini is our latest small o-series model. It's optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. OpenAI is an AI research and deployment company. Their mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.
$0.55 Per Million Context Tokens
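With several models' rates in hand, picking the cheapest option for an output-heavy workload is a one-liner. The price table below copies the per-million generated-token figures from the o-series listings above; the variable names are illustrative:

```python
# Per-million generated-token prices copied from the listings above (USD).
generated_price = {
    "o1": 60.0,
    "o3_mini": 4.4,
    "o3": 8.0,
    "o4_mini": 2.2,
}

# Cheapest model by generated-token price.
cheapest = min(generated_price, key=generated_price.get)
print(cheapest)  # prints o4_mini
```

For context-heavy workloads, the same comparison should be run on the context-token rates instead, since the rankings can differ.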
Mistral Small
Now Serving
-
Provider
MistralAI
-
Originator
MistralAI
-
Max Context Tokens
128000
-
Max Generated Tokens
8192
-
Data Freshness
Unpublished
$0.3 Per Million Generated Tokens
Mistral Small
Now Serving
-
Version
mistral_small
-
Description
This new model comes with improved text performance, multimodal understanding, and an expanded context window of up to 128k tokens. The model outperforms comparable models like Gemma 3 and GPT-4o Mini, while delivering inference speeds of 150 tokens per second. Mistral AI's mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customizable solutions, and an extreme focus on shipping the most advanced technology in limited time.
$0.1 Per Million Context Tokens
Mixtral 8x22B
Deprecated
-
Provider
MistralAI
-
Originator
MistralAI
-
Max Context Tokens
64000
-
Max Generated Tokens
8192
-
Data Freshness
Unpublished
$6.0 Per Million Generated Tokens
Mixtral 8x22B
Deprecated
-
Version
mixtral_8_22b
-
Description
A bigger sparse mixture-of-experts model. As such, it leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more vRAM. Mistral AI's mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customizable solutions, and an extreme focus on shipping the most advanced technology in limited time.
$2.0 Per Million Context Tokens
Mistral Medium
Now Serving
-
Provider
MistralAI
-
Originator
MistralAI
-
Max Context Tokens
128000
-
Max Generated Tokens
8192
-
Data Freshness
Unpublished
$2.0 Per Million Generated Tokens
Mistral Medium
Now Serving
-
Version
mistral_medium
-
Description
Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost. Mistral AI's mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customizable solutions, and an extreme focus on shipping the most advanced technology in limited time.
$0.4 Per Million Context Tokens
Mistral Large
Now Serving
-
Provider
MistralAI
-
Originator
MistralAI
-
Max Context Tokens
128000
-
Max Generated Tokens
8192
-
Data Freshness
Unpublished
$6.0 Per Million Generated Tokens
Mistral Large
Now Serving
-
Version
mistral_large
-
Description
Pixtral Large is a 124B open-weights multimodal model built on top of Mistral Large 2. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. In particular, the model is able to understand documents, charts, and natural images, while maintaining the leading text-only understanding of Mistral Large 2. Mistral AI's mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customizable solutions, and an extreme focus on shipping the most advanced technology in limited time.
$2.0 Per Million Context Tokens
Mistral Nemo
Deprecated
-
Provider
MistralAI
-
Originator
MistralAI
-
Max Context Tokens
128000
-
Max Generated Tokens
8192
-
Data Freshness
Unpublished
$0.15 Per Million Generated Tokens
Mistral Nemo
Deprecated
-
Version
mistral_nemo
-
Description
A 12B model built in partnership with Nvidia. It is easy to use and a drop-in replacement for any system using Mistral 7B, which it supersedes. Mistral AI's mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customizable solutions, and an extreme focus on shipping the most advanced technology in limited time.
$0.15 Per Million Context Tokens
Magistral Small
Now Serving
-
Provider
MistralAI
-
Originator
MistralAI
-
Max Context Tokens
40000
-
Max Generated Tokens
8192
-
Data Freshness
Unpublished
$1.5 Per Million Generated Tokens
Magistral Small
Now Serving
-
Version
magistral_small
-
Description
Magistral is our first reasoning model. Released in both open and enterprise versions, Magistral is designed to think things through, in ways familiar to us, while bringing expertise across professional domains, transparent reasoning that you can follow and verify, and deep multilingual flexibility. Mistral AI's mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customizable solutions, and an extreme focus on shipping the most advanced technology in limited time.
$0.5 Per Million Context Tokens
Magistral Medium
Now Serving
-
Provider
MistralAI
-
Originator
MistralAI
-
Max Context Tokens
40000
-
Max Generated Tokens
8192
-
Data Freshness
Unpublished
$5.0 Per Million Generated Tokens
Magistral Medium
Now Serving
-
Version
magistral_medium
-
Description
Magistral is our first reasoning model. Released in both open and enterprise versions, Magistral is designed to think things through, in ways familiar to us, while bringing expertise across professional domains, transparent reasoning that you can follow and verify, and deep multilingual flexibility. Mistral AI's mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customizable solutions, and an extreme focus on shipping the most advanced technology in limited time.
$2.0 Per Million Context Tokens
Gemini 1.0 Pro
Deprecated
-
Provider
Google VertexAI
-
Originator
Google
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
Feb 2024
$1.5 Per Million Generated Tokens
Gemini 1.0 Pro
Deprecated
-
Version
gemini_1_0_pro
-
Description
Gemini 1.0 Pro is an NLP model that handles tasks like multi-turn text and code chat, and code generation. Gemini is an interface to a multimodal LLM (handling text, audio, images and more). Gemini is based on Google’s cutting-edge research in LLMs. Google initially launched Gemini (then called Bard) as an experiment in March 2023 in accordance with their AI Principles. Since then, users have turned to Gemini to write compelling emails, debug tricky coding problems, brainstorm ideas for upcoming events, get help learning difficult concepts, and so much more. Today, Gemini is a versatile AI tool that can help you in many ways.
$0.5 Per Million Context Tokens
Gemini 1.5 Pro
Now Serving
-
Provider
Google VertexAI
-
Originator
Google
-
Max Context Tokens
2097152
-
Max Generated Tokens
8192
-
Data Freshness
May 2024
$10.0 Per Million Generated Tokens
Gemini 1.5 Pro
Now Serving
-
Version
gemini_1_5_pro
-
Description
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Gemini is an interface to a multimodal LLM (handling text, audio, images and more). Gemini is based on Google’s cutting-edge research in LLMs. Google initially launched Gemini (then called Bard) as an experiment in March 2023 in accordance with their AI Principles. Since then, users have turned to Gemini to write compelling emails, debug tricky coding problems, brainstorm ideas for upcoming events, get help learning difficult concepts, and so much more. Today, Gemini is a versatile AI tool that can help you in many ways.
$2.5 Per Million Context Tokens
Gemini 2.5 Flash
Now Serving
-
Provider
Google VertexAI
-
Originator
Google
-
Max Context Tokens
1048576
-
Max Generated Tokens
65536
-
Data Freshness
January 2025
$2.5 Per Million Generated Tokens
Gemini 2.5 Flash
Now Serving
-
Version
gemini_2_5_flash
-
Description
Google's best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large-scale processing, low-latency, high-volume tasks that require thinking, and agentic use cases. Gemini is an interface to a multimodal LLM (handling text, audio, images and more). Gemini is based on Google’s cutting-edge research in LLMs. Google initially launched Gemini (then called Bard) as an experiment in March 2023 in accordance with their AI Principles. Since then, users have turned to Gemini to write compelling emails, debug tricky coding problems, brainstorm ideas for upcoming events, get help learning difficult concepts, and so much more. Today, Gemini is a versatile AI tool that can help you in many ways.
$1.0 Per Million Context Tokens
Gemini 2.5 Flash Lite
Preview Only
-
Provider
Google VertexAI
-
Originator
Google
-
Max Context Tokens
1000000
-
Max Generated Tokens
64000
-
Data Freshness
January 2025
$0.4 Per Million Generated Tokens
Gemini 2.5 Flash Lite
Preview Only
-
Version
gemini_2_5_flash_lite
-
Description
A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Gemini is an interface to a multimodal LLM (handling text, audio, images and more). Gemini is based on Google’s cutting-edge research in LLMs. Google initially launched Gemini (then called Bard) as an experiment in March 2023 in accordance with their AI Principles. Since then, users have turned to Gemini to write compelling emails, debug tricky coding problems, brainstorm ideas for upcoming events, get help learning difficult concepts, and so much more. Today, Gemini is a versatile AI tool that can help you in many ways.
$0.5 Per Million Context Tokens
Gemini 2.5 Pro
Now Serving
-
Provider
Google VertexAI
-
Originator
Google
-
Max Context Tokens
1048576
-
Max Generated Tokens
65536
-
Data Freshness
January 2025
$15.0 Per Million Generated Tokens
Gemini 2.5 Pro
Now Serving
-
Version
gemini_2_5_pro
-
Description
Gemini 2.5 Pro is Google's state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. Gemini is an interface to a multimodal LLM (handling text, audio, images and more). Gemini is based on Google’s cutting-edge research in LLMs. Google initially launched Gemini (then called Bard) as an experiment in March 2023 in accordance with their AI Principles. Since then, users have turned to Gemini to write compelling emails, debug tricky coding problems, brainstorm ideas for upcoming events, get help learning difficult concepts, and so much more. Today, Gemini is a versatile AI tool that can help you in many ways.
$2.5 Per Million Context Tokens
Claude 3 Haiku
Deprecated
-
Provider
Anthropic
-
Originator
Anthropic
-
Max Context Tokens
200000
-
Max Generated Tokens
4096
-
Data Freshness
August 2023
$1.25 Per Million Generated Tokens
Claude 3 Haiku
Deprecated
-
Version
claude_v3_haiku
-
Description
Fastest and most compact model for near-instant responsiveness, with quick and accurate targeted performance. Claude is a next-generation AI assistant built for work and trained to be safe, accurate, and secure. Anthropic believes AI will have a vast impact on the world and is dedicated to building systems that people can rely on and generating research about the opportunities and risks of AI.
$0.25 Per Million Context Tokens
Claude 3 Sonnet
Deprecated
-
Provider
Anthropic
-
Originator
Anthropic
-
Max Context Tokens
200000
-
Max Generated Tokens
4096
-
Data Freshness
August 2023
$15.0 Per Million Generated Tokens
Claude 3 Sonnet
Deprecated
-
Version
claude_v3_sonnet
-
Description
Balance of intelligence and speed, with strong utility, balanced for scaled deployments. Claude is a next-generation AI assistant built for work and trained to be safe, accurate, and secure. Anthropic believes AI will have a vast impact on the world and is dedicated to building systems that people can rely on and generating research about the opportunities and risks of AI.
$3.0 Per Million Context Tokens
Claude 3 Opus
Now Serving
-
Provider
Anthropic
-
Originator
Anthropic
-
Max Context Tokens
200000
-
Max Generated Tokens
4096
-
Data Freshness
August 2023
$75.0 Per Million Generated Tokens
Claude 3 Opus
Now Serving
-
Version
claude_v3_opus
-
Description
Powerful model for highly complex tasks, with top-level performance, intelligence, fluency, and understanding. Claude is a next-generation AI assistant built for work and trained to be safe, accurate, and secure. Anthropic believes AI will have a vast impact on the world and is dedicated to building systems that people can rely on and generating research about the opportunities and risks of AI.
$15.0 Per Million Context Tokens
Claude 3.5 Haiku
Now Serving
-
Provider
Anthropic
-
Originator
Anthropic
-
Max Context Tokens
200000
-
Max Generated Tokens
8192
-
Data Freshness
July 2024
$4.0 Per Million Generated Tokens
Claude 3.5 Haiku
Now Serving
-
Version
claude_v3_5_haiku
-
Description
Anthropic's fastest model, delivering intelligence at blazing speeds. Claude is a next-generation AI assistant built for work and trained to be safe, accurate, and secure. Anthropic believes AI will have a vast impact on the world and is dedicated to building systems that people can rely on and generating research about the opportunities and risks of AI.
$0.8 Per Million Context Tokens
Claude 3.5 Sonnet
Now Serving
-
Provider
Anthropic
-
Originator
Anthropic
-
Max Context Tokens
200000
-
Max Generated Tokens
8192
-
Data Freshness
April 2024
$15.0 Per Million Generated Tokens
Claude 3.5 Sonnet
Now Serving
-
Version
claude_v3_5_sonnet
-
Description
Anthropic's most intelligent model, offering the highest level of intelligence and capability. Claude is a next-generation AI assistant built for work and trained to be safe, accurate, and secure. Anthropic believes AI will have a vast impact on the world and is dedicated to building systems that people can rely on and generating research about the opportunities and risks of AI.
$3.0 Per Million Context Tokens
Claude 4 Opus
Now Serving
-
Provider
Anthropic
-
Originator
Anthropic
-
Max Context Tokens
200000
-
Max Generated Tokens
32000
-
Data Freshness
March 2025
$75.0 Per Million Generated Tokens
Claude 4 Opus
Now Serving
-
Version
claude_v4_opus
-
Description
Anthropic's most capable model, with the highest level of intelligence and capability. Claude is a next-generation AI assistant built for work and trained to be safe, accurate, and secure. Anthropic believes AI will have a vast impact on the world and is dedicated to building systems that people can rely on and generating research about the opportunities and risks of AI.
$15.0 Per Million Context Tokens
Claude 4 Sonnet
Now Serving
-
Provider
Anthropic
-
Originator
Anthropic
-
Max Context Tokens
200000
-
Max Generated Tokens
64000
-
Data Freshness
March 2025
$15.0 Per Million Generated Tokens
Claude 4 Sonnet
Now Serving
-
Version
claude_v4_sonnet
-
Description
Anthropic's high-performance model, with high intelligence and balanced performance. Claude is a next-generation AI assistant built for work and trained to be safe, accurate, and secure. Anthropic believes AI will have a vast impact on the world and is dedicated to building systems that people can rely on and generating research about the opportunities and risks of AI.
$3.0 Per Million Context Tokens
Context token cost by provider
Llama 3 8B
Deprecated
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
8192
-
Max Generated Tokens
8192
-
Data Freshness
March 2023
$0.6 Per Million Generated Tokens
Generated token cost by provider
Llama 3 8B
Deprecated
-
Version
llama3_8b_instruct
-
Description
Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. Llama 3 uses a tokenizer with a vocabulary of 128K tokens and was trained on sequences of 8,192 tokens. Grouped-Query Attention (GQA) is used for all models to improve inference efficiency. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The open source AI model you can fine-tune, distill and deploy anywhere.
$0.3 Per Million Context Tokens
Context token cost by provider
Llama 3 70B
Deprecated
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
8192
-
Max Generated Tokens
8192
-
Data Freshness
December 2023
$3.5 Per Million Generated Tokens
Generated token cost by provider
Llama 3 70B
Deprecated
-
Version
llama3_70b_instruct
-
Description
Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. Llama 3 uses a tokenizer with a vocabulary of 128K tokens and was trained on sequences of 8,192 tokens. Grouped-Query Attention (GQA) is used for all models to improve inference efficiency. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The open source AI model you can fine-tune, distill and deploy anywhere.
$2.65 Per Million Context Tokens
Context token cost by provider
Llama 3.1 8B
Deprecated
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
December 2023
$0.22 Per Million Generated Tokens
Generated token cost by provider
Llama 3.1 8B
Deprecated
-
Version
llama3_1_8b_instruct
-
Description
Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The open source AI model you can fine-tune, distill and deploy anywhere. Our latest instruction-tuned model is available in 8B, 70B and 405B versions.
$0.22 Per Million Context Tokens
Context token cost by provider
Llama 3.1 70B
Deprecated
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
December 2023
$0.9 Per Million Generated Tokens
Generated token cost by provider
Llama 3.1 70B
Deprecated
-
Version
llama3_1_70b_instruct
-
Description
Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The open source AI model you can fine-tune, distill and deploy anywhere. Our latest instruction-tuned model is available in 8B, 70B and 405B versions.
$0.9 Per Million Context Tokens
Context token cost by provider
Llama 3.1 405B
Coming Soon
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
December 2023
$3.0 Per Million Generated Tokens
Generated token cost by provider
Llama 3.1 405B
Coming Soon
-
Version
llama3_1_405b_instruct
-
Description
Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The open source AI model you can fine-tune, distill and deploy anywhere. Our latest instruction-tuned model is available in 8B, 70B and 405B versions.
$3.0 Per Million Context Tokens
Context token cost by provider
Llama 3.2 11B
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
December 2023
$0.16 Per Million Generated Tokens
Generated token cost by provider
Llama 3.2 11B
Now Serving
-
Version
llama3_2_11b_instruct
-
Description
The Llama 3.2-Vision collection of multimodal large language models (LLMs) comprises pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks. The open source AI model you can fine-tune, distill and deploy anywhere.
$0.16 Per Million Context Tokens
Context token cost by provider
Llama 3.2 90B
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
December 2023
$0.72 Per Million Generated Tokens
Generated token cost by provider
Llama 3.2 90B
Now Serving
-
Version
llama3_2_90b_instruct
-
Description
The Llama 3.2-Vision collection of multimodal large language models (LLMs) comprises pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks. The open source AI model you can fine-tune, distill and deploy anywhere.
$0.72 Per Million Context Tokens
Context token cost by provider
Llama 3.3 70B
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
December 2023
$0.72 Per Million Generated Tokens
Generated token cost by provider
Llama 3.3 70B
Now Serving
-
Version
llama3_3_70b_instruct
-
Description
Text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B, and to Llama 3.2 90B when used for text-only applications. Llama 3.3 70B delivers similar performance to Llama 3.1 405B while requiring only a fraction of the computational resources. The open source AI model you can fine-tune, distill and deploy anywhere.
$0.72 Per Million Context Tokens
Context token cost by provider
Llama 4 Maverick
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
1000000
-
Max Generated Tokens
4096
-
Data Freshness
August 2024
$0.97 Per Million Generated Tokens
Generated token cost by provider
Llama 4 Maverick
Now Serving
-
Version
llama4_maverick
-
Description
A general-purpose model featuring 128 experts and 400 billion total parameters. It excels in text understanding across 12 languages and in English image understanding, making it suitable for versatile assistant and chat applications. The open source AI model you can fine-tune, distill and deploy anywhere.
$0.24 Per Million Context Tokens
Context token cost by provider
Llama 4 Scout
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Meta
-
Max Context Tokens
3500000
-
Max Generated Tokens
4096
-
Data Freshness
August 2024
$0.66 Per Million Generated Tokens
Generated token cost by provider
Llama 4 Scout
Now Serving
-
Version
llama4_scout
-
Description
A general-purpose multimodal model with 16 experts, 17 billion active parameters, and 109 billion total parameters. Its multimillion-token context window enables comprehensive multi-document analysis, establishing it as a uniquely powerful and efficient model in its class. The open source AI model you can fine-tune, distill and deploy anywhere.
$0.17 Per Million Context Tokens
Context token cost by provider
Command R
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Cohere
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
Unpublished
$1.5 Per Million Generated Tokens
Generated token cost by provider
Command R
Now Serving
-
Version
command_r
-
Description
Command R is a large language model optimized for conversational interaction and long-context tasks. It targets the "scalable" category of models that balance high performance with strong accuracy, enabling companies to move beyond proof of concept and into production. Cohere empowers every developer and enterprise to build amazing products and capture true business value with language AI. Driven by cutting-edge research, Cohere is pioneering the future of language AI for business.
$0.5 Per Million Context Tokens
Context token cost by provider
Command R+
Now Serving
-
Provider
Amazon Bedrock
-
Originator
Cohere
-
Max Context Tokens
128000
-
Max Generated Tokens
4096
-
Data Freshness
Unpublished
$15.0 Per Million Generated Tokens
Generated token cost by provider
Command R+
Now Serving
-
Version
command_r_plus
-
Description
Command R+ is a state-of-the-art RAG-optimized model designed to tackle enterprise-grade workloads. It is Cohere's most powerful, scalable large language model (LLM), purpose-built to excel at real-world enterprise use cases. Command R+ joins Cohere's R-series of LLMs focused on balancing high efficiency with strong accuracy, enabling businesses to move beyond proof of concept and into production with AI. Cohere empowers every developer and enterprise to build amazing products and capture true business value with language AI. Driven by cutting-edge research, Cohere is pioneering the future of language AI for business.
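The per-million-token prices listed throughout this catalog combine into a per-request cost with simple arithmetic. A minimal sketch, assuming the Claude 4 Sonnet rates quoted above ($3.0 per million context tokens, $15.0 per million generated tokens); the helper function is illustrative, not part of any provider API:

```python
def request_cost(context_tokens, generated_tokens,
                 context_price_per_m, generated_price_per_m):
    """Dollar cost of one request, given per-million-token prices."""
    return (context_tokens / 1_000_000 * context_price_per_m
            + generated_tokens / 1_000_000 * generated_price_per_m)

# Claude 4 Sonnet rates from this page: $3.0 context / $15.0 generated per million.
cost = request_cost(10_000, 2_000, 3.0, 15.0)
print(f"${cost:.2f}")  # 10k context + 2k generated -> $0.06
```

Swapping in any other card's two rates gives that model's cost for the same traffic, which makes the price columns directly comparable.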