AI Models
The Heroku Managed Inference and Agent add-on supports the following models. The add-on is hosted in two regions: us and eu. However, the add-on can be provisioned and accessed from apps in any Heroku region. Select a model to view information on rate limits, prompt caching, and implementation.
| Model Documentation | Model ID | Region | Supported Inputs | Supported Outputs | API Endpoint | Model Source | Description |
|---|---|---|---|---|---|---|---|
| Amazon Rerank 1.0 | amazon-rerank-1-0 | US, EU | text |
score |
/v1/rerank | Amazon | A reliable, high-performing reranking model backed by AWS infrastructure. |
| Nova Lite | nova-lite | US, EU | text, image, video |
text |
/v1/chat/completions | Amazon | A fast and cost-effective LLM. |
| Nova 2 Lite | nova-2-lite | US, EU | text, image, video |
text |
/v1/chat/completions | Amazon | A fast and cost-effective LLM that supports conversational chat, tool-calling, and advanced reasoning with extended context. |
| Nova Pro | nova-pro | US, EU | text, image, video |
text |
/v1/chat/completions | Amazon | A high-performance LLM designed for complex tasks. |
| Claude 3 Haiku | claude-3-haiku | EU | text, image |
text |
/v1/chat/completions | Anthropic | A fast and affordable LLM that supports chat and tool-calling. |
| Claude 3.5 Haiku | claude-3-5-haiku | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | An affordable and straightforward LLM that supports chat and tool-calling. |
| Claude 3.5 Sonnet Latest | claude-3-5-sonnet-latest | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | A fast and affordable LLM that supports chat and tool-calling. |
| Claude 3.7 Sonnet | claude-3-7-sonnet | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning. |
| Claude 4 Sonnet | claude-4-sonnet | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning. |
| Claude 4.5 Haiku | claude-4-5-haiku | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | A state-of-the-art LLM that supports chat, tool-calling, and enhanced reasoning. |
| Claude 4.5 Sonnet | claude-4-5-sonnet | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | A state-of-the-art LLM optimized for enterprise apps that supports chat, tool-calling, and enhanced reasoning. |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | A state-of-the-art LLM designed for complex tasks including data processing, sales forecasting, and content generation. |
| Claude Opus 4.5 | claude-opus-4-5 | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning. |
| Claude Opus 4.6 | claude-opus-4-6 | US, EU | text, image |
text |
/v1/chat/completions | Anthropic | A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning. |
| Cohere Embed Multilingual | cohere-embed-multilingual | US, EU | text |
embedding |
/v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages and can be helpful for developing RAG search. |
| Cohere Embed V4 | cohere-embed-v4 | US, EU | text |
embedding |
/v1/embeddings | Cohere | A state-of-the-art embedding model that supports over 100 languages and can be helpful for developing RAG search. |
| Cohere Rerank 3.5 | cohere-rerank-3-5 | US, EU | text |
score |
/v1/rerank | Cohere | A reranking model that offers enhanced reasoning, broad data compatibility, and multilingual support. |
| DeepSeek V3.2 | deepseek-v3-2 | US | text |
text |
/v1/chat/completions | DeepSeek | An open-weight LLM that supports conversational chat, tool-calling, and high-efficiency reasoning. |
| MiniMax M2 | minimax-m2 | US | text |
text |
/v1/chat/completions | MiniMax | An open-weight LLM that supports conversational chat, tool-calling, and programming tasks. |
| MiniMax M2.1 | minimax-m2-1 | US | text |
text |
/v1/chat/completions | MiniMax | An open-weight LLM that supports conversational chat, tool-calling, and long-horizon reasoning. |
| Kimi K2 Thinking | kimi-k2-thinking | US | text |
text |
/v1/chat/completions | Moonshot AI | An open-weight LLM that supports conversational chat, tool-calling, and chain-of-thought processing. |
| Kimi K2.5 | kimi-k2-5 | US | text |
text |
/v1/chat/completions | Moonshot AI | An open-weight LLM that supports conversational chat, tool-calling, and multimodal agentic workflows. |
| OpenAI gpt-oss-120b | gpt-oss-120b | US, EU | text |
text |
/v1/chat/completions | OpenAI | An open-weight LLM that supports chat and tool-calling. |
| Qwen3 235B | qwen3-235b | US | text |
text |
/v1/chat/completions | Qwen | An open-weight LLM that supports conversational chat, tool-calling, complex reasoning, and agentic coding. |
| Qwen3 Coder 480B | qwen3-coder-480b | US | text |
text |
/v1/chat/completions | Qwen | An open-weight LLM that supports conversational chat, tool-calling, and agentic coding. |
| Stable Image Ultra | stable-image-ultra | US, EU | text |
image |
/v1/images/generations | Stability AI | A state-of-the-art diffusion (image generation) model. |
| GLM 4.7 | glm-4-7 | US | text |
text |
/v1/chat/completions | Z.ai | An open-weight LLM that supports conversational chat, tool-calling, and stable multi-step reasoning. |
| GLM 4.7 Flash | glm-4-7-flash | US | text |
text |
/v1/chat/completions | Z.ai | An open-weight LLM that supports conversational chat, tool-calling, and low-latency agentic tasks. |
Deprecated Models
The following models are being deprecated and will reach end-of-life on the dates listed below. During the deprecation period, requests to these models return a warning header. Prior to the EOL date, model-specific plans for deprecated models will be converted to the standard plan. After the EOL date, requests to these models return HTTP 410.
| Model | Model ID | Deprecation Date | EOL Date | Replacement |
|---|---|---|---|---|
| Claude 3.5 Sonnet Latest | claude-3-5-sonnet-latest | January 22, 2026 | February 22, 2026 | claude-4-6-sonnet |
| Claude 3.7 Sonnet | claude-3-7-sonnet | March 21, 2026 | April 21, 2026 | claude-4-6-sonnet |
| Claude 3.5 Haiku | claude-3-5-haiku | May 12, 2026 | June 12, 2026 | claude-4-5-haiku |