
[Preview] v1.80.7-stable - RAG API

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version​

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.80.7
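
Once the container is up, you can sanity-check the gateway with an OpenAI-compatible chat completion. This is a minimal sketch: it assumes a model (here gpt-4o-mini, a placeholder) has already been added to the proxy and that sk-1234 is a valid virtual key.

# quick sanity check against the OpenAI-compatible endpoint (model name and key are placeholders)
curl -X POST "http://localhost:4000/v1/chat/completions" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello from LiteLLM v1.80.7"}]
  }'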

Key Highlights​


RAG API​


Introducing a new RAG API on the LiteLLM AI Gateway. You can send documents (TXT, PDF, or DOCX files) to LiteLLM's all-in-one document ingestion pipeline, and it handles OCR, chunking, embedding, and storing the data in your vector store of choice (OpenAI, Bedrock, Vertex AI, etc.).

Example usage for ingestion

Ingest a txt file into a Bedrock Knowledge Base
curl -X POST "http://localhost:4000/v1/rag/ingest" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d "{
    \"file\": {
      \"filename\": \"document.txt\",
      \"content\": \"$(base64 -i document.txt)\",
      \"content_type\": \"text/plain\"
    },
    \"ingest_options\": {
      \"vector_store\": {
        \"custom_llm_provider\": \"bedrock\"
      }
    }
  }"

Example usage for querying the vector store

Search the Bedrock Knowledge Base
curl -X POST "http://localhost:4000/v1/vector_stores/vs_692658d337c4819183f2ad8488d12fc9/search" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is LiteLLM?",
    "custom_llm_provider": "bedrock"
  }'
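
To close the loop, the retrieved chunks can be passed as context to any model behind the gateway. The sketch below is illustrative only: it assumes the search response follows the OpenAI vector store search format (data[].content[].text), and the model name gpt-4o-mini is a placeholder.

# Search the knowledge base and pull out the matched text
# (response shape assumed to follow the OpenAI vector store search format; adjust the jq filter if needed).
CONTEXT=$(curl -s -X POST "http://localhost:4000/v1/vector_stores/vs_692658d337c4819183f2ad8488d12fc9/search" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LiteLLM?", "custom_llm_provider": "bedrock"}' | jq -r '.data[].content[].text')

# Feed the retrieved context to any model behind the gateway (model name is a placeholder).
curl -X POST "http://localhost:4000/v1/chat/completions" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg ctx "$CONTEXT" '{
        model: "gpt-4o-mini",
        messages: [
          {role: "system", content: ("Answer using only this context:\n" + $ctx)},
          {role: "user", content: "What is LiteLLM?"}
        ]
      }')"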

Get Started


Organization Usage​

Users can now filter usage statistics by organization, providing the same granular filtering capabilities available for teams.

Details:

  • Filter usage analytics, spend logs, and activity metrics by organization ID
  • View organization-level breakdowns alongside existing team and user-level filters
  • Consistent filtering experience across all usage and analytics views

PR #16560, PR #17181
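
The same filter should also be usable programmatically. The request below is only a sketch: it assumes the proxy's /spend/logs endpoint accepts start/end dates and an organization-scoped query parameter; the parameter name organization_id is an assumption, so check the linked PRs for the implemented interface.

curl -G "http://localhost:4000/spend/logs" \
  -H "Authorization: Bearer sk-1234" \
  --data-urlencode "start_date=2025-11-01" \
  --data-urlencode "end_date=2025-11-30" \
  --data-urlencode "organization_id=my-org-id"   # parameter name is an assumption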


New Providers and Endpoints​

New Providers​

Provider | Supported Endpoints | Description
Public AI | Chat completions | Support for publicai.co provider
Eleven Labs | Text-to-speech | Text-to-speech provider integration

New LLM API Endpoints​

Endpoint | Method | Description | Documentation
/v1/skills | POST | Anthropic Skills API for extended context tool calling | Skills API
/rag/ingest | POST | Unified RAG API with Vertex AI RAG and Vector Stores | RAG API

New Models / Updated Models​

New Model Support​

Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features
Anthropic | claude-opus-4-5-20251101 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching
Bedrock | anthropic.claude-opus-4-5-20251101-v1:0 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching
Bedrock | us.anthropic.claude-opus-4-5-20251101-v1:0 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching
Bedrock | amazon.nova-canvas-v1:0 | - | - | $0.06/image | Image generation
OpenRouter | openrouter/anthropic/claude-opus-4.5 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching
Vertex AI | vertex_ai/claude-opus-4-5 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching
Vertex AI | vertex_ai/claude-opus-4-5@20251101 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching
Azure | azure_ai/claude-opus-4-1 | 200K | $15.00 | $75.00 | Chat, reasoning, vision, function calling, prompt caching
Azure | azure_ai/claude-sonnet-4-5 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching
Azure | azure_ai/claude-haiku-4-5 | 200K | $1.00 | $5.00 | Chat, reasoning, vision, function calling, prompt caching
Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-4p6 | 202K | $0.55 | $2.19 | Chat, function calling
Public AI | publicai/swiss-ai/apertus-8b-instruct | 8K | Free | Free | Chat, function calling
Public AI | publicai/swiss-ai/apertus-70b-instruct | 8K | Free | Free | Chat, function calling
Public AI | publicai/aisingapore/Gemma-SEA-LION-v4-27B-IT | 8K | Free | Free | Chat, function calling
Public AI | publicai/BSC-LT/salamandra-7b-instruct-tools-16k | 16K | Free | Free | Chat, function calling
Public AI | publicai/BSC-LT/ALIA-40b-instruct_Q8_0 | 8K | Free | Free | Chat, function calling
Public AI | publicai/allenai/Olmo-3-7B-Instruct | 32K | Free | Free | Chat, function calling
Public AI | publicai/aisingapore/Qwen-SEA-LION-v4-32B-IT | 32K | Free | Free | Chat, function calling
Public AI | publicai/allenai/Olmo-3-7B-Think | 32K | Free | Free | Chat, function calling, reasoning
Public AI | publicai/allenai/Olmo-3-32B-Think | 32K | Free | Free | Chat, function calling, reasoning
Cohere | embed-multilingual-light-v3.0 | 1K | $0.10 | - | Embeddings, supports images
WatsonX | watsonx/whisper-large-v3-turbo | - | $0.0001/sec | - | Audio transcription

Features​

  • Anthropic

    • Add claude opus 4.5 model support - PR #17043
    • Add day 0 support for anthropic Tool Search, Programmatic Tool Calling, Input Examples, Effort Parameter - PR #17091, Docs
    • Add Anthropic Effort Parameter support - PR #17091
  • Bedrock

    • Fix Bedrock Claude Opus 4.5 inference profile (only the global profile is currently supported) - PR #17101
    • Add OpenAI compatible bedrock imported models (qwen etc) - PR #17097
    • Fix bedrock passthrough auth issue - PR #16879
    • Make Bedrock image generation more consistent - PR #17021
  • Azure

    • Add support for azure anthropic models via chat completion - PR #16886
    • Fix the azure auth format for videos - PR #17009
    • Fix reasoning_effort="none" not working on Azure for GPT-5.1 - PR #17071
    • Add GA protocol as configurable parameter for azure openai realtime api - PR #17096
  • OpenRouter

  • Fireworks AI

    • Add fireworks_ai/accounts/fireworks/models/glm-4p6 - PR #17154
  • Vertex AI

    • Add vertex ai image gen support for both gemini and imagen models - PR #17070
    • Handle global location in context caching - PR #16997
    • Fix CreateCachedContentRequest enum error - PR #16965
    • Use the correct domain for the global location when counting tokens - PR #17116
    • Support Vertex AI batch listing in LiteLLM proxy - PR #17079
    • Fix default sample count for image generation - PR #16403
  • Gemini

    • Add gemini file search support - PR #17124
    • Add gemini-3-pro-image-preview model support for imageSize parameter - PR #17019
    • Handle None or empty contents in Gemini token counter - PR #17020
    • Skip thinking config for image models - PR #17027
  • WatsonX

    • Add audio transcriptions for WatsonX - PR #17160
  • OpenAI

    • Fix gpt-5.1 temperature support when reasoning_effort is "none" or not specified - PR #17011
  • Public AI

  • Cohere

    • Add cost tracking for cohere embed passthrough endpoint - PR #17029
  • Eleven Labs

    • Integrate Eleven Labs text-to-speech - PR #16573 (see the example request below)
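
Since the gateway exposes an OpenAI-compatible /v1/audio/speech endpoint, the new Eleven Labs integration can be exercised with a request like the one below. This is a minimal sketch: the model name elevenlabs/eleven_multilingual_v2, the voice, and the key are illustrative placeholders and depend on how the model is configured on your proxy.

# text-to-speech via the OpenAI-compatible speech endpoint; writes binary audio to speech.mp3
curl -X POST "http://localhost:4000/v1/audio/speech" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs/eleven_multilingual_v2",
    "input": "Hello from the LiteLLM gateway",
    "voice": "Rachel"
  }' \
  --output speech.mp3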

Bug Fixes​

  • OCI
    • Fix pydantic validation errors during tool call with streaming - PR #16899

LLM API Endpoints​

Features​

  • Skills API (Anthropic)

    • New API - Claude Skills API. Create, List, Delete, Update Claude Skills - PR #17042, Docs
  • RAG API

    • New RAG API on LiteLLM AI Gateway (use with OpenAI Vector Store, Bedrock Knowledge Bases, Vertex AI RAG Engine) - PR #17109
    • Add support for Vertex RAG engine - PR #17117
    • Allow internal user keys to access the API and allow using LiteLLM credentials with the API - PR #17169
  • Search API

    • Add search API logging and cost tracking in LiteLLM Proxy - PR #17078
  • Responses API

    • Fix prevent duplicate spend logs in Responses API for non-OpenAI providers - PR #16992
    • Support response_format parameter in completion -> responses bridge - PR #16844
    • Fix MCP tool call response logging and remove unmapped param errors mid-stream, allowing gpt-5 web search to work via the Responses API - PR #16946
    • Add header passing support for MCP tools in Responses API - PR #16877
  • Image Edits API

  • Audio Transcription API

    • Add transcription exception handling for /audio/transcriptions - PR #16791
    • Fix 401 error on /audio/transcriptions requests - PR #17023
  • Embeddings API

    • Add header forwarding in embeddings - PR #16869
  • Passthrough Endpoints

    • Add cost tracking for streaming in vertex ai passthrough - PR #16874
    • Add cost tracking for cohere embed passthrough endpoint - PR #17029
  • Vector Stores

    • Add method for extracting vector store ids from path params - PR #16566
  • General

    • Fix propagate x-litellm-model-id in responses - PR #16986
    • Preserve content field even if null - PR #16988
    • Include server_tool_use in streaming usage - PR #16826
    • Fix "Thinking may not be enabled when tool_choice forces tool use" error - PR #17129
    • Add missing standard logging object fields - PR #17135

Bugs​

  • General
    • Fix Vector Store list endpoint returning 404 - PR #17229
    • Fix Videos lint errors - PR #17125
    • Do not include plaintext message in exception - PR #17216

Management Endpoints / UI​

Features​

  • Virtual Keys

  • Models + Endpoints

    • Allow adding Bedrock API Key when adding models - PR #17153
    • Add aws_bedrock_runtime_endpoint into Credential Types - PR #17053
    • Change provider create fields to JSON - PR #16985
    • Change model_hub_table to call getUiConfig before Fetching Public Data - PR #17166
    • Improve Wording for Config Models in Model Table - PR #17100
  • Teams & Users

    • Deleting a user from a team now also deletes the key that user created for the team - PR #17057
    • Hide Default Team Settings From Proxy Admin Viewers - PR #16900
    • Add No Default Models for Team and User Settings - PR #17037
    • User Table Sort by All - PR #17108
    • Org Admin Team Permissions Fix - PR #17110
    • Better Loading State for Internal User Page - PR #17168
  • Permission Management

    • Add reject_metadata_tags to prevent users from sending metadata.tags directly in requests - PR #17088
    • Disable global guardrails by key/team - PR #16983
    • Tool permission argument check - PR #16982
    • Add UI support for configuring tool permission guardrails - PR #17050
  • MCP Gateway

    • Add backend support for OAuth2 auth_type registration via UI - PR #17006
    • Add UI support for registering MCP OAuth2 auth_type - PR #17007
  • General UI Improvements

    • Ensure Unique Keys in Navbar Menu Items - PR #16987
    • Minor Cosmetic Changes for Buttons, Add Notification for Delete Team - PR #16984
    • Change Delete Modals to Common Component - PR #17068
    • Disable edit, delete, info for dynamically generated spend tags - PR #17098
    • Migrate modelInfoCall to ReactQuery - PR #17123
    • Migrate Provider Fields to React Query - PR #17177
    • Fix Flaky Test - PR #17161
    • Change Add Fallback Modal to use Antd Select - PR #17223
  • Infrastructure

    • Non Root Docker Build - PR #17060
    • Add nodejs and npm to docker image for prisma generate - PR #16903
    • Bump: version 0.4.8 → 0.4.9 - PR #17163
  • Helm

    • Enhancement: ServiceMonitor template rendering - PR #17038

Bugs​

  • Database
    • Distinguish permission errors from idempotent errors in Prisma migrations - PR #17064

AI Integrations​

Logging​

  • General
    • Model Armor: log guardrail responses on LLM responses - PR #16977
    • Add missing standard logging object fields - PR #17135
    • Add cost tracking for cohere embed passthrough endpoint - PR #17029
    • Add cost tracking for streaming in vertex ai passthrough - PR #16874

Guardrails​

  • Presidio

    • Add presidio pii masking tutorial with litellm - PR #16969
  • General

    • Add Prompt Security guardrail integration - PR #16365
    • Add guardrails for pass through endpoints - PR #17221
    • Allow adding pass through guardrails through UI - PR #17226

Prompt Management​

  • General
    • AI gateway prompt management documentation - PR #16990

MCP Gateway​

  • OAuth 2.0

    • Add backend support for OAuth2 auth_type registration via UI - PR #17006
    • Add UI support for registering MCP OAuth2 auth_type - PR #17007
  • Tool Permissions

    • Tool permission argument check - PR #16982
    • Add UI support for configuring tool permission guardrails - PR #17050
  • Configuration

    • Remove unused MCP_PROTOCOL_VERSION_HEADER_NAME constant - PR #17008
    • Add header passing support for MCP tools in Responses API - PR #16877
    • Fix missing await - PR #17103

Performance / Loadbalancing / Reliability improvements​

  • Memory Optimization

    • Lazy-load cost_calculator & logging to reduce memory + import time - PR #17089
  • Dependency Management

  • Database Performance

    • Optimize date filtering for spend logs queries - PR #17073
  • Request Handling

    • Add automatic LiteLLM context headers (Pillar integration) - PR #17076
  • Generic API Support

    • Make the generic API OSS and support multiple generic APIs - PR #17152

Documentation Updates​

  • Provider Documentation

  • General Documentation

    • AI gateway prompt management - PR #16990
    • Cleanup README and improve agent guides - PR #17003
    • Update broken documentation links in README - PR #17002
    • Update version and add preview tag - PR #17032
    • Document model pricing contribution process - PR #17031
    • Document event hook usage - PR #17035
    • Link to logging spec in callback docs - PR #17049
    • Add OpenAI Agents SDK to projects - PR #17203

New Contributors​

  • @prawaan made their first contribution in PR #16997
  • @lior-ps made their first contribution in PR #16365
  • @HaiyiMei made their first contribution in PR #17020
  • @yuya2017 made their first contribution in PR #17064
  • @saar-win made their first contribution in PR #17038
  • @sdip15fa made their first contribution in PR #16965
  • @KeremTurgutlu made their first contribution in PR #16826
  • @choigawoon made their first contribution in PR #17019
  • @SamAcctX made their first contribution in PR #17144
  • @naaa760 made their first contribution in PR #17079
  • @abi-jey made their first contribution in PR #17096
  • @hxyannay made their first contribution in PR #16734

Full Changelog​