Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
Don't want to get crazy bills because either while you're calling LLM APIs or while your users are calling them? use this.
litellm.max_budget: a global variable you can use to set the max budget (in USD) across all your litellm calls. If this budget is exceeded, it will raise a BudgetExceededError
BudgetManager: A class to help set budgets per user. BudgetManager creates a dictionary to manage the user budgets, where the key is user and the object is their current cost + model-specific costs.
Manage Multiple Deployments
Use this if you're trying to load-balance across multiple deployments (e.g. Azure/OpenAI).
Router prevents failed requests, by picking the deployment which is below rate-limit and has the least amount of tokens used.
In production, Router connects to a Redis Cache to track usage across multiple deployments.
embedding() calls when switched on
liteLLM implements exact match caching and supports the following Caching:
- In-Memory Caching [Default]
- Redis Caching Local
- Redis Caching Hosted