LiteLLM

meta

Nov 11, 2023

Call all LLM APIs using the OpenAI format. Works with Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate, and more (100+ LLMs).

Budget Manager

Don't want to run up crazy bills, whether you're calling LLM APIs yourself or your users are calling them? Use this.

LiteLLM exposes:

`litellm.max_budget`: a global variable that sets the maximum budget (in USD) across all your LiteLLM calls. If this budget is exceeded, a `BudgetExceededError` is raised.
`BudgetManager`: a class that helps you set budgets per user. `BudgetManager` keeps a dictionary of user budgets, where the key is the user and the value holds their current cost plus model-specific costs.
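
To make the per-user dictionary concrete, here is a minimal sketch of the idea in plain Python. It is not LiteLLM's implementation: `SimpleBudgetTracker`, `BudgetExceeded`, `create_budget`, and `record_cost` are hypothetical names chosen for illustration, and the costs are passed in by hand rather than computed from a real response.

```python
class BudgetExceeded(Exception):
    """Hypothetical stand-in for a budget error (not LiteLLM's own class)."""


class SimpleBudgetTracker:
    """Toy per-user budget tracker: one dictionary entry per user."""

    def __init__(self):
        # user -> {"total_budget": USD, "current_cost": USD, "model_cost": {model: USD}}
        self.users = {}

    def create_budget(self, user, total_budget):
        self.users[user] = {
            "total_budget": total_budget,
            "current_cost": 0.0,
            "model_cost": {},
        }

    def record_cost(self, user, model, cost):
        entry = self.users[user]
        # Refuse the charge if it would push the user past their budget.
        if entry["current_cost"] + cost > entry["total_budget"]:
            raise BudgetExceeded(f"{user} would exceed {entry['total_budget']} USD")
        entry["current_cost"] += cost
        entry["model_cost"][model] = entry["model_cost"].get(model, 0.0) + cost
```

In this sketch the check happens before the cost is recorded, so a user never goes over budget; a real integration would more likely check the budget before making the API call and record the actual cost afterwards.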

Manage Multiple Deployments

Use this if you're trying to load-balance across multiple deployments (e.g. Azure/OpenAI).

Router prevents failed requests by picking a deployment that is below its rate limit and has used the fewest tokens.

In production, Router connects to a Redis Cache to track usage across multiple deployments.
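
The selection rule described above can be sketched in a few lines of plain Python. This is an illustration of the rule only, not the Router's actual code: `pick_deployment`, the deployment names, and the idea of passing limits and usage as two dictionaries are all assumptions made for the example.

```python
def pick_deployment(limits, usage):
    """Pick the deployment under its rate limit with the fewest tokens used.

    limits: deployment name -> token limit for the current window (assumed shape)
    usage:  deployment name -> tokens already used in that window
    """
    # Keep only deployments that still have headroom.
    available = [name for name, limit in limits.items() if usage.get(name, 0) < limit]
    if not available:
        raise RuntimeError("all deployments are at their rate limit")
    # Among those, prefer the one with the least usage so far.
    return min(available, key=lambda name: usage.get(name, 0))


limits = {"azure-eu": 1000, "azure-us": 1000, "openai": 500}
usage = {"azure-eu": 900, "azure-us": 300, "openai": 500}
choice = pick_deployment(limits, usage)  # "openai" is excluded because it is at its limit
```

Here the `usage` dictionary lives in process memory. With several workers, the counters would need to sit in a shared store so every worker sees the same numbers, which is the role Redis plays in the production setup.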

Caching `completion()` and `embedding()` calls when switched on

LiteLLM implements exact-match caching and supports the following cache backends:

In-Memory Caching [Default]
Redis Caching Local
Redis Caching Hosted
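
Exact-match caching means a cached response is reused only when the request is identical to an earlier one. A minimal in-memory sketch of that idea follows. It is not LiteLLM's cache: `ExactMatchCache`, `get_or_call`, and the choice of hashing the model name plus messages are assumptions for illustration.

```python
import hashlib
import json


class ExactMatchCache:
    """Toy in-memory cache keyed on the exact request contents."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(model, messages):
        # Serialize deterministically so identical requests hash identically.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, messages, call):
        """Return the cached result, or run call(model, messages) and store it."""
        key = self._key(model, messages)
        if key not in self._store:
            self._store[key] = call(model, messages)
        return self._store[key]
```

Any change to the request, even a single character in one message, produces a different key and therefore a cache miss. Swapping the dictionary for a Redis client would give the same behavior across processes.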

https://docs.litellm.ai/
