# Cache Configuration
Conduit includes a powerful caching system that can significantly reduce costs and improve response times by storing and reusing LLM responses for identical requests.
## Why Use Caching?

- **Cost Reduction**: Avoid paying for repeated identical requests
- **Reduced Latency**: Cached responses are returned instantly
- **Improved Reliability**: Cached responses work even if providers are down
- **Consistent Responses**: Ensures the same output for the same input
## Caching Providers

Conduit supports two caching providers:

### In-Memory Cache

- Simple to set up, no additional dependencies
- Stored in application memory
- Lost when the application restarts
- Limited by available RAM

### Redis Cache

- Persistent across application restarts
- Shared across multiple Conduit instances
- Higher capacity and better performance for production
- Requires a Redis server
## Configuring Caching

### Via Web UI

1. Navigate to **Configuration > Caching**
2. Enable caching by toggling the switch
3. Select the cache provider (In-Memory or Redis)
4. Configure provider-specific settings:
   - In-Memory: maximum cache size (MB)
   - Redis: connection string, password, etc.
5. Set the default Time-To-Live (TTL) for cache entries
6. Save the configuration
### Via Environment Variables

For the In-Memory cache:

```bash
CONDUIT_CACHE_ENABLED=true
CONDUIT_CACHE_TYPE=InMemory
CONDUIT_CACHE_MAX_SIZE=1024   # maximum cache size in MB
CONDUIT_CACHE_TTL=3600        # default TTL in seconds
```
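
If you run Conduit in Docker, the same settings can be passed as container environment variables. A minimal sketch, assuming a placeholder image name `conduit` and the default port 5000:

```bash
# Sketch: containerized Conduit with the in-memory cache enabled.
# "conduit" is a placeholder image name -- substitute your actual image.
docker run -d -p 5000:5000 \
  -e CONDUIT_CACHE_ENABLED=true \
  -e CONDUIT_CACHE_TYPE=InMemory \
  -e CONDUIT_CACHE_MAX_SIZE=1024 \
  -e CONDUIT_CACHE_TTL=3600 \
  conduit
```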
For the Redis cache:

```bash
CONDUIT_CACHE_ENABLED=true
CONDUIT_CACHE_TYPE=Redis
CONDUIT_REDIS_CONNECTION=redis:6379,password=your-password
CONDUIT_CACHE_TTL=3600        # default TTL in seconds
```
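
To try the Redis provider locally, you need a reachable Redis server. The following is an illustrative Docker setup only (the `conduit` image name and the network name are placeholders); in production, point `CONDUIT_REDIS_CONNECTION` at your managed Redis instance:

```bash
# Illustrative only: Redis and Conduit on a shared Docker network.
docker network create conduit-net
docker run -d --name redis --network conduit-net \
  redis:7 redis-server --requirepass your-password
docker run -d --network conduit-net -p 5000:5000 \
  -e CONDUIT_CACHE_ENABLED=true \
  -e CONDUIT_CACHE_TYPE=Redis \
  -e CONDUIT_REDIS_CONNECTION=redis:6379,password=your-password \
  -e CONDUIT_CACHE_TTL=3600 \
  conduit   # placeholder image name
```

The hostname `redis` in the connection string resolves to the Redis container because both containers share the same network.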
## Cache Control

### Request-Level Cache Control

You can control caching behavior at the request level:

```json
{
  "model": "my-gpt4",
  "messages": [{"role": "user", "content": "Hello!"}],
  "cache_control": {
    "no_cache": false,
    "ttl": 7200
  }
}
```
The `cache_control` object supports:

- `no_cache`: Set to `true` to bypass the cache
- `ttl`: Override the default TTL in seconds
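
As a usage sketch, here is the same request sent with curl. The `/v1/chat/completions` path assumes Conduit exposes the usual OpenAI-compatible endpoint, and the API key is a placeholder:

```bash
# Sketch: request-level cache control (endpoint path and key are assumptions).
curl http://localhost:5000/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-gpt4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "cache_control": {"no_cache": false, "ttl": 7200}
  }'
```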
### Response Headers

Conduit includes cache-related headers in responses:

- `X-Cache`: `HIT` or `MISS`, indicating cache status
- `X-Cache-Key`: The hash key used for the cache (if debugging is enabled)
- `X-Cache-TTL`: Remaining TTL in seconds (for cache hits)
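
A quick way to see caching in action is to send the same request twice and watch `X-Cache` flip from `MISS` to `HIT`. A sketch, again assuming the OpenAI-compatible endpoint:

```bash
# -s silences progress, -o /dev/null discards the body, -D - prints headers.
for i in 1 2; do
  curl -s -o /dev/null -D - http://localhost:5000/v1/chat/completions \
    -H "Authorization: Bearer your-api-key" \
    -H "Content-Type: application/json" \
    -d '{"model": "my-gpt4", "messages": [{"role": "user", "content": "Hello!"}]}' \
    | grep -i '^X-Cache'
done
```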
## Cache Keys

Conduit generates cache keys based on:

- The model requested
- The complete messages array
- Selected request parameters that affect the output
Parameters like `temperature`, `top_p`, and `max_tokens` are included in the cache key because they affect the response, while parameters like `stream` or `user` are excluded.
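
The exact key format is internal to Conduit, but conceptually it is a hash over a canonical form of the output-affecting fields. A hand-rolled sketch of the idea (not Conduit's actual algorithm):

```bash
# Conceptual sketch only: project out the output-affecting fields,
# canonicalize them (jq -cS sorts keys), and hash the result.
# Note that "stream" is deliberately left out of the projection.
request='{"model":"my-gpt4","stream":true,"temperature":0.7,"messages":[{"role":"user","content":"Hello!"}]}'
echo "$request" \
  | jq -cS '{model, messages, temperature, top_p, max_tokens}' \
  | sha256sum | cut -d' ' -f1
```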
## Cache Management

### Monitoring Cache Performance

The Web UI provides cache performance metrics:

1. Navigate to **Dashboard > Cache**
2. View statistics:
   - Hit rate
   - Miss rate
   - Item count
   - Memory usage
   - Average response time savings
### Clearing the Cache

You can clear the cache via the Web UI:

1. Navigate to **Configuration > Caching**
2. Click **Clear Cache**
3. Confirm the action

Or via the API:

```bash
curl -X POST http://localhost:5000/admin/cache/clear \
  -H "Authorization: Bearer your-master-key"
```
## Best Practices

- **Set Appropriate TTLs**: Balance freshness vs. performance
- **Use Redis in Production**: For persistence and scaling
- **Enable for High-Volume Endpoints**: Focus on frequently repeated requests
- **Monitor Cache Performance**: Adjust settings based on hit rates
- **Consider Disabling for Critical Requests**: When absolute freshness is required (see the snippet after this list)
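
For the last point, you don't need to disable caching globally: the request-level `cache_control` described above lets a single freshness-critical request opt out of the cache:

```json
{
  "model": "my-gpt4",
  "messages": [{"role": "user", "content": "Hello!"}],
  "cache_control": {"no_cache": true}
}
```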
## Next Steps
- Learn about Budget Management for cost control
- Explore Environment Variables for deployment configuration
- See the WebUI Guide for UI-based configuration