Scaling
Guardrails is designed to scale with your application, from early prototypes to high-throughput production systems.
Horizontal scalability
Guardrails services are stateless and support:
- Horizontal scaling
- Load-balanced deployments
- Multi-region architectures
Rate limiting
Built-in rate-limit guardrails help:
- Protect downstream systems
- Control cost
- Prevent abuse
Limits can be configured per:
- User
- API key
- Profile
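Per-scope limits like these are commonly enforced with a token bucket, with one bucket per scope and key. The sketch below assumes a hypothetical `ScopedRateLimiter` with made-up scope names (`"user"`, `"api_key"`, `"profile"`). It is not the Guardrails API, only an illustration of how a per-user, per-key, or per-profile limit can behave.

```python
import time
from typing import Callable, Dict, Tuple

Clock = Callable[[], float]


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Clock = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ScopedRateLimiter:
    """Hypothetical helper: one bucket per (scope, key), e.g. ("user", "alice")."""

    def __init__(self, limits: Dict[str, Tuple[float, float]], clock: Clock = time.monotonic) -> None:
        self.limits = limits  # scope -> (burst capacity, refill rate per second)
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, scope: str, key: str) -> bool:
        bucket = self.buckets.get((scope, key))
        if bucket is None:
            # Lazily create a bucket the first time a scope/key pair is seen.
            capacity, rate = self.limits[scope]
            bucket = TokenBucket(capacity, rate, self.clock)
            self.buckets[(scope, key)] = bucket
        return bucket.allow()
```

These buckets live in process memory. In a horizontally scaled, load-balanced deployment each instance would keep its own counts, so a shared store would be needed if limits must hold globally.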
High-volume workloads
For high-throughput use cases:
- Use batching where possible
- Enable cost thresholds
- Monitor execution latency
- Tune guardrail profiles for performance
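Batching and cost thresholds can be combined on the client side. The sketch below splits inputs into fixed-size batches and stops before a running cost estimate would exceed a budget. `validate_batch`, `cost_per_item`, and `max_cost` are hypothetical placeholders, not Guardrails parameters; substitute whatever batch call and cost model your deployment actually exposes.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch


def run_batched(
    items: Sequence[T],
    validate_batch: Callable[[List[T]], List[R]],  # hypothetical batch validation call
    size: int,
    cost_per_item: float,  # hypothetical flat per-item cost estimate
    max_cost: float,
) -> List[R]:
    """Validate items batch by batch, refusing any batch that would exceed max_cost."""
    results: List[R] = []
    spent = 0.0
    for batch in batched(items, size):
        batch_cost = cost_per_item * len(batch)
        if spent + batch_cost > max_cost:
            raise RuntimeError(f"cost threshold {max_cost} would be exceeded")
        results.extend(validate_batch(batch))
        spent += batch_cost
    return results
```

Checking the budget before each call, rather than after, means the threshold is never overshot, at the price of leaving part of the input unvalidated when it is reached.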
Performance considerations
Execution latency depends on:
- Number of enabled guardrails
- Complexity of content
- External classifiers (if used)
Best practices:
- Start with essential guardrails
- Add further checks incrementally
- Measure before optimizing
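To measure before optimizing, it helps to time each guardrail separately rather than only the total. In this sketch a guardrail is modeled as a hypothetical function that takes content and returns `True` when it passes; real Guardrails checks may have a different shape.

```python
import time
from typing import Callable, Dict, List, Tuple

Check = Callable[[str], bool]  # hypothetical: returns True if the content passes


def run_with_timing(checks: Dict[str, Check], content: str) -> Tuple[bool, Dict[str, float]]:
    """Run every check on `content`, recording each one's wall-clock latency in milliseconds."""
    timings: Dict[str, float] = {}
    passed = True
    for name, check in checks.items():
        start = time.perf_counter()
        ok = check(content)
        timings[name] = (time.perf_counter() - start) * 1000.0
        passed = passed and ok  # keep running so every check gets a timing
    return passed, timings


def slowest(timings: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n slowest checks, slowest first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Running all checks even after one fails gives a complete latency profile; in production you might instead short-circuit on the first blocking failure to save time.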
Caching strategies
You can safely cache:
- Profile definitions
- Guardrail configurations
- Static metadata
Do not reuse a cached validation result unless both the input and the guardrail configuration that produced it are identical.
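One way to respect this rule is to key cached results on a hash of the exact input together with the configuration used. The sketch assumes the configuration can be serialized to JSON and that `validate` is a hypothetical stand-in for the real validation call.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(config: Dict[str, Any], content: str) -> str:
    """Key on the exact content *and* the configuration that produced the result."""
    # sort_keys makes logically equal configs serialize identically.
    config_blob = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(config_blob.encode("utf-8"))
    # JSON output never contains a raw NUL byte, so this marks the boundary unambiguously.
    digest.update(b"\x00")
    digest.update(content.encode("utf-8"))
    return digest.hexdigest()


class ResultCache:
    """Unbounded in-memory cache around a hypothetical `validate(config, content)` call."""

    def __init__(self, validate: Callable[[Dict[str, Any], str], Any]) -> None:
        self.validate = validate
        self.store: Dict[str, Any] = {}
        self.hits = 0

    def get(self, config: Dict[str, Any], content: str) -> Any:
        key = cache_key(config, content)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        result = self.validate(config, content)
        self.store[key] = result
        return result
```

Any change to the content, even trailing whitespace, or to the configuration produces a different key and forces a fresh validation. A real deployment would also bound the cache size or expire entries.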
Observability at scale
As traffic grows, rely on analytics to:
- Detect regressions
- Identify slow guardrails
- Track success rates
- Monitor failure spikes
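Success rates and failure spikes can be tracked over a sliding window of recent outcomes per guardrail. This is a minimal sketch; the window size, spike threshold, and minimum sample count are illustrative assumptions, not Guardrails defaults, and a hosted analytics view may compute these differently.

```python
from collections import deque
from typing import Deque, Dict, Optional


class FailureRateMonitor:
    """Tracks the last `window` outcomes per guardrail and flags failure spikes."""

    def __init__(self, window: int = 100, threshold: float = 0.2, min_samples: int = 20) -> None:
        self.window = window
        self.threshold = threshold  # failure rate above which we call it a spike
        self.min_samples = min_samples  # avoid alerting on a handful of requests
        self.outcomes: Dict[str, Deque[bool]] = {}

    def record(self, guardrail: str, passed: bool) -> None:
        if guardrail not in self.outcomes:
            # maxlen drops the oldest outcome once the window is full.
            self.outcomes[guardrail] = deque(maxlen=self.window)
        self.outcomes[guardrail].append(passed)

    def success_rate(self, guardrail: str) -> Optional[float]:
        recent = self.outcomes.get(guardrail)
        if not recent:
            return None  # no data yet
        return sum(recent) / len(recent)

    def is_spiking(self, guardrail: str) -> bool:
        recent = self.outcomes.get(guardrail)
        if recent is None or len(recent) < self.min_samples:
            return False
        rate = self.success_rate(guardrail)
        return rate is not None and (1.0 - rate) > self.threshold
```

A fixed-size window reacts quickly to recent changes but forgets history; comparing the current window against a longer baseline is a common refinement for detecting regressions.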
Failure handling
Guardrails supports configurable behavior when a guardrail fails:
- Block on failure
- Warn only
- Redact sensitive output
- Fail open (advanced scenarios)
Choose a behavior based on your risk tolerance.
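The four behaviors can be pictured as a dispatch on a configured mode. In this sketch `OnFailure`, `Outcome`, and `apply_guardrail` are hypothetical names, and the email regex is an illustrative redaction rule only. It also assumes that "fail open" governs what happens when a check cannot run at all (for example an external classifier timing out), while a check that runs and fails is still blocked; confirm the actual semantics in your configuration.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class OnFailure(Enum):
    BLOCK = "block"
    WARN = "warn"
    REDACT = "redact"
    FAIL_OPEN = "fail_open"


@dataclass
class Outcome:
    allowed: bool
    output: Optional[str]
    warning: Optional[str] = None


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # illustrative PII pattern only


def mask_emails(text: str) -> str:
    return EMAIL.sub("[REDACTED]", text)


def apply_guardrail(
    output: str,
    check: Callable[[str], bool],
    mode: OnFailure,
    redact: Callable[[str], str] = mask_emails,
) -> Outcome:
    try:
        passed = check(output)
    except Exception as exc:  # the check itself could not complete
        if mode is OnFailure.FAIL_OPEN:
            return Outcome(True, output, warning=f"check errored, failing open: {exc}")
        return Outcome(False, None, warning=f"check errored: {exc}")  # fail closed
    if passed:
        return Outcome(True, output)
    if mode is OnFailure.WARN:
        return Outcome(True, output, warning="guardrail failed (warn only)")
    if mode is OnFailure.REDACT:
        return Outcome(True, redact(output))
    # BLOCK, and (by assumption) FAIL_OPEN on a definite failure.
    return Outcome(False, None)
```

Failing closed on errors is the conservative default here; fail open trades safety for availability, which is why it is usually reserved for low-risk or latency-critical paths.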
Next steps
- Secure your deployment → Security
- Explore analytics and observability
- Integrate Guardrails into CI pipelines