Scaling

Guardrails is designed to scale with your application, from early prototypes to high-throughput production systems.

Horizontal scalability

Guardrails services are stateless, so any instance can serve any request. This supports:

Horizontal scaling
Load-balanced deployments
Multi-region architectures

Rate limiting

Built-in rate limit guardrails help:

Protect downstream systems
Control cost
Prevent abuse

Limits can be configured per:

User
API key
Profile
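
To make the scoping concrete, here is a minimal sketch of a token-bucket limiter keyed by scope (user, API key, or profile). It illustrates the idea only and is not the built-in rate limit guardrail: the class name `ScopedRateLimiter`, its `limits` mapping, and the scope strings are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class ScopedRateLimiter:
    """Token-bucket limiter keyed by (scope, identifier), e.g. ("user", "u-123").

    Hypothetical illustration; not part of the Guardrails API.
    """

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic):
        # limits maps a scope name to (burst capacity, refill rate per second).
        self._limits = limits
        self._clock = clock
        # Bucket state: (scope, identifier) -> (tokens remaining, last refill time).
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def allow(self, scope: str, identifier: str) -> bool:
        capacity, rate = self._limits[scope]
        now = self._clock()
        tokens, last = self._buckets.get((scope, identifier), (capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(capacity, tokens + (now - last) * rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[(scope, identifier)] = (tokens, now)
        return allowed


# Example: each user may burst to 2 requests, refilled at 1 request per second.
limiter = ScopedRateLimiter({"user": (2, 1.0), "api_key": (100, 50.0)})
if not limiter.allow("user", "u-123"):
    print("rate limit exceeded for u-123")
```

This sketch keeps bucket state in process memory. In a load-balanced deployment with several stateless instances, the counters would need to live in a shared store for the limit to hold across instances.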

High-volume workloads

For high-throughput use cases:

Use batching where possible
Enable cost thresholds
Monitor execution latency
Tune guardrail profiles for performance
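
As an illustration of the batching advice, the helper below groups an input stream into fixed-size batches so that each validation call covers several items. The `batched` helper and the batch size are assumptions for this sketch; how a batch is actually submitted depends on your Guardrails client.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `size` items; the last may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Flush the final partial batch.
        yield batch


messages = ["first", "second", "third", "fourth", "fifth"]
for batch in batched(messages, 2):
    # Replace this print with one validation call per batch.
    print(len(batch), batch)
```

Larger batches reduce per-request overhead but increase the latency of each call, so the batch size is worth measuring rather than guessing.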

Performance considerations

Execution latency depends on:

Number of enabled guardrails
Complexity of content
External classifiers (if used)

Best practices:

Start with essential guardrails
Add additional checks incrementally
Measure before optimizing
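
One way to measure before optimizing is to time each guardrail separately on a representative sample. The sketch below assumes each guardrail can be called as a plain Python function that takes text and returns a pass/fail boolean; that signature and the name `time_guardrails` are hypothetical stand-ins for whatever your integration exposes.

```python
import time
from collections import defaultdict
from statistics import median
from typing import Callable, Dict, List

# Hypothetical guardrail signature: text in, pass/fail out.
Guardrail = Callable[[str], bool]


def time_guardrails(guardrails: Dict[str, Guardrail],
                    samples: List[str]) -> Dict[str, float]:
    """Return the median per-call latency in milliseconds for each guardrail."""
    timings: Dict[str, List[float]] = defaultdict(list)
    for text in samples:
        for name, check in guardrails.items():
            start = time.perf_counter()
            check(text)
            timings[name].append((time.perf_counter() - start) * 1000.0)
    return {name: median(values) for name, values in timings.items()}


# Toy guardrails, for illustration only.
report = time_guardrails(
    {
        "max_length": lambda text: len(text) < 1000,
        "no_digits": lambda text: not any(ch.isdigit() for ch in text),
    },
    samples=["hello world", "order 66", "a" * 5000],
)
for name, millis in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {millis:.4f} ms")
```

Sorting by latency shows which guardrails dominate execution time, which is where tuning or removal pays off first.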

Caching strategies

You can safely cache:

Profile definitions
Guardrail configurations
Static metadata

Do not cache validation results unless the inputs are identical and were validated against the same profile and guardrail configuration.
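
If you do cache validation results, keying the cache on a hash of the exact input keeps the "identical inputs" rule enforceable. The sketch below also folds a configuration version into the key so that a profile change invalidates old entries; the `ValidationCache` class, the `config_version` string, and the `validate` callable are hypothetical.

```python
import hashlib
from typing import Callable, Dict


class ValidationCache:
    """Caches pass/fail results keyed by exact input text and config version.

    Hypothetical illustration; not part of the Guardrails API.
    """

    def __init__(self, validate: Callable[[str], bool], config_version: str):
        self._validate = validate
        self._config_version = config_version
        self._results: Dict[str, bool] = {}
        self.hits = 0

    def _key(self, text: str) -> str:
        digest = hashlib.sha256()
        digest.update(self._config_version.encode("utf-8"))
        # Separator so version and text cannot run together ambiguously.
        digest.update(b"\x00")
        digest.update(text.encode("utf-8"))
        return digest.hexdigest()

    def check(self, text: str) -> bool:
        key = self._key(text)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = self._validate(text)
        self._results[key] = result
        return result


cache = ValidationCache(lambda text: "password" not in text, config_version="v1")
cache.check("hello")   # runs the validator
cache.check("hello")   # served from the cache
cache.check("hello ")  # different input (trailing space), runs the validator again
print(cache.hits)
```

The dictionary here grows without bound; a production cache would add a size limit or expiry.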

Observability at scale

As traffic grows, rely on analytics to:

Detect regressions
Identify slow guardrails
Track success rates
Monitor failure spikes
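
As a small illustration of tracking success rates, the function below aggregates pass/fail events per guardrail. The event shape, a `(guardrail name, passed)` tuple, is an assumption for this sketch; real analytics exports will have their own schema.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def success_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of passing events for each guardrail name."""
    totals: Counter = Counter()
    passes: Counter = Counter()
    for name, passed in events:
        totals[name] += 1
        if passed:
            passes[name] += 1
    return {name: passes[name] / totals[name] for name in totals}


# Hypothetical events, for illustration only.
rates = success_rates([
    ("pii", True),
    ("pii", False),
    ("toxicity", True),
])
print(rates)
```

Comparing these rates between time windows is one simple way to spot a regression or a failure spike.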

Failure handling

Guardrails supports configurable behavior:

Block on failure
Warn only
Redact sensitive output
Fail open (advanced scenarios)

Choose behavior based on risk tolerance.
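
The four behaviors can be sketched as a single dispatch on a configured action. Everything named here is hypothetical: the `FailureAction` enum, the `BlockedError` exception, and the email pattern used for redaction are illustrative only, and the sketch reads "fail open" as letting content through unchanged, which is an interpretation rather than documented behavior.

```python
import logging
import re
from enum import Enum

logger = logging.getLogger("guardrails_example")


class FailureAction(Enum):
    BLOCK = "block"
    WARN = "warn"
    REDACT = "redact"
    FAIL_OPEN = "fail_open"


class BlockedError(Exception):
    """Raised when flagged content must not be returned."""


# Illustrative pattern only; real redaction rules would be far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def apply_failure_action(text: str, action: FailureAction) -> str:
    """Decide what to return for `text` after a guardrail has flagged it."""
    if action is FailureAction.BLOCK:
        raise BlockedError("output blocked by guardrail")
    if action is FailureAction.WARN:
        logger.warning("guardrail flagged output; returning it unchanged")
        return text
    if action is FailureAction.REDACT:
        return EMAIL.sub("[REDACTED]", text)
    # FAIL_OPEN: let the content through without modification.
    return text


print(apply_failure_action("contact alice@example.com today", FailureAction.REDACT))
```

Blocking suits high-risk paths where a wrong answer is worse than no answer; warn-only and fail-open trade safety for availability and are better kept to lower-risk paths.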

Next steps

Secure your deployment → Security
Explore analytics and observability
Integrate Guardrails into CI pipelines