Output Guardrails
Output guardrails run after the model generates a response.
They ensure that generated content is safe, compliant, and aligned with policy.
What output guardrails protect against
- PII or PHI leakage
- Hallucinated facts
- Unsafe instructions
- Confidential data exposure
- Policy-violating content
- Unstructured or invalid output
Common output guardrails
- PII Redaction: Removes sensitive personal data.
- Schema Validation: Ensures output matches a required format.
- Citation Requirement: Requires sources for factual claims.
- Confidentiality Enforcement: Prevents internal data exposure.
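As a rough illustration of the first two guardrails, the sketch below redacts email addresses and US-style phone numbers with regular expressions, then checks that a response is a JSON object containing required keys of the expected types. The function names (`redact_pii`, `validate_schema`), the two patterns, and the `[REDACTED_*]` placeholders are assumptions made for this example, not the API of any particular guardrail product. Real PII detection usually needs much broader coverage than two regexes.

```python
import json
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)


def validate_schema(raw: str, required: dict) -> list:
    """Return a list of problems; an empty list means the output is valid.

    `required` maps each mandatory key to the Python type its value must have.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    errors = []
    for key, expected in required.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"wrong type for key: {key}")
    return errors
```

For instance, `validate_schema('{"answer": "hi"}', {"answer": str, "sources": list})` reports that `sources` is missing, which a caller could treat as a violation and block or retry.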
Enforcement actions
Output guardrails can:
- Block responses
- Modify content (redact, sanitize)
- Attach warnings
- Downgrade confidence
- Emit audit events
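One way to model these actions is as an enumeration that each guardrail returns, with a single function applying the outcome to the response. Everything named below (`Action`, `Verdict`, `Response`, `apply_verdict`, the fallback message, and the 0.5 confidence cap) is hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Action(Enum):
    ALLOW = "allow"
    MODIFY = "modify"
    WARN = "warn"
    DOWNGRADE = "downgrade_confidence"
    BLOCK = "block"


@dataclass
class Verdict:
    action: Action
    guardrail: str                      # name of the guardrail that fired
    detail: str = ""
    replacement: Optional[str] = None   # redacted text, used only with MODIFY


@dataclass
class Response:
    text: str
    confidence: float = 1.0
    warnings: List[str] = field(default_factory=list)
    blocked: bool = False


def apply_verdict(resp: Response, verdict: Verdict, audit_log: list) -> Response:
    """Apply one guardrail verdict and record an audit event for any non-ALLOW action."""
    if verdict.action is Action.BLOCK:
        resp.text = "This response was withheld by policy."  # assumed fallback text
        resp.blocked = True
    elif verdict.action is Action.MODIFY and verdict.replacement is not None:
        resp.text = verdict.replacement
    elif verdict.action is Action.WARN:
        resp.warnings.append(verdict.detail)
    elif verdict.action is Action.DOWNGRADE:
        resp.confidence = min(resp.confidence, 0.5)  # assumed cap
    if verdict.action is not Action.ALLOW:
        audit_log.append({
            "guardrail": verdict.guardrail,
            "action": verdict.action.value,
            "detail": verdict.detail,
        })
    return resp
```

Keeping the audit event separate from the user-visible change means a guardrail can both modify the response and leave a record, rather than choosing one or the other.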
Example flow
- Model generates output
- Output guardrails execute
- Violations are detected
- Output is redacted or blocked
- Final response is returned
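The flow above can be sketched as a small pipeline that runs each guardrail in order. This is a minimal sketch under assumed conventions: each guardrail is a plain function returning the (possibly redacted) text plus an optional violation message, and returning `None` as the text means "block". The names `run_output_guardrails`, `redact_emails`, `block_internal_markers`, and the `INTERNAL ONLY` marker are all illustrative.

```python
import re
from typing import Callable, List, Optional, Tuple

# A guardrail returns (text_or_None, violation_or_None); None text means "block".
Guardrail = Callable[[str], Tuple[Optional[str], Optional[str]]]

BLOCKED_MESSAGE = "The response was blocked by an output guardrail."


def redact_emails(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Redacting guardrail: rewrites the text and reports what it found."""
    redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED_EMAIL]", text)
    violation = "email address found" if redacted != text else None
    return redacted, violation


def block_internal_markers(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Blocking guardrail: refuses any output carrying an internal marker."""
    if "INTERNAL ONLY" in text:
        return None, "internal marker found"
    return text, None


def run_output_guardrails(
    model_output: str, guardrails: List[Guardrail]
) -> Tuple[str, List[str]]:
    """Run guardrails in order; return the final response and detected violations."""
    violations: List[str] = []
    text = model_output
    for guardrail in guardrails:
        new_text, violation = guardrail(text)
        if violation:
            violations.append(violation)
        if new_text is None:          # blocked: stop and return the fallback
            return BLOCKED_MESSAGE, violations
        text = new_text               # pass redacted text to the next guardrail
    return text, violations
```

Called as `run_output_guardrails(output, [redact_emails, block_internal_markers])`, the function returns either the redacted text or the fallback message, along with the list of violations for logging.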
Best practices
- Use PII redaction by default
- Require schemas for agent outputs
- Enforce citations in regulated domains
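To make the citation practice concrete, here is a hedged sketch of a check that applies only in regulated domains. It assumes citations appear as `[n]` markers before the sentence-ending punctuation and, as a simplification, treats every sentence as a factual claim. The domain names and the functions `missing_citations` and `citation_check` are hypothetical.

```python
import re

CITATION_RE = re.compile(r"\[\d+\]")
REGULATED_DOMAINS = {"healthcare", "finance", "legal"}  # assumed examples


def missing_citations(text: str) -> list:
    """Return the sentences that contain no [n]-style citation marker."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if not CITATION_RE.search(s)]


def citation_check(text: str, domain: str) -> bool:
    """Pass outside regulated domains; inside them, require a citation per sentence."""
    if domain not in REGULATED_DOMAINS:
        return True
    return not missing_citations(text)
```

A stricter deployment might also verify that each cited source exists and supports the claim, which this sketch does not attempt.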
Next steps
- Learn about Tool Guardrails
- Learn how to write Custom Guardrails