Three Classes of Guardrail Erosion Resistance
Guardrails fall into three classes by erosion resistance: erasable (convention-dependent), detectable (tool-enforced), and immutable (formally enforced). Social guardrails (documentation, naming conventions, architectural patterns as βhow we do thingsβ) erode fastest because the suggestible actor walks through conventions without noticing. Encoded guardrails (linters, static analysis, CI/CD gates) erode moderately because agents respond to errors but can satisfy checks trivially without satisfying the underlying invariant. Structural guardrails (type systems, capability restrictions, property-based tests, formal verification) resist erosion because they encode mathematical properties that cannot be circumvented by pattern-matching.
Most guardrails prescribed in the Suggestible Actor post are social or encoded. The only erosion-resistant class is structural, which most codebases have very little of.
Connections
- π³AI Reviewing AI: Shared Blind Spots
- π³Encoded Guardrails
- π³Expected Damage: Severity Times Time to Mitigation
- π³Guardrail Erosion Is a Meta-Problem
- π³Social Guardrails
- π³Static Analysis Is Insufficient for AI Code
- π³Structural Guardrails
- π³Three Dimensions of Erosion Resistance Allocation
- πͺΆThe Guardrail Erosion Problem with AI Agents