Three Classes of Guardrail Erosion Resistance
Guardrails fall into three classes by erosion resistance: erasable (convention-dependent), detectable (tool-enforced), and immutable (formally enforced). These correspond to social, encoded, and structural guardrails respectively. Social guardrails (documentation, naming conventions, architectural patterns held as “how we do things”) erode fastest because the suggestible actor walks through conventions without noticing them. Encoded guardrails (linters, static analysis, CI/CD gates) erode moderately: agents respond to errors, but they can satisfy a check trivially without satisfying the underlying invariant. Structural guardrails (type systems, capability restrictions, property-based tests, formal verification) resist erosion because they encode mathematical properties that cannot be circumvented by pattern-matching.
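The gap between an encoded and a structural guardrail can be sketched in a few lines of Python. This is an illustration under stated assumptions, not something taken from the original posts: `gamed_sort`, `encoded_check`, and `structural_check` are hypothetical names, the single fixed test case stands in for a CI gate, and a hand-rolled random property check stands in for a property-based testing library.

```python
import random
from collections import Counter


def gamed_sort(xs):
    """Hypothetical implementation that special-cases the one input the gate uses."""
    if xs == [3, 1, 2]:
        return [1, 2, 3]
    return list(xs)  # wrong in general: returns the input unchanged


def encoded_check(sort_fn):
    """Example-based gate (assumed stand-in for a CI check): one fixed case."""
    return sort_fn([3, 1, 2]) == [1, 2, 3]


def structural_check(sort_fn, trials=200, seed=0):
    """Property check: on random inputs the output must be ordered
    and must contain exactly the same items as the input."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]
        out = sort_fn(xs)
        ordered = all(a <= b for a, b in zip(out, out[1:]))
        same_items = Counter(out) == Counter(xs)
        if not (ordered and same_items):
            return False
    return True
```

Here `gamed_sort` passes `encoded_check` by matching the one case the gate looks at, while `structural_check` rejects it because the property has to hold for inputs the implementation has never seen. The built-in `sorted` passes both.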
Most guardrails prescribed in the Suggestible Actor post are social or encoded. Only structural guardrails resist erosion, and most codebases have very few of them.
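As a rough sketch of what moving a guardrail from the social class toward the structural class might look like, the Python below replaces a "please do not mutate" naming convention with a capability restriction. All names (`SETTINGS_DO_NOT_MUTATE`, `make_read_only`, `careless_update`, `try_update`) are hypothetical. A runtime read-only view is also weaker than a compile-time type system or formal verification, so treat this as the direction of travel only.

```python
from types import MappingProxyType

# Social guardrail: the name asks callers not to write, but nothing stops them.
SETTINGS_DO_NOT_MUTATE = {"retries": 3}


def make_read_only(settings):
    """Capability restriction: hand out a view that has no write operations."""
    return MappingProxyType(dict(settings))


def careless_update(settings):
    """Hypothetical caller that ignores the naming convention."""
    settings["retries"] = 0


def try_update(settings):
    """Return True if the write went through, False if it was refused."""
    try:
        careless_update(settings)
        return True
    except TypeError:
        # mappingproxy objects do not support item assignment
        return False
```

With a plain dict the careless write succeeds silently. With the read-only view the same write raises `TypeError`, so the restriction holds whether or not the caller noticed the convention.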
Related
Linked from
- 🌿AI Reviewing AI: Shared Blind Spots
- 🌿Encoded Guardrails
- 🌿Encoded Guardrails Suppress Symptoms Without Addressing the Cause
- 🌿Expected Damage: Severity Times Time to Mitigation
- 🌿Guardrail Erosion Is a Meta-Problem
- 🌿Social Guardrails
- 🌳Static Analysis Is Insufficient for AI Code
- 🌿Structural Guardrails
- 🌿Three Dimensions of Erosion Resistance Allocation
- 🪶The Guardrail Erosion Problem with AI Agents