Speaker:: Andrew Bullen
Title:: Breaking the Lethal Trifecta (Without Ruining Your Agents)
Duration:: 19 min
Video:: https://www.youtube.com/watch?v=cNE7P5FkqR8

## Key Thesis

Prompt injection in production agentic systems cannot be eliminated — it must be assumed and contained. Stripe's approach is to accept that prompt injections will happen and to use architectural guardrails targeting the only two controllable legs of the lethal trifecta (egress and sensitive-action authorization), while aggressively solving the UX friction those guardrails introduce — because guardrails that ruin agent usability won't get adopted.

## Synopsis

Andrew Bullen is the head of AI security at Stripe and a 10-year company veteran. He opens by noting that prompt injection is widely understood in the security community but scrupulously avoided by most AI developers — it's an inconvenient risk in a "move fast and break things" era. Even experienced AI developers, startup founders, and tech executives often have only a vague understanding of what prompt injection actually is. His thesis: security practitioners must show that building safely does not mean building slowly.

He references failure-rate data showing models getting better at resisting prompt injection, but notes that even 1% or 0.1% failure rates are unacceptable in security contexts. The real problem isn't the current rate — it's that we don't know how prevalent attacks are now, and things will get worse before they get better.

The lethal trifecta (a term coined by Simon Willison) defines what an agent needs in order to exfiltrate data: untrusted content, private data, and egress. Untrusted content can't practically be removed — it's what makes LLMs useful. Private data can't be removed from most real agents. That leaves egress as the only actionable leg. Similarly, for harmful actions, untrusted content can't be removed, leaving only the ability to take sensitive actions unilaterally.

Stripe's guardrail architecture has two components.
First: prevent egress. This means blocking general web requests from agent services and ensuring SaaS integrations (like Google Docs and Figma) can't be used as data-exfiltration vectors. The "safe search" solution: use OpenAI web search with `external_web_access: false` so results come from cache without the agent itself making egress calls. For SaaS, all third-party connections are proxied through a central MCP server called "Toolshed," which enforces rules like "only connect to Stripe tenants" — and has the UX benefit of requiring only one MCP connection for end users.

Second: prevent unauthorized sensitive actions. "Sensitive" means production writes or broad communications. Human-in-the-loop confirmation is required for these, with annotations on every tool specifying whether it is production-impacting or broadcast.

The guardrails create real UX problems. Three pain points:

1. Agent stoppage — every confirmation request halts forward progress, similar to Claude Code's constant permission prompts.
2. Review fatigue — too many prompts and users tune out.
3. Rubber stamping — with enough fatigue, users approve everything blindly.

Solutions being developed: batching confirmations (let non-urgent writes stack up and be reviewed later), offering revert options (do the write, then let users undo rather than blocking), and using an LLM as a second reviewer to catch problems without human involvement.

Enforcement at Stripe: agents are tagged by their dependency on foundation-model proxies. CI checks enforce that tagged agent services cannot configure open egress without an escalated review process. For tool annotations, both inline agent-framework tools and centralized MCP tools require permission annotations (production write, broadcast, etc.), which trigger the central UX confirmation framework.
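The annotation-driven gating described above can be sketched roughly as follows. This is a minimal illustration, not Stripe's actual framework: the names (`ToolAnnotation`, `ConfirmationGate`, the example tools) and the batching behavior are assumptions based on the talk's description of sensitive-action annotations and batched confirmations.

```python
# Illustrative sketch: tools carry permission annotations; sensitive calls
# (production writes, broadcasts) are queued for human review instead of
# executing immediately, and can be approved in a batch later.
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ToolAnnotation:
    production_write: bool = False
    broadcast: bool = False

    @property
    def sensitive(self) -> bool:
        # "Sensitive" per the talk: production writes or broad communications.
        return self.production_write or self.broadcast


@dataclass
class Tool:
    name: str
    annotation: ToolAnnotation
    fn: Callable[..., Any]


@dataclass
class ConfirmationGate:
    """Runs safe tools immediately; holds sensitive calls for batched review."""
    pending: list = field(default_factory=list)

    def invoke(self, tool: Tool, **kwargs) -> dict:
        if tool.annotation.sensitive:
            # Batch non-urgent writes so the agent keeps moving (one of the
            # UX fixes: review later instead of stopping on every prompt).
            self.pending.append((tool, kwargs))
            return {"status": "pending_confirmation", "tool": tool.name}
        return {"status": "ok", "result": tool.fn(**kwargs)}

    def approve_all(self) -> list:
        results = [t.fn(**kw) for t, kw in self.pending]
        self.pending.clear()
        return results


# Hypothetical usage with made-up tool names.
gate = ConfirmationGate()
read_tool = Tool("get_invoice", ToolAnnotation(), lambda invoice_id: {"id": invoice_id})
write_tool = Tool("refund_charge", ToolAnnotation(production_write=True),
                  lambda charge_id: f"refunded {charge_id}")

print(gate.invoke(read_tool, invoice_id="in_123"))   # runs immediately
print(gate.invoke(write_tool, charge_id="ch_456"))   # held for human review
print(gate.approve_all())                            # batch-approved later
```

The same annotation objects could equally drive a revert-instead-of-block policy (execute, record an undo action, surface it for review), which is the other UX mitigation the talk mentions.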
An emerging challenge: agents increasingly write their own code and call APIs directly rather than using MCP tools, which requires proxy-level inspection of agent sandbox traffic with API-endpoint annotations.

He closes with a leadership point: security isn't about finding the secure way to do things, it's about helping the organization achieve its goals securely. The turbulent, high-risk era of AI requires the practitioners in the room to be the ones helping their organizations navigate it safely.

## Key Takeaways

- Assume prompt injection will happen; design to prevent damage when it does
- Of the lethal trifecta (untrusted content, private data, egress), only egress is practically controllable
- Stripe's approach: block egress + require human review for sensitive actions
- "Safe search" via OpenAI search with `external_web_access: false` to prevent egress while allowing web data access
- All SaaS connections proxied through central "Toolshed" MCP server with tenant restrictions
- Three UX failure modes of human-in-the-loop: stoppage, review fatigue, rubber stamping
- Solutions: batching writes, revert-instead-of-block, LLM second reviewer
- CI enforcement: tag agent services by foundation-model proxy dependency, gate egress config
- Agents increasingly write code and hit APIs directly — MCP-level controls may not be sufficient going forward

## Notable Quotes / Data Points

- "Most people working in AI are sort of scrupulously avoiding [prompt injection]. Given the AI hype cycle... it's just easier for a lot of companies to pretend prompt injection doesn't exist."
- "Even a 1%, even a 0.1% failure chance to attack is not enough."
- "Lethal trifecta" coined by Simon Willison: untrusted content + private data + egress = data-exfiltration capability
- Stripe's central MCP server: "Toolshed" — proxies all SaaS connections, enforces tenant restrictions
- OpenAI search `external_web_access: false` setting as the safe-search implementation
- "Security isn't just about figuring out the secure way to do things. It's about helping your organization figure out how to accomplish their goals in a secure way."
- Increasing concern: agents writing their own code and hitting APIs directly, bypassing MCP tool controls

#unprompted #claude