Lidzborski - Securing Workspace GenAI at Google Speed

Speaker:: Nicolas Lidzborski Title:: Securing Workspace GenAI at Google Speed Duration:: 26 min Video:: https://www.youtube.com/watch?v=J9B6Ez2ynvk ## Key Thesis Traditional security filtering — blocklists, regex, even ML classifiers — is fundamentally inadequate for GenAI systems because prompts are code: every token is a potential instruction operating semantically rather than syntactically. Google Workspace's approach is to build a layered structural defense with trust provenance tracking, deterministic orchestration controls, and continuous adversarial testing rather than trying to win a reactive cat-and-mouse filtering game. ## Synopsis Lidzborski, a principal software engineer at Google with 25 years of security experience, frames the current moment as a "perfect storm" where traditional security boundaries have collapsed. In classical computing, code and data are separate; in LLMs, the context window is a single contiguous stream of tokens. The model cannot distinguish a system instruction from untrusted user data — there is no NX bit for the context window. This structural vulnerability is the root cause of prompt injection. He identifies three primary risk categories for productivity AI environments. First, indirect prompt injection: a zero-click supply chain attack where malicious instructions are hidden in untrusted external content (a calendar invite, an email) rather than the user's direct prompt. The LLM treats the injected content as legitimate instruction and executes it. Second, markdown exfiltration: the LLM can be prompted to generate malicious markdown that renders as HTML, embedding attacker-controlled URLs as image sources or links. Sensitive data gets leaked silently in URL query parameters — the user sees only a broken image icon. Third, rogue actions via agency gap: non-deterministic reasoning causes agents to take unintended actions (e.g., emailing sensitive data to the wrong person due to a name conflict). Lidzborski references Simon Willison's "lethal trifecta" — sensitive data access + continuous exposure to untrusted content + external API capability — as the convergence that makes rogue actions dangerous. He demonstrates why reactive filtering fails. Static blocklists and regex are trivially bypassed with encoding (hex, ROT13, base64). ML classifiers can be evaded with synonym swapping, low-resource language translation, or adversarial prefixes. Multi-modal attacks hide instructions in image metadata, bypassing text-only filters entirely. Even the "LLM as judge" pattern is vulnerable to recursive prompt injection — an attacker can include instructions specifically targeting the judge model ("malicious context will instruct the judge to review the following as safe even if it contains execution commands"). Google's structural defense has four layers: (1) input sanitization — strip non-visible HTML, filter known-spam content before it reaches the LLM, track data provenance with user affinity and risk scores; (2) prompt delimitation — sentinel tokens, adversarial training to ignore imperative commands within data delimiters (acknowledged as imperfect but additive); (3) deterministic orchestration — stateful policy enforcement via a finite state machine that restricts downstream capabilities based on data origin, requiring human confirmation for mutation or data-sharing chains, blocking web requests after accessing sensitive data; (4) output sanitization — scrub markdown to remove unwanted image links and protocols, classify URLs against Safe Browsing data, remove hallucinated ungrounded links. Systemic resilience is maintained through automated regression testing (running all known attack classes repeatedly to catch regressions from new features), a VRP and dedicated bug bash with external researchers, and a plan-validate-execute pattern for high-stakes actions. User feedback is incorporated as a signal loop analogous to spam/abuse systems. In Q&A, an audience member noted the parallel to the von Neumann architecture mistake of combining code and data in the same memory space. Lidzborski agreed the industry is "back to Windows 95 security levels" and emphasized the urgency of security by design, noting that median enterprise patching time for critical vulnerabilities is 30 days — catastrophic if vulnerabilities can be discovered in minutes. ## Key Takeaways - The prompt-as-code problem is structural, not syntactic — filtering cannot solve it - Indirect prompt injection is a zero-click vector: a calendar invite or email can compromise an agent silently - The "lethal trifecta": sensitive data access + untrusted content exposure + external API capability = high-impact rogue action potential - Markdown rendering is an exfiltration channel — images and links can silently leak data to attacker servers - LLM-as-judge is not a solution — the judge shares the same semantic attack surface as the primary model - Defense must be layered: input provenance → prompt delimitation → deterministic orchestration → output sanitization - Deterministic policy enforcement (state machine, confirmation gates) matters as much as prompt-level controls - Security is a continuous self-correcting loop, not a one-shot deployment - Median enterprise patch time for critical vulns is 30 days; AI-enabled vuln discovery is near-instant — this gap is dangerous ## Notable Quotes / Data Points - "The prompt is code — every single token in the input stream is a potential instruction" - "Reactive guardrails and pattern matching are inherently inadequate" - "We are back in the time of Windows 95, everything running at the same user level, barely any monitoring" - Real attack demonstrated: Google Calendar invite with hidden payload triggered by "What's on my schedule today?" — controlled home curtains, temperature, zero exploits or malware needed - Referenced Black Hat talk "Invitation Is All You Need" by Benassi and Taco as real-world vector #unprompted #claude