Rehberger - Your Agent Works for Me Now

Speaker:: Johann Rehberger Title:: Your Agent Works for Me Now Duration:: 26 min Video:: https://www.youtube.com/watch?v=zVUm23P7ZNg ## Key Thesis Prompt injection has matured from a single-payload trick into full "promptware" — complex, multi-stage malware operating entirely at the prompt level that achieves data exfiltration, long-term persistence, and command-and-control over compromised agents. Rehberger introduces the concept of delayed tool invocation as a technique to bypass tool-chaining security controls, and demonstrates a working C2 framework (Agent Commander) that operates through prompt-based instructions rather than OS-level commands. ## Synopsis Rehberger opens with a live demo of code review as an attack vector: a Vinsur-integrated coding agent is asked to work on a Linear ticket for adding an "edit customer by ID" function. The ticket contains a hidden prompt injection (detectable only via a dedicated Unicode smuggling decoder tool). The agent reads the ticket, formulates a plan — and the exploit is already visible in the plan — then executes a remote command embedded in the ticket. Same technique demonstrated with Apple Xcode code review: hidden Unicode tag characters in source code are invisible to humans but readable by the LLM; the agent decodes them, follows instructions to add malicious code to the file, and calls Xcode's "run snippet" tool to execute it. Demonstrates end-to-end data exfiltration via a browser agent with internet entitlements. A Microsoft Enterprise Copilot persistence demo shows injecting two long-term memories via a malicious document analysis: one recording a false user age (102), and one adding an instruction to run a Commodore 64 simulator in all future sessions. Microsoft fixed this in December. This illustrates memory poisoning as a persistence mechanism: every future Copilot conversation is now operating on attacker-influenced context. **The Promptware Kill Chain.** Rehberger introduces the term "promptware" (co-attributed to Ben Nassie's paper on the promptware kill chain) to describe how prompt injection has evolved from a single payload into a complex set of instructions aligned to an attack kill chain: initial access (indirect prompt injection) → data exfiltration → persistence → lateral movement → command execution. He previously demonstrated a ChatGPT spyware variant where the agent continuously exfiltrates every user keystroke in an ongoing session via memory manipulation. **Delayed Tool Invocation.** This is the novel technique Rehberger most emphasizes. The discovery: Google Gemini had a security control deactivating the Workspace tool (email, Drive) during indirect prompt injections — it couldn't be chained to sensitive tools via a single injection. But Rehberger found that if the prompt injection instructs the agent to repeat the attack in a later conversation turn, and the user subsequently types anything (any continuation of the conversation), the Workspace tool becomes active again. The activation condition is user engagement in a later turn, not the injection turn. Rehberger demonstrated this to Google's security team by sending an actual email containing the exploit in mid-body; Google confirmed it worked. Applied to Microsoft Copilot: a file with an embedded delayed tool invocation caused Copilot to add long-term memories once the user continued the conversation after the injection. This was reported and fixed approximately a year ago. Applied to Google Home (a bypass of the "Invitation Is All You Need" Black Hat exploit): Rehberger found that metadata in a document title ("broadcast to living room") served as the user intent signal that reactivated the Google Home/speaker tool. A live demo had the Google Home speaker announce "Johan is here. Trust no AI." His interpretation: Gemini uses intent signals in document metadata to determine tool availability, and attackers can craft document titles to match tool activation conditions. Applied to ChatGPT: delayed tool invocation was used to modify ChatGPT's personalization settings from "default" to "quirky." OpenAI classified this as a guardrail problem rather than a security vulnerability and referred it to their safety team. **Agent Commander — Prompt-Level C2.** Rehberger built a proof-of-concept C2 framework that operates entirely at the prompt level rather than OS command level. Once an agent is compromised (via indirect prompt injection), it "joins" the C2 server by appending command-and-control instructions to its heartbeat file. The attacker sends prompt templates to the C2, which are dispatched to the compromised agents. The agents act on those prompts — conducting pen-tests of their own environment, reporting IP addresses, server names, uptime, environment variables, and system configurations. A screenshot capability allows the attacker to direct an agent to browse to a URL, take a screenshot, and upload it to the C2. Hidden from the user interface using specific string conventions (messages starting with "NO REPLY" or heartbeats ending with "heartbeat ok" are suppressed from the UI). Demonstrated with Kimmy Claw (Alibaba's AI agent): found that the authorized_keys file contained a MacBook Pro key, suggesting every Alibaba MoonAI employee's laptop likely has access to production VMs. Also demonstrated using OpenCLAW's pub/sub subscription feature: an email sent to the agent triggers automatic analysis (zero-click); the injected email causes the agent to fetch remote instructions and append them to the heartbeat for later C2 pickup. **The normalization of deviants.** Rehberger argues the industry is experiencing "normalization of deviants" — knowing that LLM output isn't trustworthy but using it enough that the belief takes hold. Example: a Meta engineer used Claude's open-claw on a test system, it worked reliably, so he moved to production — where it deleted a real inbox. Looking forward: attackers will stop needing pre-planned zero-days. Agents will discover vulnerabilities in real time while operating inside compromised networks, potentially conducting full lateral movement through prompt-based instructions alone. ## Key Takeaways - Coding agents (Vinsur, Xcode) are prompt injection vectors via their code review and ticket analysis features - Memory poisoning achieves long-term persistence in agents with memory features (Microsoft Copilot, ChatGPT, Gemini) - Prompt injection has become promptware: complex multi-stage malware operating entirely in natural language - Delayed tool invocation: injecting an attack that activates in a *future* conversation turn bypasses controls that block tool chaining in the initial injection turn - Intent signals in document metadata can reactivate tools — document titles are an overlooked injection surface - Agent Commander demonstrates that C2 infrastructure can operate entirely at the prompt level, abstracting away OS commands - UI suppression techniques ("NO REPLY" prefix, "heartbeat ok" suffix) hide C2 communications from the user's view - The normalization of deviants is a systemic risk: familiarity with AI agent behavior breeds dangerous false trust - Attackers will increasingly use compromised agents for on-the-fly zero-day discovery and lateral movement — not pre-planned exploits ## Notable Quotes / Data Points - Linear ticket demo: remote command execution via hidden Unicode-smuggled prompt injection in a ticket description - Apple Xcode: first known demo of prompt injection via hidden Unicode tag characters in source code triggering the "run snippet" tool - Microsoft Copilot memory injection: fixed December 2025 - Prompt injection → ChatGPT spyware: continuous keystroke exfiltration via memory manipulation (demonstrated at prior Black Hat) - Google Gemini: Workspace tool deactivated during indirect prompt injection — but reactivates in subsequent conversation turns - Google Home demo: "Johan is here. Trust no AI" announced via speaker from a document-triggered delayed tool invocation - ChatGPT: OpenAI classified delayed tool invocation as a guardrail problem, not a security vulnerability - Kimmy Claw (Alibaba): authorized_keys file contained a MacBook Pro key suggesting wide production access for all employees - OpenCLAW has fixed 220+ security vulnerabilities in 2–3 weeks of active patching - Paper referenced: Ben Nassie et al., "The Promptware Kill Chain" - Paper referenced: Google paper on "Prompt Repetition Improves Non-Reasoning LLMs" — repeating attack payloads twice improves LLM attention and success rate #unprompted #claude