Speaker:: Georgi G
Title:: Prompt2Pwn - LLMs Winning at Pwn2Own
Duration:: 25 min
Video:: https://www.youtube.com/watch?v=c5XAvRbma6Y
## Key Thesis
A bespoke LangChain-based agentic pipeline using a JADX MCP server can find real, exploitable Android vulnerabilities at scale — including a successful Pwn2Own entry that chained two AI-discovered bugs in Samsung applications to achieve one-click camera/microphone access on a Samsung Galaxy. The key to making agents work for vulnerability research is treating them like an intern: give explicit bug class definitions, iterate on entry points one at a time, and invest heavily in tool quality and actionable error messages.
## Synopsis
Georgi (Interrupt Labs, UK) has been doing Android vulnerability research and competing at Pwn2Own for over 10 years. His motivation for the AI pivot: Android bugs are increasingly buried in vast application codebases, bug bounty programs have cleared much of the low-hanging fruit, and decompiled Dalvik bytecode yields human-readable Java, a natural LLM input. Since he had no privacy concerns for Pwn2Own targets (bugs go to the organizers), he had a clean use case.
He built his own agentic solution in August of the prior year after finding nothing suitable off the shelf. The architecture: LangChain for agent orchestration, JADX for Android APK decompilation, LiteLLM for model management, Azure AI Foundry for model hosting, Langfuse for observability. The JADX MCP server was based on an open-source implementation — an HTTP server embedded in the JADX plugin exposing its internal API, wrapped by a Python FastMCP layer providing tools like fetch Android manifest, retrieve source for method/class, search classes, etc.
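The tool surface of such a wrapper can be sketched as plain Python functions. This is a minimal in-memory stand-in, not the actual plugin: the `DECOMPILED` dict below substitutes for JADX's real HTTP API, and all names are illustrative. It also bakes in the talk's later point about actionable error messages:

```python
import difflib

# Illustrative stand-in for decompiled output served by the JADX plugin's HTTP API.
DECOMPILED = {
    "com.example.MainActivity": "public class MainActivity { void onCreate() { /* ... */ } }",
    "com.example.DeepLinkHandler": "public class DeepLinkHandler { void handleIntent(Intent i) { /* ... */ } }",
}
MANIFEST = '<manifest package="com.example">...</manifest>'

def fetch_android_manifest() -> str:
    """Return the decoded AndroidManifest.xml for the loaded APK."""
    return MANIFEST

def search_classes(query: str) -> list[str]:
    """Case-insensitive substring search over fully-qualified class names."""
    return [name for name in DECOMPILED if query.lower() in name.lower()]

def get_class_source(class_name: str) -> str:
    """Return decompiled source, or an actionable error the agent can act on."""
    if class_name in DECOMPILED:
        return DECOMPILED[class_name]
    close = difflib.get_close_matches(class_name, DECOMPILED, n=3)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    return f"ERROR: class '{class_name}' not found. Use search_classes first.{hint}"
```

Registered behind a FastMCP layer, each docstring becomes the tool description the model sees, which is one reason the talk stresses writing descriptions and error strings for the agent rather than for humans.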
Initial mistakes were severe: a completely vague objective ("find bugs in this app") caused the agent to get lost, exhaust token limits, and produce nothing useful. Key fixes:
1. Specify the exact bug class being looked for: URL validation issues, intent handling flaws, etc.
2. Iterate over individual entry points from the attack surface analysis agent rather than scanning the whole app.
3. Break entry points into sub-entry points (individual intent filter data schemes).
4. Fix MCP tool descriptions and error messages: "there's been an error" is useless, while actionable error messages let the agent retry intelligently.
5. Improve the system prompt: explicit primitives, chain requirements, bug priorities.

The agent also always wanted to suggest mitigations; this was explicitly suppressed as a waste of tokens.
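Those fixes amount to constructing one tightly scoped prompt per run. A sketch of what that construction might look like (the wording, bug-class definitions, and field names here are invented for illustration, not the talk's actual prompts):

```python
# Hypothetical bug-class definitions; the agent needs these spelled out explicitly.
BUG_CLASSES = {
    "url_validation": "URL/host validation flaws letting attacker-controlled "
                      "content reach a WebView or network sink.",
    "intent_handling": "Exported components that trust extras/data from "
                       "untrusted intents without validation.",
}

SYSTEM_TEMPLATE = """\
You are auditing one Android entry point for exactly one bug class.
Bug class: {bug_class} -- {definition}
Entry point: {entry_point}
Report only findings that give these primitives: {primitives}.
A finding must describe the full chain from the entry point to the sink.
Do NOT suggest fixes or mitigations.
"""

def build_prompt(bug_class: str, entry_point: str, primitives: list[str]) -> str:
    """Build a scoped system prompt for one (bug class, entry point) pair."""
    return SYSTEM_TEMPLATE.format(
        bug_class=bug_class,
        definition=BUG_CLASSES[bug_class],
        entry_point=entry_point,
        primitives=", ".join(primitives),
    )
```

An outer loop then calls `build_prompt` once per sub-entry point (e.g. per intent-filter data scheme) instead of pointing the agent at the whole app.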
The two bugs found and chained for Pwn2Own:
- Bug 1, Smart Touch Call (Samsung interactive customer service app): a URL validation bypass in intent handling allows attacker-controlled content to load in a WebView whose custom WebChromeClient silently grants camera, microphone, and geolocation permissions without prompts. Prerequisite: an active phone call must be in progress. The agent found the WebView and the validation bypass but missed both the custom WebChromeClient and the phone-call prerequisite.
- Bug 2, Bixby (Samsung AI assistant): intent handling allows loading any subdomain of `*.samsung.com` into a WebView exposing a privileged JavaScript interface. Since multiple Samsung subdomains have XSS issues, an attacker can execute JavaScript in the Bixby context and reach the privileged interface. The agent correctly identified both the loose host validation and the privileged JS interface.
- The chain: click one link → Bixby WebView → use Bixby's privileged interface to initiate a phone call → fulfill the phone-call prerequisite for the Smart Touch Call bug → gain camera/microphone access.
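The Bixby flaw is an instance of a classic loose host-validation pattern. A hedged sketch in Python (not Samsung's actual code; the function names and checks are purely illustrative) of why both the naive check and the "fixed" subdomain check are risky:

```python
from urllib.parse import urlparse

def is_allowed_naive(url: str) -> bool:
    # Bare suffix match: also matches attacker-registered evilsamsung.com.
    host = urlparse(url).hostname or ""
    return host.endswith("samsung.com")

def is_allowed_subdomains(url: str) -> bool:
    # Correct suffix handling, but still trusts EVERY samsung.com subdomain.
    # This is the Bixby-style issue: one XSS on any subdomain runs JavaScript
    # next to the privileged JS interface.
    host = urlparse(url).hostname or ""
    return host == "samsung.com" or host.endswith(".samsung.com")
```

A stricter design would pin an explicit allowlist of exact hosts, and keep privileged JavaScript interfaces off WebViews that render remote content at all.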
Post-Pwn2Own agent improvements:
- APK harvesting tool for scanning at scale
- Decoupled attack surface analysis, now a deterministic script (no LLM needed for something fully specified)
- Code deobfuscation agent as a preprocessing step, which significantly improved bug finding
- Running the agent 1-5× per entry point and deduplicating reports via a deduplicator agent
- JSON formatting as a post-processing step (enforcing the schema inline degraded output quality)
- A bug verifier agent

The verifier is a double-edged sword: it correctly identified that Bug 1 was unreachable without an active call, which almost caused the team to discard their most interesting Pwn2Own exploit.
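The exact-duplicate part of deduplication is mechanical and can run before any LLM is involved; only semantically equivalent but differently worded findings need the deduplicator agent. A minimal sketch (the finding fields here are assumed for illustration, not the talk's actual report schema):

```python
def dedupe_exact(findings: list[dict]) -> list[dict]:
    """Collapse findings that name the same entry point, bug class, and sink.

    Repeated runs of the bug hunter over one entry point produce overlapping
    reports; this removes exact repeats so the (more expensive) deduplicator
    agent only has to judge near-duplicates.
    """
    seen: dict[tuple, dict] = {}
    for f in findings:
        key = (f["entry_point"], f["bug_class"], f["sink"])
        seen.setdefault(key, f)  # keep the first report for each key
    return list(seen.values())
```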
Roadmap: static analysis integration with LLM-driven triage, cross-app bug tracking database, and PoC generation with runtime proof-of-vulnerability via ADB MCP and Frida MCP.
## Key Takeaways
- First known successful use of an agentic AI pipeline to find and chain bugs at Pwn2Own (Samsung Galaxy, one-click camera/microphone)
- Vague objectives are fatal — specify bug class, entry point, and expected primitives explicitly
- Treat the agent like an intern: deeply ingrained process knowledge must be externalized into the system prompt
- MCP tool quality is critical: actionable error messages enable recovery; vague errors cause failure spirals
- Code deobfuscation as an agent preprocessing step significantly improved bug finding quality
- Even the initial basic implementation found 12+ bugs in one target application, each requiring manual verification
- Bug verifier agents are powerful but dangerous — can filter out real bugs with unusual prerequisites
- Deterministic steps (manifest parsing, attack surface enumeration) should be scripts, not LLM tasks
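As a concrete example of the last point, exported-component enumeration is fully specified by the manifest and needs no LLM. A simplified sketch using Python's stdlib XML parser (real manifests also need permission attributes, provider-specific rules, and target-SDK handling):

```python
import xml.etree.ElementTree as ET

# ElementTree expands the android: prefix to the full namespace URI.
ANDROID = "{http://schemas.android.com/apk/res/android}"

def exported_components(manifest_xml: str) -> list[str]:
    """List component names reachable from other apps, per manifest rules."""
    root = ET.fromstring(manifest_xml)
    found = []
    for tag in ("activity", "service", "receiver", "provider"):
        for comp in root.iter(tag):
            exported = comp.get(ANDROID + "exported")
            has_filter = comp.find("intent-filter") is not None
            # Pre-API-31 default: a component with an intent filter is
            # exported unless android:exported="false" is set explicitly.
            if exported == "true" or (exported is None and has_filter):
                found.append(comp.get(ANDROID + "name"))
    return found
```

Each name this yields becomes one entry point for the scoped bug-hunting loop, with intent-filter data schemes giving the sub-entry points.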
## Notable Quotes / Data Points
- "I'm dealing with an intern. I had to be very explicit about all of these [primitives]. I need to specify some priorities in terms of the actual bug classes I'm looking for."
- "The agent always, always, always trying to suggest some fixes and mitigations. I'm like, shut up. Stop wasting my tokens."
- Bug chain: Samsung Smart Touch Call (URL validation bypass, silent permissions WebView) + Bixby (loose host validation + privileged JS interface) = one-click camera/microphone access
- Pwn2Own demo format: click one link, camera feed pops up
- "It found probably over a dozen bugs that had to be manually verified... and it's shite, but it still found stuff"
- Running bug hunter 1-5× per entry point and deduplicating: different runs produce different findings on the same entry point
#unprompted #claude