Speaker:: Adam Krivka & Ondrej Vlcek
Title:: AI Found 12 Zero-Days in OpenSSL
Duration:: 25 min
Video:: https://www.youtube.com/watch?v=IjL2qN1KDe8
## Key Thesis
Isle Security built an agentic, multi-stage vulnerability research pipeline that has discovered 500 confirmed vulnerabilities in open-source software within six months — including 12 in OpenSSL alone — matching the output of Google's Project Naptime/Big Sleep effort. The core argument is that LLM-based reasoning is now sophisticated enough to find deep, non-obvious security bugs at scale, and that the industry must treat this with urgency because offensive actors have equal access to the same technology.
## Synopsis
Ondrej Vlcek, a 30-year cybersecurity veteran who wrote one of the original antivirus engines in the 1990s, opens by reframing the talk: the point is not the specific bugs found, but the engine behind them. He situates Isle's work alongside other major AI vulnerability research efforts — Google's Project Naptime/Big Sleep, the DARPA AI Cyber Challenge (which went from 35% of planted vulnerabilities found in the DEFCON 2024 semifinals to 87% found at the DEFCON 2025 finals), and Anthropic's Claude 4.6 finding 500+ vulnerabilities in projects like Ghost, Ghostscript, and the Linux kernel.
Isle started not as a vulnerability discovery tool, but as an agentic remediation engine. When they began benchmarking existing scanners against 100,000 historical CVEs, they discovered signature-based scanners had efficacy rates in the low single digits — even on vulnerabilities they could have been trained on. They pivoted to build their own LLM-based scanner, and it started finding previously unknown vulnerabilities as a side effect.
Adam Krivka walks through representative findings. The most serious OpenSSL bug is a stack buffer overflow in a component used by email clients: an attacker-controlled length field in a data notation structure is trusted, triggering the overflow. They also found bugs across Chromium, Firefox, and WebKit. A standout example is a logic inversion bug in Traefik, a popular Kubernetes ingress controller: setting a configuration flag called `proxy_ssl_verify` to true actually sets `insecure_skip_verify` to true, silently disabling TLS certificate verification. This is the kind of subtle semantic error that pattern-matching scanners are structurally incapable of finding.
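The shape of that inversion can be sketched in a few lines of Go (Traefik's implementation language). All identifiers below are illustrative stand-ins, not Traefik's actual code; the essence of the bug is a boolean copied where it should have been negated.

```go
package main

import "fmt"

// TLSConfig mirrors the kind of struct an ingress proxy might build.
// The field name echoes Go's crypto/tls semantics: true means the server
// certificate is NOT verified.
type TLSConfig struct {
	InsecureSkipVerify bool
}

// buildTLSConfig shows the inverted logic: the operator asks for
// verification (proxySSLVerify == true), but the flag is copied directly
// instead of negated, so verification is silently disabled.
func buildTLSConfig(proxySSLVerify bool) TLSConfig {
	return TLSConfig{
		InsecureSkipVerify: proxySSLVerify, // BUG: should be !proxySSLVerify
	}
}

func main() {
	cfg := buildTLSConfig(true)         // operator intent: "verify TLS"
	fmt.Println(cfg.InsecureSkipVerify) // prints "true": verification is off
}
```

Nothing here is syntactically wrong, which is exactly why a signature-based scanner has no pattern to match: the defect exists only in the relationship between the flag's name and its meaning.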
Isle's pipeline uses a two-phase approach: a breadth-first "broadening" phase, where the system generates as many hypotheses as possible about what could be wrong in a given codebase, followed by a focused "narrowing" phase, where agents do deep agentic exploration: running the code, crafting proof-of-concept exploits and fuzzing where applicable, and using multiple models to critique each other's findings. Parallelism is central — humans are single-threaded, LLMs are not. They emphasize careful context construction, a human in the loop for final submission (increasingly a formality), and specialized fine-tuned models, with attention to interpretability and continual learning.
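The broaden-then-narrow flow can be sketched schematically in Go. `broaden` and `investigate` below are invented stubs standing in for LLM calls, and the goroutine fan-out illustrates the parallelism point, not any actual Isle internals.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothesis is a candidate "something might be wrong here" lead.
type Hypothesis struct {
	ID    int
	Claim string
}

// Finding is a hypothesis that has been through deep investigation.
type Finding struct {
	Hyp       Hypothesis
	Confirmed bool
}

// broaden is the breadth-first phase: generate many hypotheses (stub).
func broaden(n int) []Hypothesis {
	hs := make([]Hypothesis, n)
	for i := range hs {
		hs[i] = Hypothesis{ID: i, Claim: fmt.Sprintf("candidate issue #%d", i)}
	}
	return hs
}

// investigate is the narrowing phase for one hypothesis (stub). In the real
// pipeline this step would run the code, craft PoCs, and have a second model
// critique the first model's conclusion.
func investigate(h Hypothesis) Finding {
	return Finding{Hyp: h, Confirmed: h.ID%3 == 0} // stand-in for real triage
}

func main() {
	hyps := broaden(9)
	findings := make([]Finding, len(hyps))
	var wg sync.WaitGroup
	// Parallelism is the point: every hypothesis is investigated concurrently.
	for i, h := range hyps {
		wg.Add(1)
		go func(i int, h Hypothesis) {
			defer wg.Done()
			findings[i] = investigate(h)
		}(i, h)
	}
	wg.Wait()
	confirmed := 0
	for _, f := range findings {
		if f.Confirmed {
			confirmed++
		}
	}
	fmt.Println("confirmed:", confirmed) // prints "confirmed: 3" with this stub
}
```

The design mirrors the talk's framing: generation is cheap and wide, verification is expensive and deep, and concurrency is what lets the expensive phase keep up.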
The talk closes with a call to urgency. Vlcek notes that Daniel Stenberg (author of curl), previously a vocal AI skeptic, publicly called Isle's reports "magic" after six months of interaction and now advocates for AI-assisted vulnerability research. The concern is a "vulnerability apocalypse" if defenders don't move faster than attackers, who have access to the same LLM capabilities and unlimited token budgets from nation-state funding.
## Key Takeaways
- 500 confirmed vulnerabilities found in 6 months; 133 CVEs minted so far (a trailing metric; the backlog is large)
- Signature/pattern-matching scanners have near-zero efficacy even on known historical CVEs
- Multi-model adversarial critique helps combat hallucinations and sycophancy
- Heavy parallelism is a core advantage: many hypothesis threads investigated simultaneously
- Isle provides fixes alongside reports to avoid "AI slop" that burdens already overworked open-source maintainers
- Active GitHub bot deployed on OpenSSL, OpenClaw, OpenEMR, Apache — scanning every PR
- General availability product launched at isle.com on the day of the talk
- Nation-state actors have essentially unlimited token budgets; defenders must match urgency
## Notable Quotes / Data Points
- DARPA AI Cyber Challenge: 35% of vulnerabilities found in 2024 semifinals → 87% in 2025 finals (7 teams, limited tokens)
- Isle: 500 vulnerabilities confirmed and ~133 CVEs minted in 6 months (the 500 matches the figure cited for Anthropic's Claude 4.6 release)
- Daniel Stenberg (curl): went from "complete skeptic and naysayer to a huge, huge fan" in 6 months
- Commercial pattern-matching scanners: "low single-digit percentage success rates" on historical CVEs they could have been trained on
- "The vulnerability apocalypse... we only prevent [it] if we move with urgency"
#unprompted #claude