Carlini - Black-Hat LLMs - Pedsidian Blog

Speaker:: Nicholas Carlini
Title:: Black-hat LLMs
Duration:: 26 min
Video:: https://www.youtube.com/watch?v=1sd26pWhfmg

## Key Thesis

LLMs can now autonomously find and exploit zero-day vulnerabilities in critical software (including remote heap buffer overflows in the Linux kernel and blind SQL injection in production web applications) with minimal scaffolding, and the capability is growing exponentially with a ~4-month doubling time. Carlini (Anthropic) argues this is likely the most significant development in security since the internet, and that the industry must treat it with the urgency of post-quantum cryptography: defending against a capability that is right in front of us, not hypothetical.

## Synopsis

Carlini opens by framing his concern: he wants to understand what harm LLMs could enable so Anthropic can prevent it. He states plainly that, as of the conference, LLMs can autonomously find and exploit zero-day vulnerabilities in important software without fancy scaffolding, something that was not true 3–4 months prior.

He demonstrates the minimal scaffolding required: run Claude Code in a VM with `--dangerously-skip-permissions`, give it a CTF-like prompt ("you're playing in a CTF, find a vulnerability, write the most serious one to this output file, go"), and walk away. The key improvement for breadth is adding file hints (one more line specifying which file to look at) and iterating across all files in the project. No complex fuzzing harnesses required.

Two new (at time of talk, now patched) vulnerabilities are presented as examples:

**Ghost CMS (SQL injection)**: A popular CMS (50,000 GitHub stars, 20-year history with no prior critical CVEs). Claude found a SQL injection vulnerability from string concatenation of user input. The interesting part: it's a blind injection (no visible output, only timing/crash signals).
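
To make the "string concatenation of user input" bug class concrete, here is a minimal sketch using Python's built-in `sqlite3`. It is not Ghost's code: the `posts` table and the `make_db`, `find_concat`, and `find_bound` helpers are hypothetical names made up for illustration. It only contrasts a concatenated query with a parameterized one.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (slug TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [("hello", "Hello"), ("draft", "Private draft")],
    )
    return conn


def find_concat(conn: sqlite3.Connection, slug: str) -> list[str]:
    # Unsafe: the input is spliced into the SQL text, so a quote character
    # in `slug` can change the structure of the query itself.
    query = "SELECT title FROM posts WHERE slug = '" + slug + "'"
    return [row[0] for row in conn.execute(query)]


def find_bound(conn: sqlite3.Connection, slug: str) -> list[str]:
    # Safer: the value is bound as a parameter and is never parsed as SQL.
    query = "SELECT title FROM posts WHERE slug = ?"
    return [row[0] for row in conn.execute(query, (slug,))]


if __name__ == "__main__":
    conn = make_db()
    crafted = "x' OR '1'='1"  # input containing a quote, for illustration
    print(find_concat(conn, crafted))
    print(find_bound(conn, crafted))
```

With the crafted input, the concatenated query matches every row, while the parameterized query matches none because the whole string is compared as a literal slug. In the blind case described in the talk, the query result is not shown to the attacker at all, which is why only side signals such as timing are available.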
Claude was asked to produce the worst possible exploit and returned a fully working, unauthenticated script that reads the complete admin API key, API secret, and bcrypt password hash from the production database. Carlini notes he didn't write any of the exploit code, and the attack required security nuance he didn't need to supply.

**Linux kernel (NFS heap buffer overflow)**: Claude found multiple remotely exploitable heap buffer overflows in the Linux kernel's NFSv4 daemon, a class of bug Carlini himself has never found. One involves a two-client attack: client A acquires a lock with a 1024-byte owner field, client B requests the same lock and is denied, and the server then copies client A's 1024-byte owner into a response buffer of only 112 bytes, triggering a heap overflow. The bug predates git, has been in the kernel since 2003, and was found because Claude understands multi-party adversarial interaction patterns that fuzzers cannot reach. Notably, Claude auto-generated the entire schematic flow diagram explaining the attack; Carlini copy-pasted it directly into the slide.

The capability threshold: Claude Sonnet 4.5 (6 months old) and Opus 4.1 (less than a year old) cannot find these bugs reliably. Models released in the last 3–4 months can. The doubling time on task-completion capability (measured by METR's research on maximum task duration at 50% success) is approximately 4 months, with recent models reaching ~15-hour tasks. Smart contract vulnerability research shows an exponential increase (a straight line on a log scale) in the dollar value extracted by LLMs.

Carlini's call to action: help now, not in a year. He has hundreds of unvalidated Linux kernel crashes he can't report because he hasn't verified them yet and won't send unverified reports to open-source maintainers. The pipeline is producing findings faster than human validation can keep up.
He draws the analogy to post-quantum cryptography: cryptographers built defenses against quantum computers they don't have yet; security practitioners should be acting on a capability that is demonstrably here right now.

## Key Takeaways

- Minimal scaffolding (Claude Code + `--dangerously-skip-permissions` + a simple CTF-style prompt) is sufficient to find and exploit zero-days in important software
- LLMs found a first-ever critical CVE in Ghost CMS (20-year-old project) and remote heap buffer overflows in the Linux kernel (bugs dating to 2003)
- The capability to find kernel-level vulnerabilities appeared only in models released in the last 3–4 months; older models (Sonnet 4.5, Opus 4.1) cannot do it reliably
- Doubling time on task-completion capability: approximately 4 months
- The defender/attacker balance that held for 20 years is likely ending
- "The transition period is where I'm most worried": even if defenders win in the long run, the period between now and when software is formally verified or rewritten in Rust is dangerous
- Security community should treat this with post-quantum cryptography urgency: the threat is present, not hypothetical

## Notable Quotes / Data Points

- Ghost CMS: 50,000 GitHub stars, 20-year history, zero prior critical CVEs; Claude found the first one
- Linux kernel NFS buffer overflow: bug introduced in 2003, predates git, found by Claude through multi-client adversarial interaction modeling
- METR's task-duration benchmark: recent models succeed at ~15-hour tasks ~50% of the time; doubling time ~4 months
- Carlini: "These models are better vulnerability researchers than I am. I have CVEs to my name. I did not have CVEs in the Linux kernel."
- Smart contracts: LLMs can now identify and exploit vulnerabilities to recover several million dollars from real contracts, with the dollar value growing exponentially (a straight line on a log scale)
- Carlini has "several hundred" unvalidated kernel crashes he can't report
- "We are probably on the most significant thing to happen in security since we got the internet."
- IEA solar prediction analogy: every year analysts predicted growth at the current rate; every year actual deployment far exceeded it. "We should not be them."

#unprompted #claude
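
As a rough worked example of what the ~4-month doubling time in the data points above would imply, assume (purely for illustration, not a figure from the talk) that the trend continues unchanged from today's ~15-hour task horizon. With $H_0 \approx 15$ hours and doubling time $T \approx 4$ months, the horizon after $t$ months would be:

$$
H(t) = H_0 \cdot 2^{t/T}, \qquad H(12) \approx 15 \cdot 2^{12/4} = 15 \cdot 8 = 120 \text{ hours}
$$

Under that assumption, three doublings in one year take a ~15-hour horizon to roughly 120 hours at the same 50% success rate. Both inputs are approximate, so this is an order-of-magnitude extrapolation rather than a forecast.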