Speaker:: Rami McCarthy
Title:: Zeal of the Convert: Taming Shai-Hulud with AI
Duration:: 24 min
Video:: https://www.youtube.com/watch?v=6P77Zbo2TA4

## Key Thesis

AI tools — when used with thoughtful harnesses, feedback loops, and a bias toward distilling findings into deterministic code — can compress weeks of manual supply chain attack victim analysis into days, scaling attribution from 200 to 2,400+ identified victims. The key lesson is not that AI is magic, but that the quality of your harness, your prompting discipline, and how you codify learnings into deterministic methods determines nearly all the value.

## Synopsis

McCarthy opens by describing the "year of the worm" — a series of npm/GitHub supply chain attacks (Singularity, Shai-Hulud, the chalk attack) in 2025 that leaked large amounts of data to ephemeral GitHub repositories. His task: collect the data before it disappears, analyze it, and attribute victims so they can be notified. He uses this real-world problem as a vehicle for demonstrating practical AI usage patterns.

On data collection, McCarthy notes that naive prompting ("use the GitHub CLI to grab these repos") technically works but misses all the non-functional requirements: caching, idempotency, rate-limit handling, parallelization, backoffs. He advocates the RPI loop (Research, Plan, Implement) — never jump straight to implementation. He also recommends the "superpowers" repository's 7-step process for structured AI code generation, and Simon Willison's concept of "hoarding things you know how to do" — building composable utilities whose value compounds over time rather than discarding all vibe-coded artifacts.

The data lake ended up at ~250,000 flat files totaling 30 GB — manageable on a laptop, but IO-bound for linear scanning. He notes a key limitation: AI never questions the abstraction you give it. Given flat files, it assumes flat files are correct. You have to be the one to ask whether the data should instead be indexed in SQL or Parquet.
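The non-functional requirements McCarthy lists (caching, idempotency, backoff) can be sketched as a small fetch harness. This is a minimal illustration, not his actual tooling: the `fetcher` callable stands in for whatever wraps the GitHub CLI or REST API, and the cache layout is an assumption.

```python
import hashlib
import time
from pathlib import Path

CACHE_DIR = Path("data_lake")  # hypothetical local cache location


def cache_path(repo: str) -> Path:
    """Deterministic on-disk location for a repo dump, so re-runs are idempotent."""
    digest = hashlib.sha256(repo.encode()).hexdigest()[:16]
    return CACHE_DIR / f"{digest}.json"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with a ceiling, for rate-limit retries."""
    return min(cap, base * (2 ** attempt))


def fetch_repo(repo: str, fetcher) -> bytes:
    """Fetch a repo dump unless it is already cached; retry with backoff on failure.

    `fetcher` is any callable(repo) -> bytes; it is kept abstract here so the
    sketch stays self-contained and network-free.
    """
    path = cache_path(repo)
    if path.exists():  # idempotency: never redo completed work
        return path.read_bytes()
    last_error = None
    for attempt in range(5):
        try:
            data = fetcher(repo)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)  # cache so interrupted runs can resume
            return data
        except Exception as exc:
            last_error = exc
            time.sleep(backoff_delay(attempt))
    raise RuntimeError(f"giving up on {repo}") from last_error
```

The point of the sketch is that none of these properties appear unless you ask for them: a naive one-line prompt yields only the happy path.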
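The "should this stay flat files?" question is the kind of abstraction decision the human has to make. As a hedged sketch of one answer (not what McCarthy built): a single linear pass can load the corpus into SQLite, after which every later question is a query instead of another 30 GB scan. The schema and helper names here are illustrative.

```python
import sqlite3
from pathlib import Path


def index_flat_files(root: Path, db_path: str = "lake.db") -> sqlite3.Connection:
    """One linear pass over the flat files builds a queryable index."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leaks (path TEXT PRIMARY KEY, body TEXT)"
    )
    with conn:  # commit once at the end of the pass
        for f in sorted(root.rglob("*")):
            if f.is_file():
                conn.execute(
                    "INSERT OR REPLACE INTO leaks VALUES (?, ?)",
                    (str(f), f.read_text(errors="replace")),
                )
    return conn


def grep(conn: sqlite3.Connection, needle: str) -> list[str]:
    """Substring search over the indexed corpus, replacing repeated disk scans."""
    rows = conn.execute(
        "SELECT path FROM leaks WHERE body LIKE ?", (f"%{needle}%",)
    )
    return [path for (path,) in rows]
```

Parquet plus a columnar engine would be the other obvious answer; the choice depends on the query shapes, which is exactly the judgment the AI never volunteers.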
This maps to broader guidance: AI is excellent at bash and flat files, but knows nothing about whether your current data structure is appropriate.

For analysis, McCarthy highlights AI's strengths: one-shot fingerprinting (identifying and deduplicating CI/CD runs to reduce 30,000 apparent victims to 13,000 unique machines), signal extraction (identifying 25 CI/CD platform-specific environment variables), pattern matching, and creative reasoning (e.g., identifying encoded JWTs in the data and recognizing they would contain extractable claims). The data showed that 77% of affected machines were CI/CD runners, not developer laptops.

He uses Gemini on a malformed (no newlines) dump of repository names to demonstrate that reasoning models can answer "find me the 10 major companies" with high signal even from garbage input — and that approximate world knowledge (Fortune 100, government domains, top VC-funded startups) enables novel slicing of the data. The dangerous flip side is credulity: AI hallucinated connections between generic strings like "nucleus" and a specific company, or attributed Azure DevOps usage to Microsoft victimhood.

For victim attribution (how do you find 13,000 companies' security contacts?), he built an agentic analysis tool with 69 distinct attribution methods, mostly AI-generated. Key insight: when you ask AI to "give me every company," it will sample the top 10 unless you use automated loop harnesses (whirl loops) that reprompt it to keep going until exit criteria are met. He also discovered a six-stage enrichment chain using poorly documented Azure DevOps APIs to pivot from DevOps slugs through tenant IDs to actual company domains — something AI didn't generate; human insight seeded it, then AI built the implementation.

Results: the Shai-Hulud 2.0 incident impacted at least 37 Fortune 100 companies (manually confirmed), and the agentic analysis found 2,400+ total impacted companies — versus 200 found manually over two weeks.
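The fingerprinting step described above — collapsing 30,000 CI/CD runs to 13,000 unique machines — amounts to hashing the stable, machine-identifying parts of each environment while ignoring per-run noise. A minimal sketch, where the key list is hypothetical (the talk mentions ~25 platform-specific variables; these few stand in for them):

```python
import hashlib

# Hypothetical stable keys; run-scoped values like GITHUB_RUN_ID are
# deliberately excluded so repeated runs on one machine collide.
STABLE_KEYS = ("HOSTNAME", "RUNNER_NAME", "GITHUB_REPOSITORY", "CI_RUNNER_ID")


def fingerprint(env: dict) -> str:
    """Stable hash over machine-identifying env vars, ignoring per-run noise."""
    material = "|".join(f"{k}={env.get(k, '')}" for k in STABLE_KEYS)
    return hashlib.sha256(material.encode()).hexdigest()[:12]


def dedupe(runs: list[dict]) -> dict[str, dict]:
    """Collapse many leaked CI runs down to unique machines by fingerprint."""
    machines: dict[str, dict] = {}
    for env in runs:
        machines.setdefault(fingerprint(env), env)
    return machines
```

Once AI has proposed which variables are stable, the hashing itself is exactly the kind of finding worth freezing into a deterministic script: it is cheap, consistent, and backtestable.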
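The whirl-loop idea — reprompting until exit criteria are met rather than accepting a top-10 sample — can be sketched as a small harness. This is an illustration of the pattern, not McCarthy's tool: `ask` stands in for a real LLM call, and the exit criterion here (a round that yields nothing new) is one of several plausible choices.

```python
def whirl_loop(ask, prompt: str, max_rounds: int = 50) -> list[str]:
    """Reprompt until the model stops producing new items.

    `ask` is any callable(prompt, already_seen) -> list[str] wrapping a
    real LLM call; passing back what has been seen nudges the model past
    its default tendency to sample the same top answers.
    """
    seen: set[str] = set()
    for _ in range(max_rounds):
        new = [x for x in ask(prompt, sorted(seen)) if x not in seen]
        if not new:  # exit criterion: a full round added nothing
            break
        seen.update(new)
    return sorted(seen)
```

Without the harness, a single call returns one sample; the loop is what turns "give me every company" into actual coverage.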
## Key Takeaways

- Always use the RPI loop (Research, Plan, Implement) — never jump straight to code generation
- AI never questions your data abstraction; you must decide whether flat files, SQL, or Parquet is appropriate
- Codify AI analysis findings into deterministic scripts for scalability, consistency, and backtesting
- Feedback loops: use AI to sample data and generate rules, run the rules, measure uplift, repeat — don't rely on AI to analyze everything directly
- Inject skepticism deliberately — baseline LLM alignment is too credulous for security attribution work
- "Just-in-time programs" (throwaway tools built in minutes) should be a constant tool in your belt
- Coverage requires explicit harnesses — AI will always sample unless you force it to keep going
- Secret scanning rule sets are becoming fungible; engine characteristics matter more than rule sets

## Notable Quotes / Data Points

- Shai-Hulud 2.0 impacted at least 37 of the Fortune 100 (manually confirmed)
- 2,400+ impacted companies found via AI-assisted attribution vs. 200 found manually over 2 weeks
- Data set: 250,000 flat files, 30 GB, 13,000 unique victim machines (deduplicated from 30,000)
- 77% of affected machines were CI/CD runners
- 69 distinct attribution methods in the final agentic analysis tool
- Total token cost for the analysis: ~$80
- TruffleHog: ~800 secret detectors; McCarthy ported 58 new rules to a different engine in 30 minutes using Claude
- "Rules are becoming fungible. The engine is what starts to matter."

#unprompted #claude