Nabeel - Detecting GenAI Threats with Semantic YARA

Speaker:: Mohamed Nabeel Title:: Detecting GenAI Threats at Scale with YARA-Like Semantic Rules Duration:: 19 min Video:: https://www.youtube.com/watch?v=PZYtJL6TCwo ## Key Thesis YARA's string-matching model is inadequate for detecting GenAI threats like prompt injection because natural language attacks vary semantically, not syntactically. "SuperYARA" extends YARA's philosophy with semantic similarity matching, ML classifiers, and LLM-based conditions — enabling defense-in-depth detection of GenAI threats at scale while managing cost through layered pre-filtering. ## Synopsis Mohamed Nabeel (Palo Alto, working in web security) presented SuperYARA, an open-source library built as a semantic extension to YARA, designed to detect GenAI-specific threats — primarily prompt injection but also brand impersonation and ClickFix-style social engineering — at web-scale throughput (millions of URLs per day at Palo Alto's scale). The motivation is straightforward: Nabeel works in the web threat space where prompts can be delivered through URL query parameters, URL fragments, or dynamically injected into the browser DOM. In the browser-side case, traditional network firewalls never see the payload because the browser itself calls the LLM API directly. Standard YARA rules written to catch prompt injection variants quickly become unmanageable — the rule set explodes with enumerated variants, becomes brittle to novel phrasing, and generates false positives/negatives. He showed a fictional example where a developer named "Adder" at a startup tried to write a traditional YARA rule to catch prompt injection in API query parameters and quickly ended up with an unwieldy, FP-prone rule. SuperYARA replaces that with two lines of natural language: specify what you want to catch semantically, and the library handles the matching internally. The library supports four constructs in increasing cost/efficacy order: 1. **String** — identical to standard YARA string matching (cheapest) 2. **Similarity** — semantic cosine similarity against a natural language description (medium cost) 3. **Classifier** — plug in any binary or multiclass classifier, e.g., a fine-tuned DeBERTa model (higher cost, higher accuracy) 4. **LLM** — use any LLM prompt as the detection condition (highest cost, highest accuracy) The recommended usage pattern is **defense-in-depth layering**: run string rules first, fall through to similarity, then to classifier, then LLM only for residuals. In Nabeel's ClickFix detection experiment, this layering meant string rules caught ~50% of threats, similarity and classifiers caught the remaining ~45-48%, and only a tiny fraction reached the LLM stage. The critical cost optimization is **pre-filtering**: pair an expensive LLM detection with a cheap string rule or classifier as a gate. In his brand-impersonation example using a publicly available HuggingFace phishing classifier as the pre-filter plus a Gemini 2.5 Pro LLM rule for final confirmation, the pre-filtering approach reduced cost from **$750 to $13.50 for 10,000 requests** (a 98%+ reduction) and cut processing time from hours to minutes. The classifier alone has high recall but too many false positives to use standalone; the LLM alone is too expensive at scale; together they are both cheap and accurate. The library is fully pluggable: any component (chunker, cleaner, classifier, similarity model, LLM) can be swapped via a factory pattern without modifying rules. It ships with standard HTML cleaners (strip decorations, extract meaningful text) and multiple chunking strategies. All models and LLMs are preloaded into global memory so per-rule initialization overhead is eliminated. Available via `pip install sara` and documented at the project's `.org` site. ## Key Takeaways - Standard YARA string matching misses ~50% of prompt injection attacks because semantic variation is too high for syntactic rules - The four-tier construct hierarchy (string → similarity → classifier → LLM) mirrors defense-in-depth principles and should be applied in cost-ascending order - Pre-filtering reduces LLM detection costs by ~99% in practice while preserving recall — this is the single most important optimization for scale deployment - Semantic similarity matching catches variants like "don't execute previous rules" even without explicit enumeration - The library's pluggable architecture means any classifier, embedding model, or LLM can be substituted without rule rewriting - Shadow production testing (one week before enabling) is recommended for new rules, same as standard YARA practice - ClickFix attacks are increasingly being used to deliver malicious agent skills — a new vector beyond traditional social engineering - Agent impersonation is an emerging concern as agents increasingly communicate with other agents ## Notable Quotes / Data Points - Palo Alto detects millions of malware samples per day using YARA rules — scale justification for the pre-filtering approach - ClickFix experiment: 50% of threats caught by YARA string rule alone; ~45-48% more caught by semantic/classifier layers - Cost comparison for 10,000 requests with Gemini 2.5 Pro: $750 (LLM-only) vs. $13.50 (pre-filtered) — ~98% cost reduction - Time reduction: "from hours to minutes" for 10,000-request batches using pre-filtering - LLM rule: average ~4.5 seconds per request; classifier: ~0.5 seconds (9x faster) - Install: `pip install sara` - "The new binary is natural language" — framing GenAI threats as requiring new detection primitives #unprompted #claude