Isak & Gill - AI Fingerprints

Speaker:: Natalie Isak & Waris Gill Title:: Developing & Deploying AI Fingerprints Duration:: 18 min Video:: https://www.youtube.com/watch?v=u7pag5p9z5o ## Key Thesis Binary Shield is a privacy-preserving cross-service threat correlation system that fingerprints prompt injection attacks using PII-redacted, quantized, differentially-private embeddings — enabling organizations with multiple AI products to share threat intelligence across service boundaries without exposing customer content. ## Synopsis Gill (Microsoft) and Isak (software engineer, Microsoft) present Binary Shield, a system for cross-service threat correlation in LLM deployments. The core problem: large organizations like Microsoft operate multiple AI services (Azure AI Foundry, GitHub Copilot, Microsoft Copilot) in isolation — no signal flows between them. If an attacker discovers an effective prompt injection, it can be sprayed across all services, but detection in one service doesn't protect the others. The raw prompts that encode these attacks can't be shared cross-service due to user privacy constraints. Binary Shield solves this with a four-step fingerprinting pipeline: 1. **PII redaction**: Strip personally identifiable information (names, SSNs, emails, etc.) using an open-source library (Presidio in the demo implementation) before any processing. 2. **Embedding generation**: Translate the redacted text into semantic embeddings using a text embedding model (text-embedding-3-large from OpenAI used in the demo; alternatives exist for cost reduction). 3. **Binary quantization**: Compress floating-point embedding vectors into bitstrings (0/1). This reduces memory footprint, improves search speed, and makes the pipeline intentionally one-way — an adversary cannot reverse-engineer the bitstring back to user content. 4. **Differential privacy noise injection**: Randomly flip bits in the quantized embedding using a controlled epsilon parameter. High epsilon = low noise = high utility, low privacy. Low epsilon = high noise = high privacy, low utility. The right epsilon is org-specific and should be determined in collaboration with legal/privacy teams. The resulting fingerprint is broadcast to all services via a registry when an attack is detected anywhere in the organization. Services that lacked the specific block list entry can now catch the same attack (and its small perturbations) without overhauling their safety stacks. **Evaluation results**: At epsilon ~0.5 (maximum noise), threat correlation accuracy drops to ~0% — fingerprints are effectively random. As epsilon increases (less noise), accuracy approaches that of dense floating-point embeddings. Binary Shield operates at 36x faster search speed than dense embeddings as corpus size grows, making it practical at scale. **Demo**: Isak walked through the notebook implementation — PII redaction, embedding, quantization, differential privacy noise injection — and demonstrated a Hamming distance matrix comparing three variants of a canonical "ignore all previous instructions" prompt injection against a benign "what is the weather in Seattle" prompt. The three attack variants showed low mutual Hamming distances (similar fingerprints) while the benign prompt showed high distances from all attack variants (clearly different). A three-service simulation showed service Alpha catching the injection, fingerprinting it, broadcasting to Beta and Gamma, and all three services then catching subsequent variants. **Fuzzy matching**: The fingerprints capture semantic neighborhoods, not exact matches — small perturbations of the same injection are caught even if not in the registry verbatim. Threshold tuning controls the radius of the catch zone. ## Key Takeaways - Binary Shield enables cross-service threat correlation without sharing raw customer content — a genuine privacy-vs-utility tradeoff with a usable operating point - 36x faster than dense embedding search for threat correlation at scale - Epsilon (differential privacy budget) must be calibrated per organization in collaboration with legal/privacy teams — no universal recommendation - Fuzzy matching means the fingerprint catches prompt injection variants/perturbations, not just verbatim replays - The pipeline is intentionally lossy (quantization to 0/1) to make reverse-engineering of user content infeasible - Multi-product organizations with divergent safety stacks need a mechanism like this — detection in one product does not currently imply detection in others ## Notable Quotes / Data Points - 36x faster search than dense floating-point embeddings - At epsilon ~0.5: accuracy ~0% (all bits flipped, useless for correlation) - Accuracy approaches dense embedding performance at high epsilon (low noise) - Uses Presidio (open-source) for PII redaction; text-embedding-3-large for embeddings - Hamming distance used as similarity metric between fingerprints - Notebook implementation built quickly with Claude Code; open-sourcing under consideration - "Each fingerprint captures a ring around the original prompt, catching small perturbations — it's not whack-a-mole verbatim matching" #unprompted #claude