Gallucci - macOS Vulnerability Research

Speaker:: Olivia Gallucci
Title:: macOS Vulnerability Research
Duration:: 21 min
Video:: https://www.youtube.com/watch?v=_f30RyXc_8Q

## Key Thesis

Apple's partial open-source releases can be treated as a continuous sensor stream of change signals, and AI agents can turn that stream into a prioritized vulnerability research queue, but only when the model is treated as a junior analyst that calls deterministic tools rather than as an oracle that invents conclusions. The workflow runs from diff to hypothesis to harness, is explicitly cost-constrained, and is designed for individual researchers.

## Synopsis

Gallucci (Datadog) introduces OSS Sensor, a tool she published that generates a prioritized, evidence-backed research queue from Apple OSS diffs, basic binary features, and log templates. She strongly recommends using the non-AI version as the base and integrating your own AI workflow on top, because her own workflow includes intentional manual intervention steps and plug-and-play AI integration gets expensive fast.

She addresses a common misconception about Apple's openness: Apple has 400+ public repos, but releases are incomplete and often outdated. Headers reference files that don't exist. Core Foundation, CFNetwork, iOS-specific drivers, and CoreCrypto (with additional redistribution constraints) are missing or only partially available. Despite this, the partial source is still valuable as a map: it is not the territory, but the landmarks are real and can guide binary analysis.

The practical workflow is "diff to hypothesis to harness." Step one: ingest Apple OSS tags and tarballs and compute diffs at the function and semantic level. Step two: score changes for security relevance, looking for patterns like allocation math, bounds checks, entitlement gating, XPC parsing, and IOKit external method table changes. Step three: correlate with binaries using tools like `strings`, `otool`, `nm`, and `class-dump`, plus Apple distribution manifests to cross-reference components per release.
Step four: emit a ranked queue with rationale and human validation pointers.

She cites a canonical example of the technique: in 2016, OSX Reverser diffed syslogd from OS X 10.11.2 to 10.11.3, found one changed function where `value + 4` became `value * 4 + 4` (a heap overflow fix), and traced it back via a nearby log string. This pattern is the core research methodology: surgical binary diffing finds patches that reveal prior vulnerabilities, and the same bug may remain unpatched in neighboring files.

For AI's role, Gallucci is clear: the model does not reason about exploitability. It calls deterministic tools (`strings`, `otool`, `class-dump`, log queries), parses the structured output, retrieves relevant OSS source context, and produces bounded conclusions with citations. She uses a pipeline of specialized agents: a retriever (embedding-based indexing of OSS code, symbols, and log templates), a toolbox of deterministic analyzers, and separate agents for diff triage, re-context, hypothesizing, planning, and analysis.

Unified logs are the second signal beyond OSS diffs and binaries. They provide subsystem/category labels that map back to components, stable message strings for binary hunting, and execution hints for reconstructing data flow without full source. She treats logs as a pivot for attack surface identification, not as doctrine: they are noisy, but they reveal which components actually handled an input, which error paths fired, and what IPC contracts exist.

The fuzz planner agent produces: enumerated candidate services (syscall handlers, IOKit user clients, XPC services, parsers), a minimal test harness with invocation path and entitlement requirements, a seed strategy from extracted dictionaries and log-observed parameters, and success metrics (crash bucketing, sanitizer signals, unique stack traces). It does not produce an exploit.
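
Steps two and four of the workflow (score changes for security relevance, then emit a ranked queue with rationale) can be illustrated with a minimal sketch. This is not OSS Sensor's implementation: the regex patterns, the weights, the `QueueItem` type, the `score_diff` function, and the sample diff (including its file paths) are all hypothetical, chosen only to mirror the pattern categories named in the talk. The sample change is made up to have the same shape as the `value + 4` to `value * 4 + 4` fix.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns and weights, for illustration only.
# Each entry maps a category named in the talk to (regex, weight).
PATTERNS = {
    "allocation math": (
        re.compile(r"\b(malloc|calloc|realloc|reallocf|kalloc\w*)\s*\("), 3),
    "bounds check": (
        re.compile(r"\b(if|while)\s*\(.*(<=|>=|<|>).*(len|size|count)", re.I), 2),
    "entitlement gating": (re.compile(r"entitlement", re.I), 3),
    "XPC parsing": (re.compile(r"\bxpc_\w+\s*\("), 2),
    "IOKit external method": (
        re.compile(r"externalMethod|IOExternalMethodDispatch"), 3),
}


@dataclass
class QueueItem:
    path: str
    score: int = 0
    rationale: list = field(default_factory=list)


def score_diff(diff_text):
    """Score added/removed lines of a unified diff, one queue item per file."""
    items = {}
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # New-file header: remember which file the next hunks belong to.
            path = line[4:].strip()
            if path.startswith("b/"):
                path = path[2:]
            continue
        if line.startswith("--- ") or path is None:
            continue  # old-file header, or content before any file header
        if not line.startswith(("+", "-")):
            continue  # hunk headers and unchanged context lines
        body = line[1:]
        for label, (pattern, weight) in PATTERNS.items():
            if pattern.search(body):
                item = items.setdefault(path, QueueItem(path))
                item.score += weight
                item.rationale.append(f"{label}: {body.strip()}")
    # Highest score first; files with no matching lines never enter the queue.
    return sorted(items.values(), key=lambda item: item.score, reverse=True)


# Made-up diff: a missing pair of parentheses changes the allocation size.
SAMPLE_DIFF = """\
--- a/daemon/session.c
+++ b/daemon/session.c
@@ -1,3 +1,3 @@
-    fds = reallocf(fds, count + 1 * sizeof(int));
+    fds = reallocf(fds, (count + 1) * sizeof(int));
--- a/docs/README
+++ b/docs/README
@@ -1 +1 @@
-old text
+new text
"""

if __name__ == "__main__":
    for item in score_diff(SAMPLE_DIFF):
        print(item.score, item.path)
        for reason in item.rationale:
            print("   ", reason)
```

Line-level regex matching is a crude stand-in for the function-level and semantic diffs described in the talk. The point of the sketch is only that each score carries its evidence with it, so a human reviewer can validate why an item was ranked where it was.
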

The key constraints she acknowledges: hallucination risk, cost ($1,000+ in tokens is consumed quickly when feeding in entire OS subsystems), and model content restrictions on full-chain offensive research. Her fix: keep outputs bounded to bug class prediction and harness design; humans own exploitability assessment.

## Key Takeaways

- Apple's 400+ OSS repos are incomplete but still a valuable "sensor stream" of change signals
- Treat the AI as a junior analyst calling deterministic tools, not as a reasoning oracle
- Binary diffing after patches reveals exploitation targets: a bug is often fixed in one file but not in neighboring ones
- Unified logs are a high-ROI signal source: subsystem/category labels, error path reconstruction, IPC contract hints
- The fuzz planner outputs a plan (harness, seed strategy, success metrics), not an exploit
- Cost is a real constraint: feeding full OS subsystems can burn $1,000+ quickly
- Intentional manual checkpoints prevent runaway costs and catch hallucinations
- The same pipeline shortens threat detection engineering: faster identification of changed behaviors → faster monitoring updates

## Notable Quotes / Data Points

- Apple has 400+ public repos; many releases are incomplete, and headers reference missing files
- 2016 syslogd diff example: `value + 4` → `value * 4 + 4` as a heap overflow fix, found via binary diff of one changed function
- "I don't treat the model as god. I treat it as a junior junior engineer that can read fast, follow procedures, and generate structured hypotheses"
- "You hit a grand quickly when you're putting in entire subsystems of operating systems"
- "The model is not God. It's a tool-using junior analyst"
- Patrick Wardle's public work is used as a sanity-check reference against hallucinated hypotheses
- OSS Sensor is licensed GPLv3 intentionally, to honor and propagate the open-source research lineage it builds on

#unprompted #claude