Speaker:: Olivia Gallucci
Title:: macOS Vulnerability Research
Duration:: 21 min
Video:: https://www.youtube.com/watch?v=_f30RyXc_8Q
## Key Thesis
Apple's partial open-source releases can be treated as a continuous sensor stream of change signals, and AI agents can turn that stream into a prioritized vulnerability research queue — but only when the model is treated as a junior analyst that calls deterministic tools rather than as an oracle that invents conclusions. The workflow covers diff-to-hypothesis-to-harness and is explicitly cost-constrained and designed for individual researchers.
## Synopsis
Gallucci (Datadog) introduces OSS Sensor, a tool she published that generates a prioritized, evidence-backed research queue from Apple OSS diffs, basic binary features, and log templates. She strongly recommends using the non-AI version as the base and integrating your own AI workflow on top, because her own workflow includes intentional manual intervention steps and plug-and-play AI integration gets expensive fast.
She addresses the common misconception about Apple's openness: Apple has 400+ public repos but releases are incomplete and often outdated. Headers reference files that don't exist. Core Foundation, CFNetwork, iOS-specific drivers, and CoreCrypto (with additional redistribution constraints) are missing or only partially available. Despite this, the partial source is still valuable as a map — not the territory, but landmarks are real and can guide binary analysis.
The practical workflow is "diff to hypothesis to harness." Step one: ingest Apple OSS tags and tarballs, compute diffs at function and semantic level. Step two: score changes for security relevance — patterns like allocation math, bound checks, entitlement gating, XPC parsing, IOKit external method table changes. Step three: correlate with binaries using tools like `strings`, `otool`, `nm`, and `class-dump`, plus Apple distribution manifests to cross-reference components per release. Step four: emit a ranked queue with rationale and human validation pointers.
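Steps two and four can be sketched as a small scorer over a unified diff. This is a minimal illustration, not OSS Sensor's implementation: the signal names, regexes, weights, and the sample path `src/daemon.c` are all assumptions chosen to mirror the patterns listed above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical heuristics: names, regexes, and weights are illustrative only.
SIGNALS = {
    "allocation_math": (re.compile(r"\b(?:malloc|calloc|realloc|reallocf|kalloc\w*)\s*\("), 3),
    "bounds_check": (re.compile(r"\bif\s*\(.*[<>].*(?:len|size|count)", re.I), 2),
    "entitlement_gate": (re.compile(r"entitlement", re.I), 2),
    "xpc_parsing": (re.compile(r"\bxpc_(?:dictionary|array)_get_\w+"), 2),
    "iokit_external_method": (re.compile(r"\b(?:externalMethod|IOExternalMethodDispatch)"), 3),
}


@dataclass
class ScoredChange:
    path: str
    score: int
    reasons: list[str]  # signal names, kept as the rationale for a human reviewer


def score_diff(diff_text: str) -> list[ScoredChange]:
    """Rank files in a unified diff by which signals their changed lines hit."""
    hits_by_file: dict[str, dict[str, int]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            current = line[4:]
            if current.startswith("b/"):
                current = current[2:]
            hits_by_file.setdefault(current, {})
            continue
        if current is None or line.startswith("--- "):
            continue
        if not line.startswith(("+", "-")):
            continue  # context lines and hunk headers carry no change
        body = line[1:]
        for name, (pattern, weight) in SIGNALS.items():
            if pattern.search(body):
                hits_by_file[current][name] = weight  # count each signal once per file
    ranked = [
        ScoredChange(path, sum(hits.values()), sorted(hits))
        for path, hits in hits_by_file.items()
        if hits
    ]
    return sorted(ranked, key=lambda change: change.score, reverse=True)


SAMPLE_DIFF = """\
--- a/src/daemon.c
+++ b/src/daemon.c
@@ -10,3 +10,3 @@
-    fds = reallocf(fds, count + 1 * sizeof(int));
+    fds = reallocf(fds, (count + 1) * sizeof(int));
--- a/README
+++ b/README
@@ -1 +1 @@
-old
+new
"""

for change in score_diff(SAMPLE_DIFF):
    print(change.path, change.score, change.reasons)
```

Keeping the matched signal names alongside the score gives each queue entry a rationale a reviewer can check, rather than a bare number.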
She cites a canonical example of the technique: in 2016, the researcher known as osxreverser diffed syslogd between OS X 10.11.2 and 10.11.3, found a single changed function in which `value + 4` became `value × 4 + 4` (the fix for a heap overflow), and traced it back via a nearby log string. This pattern is the core research methodology: surgical binary diffing to find patches that reveal prior vulnerabilities, which may remain unpatched in neighboring files.
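The arithmetic behind that one-line change is easy to check. The sketch below assumes the buffer is meant to hold `value + 1` four-byte entries, which is an interpretation of the summary rather than something it states; the function names are hypothetical.

```python
ENTRY_SIZE = 4  # assumed size in bytes of one stored entry


def size_before_patch(value: int) -> int:
    # Mirrors `value + 4`: the entry size scales only the constant, not the count.
    return value + 1 * ENTRY_SIZE


def size_after_patch(value: int) -> int:
    # Mirrors `value * 4 + 4`, which equals (value + 1) * ENTRY_SIZE.
    return value * ENTRY_SIZE + ENTRY_SIZE


def bytes_required(value: int) -> int:
    # Space needed to store value + 1 entries.
    return (value + 1) * ENTRY_SIZE


for value in range(4):
    shortfall = bytes_required(value) - size_before_patch(value)
    print(f"value={value} before={size_before_patch(value)} "
          f"after={size_after_patch(value)} shortfall={shortfall}")
```

Under that assumption the pre-patch size is too small by `3 * value` bytes, so the shortfall grows with every entry added, while the post-patch size always matches what is required.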
For AI's role, Gallucci is clear: the model does not reason about exploitability. It calls deterministic tools (strings, otool, class-dump, log queries), parses the structured output, retrieves relevant OSS source context, and produces bounded conclusions with citations. She uses a pipeline of specialized agents: a retriever (embedding-based indexing of OSS code, symbols, and log templates), a toolbox of deterministic analyzers, and separate agents for diff triage, re-context, hypothesizing, planning, and analysis.
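One way to picture a deterministic tool in that toolbox is a thin wrapper that runs `nm` and returns structured records for an agent to cite. This is a sketch under assumptions: the `Symbol` type, `parse_nm`, `run_nm`, and `diff_symbols` are hypothetical names, and the parser only handles the default three-column and two-column `nm` line shapes.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: str               # nm type letter, e.g. "T", "t", "U"
    address: Optional[int]  # None for undefined symbols


def parse_nm(output: str) -> list[Symbol]:
    """Parse default nm text output into records; unrecognized lines are skipped."""
    symbols = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:    # "<hex address> <type> <name>"
            address, kind, name = parts
            symbols.append(Symbol(name, kind, int(address, 16)))
        elif len(parts) == 2:  # undefined symbols have no address column
            kind, name = parts
            symbols.append(Symbol(name, kind, None))
    return symbols


def run_nm(binary_path: str) -> list[Symbol]:
    """Run nm on a binary. Requires nm on PATH; not exercised in the demo below."""
    done = subprocess.run(["nm", binary_path], capture_output=True, text=True, check=True)
    return parse_nm(done.stdout)


def diff_symbols(old: list[Symbol], new: list[Symbol]) -> dict[str, list[str]]:
    """Report symbol names added or removed between two builds."""
    old_names = {s.name for s in old}
    new_names = {s.name for s in new}
    return {"added": sorted(new_names - old_names), "removed": sorted(old_names - new_names)}


OLD_BUILD = """\
0000000100003f10 T _main
                 U _malloc
"""
NEW_BUILD = """\
0000000100003f10 T _main
0000000100003e80 t _check_length
                 U _malloc
"""

print(diff_symbols(parse_nm(OLD_BUILD), parse_nm(NEW_BUILD)))
```

The model only sees the parsed records, so every claim it makes about a symbol can be traced to a tool result instead of to its own recollection.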
Unified logs are the second signal beyond OSS diffs and binaries. They provide subsystem/category labels that map back to components, stable message strings for binary hunting, and execution hints for reconstructing data flow without full source. She treats logs as a pivot for attack surface identification, not as doctrine — they're noisy, but reveal which components actually handled an input, which error paths fired, and what IPC contracts exist.
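A rough sketch of that pivot: group exported log events by subsystem and category, and collect the message templates to search for in binaries. It assumes newline-delimited JSON such as `log show --style ndjson` produces, with `subsystem`, `category`, `formatString`, and `eventMessage` keys; confirm the field names on your macOS version. The `com.example.*` subsystems are made up.

```python
import json
from collections import Counter, defaultdict


def index_log_events(ndjson_text):
    """Return (event counts, message templates) keyed by (subsystem, category)."""
    counts = Counter()
    templates = defaultdict(set)
    for raw in ndjson_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # logs are noisy; skip anything that is not valid JSON
        if not isinstance(event, dict):
            continue
        key = (event.get("subsystem") or "", event.get("category") or "")
        if not key[0]:
            continue  # unlabeled events cannot be mapped back to a component
        counts[key] += 1
        # Prefer the stable format string; fall back to the rendered message.
        template = event.get("formatString") or event.get("eventMessage")
        if template:
            templates[key].add(template)
    return counts, templates


SAMPLE_EVENTS = "\n".join(json.dumps(event) for event in [
    {"subsystem": "com.example.parser", "category": "decode",
     "formatString": "bad length %d", "eventMessage": "bad length 70000"},
    {"subsystem": "com.example.parser", "category": "decode",
     "formatString": "bad length %d", "eventMessage": "bad length 12"},
    {"subsystem": "com.example.ipc", "category": "xpc",
     "eventMessage": "connection invalidated"},
    {"subsystem": "", "category": "", "eventMessage": "no label"},
])

counts, templates = index_log_events(SAMPLE_EVENTS)
for key, count in counts.most_common():
    print(key, count, sorted(templates[key]))
```

The collected templates double as search strings for `strings` output, which is how a log line can lead back to the function that emitted it.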
The fuzz planner agent produces: enumerated candidate services (syscall handlers, IOKit user clients, XPC services, parsers), a minimal test harness with invocation path and entitlement requirements, a seed strategy from extracted dictionaries and log-observed parameters, and success metrics (crash bucketing, sanitizer signals, unique stack traces). It does not produce an exploit.
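The shape of such a plan, and the crash-bucketing metric, can be illustrated with a few lines of Python. Every field name, the sample target, and the top-three-frames bucketing rule are assumptions for illustration; the talk does not specify a schema.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class FuzzTarget:
    name: str
    surface: str        # e.g. "xpc_service", "iokit_user_client", "syscall", "parser"
    invocation: str     # how the harness reaches the target
    entitlements: list[str] = field(default_factory=list)
    seed_tokens: list[str] = field(default_factory=list)  # from dictionaries and logs


def crash_bucket(stack_frames: list[str], depth: int = 3) -> str:
    """Bucket a crash by a short hash of its top `depth` stack frames."""
    top = "|".join(stack_frames[:depth])
    return hashlib.sha256(top.encode("utf-8")).hexdigest()[:12]


def bucket_crashes(crashes: list[list[str]]) -> dict[str, list[list[str]]]:
    """Group crash stacks so repeated hits on one bug count once."""
    buckets: dict[str, list[list[str]]] = {}
    for frames in crashes:
        buckets.setdefault(crash_bucket(frames), []).append(frames)
    return buckets


target = FuzzTarget(
    name="example_decoder",  # hypothetical target
    surface="parser",
    invocation="call decode() on a file-backed buffer",
    seed_tokens=["bad length %d"],
)

crashes = [
    ["decode_header", "decode", "main"],
    ["decode_header", "decode", "main", "start"],  # same top three frames
    ["copy_payload", "decode", "main"],
]
print(target.name, len(bucket_crashes(crashes)), "unique buckets")
```

The plan stops at targets, seeds, and metrics, which matches the boundary drawn above: it describes how to look for a bug, not how to exploit one.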
The key constraints she acknowledges: hallucination risk, cost ($1,000+ in token spend accrues quickly when feeding in entire OS subsystems), and model content restrictions on full-chain offensive research. Her fix: keep outputs bounded to bug class prediction and harness design; humans own exploitability assessment.
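A manual checkpoint on spend can be as simple as estimating cost before each model call and refusing to proceed past a limit without human sign-off. The sketch below is an assumption-laden illustration: the four-characters-per-token ratio is a rough heuristic, the price is a placeholder, and `BudgetGate` is a hypothetical helper.

```python
def estimate_cost_usd(char_count, usd_per_million_tokens, chars_per_token=4.0):
    """Rough pre-flight cost estimate; both ratios are caller-supplied assumptions."""
    tokens = char_count / chars_per_token
    return tokens / 1_000_000 * usd_per_million_tokens


class BudgetGate:
    """Tracks estimated spend and blocks requests that would exceed the limit."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def approve(self, cost_usd):
        if self.spent_usd + cost_usd > self.limit_usd:
            return False  # stop here and ask a human before continuing
        self.spent_usd += cost_usd
        return True


gate = BudgetGate(limit_usd=5.0)
subsystem_chars = 4_000_000  # placeholder size for one subsystem's source
cost = estimate_cost_usd(subsystem_chars, usd_per_million_tokens=3.0)
print(gate.approve(cost), gate.approve(cost), gate.spent_usd)
```

A blocked request is the cue to narrow the context, for example to only the functions a diff touched, instead of sending the whole subsystem again.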
## Key Takeaways
- Apple's 400+ OSS repos are incomplete but still a valuable "sensor stream" of change signals
- Treat the AI as a junior analyst calling deterministic tools, not as a reasoning oracle
- Binary diffing after patches reveals exploitation targets — bugs often fixed in one file but not neighboring ones
- Unified logs are a high-ROI signal source: subsystem/category labels, error path reconstruction, IPC contract hints
- Fuzz planner outputs a plan (harness, seed strategy, success metrics) — not an exploit
- Cost is a real constraint: feeding full OS subsystems can burn $1,000+ quickly
- Intentional manual checkpoints prevent runaway costs and catch hallucinations
- Same pipeline shortens threat detection engineering: faster identification of changed behaviors → faster monitoring updates
## Notable Quotes / Data Points
- Apple has 400+ public repos; many releases incomplete, headers reference missing files
- 2016 syslogd diff example: `value + 4` → `value × 4 + 4` as heap overflow mitigation, found via binary diff of one changed function
- "I don't treat the model as god. I treat it as a junior junior engineer that can read fast, follow procedures, and generate structured hypotheses"
- "You hit a grand quickly when you're putting in entire subsystems of operating systems"
- "The model is not God. It's a tool-using junior analyst"
- Patrick Wardle's public work used as sanity-check reference against hallucinated hypotheses
- OSS Sensor licensed GPLv3 intentionally, to honor and propagate the open-source research lineage it builds on
#unprompted #claude