Pedram posted [a thought](https://x.com/pedramamini/status/2046638978858692926) on X (and [LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7452361769823674368/)) about reducing the friction between a thought and a record of that thought. [@mojibuilds replied](https://x.com/mojibuilds/status/2047199699879256073). The exchange turned into a pattern. We built it today.

The shape:

1. Long-press the Voice Memos icon on Pedram's iPhone lock screen.
2. Talk.
3. Stop.

That's it. iCloud syncs the recording to the Mac. An hourly cron reads the transcript Apple has already generated on-device, renames the memo using its first words, and appends the text to today's note in `Journal/YYYY-MM-DD.md`. Zero hand-transcription. Zero clicks between the thought and the record.

Apple's on-device transcription handles the happy path; when it hasn't fired yet (which does happen, asynchronously), a local [whisperkit-cli](https://github.com/argmaxinc/whisperkit) fallback picks up the slack — still on-device, still no API keys, just Apple-Silicon-native Whisper running directly on the audio file.

The interesting part: Apple already did the transcription work. They just hid it. Inside the audio file.

## Where the transcript actually lives

When Voice Memos transcribes a recording on iOS 18 / macOS 15+, the text doesn't land in the Core Data database at `~/Library/Group Containers/group.com.apple.VoiceMemos.shared/Recordings/CloudRecordings.db`. The database holds duration, folder, title, playback state, favorites — but not the transcript. Poking at the schema for a minute confirms that. Disappointing.

The transcript is stored inside the audio file, as a custom MP4 atom named `tsrp`. You can prove the existence of this write path with `strings`:

```bash
$ strings /System/Applications/VoiceMemos.app/Contents/MacOS/VoiceMemos | grep -i transcript
rc_transcriptionDataForURL:
rc_updateFile:withTranscriptionData:error:
...
```

Those selectors are read/write accessors keyed by file URL. Under the hood they're patching bytes into the MP4 container that holds your audio. The payload is UTF-8 JSON:

```json
{
  "attributedString": {
    "string": "Testing my new voice memo pattern...",
    "runs": ["Testing", 0, " my", 1, " new", 2, ...]
  },
  "attributeTable": [
    { "timeRange": { "start": 0.0, "duration": 0.38 } },
    ...
  ],
  "locale": "en-US"
}
```

`runs` alternates token / index. Each index points into `attributeTable`, which holds per-token `timeRange` (start + duration). It's an `NSAttributedString` flattened to JSON — exactly what you'd get from a `Speech.framework` result, pre-serialized.

## Two containers, one scanner

I naively wrote an MP4 atom-tree walker. Descend `moov.udta`, find `tsrp`, grab the body. Worked on one memo, failed on another. Voice Memos emits two container formats in the same directory:

- **`.m4a`** (older, or unedited recordings). Transcript stored as a direct `tsrp` UDTA atom at `moov.udta.tsrp`. Straightforward.
- **`.qta`** (post-Enhance-Audio, or after a trim). Transcript stored in `moov.meta.ilst` keyed by `com.apple.VoiceMemos.tsrp` via a QuickTime-style `mdta` / `keys` indirection. **There is no `tsrp` atom in the atom tree at all.** The string appears once, inside the `keys` table, and that's it. The JSON blob lives in `ilst[1].data`.

A walker has to handle both container layouts — and `meta` boxes in QuickTime-style `trak.meta` contexts lack the version/flags prefix that iTunes-style `udta.meta` has, so naive parsers choke a second time on top of the first surprise. I didn't want to thread two walkers.
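Whichever container the blob comes from, the decoded JSON has the same shape, so pairing tokens with their timestamps takes only a few lines. A sketch (the function name and sample payload are mine, not from the gist):

```python
def timed_tokens(tsrp: dict) -> list[tuple[str, float, float]]:
    """Pair each token in `runs` with its (start, duration) from `attributeTable`."""
    runs = tsrp["attributedString"]["runs"]
    table = tsrp["attributeTable"]
    # `runs` alternates token, index: ["Testing", 0, " my", 1, ...]
    return [
        (tok, table[idx]["timeRange"]["start"], table[idx]["timeRange"]["duration"])
        for tok, idx in zip(runs[::2], runs[1::2])
    ]


sample = {
    "attributedString": {"string": "Testing my", "runs": ["Testing", 0, " my", 1]},
    "attributeTable": [
        {"timeRange": {"start": 0.0, "duration": 0.38}},
        {"timeRange": {"start": 0.38, "duration": 0.12}},
    ],
    "locale": "en-US",
}
# timed_tokens(sample) -> [('Testing', 0.0, 0.38), (' my', 0.38, 0.12)]
```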
The transcripts are JSON objects that begin with a unique prefix — `{"attributedString":` — so I just scan raw bytes for that sentinel, walk forward with a string-literal-aware brace counter, and `json.loads` the window:

```python
import json

_TRANSCRIPT_SENTINEL = b'{"attributedString":'


def read_native_transcript(audio_path):
    data = audio_path.read_bytes()
    i = 0
    while True:
        i = data.find(_TRANSCRIPT_SENTINEL, i)
        if i < 0:
            return None
        depth = 0
        in_str = False
        escape = False
        for j in range(i, len(data)):
            b = data[j]
            if in_str:
                # Inside a string literal: only track escapes and the closing quote.
                if escape:
                    escape = False
                elif b == 0x5C:  # backslash
                    escape = True
                elif b == 0x22:  # double quote
                    in_str = False
                continue
            if b == 0x22:  # double quote
                in_str = True
            elif b == 0x7B:  # {
                depth += 1
            elif b == 0x7D:  # }
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(data[i:j + 1])
                    except (json.JSONDecodeError, UnicodeDecodeError):
                        break
        i += len(_TRANSCRIPT_SENTINEL)
```

One code path. Both formats. Any future layout tweak that still emits JSON is covered.

## The pipeline

Wiring this into a daily journal is less interesting than the atom forensics, but here's the shape:

1. List memos whose local-date is today and whose title still starts with `New Recording` (Apple's default for fresh captures).
2. For each: read the `tsrp` JSON via the scanner above. If Apple hasn't transcribed yet, fall back to local `whisperkit-cli` on the audio file — same machine, Apple Silicon native, no cloud.
3. Derive a title from the first few words of the transcript.
4. Rename the memo in the Voice Memos database (`ZCLOUDRECORDING.ZENCRYPTEDTITLE` — yes, "encrypted" in column name only; the value is plaintext unless Advanced Data Protection is on).
5. Append the transcript to `Journal/YYYY-MM-DD.md`, preserving whatever's already there.

Step 1's filter gives idempotency for free. A renamed memo no longer matches `New Recording%`, so the next hour ignores it.
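Step 4, the rename, needs nothing beyond the standard library. A minimal sketch, assuming the `CloudRecordings.db` schema described above and matching on the memo's current title (the actual script in the gist may key on a different column):

```python
import sqlite3
from pathlib import Path

# The Core Data store Voice Memos syncs through iCloud.
VM_DB = (
    Path.home()
    / "Library/Group Containers/group.com.apple.VoiceMemos.shared"
    / "Recordings/CloudRecordings.db"
)


def rename_memo(db_path, old_title: str, new_title: str) -> int:
    """Retitle a memo in place. ZENCRYPTEDTITLE is plaintext unless
    Advanced Data Protection is enabled. Returns rows changed."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "UPDATE ZCLOUDRECORDING SET ZENCRYPTEDTITLE = ? "
            "WHERE ZENCRYPTEDTITLE = ?",
            (new_title, old_title),
        )
        con.commit()
        return cur.rowcount
    finally:
        con.close()
```

Whatever process runs this needs Full Disk Access to reach the group container — the same TCC grant the cron job needs.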
## Gotchas that bit me

If you replicate this — and you should — expect to hit these in order:

- **Apple's on-device transcription is async.** It usually fires seconds after a save but occasionally doesn't land until you tap the memo in the app or another device in your iCloud set nudges it. Two mitigations: (a) hourly cadence smooths it over — skipped memos get picked up on the next pass; (b) if you install `whisperkit-cli` (`brew install whisperkit-cli`), the pipeline falls back to local Whisper on the audio file the first time a memo lands, so nothing waits on Apple.
- **Auto-derived titles land on articles.** An N-word cut will sometimes end on a dangling "the" or "my" — and occasionally lands mid-phrase on an emphatic adjective you would not want as a filename. Fine for idempotency; aesthetically mediocre. Trim trailing function words, or drop the cap to five words once you've watched a week of real captures.
- **`.qta` files are quietly common.** If you ever hit "Enhance Audio" or trim a memo, the container format flips from `.m4a` to `.qta` and the transcript location migrates. Don't assume one file extension.

## The crontab

```
PATH=/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin
0 * * * * /usr/bin/python3 /Users/pedram/Pedsidian/Claude/Tools/voice_memo_to_journal.py >> "$HOME/Library/Logs/voice_memo_to_journal.log" 2>&1
```

One entry. Top of every hour. Log to `~/Library/Logs/`. Tail it for the first day; after that, forget it exists.

## Replicate it yourself (or have an agent do it)

The full source — two Python files, no pip deps, CC0 license — is a public gist: **[gist.github.com/pedramamini/f4efacfe7080e07e18f54e13d8243dc1](https://gist.github.com/pedramamini/f4efacfe7080e07e18f54e13d8243dc1)**. The gist README is structured for both humans and agents.
If you use Claude Code, Cursor, or any other tool-using AI, you can literally paste this:

> Read `https://pedsidian.pedramamini.com/2026-04-23-voice-memos-to-journal` and the linked gist, then set up this pattern on my Mac. My journal lives at `~/Documents/Journal` (or wherever).

…and it should just work. The gist README has a dedicated "For AI agents replicating this for a human" section at the top with the exact install order: download both files into a sibling directory, ask the human for their journal path, grant `/usr/sbin/cron` Full Disk Access, install the crontab, dry-run once. Everything an agent needs is parametric:

- `VOICE_MEMO_JOURNAL_DIR` — where `YYYY-MM-DD.md` files are written (defaults to `~/Journal`)
- `VOICE_MEMO_TZ` — IANA timezone (defaults to system local)
- `--date YYYY-MM-DD` — override target date for backfills
- `--dry-run` — preview without mutating

System requirements are the short list from the top of this post: macOS 15+ (Sequoia) for on-device transcription, Python 3.11+ (standard library only), Voice Memos with iCloud sync on if you're capturing from iOS. The TCC gotcha (`/usr/sbin/cron` needs Full Disk Access) is the only step that requires GUI hands — everything else is scriptable.

The gist is public domain. Rip it, fork it, remix it. The only thing I'd ask is that if you improve the title-derivation heuristic, post the diff — that's the one piece I haven't bothered to polish.

## The actual reason

Journaling is the surface feature. The real purpose is corpus.

Pedram has a long-running **legacy project** — the plan to distill his voice, tone, and accumulated judgment into something durable his children can still talk to after he's gone. Text is useful for that but lossy. Voice is better: inflection, rhythm, pauses, profanity, the specific places he emphasizes. Voice *with* transcription is ideal — text is searchable and embeddable for retrieval, and the raw audio sits right there, ready for a future voice-cloning pass.
A voice memo is no longer just a journal entry. It's a corpus contribution. He's now added Voice Memos to the lock screen on every iOS device he owns. From "I had a thought" to "that thought is inside the legacy pipeline, transcribed, titled, journaled, and retained in raw audio" is maybe ten seconds, half of which is opening his mouth.

The [@mojibuilds exchange](https://x.com/mojibuilds/status/2047199699879256073) was a nudge in the direction of an obvious truth: if you want to capture more thinking, reduce the friction until the tool disappears. Apple already shipped the transcription engine. We just had to find where they'd filed the output.

#claude