Diagnose partial audiobook alignment by counting SMIL files
Diagnosing why an audio-to-text synced reader (EPUB3 Media Overlays) appears to stop syncing partway through a book or shows mysterious chapter-numbering mismatches.
When a Storyteller-style audio-to-text aligned EPUB appears to stop syncing partway through, the fast diagnostic is to count SMIL files against xhtml chapters inside the aligned EPUB:

    unzip -l aligned.epub | grep -c MediaOverlays/file
    unzip -l aligned.epub | grep -c OEBPS/file

A major shortfall (e.g. 64 SMILs vs 237 xhtmls) means the input audio covered only part of the text. This is extremely common when a multi-volume epub compilation is paired with a single-volume audiobook.

Each SMIL maps to exactly one epub chapter via <seq epub:textref="../OEBPS/fileNNNN.xhtml" epub:type="chapter">, so the highest-numbered SMIL identifies the last aligned chapter. Inside the SMIL, <par> elements pair text fragments with <audio clipBegin="NNN.NNNs" clipEnd="NNN.NNNs">, where the times are in seconds relative to per-chunk audio files. Extracting per-chapter audio offsets (for example, to embed ID3 chapter markers into the original single-file mp3) is therefore a straightforward XML parse. Because the clip times are relative to each chunk file, mapping them onto the single-file mp3 also means adding the durations of the preceding chunks, assuming the chunks are consecutive slices of that file.

Related but distinct: epub TOC labels and the chapter numbers spoken by the audiobook narrator are often two different numbering systems for the same content (e.g. web-serial semantic labels like 1.35 / 1.10 R for rewind-POV interludes vs the audiobook publisher's sequential track numbering). A 1-2 chapter offset between them usually means the audiobook prepended an intro/prologue track.
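The SMIL structure described above can be read with a few lines of Python. This is a minimal sketch, not Storyteller's own code: the function names parse_clock and chapter_clips are made up for illustration, and it assumes the standard EPUB3 Media Overlays namespaces and clip times written either as seconds with an "s" suffix (as in the note) or in colon-separated clock form.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by EPUB3 Media Overlay documents.
SMIL_NS = "http://www.w3.org/ns/SMIL"
EPUB_NS = "http://www.idpf.org/2007/ops"


def parse_clock(value):
    """Convert a clip time such as '12.345s' or '0:01:02.5' to seconds.

    Only the 's' and 'ms' suffixes and colon-separated forms are handled;
    'h' and 'min' suffixes are not covered by this sketch.
    """
    value = value.strip()
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def chapter_clips(smil_data):
    """Return (textref, clips) for one SMIL document (str or bytes).

    clips is a list of (audio_src, begin_seconds, end_seconds) tuples in
    document order; end_seconds is None when clipEnd is absent.
    """
    root = ET.fromstring(smil_data)
    # The first <seq> carries the epub:textref pointing at the chapter file.
    seq = root.find(f".//{{{SMIL_NS}}}seq")
    textref = seq.get(f"{{{EPUB_NS}}}textref") if seq is not None else None
    clips = []
    for audio in root.iter(f"{{{SMIL_NS}}}audio"):
        end = audio.get("clipEnd")
        clips.append((
            audio.get("src"),
            parse_clock(audio.get("clipBegin", "0s")),
            parse_clock(end) if end is not None else None,
        ))
    return textref, clips
```

Passing the bytes of one SMIL file read from the EPUB (for instance via zipfile.ZipFile.read) gives the chapter's textref plus its clips. The first clip's begin time and audio source give the chapter's start within that chunk file.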
For any synced-ebook diagnosis, start with the SMIL-vs-xhtml count inside the aligned EPUB: it immediately distinguishes a reader bug from incomplete audio coverage without any need to inspect the reader UI.
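The count itself can also be done without unzip and grep. The sketch below is one possible Python equivalent, under the assumption that the archive uses the same MediaOverlays/file and OEBPS/file path patterns as the grep commands in this note; the function name count_overlay_coverage is hypothetical.

```python
import zipfile


def count_overlay_coverage(epub_path,
                           smil_pattern="MediaOverlays/file",
                           text_pattern="OEBPS/file"):
    """Return (smil_count, xhtml_count, last_smil_name) for an aligned EPUB.

    Matching is by substring, mirroring the grep -c commands; adjust the
    patterns if the archive uses a different layout.
    """
    with zipfile.ZipFile(epub_path) as zf:
        names = zf.namelist()
    smils = [n for n in names if smil_pattern in n]
    texts = [n for n in names if text_pattern in n]
    # Lexicographic max picks the highest-numbered SMIL only when the
    # numbers are zero-padded to the same width (e.g. file0063).
    last_smil = max(smils) if smils else None
    return len(smils), len(texts), last_smil
```

A result such as (64, 237, ...) points to incomplete audio coverage rather than a reader bug, and the third value names the last chapter that received an overlay.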