Audiobook sync tools chunk by time, not chapter
Whether you can do incremental/partial alignment of audio to text in a self-hosted audiobook reader
Storyteller-style audiobook sync pipelines split the source audio into fixed-duration chunks (~120 min each, via ffmpeg) and run whisper.cpp on each chunk. Crucially, the chunk boundaries are NOT chapter-aligned: a single text chapter can straddle two audio chunks, with the last sentence of Ch N landing at the start of the next chunk. Practical implication: you cannot do a partial/progressive alignment by waiting for the first 2-3 chunks to transcribe and then running sync. The chunks-to-chapters mapping only becomes clean once ALL transcriptions are done and the full alignment pass runs. That pass produces SMIL media-overlay files per chapter, and a single chapter's overlay sometimes draws audio segments from multiple chunk files. Sync overwrites the aligned EPUB on each run, so a failed partial sync also destroys whatever working state you had.
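The straddling problem can be illustrated with a small sketch. This is not Storyteller's code: the function names and the chapter timings below are hypothetical, and in a real pipeline the chapter spans are only known after the full alignment pass has run. It assumes only the fixed ~120 min chunk length described above.

```python
import math

CHUNK_SECONDS = 120 * 60  # assumed fixed chunk length (~120 min)


def chunks_for_span(start, end, chunk_seconds=CHUNK_SECONDS):
    """Return the 0-based indices of every chunk overlapped by [start, end)."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    first = int(start // chunk_seconds)
    # The span is half-open, so a span ending exactly on a boundary
    # does not count as touching the following chunk.
    last = math.ceil(end / chunk_seconds) - 1
    return list(range(first, last + 1))


# Hypothetical chapter spans, in seconds from the start of the recording.
CHAPTERS = {
    "ch1": (0, 3000),
    "ch2": (3000, 7350),   # crosses the 7200 s chunk boundary
    "ch3": (7350, 10000),
}


def chapters_needing_multiple_chunks(chapters):
    """List chapters whose audio is spread over more than one chunk."""
    return [
        name
        for name, (start, end) in chapters.items()
        if len(chunks_for_span(start, end)) > 1
    ]
```

With these made-up timings, "ch2" needs both chunk 0 and chunk 1, so its overlay could not be built from the first chunk's transcription alone.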
If a user asks whether they can read early during a multi-hour transcription job, do not promise progressive alignment. Instead, suggest reading the imported EPUB without sync (which works immediately) and playing the raw audio file in a separate player for the audiobook portion.