Backfill new mailbox by cross-joining existing CRM contacts
Adding a new email/IMAP account to a personal CRM that already has many known contacts with recorded email addresses, and ingesting the historical correspondence efficiently.
When connecting a new mailbox to a CRM-style ingest, the three obvious patterns are each wrong or incomplete:
(a) Full historical sync of every message in the new mailbox wastes IMAP bandwidth and storage on mail that matches no known contact.
(b) Forward-only (skip backfill and ingest only new mail) misses years of correspondence with contacts the CRM already knows.
(c) Lazy-on-add (fetch when a new contact is created later) does not help the cohort of contacts that already exists.

The right pattern is a one-shot bulk reseed at account-add time: enumerate every (person, email-link) pair already recorded in the CRM and enqueue one IMAP SEARCH FROM/TO per pair, scoped to the new account. The agent then drains the queue on its normal poll cycle.

Concretely, in production: a new mailbox with ~400 total messages produced 7 pull requests for the 3 contacts who had any email link recorded, and these fetched 97 historical conversations cleanly. Every conversation was attributed to the right person, because each search was already keyed by that person's email address.
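A minimal Python sketch of the reseed, assuming the CRM can hand over its email links as (person_id, email) tuples and that the mail client is an `imaplib.IMAP4`-style object with a folder already selected (IMAP SEARCH works per folder, so sent mail needs its own pass over the Sent folder). `SearchJob`, `reseed_jobs`, `imap_criteria`, `drain`, and the per-cycle `limit` are hypothetical names chosen for illustration, not the production system's code. Each pair becomes one `OR FROM ... TO ...` search in RFC 3501 syntax, and hits are recorded under the person the search was built for.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SearchJob:
    """One queued IMAP search: all mail to or from one address of one contact."""
    account_id: str  # the newly added mailbox; every search is scoped to it
    person_id: str   # CRM contact that matching messages are attributed to
    email: str       # address already recorded on that contact


def reseed_jobs(account_id: str, email_links: Iterable[tuple[str, str]]) -> list[SearchJob]:
    """Cross-join the new account with every (person, email-link) pair in the CRM."""
    seen: set[tuple[str, str]] = set()
    jobs: list[SearchJob] = []
    for person_id, email in email_links:
        key = (person_id, email.strip().lower())
        if key not in seen:  # skip duplicate links on the same contact
            seen.add(key)
            jobs.append(SearchJob(account_id, person_id, key[1]))
    return jobs


def imap_criteria(email: str) -> str:
    """Build an IMAP SEARCH key matching messages sent from or to `email`."""
    quoted = email.replace("\\", "\\\\").replace('"', '\\"')
    return f'(OR FROM "{quoted}" TO "{quoted}")'


def drain(queue: deque, client, attributions: dict[str, list[bytes]], limit: int = 10) -> int:
    """Run up to `limit` queued searches in one poll cycle; return how many succeeded.

    `client` is assumed to be imaplib.IMAP4-like with a folder already SELECTed:
    search() returns (status, [b"space-separated message numbers"]).
    """
    done = 0
    for _ in range(min(limit, len(queue))):
        job = queue.popleft()
        status, data = client.search(None, imap_criteria(job.email))
        if status != "OK":
            queue.append(job)  # leave it for a later poll cycle
            continue
        # The search was keyed by this contact's address, so attribution is direct.
        attributions.setdefault(job.person_id, []).extend(data[0].split())
        done += 1
    return done
```

One caveat: RFC 3501 defines FROM/TO matching as a substring match on the header field, so a search for one address can also hit a longer address that contains it. A production version may want to re-check the parsed header before attributing a message.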
When designing data-source onboarding for a CRM, plan four backfill modes from day one: forward-only (default), lazy-on-add-link, bulk-reseed-at-source-add, and explicit full-mailbox. The bulk-reseed mode is the one that ages best because the value it produces grows with the number of contacts in the CRM.
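The four modes can be sketched as flags on each connected account, with one handler per trigger event. This is a hypothetical Python layout, not the CRM's actual schema: `BackfillMode`, `BackfillJob`, `on_account_added`, and `on_link_added` are illustrative names, and a job with no person or email stands for "sync the whole mailbox".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional, Set, Tuple


class BackfillMode(Enum):
    FORWARD_ONLY = "forward-only"              # default: no backfill; the poll picks up new mail
    LAZY_ON_ADD_LINK = "lazy-on-add-link"      # search accounts when a contact gains an email
    BULK_RESEED = "bulk-reseed-at-source-add"  # cross-join all known links when an account is added
    FULL_MAILBOX = "explicit-full-mailbox"     # user explicitly opted in to every message


@dataclass(frozen=True)
class BackfillJob:
    account_id: str
    person_id: Optional[str] = None  # None together with email=None means "whole mailbox"
    email: Optional[str] = None


def on_account_added(account_id: str, modes: Set[BackfillMode],
                     email_links: Iterable[Tuple[str, str]]) -> List[BackfillJob]:
    """Jobs to enqueue when a new mailbox is connected."""
    if BackfillMode.FULL_MAILBOX in modes:
        # A full sync already covers every contact; per-address searches would be redundant.
        return [BackfillJob(account_id)]
    if BackfillMode.BULK_RESEED in modes:
        return [BackfillJob(account_id, p, e) for p, e in email_links]
    return []  # forward-only and lazy-on-add-link enqueue nothing at account-add time


def on_link_added(person_id: str, email: str,
                  modes_by_account: Dict[str, Set[BackfillMode]]) -> List[BackfillJob]:
    """Jobs to enqueue when an existing contact gains a new email address."""
    return [BackfillJob(acct, person_id, email)
            for acct, modes in modes_by_account.items()
            if BackfillMode.LAZY_ON_ADD_LINK in modes]
```

Modelling the modes as a set rather than a single choice lets bulk reseed and lazy-on-add-link run together: the reseed covers the cohort that exists when the account is added, and the lazy path covers email links recorded afterwards.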