per-provider link kinds vs canonical identifier kinds
Fixing a contact-routing bug where email addresses got stored under per-provider keys instead of a canonical kind
When a data model conflates two axes (where a record was observed vs. what kind of identifier it is), bugs become systematic and asymmetric. Example: Gmail-observed addresses got stored under links.outlook because they first arrived via the Outlook IMAP mailbox; later Gmail messages from the same address missed the lookup because the Gmail path searched only links.email. Adding another enum variant per provider just multiplies the kinds every read path has to know about.

The right fix decouples the axes: introduce a canonical link kind (here, email for all email-like platforms), make the write path canonicalize, and keep the read path permissive (look up all legacy kinds too) so the migration can happen without downtime. A one-shot script then walks the data store and folds the legacy kinds into the canonical one.
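A minimal in-memory sketch of the three pieces above (canonicalizing write, permissive read, one-shot fold), assuming the store can be modeled as `links[kind][identifier] -> contact id`. `LinkStore`, `canonical_kind`, `migrate`, the kind constants, and the "existing canonical row wins" conflict policy are all hypothetical illustrations, not the actual codebase:

```python
from collections import defaultdict

# Hypothetical kind names: "email" is the canonical kind; "outlook" and
# "gmail" are the legacy per-provider kinds written before the fix.
CANONICAL_EMAIL = "email"
LEGACY_EMAIL_KINDS = ("outlook", "gmail")
EMAIL_LIKE_KINDS = (CANONICAL_EMAIL,) + LEGACY_EMAIL_KINDS


def canonical_kind(kind: str) -> str:
    """Map any email-like kind, canonical or legacy, to the canonical one."""
    return CANONICAL_EMAIL if kind in EMAIL_LIKE_KINDS else kind


class LinkStore:
    """In-memory stand-in for the data store: links[kind][identifier] -> contact id."""

    def __init__(self) -> None:
        self.links = defaultdict(dict)

    def write(self, kind: str, identifier: str, contact_id: str) -> None:
        # Write path: always canonicalize, so the mailbox an address was
        # observed in can never pick the storage key.
        self.links[canonical_kind(kind)][identifier] = contact_id

    def lookup(self, kind: str, identifier: str):
        # Read path: permissive. For email-like kinds, check the canonical
        # kind first, then every legacy kind, so rows written before the
        # migration are still found.
        if canonical_kind(kind) == CANONICAL_EMAIL:
            candidates = EMAIL_LIKE_KINDS
        else:
            candidates = (kind,)
        for k in candidates:
            contact_id = self.links.get(k, {}).get(identifier)
            if contact_id is not None:
                return contact_id
        return None


def migrate(store: LinkStore) -> int:
    """One-shot fold of legacy kinds into the canonical kind.

    Returns the number of legacy rows removed; a second run is a no-op.
    """
    folded = 0
    for legacy in LEGACY_EMAIL_KINDS:
        for identifier, contact_id in store.links.pop(legacy, {}).items():
            # Assumed conflict policy: an existing canonical row wins.
            store.links[CANONICAL_EMAIL].setdefault(identifier, contact_id)
            folded += 1
    return folded
```

For example, with a pre-fix row planted at `store.links["outlook"]["ann@example.com"]`, `store.lookup("gmail", "ann@example.com")` still finds it before the migration, and `migrate(store)` then moves it under `email`. Once the fold has run everywhere, the legacy entries in the read path's candidate list can be dropped.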
When you see a validation set that contains both a generic kind and per-source variants of it (email, outlook, gmail), pause: that's a sign someone derived the storage key from the observation context. Audit the write path; if it picks the key from a provider or source string instead of from the identifier itself, the asymmetric-lookup bug is already latent. Cheap to fix early.
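One cheap way to run that audit, sketched here under stated assumptions: write an address via every source and read it back via every source, then report the pairs that miss. `SourceKeyedStore`, `CanonicalStore`, `cross_source_misses`, and the read path that checks "own provider kind plus `email`" are hypothetical, chosen only to reproduce the asymmetry described above:

```python
from collections import defaultdict
from itertools import product

# Hypothetical provider names used as observation sources.
SOURCES = ("outlook", "gmail")


class SourceKeyedStore:
    """Buggy sketch: the storage key is the source that observed the address."""

    def __init__(self) -> None:
        self.links = defaultdict(dict)

    def write(self, source: str, address: str, contact_id: str) -> None:
        # BUG: the key comes from observation context, not the identifier.
        self.links[source][address] = contact_id

    def lookup(self, source: str, address: str):
        # Assumed read path: each provider checks its own kind plus "email".
        for kind in (source, "email"):
            contact_id = self.links.get(kind, {}).get(address)
            if contact_id is not None:
                return contact_id
        return None


class CanonicalStore(SourceKeyedStore):
    """Fixed sketch: every email address is stored under the canonical kind."""

    def write(self, source: str, address: str, contact_id: str) -> None:
        # The identifier's own kind picks the key; the source is ignored.
        self.links["email"][address] = contact_id


def cross_source_misses(store_cls, sources=SOURCES):
    """Return (writer, reader) pairs where a write is invisible to a read."""
    misses = []
    for writer, reader in product(sources, repeat=2):
        store = store_cls()
        store.write(writer, "ann@example.com", "contact-1")
        if store.lookup(reader, "ann@example.com") != "contact-1":
            misses.append((writer, reader))
    return misses
```

Here `cross_source_misses(SourceKeyedStore)` returns `[("outlook", "gmail"), ("gmail", "outlook")]` and `cross_source_misses(CanonicalStore)` returns `[]`. The same-source pairs pass in both cases, which is why tests that exercise one provider at a time never surface the bug.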