Codex silently retry-loops on invalid Azure keys — curl-precheck first
Diagnosing a long-running coding-agent run that appears alive but produces zero output.
When codex is invoked against an Azure OpenAI endpoint with an invalid api-key, it silently retry-loops on 401 with no visible progress: the process stays alive, transcript.jsonl stays at 0 bytes, the wrapper log shows only the static header, and the only sign of failure is in stderr.log (which the wrapper does not tee to stdout by default). The run looks alive for the entire timeout window before it finally fails.
Always curl-precheck any new key against the actual deployment endpoint before kicking off a long agent run:
    curl -X POST "https://<resource>.services.ai.azure.com/openai/v1/responses?api-version=preview" -H "api-key: $KEY" -d ...
A 401 here saves the 15+ minutes of silent failure later.
Bonus: Azure OpenAI has no per-key spending caps. Cost control is resource-group (RG) level budget alerts (which only notify) plus deployment TPM throttling (which rate-limits spend per hour). Per-key isolation has to live in your application logic.
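The failure signature described above (a live process, an empty transcript.jsonl, errors only in stderr.log) can be checked mechanically a minute or two into a run. The following is a rough sketch, not part of any wrapper: the function name run_state is made up for illustration, and it assumes transcript.jsonl and stderr.log sit side by side in one run directory, which may not match your layout.

```shell
# Sketch of a stall check for one run directory.
# Assumption: transcript.jsonl and stderr.log live in the same directory.

# Prints "ok" once the transcript has data, "suspect" when the transcript is
# still empty but stderr.log has content, and "waiting" when both are empty
# or missing.
run_state() {
  local dir="$1"
  if [ -s "$dir/transcript.jsonl" ]; then
    echo "ok"
  elif [ -s "$dir/stderr.log" ]; then
    echo "suspect"
  else
    echo "waiting"
  fi
}
```

A "suspect" result is only a prompt to read stderr.log, since benign warnings can land there too. It does surface the one file that carries the 401s while everything else still looks healthy.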
For ANY new auth credential going into a long-running automated process, do a 30-second curl smoke test against the actual endpoint before launch. This matters most when the credential format looks superficially valid: Azure keys all share a common visual pattern, so eye-checking does not catch a stale or rotated key.
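A minimal sketch of such a smoke test, wrapped so that a launch script can refuse to start on a bad key. The function names (classify_status, probe_status, precheck_key) are hypothetical, and the JSON body is an assumption about what the endpoint accepts (deployment name in "model", a short "input"); the original note elides the body, so adjust it to your deployment. Treating only 2xx as a pass is also a choice made here, not a rule from the note.

```shell
#!/usr/bin/env bash
# Sketch of a pre-launch credential smoke test.
# Assumptions: request body shape, and that only a 2xx response counts as a pass.

# Map an HTTP status code to a short verdict string.
classify_status() {
  case "$1" in
    2??)     echo "ok" ;;
    401|403) echo "auth-failed" ;;
    000)     echo "no-response" ;;
    *)       echo "other-$1" ;;
  esac
}

# Send one tiny POST and print only the HTTP status code.
# $1 = full endpoint URL, $2 = api key, $3 = deployment name (assumed to go in "model").
# curl prints 000 when no HTTP response was received at all.
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 30 \
    -X POST "$1" \
    -H "api-key: $2" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$3\", \"input\": \"ping\"}" 2>/dev/null || true
}

# Return 0 only when the request was accepted; report the verdict on stderr either way.
precheck_key() {
  local status verdict
  status="$(probe_status "$1" "$2" "$3")"
  verdict="$(classify_status "${status:-000}")"
  echo "precheck: HTTP ${status:-000} -> $verdict" >&2
  [ "$verdict" = "ok" ]
}
```

In a launch script this might read: precheck_key "$URL" "$KEY" "$DEPLOYMENT" || exit 1, placed on the line before the agent is started (URL, KEY and DEPLOYMENT being whatever variables your script already uses). Unlike the run itself, the check fails loudly and within seconds.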