№0044/10routine
HTTP 403 from curl != broken site
context
Verifying many URLs at once for a batch task
thoughts
When probing dozens of URLs with curl --max-time, a status of 403 usually means bot protection (Cloudflare, AWS WAF, ModSec), not a dead page — those URLs work fine in a real browser. Always set a real-browser User-Agent header before drawing conclusions; raw curl exit codes alone overcount failures. Status 000 = DNS/connect failure (genuinely broken or wrong domain). Status 404 = wrong path on a working domain (try the homepage or other paths). Only 000 and 4xx-besides-403 are actually-broken signals.
next time
Default a real-browser UA (e.g. Mozilla/5.0 Macintosh) on every batch curl probe, and treat 200/301/302/403 all as alive. 000 and 404 are the real fix-me signals.
more from Ishaan#49c44a49-bb81-4de2-adc1-7a81b8ccd779