
K8s CronJob concurrencyPolicy=Forbid can stampede after overruns

context

A Kubernetes CronJob with concurrencyPolicy: Forbid, meant to prevent overlapping runs of a long sync or ETL job.

thoughts

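Forbid prevents a new Job from starting while the previous one is still running, but the CronJob controller keeps evaluating the schedule. Every slot that comes due during an overrun is skipped and counted as missed, silently: a 2-hour overrun on a 5-minute schedule misses 24 slots. The backlog is not replayed one Job per slot, because Forbid still caps the CronJob at one active Job. What happens instead is that when the in-flight run finally completes, the controller starts a catch-up Job right away for the most recent missed slot, unless startingDeadlineSeconds is set and that slot is already older than the deadline. With the field unset there is no cutoff, so the catch-up always fires, back to back with the overrun and with no idle gap in between. That is when pods stack, node pressure spikes, and OOM-kills sometimes follow.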
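The 24 in the example above is the overrun divided by the schedule interval. A minimal sketch of that arithmetic, assuming a fixed-interval schedule and an overrun measured from a slot boundary; the function name missed_slots is hypothetical and is not part of any Kubernetes API:

```python
from datetime import timedelta


def missed_slots(overrun: timedelta, interval: timedelta) -> int:
    """Count the schedule slots that come due while a run is overrunning.

    Assumes a fixed-interval schedule (e.g. "*/5 * * * *") and an overrun
    that begins exactly on a slot boundary.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Floor division of two timedeltas yields a plain int.
    return overrun // interval


# 2-hour overrun on a 5-minute schedule
print(missed_slots(timedelta(hours=2), timedelta(minutes=5)))  # 24
```

The same function makes it easy to see how quickly the count grows for tighter schedules: the one-minute schedule a lot of sync jobs use reaches 120 missed slots in the same two hours.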

next time

Every CronJob with concurrencyPolicy: Forbid MUST also set startingDeadlineSeconds. Treat the absence of that field as a latent stampede bug.
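One way to enforce this rule is a small check over parsed manifests, for example in CI. The sketch below assumes each manifest has already been loaded into a plain dict; the function name, the CronJob name, and the deadline value of 120 seconds are illustrative assumptions, and the manifests are trimmed to the fields the check reads (a real CronJob also needs a jobTemplate).

```python
from typing import Any, Dict


def forbid_without_deadline(manifest: Dict[str, Any]) -> bool:
    """Return True if a CronJob uses Forbid but sets no startingDeadlineSeconds."""
    if manifest.get("kind") != "CronJob":
        return False
    spec = manifest.get("spec") or {}
    return (
        spec.get("concurrencyPolicy") == "Forbid"
        and spec.get("startingDeadlineSeconds") is None
    )


# Hypothetical manifest, reduced to the relevant fields.
risky = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "nightly-sync"},
    "spec": {"schedule": "*/5 * * * *", "concurrencyPolicy": "Forbid"},
}

# Same manifest with the deadline added; 120 is an example value only.
fixed = {**risky, "spec": {**risky["spec"], "startingDeadlineSeconds": 120}}

print(forbid_without_deadline(risky))  # True
print(forbid_without_deadline(fixed))  # False
```

The check only flags the missing field. What deadline to choose depends on how stale a run is allowed to be before it should wait for the next slot instead.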
