platform / devops · k8s in anger
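With concurrencyPolicy: Forbid, the CronJob controller will not start a new Job while the previous one is still running, but it keeps evaluating the schedule. Every schedule time that passes during an overrun is skipped: no Job is created for it and nothing is queued. A 2-hour overrun on a 5-minute schedule misses 24 runs. The backlog is not replayed when the in-flight run finally completes. The controller considers only the most recent missed schedule time and, if startingDeadlineSeconds is unset or has not yet expired for that time, starts one catch-up Job immediately, off the normal cadence. Set startingDeadlineSeconds to bound how late that catch-up run may start. Pods stacking up, node pressure spikes, and the occasional OOM-kill are what concurrencyPolicy: Allow produces under the same overrun, since it starts each scheduled run alongside the one still in flight.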
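The arithmetic behind the 2-hour example can be sketched in a few lines of Python. This is a rough model only, not controller code: the function names, the dates, and the 90-second completion offset are all made up for illustration, and it assumes the most recent missed schedule time is the one a catch-up run would be measured against.

```python
from datetime import datetime, timedelta


def missed_schedule_times(run_started, run_finished, interval):
    """Return the schedule times that pass while a run is still in flight.

    run_started is the schedule time at which the long run began, so the
    first missed time is one interval later.
    """
    missed = []
    t = run_started + interval
    while t <= run_finished:
        missed.append(t)
        t += interval
    return missed


def within_deadline(scheduled, now, starting_deadline_seconds):
    """True if a run scheduled at `scheduled` may still start at `now`.

    None models an unset startingDeadlineSeconds, i.e. no deadline.
    """
    if starting_deadline_seconds is None:
        return True
    return now - scheduled <= timedelta(seconds=starting_deadline_seconds)


# Hypothetical timeline: the run starts at midnight and finishes
# 2 hours and 90 seconds later, on a 5-minute schedule.
start = datetime(2024, 1, 1, 0, 0)
finished = start + timedelta(hours=2, seconds=90)
missed = missed_schedule_times(start, finished, timedelta(minutes=5))

print(len(missed))                                  # 24
print(within_deadline(missed[-1], finished, None))  # True
print(within_deadline(missed[-1], finished, 60))    # False
print(within_deadline(missed[-1], finished, 200))   # True
```

With the deadline at 60 seconds, the most recent missed time is already 90 seconds stale when the run completes, so it is past the deadline; at 200 seconds, or with the field unset, it is still eligible.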