Optimizing Storage Assignment via Pod Scheduling Under Disturbance Factors

Kenji Morimoto at KubeCon + CloudNativeCon North America 2020

For distributed storage systems like Ceph, it is essential to allocate node-local storage devices evenly across racks or regions. This talk shows how to automate that allocation by using the "WaitForFirstConsumer" volume binding mode and tuning kube-scheduler. With "WaitForFirstConsumer", a volume is not bound until a pod that uses it is scheduled, which turns the storage allocation problem into a pod scheduling problem. Kenji and his colleagues at Cybozu use Topology Spread Constraints to distribute storage pods. They found that kube-scheduler must be tuned away from its defaults to spread pods optimally under disturbances such as CPU-consuming workloads. Because kube-scheduler is still being improved, the tuning method varies with the Kubernetes version. The talk covers the tuning methods for Kubernetes 1.17, 1.18, and 1.19. By distributing storage pods across racks, they achieved fault tolerance against a full rack failure.
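
As a rough illustration of the two mechanisms named in the abstract, the following Python sketch writes out a StorageClass with the "WaitForFirstConsumer" binding mode and a topology spread constraint as plain dicts, then models the skew check in a simplified way. It is not taken from the talk: the object names, the provisioner, the rack label key `topology.example.com/rack`, the `app: storage-pod` selector, and the helper functions `feasible_racks` and `place_pods` are all assumptions made for illustration, and the model ignores scoring, resource requests, and the disturbance effects the talk discusses.

```python
# Illustrative manifests as plain Python dicts (assumptions, not from the talk).
STORAGE_CLASS = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "local-storage"},          # hypothetical name
    "provisioner": "kubernetes.io/no-provisioner",  # assumption: static local volumes
    # Delay volume binding until a pod that uses the claim is scheduled.
    "volumeBindingMode": "WaitForFirstConsumer",
}

SPREAD_CONSTRAINT = {
    "maxSkew": 1,
    "topologyKey": "topology.example.com/rack",  # hypothetical rack label key
    "whenUnsatisfiable": "DoNotSchedule",
    "labelSelector": {"matchLabels": {"app": "storage-pod"}},  # hypothetical label
}


def feasible_racks(pods_per_rack, max_skew):
    """Return the racks where one more pod keeps the skew within max_skew.

    Simplified model: skew = (pods in the rack + 1) - (current minimum over racks).
    Assumes pods_per_rack is non-empty and max_skew >= 1.
    """
    global_min = min(pods_per_rack.values())
    return sorted(
        rack
        for rack, count in pods_per_rack.items()
        if count + 1 - global_min <= max_skew
    )


def place_pods(pods_per_rack, new_pods, max_skew):
    """Place new_pods one at a time on the least-loaded feasible rack."""
    counts = dict(pods_per_rack)  # copy so the caller's dict is not modified
    for _ in range(new_pods):
        candidates = feasible_racks(counts, max_skew)
        # Ties are broken by rack name to keep the result deterministic.
        target = min(candidates, key=lambda rack: (counts[rack], rack))
        counts[target] += 1
    return counts


if __name__ == "__main__":
    racks = {"rack0": 2, "rack1": 1, "rack2": 1}
    print(feasible_racks(racks, SPREAD_CONSTRAINT["maxSkew"]))
    print(place_pods({"rack0": 0, "rack1": 0, "rack2": 0}, 6, 1))
```

In this model, a rack that already holds more storage pods than the others is excluded until the remaining racks catch up, which is the behavior that keeps volumes evenly spread once binding follows pod placement.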