Speeding Up Analysis Pipelines with Remote Container Images

Ricardo Rocha, Spyridon Trigazis at KubeCon + CloudNativeCon North America 2020

Containers have taken on a key role in the daily life of physicists at CERN, helping with packaging and sharing code as well as ensuring analysis reproducibility. This session will describe how processes have been adapted to containerize software releases of tens of gigabytes, and how these containerized releases are used to process hundreds of petabytes of new data every year. In particular, it will focus on how container images are distributed across a large network of connected sites around the world, and show how lazy loading of container images using the containerd remote snapshotter has kept startup time flat at under 6 seconds while dramatically reducing network traffic. A live demo will run a real physics analysis pipeline of hundreds of parallel jobs using the setup described above.