MLOps at Snapchat: Continuous Machine Learning with Kubeflow & Spinnaker

Kevin Dela Rosa at KubeCon + CloudNativeCon North America 2020

Training a machine learning model to support your use case can be difficult, but model creation is only the beginning. ML systems are complex and differ from traditional software systems; as a result, unique challenges arise when engineers or data scientists try to integrate and continuously operate ML systems in production. Applying best practices and principles from DevOps to machine learning systems (MLOps) can help practitioners navigate the entire ML lifecycle. In this talk, we will share our experience so far in applying MLOps to a computer vision use case at Snapchat. We will walk through how we transformed a manual, script-driven workflow into a more robust and automated experience. We will describe our ML pipeline and how we leveraged Kubernetes, Kubeflow Pipelines, and Spinnaker to achieve continuous integration, continuous delivery, and continuous training.