Machine Learning on Kubernetes at Shell: A Kubeflow Journey

Alex Iankoulski, Vangelis Koukis at KubeCon + CloudNativeCon North America 2020

In this session, Shell shares the lessons learned from working with multiple machine learning platforms and tools, the challenges posed by different systems, why we chose Kubeflow, and how we are now delivering successful models faster and at scale. Follow our journey as we learned how to deploy highly available, scalable, and secure Kubeflow clusters in the public cloud. We describe the steps taken to improve our deployments, including enterprise authentication and authorization, network integration, and data science workflows. We also discuss why we moved away from other platforms, and how Kubeflow has increased our data scientists’ productivity and reduced DevOps overhead. Today our teams are more self-sufficient and iterate faster to deliver production-ready models. It is a zero-to-hero story made possible by Kubeflow and Kubernetes.