Federated learning: private distributed ML

Mike Lee Williams at Strange Loop 2019

Federated learning is a way to do machine learning when training data is partitioned between nodes that are either unable or unwilling to share it.

The nodes can be embedded devices, smartphones or even legal entities like companies or countries. They can be unable to share the data because of engineering constraints like bandwidth or power, or because of legal bright lines such as HIPAA. Or they can be unwilling to share the data because of (very legitimate and topical!) concerns about the security, commercial exploitation and privacy of sensitive personal data.

Federated learning allows the nodes to collaborate to train a machine learning model, without needing to share direct access to their training data with each other or a central authority. Instead they each share partially trained models.

This talk will explain these ideas in more detail. I'll describe a specific instance of a federated learning algorithm (called federated averaging), and I'll explain the ways in which the real world, full of malicious actors and distributed systems, complicates the naive picture. I'll then talk about the research that is going on right now to harden security, reduce communication costs, and strengthen privacy guarantees.
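
To make the "naive picture" of federated averaging concrete, here is a minimal sketch in plain Python. It is not code from the talk. It assumes a toy one-feature linear model, three simulated nodes with noise-free data, and a trusted server, and every name in it (local_update, federated_average, train, make_node_data) is hypothetical. In each round, every node trains briefly on its own data, then only the resulting weights are averaged, weighted by how many examples each node holds.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one node's private data.

    Only the updated weights leave the node; the (x, y) pairs never do.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(updates, sizes):
    """Average the nodes' weights, weighted by each node's dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train(node_datasets, rounds=300):
    """Server loop: broadcast the global model, collect updates, average."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in node_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in node_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_node_data(rng, n, lo, hi):
    """Toy data (an assumption for this sketch): y = 2x + 1, no noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]


rng = random.Random(0)
# Each node sees a different slice of the input range and a different
# amount of data, as a stand-in for non-identical real-world partitions.
nodes = [
    make_node_data(rng, 20, 0.0, 1.0),
    make_node_data(rng, 50, -1.0, 0.0),
    make_node_data(rng, 30, -1.0, 1.0),
]
w, b = train(nodes)
print(f"learned w={w:.3f}, b={b:.3f}")
```

The sketch deliberately ignores everything that makes the real problem hard: nodes that drop out or lie about their updates, the cost of sending full models every round, and the fact that shared weights can still leak information about the data they were trained on.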

The hope is that, with federated learning, we no longer need to give up our privacy in order to use life-saving, money-saving, helpful and fun machine learning models.

Mike Lee Williams
Cloudera
@mikepqr

Mike Lee Williams is an engineer at Cloudera where he works on machine learning and products that make machine learning easier. While getting his PhD in astrophysics he spent 2% of his time observing the heavens in beautiful far west Texas, and the other 98% trying to figure out how to fit straight lines to data. He once did a postdoc at the Max Planck Institute for Extraterrestrial Physics, which, amazingly, is a real place.