Using Clojure+Spark to Find All the Topics on the Interwebs

Hunter Kelly at Clojure/conj 2015

The latest generation of big data tools, such as Apache Spark (http://spark.apache.org), routinely handle petabytes of data while also coping with real-world problems like node and network failures. Spark's transformations and operations on data sets are a natural fit with Clojure's everyday use of transformations and reductions. Spark MLlib's excellent implementations of distributed machine learning algorithms put the power of large-scale analytics in the hands of Clojure developers. At Zalando's Dublin Fashion Insights Centre, we're using the Clojure bindings to Spark and MLlib to answer fashion-related questions that until recently have been nearly impossible to answer quantitatively.

At the Dublin Fashion Insights Centre, we are exploring methods of categorising the web into a set of known fashion-related topics. This raises questions such as: How many fashion-related topics are there? How closely are they related to each other, or to non-fashion topics? What topic hierarchies exist in this landscape? Using Clojure and MLlib to harness the data available from crowd-sourced websites such as DMOZ (http://www.dmoz.org, a categorisation of millions of websites) and Common Crawl (http://commoncrawl.org, a monthly crawl of billions of websites), we are answering these questions to understand fashion in a quantitative manner.

About the speaker: Hunter Kelly is a software engineer at Zalando's Fashion Insights Centre in Dublin, Ireland, where he works on an interesting mix of big data and machine learning projects. He's a graduate of the University of California at Berkeley, where he pretty much lived in Emacs and spent too much time hacking in elisp. Ever since, he has yearned for a language with the expressive power of Lisp that could actually be used in the real world. Since learning about Clojure in its early days, he has enjoyed exploring how to think about and express problems in it. Before joining Zalando in 2015, he spent years in the trenches working at companies such as Pixar, Google, a few dot booms, and a few dot flops.