Data-Shuffler: Bringing Order to The Galaxy (of Data Process)

Luke Hospadaruk at Clojure/conj 2018

Data-Shuffler is an ETL system, implemented in Clojure on top of Apache Spark, whose primary goals are testability, reliability, and collaboration. Clojure has allowed us to construct a robust ETL foundation that lets the entire organization participate in a performant, well-tested ETL system, both to get the datasets and insight they need and to contribute to organization-wide data and reporting initiatives. Using clojure.spec in particular has allowed us to focus on the model and the underlying features, while giving us the flexibility to compose and manipulate already-implemented functionality to meet unanticipated needs quickly as they arise.