Clojure Where it Counts: Tidying Data Science Workflows

Ben Kamphaus, Pier Federico Gherardini at Clojure/conj 2019

Data cleaning and organization are tedious, time-consuming parts of any data science workflow. To address these pain points at the Parker Institute for Cancer Immunotherapy, we turned to Clojure to handle our complex and semantically rich datasets, such as molecular and clinical data from patients undergoing experimental cancer therapies. Our solution is centered around Datomic, and we have built a number of tools to integrate this database into a data science environment. These tools include a configurable, data-driven ETL pipeline (which requires neither writing code nor knowing Datomic internals) and libraries for writing Datalog queries and using the results directly within R. Rather than re-implementing analysis code in Clojure, we bring Clojure’s unique strengths to an existing data science environment.