Low-Coordination Distributed Time Series Database

Frederic Branczyk at Go Systems Conf SF 2020

Distributed systems are hard, and we should avoid them at any cost. Prometheus' success can arguably be attributed in part to its dead-simple operational model: a single monolithic binary in which everything runs in-process rather than distributed, backed by its own purpose-built local time-series database. While a single Prometheus server can take an organization very far, at sufficient scale Prometheus inevitably has to be scaled out. Thanos is a project that aims to ease the pain of scaling Prometheus. In this talk, Frederic demonstrates how Thanos keeps operational complexity sane by employing distributed systems techniques that minimize the need for coordination.