Automatically Making Dashboards Load 100X Faster

Shreyas Srivatsan at KubeCon + CloudNativeCon North America 2020

High cardinality metrics often cause alerts and dashboards to time out when they try to fetch too much data. Prometheus provides recording rules to speed up queries by pre-generating the queries, however, they have to be configured manually and require reconfiguring alerts and dashboards to point to the recorded series. The performance degradation often happens as new metrics are introduced with more instances or deploys and a working query may break all of a sudden. In this talk, we will show you how slow queries can be preemptively detected and automatically sped up without any manual reconfiguration. We detail two approaches to achieving automated speed ups - one that is based on recording rules and the other is based on the M3 Aggregation tier. We will compare and contrast both approaches and show examples of how one can leverage either open source method to achieve the same results.