Improved TiKV Observability: How We Trace Events under Nanoseconds Latency

Wish Shi, Zhenchi Zhong at KubeCon + CloudNativeCon North America 2020

Observability is beneficial but often comes at a price. When adding tracing to low-latency services (e.g. < 1 ms), engineers may see notable performance degradation. The usual trade-off solutions also have inherent limitations: sampled tracing, for example, may sample out the very requests that hit errors or unusual latency, so they are missed. In this talk, Wish Shi and Zhenchi Zhong will share their experience implementing a high-performance OpenTracing-compatible tracing library, originally created for TiKV. The library can trace events with nanosecond-level overhead, without sampling, on modern x64 architectures. Decisions, design details and trade-offs will be presented, along with an open-source implementation available in both Rust and Go.