Monitoring Kubernetes: Follow the Data

A presentation at Cloud Native London 2018 in London, UK by Daniel "phrawzty" Maher

At Datadog, they help thousands of organizations monitor their infrastructure and applications. In this session, Daniel will dive into the several hundred trillion data points they’ve gathered to extract information about real-world Kubernetes use and to identify trends in container and orchestrator usage. Looking at this usage data, you will learn which applications are most commonly run in orchestrated environments, which metrics you should watch, and how to troubleshoot based on those metrics. Daniel will also explore a framework for your metrics and how to use it to find solutions to the issues that come up. You will learn the three types of monitoring data; what to collect; what should trigger an alert (avoiding an alert storm and pager fatigue); and how to follow the resources to find the root causes of problems. Although the real-world Kubernetes and container usage data is derived from Datadog users, the focus of this session is not tool-specific, so you will leave with strategies and frameworks that you can implement in your container-based environments today, regardless of the platforms and tools you use.

Resources

The following resources were mentioned during the presentation or are useful additional information.

  • Event Listing

    SkillsMatter went into administration in 2019. Wayback Machine to the rescue!
