Monitoring Kubernetes: Follow the Data

MONITORING K8S: FOLLOW THE DATA
DANIEL MAHER, DATADOG
@phrawzty

1. TRENDS IN CONTAINERISATION
2. THE QUALITIES OF GOOD METRICS
3. SCALING KUBERNETES (W/ METRICS)

DANIEL MAHER
DOCS & TALKS
DEVOPSDAYS
GREAT OUTDOORS
GOURMAND
SUITS FOR NO REASON
@phrawzty

DATADOG
SAAS-BASED MONITORING
TRILLIONS OF POINTS / DAY
WE’RE HIRING: www.datadoghq.com/careers
@datadoghq

https://www.datadoghq.com/docker-adoption/

https://www.datadoghq.com/container-orchestration/



Docker Adoption
Dabbler: used Docker during a given month, but hadn’t reached significant use as defined by Adopter.
Adopter: the average number of containers running during the month was at least 50% of the number of distinct hosts run, or there were at least as many distinct containers as distinct hosts run during the month.
Abandoner: a currently active company that used Docker in the past, but hasn’t used it at all in the last month.
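The three definitions amount to a small decision rule. A minimal sketch, assuming one month of usage is summarised by three numbers (the MonthlyUsage fields, the classify function, and the "Non-user" label are hypothetical, chosen only for illustration; the caller is assumed to pass only currently active companies):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonthlyUsage:
    # Hypothetical per-company summary of one month.
    avg_running_containers: float
    distinct_containers: int
    distinct_hosts: int


def is_adopter(u: MonthlyUsage) -> bool:
    # Adopter: average running containers >= 50% of distinct hosts,
    # OR at least as many distinct containers as distinct hosts.
    return (u.avg_running_containers >= 0.5 * u.distinct_hosts
            or u.distinct_containers >= u.distinct_hosts)


def classify(this_month: Optional[MonthlyUsage], used_docker_before: bool) -> str:
    # No containers at all this month: Abandoner if Docker was used in the past.
    if this_month is None or this_month.distinct_containers == 0:
        return "Abandoner" if used_docker_before else "Non-user"
    # Some Docker use this month: significant use makes an Adopter, else a Dabbler.
    return "Adopter" if is_adopter(this_month) else "Dabbler"


print(classify(MonthlyUsage(30, 40, 100), used_docker_before=True))
```

With 100 hosts, an average of 30 running containers and 40 distinct containers meets neither Adopter threshold, so that company counts as a Dabbler.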

WHAT’S RUNNING?

[Chart: % Customers Running a Tech in Containers, 0 to 40%. Technologies: NGINX, Redis, Postgres, FluentD, Elasticsearch, Mongo, MySQL, etcd, RabbitMQ, HAProxy]

[Chart: % Customers Running Data Stores in Containers, 0 to 40%. Data stores: Redis, Postgres, Elasticsearch, MySQL, MongoDB, etcd, RabbitMQ]

ENTER: ORCHESTRATION

NEW CONTAINER: WHO DIS?

WHAT’S RUNNING? PART 2: THE RUNNINATING

4 QUALITIES OF GOOD METRICS
NOT ALL METRICS ARE EQUAL

MUST BE WELL UNDERSTOOD

SUFFICIENT GRANULARITY

TAGGED & FILTERABLE

LONG-LIVED

CUSTOM METRICS & K8S

MUST BE WELL UNDERSTOOD
$ kubectl autoscale rs foo --min=5 --max=10 --cpu-percent=80
Questions:
What does CPU percent mean exactly? User space only? Does that even matter?
Is that on any one instance? Across the pod? Of a single core? Multi-core?
And what about spikes: do those count?
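For reference, the Kubernetes Horizontal Pod Autoscaler documentation describes the CPU target as utilisation relative to the pods' CPU requests (not node capacity), computes desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), and skips changes while the ratio stays within a tolerance (0.1 by default). A minimal sketch of that rule, assuming utilisation is aggregated as total usage over total requests across pods; it ignores unready pods, missing metrics, and stabilisation windows, and the function name is hypothetical:

```python
import math


def hpa_desired_replicas(usage_millicores, request_millicores, target_percent,
                         min_replicas, max_replicas, tolerance=0.1):
    """Simplified sketch of the HPA CPU rule (no unready pods,
    no missing metrics, no scale-down stabilisation)."""
    current_replicas = len(usage_millicores)
    # Utilisation is measured against CPU requests, aggregated across pods.
    utilization = 100.0 * sum(usage_millicores) / sum(request_millicores)
    ratio = utilization / target_percent
    # Close enough to the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the --min / --max bounds.
    return max(min_replicas, min(max_replicas, desired))


# Five pods, each requesting 500m CPU and using 450m, target 80%.
print(hpa_desired_replicas([450] * 5, [500] * 5, 80, 5, 10))  # 6
```

Here utilisation is 90% against an 80% target, a ratio of 1.125, so the sketch asks for ceil(5 * 1.125) = 6 replicas. The same pods using 430m each (86%) would stay within tolerance and remain at 5.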

SUFFICIENT GRANULARITY
1 second: peak 46%
1 minute: peak 36%
5 minutes: peak 12%
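The same activity reports a lower peak as the resolution gets coarser because averaging dilutes short bursts. A minimal sketch on synthetic data (not the data behind the slide), assuming a 1-second series rolled up into non-overlapping mean buckets; the function name and series are hypothetical, and real systems may roll up with max or percentiles instead:

```python
def rollup_avg(series, window):
    """Average consecutive, non-overlapping buckets of `window` samples."""
    return [sum(series[i:i + window]) / window
            for i in range(0, len(series) - window + 1, window)]


# Hypothetical 10 minutes of 1-second CPU samples:
# a 10% baseline with a single 5-second burst at 46%.
cpu = [10.0] * 600
for t in range(100, 105):
    cpu[t] = 46.0

for label, window in [("1 second", 1), ("1 minute", 60), ("5 minutes", 300)]:
    print(f"{label}: peak {max(rollup_avg(cpu, window)):.1f}%")
# 1 second: peak 46.0%
# 1 minute: peak 13.0%
# 5 minutes: peak 10.6%
```

The burst is obvious at 1-second resolution and nearly invisible at 5 minutes, so an autoscaler fed the coarse series would never react to it.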

TAGGED & FILTERABLE
“Add more web servers when the frontend is getting busy.”
metric: nginx.net.request_per_s
scope: kube_container_name: frontend
“oh no”
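Tags are what make a scope like the one above expressible at all. A minimal sketch of tag-scoped querying, assuming each point carries a set of key:value tags and a scope matches a point only when every tag in the scope is present; the Point type, query function, and sample values are hypothetical and are not Datadog's API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    metric: str
    value: float
    tags: frozenset


def query(points, metric, scope):
    """Sum values of `metric` over points carrying every tag in `scope`."""
    wanted = set(scope)
    return sum(p.value for p in points
               if p.metric == metric and wanted <= p.tags)


# Hypothetical request rates from three containers.
points = [
    Point("nginx.net.request_per_s", 120.0,
          frozenset({"kube_container_name:frontend", "kube_namespace:prod"})),
    Point("nginx.net.request_per_s", 80.0,
          frozenset({"kube_container_name:frontend", "kube_namespace:staging"})),
    Point("nginx.net.request_per_s", 50.0,
          frozenset({"kube_container_name:admin", "kube_namespace:prod"})),
]

print(query(points, "nginx.net.request_per_s", ["kube_container_name:frontend"]))
print(query(points, "nginx.net.request_per_s",
            ["kube_container_name:frontend", "kube_namespace:prod"]))
```

In this toy data, scoping on the container name alone sums frontend containers from every namespace (200.0); adding a namespace tag narrows it to production (120.0). Without the tags, neither question could be asked.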

LONG-LIVED
“There always seems to be ten web servers running. Should we increase the maximum number in the set?”
Questions:
What does normal even look like? Is normal stable over time? What about weekends?
Did you just notice now? Seriously, how long has it been this way?

[Chart: a metric plotted Monday to Friday over four consecutive weeks (M T W TH F)]
ALSO, WHAT HAPPENED HERE?
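Answering "what does normal look like?" requires history long enough to show the weekly pattern. A minimal sketch, assuming daily samples grouped by weekday and a simple mean plus or minus k standard deviations threshold (a generic technique swapped in for illustration, not the speaker's method); the function name and data are hypothetical:

```python
from statistics import mean, pstdev


def is_unusual(history, weekday, value, k=3.0):
    """Flag `value` if it lies more than k standard deviations from the
    mean of past values seen on the same weekday."""
    past = history[weekday]
    if len(past) < 2:
        raise ValueError("not enough history for a baseline")
    mu, sigma = mean(past), pstdev(past)
    # If sigma is 0 (perfectly stable history), any deviation is flagged.
    return abs(value - mu) > k * sigma


# Hypothetical four weeks of daily web-server counts, keyed by weekday.
history = {
    "M": [8, 10, 9, 11],
    "T": [10, 10, 11, 9],
    "W": [9, 10, 10, 11],
    "TH": [10, 11, 10, 9],
    "F": [6, 7, 6, 7],
}

print(is_unusual(history, "M", 10))  # False
print(is_unusual(history, "F", 10))  # True
```

Ten servers is ordinary on a Monday but stands out on a Friday, which only becomes visible once several weeks of data have been retained.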


THANK YOU!
DANIEL MAHER
@phrawzty
daniel.maher@datadoghq.com