3 Years of Kubernetes in Production–Here’s What We Learned

Key takeaways from our Kubernetes journal

Komal Venkatesh Ganesan
Better Programming
7 min read · Sep 9, 2020

Photo by Jessica Lewis on Unsplash

We started out building our first Kubernetes cluster in 2017, version 1.9.4. We had two clusters, one that ran on bare-metal RHEL VMs, and another that ran on AWS EC2.

Today, our Kubernetes infrastructure fleet consists of over 400 virtual machines spread across multiple data-centres. The platform hosts highly available, mission-critical software applications and systems that manage a massive live network with nearly four million active devices.

Kubernetes eventually made our lives easier, but the journey was a hard one, a paradigm shift. There was a complete transformation in not just our skillset and tools, but also our design and thinking. We had to embrace multiple new technologies and invest massively to upscale and upskill our teams and infrastructure.

Looking back, after three years of running Kubernetes in production, here are key lessons from our journal.

1. The Curious Case of Java Apps

Written by Komal Venkatesh Ganesan

Engineer — Software / AI / Electronics / Technology. In pursuit of fundamental understanding of elemental physics/science | LinkedIn: https://bit.ly/2DN8rfP

Responses (20)

Nice article! A suggestion: item 4 is solved by the startup probe (alpha since 1.16, beta since 1.18). ;)

Despite all the improvements, there is no denying that Java still has a bad reputation for hogging memory.

Hi Komal, thank you for sharing your experience! It's definitely useful.
A group of people from Java community is improving JVM in terms of memory usage efficiency and its elasticity. We have achieved a quite good progress. You can find specific…

I refer to your point 4 as restart storms. Those probes can be tricky to get right. Too aggressive and you risk restarts. Too passive and things take forever to start up.