Golang. Memory limits for running in Kubernetes.
This chapter continues the Golang-related series started with the previous one:
When a Golang application is deployed to a Kubernetes cluster crowded with other containers and receives heavier traffic than on a developer’s laptop, the need for operational stability becomes obvious. Horizontal scaling helps here, but the limited resources of the cluster nodes force us to cap the compute resources each container may demand. Memory is the scarcest resource in the cluster, and this chapter shows an approach to putting a containerized Golang web-service on a diet without sacrificing its stability.
The operating system reacts with an Out Of Memory (OOM) error when an application demands too much. Luckily, Kubernetes restarts such Pods instead of just leaving them off. Still, this good behavior will not help much while the cluster’s nodes and the number of running Pods stay the same.
One of the best practices is to constrain a Pod by setting resource requests and limits. While crossing a CPU limit most likely only slows execution down, crossing a memory limit results in Pod termination. Limits are individual for each program in a container, and it is better to run a set of experiments, lowering the limit bar while simulating a workload close to the real one.
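To make the idea concrete, here is a minimal, illustrative resources block for a Container spec (the numbers are made up and are not the ones used later in this article):

```yaml
# Illustrative only: requests are what the scheduler reserves for the container,
# limits are what the container must not exceed.
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
```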
I am going to take the same Golang-written web-service that I used in the previous article, which has two endpoints (please refer to the original story for explanations):
/hash: to do a meaningless hashsum calculation on a 1MB+ binary file
/debug/vars: to retrieve application memory consumption numbers and GC running statistics.
This web-service is wrapped in a Docker image to be deployed to the Kubernetes cluster.
To generate some request pressure on the web-server, I continue using the k6 testing tool in the same way as before (see the original story or the testing script), with a small change: 50 virtual users and a test session limited to 20 seconds.
Such reuse lets me focus on tuning the web-service container parameters in the Kubernetes cluster by setting its memory limits via Kubernetes preferences and Golang-related environment variables (again, please refer to the previous chapter for the details of GOGC and GOMEMLIMIT environment variables usage).
I created a Google Kubernetes Engine cluster and added one Node with 8 CPUs and 14Gi of memory. As a small part of the Node is occupied by k8s services, I use a Namespace with a ResourceQuota (max 6 CPUs and 13Gi of memory):
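The actual manifest lives in the repo; a minimal sketch of such a Namespace plus ResourceQuota pair could look like this (the names are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: golang-diet            # illustrative name
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: golang-diet-quota      # illustrative name
  namespace: golang-diet
spec:
  hard:
    limits.cpu: "6"            # max 6 CPUs in the Namespace
    limits.memory: "13Gi"      # max 13Gi of memory in the Namespace
```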
The Deployment is not complicated, and the plan is to deploy/destroy it multiple times with different limitations (a sketch follows after the list below):
#1: ensure only one Pod runs in Namespace
#17..#20: deploy Golang web-service
#21..#25: placeholder for different limit values to put the running container on a diet
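The original file is in the repo, so the line numbers above refer to it rather than to this sketch; a minimal Deployment along those lines might look like the following (image and names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: golang-web-service               # illustrative name
  namespace: golang-diet
spec:
  replicas: 1                            # ensure only one Pod runs in the Namespace
  selector:
    matchLabels:
      app: golang-web-service
  template:
    metadata:
      labels:
        app: golang-web-service
    spec:
      containers:
        - name: golang-web-service
          image: example.registry/golang-web-service:latest   # illustrative image
          ports:
            - containerPort: 8080
          # resources and GOGC/GOMEMLIMIT env vars go here and change per experiment
```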
The third resource is the Kubernetes Service, which is a standard one, exposing TCP port 8080 as a LoadBalancer type.
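Roughly, such a Service looks like this (names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: golang-web-service      # illustrative name
  namespace: golang-diet
spec:
  type: LoadBalancer
  selector:
    app: golang-web-service
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
```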
After each running session, the Namespace and the resources inside it should be recreated. Two small scripts are used for that:
The Kubernetes resource source files can be found in the repo.
As in the previous chapter, Golang memory settings will be adjusted with the help of the GOGC and GOMEMLIMIT environment variables.
Contrary to running Golang on a laptop, where resources, especially virtual memory, are quite flexible, Kubernetes has strict resource limit parameters, and a Pod is terminated immediately if its memory allocation crosses the limit. This strictness gives a good boundary to work against.
And the trickiest question is how to set that limit as low as possible for each given container without making it dysfunctional under the predicted workload. It is important to note that a good answer will be very subjective and will, hopefully, hold for the scoped Container only.
The plan is, for each combination of Golang-related memory variables and the k8s Container resource limit, to conduct three running sessions and calculate averages of the Golang runtime parameters that are important from my point of view (please see the previous chapter for more about which memory and GC parameters matter to me). Then keep lowering the limit and stop as soon as I start receiving Out Of Memory (OOM) exceptions (and the Pod is restarted), like the following:
Experiment #1. Give the web-service maximum freedom:
Put the following parameters into the Container deployment (see the sketch after this list):
#3..#5: k8s limit
#6..#7: no tuning for Golang, let it do its best
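A sketch of the relevant Container fragment for this experiment (the 12Gi value comes from the results below; CPU settings are not covered here):

```yaml
resources:
  limits:
    memory: "12Gi"   # generous limit, most of the Namespace quota
# no GOGC / GOMEMLIMIT environment variables: Golang defaults apply
# (a CPU limit would also be required under the ResourceQuota above; omitted in this sketch)
```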
Averaged statistics of 3 runs with 50 virtual users and a 20-second session (refer to the previous article for details of which parameters I decided to look at and why):
#2: as a bit of a surprise, only 2.5Gi of allocated memory (out of the 12Gi available to the Container) was in use
#3: GC ran 24 times (almost every second) to free up no-longer-referenced memory, which keeps the memory footprint small
#5..#6: no requests failed, with a total of 520 successful requests during the session (this performance value is worth keeping in mind)
Two takeaways from here: default Golang memory management runs GC frequently and uses close to 15% of available memory.
And one more note: it is too expensive and wasteful to let one Container occupy 12Gi of memory, especially if the bigger part of it is never used.
Experiment #2. Use web-service memory settings to give maximum space:
Deploy the Container with the following parameters (see the sketch after this list):
#2..#4: still keep high values for Kubernetes limits
#6..#7: switch off the GOGC variable. Please refer to the official documentation and my previous article for the reasoning behind it
#8..#9: set GOMEMLIMIT to the maximum available value. This variable is a soft limit, so the Golang runtime may temporarily cross it, but GC quickly frees no-longer-needed memory and returns memory usage to the green zone. I keep this value 10% below the Kubernetes limit, as crossing the Kubernetes limit would immediately terminate the Container before GC fixes the problem.
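The corresponding fragment might look like this (the exact GOMEMLIMIT figure is my assumption of roughly 10% below the 12Gi limit, matching the ~10.8Gi peaks seen below):

```yaml
resources:
  limits:
    memory: "12Gi"        # Kubernetes limit stays high
env:
  - name: GOGC
    value: "off"          # disable percentage-based GC triggering
  - name: GOMEMLIMIT
    value: "11059MiB"     # ≈10.8GiB, ~10% below the 12Gi limit (assumed exact value)
```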
Averaged statistics of 3 session runs:
#2: 6.9Gi of memory on average is much more than before. There were even peaks above 10.8Gi during the session, which correlates with the GOMEMLIMIT threshold
#3: only 2 GC runs. It seems the memory allocation grows linearly until the GOMEMLIMIT value is reached, and then GC comes into play to lower the allocation
#6..#7: the performance value is 536 successful requests, which is close to before, but still a few percent more now.
Two notes: with the current Golang settings, GC stays idle until the memory threshold is reached, so all the available memory is in use.
Unfortunately, this has only a tiny effect on performance. On the good side, it is obvious that about 8Gi of memory (see the Experiment #1 result) is wasted for the given web-service, so the Container limit should be lowered and the spare memory returned to Kubernetes.
Experiment #3. Lower memory settings for web-service:
Put the following parameters into the Container deployment (see the sketch after this list):
#9..#10: Experiment #1 (Golang default settings) showed that the maximum web-service memory allocation was 4475Mi, so setting 4500Mi looks reasonable
#5: my empirical rule is to add an extra 10% on top for the Kubernetes memory limit. Thus, 5000Mi is a fair value
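The fragment for this experiment, using the values discussed above:

```yaml
resources:
  limits:
    memory: "5000Mi"      # ~10% above GOMEMLIMIT, per the empirical rule
env:
  - name: GOGC
    value: "off"          # assumed still switched off, as in Experiment #2
  - name: GOMEMLIMIT
    value: "4500MiB"      # just above the observed peak allocation of 4475Mi
```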
Averaged statistics of 3 session runs:
#2: maximum Golang process memory allocation is close to the set limit, which is good
#3: GC run frequency is balanced compared with the previous experiments
#5..#6: no big changes in performance, and this is not too bad
This experiment produced results close to ideal. But the above values are applicable to the given web-service and the given workload only.
Experiment #4. Even lower memory settings for the web-service:
Mainly out of curiosity, I continued setting less and less memory until the web-service hit an OOM exception. The leanest, but still stable, settings were:
with the average statistics:
#3: GC runs more frequently than in the previous experiment, but still not every second as with the completely default Golang memory settings
#5..#6: 490 successful requests continues the performance degradation trend
These settings seem to be too low: my assumption about the imaginary production workload might be wrong, resulting in an OOM exception.
This article shows a possible approach to limiting a Golang-written application’s appetite in a Kubernetes cluster. The most important thing is to have a Kubernetes Container resource limit, most likely on every Container in the cluster, to get some predictability. Golang memory consumption settings are also helpful and should go hand-in-hand with the Kubernetes ones (by the way, it is amazing how well the default Golang settings work). For sure, the limit numbers shown in this set of experiments might only make sense for the given web-service, and in the hope that the real workload will be similar to the test one.