Basic Kubernetes features for a dynamic orchestrator of containers

Kubernetes includes two utilities that give containerized applications this kind of flexibility: the Horizontal Pod Autoscaler and the Cluster Autoscaler. This post gives a brief overview of each of them.

Horizontal Pod Autoscaler

HorizontalPodAutoscaler (HPA) is a Kubernetes feature that automatically scales the number of pods (a pod is a set of one or more containers) in a replication controller, deployment, replica set, or stateful set, based on observed metrics such as CPU or memory usage.

Imagine that you have an online store. For most of the day, you only need a couple of servers to handle the load. But Black Friday comes, and suddenly you have thousands of users hitting your site. This is where HPA comes into play.

The HPA adjusts the number of pods in a replication controller or deployment based on the current metrics and the target you have defined. For example, you could tell Kubernetes: “I want the CPU usage of these pods to stay at 50%.” If CPU usage rises to 70%, Kubernetes will launch more pods to handle the load. If CPU usage drops to 30%, Kubernetes will terminate some pods.

HPA in Kubernetes is implemented as a control loop, with a control period defined by the controller manager (by default, 15 seconds). During each period, the controller manager compares the observed resource usage metrics against the desired metrics you have defined and adjusts the number of replicas if necessary.
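The scaling decision at the heart of that loop is a simple proportional rule. Here is a minimal Python sketch of it, for illustration only; the real controller adds tolerances, stabilization windows, and per-container metric aggregation:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Core HPA rule: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# CPU at 70% with a 50% target: 4 pods become 6.
print(desired_replicas(4, 70, 50))  # 6
# CPU at 30% with a 50% target: 4 pods become 3.
print(desired_replicas(4, 30, 50))  # 3
```

This is why, in the example above, a rise to 70% CPU triggers a scale-up and a drop to 30% triggers a scale-down: the ratio between current and target usage directly determines the new replica count.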

You can use the kubectl autoscale command, or you can define the HPA in a YAML file, just as you define other resources in Kubernetes.

Examples of use with Horizontal Pod Autoscaler (HPA)

  1. Create an HPA for a deployment called web-app, with a minimum of 3 pods, a maximum of 10 pods, and a CPU utilization target of 75%:

kubectl autoscale deployment web-app --min=3 --max=10 --cpu-percent=75

  2. Get information about the HPA you just created:

kubectl get hpa web-app

  3. Let’s say your web-app is growing and you need to allow more pods. You can raise the maximum number of pods the HPA can scale to, from 10 to 20:

kubectl patch hpa web-app -p '{"spec":{"maxReplicas":20}}'

  4. If you find that your application handles the load well with higher CPU utilization, you can modify the CPU utilization target to 85%:

kubectl patch hpa web-app -p '{"spec":{"targetCPUUtilizationPercentage":85}}'

  5. If you no longer need the HPA (for example, if you are deleting the application), you can delete it:

kubectl delete hpa web-app

More commonly, though, you will define the HPA in a YAML manifest (using the stable autoscaling/v2 API; the older autoscaling/v2beta2 version is deprecated):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75

Save it to a file and apply it with kubectl apply -f.

Cluster Autoscaler

On the other hand, Cluster Autoscaler is another powerful Kubernetes feature, this one responsible for automatically scaling the size of your cluster itself.

While the HorizontalPodAutoscaler adjusts the number of pods in your cluster, the Cluster Autoscaler goes one step further and adjusts the number of nodes, the servers that make up the compute capacity of your cluster.

Let’s go back to the online store analogy. Let’s say it’s Black Friday and you have a ton of traffic. The HorizontalPodAutoscaler has done a great job launching more pods to handle the load, but now you are running out of resources in your cluster. This is where Cluster Autoscaler comes into play.

Cluster Autoscaler monitors the capacity of your cluster and the demand from your pods. If it sees pods that cannot be scheduled because there are not enough resources, it will add more nodes to the cluster. Conversely, if it sees nodes that have been underutilized for a while (by default, 10 minutes) and whose pods can be rescheduled on other nodes, it will remove those nodes to save resources.
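The decision logic can be pictured roughly like this. This is a simplified Python sketch, not the actual autoscaler code; the real scale-down path also checks pod disruption budgets, local storage, node annotations, and more. The 0.5 utilization threshold and 10-minute window mirror the autoscaler's documented defaults:

```python
from dataclasses import dataclass

SCALE_DOWN_UNNEEDED_MINUTES = 10   # cluster-autoscaler default
SCALE_DOWN_UTILIZATION = 0.5       # default --scale-down-utilization-threshold

@dataclass
class Node:
    utilization: float     # fraction of allocatable resources requested, 0.0-1.0
    unneeded_minutes: int  # how long the node has been underutilized

def scale_decision(pending_pods: int, nodes: list[Node], max_nodes: int) -> str:
    """Simplified loop: add a node for unschedulable pods,
    remove a node that has been underutilized past the threshold."""
    if pending_pods > 0 and len(nodes) < max_nodes:
        return "scale-up"
    if any(n.utilization < SCALE_DOWN_UTILIZATION
           and n.unneeded_minutes >= SCALE_DOWN_UNNEEDED_MINUTES
           for n in nodes):
        return "scale-down"
    return "no-op"

print(scale_decision(3, [Node(0.8, 0)], max_nodes=5))                 # scale-up
print(scale_decision(0, [Node(0.2, 12), Node(0.7, 0)], max_nodes=5))  # scale-down
```

The Black Friday scenario above is the first branch: HPA creates pods that cannot be scheduled, the pending count rises, and the autoscaler requests new nodes from the cloud provider.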

Cluster Autoscaler is especially useful if you are using a cloud service provider that allows you to scale your infrastructure on demand.

You can configure your cluster to automatically adjust to the workload, which can save you a lot of money on infrastructure costs.

Examples of use with Cluster Autoscaler

Suppose you have a Cluster Autoscaler running as a deployment called cluster-autoscaler in the kube-system namespace:

  1. View your Cluster Autoscaler configuration:

kubectl -n kube-system describe configmap cluster-autoscaler-status

  2. View recent scaling events emitted by the Cluster Autoscaler:

kubectl -n kube-system get events | grep cluster-autoscaler

  3. Let’s say you need to adjust your Cluster Autoscaler configuration to allow a larger number of nodes. You can edit the deployment (this will open its manifest in your default text editor, where you can change the settings):

kubectl -n kube-system edit deployment cluster-autoscaler

  4. If you are experiencing problems scaling your cluster, you can view the Cluster Autoscaler logs for more information:

kubectl -n kube-system logs deploy/cluster-autoscaler

  5. If you have made changes to the Cluster Autoscaler configuration and need them to take effect immediately, you can restart the deployment:

kubectl -n kube-system rollout restart deployment cluster-autoscaler

Advantages and disadvantages

Horizontal Pod Autoscaler (HPA)


Advantages:

      1. Pod autoscaling: HPA can automatically increase or decrease the number of pods in a deployment or replica set based on resource utilization or custom metrics.

      2. Improves availability : By adjusting the number of pods based on demand, HPA can help maintain application availability even during traffic spikes.

      3. Resource efficiency : By reducing the number of pods when demand is low, HPA can help save resources.


Disadvantages:

      1. Scaling latency: There may be a delay between an increase in demand and pods scaling to meet that demand.

      2. Metric dependency : HPA depends on the availability and accuracy of metrics to make scaling decisions.

Cluster Autoscaler


Advantages:

      1. Automatic node scaling: Cluster Autoscaler can automatically increase or decrease the number of nodes in a cluster based on demand.

      2. Cost efficiency : By reducing the number of nodes when demand is low, Cluster Autoscaler can help save costs, especially in cloud environments where costs are based on usage.

      3. Improves availability : By adjusting the number of nodes based on demand, Cluster Autoscaler can help maintain application availability even during traffic spikes.


Disadvantages:

      1. Node startup time: Unlike pods, nodes can take some time to boot, which can delay their ability to respond to demand spikes.

      2. Unforeseen costs : If not configured correctly, Cluster Autoscaler could increase the number of nodes beyond what was anticipated, which could lead to unforeseen costs.

Important aspects to take into account

  1. Configuration : HPA and Cluster Autoscaler configuration should be tailored to the specific needs of your application. For example, if you have an application that experiences traffic spikes during certain hours of the day, you can configure HPA to scale the number of pods during those hours. Or if you are using a cloud service provider that allows you to scale the number of nodes on demand, you can configure Cluster Autoscaler to add nodes during traffic peaks and remove them when they are no longer needed.

  2. Monitoring : It is important to have a monitoring system in place to track the performance of your cluster and the effectiveness of HPA and Cluster Autoscaler. For example, if you see that HPA is constantly scaling the number of pods up and down, it may be a sign that the configuration needs to be adjusted. Or if you see that Cluster Autoscaler is adding and removing nodes frequently, it may be a sign that your application demand is volatile and may need to be stabilized.

  3. Testing : Before deploying HPA and Cluster Autoscaler in production, it is a good practice to test them in a test environment. For example, you could simulate a traffic spike to see how HPA and Cluster Autoscaler respond, and adjust settings accordingly. This can help you avoid surprises when deploying to production.

  4. Costs : If you are using a cloud service provider, it is important to consider the costs associated with scaling your cluster. For example, if Cluster Autoscaler adds many nodes during a traffic spike, it could increase your infrastructure costs. It’s important to understand how your cloud service provider charges for resources and adjust your Cluster Autoscaler settings accordingly.
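For the flapping scenario described in the monitoring point above, the autoscaling/v2 API offers a behavior section that dampens rapid scale-up/scale-down cycles. A hedged example follows; the field names are part of the stable API, but the values shown are illustrative, not recommendations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before acting on a lower recommendation
      policies:
      - type: Pods
        value: 1                       # remove at most 1 pod per minute
        periodSeconds: 60
```

A longer stabilization window trades responsiveness for stability, which is usually the right trade when the metrics driving the HPA are noisy.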

Wrapping up

There are now several vendors with very advanced solutions for controlling how a Kubernetes cluster behaves, and therefore the costs it generates with our cloud infrastructure provider. Often, though, you can control that behavior quite well with these two built-in utilities, which provide control at the container level and at the server level.

Useful? Thanks for sharing it!