Introducing Cluster Autoscaler into Claudie


Apr 29

By Miroslav Repka
02 May 2023

Claudie has recently introduced an integration with Cluster Autoscaler, giving Claudie users the full functionality of the autoscaler on any cloud provider Claudie supports, or on any mix of them. You can now use Cluster Autoscaler in the multi-cloud clusters Claudie manages, so resources scale with demand and workloads are handled efficiently.

Cluster Autoscaler is a powerful tool developed by the Kubernetes community that adjusts the number of nodes in your cluster to match workload demand, improving resource utilization and reducing cost. However, running Cluster Autoscaler in a multi-cloud cluster is challenging, because each cloud provider is configured differently.

With Claudie’s Cluster Autoscaler integration, you can confidently manage your Kubernetes clusters across multiple clouds while enjoying the full capabilities of the Cluster Autoscaler for efficient and cost-effective cluster operations.

Claudie also manages persistent volume replication with Longhorn. As a result, a scale-down will not cause you to lose data, as long as at least one node capable of holding the replicated volumes remains in the cluster.

Cluster Autoscaler in Claudie works completely autonomously: joining a node requires no user input or custom scripts (which is sometimes necessary with providers like Hetzner Cloud). A simple change in the Claudie input manifest is all it takes to have your cluster fully configured and ready to use Cluster Autoscaler.

What’s more, Claudie does not rely on the cloud provider implementations built into Cluster Autoscaler; it uses its own implementation instead. This means that any cloud provider Claudie supports can be used for your autoscaled Kubernetes nodes, even if Cluster Autoscaler does not officially support that provider. The list of cloud providers supported by Claudie keeps growing. If you need a particular provider, let us know on our Slack or GitHub page.

Finally, Claudie gives you freedom not only in which cloud providers your cluster uses, but also in whether it uses Cluster Autoscaler at all. You can freely change the configuration of your infrastructure from autoscaled clusters to clusters with a static size, and vice versa.

How does it work?

To get started with Claudie’s Cluster Autoscaler integration, you simply need to define your Kubernetes cluster to use at least one node pool that will be autoscaled. Instead of specifying a static count, you will set the minimum and maximum number of nodes. That’s it! Once you apply the input manifests, Claudie will take care of building the cluster, deploying and configuring Cluster Autoscaler, and will continuously listen for any scaling requests. For correct Cluster Autoscaler operation, it is recommended to follow best practices, such as defining resource requests for all pods and setting up pod disruption budgets.
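Cluster Autoscaler bases its scale-up simulation and its scale-down utilization on the resource requests of pods, so a container that declares no requests contributes nothing to those calculations. The Python sketch below flags such containers before they reach an autoscaled cluster. It assumes pod specs are plain dicts shaped like the Kubernetes Pod API (`spec.containers[].resources.requests`), and `containers_missing_requests` is a hypothetical helper, not part of Claudie or Cluster Autoscaler.

```python
# Sketch: flag containers that declare no CPU or memory request.
# Assumes pod specs are plain dicts shaped like the Kubernetes Pod API.
# `containers_missing_requests` is a hypothetical helper.


def containers_missing_requests(pod: dict) -> list:
    """Return the names of containers missing a CPU or memory request."""
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        requests = container.get("resources", {}).get("requests", {})
        if "cpu" not in requests or "memory" not in requests:
            missing.append(container["name"])
    return missing


pod = {
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "resources": {"requests": {"cpu": "250m", "memory": "200Mi"}},
            },
            # No requests: this container adds nothing to the autoscaler's math.
            {"name": "sidecar"},
        ]
    },
}

print(containers_missing_requests(pod))  # ['sidecar']
```

Running a check like this in CI is one way to enforce the "resource requests for all pods" recommendation; a Kubernetes LimitRange with default requests is another.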

Here is an example of node pool configurations where one node pool will be managed by an autoscaler, and one will not.

```yaml
# Autoscaled node pool in AWS
- name: aws-compute-autoscaled
  providerSpec:
    name: aws-production
    region: eu-north-1
    zone: eu-north-1a
  autoscaler:
    min: 1
    max: 5
  serverType: t3.small
  image: ami-03df6dea56f8aa618
  diskSize: 50

# Static node pool in Azure
- name: azure-compute-static
  providerSpec:
    name: azure-production
    region: West Europe
    zone: 1
  count: 3
  serverType: Standard_B2s
  image: Canonical:0001-com-ubuntu-minimal-jammy:minimal-22_04-lts:22.04.202212120
  diskSize: 50
```
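The difference between the two node pools can be pictured as a small validation-and-clamping step. The Python sketch below assumes, based on the description above, that a node pool sets either a static `count` or an `autoscaler` block with `min` and `max`, but not both. The `NodePool` class, its `min_nodes`/`max_nodes` fields, and `target_size` are hypothetical illustrations and do not mirror Claudie's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodePool:
    """Hypothetical model of a node pool: static `count` OR autoscaler bounds."""

    name: str
    count: Optional[int] = None      # static size
    min_nodes: Optional[int] = None  # mirrors autoscaler.min
    max_nodes: Optional[int] = None  # mirrors autoscaler.max

    def validate(self) -> None:
        autoscaled = self.min_nodes is not None or self.max_nodes is not None
        if autoscaled and self.count is not None:
            raise ValueError(f"{self.name}: set either count or autoscaler, not both")
        if autoscaled:
            if self.min_nodes is None or self.max_nodes is None:
                raise ValueError(f"{self.name}: autoscaler needs both min and max")
            if not 0 <= self.min_nodes <= self.max_nodes:
                raise ValueError(f"{self.name}: expected 0 <= min <= max")
        elif self.count is None:
            raise ValueError(f"{self.name}: set either count or autoscaler")

    def target_size(self, desired: int) -> int:
        """Static pools ignore `desired`; autoscaled pools clamp it to [min, max]."""
        self.validate()
        if self.count is not None:
            return self.count
        return max(self.min_nodes, min(self.max_nodes, desired))


aws = NodePool("aws-compute-autoscaled", min_nodes=1, max_nodes=5)
azure = NodePool("azure-compute-static", count=3)
print(aws.target_size(8), aws.target_size(0), azure.target_size(8))  # 5 1 3
```

Switching a pool between static and autoscaled then amounts to replacing `count` with the bounds (or the reverse), which matches the manifest change described earlier.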

Here is a diagram of an example manifest in which the control node pools are deployed in GCP and Azure, while the compute node pools are deployed in Azure (static) and AWS (autoscaled).

Figure: Example of a Kubernetes cluster spanning AWS, Azure, and GCP created by Claudie (example architecture diagram).

Scaling up

Every 10 seconds, Cluster Autoscaler in Claudie scans the cluster for pods stuck in the pending state because they cannot be scheduled. Once it detects such a pod, it runs a scheduling simulation to check whether a new node from one of the autoscaled node pools could accommodate it. If the simulation succeeds, Cluster Autoscaler requests a new node from Claudie, which adds it to the cluster within a few minutes.
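The scale-up check can be sketched as follows. This is a simplified illustration, not Cluster Autoscaler's real simulation, which evaluates the full set of Kubernetes scheduler predicates: here a pending pod only needs to fit, by CPU and memory, on an empty node from a pool that has not reached its `max`. The class names, the allocatable figures, and the pool name `gcp-compute-autoscaled` are hypothetical. Real Cluster Autoscaler chooses among candidate pools with a configurable expander; this sketch simply takes the first match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resources:
    cpu_m: int   # CPU in millicores
    mem_mi: int  # memory in MiB

    def fits_in(self, capacity: "Resources") -> bool:
        return self.cpu_m <= capacity.cpu_m and self.mem_mi <= capacity.mem_mi


@dataclass
class AutoscaledPool:
    name: str
    node_allocatable: Resources  # assumed capacity of one new, empty node
    current_nodes: int
    max_nodes: int


def pick_pool(pod: Resources, pools: List[AutoscaledPool]) -> Optional[str]:
    """Return the first pool that can still grow and whose new node fits the pod."""
    for pool in pools:
        if pool.current_nodes < pool.max_nodes and pod.fits_in(pool.node_allocatable):
            return pool.name
    return None


# Positional args: name, allocatable per node, current nodes, max nodes.
pools = [
    AutoscaledPool("aws-compute-autoscaled", Resources(1800, 1500), 5, 5),
    AutoscaledPool("gcp-compute-autoscaled", Resources(1800, 3500), 1, 3),
]
print(pick_pool(Resources(250, 200), pools))   # gcp-compute-autoscaled (AWS is at max)
print(pick_pool(Resources(4000, 200), pools))  # None: no single node fits 4 cores
```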

To provide some metrics, here are the times it took to scale up a node pool to run 20 pods, each requesting 250m CPU and 200Mi of memory. The nodes used across the cloud providers were similarly sized, and every node pool started with a single node.

  
| Cloud provider | VM type | New VMs requested | Time to create new VMs | Time to install all prerequisites | Time to install Kubernetes and join VMs |
|---|---|---|---|---|---|
| Hetzner | CPX11 | 3 | ~30 seconds | ~2 minutes | ~3 minutes |
| GCP | e2-medium | 3 | ~50 seconds | ~3 minutes | ~5 minutes |
| AWS | t3.small | 3 | ~1.5 minutes | ~2 minutes | ~3 minutes |
| Azure | Standard_B2s | 3 | ~2 minutes | ~2 minutes | ~3 minutes |

As you can see, some of these times are longer than you may be used to from other Cluster Autoscaler deployments. The difference comes from the extra prerequisites Claudie must set up on each new node so that the cluster can run across multiple clouds, which is a challenge in its own right.
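To put the benchmark workload in perspective, the total requested resources follow directly from the pod count: 20 × 250m = 5000m (5 CPU cores) and 20 × 200Mi = 4000Mi of memory. The Python sketch below turns those totals into a lower bound on the number of nodes the pods need, assuming a hypothetical allocatable capacity of 1800m CPU and 1500Mi memory per node. It ignores system pods and the node pool's existing node, so it is a back-of-the-envelope estimate, not a reproduction of the measurements in the table. The quantity parsers handle only the units used here.

```python
import math

PODS = 20


def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as '250m' or '2' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_mib(quantity: str) -> int:
    """Parse a memory quantity in Mi or Gi into MiB (other units are not handled)."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory unit in {quantity!r}")


def min_nodes(
    total_cpu_m: int, total_mem_mi: int, alloc_cpu_m: int, alloc_mem_mi: int
) -> int:
    """Lower bound on nodes: the tighter of the CPU and memory constraints."""
    by_cpu = math.ceil(total_cpu_m / alloc_cpu_m)
    by_mem = math.ceil(total_mem_mi / alloc_mem_mi)
    return max(by_cpu, by_mem)


total_cpu = PODS * cpu_millicores("250m")  # 5000m, i.e. 5 cores
total_mem = PODS * memory_mib("200Mi")     # 4000Mi
# Hypothetical allocatable capacity per node (not a measured value).
print(min_nodes(total_cpu, total_mem, alloc_cpu_m=1800, alloc_mem_mi=1500))  # 3
```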

Scaling down

Nodes in the cluster are considered for scale-down once they have been unneeded for more than 10 minutes. A node is unneeded when the CPU requests and the memory requests of its pods each add up to less than 50% of the node's allocatable resources (in Claudie, DaemonSet pods are excluded from this calculation), and when all pods running on it can be moved to other nodes. Additionally, a node is never considered for scale-down if it carries the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true". Unneeded nodes are removed one at a time, to reduce the risk of creating new unschedulable pods.
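The scale-down rules above can be expressed as a short Python sketch. It follows the description in this section (CPU and memory requests both below 50% of allocatable, DaemonSet pods excluded, all other pods movable, the opt-out annotation, more than 10 minutes of being unneeded, one node at a time), but it is a simplification: the `movable` flag stands in for Cluster Autoscaler's rescheduling simulation, `unneeded_since` stands in for the timing the autoscaler tracks internally, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SCALE_DOWN_DISABLED = "cluster-autoscaler.kubernetes.io/scale-down-disabled"
UTILIZATION_THRESHOLD = 0.5  # 50% of allocatable
UNNEEDED_TIME_S = 10 * 60    # 10 minutes


@dataclass
class Pod:
    cpu_m: int
    mem_mi: int
    daemonset: bool = False
    movable: bool = True  # stand-in for the autoscaler's rescheduling simulation


@dataclass
class Node:
    name: str
    alloc_cpu_m: int
    alloc_mem_mi: int
    pods: List[Pod] = field(default_factory=list)
    annotations: Dict[str, str] = field(default_factory=dict)
    unneeded_since: Optional[float] = None  # time (s) the node first looked unneeded


def utilization(node: Node) -> float:
    """Requests-based utilization; DaemonSet pods are excluded, as in Claudie."""
    pods = [p for p in node.pods if not p.daemonset]
    cpu = sum(p.cpu_m for p in pods) / node.alloc_cpu_m
    mem = sum(p.mem_mi for p in pods) / node.alloc_mem_mi
    return max(cpu, mem)  # both CPU and memory must be under the threshold


def is_unneeded(node: Node) -> bool:
    if node.annotations.get(SCALE_DOWN_DISABLED) == "true":
        return False
    movable = all(p.movable for p in node.pods if not p.daemonset)
    return utilization(node) < UTILIZATION_THRESHOLD and movable


def next_node_to_remove(nodes: List[Node], now: float) -> Optional[str]:
    """Return at most one node, since unneeded nodes are removed one at a time."""
    for node in nodes:
        waited = (
            node.unneeded_since is not None
            and now - node.unneeded_since > UNNEEDED_TIME_S
        )
        if is_unneeded(node) and waited:
            return node.name
    return None


nodes = [
    Node("busy", 2000, 2048, [Pod(1500, 500)]),
    Node("idle", 2000, 2048,
         [Pod(200, 100), Pod(100, 100, daemonset=True)], unneeded_since=0.0),
    Node("pinned", 2000, 2048,
         annotations={SCALE_DOWN_DISABLED: "true"}, unneeded_since=0.0),
]
print(next_node_to_remove(nodes, now=601.0))  # idle
print(next_node_to_remove(nodes, now=300.0))  # None
```

The "busy" node stays because its CPU requests reach 75% of allocatable, and the "pinned" node stays because of the annotation, even though it runs no pods.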

Future work

Cluster Autoscaler in Claudie is implemented as an external gRPC cloud provider for Cluster Autoscaler. In the future, the autoscaler could be extended with a price expander that takes the pricing of the nodes into account. With such a feature enabled, the cluster would continuously and transparently run on the most cost-efficient combination of nodes offered by the cloud providers within a given geographical region.
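In its simplest form, a price expander would compare the cost of the scale-up options that could satisfy the pending pods and pick the cheapest one. The Python sketch below illustrates only that idea. The `ScaleUpOption` type, the `gcp-compute-autoscaled` pool name, and the hourly prices are hypothetical, and a real expander would also have to weigh other factors, such as how well each option fits the pending pods.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScaleUpOption:
    pool: str
    nodes_needed: int    # nodes this pool must add to fit the pending pods
    hourly_price: float  # hypothetical per-node price


def cheapest_option(options: List[ScaleUpOption]) -> Optional[ScaleUpOption]:
    """Pick the option with the lowest total hourly cost, or None if there is none."""
    return min(options, key=lambda o: o.nodes_needed * o.hourly_price, default=None)


options = [
    ScaleUpOption("aws-compute-autoscaled", nodes_needed=3, hourly_price=0.03),
    ScaleUpOption("gcp-compute-autoscaled", nodes_needed=2, hourly_price=0.05),
]
best = cheapest_option(options)
print(best.pool if best else None)  # aws-compute-autoscaled
```

Comparing total cost rather than per-node price matters here: the second option needs fewer nodes, yet the first is cheaper overall.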

Cluster Autoscaler in Claudie allows you to run your multi-cloud workloads very efficiently. We’re committed to creating the best Kubernetes distribution for multi-cloud scenarios. Cluster Autoscaler is supported in Claudie v0.2.2 and newer. Try it out and let us know your feedback.

Jakub Hlavacka