Autoscaling Explained: HPA vs VPA — Kubernetes

Always learning
6 min read · Aug 20, 2024


Autoscaling is a cloud computing feature that enables organizations to scale cloud services such as server capacities or virtual machines up or down automatically, based on defined situations such as traffic utilization levels. Cloud computing providers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), offer autoscaling tools.

Scaling Types

  1. Horizontal Scaling (scale-out/in)
  2. Vertical Scaling (scale up/down)
  3. Cluster Autoscaler


Horizontal Pod Autoscaler (HPA) → increases or decreases the number of pod replicas.

Vertical Pod Autoscaler (VPA) → increases or decreases the CPU and memory available to existing pods.

Cluster Autoscaler → automatically adds or removes nodes in a cluster based on the resources requested by all pods.

How HPA Works

The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on that resource’s CPU utilization.

HPA can also be configured to make scaling decisions based on custom or external metrics. HPA is a great tool to ensure that critical applications are elastic: they can scale out to meet increasing demand and scale in to keep resource usage optimal.
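As a rough sketch of the scaling rule, the HPA controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue); the real controller also applies a tolerance and stabilization windows. With illustrative numbers:

```shell
# Simplified sketch of the HPA scaling formula; the numbers are made up.
current_replicas=4
current_cpu=90   # observed average CPU utilization (%)
target_cpu=50    # target utilization from the HPA spec (%)

# desired = ceil(current_replicas * current_cpu / target_cpu),
# done here with integer arithmetic
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # ceil(4 * 90 / 50) = 8
```

So a deployment running at 90% average utilization against a 50% target would be scaled from 4 to 8 replicas.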

How VPA Works

The Kubernetes Vertical Pod Autoscaler automatically adjusts the CPU and memory reservations for your pods to help “right size” your applications. This can help you to better use your cluster resources and free up CPU and memory for other pods.

VPA aims to reduce the maintenance overhead of configuring resource requests and limits for containers and improve the utilization of cluster resources.
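For illustration, a minimal VPA object might look like the following. This assumes the VPA custom resource definitions and controller are installed in the cluster (they are not part of core Kubernetes), and the target deployment name is illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa              # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment     # illustrative target
  updatePolicy:
    updateMode: "Auto"         # VPA may evict pods to apply new requests
```

With updateMode set to "Auto", the VPA can evict running pods so they restart with the recommended requests; "Off" only publishes recommendations without acting on them.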

How the Cluster Autoscaler Works

The Kubernetes Cluster Autoscaler automatically adjusts the number of nodes in your cluster when pods fail to launch due to a lack of resources, or when nodes in the cluster are underutilized and their pods can be rescheduled onto other nodes in the cluster.

In other words, the cluster autoscaler increases or decreases the size of a Kubernetes cluster by adding or removing nodes, based on the presence of pending pods and node utilization metrics.
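Pending pods are the usual trigger for a scale-up. Assuming kubectl access to a cluster, one quick way to see them is:

```shell
# List pods stuck in Pending across all namespaces; these are the pods
# the cluster autoscaler would try to make room for by adding a node.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```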

Benefits of Kubernetes Auto-Scaling

  1. Improved Application Performance
  2. Cost Optimization
  3. Enhanced Fault Tolerance
  4. Simplified Management

Best Practices for Kubernetes Auto-Scaling

  1. Monitor and Analyze
  2. Set Appropriate Scaling Metrics

The Metrics Server is an essential Kubernetes component that collects resource metrics from pods and nodes. These metrics are what Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA) rely on to make scaling decisions.

Installing the Metrics Server

Create a manifest file and name it components.yml:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

Apply the manifest to install the Metrics Server:

kubectl apply -f components.yml

Once it is installed, verify that the Metrics Server pods are in the “Running” state:

kubectl get pods -n kube-system

If the pods are not in the “Running” state, or if you suspect that something is not working correctly, locate the Metrics Server pod:

kubectl get pods -n kube-system | grep metrics-server

Then check the pod logs for more details:

kubectl logs -n kube-system <pod_name>

To confirm that the Metrics Server is collecting and serving metrics properly, view the resource usage metrics for your nodes:

kubectl top nodes

Finally, create a Horizontal Pod Autoscaler for an application to confirm that it can consume the metrics provided by the Metrics Server.
This test requires an existing Deployment; if you do not have one at the moment, you can skip this part and continue with the tutorial.

kubectl autoscale deployment <deployment_name> --cpu-percent=50 --min=1 --max=10

Check the status of the HPA:

kubectl get hpa

Horizontal Pod Autoscaling (HPA)

First, we create a deployment that deploys a simple application. In this example, we will use an Nginx application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "25m"
          limits:
            cpu: "100m"

Save this file as nginx-deployment.yml and apply it:

kubectl apply -f nginx-deployment.yml

Now, we create a service to expose the Nginx application.

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

Apply this file:

kubectl apply -f service.yml

Next, create the HPA for our Nginx deployment, configured to scale based on CPU utilization.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30

Save this file as nginx-hpa.yml and apply it:

kubectl apply -f nginx-hpa.yml

Generate load on the Nginx application. This can be done using tools like kubectl run to launch pods that make continuous requests to the Nginx service.
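For example, a throwaway load-generator pod (the pod name and BusyBox image tag here are illustrative) that requests the Nginx service in a loop:

```shell
# Run a pod that continuously requests the Nginx service, driving up
# CPU usage so the HPA scales the deployment out.
kubectl run load-generator --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://nginx-service; done"

# Clean up when finished:
# kubectl delete pod load-generator
```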

After a few minutes, check the status of the HPA and the pods:

kubectl get hpa
kubectl get pods
kubectl top pods

Thank you 🙏 for taking the time to read our blog.


Always learning

A student who is always learning...