Autoscaling Explained: HPA vs VPA — Kubernetes

Always learning
6 min read · Aug 20, 2024


Autoscaling is a cloud computing feature that enables organizations to scale cloud services such as server capacities or virtual machines up or down automatically, based on defined situations such as traffic utilization levels. Cloud computing providers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), offer autoscaling tools.

Scaling Types

  1. Horizontal Scaling (scale-out/in)
  2. Vertical Scaling (scale up/down)
  3. Cluster Autoscaler


Horizontal Pod Autoscaler (HPA) → increases or decreases the number of pod replicas.

Vertical Pod Autoscaler (VPA) → increases or decreases the CPU and memory available to existing pods.

Cluster Autoscaler → automatically adds or removes nodes in a cluster based on the resources requested by all pods.

How HPA Works

The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on that resource’s CPU utilization.

HPA can also be configured to make scaling decisions based on custom or external metrics. HPA is a great tool to ensure that critical applications are elastic: they can scale out to meet increasing demand and scale in to keep resource usage optimal.
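As a rough sketch of the scaling rule, the HPA controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue); the real controller also applies a tolerance and stabilization windows. With illustrative numbers:

```shell
# Simplified sketch of the HPA scaling formula; the numbers are made up.
current_replicas=4
current_cpu=90   # observed average CPU utilization (%)
target_cpu=50    # target utilization from the HPA spec (%)

# desired = ceil(current_replicas * current_cpu / target_cpu),
# done here with integer arithmetic
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # ceil(4 * 90 / 50) = 8
```

So a deployment running at 90% average utilization against a 50% target would be scaled from 4 to 8 replicas.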

How VPA Works

The Kubernetes Vertical Pod Autoscaler automatically adjusts the CPU and memory reservations for your pods to help “right size” your applications. This can help you to better use your cluster resources and free up CPU and memory for other pods.

VPA aims to reduce the maintenance overhead of configuring resource requests and limits for containers and improve the utilization of cluster resources.
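For illustration, a minimal VPA object might look like the following. This assumes the VPA custom resource definitions and controller are installed in the cluster (they are not part of core Kubernetes), and the target deployment name is illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa              # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment     # illustrative target
  updatePolicy:
    updateMode: "Auto"         # VPA may evict pods to apply new requests
```

With updateMode set to "Auto", the VPA can evict running pods so they restart with the recommended requests; "Off" only publishes recommendations without acting on them.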

How the Cluster Autoscaler Works

The Kubernetes Cluster Autoscaler automatically adjusts the number of nodes in your cluster when pods fail to launch due to a lack of resources, or when nodes in the cluster are underutilized and their pods can be rescheduled onto other nodes in the cluster.

In other words, the cluster autoscaler increases or decreases the size of a Kubernetes cluster by adding or removing nodes, based on the presence of pending pods and node utilization metrics.
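Pending pods are the usual trigger for a scale-up. Assuming kubectl access to a cluster, one quick way to see them is:

```shell
# List pods stuck in Pending across all namespaces; these are the pods
# the cluster autoscaler would try to make room for by adding a node.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```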

Benefits of Kubernetes Auto-Scaling

  1. Improved Application Performance
  2. Cost Optimization
  3. Enhanced Fault Tolerance
  4. Simplified Management

Best Practices for Kubernetes Auto-Scaling

  1. Monitor and Analyze
  2. Set Appropriate Scaling Metrics

The Metrics Server is an essential Kubernetes component that collects resource metrics from pods and nodes. These metrics are what Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA) rely on to make scaling decisions.

Installing the Metrics Server

Create a manifest file and name it components.yml:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

Apply the manifest to install the Metrics Server:

kubectl apply -f components.yml

Once it is installed, verify that the Metrics Server pods are in the “Running” state:

kubectl get pods -n kube-system

If the pods are not in the “Running” state, or if you suspect that something is not working correctly, locate the Metrics Server pod:

kubectl get pods -n kube-system | grep metrics-server

Then check the pod logs for more details:

kubectl logs -n kube-system <pod_name>

To confirm that the Metrics Server is collecting and serving metrics properly, view the resource usage metrics for your nodes:

kubectl top nodes

Finally, create a Horizontal Pod Autoscaler for an application to confirm that it can consume the metrics provided by the Metrics Server.
This test requires an existing Deployment; if you do not have one at the moment, you can skip this part and continue with the tutorial.

kubectl autoscale deployment <deployment_name> --cpu-percent=50 --min=1 --max=10

Check the status of the HPA:

kubectl get hpa

Horizontal Pod Autoscaling (HPA)

First, we create a deployment that deploys a simple application. In this example, we will use an Nginx application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "25m"
          limits:
            cpu: "100m"

Save this file as nginx-deployment.yml and apply it:

kubectl apply -f nginx-deployment.yml

Now, we create a service to expose the Nginx application.

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

Apply this file:

kubectl apply -f service.yml

Next, create the HPA for our Nginx deployment, configured to scale based on CPU utilization.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30

Save this file as nginx-hpa.yml and apply it:

kubectl apply -f nginx-hpa.yml

Generate load on the Nginx application. This can be done using tools like kubectl run to launch pods that make continuous requests to the Nginx service.
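For example, a throwaway load-generator pod (the pod name and BusyBox image tag here are illustrative) that requests the Nginx service in a loop:

```shell
# Run a pod that continuously requests the Nginx service, driving up
# CPU usage so the HPA scales the deployment out.
kubectl run load-generator --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://nginx-service; done"

# Clean up when finished:
# kubectl delete pod load-generator
```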

After a few minutes, check the status of the HPA and the pods:

kubectl get hpa
kubectl get pods
kubectl top pods

Thank you 🙏 for taking the time to read our blog.


Always learning

A student who is always learning...