Workload autoscaling
Workload autoscaling lets Instabase autoscale data services based on demand. Autoscaling optimizes service resources to maximize efficiency and performance for any workload at a given time. Workload autoscaling also removes the need to manually size services and presents cost-saving opportunities.
Autoscaling is performed with Kubernetes HorizontalPodAutoscalers (HPAs) based on CPU usage for `conversion-service`, `ocr-msft-lite`, `ocr-msft-v3`, and `ocr-service`.
See the infrastructure requirements documentation for information on the required Kubernetes components.
Enable workload autoscaling
You can enable workload autoscaling during or after an upgrade or installation.
Workload autoscaling is in public preview and is disabled by default. If you notice performance issues or resource constraints after enabling workload autoscaling, contact Instabase Support.
Enable autoscaling during an upgrade
To enable autoscaling during an upgrade:
1. Before upgrading, edit the `control-plane.yml` file in the release you're upgrading to and enable the `ENABLE_AUTOSCALING` and `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variables:
   - Unzip the `installation.zip` file for the release you're upgrading to.
   - On the command line, navigate to and open the `control-plane.yml` file (`installation/control-plane/control-plane.yml`).
   - Change the value of the `ENABLE_AUTOSCALING` environment variable to `True`.
   - Change the value of the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable to `True`.
   - Save your changes. You apply this updated `control-plane.yml` file when updating Deployment Manager at the start of the upgrade or installation process.
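   As a quick sanity check before you repackage the release, you can confirm that both flags are set. This is a minimal sketch assuming the variables appear as standard name/value pairs in the manifest:

   ```sh
   # Print each flag and the line after it (its value)
   grep -A1 -E "ENABLE_AUTOSCALING|IS_CUSTOMER_HOSTED_AUTOSCALING" installation/control-plane/control-plane.yml
   ```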
2. Add the base configurations that enable autoscaling to the release's `base_configs.zip` file:
   - Unzip the `base_configs.zip` file contained within the release's `installation` folder.
   - Locate the `autoscaling` folder in the `installation` folder (`installation` > `additional_configs` > `autoscaling`).
   - Move the config files in the `autoscaling` folder to the unzipped `base_configs` folder.
   - Select all files in the `base_configs` folder and compress them, creating a new .zip file of base configs.
   - Rename the file `base_configs.zip`. This updated .zip file is what you upload during the upgrade.
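   If you prefer to build the archive on the command line, one minimal approach is shown below. The key detail is that the configs must sit at the top level of the .zip file, not inside a nested folder:

   ```sh
   # Zip the contents of base_configs (not the folder itself)
   cd base_configs
   zip -r ../base_configs.zip .
   cd ..
   ```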
3. (Optional) If your deployment uses custom resource sizing or if you didn't select a resource sizing option during the upgrade, create patches that define the `minReplicas` and `maxReplicas` values for each autoscaled service's corresponding HPA service. For example, a patch targeting `autoscaler-conversion-service` sets the autoscaling range for `conversion-service`. The required steps are:
   - Calculate the `minReplicas` values for all autoscaled services' corresponding HPA services.
   - In the release's `installation` folder, locate the `custom-hpa-patches` folder (`installation` > `optional_patches` > `custom-hpa-patches`). This folder contains patches to configure `minReplicas` and `maxReplicas` values for each HPA service. Or, reference the sample patch in this article.
   - Edit each HPA service patch to define the `minReplicas` and `maxReplicas` values. Use your calculated `minReplicas` values and define the `maxReplicas` values based on your preferred resource sizing.
4. Update the `default_patches.zip` file to include the following patches:
   - All patches contained in the `enable-autoscaling` folder (`installation` > `optional_patches` > `enable-autoscaling`).
   - (Optional) If using custom or undefined resource sizing, any patches used to manually define an HPA service's `minReplicas` and `maxReplicas` values.

   To update the `default_patches.zip` file:
   - Unzip the `default_patches.zip` file contained within the release's `installation` folder.
   - Add the `enable-autoscaling` patches (and any optional edited HPA service patches) to the now unzipped `default_patches` folder.
   - Select all files in the `default_patches` folder and compress them, creating a new .zip file of patches.
   - Rename the file `default_patches.zip`. This updated .zip file is what you upload during the upgrade.
5. During the upgrade, upload the updated `base_configs.zip` file and the `default_patches.zip` file. The `default_patches.zip` and `base_configs.zip` files contain all patches and configurations required to configure autoscaling in your deployment.

   Info: You must upload the `default_patches.zip` file during the upgrade even if you didn't add custom patches to it.
If you selected a resource sizing option during the upgrade process and have previously decommissioned (set `replicas` to `0`) `conversion-service`, `ocr-msft-lite`, `ocr-msft-v3`, or `ocr-service`, after the upgrade you must reset the service's decommissioned state. Selecting a resource sizing option automatically updates a service's `replicas` count to a non-zero value. You can reset the service's `replicas` count using a patch or with the following kubectl command: `kubectl scale --replicas=0 <name of decommissioned service> -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
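For example, to re-decommission `conversion-service` after the upgrade (the deployment name shown here matches how the service appears elsewhere in this article; adjust it to match your cluster):

```sh
kubectl scale --replicas=0 deployment/deployment-conversion-service -n $IB_NS
```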
Enable autoscaling during an installation
Follow the same steps as enabling autoscaling during an upgrade. However, if your deployment uses custom resource sizing, you don't need to create patches that define the `minReplicas` and `maxReplicas` values for each autoscaled service's corresponding HPA service.
Enable autoscaling outside of an upgrade or installation
If your deployment is already on release 23.07 or later, you can enable autoscaling at any time:
1. Enable the `ENABLE_AUTOSCALING` environment variable:
   - On the command line, run `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   - Locate the `ENABLE_AUTOSCALING` environment variable.
   - Set the value to `True`.
   - Save your changes.
2. Enable the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable:
   - On the command line, run `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   - Locate the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable.
   - Set the value to `True`.
   - Save your changes.
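   Alternatively, if you'd rather not open an editor, `kubectl set env` can apply both changes in one command. This is an equivalent shortcut, assuming your role has permission to patch the deployment:

   ```sh
   kubectl set env deployment/deployment-control-plane \
     ENABLE_AUTOSCALING=True IS_CUSTOMER_HOSTED_AUTOSCALING=True -n $IB_NS
   ```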
3. From the Deployment Manager Base Configs tab, update your deployment's base configs to include the base configs required for workload autoscaling. (If you already included the workload autoscaling base configs when upgrading or installing, you can skip this step.)
   - Unzip the `base_configs.zip` file contained within the release's `installation` folder.
   - Locate the `autoscaling` folder in the `installation` folder (`installation` > `additional_configs` > `autoscaling`).
   - Move the config files in the `autoscaling` folder to the now unzipped `base_configs` folder.
   - Select all files in the `base_configs` folder and compress them, creating a new .zip file of base configs.
   - Rename the file `base_configs.zip`.
   - From the Deployment Manager Base Configs tab, update your base configs.
4. From the Deployment Manager Configs tab, apply the patches required to enable workload autoscaling. All required patches are in the release's `enable-autoscaling` patches folder (`installation` > `optional_patches` > `enable-autoscaling`).

The `enable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service using the following patch template:
```yaml
# This patch will associate the HPA with the deployment, allowing the HPA to autoscale the deployment's replica count
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <name of deployment service>
```
You can verify that all HPAs are working and have the desired `minReplicas` and `maxReplicas` values from the HPAs tab of the Infra Dashboard. If needed, you can adjust the minimum and maximum replica count.
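If you prefer the command line over the Infra Dashboard, the same information is available from kubectl; the MINPODS and MAXPODS columns in the output correspond to `minReplicas` and `maxReplicas`:

```sh
kubectl get hpa -n $IB_NS
```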
Configure workload autoscaling
For deployments using Instabase standard resource sizing, Deployment Manager automatically determines and applies appropriate `minReplicas` and `maxReplicas` values for all autoscaled services. For deployments using custom resource sizing, however, you must set the `minReplicas` and `maxReplicas` values for the HPA services corresponding to the following autoscaled deployment services:
| Deployment service | Corresponding HPA service |
|---|---|
| `ocr-msft-v3` | `autoscaler-ocr-msft-v3` |
| `ocr-msft-lite` | `autoscaler-ocr-msft-lite` |
| `ocr-service` | `autoscaler-ocr-service` |
| `conversion-service` | `autoscaler-conversion-service` |
While the `maxReplicas` value can be set based on your preferred resource sizing, the `minReplicas` value must be calculated based on the number of `celery-app-tasks` pods in the deployment.
To adjust a service's `minReplicas` and `maxReplicas` values, apply the following patch to each HPA service:
```yaml
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: <max replicas>
  minReplicas: <min replicas>
```
You can find sample patches for modifying HPA services in the release's `installation.zip` file, in the `custom-hpa-patches` folder (`installation` > `optional_patches` > `custom-hpa-patches`). If you use these sample patches, you must still define the `minReplicas` and `maxReplicas` values.
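For illustration, here is the patch template filled in for `autoscaler-conversion-service`. Both numbers are placeholders, not recommendations:

```yaml
# target: autoscaler-conversion-service
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: 10  # placeholder: set based on your preferred resource sizing
  minReplicas: 3   # placeholder: calculate using the formulas in the next section
```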
Calculate HPA minReplicas for autoscaled services
To calculate `minReplicas` for an HPA service, use the following formulas, where `n` is the number of `celery-app-tasks` pods in your deployment:

The `ceil()` function returns the smallest integer value that's greater than or equal to the calculated number.
| Deployment service | Corresponding HPA service | minReplicas formula |
|---|---|---|
| `ocr-msft-v3` | `autoscaler-ocr-msft-v3` | `ceil(0.28 * n)` |
| `ocr-msft-lite` | `autoscaler-ocr-msft-lite` | `ceil(0.57 * n)` |
| `ocr-service` | `autoscaler-ocr-service` | `ceil(0.28 * n)` |
| `conversion-service` | `autoscaler-conversion-service` | `ceil(0.28 * n)` |
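For example, a deployment with 10 `celery-app-tasks` pods needs `minReplicas = ceil(0.57 * 10) = 6` for `autoscaler-ocr-msft-lite` and `ceil(0.28 * 10) = 3` for the other three HPA services. You can also script the arithmetic; in this sketch the label selector is an assumption, so adjust it to however your deployment labels its `celery-app-tasks` pods:

```sh
# Count celery-app-tasks pods, then compute minReplicas for a 0.57-factor service
n=$(kubectl get pods -n $IB_NS -l app=celery-app-tasks --no-headers | wc -l)
python3 -c "import math; print(math.ceil(0.57 * $n))"
```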
Disable autoscaling
You can disable autoscaling at any time. The process differs slightly based on whether your deployment uses custom resource sizing or standard Instabase resource sizing.
Deployments with standard resource sizing
To disable autoscaling in deployments using standard Instabase resource sizing:
1. Disable the `ENABLE_AUTOSCALING` environment variable:
   - On the command line, run `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   - Locate the `ENABLE_AUTOSCALING` environment variable.
   - Set the value to `False`.
   - Save your changes.
2. Call the Push latest materialized configs to cluster API to redeploy the deployment with autoscaling disabled.
3. Call the Update cluster size API to reset your resource sizing.
4. From the Deployment Manager Configs tab, apply the patches required to disable workload autoscaling. All required patches are in the release's `disable-autoscaling` patches folder (`installation` > `optional_patches` > `disable-autoscaling`). These patches dissociate each previously autoscaled service from its corresponding HPA.

The `disable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service using the following patch template:
```yaml
# This patch will disassociate the HPA from the deployment, stopping the HPA from autoscaling the deployment's replica count
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: disassociated
```
Deployments with custom resource sizing
To disable autoscaling in deployments using custom resource sizing:
1. Disable the `ENABLE_AUTOSCALING` environment variable:
   - On the command line, run `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   - Locate the `ENABLE_AUTOSCALING` environment variable.
   - Set the value to `False`.
   - Save your changes.
2. Call the Push latest materialized configs to cluster API to redeploy the deployment with autoscaling disabled.
3. From the Deployment Manager Configs tab, apply the patches required to disable workload autoscaling. All required patches are in the release's `disable-autoscaling` patches folder (`installation` > `optional_patches` > `disable-autoscaling`). These patches dissociate each previously autoscaled service from its corresponding HPA.

The `disable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service using the following patch template:
```yaml
# This patch will disassociate the HPA from the deployment, stopping the HPA from autoscaling the deployment's replica count
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: disassociated
```
Verify autoscaling configuration changes
You can verify that patches targeting HPAs have applied successfully using the HPAs tab of the Deployment Manager Infra Dashboard.
To confirm that an HPA's replica count has updated successfully:
1. Open the Deployment Manager HPAs tab (All apps > Deployment Manager > Infra Dashboard > HPAs).
2. On the Horizontal Pod Autoscalers dashboard, select the updated HPA.
3. Confirm that the General Info section lists the correct Min Replicas and Max Replicas values.
To confirm that an HPA is active:
1. Open the Deployment Manager HPAs tab (All apps > Deployment Manager > Infra Dashboard > HPAs).
2. On the Horizontal Pod Autoscalers dashboard, select the updated HPA.
3. Verify that the Conditions table includes an AbleToScale condition. This condition means that CPU metrics are available and the HPA is active.
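You can also inspect an HPA's conditions from the command line, for example:

```sh
# Conditions (including AbleToScale) appear near the end of the output
kubectl describe hpa autoscaler-conversion-service -n $IB_NS
```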
Enable autoscaling controllers
Instabase offers several autoscaling controllers that can help optimize infrastructure costs during periods of low activity. HPA-based autoscaling is not required for these controllers to run.
To enable autoscaling controllers, set the `ENABLE_AUTOSCALING_CONTROLLERS` environment variable to `"true"` on `deployment-control-plane`.
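As with the flags covered earlier, you can set this variable without opening an editor:

```sh
kubectl set env deployment/deployment-control-plane ENABLE_AUTOSCALING_CONTROLLERS=true -n $IB_NS
```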
Binary autoscaler
The binary autoscaling controller scales non-HPA-based resources to 0 replicas when idle, and to the desired number of replicas when the service is needed. This controller is useful for GPU-based services such as `deployment-ray-model-training-worker` because, when combined with a node autoscaler, the controller removes the GPU node from the environment when no model training is in progress.
To enable the binary autoscaler for `deployment-ray-model-training-worker`:
1. Set the following environment variables on `deployment-control-plane`:
   - `ENABLE_BINARY_AUTOSCALING`: `"true"`
   - `BINARY_AUTOSCALING_DEPLOYMENT_NAMES`: `"deployment-ray-model-training-worker"`
2. Using Deployment Manager, apply the following patch to `deployment-ray-model-training-worker`:
```yaml
# target: deployment-ray-model-training-worker
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-ray-model-training-worker
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "1"
    autoscaling/queries: "max_over_time(clamp_min(sum(ray_tasks{State!~\"FINISHED|FAILED\"}[15s]), 0)[30m]) or vector(0)"
```
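To confirm the patch applied, you can check the deployment's annotations, for example:

```sh
kubectl get deployment deployment-ray-model-training-worker -n $IB_NS \
  -o jsonpath='{.metadata.annotations}'
```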
You can also use this controller to scale down `deployment-celery-webdriver-tasks` and `deployment-celery-core-tasks`.
To enable the binary autoscaler for `deployment-celery-webdriver-tasks` and `deployment-celery-core-tasks`:
These instructions assume you previously enabled the binary autoscaler for `deployment-ray-model-training-worker`, including setting the `ENABLE_BINARY_AUTOSCALING` variable to `"true"`.
1. In `deployment-control-plane`, update the `BINARY_AUTOSCALING_DEPLOYMENT_NAMES` variable to `"deployment-ray-model-training-worker,deployment-celery-core-tasks,deployment-celery-webdriver-tasks"`.
2. Using Deployment Manager, apply the following patch to `deployment-celery-core-tasks`:
```yaml
# target: deployment-celery-core-tasks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-celery-core-tasks
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "{SET_TO_CURRENT_REPLICA_COUNT}" # Adjust this to the service's current replica count
    autoscaling/queries: 'max_over_time(sum(rabbitmq_queue_messages_unacked{queue="celery-core-tasks"})[30m])&max_over_time(sum(rabbitmq_queue_messages_ready{queue="celery-core-tasks"})[30m])'
```
3. Apply the following patch to `deployment-celery-webdriver-tasks`:
```yaml
# target: deployment-celery-webdriver-tasks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-celery-webdriver-tasks
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "{SET_TO_CURRENT_REPLICA_COUNT}" # Adjust this to the service's current replica count
    autoscaling/queries: 'max_over_time(sum(rabbitmq_queue_messages_unacked{queue="celery-webdriver-tasks"})[30m])&max_over_time(sum(rabbitmq_queue_messages_ready{queue="celery-webdriver-tasks"})[30m])'
```
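To find the value to substitute for `{SET_TO_CURRENT_REPLICA_COUNT}`, you can read each service's current replica count before applying the patches:

```sh
kubectl get deployment deployment-celery-core-tasks deployment-celery-webdriver-tasks \
  -n $IB_NS -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas
```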
Scale down to zero controller
The scale down to zero controller scales HPA-based resources down to 0 replicas when idle, and back up to the desired number of replicas when the service is needed.
To enable the scale down to zero controller:
1. In `deployment-control-plane`, set the `ENABLE_SCALE_DOWN_TO_ZERO_CONTROLLER` environment variable to `"true"`.
2. Onboard the services `deployment-conversion-service`, `deployment-ocr-msft-lite`, `deployment-ocr-msft-v3`, and `deployment-ocr-service` by using Deployment Manager to apply the following patches:
   - Apply to `deployment-conversion-service`:
```yaml
# target: deployment-conversion-service
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-conversion-service-27979"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-conversion-service-27979"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-conversion-service-27979"}[30s]))'
```
   - Apply to `deployment-ocr-msft-lite`:
```yaml
# target: deployment-ocr-msft-lite
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}[30s]))'
```
   - Apply to `deployment-ocr-msft-v3`:
```yaml
# target: deployment-ocr-msft-v3
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}[30s]))'
```
   - Apply to `deployment-ocr-service`:
```yaml
# target: deployment-ocr-service
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"(cluster-ocr-service-27068|cluster-ocr-service-27090)"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-service-27068|cluster-ocr-service-27090"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"(cluster-ocr-service-27068|cluster-ocr-service-27090)"}[30s]))'
```
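After applying the patches, you can spot-check that the annotations are present on each onboarded service, for example:

```sh
kubectl get deployment deployment-ocr-service -n $IB_NS -o yaml | grep 'autoscaling/'
```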