Model training and inference requirements
Model inference and training have specific requirements to ensure the best performance. Model inference relies on sufficient local volume capacity to reach production-level throughput, while model training requires a capable GPU to handle the demands of training jobs.
Model inference
Model inference functionality is provided through the model service. To support reliable inference, you must allocate sufficient resources to the model service.
Model service pods must be provisioned with at least 16 GB of memory and four cores each. Inference is supported only on CPUs, not GPUs.
To ensure production-level throughput, Instabase uses local volumes to store models. Provision adequate disk volume on nodes where model service pods are deployed; it is recommended to allocate a minimum of 50 GB of local volume capacity to each model service pod. The necessary volume capacity can vary based on the specific models required.
Minimum requirements:
- Model service: 16 GB of memory, 4 CPUs, and 50 GB of local volume per pod (a node-level verification sketch follows).
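As a quick sanity check, the following sketch reports whether a node has enough local disk, memory, and CPU for a model service pod. It is a minimal sketch that assumes a Linux node and uses a hypothetical volume mount path; substitute the path your deployment actually uses.

```python
import os
import shutil

# Hypothetical mount point for the model service's local volume; the actual
# path depends on how your model service pods are provisioned.
MODEL_VOLUME_PATH = "/var/lib/instabase/models"

MIN_VOLUME_GB = 50
MIN_MEMORY_GB = 16
MIN_CPUS = 4

def check_model_service_node() -> None:
    """Report whether this node meets the documented model service minimums."""
    total_disk_gb = shutil.disk_usage(MODEL_VOLUME_PATH).total / 1024**3
    total_mem_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    cpus = os.cpu_count() or 0

    print(f"local volume: {total_disk_gb:.0f} GB (need >= {MIN_VOLUME_GB} GB)")
    print(f"memory:       {total_mem_gb:.0f} GB (need >= {MIN_MEMORY_GB} GB)")
    print(f"CPUs:         {cpus} (need >= {MIN_CPUS})")

if __name__ == "__main__":
    check_model_service_node()
```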
Model training
The model training services in the Instabase platform support custom fine-tuning of deep learning models using your proprietary datasets. Model training lets you leverage massive pre-trained models to achieve high accuracy in extraction and classification tasks, even with limited datasets.
The Instabase platform provides two infrastructure options to support model training:
- Celery worker-based model training tasks
- Ray model training (introduced in public preview in Release 23.04)
Model training tasks
Model training functionality is provided through `model-training-tasks-gpu`. Model training is supported only in environments with GPUs. The number of concurrent training tasks is capped by the number of model training task GPU replicas. Training jobs can run for up to six hours, and all models must be trained through Instabase applications: ML Studio, Annotator, or Classifier.
Ray model training
The functionality for Ray model training is made available via `ray-head` and `ray-model-training-worker`. In a Ray cluster, the Ray head node is a dedicated CPU node that runs the singleton processes responsible for managing the cluster and assigning training tasks to worker nodes. The number of replicas for the Ray head is always set to 1. Worker nodes are GPU nodes that execute the actual computational tasks for model training jobs, with one GPU assigned to each worker. The number of simultaneous training tasks is limited by the number of Ray model training worker replicas. All models must be trained through Instabase applications, such as ML Studio.
GPU requirements
List of supported GPUs and card performance:
| GPU  | FP16 TFLOPS | VRAM (GB) |
|------|-------------|-----------|
| T4   | 65          | 16        |
| A100 | 312         | 40        |
| V100 | 112         | 32        |
| A10  | 125         | 24        |
| A30  | 165         | 24        |
| A40  | 150         | 48        |
The T4 is the lowest-tier GPU Instabase supports. On a T4, training a large LayoutLM-based model for 10 epochs on a 1,000-page dataset takes about three hours, including roughly 30 minutes of preprocessing. This is a rough estimate and varies depending on the base model.
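For capacity planning, the following sketch scales that T4 reference point to other dataset sizes. It assumes roughly linear scaling with page count and epoch count, which the estimate above does not guarantee, so treat the output as a rough guide only.

```python
# Back-of-the-envelope scaling of the T4 reference point above: ~3 hours total
# for 10 epochs over a 1,000-page dataset, of which ~30 minutes is preprocessing.
REFERENCE_PAGES = 1_000
REFERENCE_EPOCHS = 10
REFERENCE_TRAIN_MINUTES = 150   # 3 hours total minus 30 minutes of preprocessing
REFERENCE_PREP_MINUTES = 30

def estimate_t4_minutes(pages: int, epochs: int) -> float:
    """Linearly scale the documented T4 reference run to another dataset size."""
    train = REFERENCE_TRAIN_MINUTES * (pages / REFERENCE_PAGES) * (epochs / REFERENCE_EPOCHS)
    prep = REFERENCE_PREP_MINUTES * (pages / REFERENCE_PAGES)
    return train + prep

# Roughly double the reference run: ~360 minutes for 2,000 pages, 10 epochs.
print(f"~{estimate_t4_minutes(2_000, 10):.0f} minutes")
```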
Alternatively, you can provide GPU support by meeting these requirements (one way to verify a node is shown in the sketch after this list):
- Hardware support for CUDA 11.7, including any relevant drivers.
- Ability to run the NVIDIA device plugin.
- Compute performance comparable to a T4 or better (see the table above).
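The following sketch checks that a candidate node exposes a usable GPU; it assumes a CUDA-enabled PyTorch build is installed on the node.

```python
import torch

# Requires the NVIDIA drivers and device plugin to expose the GPU to the node,
# plus a CUDA-enabled PyTorch build.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable GPU is visible to this node.")

print(f"PyTorch CUDA build: {torch.version.cuda}")
for idx in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(idx)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU {idx}: {props.name}, {vram_gb:.0f} GB VRAM")
```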
Node requirements
Model training tasks & Ray model training worker
- Memory: The amount of memory required is determined by the GPU VRAM and varies depending on the dataset and model. The minimum amount of RAM needed is either 16 GB or the GPU's VRAM size, whichever is larger (see the sketch after this list).
- CPU: A minimum of four provisioned cores, primarily for data preparation for model training.
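The memory rule above reduces to taking the larger of 16 GB and the GPU's VRAM, as in this small sketch (VRAM values from the GPU table above):

```python
def min_worker_ram_gb(vram_gb: int) -> int:
    """Minimum RAM for a training worker: 16 GB or the GPU's VRAM, whichever is larger."""
    return max(16, vram_gb)

print(min_worker_ram_gb(16))  # T4   -> 16 GB
print(min_worker_ram_gb(40))  # A100 -> 40 GB
```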
Ray head
- Memory: The minimum amount of RAM needed is 16 GB.
- CPU: A minimum of two provisioned cores, primarily for task orchestration and cluster management.