Configure Microsoft OCR
Deploying Microsoft optical character recognition (OCR) containers in the Instabase environment enables additional flexibility for different use cases and provides new features. To use Microsoft OCR in Instabase, you must configure it correctly.
To run a flow with Microsoft OCR, change the OCR model type to Microsoft OCR in the Process Files step.
Configuration status
Microsoft OCR must be configured before it can run in your Instabase environment. To confirm that your Microsoft OCR setup is correctly configured, check the logs of a pod in the ocr-msft deployment.
Logs from a correctly configured pod look like this:
Now listening on: http://[::]:5000
Application started. Press Ctrl+C to shut down.
Initialize OneOCR...
Failures in the logs might look like this:
fail: Microsoft.CloudAI.Containers.Http.CloudClient[0]
Failed to reach endpoint: 'No such device or address'. Trying 9 more times.
The Failed to reach endpoint message indicates that the Microsoft OCR container cannot connect to the Microsoft metering endpoint. HTTPS (port 443) requests to the metering endpoint must be allowed. These requests can be proxied.
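The log check described above can be sketched in Python. This is a minimal illustration, not an Instabase tool: the function name classify_ocr_logs is hypothetical, and the marker strings are copied from the log excerpts in this section. It treats any occurrence of the failure message as a failure, even if a later retry might succeed.

```python
# Marker strings copied from the log excerpts in this section.
HEALTHY_MARKERS = ("Now listening on:", "Application started.")
FAILURE_MARKER = "Failed to reach endpoint"


def classify_ocr_logs(log_text: str) -> str:
    """Return 'failed', 'healthy', or 'unknown' for a block of pod log text."""
    # Any metering failure takes priority over the startup messages.
    if FAILURE_MARKER in log_text:
        return "failed"
    if all(marker in log_text for marker in HEALTHY_MARKERS):
        return "healthy"
    # Neither pattern matched, for example because the pod is still starting.
    return "unknown"
```

The log text could come from a command such as kubectl logs deployment/ocr-msft, assuming you have kubectl access to the namespace where the deployment runs.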
Configure an HTTP proxy for the OCR container
Set up and configure an HTTP proxy so that the OCR container can send its metering requests to the Microsoft metering endpoint.
- If an HTTP proxy is already set up, set the HTTP_PROXY environment variable to the address of that proxy in the YAML for the Microsoft OCR deployment.
- If an HTTP proxy is not set up, work with your Instabase administration team to set up this proxy on your Kubernetes nodes.
- If the HTTP proxy has specific authentication requirements, work with Instabase support to get authentication supported in the ocr-msft containers.
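As a rough sketch of the first case, the following Python helper sets an environment variable in a container spec that has been loaded into a dictionary, using the standard Kubernetes env list of name/value entries. The helper name set_env_var, the container name, and the proxy address are hypothetical placeholders. Use the values from your own deployment YAML.

```python
def set_env_var(container: dict, name: str, value: str) -> dict:
    """Set or overwrite one entry in a Kubernetes container's env list, in place."""
    env = container.setdefault("env", [])
    for entry in env:
        if entry.get("name") == name:
            # Replace an existing entry, dropping any valueFrom reference.
            entry["value"] = value
            entry.pop("valueFrom", None)
            return container
    env.append({"name": name, "value": value})
    return container


# Hypothetical container spec and proxy address, for illustration only.
ocr_container = {"name": "ocr-msft", "env": []}
set_env_var(ocr_container, "HTTP_PROXY", "http://proxy.example.internal:3128")
```

In the deployment YAML, the resulting entry is a name and value pair under the container's env list.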
Set NO_PROXY to exclude service
Ensure that the celery-app-tasks deployment does not use the HTTP proxy to reach Microsoft OCR. If the deployment uses the proxy, set the NO_PROXY environment variable in the YAML deployment file so that requests to the ocr-msft service bypass the proxy.
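NO_PROXY is conventionally a comma-separated list of hosts, so excluding a service usually means appending its host name to any value that is already set. The sketch below assumes that convention and assumes the service is reachable under the host name ocr-msft. The function name add_no_proxy_host is hypothetical, and your cluster might require a namespace-qualified name instead.

```python
def add_no_proxy_host(current: str, host: str) -> str:
    """Append host to a comma-separated NO_PROXY value unless it is already listed."""
    hosts = [h.strip() for h in current.split(",") if h.strip()]
    if host not in hosts:
        hosts.append(host)
    return ",".join(hosts)


# Example with a hypothetical existing NO_PROXY value.
print(add_no_proxy_host("localhost,127.0.0.1", "ocr-msft"))
# prints: localhost,127.0.0.1,ocr-msft
```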
Upgrading existing Microsoft Read OCR containers to version 3.2 or later
Microsoft Read OCR containers version 3.2 and later handle licenses differently than earlier containers. Instead of an expiry time built into the image, Instabase supplies you with a license file that has a fixed expiry.
When upgrading ocr-msft-v3 on Instabase 22.01 or 22.05, you must also update the YAML deployment file. The updated deployment file contains new environment variables and mounts that are required for Read OCR 3.2. To get the updated deployment file, contact Instabase support. The deployment file must contain the following environment variables:
- Mounts__License: The path for the mounted license file. We recommend keeping the default value of /var/run/license, because an empty directory is being mounted at /var/run.
- Mounts__Output: The directory used by the container to store some internal data during the OCR process. Default value: /var/run/output.
- LICENSE_FILE: The license file supplied to the container. Because this is sensitive information, the value is automatically picked up from a secret key reference.
- QueueNamePrefix: This prefix is required if Microsoft lite-OCR is being used in the same environment. Default value: msft-read-max-.
- StorageTimeToLiveInMinutes: How long the NFS storage retains the OCR results. Default value: 60. Setting values higher than the default might cause the NFS storage space to grow quickly.
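Before applying an updated deployment file, it can help to check that all of the variables listed above are present. The following Python sketch does this for a container spec loaded into a dictionary. The function name missing_env_vars, the example container, and the secret name and key are hypothetical. The deployment file supplied by Instabase support remains the authoritative source.

```python
# Variable names taken from the list in this section.
REQUIRED_ENV_VARS = (
    "Mounts__License",
    "Mounts__Output",
    "LICENSE_FILE",
    "QueueNamePrefix",
    "StorageTimeToLiveInMinutes",
)


def missing_env_vars(container: dict) -> list:
    """Return the required variable names that the container's env list lacks."""
    present = {entry.get("name") for entry in container.get("env", [])}
    return [name for name in REQUIRED_ENV_VARS if name not in present]


# Hypothetical container spec that omits StorageTimeToLiveInMinutes.
example_container = {
    "name": "ocr-msft-v3",
    "env": [
        {"name": "Mounts__License", "value": "/var/run/license"},
        {"name": "Mounts__Output", "value": "/var/run/output"},
        # Hypothetical secret name and key; the real reference comes from your deployment file.
        {"name": "LICENSE_FILE",
         "valueFrom": {"secretKeyRef": {"name": "example-license-secret", "key": "license"}}},
        {"name": "QueueNamePrefix", "value": "msft-read-max-"},
    ],
}
print(missing_env_vars(example_container))
# prints: ['StorageTimeToLiveInMinutes']
```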