Third-party libraries for UDFs
Instabase supports several third-party libraries that you can reference in custom, user-defined functions (UDFs). If you have an on-premises environment, you can add additional third-party libraries by extending Instabase images.
Default third-party libraries
Instabase includes commonly used libraries with your deployment. These libraries can be imported like any other Python library.
The following third-party libraries are included by default with your Instabase deployment:
- beautifulsoup v. 4.6.3, supports HTML and XML parsing.
- dateparser v. 0.7.0, imported with `import dateparser`, parses localized dates in almost any string format commonly found on web pages.
- nltk v. 3.4.5, imported with `import nltk`, is a library for handling human language.
- numpy v. 1.16.5, imported with `import numpy`, is the fundamental package for scientific computing with Python.
- opencv-contrib-python v. 3.4.2.17, imported with `import cv2`. This package also includes the main opencv-python modules. Open Source Computer Vision Library (OpenCV) is an open-source computer vision and machine learning software library used for various image processing operations.
- pandas v. 1.1.0, a data analysis and manipulation tool.
- pdfkit v. 0.6.1, for converting HTML to PDFs.
- pillow v. 8.1.2, imported with `import PIL`, is a library for various image processing operations.
- pypdf2 v. 1.26.0, a PDF utility library.
- regex v. 2018.01.10, a regular expression library.
- requests v. 2.22.0, supports HTTP requests to external services.
- scikit-learn v. 0.21.3, a machine learning library.
- scikit-image v. 0.16.2, an image processing library.
- scipy v. 1.3.1, supports many math and image processing modules.
- spacy v. 2.3.2, a natural language processing (NLP) library.
- xlwt v. 1.3.0, to write legacy Excel (.xls) files.
- xlsxwriter v. 1.0.2, to write Excel (.xlsx) files.
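Because these libraries are imported like any other Python library, a UDF can check which of them are actually available in its environment. The following is a minimal sketch, not an Instabase API: the helper name `library_report` and the package-to-module mapping are assumptions for illustration (the import names are the standard ones for these projects). A module's `__version__` may not match the package version string exactly, so treat the reported values as a rough check.

```python
import importlib
import importlib.util

# Assumed import name for each default library listed above (package -> module).
DEFAULT_IMPORT_NAMES = {
    "beautifulsoup": "bs4",
    "dateparser": "dateparser",
    "nltk": "nltk",
    "numpy": "numpy",
    "opencv-contrib-python": "cv2",
    "pandas": "pandas",
    "pdfkit": "pdfkit",
    "pillow": "PIL",
    "pypdf2": "PyPDF2",
    "regex": "regex",
    "requests": "requests",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "scipy": "scipy",
    "spacy": "spacy",
    "xlwt": "xlwt",
    "xlsxwriter": "xlsxwriter",
}


def library_report(import_names=DEFAULT_IMPORT_NAMES):
    """Return {package: version} for each library.

    The value is the module's __version__ as a string, "installed" when the
    module imports but exposes no __version__, or None when it cannot load.
    """
    report = {}
    for package, module_name in import_names.items():
        # find_spec returns None when the module is not on the import path.
        if importlib.util.find_spec(module_name) is None:
            report[package] = None
            continue
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Found on disk but failed to load, e.g. a missing system library.
            report[package] = None
            continue
        report[package] = str(getattr(module, "__version__", "installed"))
    return report
```

Calling `library_report()` from a UDF and logging the result is one way to confirm what a given deployment provides before relying on a specific library.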
Enabling additional third-party libraries
In on-prem environments, your infrastructure administrator can install additional packages on top of the base container. After extending your Instabase images, you can reference your third-party libraries in UDFs.
To reference additional third-party libraries, you must extend the celery-app-tasks image.
Extending celery-app-tasks
On a machine with Docker installed and with access to Instabase and your local repo:
- Create a file named Dockerfile, for example by running `vim Dockerfile`.
- Add the following content to the Dockerfile to extend celery-app-tasks:

```dockerfile
# Your infrastructure team received several images to deploy Instabase.
# Find the tag that matches your most recent release and replace
# YOUR_RECENT_RELEASE with that tag.
FROM gcr.io/instabase-public/celery-app-tasks:YOUR_RECENT_RELEASE

# Package installations and changes must be performed as root.
USER root

# Install your list of packages here. For example:
RUN pip install --user oracledb==1.0.1

# For security, Instabase services do not run as root,
# so you must switch back to the ib-user UID.
USER 9999
```
- From the directory that contains the Dockerfile, run `docker build -t gcr.io/instabase-public/celery-app-tasks:YOUR_RECENT_RELEASE .` to build your extended image.
Note: Again, replace YOUR_RECENT_RELEASE with the tag that matches your most recent release.
- Tag the extended image for a container registry of your choice, push it to that registry, and deploy it in place of the Instabase-provided image.
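The steps above can be scripted. The following Python sketch renders a Dockerfile with the structure shown in this section and assembles the `docker build`, `docker tag`, and `docker push` commands as argument lists. The helper names `render_dockerfile`, `docker_commands`, and `run_all` and the registry `registry.example.com/team` are hypothetical; `oracledb==1.0.1` appears only because it is the example package from the Dockerfile.

```python
import shlex
import subprocess

# Base image named in the instructions above; the release tag is passed in.
BASE_IMAGE = "gcr.io/instabase-public/celery-app-tasks"


def render_dockerfile(release_tag, packages):
    """Return Dockerfile text that extends celery-app-tasks with pip specs."""
    lines = [
        f"FROM {BASE_IMAGE}:{release_tag}",
        "# Package installations must be performed as root.",
        "USER root",
    ]
    if packages:
        # Quote each spec so the shell used by RUN passes it to pip unchanged.
        specs = " ".join(shlex.quote(spec) for spec in packages)
        lines.append(f"RUN pip install --user {specs}")
    lines += [
        "# Switch back to the ib-user UID; services do not run as root.",
        "USER 9999",
    ]
    return "\n".join(lines) + "\n"


def docker_commands(release_tag, registry):
    """Return the build, tag, and push steps as argument lists (not executed)."""
    local_tag = f"{BASE_IMAGE}:{release_tag}"
    # Hypothetical destination: any registry your deployment can pull from.
    remote_tag = f"{registry}/celery-app-tasks:{release_tag}"
    return [
        ["docker", "build", "-t", local_tag, "."],
        ["docker", "tag", local_tag, remote_tag],
        ["docker", "push", remote_tag],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, you might write the output of `render_dockerfile("YOUR_RECENT_RELEASE", ["oracledb==1.0.1"])` to a file named Dockerfile and then call `run_all(docker_commands("YOUR_RECENT_RELEASE", "registry.example.com/team"))` from the same directory. Keeping the commands as argument lists lets you review them before anything runs.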
You can now reference your additional third-party libraries in UDFs.
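Once the extended image is deployed, a UDF imports the new package like any default library. The sketch below wraps `importlib.import_module` so that a UDF running on an image that was not extended fails with an explicit message instead of a bare `ImportError`. The helper name `require` and the hint text are hypothetical, not part of Instabase.

```python
import importlib

# Hypothetical guidance text; adapt it to your team's process.
MISSING_HINT = (
    "Ask your infrastructure administrator to add it to the extended "
    "celery-app-tasks image."
)


def require(module_name, hint=MISSING_HINT):
    """Import and return module_name, or raise RuntimeError with a clear hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the underlying cause stays visible.
        raise RuntimeError(
            f"Module {module_name!r} is not available in this environment. {hint}"
        ) from exc
```

Inside a UDF you might write `oracledb = require("oracledb")` for the example package from the Dockerfile.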