Third-party libraries for UDFs and Notebook
Instabase supports several third-party libraries that you can reference in UDFs. If the library you want isn’t available by default, you can add it by extending Instabase images.
Default third-party libraries
Instabase includes some commonly-used libraries with your deployment. These libraries can be imported like any other Python library.
The following third-party libraries are included by default with your Instabase deployment:
- beautifulsoup v. 4.6.3, supports HTML and XML parsing
- dateparser v. 0.7.0, import dateparser to use. dateparser provides modules to easily parse localized dates in almost any string format commonly found on web pages
- nltk v. 3.4.5, import nltk to use. nltk is a library for handling human language
- numpy v. 1.16.5, import numpy to use. numpy is the fundamental package for scientific computing with Python
- opencv-contrib-python v. 3.4.2.17, which also pulls in opencv-python, import cv2 to use. OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library that is used for various image processing operations
- pandas v. 1.1.0, a data analysis and manipulation tool
- pdfkit v. 0.6.1, for converting HTML to PDFs
- pillow v. 8.1.2, import PIL to use. Pillow is a library for various image processing operations
- pypdf2 v. 1.26.0, a PDF utility library
- regex v. 2018.01.10, a regex library
- requests v. 2.22.0, supports HTTP requests to external services
- scikit-learn v. 0.21.3, a machine learning library
- scikit-image v. 0.16.2, an image processing library
- scipy v. 1.3.1, supports many math and image processing modules
- spacy v. 2.3.2, an NLP library
- xlwt v. 1.3.0, to write to Excel sheets
- xlsxwriter v. 1.0.2, to write to Excel sheets
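Several of these packages are imported under a name that differs from the package name. The sketch below is one way to check which of the default libraries can be imported in a given environment and which version each one reports. The import names for beautifulsoup (bs4), pypdf2 (PyPDF2), scikit-learn (sklearn), and scikit-image (skimage) are the names those packages normally use and are not stated on this page; the helper name check_libraries is hypothetical and is not part of any Instabase API.

```python
import importlib

# Package name as listed on this page -> module name used in the import statement.
# Names not given on this page (bs4, PyPDF2, sklearn, skimage) are the usual
# import names for those packages and are an assumption here.
IMPORT_NAMES = {
    "beautifulsoup": "bs4",
    "dateparser": "dateparser",
    "nltk": "nltk",
    "numpy": "numpy",
    "opencv-contrib-python": "cv2",
    "pandas": "pandas",
    "pdfkit": "pdfkit",
    "pillow": "PIL",
    "pypdf2": "PyPDF2",
    "regex": "regex",
    "requests": "requests",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "scipy": "scipy",
    "spacy": "spacy",
    "xlwt": "xlwt",
    "xlsxwriter": "xlsxwriter",
}


def check_libraries(import_names):
    """Return {package: version string, or None if the import fails}."""
    report = {}
    for package, module_name in import_names.items():
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # A missing or broken install is reported instead of raised,
            # so one bad package does not hide the status of the others.
            report[package] = None
            continue
        # Not every package exposes __version__, so fall back to "unknown".
        report[package] = str(getattr(module, "__version__", "unknown"))
    return report


if __name__ == "__main__":
    for package, version in sorted(check_libraries(IMPORT_NAMES).items()):
        print(f"{package}: {version or 'not importable'}")
```

Running the script in a UDF development environment or in Notebook prints one line per package, which makes it easy to compare the reported versions against the list above.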
Adding additional third-party libraries
Instabase allows infrastructure administrators to extend the capabilities of a few key services by using our container as a base, and installing additional packages on top of it. After extending your Instabase images, you can reference your third-party libraries in UDFs.
You’ll extend two images: celery-app-tasks, which lets you reference additional third-party libraries in UDFs, and user-notebook-standalone, which lets you reference the libraries in Notebook.
Extending celery-app-tasks
On a machine with Docker installed and with access to Instabase and your local repo:
- Run vim Dockerfile.
- Reference the following Dockerfile to extend celery-app-tasks.

    # Your infrastructure team will have received several images to deploy Instabase.
    # Find the tag that matches your most recent release, and replace YOUR_RECENT_RELEASE with the tag.
    FROM gcr.io/instabase-public/celery-app-tasks:YOUR_RECENT_RELEASE
    # Package installations and changes must be performed as root.
    USER root
    # Install your list of packages here, replacing [package]. For example:
    # RUN pip install --user oracledb==1.0.1
    RUN pip install --user [package]
    # For security, we do not run our services as root. So, you must switch back to the ib-user UID.
    USER 9999

- Run the following command from the directory that contains your Dockerfile to build your extended container:

    docker build -t gcr.io/instabase-public/celery-app-tasks:YOUR_RECENT_RELEASE .

  Note: Again replace YOUR_RECENT_RELEASE with the tag that matches your most recent release.
- Pull the extended image from Instabase and push it to a repository of your choice, then deploy it in place of the Instabase-provided image.
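If you maintain a list of pinned packages, generating the Dockerfile from that list can help keep the file consistent across releases. The following is a minimal sketch of that idea, not an Instabase tool: the function name render_dockerfile is hypothetical, and it assumes the same base image path, pip install --user command, and UID 9999 shown in the steps above.

```python
from typing import Iterable

# Base image path as given in the steps above.
BASE_IMAGE = "gcr.io/instabase-public/celery-app-tasks"


def render_dockerfile(release_tag: str, packages: Iterable[str]) -> str:
    """Return Dockerfile text that extends celery-app-tasks with pip packages."""
    packages = list(packages)
    if not packages:
        raise ValueError("list at least one package to install")
    lines = [
        f"FROM {BASE_IMAGE}:{release_tag}",
        "# Package installations and changes must be performed as root.",
        "USER root",
    ]
    # One RUN line per package keeps each pin visible in the build output.
    lines += [f"RUN pip install --user {package}" for package in packages]
    lines += [
        "# Switch back to the non-root ib-user UID.",
        "USER 9999",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # YOUR_RECENT_RELEASE is a placeholder; replace it with your release tag.
    print(render_dockerfile("YOUR_RECENT_RELEASE", ["oracledb==1.0.1"]))
```

Redirect the printed text into a file named Dockerfile, then build it with the docker build command from the previous step.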
Extending user-notebook-standalone
On a machine with Docker installed and with access to Instabase and your local repo:
- Run vim Dockerfile.
- Reference the following Dockerfile to extend user-notebook-standalone.

    # Your infrastructure team will have received several images to deploy Instabase.
    # Find the tag that matches your most recent release, and replace YOUR_RECENT_RELEASE with the tag.
    FROM gcr.io/instabase-public/user-notebook-standalone:YOUR_RECENT_RELEASE
    # Package installations and changes must be performed as root.
    USER root
    # Install your list of packages here, replacing [package]. For example:
    # RUN /bin/bash -c "source activate python3 && conda install -q -y 'pyfiglet=0.7.6' " && conda clean -tipsy
    RUN /bin/bash -c "source activate python3 && conda install -q -y '[package]' " && conda clean -tipsy
    # For security, we do not run our services as root. So, you must switch back to the ib-user UID.
    USER 9999

- Run the following command from the directory that contains your Dockerfile to build your extended container:

    docker build -t gcr.io/instabase-public/user-notebook-standalone:YOUR_RECENT_RELEASE .

  Note: Again replace YOUR_RECENT_RELEASE with the tag that matches your most recent release.
- Pull the extended image from Instabase and push it to a repository of your choice, then deploy it in place of the Instabase-provided image.
You can now reference your additional third-party libraries in UDFs or in Notebook.