Third-party libraries for UDFs and Notebook

Instabase supports several third-party libraries that you can import in UDFs and Notebook. If the library you want isn’t available by default, you can add it by extending the Instabase images.

Default third-party libraries

Instabase includes some commonly-used libraries with your deployment. These libraries can be imported like any other Python library.

The following third-party libraries are included by default with your Instabase deployment:

  • beautifulsoup v. 4.6.3, import bs4 to use. beautifulsoup supports HTML and XML parsing

  • dateparser v. 0.7.0, import dateparser to use. dateparser provides modules to easily parse localized dates in almost any string format commonly found on web pages

  • nltk v. 3.4.5, import nltk to use. nltk is a library for handling human language

  • numpy v. 1.16.5, import numpy to use. numpy is the fundamental package for scientific computing with Python

  • opencv-contrib-python v. 3.4.2.17 (which also pulls in opencv-python), import cv2 to use. OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library used for a wide range of image processing operations

  • pandas v. 1.1.0, a data analysis and manipulation tool

  • pdfkit v. 0.6.1, for converting HTML to PDFs

  • pillow v. 8.1.2, import PIL to use. Pillow is a library for various image processing operations

  • pypdf2 v. 1.26.0, import PyPDF2 to use. PyPDF2 is a PDF utility library

  • regex v. 2018.01.10, import regex to use. regex is a regular expression library

  • requests v. 2.22.0, supports HTTP requests to external services

  • scikit-learn v. 0.21.3, import sklearn to use. scikit-learn is a machine learning library

  • scikit-image v. 0.16.2, import skimage to use. scikit-image is an image processing library

  • scipy v. 1.3.1, supports many math and image processing modules

  • spacy v. 2.3.2, an NLP library

  • xlwt v. 1.3.0, to write legacy Excel (.xls) files

  • xlsxwriter v. 1.0.2, to write Excel (.xlsx) files
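
Several of the libraries above are imported under a module name that differs from the package name. The following is a minimal sketch for confirming that they are importable in your environment and for reporting their versions. You could run it in Notebook or adapt it into a UDF. check_libraries and DEFAULT_IMPORTS are hypothetical names used for illustration and are not part of any Instabase API. The package-to-module mapping reflects the standard import names for these projects.

```python
import importlib

# Hypothetical mapping: package name (as listed above) -> module name you import.
DEFAULT_IMPORTS = {
    "beautifulsoup4": "bs4",
    "dateparser": "dateparser",
    "nltk": "nltk",
    "numpy": "numpy",
    "opencv-contrib-python": "cv2",
    "pandas": "pandas",
    "pdfkit": "pdfkit",
    "pillow": "PIL",
    "pypdf2": "PyPDF2",
    "regex": "regex",
    "requests": "requests",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "scipy": "scipy",
    "spacy": "spacy",
    "xlwt": "xlwt",
    "xlsxwriter": "xlsxwriter",
}


def check_libraries(imports=DEFAULT_IMPORTS):
    """Return {package: version string}; None means the module could not be imported."""
    report = {}
    for package, module_name in imports.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[package] = None
            continue
        # Prefer __version__, fall back to __VERSION__ (used by some older packages).
        version = getattr(module, "__version__", None) or getattr(module, "__VERSION__", None)
        report[package] = str(version) if version else "unknown"
    return report
```

For example, `for package, version in check_libraries().items(): print(package, version or "not importable")` prints one line per default library.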

Adding additional third-party libraries

Instabase allows infrastructure administrators to extend the capabilities of a few key services by using the Instabase container images as a base and installing additional packages on top of them. After extending your Instabase images, you can reference your third-party libraries in UDFs and Notebook.

You’ll extend two images: celery-app-tasks, which lets you reference additional third-party libraries in UDFs, and user-notebook-standalone, which lets you reference them in Notebook.

Extending celery-app-tasks

On a machine with Docker installed and with access to Instabase and your local repo:

  1. Create a file named Dockerfile, for example by running vim Dockerfile.

  2. Add the following contents to the Dockerfile to extend celery-app-tasks.

    FROM gcr.io/instabase-public/celery-app-tasks:YOUR_RECENT_RELEASE
    # Your infrastructure team will have received several images to deploy Instabase. Find the tag that matches your most recent release, and replace YOUR_RECENT_RELEASE with that tag.
    
    USER root
    # Package installations and changes must be performed as root.
    
    RUN pip install --user [package]
    # Install your list of packages here. For example: RUN pip install --user oracledb==1.0.1
    
    USER 9999
    # For security, Instabase services do not run as root, so you must switch back to the ib-user UID.
    
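  3. From the directory that contains the Dockerfile, run docker build -t gcr.io/instabase-public/celery-app-tasks:YOUR_RECENT_RELEASE . to build your extended image. The trailing . sets the build context to the current directory.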

Note: As in the Dockerfile, replace YOUR_RECENT_RELEASE with the tag that matches your most recent release.

  4. Tag the extended image for a repository of your choice (for example, with docker tag), push it to that repository, then deploy it in place of the Instabase-provided celery-app-tasks image.
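
If a UDF imports a package that was added this way before the extended celery-app-tasks image is deployed, the import fails with a generic ImportError. The following sketch wraps the import so that the failure points at the likely cause. import_extended_package is a hypothetical helper, not an Instabase API, and oracledb is used only because it appears in the Dockerfile example.

```python
import importlib
from types import ModuleType


def import_extended_package(module_name: str, image: str = "celery-app-tasks") -> ModuleType:
    """Import a package added by extending an Instabase image (hypothetical helper).

    Re-raises a missing import as RuntimeError with a hint about the extended image,
    keeping the original ImportError as the cause.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Could not import {module_name!r}. Check that the extended {image} "
            "image that installs it has been built and deployed."
        ) from exc
```

Inside a UDF you might then write `oracledb = import_extended_package("oracledb")` instead of a bare `import oracledb`.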

Extending user-notebook-standalone

On a machine with Docker installed and with access to Instabase and your local repo:

  1. Create a file named Dockerfile, for example by running vim Dockerfile.

  2. Add the following contents to the Dockerfile to extend user-notebook-standalone.

    FROM gcr.io/instabase-public/user-notebook-standalone:YOUR_RECENT_RELEASE
    # Your infrastructure team will have received several images to deploy Instabase. Find the tag that matches your most recent release, and replace YOUR_RECENT_RELEASE with that tag.
    
    USER root
    # Package installations and changes must be performed as root.
    
    RUN /bin/bash -c "source activate python3 && conda install -q -y '[package]'" && conda clean -tipsy
    # Install your list of packages here. For example: RUN /bin/bash -c "source activate python3 && conda install -q -y 'pyfiglet=0.7.6'" && conda clean -tipsy
    
    USER 9999
    # For security, Instabase services do not run as root, so you must switch back to the ib-user UID.
    
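  3. From the directory that contains the Dockerfile, run docker build -t gcr.io/instabase-public/user-notebook-standalone:YOUR_RECENT_RELEASE . to build your extended image. The trailing . sets the build context to the current directory.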

Note: As in the Dockerfile, replace YOUR_RECENT_RELEASE with the tag that matches your most recent release.

  4. Tag the extended image for a repository of your choice (for example, with docker tag), push it to that repository, then deploy it in place of the Instabase-provided user-notebook-standalone image.

You can now reference your additional third-party libraries in UDFs or in Notebook.
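
To check that an added package is being loaded from the extended image rather than from somewhere else, you can report where Python found it. This is a minimal sketch using only the standard library. describe_module is a hypothetical helper, not an Instabase API, and pyfiglet appears only because it is the package in the Notebook Dockerfile example.

```python
import importlib
import sys


def describe_module(module_name):
    """Report version, file location, and interpreter for an importable module (hypothetical helper)."""
    module = importlib.import_module(module_name)
    return {
        "module": module_name,
        # Not every package exposes __version__, so fall back to "unknown".
        "version": str(getattr(module, "__version__", "unknown")),
        # Built-in modules have no __file__; packages report their __init__.py.
        "path": getattr(module, "__file__", None),
        # The interpreter running this code, e.g. the python3 conda environment in Notebook.
        "python": sys.executable,
    }
```

In Notebook, `describe_module("pyfiglet")` should show a path inside the environment that the Dockerfile installed into, if the extended image is the one in use.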