UDFs on Instabase
UDFs let you add custom functionality in Flow and Refiner.
- Flow has an “Apply UDF” step that allows you to run a UDF as a step in a Flow.
- Runtime input variables contain information for your UDFs.
- UDFs are registered in a Python file (for example, scripts.py) with a special registration function.
- Use custom Python modules to call helper functions that are common across several script directories.
Providing access
Site admins can use Settings > Site > Access Controls to grant users Execute UDFs access across the entire system. See Allowing users to run UDFs.
Script folders and registration functions
Script folders are directories on Instabase that contain your script files. To use a UDF in a Flow or Refiner, configure the app to point to your script folder:
- Click Choose scripts directory from an Apply UDF step.
- Click the Scripts button in Refiner Settings.
Registering UDFs
A UDF must be registered before it can be used in Flow or Refiner. To register a UDF, import the register_fn decorator and add @register_fn above the UDF you want to register. Decorator parameters include:
- name (string): Registered name of the function. This is the name you use to call the function in Flow or Refiner. If no name is specified, the registered name defaults to the name of the function.
- provenance (boolean): Whether the UDF is provenance-tracked. The default value is True, which makes the UDF provenance-tracked.
You can register both a provenance-tracked and an untracked version of a function with the same name parameter.
Here are some examples of how to use the decorator:
```python
# Import the decorator
from instabase.provenance.registration import register_fn
# Value (the provenance-tracked value type used below) must also be imported
# from the Instabase provenance package available in your environment.

# This function is registered under the name "custom_greeting"
# and uses provenance-tracking.
@register_fn(name='custom_greeting')
def custom_greeting_v(name: Value[str], **kwargs) -> Value[str]:
    return Value('Hi ') + name

# This function is registered under the name "custom_greeting"
# and does not use provenance-tracking.
@register_fn(name='custom_greeting', provenance=False)
def custom_greeting(name: str, **kwargs) -> str:
    return 'Hi ' + name

# This function is registered under the name "greeting"
# and uses provenance-tracking.
# This decorator has no parameters, so by default the function is
# registered under its own name and provenance is set to True.
@register_fn
def greeting(name: Value[str], **kwargs) -> Value[str]:
    return Value('Hi ') + name
```
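To illustrate the defaults described above (the registered name falls back to the function name, and provenance defaults to True), here is a toy registry decorator in plain Python. It is a hypothetical sketch, not Instabase's register_fn: toy_register_fn and REGISTRY are illustrative names. Keying the registry by (name, provenance) shows why a tracked and an untracked version can share one registered name.

```python
from typing import Callable, Dict, Optional, Tuple

# Toy registry keyed by (registered name, provenance flag). Hypothetical, not Instabase's.
REGISTRY: Dict[Tuple[str, bool], Callable] = {}


def toy_register_fn(fn: Optional[Callable] = None, *, name: Optional[str] = None,
                    provenance: bool = True):
    """Works both bare (@toy_register_fn) and with arguments (@toy_register_fn(name=...))."""
    def decorate(f: Callable) -> Callable:
        # Default the registered name to the function's own name
        REGISTRY[(name or f.__name__, provenance)] = f
        return f
    if fn is not None:  # Bare decorator: the function is passed in directly
        return decorate(fn)
    return decorate


@toy_register_fn(name='custom_greeting')
def custom_greeting_v(name, **kwargs):
    return 'Hi ' + name


@toy_register_fn(name='custom_greeting', provenance=False)
def custom_greeting(name, **kwargs):
    return 'Hi ' + name


@toy_register_fn
def greeting(name, **kwargs):
    return 'Hi ' + name


print(sorted(REGISTRY))
# [('custom_greeting', False), ('custom_greeting', True), ('greeting', True)]
```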
Alternatively, you can register a UDF using the legacy approach of defining a register function in any of the Python files in your scripts folder. Here’s an example:
```python
def custom_function_fn(content, *args, **kwargs):
    # Replace this placeholder with your function's logic.
    pass

def register(name_to_fn):
    more_fns = {
        'custom_function': {
            'fn': custom_function_fn,
            'ex': '',    # Example usage of the function.
            'desc': ''   # Description of the function.
        }
    }
    name_to_fn.update(more_fns)
```
To invoke the custom function, call it as a custom function in Refiner or in an Apply UDF step:
```
custom_function(INPUT_COL)  # INPUT_COL is a special variable defined in the execution environment
```
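The legacy register function simply adds entries to the name_to_fn dictionary it receives. The sketch below exercises that locally by passing in a plain dict, which is an assumption about how the platform consumes it; the upper-casing body, the 'ex' string, and the 'desc' string are hypothetical placeholders.

```python
def custom_function_fn(content, *args, **kwargs):
    # Hypothetical body: upper-case the input value
    return content.upper()


def register(name_to_fn):
    name_to_fn.update({
        'custom_function': {
            'fn': custom_function_fn,
            'ex': 'custom_function(INPUT_COL)',      # Example usage
            'desc': 'Upper-cases the input column.'  # Description
        }
    })


# Simulate the registry the platform passes in (assumed to behave like a dict)
name_to_fn = {}
register(name_to_fn)
entry = name_to_fn['custom_function']
print(entry['fn']('hello'))  # HELLO
```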
Subfolders in a scripts folder
You can have multiple subfolders within a scripts folder. Script files in the subfolders can be imported and used in other script files. Relative paths are supported.
Restriction: Only script files at the root level of the scripts folder can register custom functions to Flow.
Importing files in a scripts folder
Within the scripts folder, use relative paths to import variables from one file into another. Import is supported using this syntax: from <file> import <variables>.
For example, if a script folder contains the following files:
```
my-user/my-repo/fs/Instabase Drive/samples/scripts/
|
+---python_file1.py
|       function_1()
|
+---python_file2.py
|       function_2()
|
+---folder1/
    |
    +---python_file3.py
    |       function_3()
    |
    +---python_file4.py
            function_4()
```
From python_file1.py, use the following statements to import function_2 and function_3:
```python
from .python_file2 import function_2
from .folder1.python_file3 import function_3
```
Files in subfolders of the scripts folder can also import variables from other files. For example, from the python_file3.py file, use the following statements to import function_2 and function_4:
```python
from ..python_file2 import function_2
from .python_file4 import function_4
```
Python files outside of the scripts folder cannot be imported.
Invoking a UDF
UDFs are invoked in these general categories:
- Scripts attached to a Flow step:
  - Fetcher
  - Process Files
  - Map Records
  - Apply Classifier
  - Apply UDF
- Scripts inside of extraction programs:
  - Refiner programs (.ibprog)
  - Sheet programs (.ibsheet)
- Scripts run at specific times within a Flow:
  - Custom Classifier
  - Pre and Post Run Custom Hooks
Runtime input variables
Runtime input variables contain information for your UDFs:
- Context about your Flow (root_output_folder, input_filepath). This information is about the entire Flow, including the input file being processed and the root of the final output folder.
- Filesystem access using the ibfile object.
- Context for your step (parsed_ibocr, input_ibocr_record). These details are about the particular file or record you are processing.
UDFs also have access to the runtime_config, which is a user-defined dictionary of strings passed into the binary at runtime. You can access this dictionary through the CONFIG column in the kwargs that are passed into the UDF:
```python
runtime_configs, err = kwargs['_FN_CONTEXT_KEY'].get_by_col_name('CONFIG')
```
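A minimal sketch of a UDF that reads an optional value from runtime_config and falls back to a default. FakeFnContext is a hypothetical stand-in for the object passed as kwargs['_FN_CONTEXT_KEY'], included only so the example runs outside Instabase; its error value is a plain string here, and the 'greeting' config key is also hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


class FakeFnContext:
    """Hypothetical stand-in for kwargs['_FN_CONTEXT_KEY'] (local testing only)."""

    def __init__(self, columns: Dict[str, Any]):
        self._columns = columns

    def get_by_col_name(self, name: str) -> Tuple[Optional[Any], Optional[str]]:
        if name in self._columns:
            return self._columns[name], None
        return None, 'column {} not found'.format(name)


def greet_with_config_fn(name, **kwargs):
    runtime_configs, err = kwargs['_FN_CONTEXT_KEY'].get_by_col_name('CONFIG')
    if err or runtime_configs is None:
        runtime_configs = {}  # No runtime config supplied: use defaults
    greeting = runtime_configs.get('greeting', 'Hi')  # 'greeting' is a hypothetical key
    return '{} {}'.format(greeting, name)


ctx = FakeFnContext({'CONFIG': {'greeting': 'Hello'}})
print(greet_with_config_fn('Ada', _FN_CONTEXT_KEY=ctx))  # Hello Ada
```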
Because UDFs run in a variety of contexts (objects, functions, and hook points), the dictionary, object, or input variable that holds the input for your UDF depends on the particular step or UDF hook point.
Use the following input variables to provide the runtime arguments to your UDF:
UDF or hook point | Access inputs with |
---|---|
Custom Classifier | ModelMetadataDict |
Custom Classifier in Metaflow | Special input variables |
Custom Fetcher | FetcherContext object |
Custom formulas in Refiner programs | Special input variables |
Custom formulas in Sheet programs | Special input variables |
Custom image filters | FilterConfigDict |
Map records | Special input variables |
Pre and Post Run Custom Hooks | FlowInfoDict |
UDF formula | Special input variables |
See Configuring Flow steps for details on these objects, dictionaries, and special input variables.
Importing other files in scripts directory
You can import modules from other files in the same scripts directory using absolute or relative paths.
Package payload
The package payload is the src folder with Python files. This src folder contains resources for the Refiner programs and UDFs for the Flow. The src folder can contain subfolders for package management.
For example, this src folder hierarchy uses placeholder file and folder names:
```
src/
|
+---__init__.py
|
+---python_file1.py
|       function_1()
|
+---subpkg1/
    |
    +---python_file2.py
    |       function_2()
    |
    +---subpkg2/
        |
        +---python_file3.py
                function_3()
```
Absolute module paths
If you are using absolute module paths, the root directory of the path is the location of the scripts folder. For example, if your scripts are located at user/repo/fs/Instabase Drive/files/samples/src/, the root package name is src. With this directory structure, the following import statements are valid:
```python
import src
import src.python_file1
import src.subpkg1
from src.subpkg1 import python_file2
from src.subpkg1.python_file2 import function_2
```
Relative module paths
Alternatively, you can use relative paths to import other files in the src folder. For example, to use function_1() in python_file2.py:
```python
from ..python_file1 import function_1
```
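Both styles follow standard Python package semantics, so they can be tried locally. The sketch below builds the example src hierarchy in a temporary directory and puts that directory on sys.path, which plays the role the scripts folder location plays on Instabase (an assumption made only for this local demo). It also adds empty __init__.py files to the subpackages for clarity; the function bodies are illustrative.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Files for the example src/ hierarchy (contents are illustrative)
FILES = {
    "src/__init__.py": "",
    "src/python_file1.py": "def function_1():\n    return 'function_1'\n",
    "src/subpkg1/__init__.py": "",
    "src/subpkg1/python_file2.py": (
        "from ..python_file1 import function_1\n"  # Relative import, as shown above
        "def function_2():\n"
        "    return 'function_2 calls ' + function_1()\n"
    ),
    "src/subpkg1/subpkg2/__init__.py": "",
    "src/subpkg1/subpkg2/python_file3.py": "def function_3():\n    return 'function_3'\n",
}


def build_tree(root: Path) -> None:
    for rel_path, body in FILES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body)


root = Path(tempfile.mkdtemp())
build_tree(root)
sys.path.insert(0, str(root))  # Stands in for the scripts folder location
importlib.invalidate_caches()

# Absolute imports rooted at the src package
from src.subpkg1 import python_file2
from src.subpkg1.subpkg2.python_file3 import function_3

print(python_file2.function_2())  # function_2 calls function_1
print(function_3())               # function_3
```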
Custom Python modules
Once your UDFs grow beyond a certain complexity, you’ll likely have helper functions that are common across several script folders. To manage that complexity, we recommend using custom Python modules.
The supported custom Python module roots are ib.market and ib.custom. A module root is like a namespace where you insert custom logic.
ib.market modules
ib.market modules must be published to Marketplace as a pypkg solution.
Marketplace is an installation-wide distribution store. Modules accessed under ib.market are common across the entire installation.
To create ib.market modules, see Publishing developer packages.
Using ib.market modules in UDFs
To use the Python packages in UDFs, import them by prepending ib.market:
```
import ib.market.<solution-name>
import ib.market.<solution-name>.<submodules>
from ib.market.<solution-name>.<submodules> import <obj-name>
```
where:
- <solution-name> is the name in the package.json of the solution published on Marketplace
- <submodules> is the module to import the desired object from, as contained in the src folder hierarchy
- <obj-name> is the name of the variable, function, class, and so on
For the example src folder and a <solution-name> of my_pkg, these import statements are valid:
```python
import ib.market.my_pkg
import ib.market.my_pkg.python_file1
from ib.market.my_pkg.python_file1 import function_1
from ib.market.my_pkg.subpkg1.python_file2 import function_2
from ib.market.my_pkg.subpkg1.subpkg2.python_file3 import function_3
```
Using a specific version of a package
A Python package can have multiple versions available in Marketplace. However, a Flow binary uses only one version of a package. By default, an import statement imports the latest version of a package.
- To pin a package to a fixed version, create an ib_requirements.txt file in the same folder as the UDF scripts. The ib_requirements.txt file follows the Python requirements.txt file format.
- To import packages of specific versions, list them as <pkg_name>==<version_num>.
Using multiple versions of the same package in a Flow binary is not supported.
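For example, an ib_requirements.txt that pins the my_pkg solution from the earlier example to a single version might look like this. This assumes <pkg_name> is the solution name from package.json, and the version number is a placeholder; list one package per line, as in a standard requirements.txt file.

```text
my_pkg==1.0.0
```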
Restrictions for ib.market
- All ib.market code must first be published to Marketplace.
ib.custom modules
Custom Python modules are fully supported everywhere UDFs are used.
Custom Python modules under the ib.custom root must reside in the same filesystem drive as their intended use.
Modules under ib.custom are imported from Python files stored in the .flow folder of the filesystem drive where the ib.custom import occurs. For example, if a UDF sits nested somewhere within my-user/my-repo/fs/Instabase Drive, then files stored within my-user/my-repo/fs/Instabase Drive/.flow/ib/custom are available for import under the ib.custom root module in that UDF.
To create the .flow/ib/custom folder for a UDF nested somewhere within my-user/my-repo/fs/Instabase Drive:
- Navigate to my-user/my-repo/fs/Instabase Drive in the UI.
- Create a folder named .flow, and navigate to my-user/my-repo/fs/Instabase Drive/.flow.
- Create a folder named ib, and navigate to my-user/my-repo/fs/Instabase Drive/.flow/ib.
- Create a folder named custom, and navigate to my-user/my-repo/fs/Instabase Drive/.flow/ib/custom.
Within the my-user/my-repo/fs/Instabase Drive/.flow/ib/custom folder, you can add any Python files. For example, this is a valid folder structure:
```
my-user/my-repo/fs/Instabase Drive/.flow/ib/custom/
|
+---python_file1.py
|       function_1()
|
+---subpkg1/
    |
    +---python_file2.py
    |       function_2()
    |
    +---subpkg2/
        |
        +---python_file3.py
                function_3()
```
You can use the Python files in the filesystem drive’s .flow folder by importing them in UDFs with the ib.custom prefix. For example:
```
import ib.custom.<submodules>
from ib.custom.<submodules> import <obj-name>
```
where:
- <submodules> is the module from which to import the desired object, as contained in the .flow/ib/custom folder hierarchy
- <obj-name> is the name of the variable, function, or class
Suppose you have string helpers in my-user/my-repo/fs/Instabase Drive/.flow/ib/custom/common/strutils.py, and strutils.py contains a function called decode. Then all of your UDFs in that drive can use the strutils.py logic through a custom import, like:
```python
from ib.custom.common.strutils import decode

def custom_use_libs_fn(val, **kwargs):
    # Delegate to the shared helper in .flow/ib/custom/common/strutils.py
    return decode(val)

def register(name_to_fn):
    name_to_fn.update({
        'custom_use_libs': {
            'fn': custom_use_libs_fn
        }
    })
```
ib.custom module restrictions
- For a UDF in a particular workspace such as my-user/my-repo/fs/Instabase Drive/, the root directory where all of the custom code resides is my-user/my-repo/fs/Instabase Drive/.flow/ib/custom.
Reading raw files
To read raw files from the filesystem in a custom Python module, use the load_file function provided with ib.custom modules:
```python
from ib.custom.utils import load_file

def my_classifier_fn():
    # The path is relative to the .flow folder of the filesystem drive
    content, err = load_file('ib/custom/data/myfile.txt')
    if err:
        print('Reading was unsuccessful')
    else:
        print('Binary content for file {}'.format(content))
```
For example, you can store ML models or other small files, such as scikit-learn models, and then load them with the load_file function. Store these files under the .flow/ib/custom folder of the filesystem drive.
For this example, store the myfile.txt file in:
my-user/my-repo/fs/Instabase Drive/.flow/ib/custom/data/myfile.txt
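load_file returns the raw bytes of the file, so a stored model still has to be deserialized. The sketch below shows that step with pickle. fake_load_file is a hypothetical local stand-in for ib.custom.utils.load_file that resolves paths against a temporary .flow directory and returns a (content, err) pair like the example above; a small dict stands in for a real scikit-learn model.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical local stand-in for the drive's .flow folder
FLOW_ROOT = Path(tempfile.mkdtemp()) / '.flow'


def fake_load_file(path: str) -> Tuple[Optional[bytes], Optional[str]]:
    """Hypothetical stand-in for load_file: returns (bytes, None) or (None, error)."""
    try:
        return (FLOW_ROOT / path).read_bytes(), None
    except OSError as exc:
        return None, str(exc)


def load_model(path: str):
    content, err = fake_load_file(path)
    if err:
        return None
    # Only unpickle files you trust: pickle can execute arbitrary code.
    return pickle.loads(content)


# Store a small "model" under .flow/ib/custom/data/
model_path = FLOW_ROOT / 'ib/custom/data/model.pkl'
model_path.parent.mkdir(parents=True, exist_ok=True)
model_path.write_bytes(pickle.dumps({'threshold': 0.5}))

print(load_model('ib/custom/data/model.pkl'))  # {'threshold': 0.5}
```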
Accessing resource files in UDFs
Use the Resource Reader to access an external file from within a UDF and dynamically locate your external files at runtime. Any file can be accessed using the Resource Reader; common file types are CSV, JSON, and XML. The Resource Reader is useful for loading external static information into a Flow.
Resource folder name
For the resource folder to be loaded correctly, the folder must be named _resources. This reserved folder name marks the files in the folder as available to UDFs.
Resource folder location
Resource folders are supported in Refiner, Flow, and Solution.
Resource folders are not supported for Flow Binaries.
- For Flows, including Metaflows and Multiflows, the _resources folder must be in the same directory as the .ibflow files.
- For Solutions, the _resources folder must be in the same packaging directory as the .ibflowbin and package.json files.
- For Refiner, the _resources folder location is specified in the UDF function when calling get_resource_reader(resources_path).
  - The path to the _resources folder is a relative path starting from the location of the .ibprog Refiner program. When a Refiner program is used in a Flow, the _resources folder in the Flow is used (the same directory as the .ibflow files).
  - For the following example folder structure, use get_resource_reader('../../Workflows/_resources').
```
resources_folder_test/
|
+---Samples/
|   |
|   +---prog/
|       |
|       +---resource_test.ibprog
|
+---Workflows/
    |
    +---_resources/
    |
    +---resources_test.ibflow
```
Using the resource reader client
To read files from the resource folder, you must use the resource reader client. To retrieve this client from within a UDF, call get_resource_folder() on the FNContext object.
When calling this resource reader from Refiner, specify the path to the resources folder relative to the .ibprog file.
Loading files from the resource folder
To read files using the resource reader client, call load_file(filepath) on the client. The path given to the function is relative to the resource folder. This function returns the bytes of the file and, if an error occurs, an IBError object.
Writing files to the resource folder
To write files using the resource reader client, call write_file(filepath, contents) on the client. Again, the path given to the function is relative to the resource folder. This function writes the bytes given in contents to the file at filepath and returns a tuple indicating whether the write succeeded and, if an error occurs, an IBError.
Resource reader example in a UDF
First, get the clients object, either by passing it in the formula or by retrieving it from _FN_CONTEXT_KEY, and take the resource_reader from it:
```python
def custom_resource_fn(**kwargs):
    clients, err = kwargs['_FN_CONTEXT_KEY'].get_by_col_name('CLIENTS')
    resource_reader = clients.resource_reader
```
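After you get the client, use it to read and write files. The complete UDF and its registration look like this:
```python
def custom_resource_fn(**kwargs):
    # Retrieve the clients object and its resource reader from the function context
    clients, err = kwargs['_FN_CONTEXT_KEY'].get_by_col_name('CLIENTS')
    resource_reader = clients.resource_reader
    # Read a file; the path is relative to the _resources folder
    contents, ib_err = resource_reader.load_file('resource.txt')
    # Write bytes to a file; the path is relative to the _resources folder
    success, ib_err = resource_reader.write_file('other_resource.txt', b'contents of this file')
    return contents

def register(name_to_fn):
    more_fns = {
        'custom_fn_name': {
            'fn': custom_resource_fn,
            'ex': '',
            'desc': ''
        }
    }
    name_to_fn.update(more_fns)
```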
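To exercise this UDF outside Instabase, you can substitute stand-ins for the context and the resource reader client. In the sketch below, InMemoryResourceReader, FakeClients, and FakeFnContext are hypothetical test doubles that mimic only the calls used above (get_by_col_name, load_file, write_file), and they return plain strings where the real client returns an IBError. This version of the UDF also checks the returned errors before continuing.

```python
from typing import Any, Dict, Optional, Tuple


class InMemoryResourceReader:
    """Hypothetical stand-in for the resource reader client, backed by a dict."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = dict(files)

    def load_file(self, filepath: str) -> Tuple[Optional[bytes], Optional[str]]:
        if filepath not in self.files:
            return None, '{} not found'.format(filepath)
        return self.files[filepath], None

    def write_file(self, filepath: str, contents: bytes) -> Tuple[bool, Optional[str]]:
        self.files[filepath] = contents
        return True, None


class FakeClients:
    def __init__(self, resource_reader: InMemoryResourceReader):
        self.resource_reader = resource_reader


class FakeFnContext:
    def __init__(self, columns: Dict[str, Any]):
        self._columns = columns

    def get_by_col_name(self, name: str) -> Tuple[Optional[Any], Optional[str]]:
        if name in self._columns:
            return self._columns[name], None
        return None, 'column {} not found'.format(name)


def custom_resource_fn(**kwargs):
    clients, err = kwargs['_FN_CONTEXT_KEY'].get_by_col_name('CLIENTS')
    if err:
        return None
    resource_reader = clients.resource_reader
    contents, ib_err = resource_reader.load_file('resource.txt')
    if ib_err:
        return None
    success, ib_err = resource_reader.write_file('other_resource.txt', b'contents of this file')
    return contents


reader = InMemoryResourceReader({'resource.txt': b'hello'})
ctx = FakeFnContext({'CLIENTS': FakeClients(reader)})
print(custom_resource_fn(_FN_CONTEXT_KEY=ctx))  # b'hello'
print(reader.files['other_resource.txt'])       # b'contents of this file'
```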