Apply UDF
Configuration options for the Apply UDF step.
Input File Extension
The filename extension that identifies the file type to process.
Output File Extension
The filename extension that identifies the file type to generate as output of this step.
Input Folder
Select the input folder that contains the files.
Output Folder
Select the output folder to store the generated output.
Formula
Type or paste a registered custom function. Each function can also use special runtime variables.
Special input variables
The input variables that are available to UDF formulas are:
- INPUT_COL is the raw text.
  - If the Input File Type is IBOCR, then INPUT_COL corresponds to the IBOCR object as text. Use the ParsedIBOCR object to access its fields.
- INPUT_FILEPATH is the full path to the current document being processed.
  - For example, /user/repo/fs/Instabase Drive/path/to/input/file1.pdf.
- ROOT_OUTPUT_FOLDER is the absolute path to the Flow’s output directory.
- CONFIG is a set of key-value pairs that are dynamically passed at runtime into a flow binary.
  - An example runtime config: {"key1": "val1", "key2": "val2"}
- CLIENTS is an object that contains all of the clients the UDF has access to. This object contains the ibfile object, whose API is compatible with the Instabase Python notebook API. To see the supported methods, see IBFile.
- REFINER_FNS is an object that provides an API for executing Refiner functions within the UDF. To see the supported methods, see REFINER_FNS.
- TOKEN_FRAMEWORK_REGISTRY is an object for interfacing with the TokenMatcher capabilities. For supported methods, see TokenFrameworkRegistry.
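As a rough illustration of how a custom function might consume these variables, the sketch below derives an output path from INPUT_FILEPATH, ROOT_OUTPUT_FOLDER, and CONFIG. The function name build_output_path and the "suffix" config key are hypothetical and do not come from the product. The sketch assumes that the variables reach the function as ordinary arguments (a string, a string, and a dictionary), and it omits the step of registering the function.

```python
import posixpath


def build_output_path(input_filepath, root_output_folder, config):
    """Hypothetical UDF body: build an output path for the current document.

    Assumes the three special variables arrive as plain Python values.
    """
    # CONFIG is supplied at runtime, so a key may be absent; fall back to a default.
    # "suffix" is an illustrative key, not a key the product defines.
    suffix = (config or {}).get("suffix", "_processed")

    # Split "file1.pdf" into its stem ("file1") and extension (".pdf").
    stem, ext = posixpath.splitext(posixpath.basename(input_filepath))

    # Place the renamed file directly under the Flow's output directory.
    return posixpath.join(root_output_folder, stem + suffix + ext)
```

Under these assumptions, the Formula field would contain a call such as build_output_path(INPUT_FILEPATH, ROOT_OUTPUT_FOLDER, CONFIG). Reading CONFIG with a default keeps the function usable when the flow runs without a runtime config.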
Logging
Use Python’s standard logging library to log messages from an Apply UDF step. To see only the logs from UDFs, select the “Show Developer Logs Only” option.
Note: Flow logs currently have a size limit of 20MB per job ID by default. As a good practice, avoid logging binary values (like images), entire IBDOCs, or extraction results that might contain PII. Logs are stored in the file system.
Note: Logging in UDFs was previously done through the LOGGER object from the function context. Although LOGGER is still supported, we recommend using Python’s logging library directly.
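A minimal sketch of logging from a UDF with the standard library, in line with the size and PII guidance in this section. The function name summarize_text is hypothetical, and the sketch assumes INPUT_COL is passed in as a string. It logs only counts, never the text itself.

```python
import logging


def summarize_text(input_col):
    """Hypothetical UDF body: return trimmed text and log only a summary."""
    text = input_col or ""

    # Log sizes rather than content: the text may contain PII, and large
    # payloads count against the per-job log size limit.
    logging.info(
        "Received %d characters across %d lines",
        len(text),
        len(text.splitlines()),
    )

    return text.strip()
```

Passing the values as arguments to logging.info, instead of formatting the string first, defers formatting until the message is actually emitted.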
Extra settings
Click Extra Settings to access these configuration settings.
Scripts directory
When this folder is selected, all .py files in it are used to refine the output.