Developing flows


Flow development involves configuring the steps of your workflow and testing your flow using sample documents. Flows integrate the modules you create in other Instabase apps, including classification and extraction models, refiners, validations, and other code.

If the documents you’re processing match an existing Marketplace solution, you might be able to skip or fast-track flow development by basing your solution on a Marketplace offering.

Flow development is typically handled by a solution developer. See the Solution Builder guide for information about creating a basic flow.

Planning a flow

Flow steps occur in a specific order, but whether you include certain steps depends on your document processing requirements.

All flows begin with the process files step, which performs OCR on sample documents. If your sample documents include multiple records, such as in multipage PDFs, you then map records. To classify different types of documents, you next apply a classifier. After records are classified, it’s a good idea to verify classification with an apply checkpoint step. Then, you can filter records into different branches, where they’re ready for the run extraction model step. After extraction, you can refine or validate data, redact fields, and perform other data processing tasks as needed.

A basic flow that includes all available pre-built steps might be ordered something like this:

Process Files — Converts various document formats into machine-readable text.
Map Records — Splits multipage documents into separate records.
Apply Classifier — Identifies record type, or class, using a classification model.
Apply Checkpoint — Verifies details identified in a previous step according to validation formulas, and triggers a review for failed validations.

Branch the flow into multiple document streams based on class.
Filter — Filters records based on class. Place a filter at the top of each branch, and for the filter parameter, specify the document class that you want to allow through the filter.

Tip

If you don’t validate classification before branching and filtering, or if your production flow might include unclassifiable documents, create an additional branch that filters for the "other" class, which Instabase assigns by default to documents that can’t be classified.
Run Extraction Model — Extracts data from records using extraction models.
Apply Refiner — Reformats extracted data according to your specifications.
Apply Checkpoint — Verifies details identified in a previous step according to validation formulas, and triggers a review for failed validations.
Apply Redactor to Refined Fields — Redacts extracted data.
Doc Gen — Generates Word documents that report extracted data.
Combine — Combines branches into a single flow output.
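To make the branching logic concrete, the following Python sketch models a flow's filter, branch, and combine steps with plain functions. It is not Instabase code: the record dictionaries, the `make_filter` and `run_branches` helpers, the `extract_stub` placeholder, and the invoice and receipt classes are all hypothetical. It illustrates why the Tip recommends an "other" branch: in this model, a record classified as "other" matches no filter unless a branch exists for it, so it is dropped from the combined output.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: a plain dict with an "id" and a "class".
Record = Dict[str, str]
Step = Callable[[Record], Record]

# Class assigned by default to records that can't be classified.
FALLBACK_CLASS = "other"


def make_filter(allowed_class: str) -> Callable[[Record], bool]:
    """Like a Filter step: pass only records whose class matches."""
    return lambda record: record["class"] == allowed_class


def run_branches(records: List[Record],
                 branches: Dict[str, List[Step]]) -> List[Record]:
    """Send records through each branch's filter and steps, then combine."""
    combined: List[Record] = []
    for branch_class, steps in branches.items():
        keep = make_filter(branch_class)
        for record in filter(keep, records):
            out = dict(record)  # copy so a branch never mutates the input
            for step in steps:
                out = step(out)
            combined.append(out)
    return combined


def extract_stub(record: Record) -> Record:
    """Hypothetical stand-in for Run Extraction Model; just tags the record."""
    record["extracted"] = f"{record['class']}-fields"
    return record


records = [
    {"id": "1", "class": "invoice"},
    {"id": "2", "class": "receipt"},
    {"id": "3", "class": FALLBACK_CLASS},  # couldn't be classified
]

without_other = run_branches(records, {"invoice": [extract_stub],
                                       "receipt": [extract_stub]})
with_other = run_branches(records, {"invoice": [extract_stub],
                                    "receipt": [extract_stub],
                                    FALLBACK_CLASS: []})

print([r["id"] for r in without_other])  # ['1', '2']  (record 3 is lost)
print([r["id"] for r in with_other])     # ['1', '2', '3']
```

Here the "other" branch has no steps; in a real flow you might route those records to review or to a more general extraction model instead.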

Tip

By default, steps are named according to their step type. You can change step names to make flows easier to understand by clicking the edit (pencil) icon in the step panel. For example, it can be helpful to rename filter steps to indicate which class of records is allowed through in a given branch.

Flow modules

These are the flow steps and events that integrate modules or code created in other Instabase apps.

Info

Importing a module into a flow copies the module’s code to the flow’s modules folder, so any future edits you make to the original module aren’t reflected in your flow.

Step or event	Module	Description
Process Files	Reader	Optionally specifies custom OCR configurations created in Reader.
Apply Classifier	Classification model	Specifies a classification model created in ML Studio.
Apply Checkpoint	Checkpoint	Specifies validations created in the Validations app.
Run Extraction Model	Extraction model	Specifies an extraction model created in ML Studio.
Apply Refiner	Refiner	Specifies refiner formulas created in the Refiner app.
Doc Gen	Doc Gen	Specifies a `field_mapper.json` configuration file with field mappings.
Map UDF, Reduce UDF, Pre-Flow UDF, Post-Flow UDF	UDF	Specifies a custom Python script.

How checkpoints work and where to position them

Checkpoints are defined by the apply checkpoint step. Checkpoints verify classification or extraction data against validation rules that you write. In production, records that fail a checkpoint’s validations are queued for review by a human reviewer.

Checkpoints are typically inserted after classification and extraction steps to verify document class and extracted data. As a best practice, use a checkpoint after classification but before branching to ensure that records are routed to the correct branch for data extraction.
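The following Python sketch illustrates the checkpoint behavior described above: each record is checked against a list of validation rules, and records that fail any rule are set aside for review. The rule names, the `confidence` field, the 0.8 threshold, the expected classes, and the `apply_checkpoint` helper are hypothetical illustrations, not the Validations app's formula syntax or any Instabase API. Running a checkpoint like this before branching means that only records with a verified class reach the filters.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical record shape: a dict with "id", "class", and "confidence".
Record = Dict[str, Any]
Rule = Tuple[str, Callable[[Record], bool]]  # (rule name, predicate)

EXPECTED_CLASSES = {"invoice", "receipt"}  # hypothetical document classes

# Hypothetical validation rules for a post-classification checkpoint.
CLASSIFICATION_RULES: List[Rule] = [
    ("known class", lambda r: r["class"] in EXPECTED_CLASSES),
    ("confident", lambda r: r["confidence"] >= 0.8),
]


def apply_checkpoint(
    records: List[Record], rules: List[Rule]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into those that pass every rule and a review queue
    of (record, names of failed rules) pairs."""
    passed: List[Record] = []
    review_queue: List[Tuple[Record, List[str]]] = []
    for record in records:
        failed = [name for name, check in rules if not check(record)]
        if failed:
            review_queue.append((record, failed))
        else:
            passed.append(record)
    return passed, review_queue


records = [
    {"id": "1", "class": "invoice", "confidence": 0.95},
    {"id": "2", "class": "receipt", "confidence": 0.55},
    {"id": "3", "class": "other", "confidence": 0.99},
]
passed, review_queue = apply_checkpoint(records, CLASSIFICATION_RULES)

print([r["id"] for r in passed])                  # ['1']
print([(r["id"], f) for r, f in review_queue])
# [('2', ['confident']), ('3', ['known class'])]
```

In a production flow, a human reviewer would resolve the queued records (for example, assigning record 3 a known class) before they continue; in this sketch the queue is just a list.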