Developing flows
Flow development involves configuring the steps of your workflow and testing your flow using sample documents. Flows integrate the modules you create in other Instabase apps, including classification and extraction models, refiners, validations, and other code.
If the documents you’re processing match an existing Marketplace solution, you might be able to skip or fast-track flow development by basing your solution on a Marketplace offering.
Flow development is typically handled by a solution developer. See the Solution Builder guide for information about creating a basic flow.
Planning a flow
Flow steps occur in a specific order, but whether you include certain steps depends on your document processing requirements.
All flows begin with the process files step, which performs OCR on sample documents. If your sample documents include multiple records, such as in multipage PDFs, you then map records. To classify different types of documents, you next apply a classifier. After records are classified, it’s a good idea to verify classification with apply checkpoint. Then, you can filter records into different branches, where they’re ready for the run extraction model step. After extraction, you can refine or validate data, redact fields, and perform other data processing tasks as needed.
A basic flow that includes all available pre-built steps might be ordered something like this:
- Process Files: Converts various document formats into machine-readable text.
- Map Records: Splits multipage documents into separate records.
- Apply Classifier: Identifies record type, or class, using a classification model.
- Apply Checkpoint: Verifies details identified in a previous step according to validation formulas, and triggers a review for failed validations.
  Branch flow into multiple document streams based on class.
- Filter: Filters records based on class. Place a filter at the top of each branch, and for the filter parameter, specify the document class that you want to allow through the filter.
  Tip: If you don’t validate classification before branching and filtering, or if your production flow might include unclassifiable documents, create an additional branch that filters for "other", which is the class Instabase assigns by default to documents that can’t be classified.
- Run Extraction Model: Extracts data from records using extraction models.
- Apply Refiner: Reformats extracted data according to your specifications.
- Apply Checkpoint: Verifies details identified in a previous step according to validation formulas, and triggers a review for failed validations.
- Apply Redactor to Refined Fields: Redacts extracted data.
- Doc Gen: Generates Word documents that report extracted data.
- Combine: Combines branches into a single flow output.
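The branching, filtering, and combining described above can be pictured with a short Python sketch. This is not Instabase code: the `Record` type, the `filter_by_class` and `run_branches` helpers, and the class names `invoice` and `paystub` are hypothetical. Only the `other` fallback class comes from the text above. The sketch assumes that a record matching no filter never reaches the combined output.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a classified record; not an Instabase type."""
    name: str
    doc_class: str


def filter_by_class(records, allowed_class):
    """Mimic a filter step: let through only records of one class."""
    return [r for r in records if r.doc_class == allowed_class]


def run_branches(records, branch_classes):
    """Send records down one branch per class, then combine the outputs."""
    combined = []
    for doc_class in branch_classes:
        branch_records = filter_by_class(records, doc_class)
        # Extraction, refinement, and validation would run here in a real flow.
        combined.extend(branch_records)
    return combined


records = [
    Record("a.pdf", "invoice"),
    Record("b.pdf", "paystub"),
    Record("c.pdf", "other"),  # class assigned to unclassifiable documents
]

# Without an "other" branch, the unclassifiable record matches no filter.
without_other = run_branches(records, ["invoice", "paystub"])
# With an "other" branch, every record reaches the combined output.
with_other = run_branches(records, ["invoice", "paystub", "other"])

print(len(without_other), len(with_other))
```

In this sketch, the extra branch is what keeps unclassifiable records visible at the end of the flow, which is the motivation for the tip above.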
By default, steps are named according to their step type. To make flows easier to understand, you can rename a step by clicking the edit (pencil) icon in the step panel. For example, it can be helpful to rename filter steps to indicate which class of records is allowed through in a given branch.
Flow modules
These are the flow steps and events that integrate modules or code created in other Instabase apps.
Importing a module into a flow copies the module’s code to the flow’s modules folder, so any future edits you make to the original module aren’t reflected in your flow.
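The copy behavior can be illustrated with ordinary files. The following is a minimal sketch using the Python standard library, not the Instabase import mechanism. The folder names (`refiner_app`, `my_flow/modules`) and the `formula.txt` file are hypothetical and chosen only for illustration.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: a module in its source app and a flow's modules folder.
    original = root / "refiner_app" / "my_refiner"
    flow_copy = root / "my_flow" / "modules" / "my_refiner"

    original.mkdir(parents=True)
    (original / "formula.txt").write_text("version 1")

    # Importing is modeled here as a plain directory copy.
    shutil.copytree(original, flow_copy)

    # A later edit to the original module...
    (original / "formula.txt").write_text("version 2")

    # ...leaves the flow's copy unchanged.
    original_text = (original / "formula.txt").read_text()
    flow_text = (flow_copy / "formula.txt").read_text()

print(original_text, "|", flow_text)
```

Because the flow holds its own copy, the two files diverge as soon as the original is edited.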
Step or event | Module | Description |
---|---|---|
Process Files | Reader | Optionally specifies custom OCR configurations created in Reader. |
Apply Classifier | Classification model | Specifies a classification model created in ML Studio. |
Apply Checkpoint | Checkpoint | Specifies validations created in the Validations app. |
Run Extraction Model | Extraction model | Specifies an extraction model created in ML Studio. |
Apply Refiner | Refiner | Specifies refiner formulas created in the Refiner app. |
Doc Gen | Doc Gen | Specifies a field_mapper.json configuration file with field mappings. |
Map UDF, Reduce UDF, Pre-Flow UDF, Post-Flow UDF | UDF | Specifies a custom Python script. |
How checkpoints work and where to position them
Checkpoints are defined by the apply checkpoint step. Checkpoints verify classification or extraction data against validation rules that you write. In production, records that fail a checkpoint’s validations are queued for review by a human reviewer.
Checkpoints are typically inserted after classification and extraction steps to verify document class and extracted data. As a best practice, use a checkpoint after classification but before branching to ensure that records are routed to the correct branch for data extraction.
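As a rough illustration of a checkpoint placed after classification, the sketch below runs each record through a set of validation rules and queues failures for review. It is plain Python, not the Validations app or its formula syntax. The `run_checkpoint` helper, the rule names, the `confidence` field, and the 0.8 threshold are all assumptions made for the example.

```python
def run_checkpoint(records, validations):
    """Split records into those passing every rule and those needing review."""
    passed, review_queue = [], []
    for record in records:
        # Collect the names of all rules this record fails.
        failures = [name for name, rule in validations.items() if not rule(record)]
        if failures:
            review_queue.append((record, failures))
        else:
            passed.append(record)
    return passed, review_queue


# Hypothetical classification rules; real rules are written as validations.
classification_rules = {
    "class_is_known": lambda r: r["class"] != "other",
    "confident_enough": lambda r: r["confidence"] >= 0.8,
}

records = [
    {"name": "a.pdf", "class": "invoice", "confidence": 0.97},
    {"name": "b.pdf", "class": "invoice", "confidence": 0.55},
    {"name": "c.pdf", "class": "other", "confidence": 0.40},
]

passed, review_queue = run_checkpoint(records, classification_rules)

print([r["name"] for r in passed])
print([(r["name"], failures) for r, failures in review_queue])
```

Only the records in `passed` would continue to the branches in this sketch, which mirrors the reason for checking classification before filtering.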