Flow V2 guide
Instabase’s term for an automation pipeline is a Flow.
After you’ve built a Flow, the result is a reusable pipeline that can be used to perform repetitive processes on similar document types. A Flow’s solution can also be used as an API within a real-time processing system.
In this guide, we will create a Flow that processes paystubs.
By the end of this guide, you should be able to:
- Describe and identify an Instabase project template
- Create a basic Refiner project template
- Add simple extraction and business logic to your project template
- Create an end-to-end workflow around your project template
This guide covers legacy Flow V2. For a guide to working with the latest version of Flow, see Developing flows.
Prerequisites
For this exercise, we’ll be working with ADP and Gusto paystubs. You can download them here:
1. Instabase workspaces
An Instabase workspace is a directory where you store your files and data. If you’re familiar with git, workspaces used to be called repositories, and they serve the same purpose.
Creating a workspace
Activity
- Log in to Instabase.
- On the left modal, select the Instabase icon, hover over Workspaces, and then select New Workspace in the top right of your window.
- Keep the default owner, which is your user, and add a workspace name (for example, practice-flow).
- Add a short description (required).
- Leave Private selected, then select Create Workspace. Your workspace will be created with two default folders: files and notebooks.
- For this guide, we don’t need this structure. Remove both files and notebooks by right-clicking each one and selecting Delete.
2. Introducing Instabase project templates
Barring certain customizations and edge cases, Instabase requires you to structure your files and data in a specific format. We’ll refer to this structure as a “project template” for the rest of this guide.
You might have a project template for obscuring sensitive data on tax forms, or for extracting personal details from driver’s licenses.
When you are in an Instabase workspace, you can create a new project from a set of the most commonly used project templates. A few examples include:
- OCR: for transforming images into text
- Redactor: for obscuring sensitive information in images
- Refiner: for extracting relevant information from text documents
Generating a project template
When you create a new project template, all of the folders and files you need for a project of that type are automatically created.
The project template that you’ll create in this guide will be used to extract information from ADP and Gusto paystubs.
In practice, humans know what an ADP paystub file looks like. If we gather a few of these paystubs together, we can train Instabase on how to find an ADP file just as easily. We can do the same for many different types of files that must be stored in a specific directory structure.
Activity
- At the top of your workspace, select the New dropdown.
- Hover over New project, then select General Extraction Project.
- Name your project “ADP Paystub Extractor” and select Create Project. The parent folder that will contain your new project is Instabase Drive.
- You should’ve already downloaded a few ADP paystubs from the prerequisites section above. On your computer, make sure you’ve unzipped the adp-paystubs directory.
- In the drop-down menu, select Choose folder from computer, select Select From Computer, navigate to the adp-paystubs folder that you unzipped earlier, select Upload, and select Upload again.
- The upload process can take a minute or two. When it’s done, select View Project to open the project creation results.
3. .ibflow (the long way)
Project template or not, the assembly of an Instabase Flow is represented by a specific file type, called .ibflow.
Nearly every Flow contains the following structure:
- Process Files is a universal document intake step. It is more than just Optical Character Recognition: it also converts files and homogenizes their output type.
- Map Records slices the incoming documents into the record boundaries you prefer, which lets you reorganize files. For example, if two (or more) paystubs are stored in the same input file, you can define where each one begins so that the Flow segments them properly. This step isn’t needed for every dataset, but it is good practice to include.
- The repetitive process that you’re automating. This might be redaction, refinement, or any of the other App functionality found on Instabase.
- Merge Records, which combines all of your processed files into a single tidy report and output.
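To make the shape of these four stages concrete, here is a minimal conceptual sketch in plain Python. It does not use the Instabase API: every function name below is a hypothetical stand-in, and treating a page that begins with the word "PAYSTUB" as the start of a new record is an assumption made purely for illustration.

```python
from typing import Callable

# Hypothetical stand-ins for the four stages. None of these are Instabase APIs.

def process_files(raw_files: dict[str, list[str]]) -> dict[str, list[str]]:
    """Stage 1: normalize every input file into a uniform list of page texts."""
    return {name: [page.strip() for page in pages]
            for name, pages in raw_files.items()}

def map_records(docs: dict[str, list[str]],
                starts_record: Callable[[str], bool]) -> list[list[str]]:
    """Stage 2: split each document into records wherever a new record starts."""
    records: list[list[str]] = []
    for pages in docs.values():
        current: list[str] = []
        for page in pages:
            if starts_record(page) and current:
                records.append(current)  # close the previous record
                current = []
            current.append(page)
        if current:
            records.append(current)
    return records

def apply_step(records: list[list[str]],
               extract: Callable[[list[str]], dict]) -> list[dict]:
    """Stage 3: run the repetitive process (here, an extraction) on each record."""
    return [extract(record) for record in records]

def merge_records(results: list[dict]) -> dict:
    """Stage 4: combine the per-record results into a single report."""
    return {"count": len(results), "records": results}

# Two paystubs share scan1.pdf, so stage 2 has to separate them.
raw = {
    "scan1.pdf": [" PAYSTUB A page 1 ", "page 2", "PAYSTUB B page 1"],
    "scan2.pdf": ["PAYSTUB C page 1"],
}
docs = process_files(raw)
records = map_records(docs, lambda page: page.startswith("PAYSTUB"))
results = apply_step(records, lambda r: {"pages": len(r), "first_page": r[0]})
report = merge_records(results)
print(report["count"])
```

In this sketch the first input file yields two records and the second yields one, which is the kind of segmentation Map Records is described as providing.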
Flow has a specific convention for naming the output of each step. The output folder name combines the number of the step (s1, s2, and so on) with the name of the step. It is important to become familiar with Flow output directories because the outputs of s1_process_files and s2_map_records are often used as the input to other Instabase Apps, such as Refiner.
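The naming pattern can be sketched with two small Python helpers. Only the s<number>_<name> shape comes from this guide; the helper names and the rule for turning a step label into a lowercase, underscore-separated name are assumptions, and real folder names may not always follow mechanically from the step label.

```python
import re

def step_dir_name(step_number: int, step_name: str) -> str:
    """Build a folder name such as 's1_process_files' (assumed slug rule)."""
    slug = re.sub(r"[^a-z0-9]+", "_", step_name.lower()).strip("_")
    return f"s{step_number}_{slug}"

def parse_step_dir(dir_name: str) -> tuple[int, str]:
    """Split a folder name such as 's2_map_records' into (2, 'map_records')."""
    match = re.fullmatch(r"s(\d+)_(.+)", dir_name)
    if match is None:
        raise ValueError(f"not a step output folder: {dir_name!r}")
    return int(match.group(1)), match.group(2)

print(step_dir_name(1, "Process Files"))
print(parse_step_dir("s2_map_records"))
```

A parser like this is handy when sorting a list of output folders by step number rather than alphabetically.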
Running a Flow
We still haven’t created any processes with our files, but this exercise will allow us to access our existing .ibflow file and observe the results of a processed Flow.
Activity
- Return to your root folder, Instabase Drive, expand the ADP Paystub Extractor folder, and select workflow.ibflow to open the Flow. Notice the structure mentioned in the section above.
- In the upper-right corner of the page, select Tools > Run.
- We’ll be running this Flow on our input data, so after selecting Choose Folder, select the folder called input, then select Open.
- Select Run. A blue modal will appear as the Flow runs, and a green modal will appear when the Flow completes. It’s okay if you receive an error message here, as we’ve skipped the third step.
- To view the output, do one of these actions:
  - Select View output in the green modal.
  - Return to the ADP Paystub Extractor project using the breadcrumbs at the top of the screen, and then expand or select the out folder.
- Notice that each step of the Flow is now represented as a folder within your project’s file structure.
- Return to the ADP Paystub Extractor directory and open the viewme.ibrecipebook file.
4. Viewing a project template
Each type of “project” that you create on Instabase is assembled with different key ingredients, though some projects might share a step like “View OCR results”. Each project template is organized to keep your focus on the important pieces—data, your functions, and your output—rather than worry about Instabase’s rules and file structures under the hood.
Navigating a project template
You’re not performing any actions on your data in this activity. It’s a brief tour to give you an understanding of what’s going on under the hood.
Let’s view each of the three default steps of the Refiner project template.
Activity
- From the viewme.ibrecipebook file, select View under the “View OCR Results” section. This takes you to the s2_map_records directory that we saw in the previous activity.
  - Each input file now has a corresponding .ibdoc file that is the image file paired with its OCR data. The .ibdoc file is the standard Instabase file format that is the output of the process files step.
  - Select any of these files to launch the Review OCR app, which provides a window into OCR results, as well as any data extraction that has occurred.
  - Toggle between the image and the text that Instabase extracted from the image by selecting the image icon and the A icon. Notice how the extraction preserved the text’s spacing and structure. How did it handle the tricky diagonal “THIS IS NOT A CHECK” text?
- Return to the ADP Paystub Extractor directory, open the viewme.ibrecipebook file, and in the Edit/Run Flow section, select Edit to edit the .ibflow file we viewed in the previous exercise. In the future, you can use this button to get to your Flow process instead of worrying about where things are located within your Instabase file structure.
- Return to the ADP Paystub Extractor directory, open the viewme.ibrecipebook file, and in the Edit Refiner 5 program section, select Edit to open a spreadsheet that we’ll edit in the next activity.
5. Automating a Refiner process
Refinement in Instabase is synonymous with extraction. Do you have a lot of files and you want to refine them to distill key values like “Name” or “Net Pay”? You accomplish this distillation with a Refiner Flow.
Refining Paystubs
Refiner has many features, as well as a plugin system, that we’ll delve into later. For now, we will explore how to create a field. A field is an entity we might want to extract, like a name or a pay_date.
Activity
- On the page that you were on when you finished the last activity, select + New Field to add a new text field.
- From the bottom pane, you can edit, rename, and test fields. Try changing the value of Field name from field_1 to greeting.
- In the text field on the left side of the bottom pane, type echo('hello'). Notice that the right side of the bottom pane updates with relevant function documentation. echo is a standard function that simply means “write” or “post”. 'hello' (in single quotes; double quotes will cause an error) is just a value (argument) that we’re asking echo to post.
- Select Run Field to populate this field for all the documents.
- Save this Refiner program by selecting Save in the top-right corner of the page.
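Conceptually, running a field evaluates the same formula once per document and stores the result under the field name. The plain-Python sketch below mimics that behavior. The echo function here is a local stand-in written for illustration, not Instabase’s implementation, and run_field and the document names are hypothetical.

```python
from typing import Callable

def echo(value: str) -> str:
    """Local stand-in: return the argument unchanged."""
    return value

def run_field(field_name: str,
              formula: Callable[[], str],
              documents: list[str]) -> dict[str, dict[str, str]]:
    """Evaluate one field formula for every document (hypothetical helper)."""
    return {doc: {field_name: formula()} for doc in documents}

documents = ["paystub_1.ibdoc", "paystub_2.ibdoc"]  # made-up file names
results = run_field("greeting", lambda: echo("hello"), documents)
print(results)
```

Because the formula ignores the document contents, every document ends up with the same greeting value, which matches what you see after selecting Run Field.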
6. Updating an existing Flow
As you add new files, edit existing functions, and cultivate outputs, you’ll sometimes find it valuable to modify and re-run your Flows.
A completed Flow
Activity
- Return to your viewme.ibrecipebook file, and select Edit under Edit/Run Flow.
- At the top right of your workflow.ibflow view, select Tools > Run.
- For “Input Folder”, select Choose folder, then input, then Open. Finally, select Run.
- Return to your Instabase Drive, then expand the out folder. Now, you’ll see a new addition, s3_apply_refiner.
- Navigate to the s3_apply_refiner folder and open one of the results to review your extracted output. You should see values for greeting, the field you created in the Refiner program.
- To view all of these fields combined, find the out.ibocr file in s4_merge_files. Here, you’ll see all extracted values for the set of documents.
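The idea behind the merge step, gathering each document’s extracted fields into one combined view, can be sketched in plain Python. This is an illustration only: the internal format of out.ibocr is not described in this guide, so the sketch writes a CSV report instead, and merge_results and the document names are hypothetical.

```python
import csv
import io

def merge_results(per_document: dict[str, dict[str, str]]) -> str:
    """Combine per-document field values into one CSV report, one row per document."""
    # Collect every field name that appears in any document.
    field_names = sorted({name for fields in per_document.values() for name in fields})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["document", *field_names],
                            lineterminator="\n")
    writer.writeheader()
    for doc in sorted(per_document):
        # Fields missing from a document are written as empty cells.
        writer.writerow({"document": doc, **per_document[doc]})
    return buffer.getvalue()

per_document = {
    "paystub_2.ibdoc": {"greeting": "hello"},
    "paystub_1.ibdoc": {"greeting": "hello"},
}
report = merge_results(per_document)
print(report)
```

Sorting the documents and field names keeps the report stable between runs, which makes it easier to compare outputs after you edit a field and re-run the Flow.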
Conclusion
That’s it! You’ve successfully built an end-to-end Flow that allows you to process ADP Paystubs.
You should feel able to complete the following tasks:
- Describe and identify an Instabase project template
- Create a basic Refiner project template
- Create an end-to-end workflow around your project template
If not, reach out to us at training@instabase.com. We’d love to chat about any questions, comments, or concerns that you might’ve had in completing this guide.
Next steps
If you’re feeling advanced, try to build another Flow without guidance! We included some Gusto paystubs in the prerequisites for you to practice on.