Flow API
Use Flow API to run a flow step or a flow, get flow execution status, export flow output, and more.
Run a step
Use this API to run a step in a flow.
Request
Send a POST request to url_base/flow/run_step_async with the POST body encoded as JSON:
POST url_base/flow/run_step_async
{
"ibflow_path": "instabase/path/to/flow/file.ibflow",
"step_index": 1
}
- ibflow_path: The path reference to your flow file.
- step_index: The index of the step to execute. Valid values are 0 to len(steps) - 1.
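As a rough sketch of how the request body above might be assembled, the helper below builds the JSON payload and rejects an out-of-range step_index before any call is made. The function name build_run_step_body and the num_steps argument are illustrative assumptions, not part of the API; the caller is assumed to already know how many steps the flow has.

```python
import json


def build_run_step_body(ibflow_path: str, step_index: int, num_steps: int) -> str:
    """Return the JSON body for run_step_async (hypothetical helper)."""
    # Valid indexes run from 0 to len(steps) - 1, as described above.
    if not 0 <= step_index < num_steps:
        raise ValueError(
            f"step_index must be between 0 and {num_steps - 1}, got {step_index}"
        )
    return json.dumps({"ibflow_path": ibflow_path, "step_index": step_index})


# num_steps=3 is an assumed step count for illustration.
body = build_run_step_body("instabase/path/to/flow/file.ibflow", 1, num_steps=3)
```

Checking the index locally avoids sending a request that is known to be invalid.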
Response
The HTTP call returns immediately while the step execution proceeds asynchronously in the background.
If successful, the response contains a job_id field that you can use to check the status of the execution. See the Job Status API.
HTTP STATUS CODE 200
# body
{
"status": "OK",
"data": {
"job_id": <string>,
"output_folder": <string>
}
}
The response body is a JSON object with the following fields:
- status: "OK"
- data: A JSON object with the following fields:
  - job_id: A unique identifier for the job.
  - output_folder: The full path to the root output folder.
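The fields above could be read out of a response along these lines. This is a minimal sketch: parse_job_response is a hypothetical helper, and treating any status other than "OK" as a failure is an assumption, because the shape of an error response is not described here.

```python
import json


def parse_job_response(raw_body: str) -> tuple[str, str]:
    """Return (job_id, output_folder) from a response body (hypothetical helper)."""
    payload = json.loads(raw_body)
    # Assumption: anything other than "OK" is treated as a failure in this sketch.
    if payload.get("status") != "OK":
        raise RuntimeError(f"request failed with status {payload.get('status')!r}")
    data = payload["data"]
    return data["job_id"], data["output_folder"]


# Placeholder values for illustration only.
sample = '{"status": "OK", "data": {"job_id": "abc", "output_folder": "path/to/out"}}'
job_id, output_folder = parse_job_response(sample)
```

Keep the returned job_id, since it is the value that later status checks need.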
Run a fully configured step
Use this API to run a standalone process files step.
Request
Run a fully configured step by sending a POST request to url_base/flow/run_step_async with the POST body encoded as JSON:
POST url_base/flow/run_step_async
{
"step_json_str": "{\"kwargs\": {}, \"step_name\": \"process_files\"}"
}
step_json_str: The fully configured step, provided as a JSON-encoded string. The decoded step has the following form:
{
"step_name": "process_files",
"kwargs": {
"input_folder": "/owner/repo/drive/fs/files/input",
"output_folder": "owner/repo/drive/fs/files/out",
"process_type": "images_to_txt",
"settings": {
"ocr_page_type": "high_quality_doc",
"output_format_layout": "layout_per_page",
"page_range_str": "",
"encryption_config": "",
"ocr_config": ""
}
}
}
The kwargs map takes the following keyword arguments:
- input_folder: Absolute path to the input directory.
- output_folder: Absolute path to the output directory.
- process_type: Which extensions to process:
  - "auto_to_txt": Identify the extension to process for each file.
  - "images_to_txt": Process all image files, for example, pdf, tif, jpeg, and so on.
  - "pdf_to_txt": Process only pdf files.
- settings: A map of extra settings for the Process Files step.
  - ocr_page_type: "default", "high_quality_doc", or "low_quality_doc". We recommend using the default OCR model, which is selected automatically based on the page being analyzed.
  - output_format_layout: "layout_per_page" or "layout_per_doc".
  - page_range_str: Restricts processing to a page range, for example, "1-5".
  - encryption_config: A JSON string with settings for opening encrypted files.
  - ocr_config: A JSON string with OCR flags.
For details on extra settings, see the Process Files config options.
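Because step_json_str is a JSON string nested inside a JSON body, the step has to be encoded twice. The sketch below shows one way to do that with the standard json module, reusing the field values from the example above; the variable names are illustrative.

```python
import json

# The step definition, using the field values from the example above.
step = {
    "step_name": "process_files",
    "kwargs": {
        "input_folder": "/owner/repo/drive/fs/files/input",
        "output_folder": "owner/repo/drive/fs/files/out",
        "process_type": "images_to_txt",
        "settings": {
            "ocr_page_type": "high_quality_doc",
            "output_format_layout": "layout_per_page",
            "page_range_str": "",
            "encryption_config": "",
            "ocr_config": "",
        },
    },
}

# The inner dumps() turns the step into a string; the outer one encodes the request body.
body = json.dumps({"step_json_str": json.dumps(step)})

# Decoding twice recovers the original step, which confirms the nesting is correct.
decoded_step = json.loads(json.loads(body)["step_json_str"])
```

Letting json.dumps handle the inner string avoids escaping quotation marks by hand.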
Run a flow
Method | Syntax |
---|---|
POST | URL_BASE/api/v1/flow/run_flow_async |
Description
Run a flow.
Request parameters
Parameters are required unless marked as optional.
Name | Type | Description |
---|---|---|
ibflow_path | string | The path reference to your flow file. |
input_dir | string | The folder containing the data to run the flow on. |
output_dir | string | Optional. The folder containing the output files. Defaults to input_dir/../out/. |
output_has_run_id | boolean | Optional. Whether output should be written into a directory with a unique, timestamped run_id. Defaults to false. |
delete_out_dir | boolean | Optional. Whether to delete any existing content in the output folder before running the binary. Has no effect if output_has_run_id is true. Defaults to false. |
log_to_timeline | boolean | Optional. Enable developer logs from Refiner and UDFs. Defaults to false. |
notification_emails | list | Optional. List of email addresses to notify when the run is completed. |
tags | list | Optional. List of string tags to attach to this flow run. You can later search for flow runs by these tags in the Flow dashboard or with the List API. |
step_timeout | integer | Optional. Timeout in seconds for each flow step. When set to 0, the platform picks an appropriate timeout for each step; when set to -1, step timeouts are disabled. Defaults to 0. |
pipeline_ids | list | Optional. Associate the current flow run with a pipeline. Define a list of pipeline IDs, which can be retrieved from the Flow Pipeline API. |
save_binary_to_output | boolean | Optional. Save the flow binary in the output folder. Defaults to true. |
webhook_config | dict | Optional. Configure a webhook URL to notify on flow completion. See the Webhook Configuration section for more information. |
webhook_config/url | string | Webhook URL to which a notification event is sent when the run is completed. |
webhook_config/headers | dict | Optional. Headers for the notification event, as a dictionary where each key-value pair represents a header name and its corresponding value. |
The Flow API allows you to run multiple instances of a flow at the same time. To avoid write conflicts, we recommend using the output_has_run_id option, which places the output of each flow in a separate directory.
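The sketch below shows one way the request parameters might be assembled in Python. The helper build_run_flow_args and its argument names are hypothetical. It defaults output_has_run_id to True to follow the recommendation above (the API default is false), and it rejects step_timeout values below -1 as an assumption of this sketch. The webhook URL is a placeholder.

```python
import json
from typing import Optional


def build_run_flow_args(
    ibflow_path: str,
    input_dir: str,
    output_has_run_id: bool = True,
    step_timeout: int = 0,
    tags: Optional[list] = None,
    webhook_url: Optional[str] = None,
    webhook_headers: Optional[dict] = None,
) -> dict:
    """Assemble run_flow_async arguments (hypothetical helper)."""
    # 0 lets the platform choose, -1 disables timeouts; rejecting other
    # negative values is an assumption of this sketch.
    if step_timeout < -1:
        raise ValueError("step_timeout must be -1, 0, or a positive number of seconds")
    args = {
        "ibflow_path": ibflow_path,
        "input_dir": input_dir,
        "output_has_run_id": output_has_run_id,
        "step_timeout": step_timeout,
    }
    if tags:
        args["tags"] = list(tags)
    if webhook_url:
        # webhook_config nests url and, optionally, headers.
        webhook = {"url": webhook_url}
        if webhook_headers:
            webhook["headers"] = dict(webhook_headers)
        args["webhook_config"] = webhook
    return args


args = build_run_flow_args(
    "jaydoe/my_repo/fs/Instabase Drive/flow_proj/my_flow.ibflow",
    "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
    tags=["nightly"],
    webhook_url="https://example.com/hook",  # placeholder URL
)
json_data = json.dumps(args)
```

Optional parameters are left out of the payload when they are not set, so the server-side defaults listed in the table apply.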
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description |
---|---|
status | Status of the request. Possible values are: OK, ERROR |
data/job_id | A unique identifier for the job. |
data/output_folder | The full path to the root output folder. |
Examples
Request
import json
import requests

# url_base and token are assumed to be defined for your deployment.
url = url_base + '/api/v1/flow/run_flow_async'
args = {
  'input_dir': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
  'ibflow_path': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/my_flow.ibflow",
  'output_has_run_id': True,
}
json_data = json.dumps(args)
headers = {
  'Authorization': 'Bearer {0}'.format(token)
}
# The call returns immediately; the flow runs asynchronously.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)
Response
The HTTP call returns immediately while the flow execution proceeds asynchronously in the background. If successful, the response contains a job_id field that you can use to check the status of your execution.
{
"status": "OK",
"data": {
"job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
"output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/run_jaydoe/2023-04-12-20:22:36/756be65b-0eaf-4192-bea5-176f0377b0f8/out"
}
}
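If the requests package is not available, the same call could be prepared with the Python standard library. The following is a sketch only: prepare_run_flow_request is a hypothetical helper, the base URL and token are placeholders, and the Content-Type header is an assumption that the request example above does not set. The request is built but not sent.

```python
import json
import urllib.request


def prepare_run_flow_request(url_base: str, token: str, args: dict) -> urllib.request.Request:
    """Build, but do not send, the POST request (hypothetical helper)."""
    return urllib.request.Request(
        url=url_base + "/api/v1/flow/run_flow_async",
        data=json.dumps(args).encode("utf-8"),
        headers={
            "Authorization": "Bearer {0}".format(token),
            # Assumption: the body is declared as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = prepare_run_flow_request(
    "https://instabase.example.com",  # placeholder base URL
    "my-token",  # placeholder token
    {"input_dir": "path/to/input", "ibflow_path": "path/to/my_flow.ibflow"},
)
# Sending is left to the caller, for example urllib.request.urlopen(req, timeout=30).
```

Separating request construction from sending makes the payload easy to inspect before it goes over the network.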
Restart a flow
Method | Syntax |
---|---|
POST | URL_BASE/api/v1/flow/restart |
Description
Restart a flow from the beginning, under a new job ID.
Request parameters
Name | Type | Description |
---|---|---|
job_id | string | The ID of the job to restart. |
Response schema
All keys are returned in the response by default, unless marked as optional.
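Key | Description |
---|---|
status | Status of the request. Possible values are: OK, ERROR |
data/job_id | A unique identifier for the new job. |
data/output_folder | The full path to the root output folder. |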
Examples
Request
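import json
import requests

# url_base and token are assumed to be defined for your deployment.
url = url_base + '/api/v1/flow/restart'
args = {
  'job_id': "b8f94216-c271-44f2-982c-3e279c6b446d",
}
json_data = json.dumps(args)
headers = {
  'Authorization': 'Bearer {0}'.format(token)
}
# The restarted flow runs under a new job ID, returned in the response.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)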
Response
If successful, the response contains the job_id of the new job as well as its output folder.
{
"status": "OK",
"data": {
"job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
"output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/run_jaydoe/2023-04-12-20:22:36/756be65b-0eaf-4192-bea5-176f0377b0f8/out"
}
}
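Because a restart runs under a new job ID, it can be useful to keep the original and new IDs together. The sketch below is one possible way to do that. The helper record_restart and the restarted_from key are hypothetical, and the output folder is a shortened placeholder.

```python
def record_restart(old_job_id: str, resp_data: dict) -> dict:
    """Pair the original job ID with the restarted job's details (hypothetical helper)."""
    if resp_data.get("status") != "OK":
        raise RuntimeError("restart request did not return status OK")
    new_job_id = resp_data["data"]["job_id"]
    # A restart runs under a new job ID, so the two IDs should differ.
    if new_job_id == old_job_id:
        raise RuntimeError("expected a new job ID for the restarted flow")
    return {
        "restarted_from": old_job_id,
        "job_id": new_job_id,
        "output_folder": resp_data["data"]["output_folder"],
    }


resp_data = {
    "status": "OK",
    "data": {
        "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
        "output_folder": "path/to/out",  # shortened placeholder
    },
}
record = record_restart("b8f94216-c271-44f2-982c-3e279c6b446d", resp_data)
```

Use the new job_id, not the original one, for any later status checks.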
Run a metaflow
Method | Syntax |
---|---|
POST | URL_BASE/api/v1/flow/run_metaflow_async |
Description
Run a metaflow.
Request parameters
Parameters are required unless marked as optional.
Name | Type | Description |
---|---|---|
input_dir | string | The folder that contains the data to run the metaflow on. |
flow_root_dir | string | The path to a folder that contains the .ibflow files. |
classifier_file_path | string | The path to the .ibclassifier file. |
delete_out_dir | boolean | Optional. Whether to delete any existing content in the output folder before running the binary. Is a no-op if output_has_run_id is true. Defaults to false. |
output_has_run_id | boolean | Optional. Whether output should be written into a directory with a unique, timestamped run_id. Defaults to false. |
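The parameters above might be assembled as in the following sketch. The helper build_metaflow_args is hypothetical. Checking the .ibclassifier extension locally and setting delete_out_dir to false when output_has_run_id is true are choices made for this example, based on the descriptions in the table, not behavior required by the API.

```python
import json


def build_metaflow_args(
    input_dir: str,
    flow_root_dir: str,
    classifier_file_path: str,
    output_has_run_id: bool = False,
    delete_out_dir: bool = False,
) -> dict:
    """Assemble run_metaflow_async arguments (hypothetical helper)."""
    if not classifier_file_path.endswith(".ibclassifier"):
        raise ValueError("classifier_file_path should point to an .ibclassifier file")
    # delete_out_dir is a no-op when output_has_run_id is true, so send it as false then.
    effective_delete = delete_out_dir and not output_has_run_id
    return {
        "input_dir": input_dir,
        "flow_root_dir": flow_root_dir,
        "classifier_file_path": classifier_file_path,
        "output_has_run_id": output_has_run_id,
        "delete_out_dir": effective_delete,
    }


metaflow_args = build_metaflow_args(
    "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
    "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/workflows",
    "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/classifiers/my_classifier.ibclassifier",
    output_has_run_id=True,
    delete_out_dir=True,
)
json_data = json.dumps(metaflow_args)
```

Normalizing delete_out_dir this way keeps the payload consistent with what the server does.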
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description |
---|---|
status | Status of the request. Possible values are: OK, ERROR |
data/job_id | A unique identifier for the job. |
data/output_folder | The full path to the root output folder. |
Examples
Request
import json
import requests

# url_base and token are assumed to be defined for your deployment.
url = url_base + '/api/v1/flow/run_metaflow_async'
api_args = {
  'input_dir': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
  'flow_root_dir': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/workflows",
  'classifier_file_path': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/classifiers/my_classifier.ibclassifier",
  'delete_out_dir': False,
  'output_has_run_id': True,
}
json_data = json.dumps(api_args)
headers = {
  'Authorization': 'Bearer {0}'.format(token),
}
# The call returns immediately; the metaflow runs asynchronously.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)
Response
The HTTP call returns immediately while the binary execution proceeds asynchronously in the background. If successful, the response contains a job_id field that you can use to check the status of the job execution.
HTTP STATUS CODE 200
# body
{
"status": "OK",
"data": {
"job_id": <string>,
"output_folder": <string>
}
}
Job status
Use the job status API to get the execution status of a flow or metaflow job.
Refer to the Job Status API documentation.
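Because flow jobs run asynchronously, a client typically polls until the job finishes. The Job Status API's endpoint and response shape are not described on this page, so the sketch below takes both as caller-supplied callables. The helper wait_for_job, the "state" field, and its "running" and "done" values are all invented stand-ins, not the real API.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> dict:
    """Poll until is_finished(status) is true or attempts run out (hypothetical helper)."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_finished(status):
            return status
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job did not finish after {max_attempts} attempts")


# Stand-in for real Job Status API calls: reports "running" twice, then "done".
fake_replies = iter([{"state": "running"}, {"state": "running"}, {"state": "done"}])
final = wait_for_job(
    fetch_status=lambda: next(fake_replies),
    is_finished=lambda s: s["state"] == "done",
    interval_seconds=0.0,
)
```

In real use, fetch_status would call the Job Status API with the job_id returned by one of the run requests, and is_finished would inspect whatever fields that API actually returns.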