Job API
Potentially data-heavy workflows, such as running a flow or refiner, can have long run times. As a result, these jobs are run asynchronously: when you call the API, the job is started and a job ID is returned, instead of an immediate result. You can use this job ID with the Job API to read and modify the status of the jobs.
In this document, URL_BASE refers to the root URL of your Instabase instance, such as https://www.instabase.com.
Job Status
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/status |
Description
Use this API to get the current status of a job and information about the results when the job is done.
Request parameters
Parameters are required unless marked as optional.
Name | Type | Description | Values |
---|---|---|---|
job_id | string | job_id reported by the initial execution request. | |
type | string | Job type of the job. | flow, refiner, job, async, group |
The most commonly used type is flow.
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
status | Status of request. | OK, ERROR |
msg | Job status message. | |
state | Job state. | PENDING, DONE, COMPLETE |
job_id | The unique identifier for the job. | |
results | Job results information. Present only if job is in a terminal state: PAUSED, COMPLETE, FAILED, CANCELLED, STOPPED_AT_CHECKPOINT | |
results/status | Job completion status. | OK, ERROR |
results/output_folder | Job output folder. Present if status equals OK | |
results/msg | Job error message. Present if status equals ERROR | |
cur_status | Job type specific JSON-encoded string containing current status of job. | |
cur_status/status | The current state of the Flow job. | RUNNING, PAUSED, COMPLETE, FAILED, CANCELLED, STOPPED_AT_CHECKPOINT |
cur_status/finish_timestamp | 10-digit Unix timestamp of Flow end. This field is not available for RUNNING flows. | |
cur_status/curProgress | Progress of the job. | [0-1] |
cur_status/reviewer | Assigned reviewer of job. Empty if none. | Instabase username |
cur_status/review_state | Current state of job in the review process | NONE, IN REVIEW, COMPLETED, NOT_COMPLETED |
cur_status/flow_job_metrics | Metrics for the flow job | |
cur_status/flow_review_metrics | Metrics for the flow review | |
cur_status/run_summary/recordsWithMsg | The number of failed records, grouped by failure type. | |
cur_status/run_summary/numRecords | The number of records processed. | |
cur_status/run_summary/numRuntimeErrors | The number of execution errors. | |
cur_status/run_summary/numFiles | The number of files processed. | |
cur_status/finish_timestamp | Finish timestamp of the job. | null if still running. |
State Diagram
State diagram for Flow V3 jobs:
Examples
Request
import json, requests, time
# url_base and token are placeholders for your instance URL and access token.
job_id = 'uuid_from_run_binary_async_api_call'
url = url_base + f'/api/v1/jobs/status?job_id={job_id}&type=flow'
headers = {
    'Authorization': 'Bearer {0}'.format(token)
}
still_running = True
while still_running:
    r = requests.get(url, headers=headers)
    resp = json.loads(r.content)
    # Keep polling only while the request succeeds and the job is not DONE.
    still_running = (resp['status'] == 'OK') and (resp['state'] != 'DONE')
    if still_running:
        print('still running')
        time.sleep(1)
print('Job finished!')
print(resp)
Response
{
"status": "OK",
"msg": "Completed single_flow",
"state": "DONE",
"is_waiting_for_resources": false,
"job_id": "c29e1c75-f7a3-4bf0-a1e8-2ad664b35fa5",
"results": [
{
"status": "OK",
"output_folder": "/jaydoe/my-repo/fs/Instabase Drive/flow/out"
}
],
"cur_status": "{\"job_id\": \"c29e1c75-f7a3-4bf0-a1e8-2ad664b35fa5\", \"flow_index\": 0, \"status\": \"COMPLETE\", \"finish_timestamp\": 1659728573413224848, \"curProgress\": 1.0, \"curMsg\": \"Completed single_flow\", \"reviewer\": \"jaydoe\", \"review_state\": \"IN_REVIEW\", \"flow_job_metrics\": null, \"flow_review_metrics\": null, \"run_summary\": {\"numRecords\": 4, \"numRuntimeErrors\": 0, \"numCheckpointFailed\": 0}}",
"finish_timestamp": 1659728573413224848,
"binary_mode": true
}
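Because cur_status is returned as a JSON-encoded string rather than a nested object, it needs a second decoding step before its fields can be read. The following is a minimal sketch that uses a trimmed copy of the sample response above; the helper name decode_cur_status is hypothetical and not part of the API.

```python
import json


def decode_cur_status(resp):
    """Return cur_status as a dict, or {} if the key is missing or empty."""
    raw = resp.get("cur_status")
    if not raw:
        return {}
    # cur_status is itself a JSON document stored inside a string field.
    return json.loads(raw)


# Trimmed copy of the sample response shown above.
sample = {
    "status": "OK",
    "state": "DONE",
    "cur_status": "{\"status\": \"COMPLETE\", \"curProgress\": 1.0, "
                  "\"run_summary\": {\"numRecords\": 4, \"numRuntimeErrors\": 0}}",
}

cur = decode_cur_status(sample)
print(cur["status"], cur["curProgress"], cur["run_summary"]["numRecords"])
```

Returning an empty dict for a missing key is a design choice of this sketch, so that callers can use cur.get(...) without checking for the field first.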
Job Logs
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/get_logs |
Description
Use this API to get the logs of the job.
Request parameters
Parameters are required unless marked as optional.
Name | Type | Description | Values |
---|---|---|---|
job_id | string | job_id reported by the initial execution request. | |
offset | int | Optional. Page offset from which to start fetching logs. | |
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
logs | Array of log entries. | |
next_offset | Next page offset to fetch logs from. | |
Examples
Request
import json, requests
# url_base and token are placeholders for your instance URL and access token.
job_id = 'uuid_from_run_binary_async_api_call'
url = url_base + f'/api/v1/jobs/get_logs?job_id={job_id}'
headers = {
    'Authorization': 'Bearer {0}'.format(token)
}
r = requests.get(url, headers=headers)
resp = json.loads(r.content)
print(resp)
Response
{
"logs": [
{"job-id": "4700d0b5-e1ce-4eb2-a265-39e2e63cc65a", "task-id": "4700d0b5-e1ce-4eb2-a265-39e2e63cc65a-Stage1", "level": "INFO", "ts": "2022-12-07 23:27:54,636", "trace_id": "86a9bc25a1e8a436", "span_id": "5c63c2bd3fed5a90", "log": "Starting Task"},
{"job-id": "4700d0b5-e1ce-4eb2-a265-39e2e63cc65a", "task-id": "4700d0b5-e1ce-4eb2-a265-39e2e63cc65a-Stage1", "level": "INFO", "ts": "2022-12-07 23:27:54,723", "trace_id": "86a9bc25a1e8a436", "span_id": "5c63c2bd3fed5a90", "log": "Initialized flow datastore"}
],
"next_offset": 1
}
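To read a job's full log, the offset and next_offset fields can be chained across requests. The sketch below shows one way to do that. It rests on assumptions the reference does not state: that an empty logs array, a missing next_offset, or an unchanged next_offset marks the end of the log. The names iter_logs and fetch_page are hypothetical, and the pages are faked so the example runs without a server.

```python
def iter_logs(fetch_page, start_offset=0, max_pages=1000):
    """Yield log entries page by page.

    fetch_page(offset) must return the decoded JSON response for that offset.
    Assumption: an empty "logs" list, or a missing or unchanged "next_offset",
    means there is nothing more to read.
    """
    offset = start_offset
    for _ in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        page = fetch_page(offset)
        logs = page.get("logs") or []
        if not logs:
            return
        yield from logs
        next_offset = page.get("next_offset")
        if next_offset is None or next_offset == offset:
            return
        offset = next_offset


# Fake pages standing in for responses from /api/v1/jobs/get_logs.
fake_pages = {
    0: {"logs": [{"log": "Starting Task"}], "next_offset": 1},
    1: {"logs": [{"log": "Initialized flow datastore"}], "next_offset": 2},
    2: {"logs": [], "next_offset": 2},
}

messages = [entry["log"] for entry in iter_logs(fake_pages.__getitem__)]
print(messages)
```

In practice, fetch_page would wrap a GET request to get_logs that passes job_id and offset as query parameters.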
List Jobs
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/list |
Description
Get a list of Flow binary jobs.
Request parameters
Parameters are required unless marked as optional.
Parameter | Type | Description | Values |
---|---|---|---|
limit | integer | Optional. The maximum number of Flow jobs to return. | jobs limit per response (default 20) |
offset | integer | Optional. Initial Flow job index to start returning jobs from. Used for pagination with limit. | starting index (default 0) |
from_timestamp | integer | Optional. 10-digit Unix timestamp. Returns all jobs started after this timestamp. | starting timestamp (default is one week before current timestamp) |
to_timestamp | integer | Optional. 10-digit Unix timestamp. Returns all jobs started before this timestamp. | ending timestamp (default is current timestamp) |
state | string | Optional. Returns Flow jobs in any of the input states. When not passed, all Flow jobs are returned. | A comma-separated list of PENDING, COMPLETE, FAILED, CANCELLED, RUNNING, PAUSED, CHECKPOINT_FAILED |
user | string | Optional. Returns only the Flow jobs started by this user. When a username is not passed, all Flow jobs for all users are returned. | Instabase username |
priority | integer | Optional. Returns all the jobs with the given priority. | 0-9 |
tags | string | Optional. Returns all Flow jobs that were started with any of the input tags. See how to attach tags to Flow jobs in Run a Flow Binary API. | A comma-separated list of tags |
pipeline_ids | string | Optional. Returns all Flow jobs that are associated with any of the input pipelines. | A comma-separated list of pipeline ids |
review_state | string | Optional. Returns all Flow jobs that are in any of the given review states. | A comma-separated list of NONE, IN REVIEW, COMPLETED, NOT_COMPLETED |
job_id | string | Optional. ID associated with each job, or the partial ID. Returns the single Flow job associated with that job id. | A valid job id |
job_ids | list | Optional. Returns all Flow jobs associated with any of the job ids passed in the list. | A comma-separated list of job ids. |
reviewer | string | Optional. Returns all Flow jobs that were reviewed or are being reviewed by this user. | Instabase username |
Response schema
Key | Type | Description | Value |
---|---|---|---|
jobs | list | List of jobs. | |
jobs/curMsg | string | A message with details on Flow execution status. | |
jobs/state | string | The current state of the job. | RUNNING, COMPLETE, FAILED, CANCELLED, STOPPED_AT_CHECKPOINT |
jobs/tags | list | A list of tags that are attached to the given Flow job. | A list of tags associated with the job |
jobs/job_id | string | The unique identifier for the job. | |
jobs/input_folder | string | Input folder containing the data for the flow. | A valid filepath |
jobs/output_folder | string | Output folder containing the results of each step of the flow. | A valid filepath |
jobs/source_path | string | Path to the Flow binary. | A valid filepath |
jobs/flow_type | string | The flow type. | single_flow or metaflow or flows |
jobs/is_flow_v3 | boolean | Whether the Flow is a v3 Flow or not. | True, False |
jobs/start_timestamp | integer | 10-digit Unix timestamp of Flow start. | |
jobs/finish_timestamp | integer | 10-digit Unix timestamp of Flow end. This field is not available for RUNNING flows. | |
jobs/runtime_sec | number | Running time of the Flow in seconds. | |
jobs/username | string | User that started the Flow | Instabase username |
jobs/priority | int | Priority of the job | 0-9 |
jobs/run_summary/recordsWithMsg | dict | The number of failed records, grouped by failure type. | |
jobs/run_summary/numRecords | number | The number of records processed. | |
jobs/run_summary/numRuntimeErrors | number | The number of execution errors. | |
jobs/run_summary/numFiles | number | The number of files processed. | |
jobs/reviewer | string | Assigned reviewer of job. Empty if none. | Instabase username |
jobs/review_state | string | Current state of job in the review process | NONE, IN REVIEW, COMPLETED, NOT_COMPLETED |
jobs/flow_pipeline_infos | list[dict] | Contains the associated pipeline id and pipeline name in each dictionary | |
jobs/flow_job_metrics | list | Metrics for the flow job | |
jobs/flow_review_metrics | list | Metrics for the flow review | |
next_page | string | Paginated URL to get next page of results | A valid request url |
Examples
Request
curl $URL_BASE'/api/v1/jobs/list?limit=1&offset=1&from_timestamp=1594188122&to_timestamp=1594792922&state=COMPLETE&user=user234&tags=foo,bar'
Response
{
"jobs": [
{
"curMsg": "Completed Flow",
"state": "COMPLETE",
"tags": [
"foo"
],
"flow_path": "folder/tests/fs/exampleFlowibflowbin",
"job_id": "this execution's job id",
"input_folder": "input folder for this Flow",
"output_folder": "output folder for this Flow",
"flow_type": "single_flow",
"is_flow_v3": true,
"start_timestamp": 1594203413.959266,
"finish_timestamp": 1594204300.505522,
"runtime_sec": 886.5462560653687,
"username": "user234",
"source_path": "path to .ibflowbin file",
"run_summary": {
"recordsWithMsg": {
"Error <Unable to connect to OCR> in step process_files": 3
},
"numRecords": 113,
"numRuntimeErrors": 3,
"numFiles": 5
},
"reviewer": "user456",
"review_state": "NONE",
"flow_pipeline_infos": [],
"flow_job_metrics": "any job metrics if available",
"flow_review_metrics": "any review metrics if available"
}
],
"next_page": "/api/v1/jobs/list?limit=1&offset=2&from_timestamp=1594188122&to_timestamp=1594792922&state=COMPLETE"
}
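Several List Jobs filters (state, tags, pipeline_ids, job_ids) take comma-separated values, which are easy to get wrong when a query string is assembled by hand. The sketch below builds the request URL from keyword arguments. The helper name build_list_jobs_url is hypothetical, and leaving commas unescaped is a choice made here to match the curl example above.

```python
from urllib.parse import urlencode


def build_list_jobs_url(url_base, **filters):
    """Build a jobs/list URL.

    List and tuple values are joined with commas; filters set to None are dropped.
    """
    params = {}
    for key, value in filters.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        params[key] = value
    # safe="," keeps commas literal, as in the curl example above.
    query = urlencode(params, safe=",")
    path = "/api/v1/jobs/list"
    return f"{url_base}{path}?{query}" if query else f"{url_base}{path}"


url = build_list_jobs_url(
    "https://www.instabase.com",
    limit=1,
    state=["COMPLETE", "FAILED"],
    tags=["foo", "bar"],
    user=None,  # omitted from the query string
)
print(url)
```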
Pause Job
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/pause |
Description
Pause a running Flow.
Request parameters
Parameter | Type | Description | Values |
---|---|---|---|
job_id | string | job_id of the Flow. | |
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
status | Status of the request. | OK, ERROR |
Resume Job
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/resume |
Description
Resume a paused Flow.
Request parameters
Parameter | Type | Description | Values |
---|---|---|---|
job_id | string | job_id of the Flow. | |
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
status | Status of the request. | OK, ERROR |
To verify that the job has resumed, check the job status.
Cancel Job
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/cancel |
Description
Cancel a running or paused Flow.
Request parameters
Parameter | Type | Description | Values |
---|---|---|---|
job_id | string | job_id of the Flow. | |
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
status | Status of the request. | OK, ERROR |
The HTTP call returns immediately while cancelling proceeds asynchronously in the background.
A cancelled Flow Binary cannot be resumed.
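Pause, resume, and cancel share the same request shape: a GET request with a single job_id query parameter. The sketch below builds such a request with the standard library, reusing the Bearer token header from the earlier examples. The helper names build_job_action_request and send_job_action are hypothetical, and only the request construction is exercised here, so no network call is made.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoints documented above that take only a job_id.
JOB_ACTIONS = ("pause", "resume", "cancel")


def build_job_action_request(url_base, token, action, job_id):
    """Return an unsent GET request for /api/v1/jobs/<action>."""
    if action not in JOB_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    query = urlencode({"job_id": job_id})
    return urllib.request.Request(
        f"{url_base}/api/v1/jobs/{action}?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def send_job_action(request):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Placeholder instance URL, token, and job ID.
req = build_job_action_request(
    "https://www.instabase.com", "TOKEN", "cancel",
    "c29e1c75-f7a3-4bf0-a1e8-2ad664b35fa5",
)
print(req.get_method(), req.full_url)
```

Because cancelling proceeds in the background, a status of OK from send_job_action only confirms that the request was accepted; the Job Status API reports when the job actually reaches CANCELLED.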
Retry Job
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/retry |
Description
Retry a failed Flow job.
If you do not want the job to show up in the Flow Review dashboard anymore, make sure to mark the job as reviewed in Flow Review before retrying.
Request parameters
Parameters are required unless marked as optional.
Name | Type | Description | Values |
---|---|---|---|
job_id | string | job_id of the flow. | |
type | dict | Optional. Specifying a type retries only the files within the flow that have a specific type of failure. If type is omitted, all failed files are retried. | all, checkpoint_failure, step_failure |
type=all: retry all failed files.
type=checkpoint_failure: resume files that paused at a checkpoint because of validation failure, and continue to execute the rest of the flow steps.
type=step_failure: rerun files that errored out at a step (for example, due to a timeout error) from the point at which the step failed. This type also reruns any downstream steps that depend on the earlier failed step.
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
status | Status of the request. | OK, ERROR |
The HTTP call returns immediately while retrying proceeds asynchronously in the background. To verify that the job has started running, check the job status.
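Since the retry type accepts only three values, it can help to validate it before sending the request. A minimal sketch, assuming type is passed as a plain query-string value; the helper name build_retry_url is hypothetical.

```python
from urllib.parse import urlencode

# Values listed in the request parameters table above.
RETRY_TYPES = ("all", "checkpoint_failure", "step_failure")


def build_retry_url(url_base, job_id, failure_type=None):
    """Build a jobs/retry URL; omitting failure_type retries all failed files."""
    params = {"job_id": job_id}
    if failure_type is not None:
        if failure_type not in RETRY_TYPES:
            raise ValueError(f"type must be one of {RETRY_TYPES}")
        params["type"] = failure_type
    return f"{url_base}/api/v1/jobs/retry?{urlencode(params)}"


retry_url = build_retry_url("https://www.instabase.com", "abc123", "step_failure")
print(retry_url)
```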
Pause All Running Jobs
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/pause_all |
Description
Pause a group of running Flow jobs.
Request parameters
Parameter | Type | Description | Values |
---|---|---|---|
to_timestamp | integer | Optional. 10-digit Unix timestamp. When not passed, the default is set to the current timestamp. | |
from_timestamp | integer | Optional. 10-digit Unix timestamp. When not passed, the default is set to a week before the current timestamp. | |
user | string | Optional. Instabase username. For admins only, includes only the Flow executions started by this user. When a user name is not passed, all Flow executions for all users are included. | |
tags | string | Optional. Comma separated list of tags. Includes running Flow executions that were started with any of the input tags. | |
A request which does not include any optional parameters pauses all running jobs from the past week.
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
status | Status of the request. | OK, ERROR |
Examples
Request
curl $URL_BASE'/api/v1/jobs/pause_all?from_timestamp=1594188122&to_timestamp=1594792922&tags=foo,bar'
Response
{ "status": "OK" }
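The bulk endpoints expect from_timestamp and to_timestamp as 10-digit Unix timestamps, that is, whole seconds. The sketch below derives a one-week window from a timezone-aware datetime and reproduces the query string of the request example above. The helper name bulk_window_params is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def bulk_window_params(end, days=7, tags=None):
    """Return query parameters covering the `days` days before `end`.

    `end` must be a timezone-aware datetime; timestamps are truncated to seconds.
    """
    start = end - timedelta(days=days)
    params = {
        "from_timestamp": int(start.timestamp()),
        "to_timestamp": int(end.timestamp()),
    }
    if tags:
        params["tags"] = ",".join(tags)
    return params


# The to_timestamp from the request example above, as an aware datetime.
end = datetime.fromtimestamp(1594792922, tz=timezone.utc)
params = bulk_window_params(end, tags=["foo", "bar"])
query = urlencode(params, safe=",")
print(query)
```

Using an aware datetime avoids accidentally interpreting a local time as UTC when the timestamp is computed.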
Resume All Paused Jobs
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/resume_all |
Description
Resume a group of paused Flow jobs.
Request parameters
Parameter | Type | Description | Values |
---|---|---|---|
to_timestamp | integer | Optional. 10-digit Unix timestamp. When not passed, the default is set to the current timestamp. | |
from_timestamp | integer | Optional. 10-digit Unix timestamp. When not passed, the default is set to a week before the current timestamp. | |
user | string | Optional. Instabase username. For admins only, includes only the Flow executions started by this user. When a user name is not passed, all Flow executions for all users are included. | |
tags | string | Optional. Comma-separated list of tags. Includes paused Flow executions that were started with any of the input tags. | |
A request which does not include any optional parameters resumes all paused jobs from the past week.
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
status | Status of the request. | OK, ERROR |
Cancel All Jobs
Method | Syntax |
---|---|
GET | URL_BASE/api/v1/jobs/cancel_all |
Description
Cancel a group of running or paused Flow jobs.
When authenticated with a site admin account, all jobs matching the criteria will be cancelled. From a non-admin account, only jobs that the authenticated user has access to that match the criteria will be cancelled.
Request parameters
Parameter | Type | Description | Values |
---|---|---|---|
to_timestamp | integer | Optional. 10-digit Unix timestamp. When not passed, the default is set to the current timestamp. | |
from_timestamp | integer | Optional. 10-digit Unix timestamp. When not passed, the default is set to a week before the current timestamp. | |
user | string | Optional. Instabase username. Admin permissions required. Only cancel jobs started by this user; when not passed, all jobs for all users are included. For non-admin accounts, this parameter is not used and only the authenticated user's jobs will be cancelled. | |
tags | string | Optional. Comma-separated list of tags. Includes running Flow executions that were started with any of the input tags. | |
A request which does not include any optional parameters cancels all running or paused jobs from the past week.
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Description | Value |
---|---|---|
status | Status of the request. | OK, ERROR |
Update Job Tags
Method | Syntax |
---|---|
POST | URL_BASE/api/v1/jobs/tags |
Description
Update the tags associated with a job.
Request parameters
Parameter | Type | Description | Values |
---|---|---|---|
job_id | string | The ID of the job you want to update tags for. | A valid job ID. |
names | string | Optional. The new tags to apply to the job. If this parameter is not included, any existing tags for the job_id provided are deleted. | A list of tags. |
Response schema
All keys are returned in the response by default, unless marked as optional.
Key | Type | Description | Value |
---|---|---|---|
status | string | Whether the API call succeeded. | OK or ERROR |
updated_tags | list | A list of strings with the new tags associated with the job. | An empty or non-empty list of the new tags for the job. |
msg | string | Optional. Error message. Present only if status is ERROR. | A string describing the error encountered. |
Examples
Request
import json, requests
# url_base and token are placeholders for your instance URL and access token.
url = url_base + '/api/v1/jobs/tags'
args = {
    'job_id': "f403399d-7ac7-4285-bb06-f7ad82ddbea2",
    'names': ["tag1", "tag2", "tag3"],
}
json_data = json.dumps(args)
headers = {
    'Authorization': 'Bearer {0}'.format(token)
}
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)
Response
The response body is a JSON object. If successful:
HTTP STATUS CODE 200
# body
{
"status": "OK",
"updated_tags": [
"tag1",
"tag2",
"tag3"
]
}
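The parameter description suggests that names sets the job's tags rather than appending to them, since omitting it deletes the existing tags. Under that assumption, which the reference does not spell out, adding a tag means sending the existing tags together with the new one. A minimal sketch with hypothetical helper names; the current tags could be read from jobs/tags in the List Jobs response.

```python
import json


def merge_tags(existing, new):
    """Return existing tags followed by any new tags, without duplicates."""
    merged = list(existing)
    for tag in new:
        if tag not in merged:
            merged.append(tag)
    return merged


def build_tags_payload(job_id, names):
    """Serialize the request body for POST /api/v1/jobs/tags."""
    return json.dumps({"job_id": job_id, "names": list(names)})


current = ["tag1", "tag2"]  # for example, taken from a List Jobs response
payload = build_tags_payload(
    "f403399d-7ac7-4285-bb06-f7ad82ddbea2",
    merge_tags(current, ["tag2", "tag3"]),
)
print(payload)
```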