Model API
Use the Model API to get information about a specific training, evaluation, or pruning job in a model project.
Metrics API
Method | Syntax |
---|---|
GET | api/v2/model/metrics |
Description
Use this API to get metric information for a job, including F1 scores, confusion matrices, dataset record counts, and more.
URL parameters
Parameters are required unless marked as optional.
Name | Type | Description |
---|---|---|
model_project_path | string | The path to the model project’s folder. |
job_id | string | The specific job’s ID. |
job_type | string | The job type. Valid job types are training, pruning, and evaluation. |
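The three parameters are passed in the query string. As a minimal sketch, the URL could be assembled with the standard library so that slashes and spaces in the project path are encoded. The helper name `build_metrics_url` and the `https://example.com` base URL are hypothetical, used only for illustration.

```python
from urllib.parse import urlencode

# Job types listed in the parameter table above.
VALID_JOB_TYPES = {"training", "pruning", "evaluation"}

def build_metrics_url(url_base, model_project_path, job_id, job_type):
    """Return the metrics endpoint URL with an encoded query string."""
    if job_type not in VALID_JOB_TYPES:
        raise ValueError(f"job_type must be one of {sorted(VALID_JOB_TYPES)}")
    query = urlencode({
        "model_project_path": model_project_path,
        "job_id": job_id,
        "job_type": job_type,
    })
    return f"{url_base.rstrip('/')}/api/v2/model/metrics?{query}"

# Hypothetical base URL and IDs, for illustration only.
print(build_metrics_url("https://example.com", "user1/my-repo/fs/Instabase Drive/PaystubClassifier", "some-job-id", "training"))
```

Validating `job_type` before sending the request surfaces a typo locally instead of as a failed call.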
Request body
The request has no body.
Response status
Unless otherwise specified, a 2XX status code indicates the request was successful.
Response schema
Key | Description |
---|---|
metrics | A list of tables (dictionaries) that include F1 Score, Precision, Recall, Support, ECE, and train-evaluation loss datapoints. If the job is from a classification model, a confusion matrix table is included. If hyperparameter tuning was done, a table for hyperparameter search results is included. |
metrics[i]/title | The title of the table. |
metrics[i]/subtitle | A description of the data in the table. |
metrics[i]/headers | A list of column names. |
metrics[i]/rows | A list of row entry lists. |
metrics[i]/show | An optional boolean for whether or not the table is shown in a job’s metrics tab. |
platform_version | The Instabase platform version string, such as "23.01.0". |
ibformers_version | The ibformers version string, such as "2.0.1". |
train_count | A dictionary that maps class name to its number of annotated train records (one class for extraction models, multiple for classification models). |
train_total | The total number of train records used. |
train_datasets | A list of associated datasets that contained annotated train records. |
test_count | A dictionary that maps class name to its number of annotated test records (one class for extraction models, multiple for classification models). |
test_total | The total number of test records used. |
test_datasets | A list of associated datasets that contained annotated test records. |
hyperparams | A dictionary mapping hyperparameter names to their values. |
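Because each table stores its column names in headers and its data in rows, it can be convenient to pair them up before use. The following is a sketch only; `find_table` and `table_to_records` are hypothetical helper names, not part of the API.

```python
def find_table(metrics, title):
    """Return the first table with the given title, or None if it is absent."""
    return next((t for t in metrics if t.get("title") == title), None)

def table_to_records(table):
    """Turn one metrics table into a list of {header: cell} dictionaries."""
    headers = table["headers"]
    return [dict(zip(headers, row)) for row in table["rows"]]

# A small table shaped like the schema above.
metrics = [
    {
        "title": "Confusion Matrix",
        "subtitle": "",
        "headers": ["class name", "W2", "ADP"],
        "rows": [["W2", 5, 1], ["ADP", 0, 4]],
        "show": False,
    }
]
records = table_to_records(find_table(metrics, "Confusion Matrix"))
print(records[0]["ADP"])
```

Since show is optional, reading it with `table.get("show")` rather than `table["show"]` avoids a KeyError on tables that omit it.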
To receive complete results, you might need to retrain, reprune, or reevaluate models that were originally run before 23.07. Without a rerun, older jobs trigger a best-effort retrieval that might return incomplete results, such as a missing confusion matrix table in metrics or an empty platform_version value.
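A client can check for the two gaps named above before relying on a response. This is a sketch under assumptions: `missing_fields` is a hypothetical helper, and the table title "Confusion Matrix" is taken from the example response on this page rather than from a documented constant.

```python
def missing_fields(resp, is_classification=False):
    """List the parts of a metrics response that look incomplete."""
    problems = []
    # An empty or absent platform_version suggests a best-effort retrieval.
    if not resp.get("platform_version"):
        problems.append("platform_version")
    # Classification jobs are expected to include a confusion matrix table.
    titles = {t.get("title") for t in resp.get("metrics", [])}
    if is_classification and "Confusion Matrix" not in titles:
        problems.append("Confusion Matrix")
    return problems

# Response shaped like an older job with incomplete data.
old_job = {"metrics": [], "platform_version": ""}
print(missing_fields(old_job, is_classification=True))
```

A non-empty result is a hint that rerunning the job may be needed to get complete metrics.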
Examples
Example for a classification job
Request
import json, requests, time

# Placeholders: replace with your deployment's base URL and API token.
url_base = 'https://<your-instabase-host>'
token = '<your-api-token>'

model_project_path = 'user1/my-repo/fs/Instabase Drive/PaystubClassifier'
job_id = 'dc991a28-bd92-47bb-ajj8-e0f2ca930e51'
job_type = 'training'
url = url_base + f'/api/v2/model/metrics?model_project_path={model_project_path}&job_id={job_id}&job_type={job_type}'
headers = {
    'Authorization': 'Bearer {0}'.format(token)
}

still_running = True
while still_running:
    r = requests.get(url, headers=headers)
    resp = json.loads(r.content)
    # .get() avoids a KeyError when the response has no status or state key.
    still_running = (resp.get('status') == 'OK') and (resp.get('state') != 'DONE')
    if still_running:
        print('still running')
        time.sleep(1)

print('Request finished!')
print(resp)
Response
{
"metrics": [
{
"title": "Record-wise Classifier Metrics",
"subtitle": "Performance of the Classifier model measured on the Record level. Measure how accurately model is classifing each Record to the given class",
"headers": ["Class Type", "F1 Score", "Precision", "Recall", "Support", "ECE"],
"rows": [
["W2", "90.91%", "100.00%", "83.33%", 6.0, "0.051"],
["ADP", "88.89%", "80.00%", "100.00%", 4.0, "0.125"],
["macro avg", "89.90%", "90.00%", "91.67%", 10.0, "N/A"],
["weighted avg", "90.10%", "92.00%", "90.00%", 10.0, "N/A"]
]
},
{
"title": "Chunk-wise Classifier Metrics",
"subtitle": "Performance of the Classifier model measured on the Chunk level. Measure how accurately model is classifing each Chunk to the given class",
"headers": ["Class Type", "F1 Score", "Precision", "Recall", "Support", "ECE"],
"rows": [
["W2", "90.91%", "100.00%", "83.33%", 6.0, "0.051"],
["ADP", "88.89%", "80.00%", "100.00%", 4.0, "0.125"],
["macro avg", "89.90%", "90.00%", "91.67%", 10.0, "N/A"],
["weighted avg", "90.10%", "92.00%", "90.00%", 10.0, "N/A"]
]
},
{
"title": "Train-Validation Curve",
"subtitle": "",
"headers": ["Dataset Split", "Epoch 1", "2", "3"],
"rows": [
["Train", 0.54, 0.21, 0.15],
["Eval", 0.25, 0.31, 0.29]
],
"show": false
},
{
"title": "Confusion Matrix",
"subtitle": "",
"headers": ["class name", "W2", "ADP"],
"rows": [
["W2", 5, 1],
["ADP", 0, 4]
],
"show": false
},
{
"title": "Hyperparameter search results",
"subtitle": "Top 5 runs from hyperparameter search, sorted by the objective value on validation dataset",
"headers": ["Trial number", "learning_rate", "num_train_epochs", "class_weights_ins_power", "gradient_accumulation_steps", "Objective value"],
"rows": [
["0", "4.565751263925242e-06", "18", "0.6417572172915128", "2", "1.0000"],
["1", "1.7003557680081774e-05", "6", "0.5791602253545789", "2", "1.0000"],
["2", "6.981284268085946e-06", "13", "0.47444889629309955", "1", "1.0000"],
["3", "6.801683402386447e-06", "21", "0.4066896735271678", "2", "1.0000"],
["5", "1.8975639033091587e-05", "14", "0.4435692137254761", "2", "1.0000"]
]
}
],
"platform_version": "23.05.0",
"ibformers_version": "2.1.0",
"train_count": {
"W2": 10,
"ADP": 14
},
"train_total": 24,
"train_datasets": ["Paystubs"],
"test_count": {
"W2": 4,
"ADP": 3
},
"test_total": 7,
"test_datasets": ["Paystubs"],
"hyperparams": {
"model_name": "layoutlm-base-uncased",
"model_source": "instabase",
"task_name": "SEQUENCE_CLASSIFICATION",
"batch_size": 4,
"gradient_accumulation_steps": 1,
"max_length": 512,
"chunk_overlap": 64,
"learning_rate": 5e-05,
"lr_scheduler_type": "constant_with_warmup",
"use_mixed_precision": true,
"loss_type": "ce_ins",
"num_train_epochs": 5,
"task_type": "classification",
"npages_to_filter": 20,
"class_weights_ins_power": 0.2,
"do_hyperparam_optimization": true,
"do_calibration": true,
"class_weights_ins_power": 0.3,
"hp_search_num_trials": 20,
"calibration_model": "PlattScalingCalibrationModel",
"hp_search_param_space": []
}
}
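The confusion matrix in this response can be cross-checked against the per-class scores. The sketch below assumes that rows are the true classes and columns are the predicted classes, which is consistent with the sample numbers but is not stated by the API; `per_class_scores` is a hypothetical helper.

```python
def per_class_scores(table):
    """Compute precision and recall per class from a confusion matrix table.

    Assumes rows are true classes and columns are predicted classes.
    """
    labels = table["headers"][1:]
    counts = {row[0]: row[1:] for row in table["rows"]}
    scores = {}
    for j, label in enumerate(labels):
        true_pos = counts[label][j]
        actual = sum(counts[label])                    # row total
        predicted = sum(counts[r][j] for r in labels)  # column total
        scores[label] = {
            "precision": true_pos / predicted if predicted else 0.0,
            "recall": true_pos / actual if actual else 0.0,
        }
    return scores

confusion = {
    "title": "Confusion Matrix",
    "headers": ["class name", "W2", "ADP"],
    "rows": [["W2", 5, 1], ["ADP", 0, 4]],
}
for name, s in per_class_scores(confusion).items():
    print(f"{name}: precision={s['precision']:.2%} recall={s['recall']:.2%}")
```

For the sample matrix this gives W2 a precision of 100.00% and a recall of 83.33%, and ADP a precision of 80.00% and a recall of 100.00%, matching the classifier metrics tables in the same response.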
Example for an extraction job
Request
import json, requests, time

# Placeholders: replace with your deployment's base URL and API token.
url_base = 'https://<your-instabase-host>'
token = '<your-api-token>'

model_project_path = 'user1/my-repo/fs/Instabase Drive/W2ExtractionModel'
job_id = 'dc991a28-bd92-47bb-ajj8-e0f2ca930e38'
job_type = 'training'
url = url_base + f'/api/v2/model/metrics?model_project_path={model_project_path}&job_id={job_id}&job_type={job_type}'
headers = {
    'Authorization': 'Bearer {0}'.format(token)
}

still_running = True
while still_running:
    r = requests.get(url, headers=headers)
    resp = json.loads(r.content)
    # .get() avoids a KeyError when the response has no status or state key.
    still_running = (resp.get('status') == 'OK') and (resp.get('state') != 'DONE')
    if still_running:
        print('still running')
        time.sleep(1)

print('Request finished!')
print(resp)
Response
{
"metrics": [
{
"title": "Individual fields level metrics",
"subtitle": "Accuracy scores for individual fields learned by the model",
"headers": [
"Field Name",
"Precision",
"Recall",
"F1 Score",
"Support",
"ECE"
],
"rows": [
[
"Micro Average",
"83.33%",
"83.33%",
"83.33%",
12,
"N/A"
],
[
"Macro Average",
"83.33%",
"83.33%",
"83.33%",
"N/A",
"N/A"
],
[
"Gross Pay",
"83.33%",
"83.33%",
"83.33%",
6,
"0.166"
],
[
"Pay Date",
"83.33%",
"83.33%",
"83.33%",
6,
"0.223"
]
]
},
{
"title": "Individual token level metrics",
"subtitle": "Accuracy scores for individual fields learned by the model on token level",
"headers": [
"Field Name",
"Precision",
"Recall",
"F1 Score",
"Support"
],
"rows": [
[
"Micro Average",
"100.00%",
"92.59%",
"96.15%",
27
],
[
"Macro Average",
"100.00%",
"90.36%",
"94.87%",
"N/A"
],
[
"Gross Pay",
"100.00%",
"95.00%",
"97.44%",
20
],
[
"Pay Date",
"100.00%",
"85.71%",
"92.31%",
7
]
]
},
{
"title": "Train-Validation Curve",
"subtitle": "",
"headers": ["Dataset Split", "Epoch 1", "2", "3"],
"rows": [
["Train", 0.54, 0.21, 0.15],
["Eval", 0.25, 0.31, 0.29]
],
"show": false
},
{
"title": "Hyperparameter search results",
"subtitle": "Top 5 runs from hyperparameter search, sorted by the objective value on validation dataset",
"headers": ["Trial number", "learning_rate", "num_train_epochs", "class_weights_ins_power", "gradient_accumulation_steps", "Objective value"],
"rows": [
["0", "4.565751263925242e-06", "18", "0.6417572172915128", "2", "1.0000"],
["1", "1.7003557680081774e-05", "6", "0.5791602253545789", "2", "1.0000"],
["2", "6.981284268085946e-06", "13", "0.47444889629309955", "1", "1.0000"],
["3", "6.801683402386447e-06", "21", "0.4066896735271678", "2", "1.0000"],
["5", "1.8975639033091587e-05", "14", "0.4435692137254761", "2", "1.0000"]
]
}
],
"platform_version": "23.05.0",
"ibformers_version": "2.1.0",
"train_count": {
"W2": 10
},
"train_total": 10,
"train_datasets": ["Paystubs"],
"test_count": {
"W2": 4
},
"test_total": 4,
"test_datasets": ["Paystubs"],
"hyperparams": {
"model_name": "instalm-base-draft",
"model_source": "instabase",
"task_name": "TOKEN_CLASSIFICATION",
"batch_size": 4,
"gradient_accumulation_steps": 1,
"max_length": 512,
"chunk_overlap": 64,
"learning_rate": 5e-05,
"lr_scheduler_type": "constant_with_warmup",
"use_mixed_precision": true,
"loss_type": "ce_ins",
"num_train_epochs": 5,
"task_type": "classification",
"npages_to_filter": 20,
"class_weights_ins_power": 0.2,
"do_hyperparam_optimization": true,
"do_calibration": true,
"class_weights_ins_power": 0.3,
"hp_search_num_trials": 20,
"calibration_model": "PlattScalingCalibrationModel",
"hp_search_param_space": []
}
}
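Cells in the metrics tables mix percentage strings, numeric strings, plain numbers, and "N/A", as the rows above show. A minimal sketch of normalizing them before comparison or plotting follows; `parse_cell` is a hypothetical helper, and the mapping of "N/A" to None is a design choice made here, not API behavior.

```python
def parse_cell(cell):
    """Convert '83.33%' to a fraction, numeric strings to floats, 'N/A' to None.

    Numbers and non-numeric strings (such as field names) pass through unchanged.
    """
    if not isinstance(cell, str):
        return cell
    if cell == "N/A":
        return None
    if cell.endswith("%"):
        return float(cell[:-1]) / 100
    try:
        return float(cell)
    except ValueError:
        return cell

row = ["Gross Pay", "83.33%", "83.33%", "83.33%", 6, "0.166"]
print([parse_cell(c) for c in row])
```

Keeping None for "N/A" lets callers skip aggregate rows, such as the Support value of "Macro Average", without treating them as zero.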