Model API

Use the Model API to get information about a specific training, evaluation, or pruning job in a model project.

Metrics API

Method Syntax
GET api/v2/model/metrics


Use this API to get metric information for a job, including F1 scores, confusion matrices, dataset record counts, and more.

URL parameters

Parameters are required unless marked as optional.

Name Type Description
model_project_path string The path to the model project’s folder.
job_id string The specific job’s ID.
job_type string The job type. Valid values are training, pruning, and evaluation.
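
As a rough illustration of how the three parameters fit into a request, the following sketch assembles the query string with Python's standard library. The helper name build_metrics_url and the base URL in the usage line are hypothetical and not part of the API. Percent-encoding is worth doing because a project path can contain spaces and slashes.

```python
from urllib.parse import quote, urlencode

# Job types listed in the parameter table above.
VALID_JOB_TYPES = ("training", "pruning", "evaluation")


def build_metrics_url(url_base, model_project_path, job_id, job_type):
    """Return the Metrics API URL with percent-encoded query parameters.

    build_metrics_url is an illustrative helper, not an Instabase function.
    """
    if job_type not in VALID_JOB_TYPES:
        raise ValueError(
            f"job_type must be one of {VALID_JOB_TYPES}, got {job_type!r}"
        )
    query = urlencode(
        {
            "model_project_path": model_project_path,
            "job_id": job_id,
            "job_type": job_type,
        },
        quote_via=quote,  # encode spaces as %20 instead of '+'
    )
    return f"{url_base.rstrip('/')}/api/v2/model/metrics?{query}"


# Hypothetical base URL and job ID, for illustration only.
print(build_metrics_url(
    "https://example.com",
    "user1/my-repo/fs/Instabase Drive/PaystubClassifier",
    "abc",
    "training",
))
```

Passing the same dictionary to requests.get through its params argument achieves a similar result without building the string by hand.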

Request body

The request has no body.

Response status

Unless otherwise specified, a 2XX status code indicates the request was successful.
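
A minimal sketch of the 2XX rule, assuming the caller has the numeric status code in hand. The function name is_success is hypothetical. An explicit range check is shown because the requests library's Response.ok property is also true for 1XX and 3XX codes, which is broader than the rule stated above.

```python
def is_success(status_code):
    """Return True only for 2XX status codes."""
    return 200 <= status_code <= 299


for code in (200, 204, 302, 404, 500):
    print(code, is_success(code))
```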

Response schema

Key Description
metrics A list of tables, each represented as a dictionary, that include F1 score, precision, recall, support, ECE, and train-evaluation loss data points. If the job is from a classification model, a confusion matrix table is included. If hyperparameter tuning was performed, a table of hyperparameter search results is included.
metrics[i]/title The title of the table.
metrics[i]/subtitle A description of the data in the table.
metrics[i]/headers A list of column names.
metrics[i]/rows A list of rows, where each row is a list of entries.
metrics[i]/show An optional Boolean that indicates whether the table is shown in a job’s metrics tab.
platform_version The Instabase platform version string, such as "23.01.0".
ibformers_version The ibformers version string, such as "2.0.1".
train_count A dictionary that maps class name to its number of annotated train records (one class for extraction models, multiple for classification models).
train_total The total number of train records used.
train_datasets A list of associated datasets that contained annotated train records.
test_count A dictionary that maps class name to its number of annotated test records (one class for extraction models, multiple for classification models).
test_total The total number of test records used.
test_datasets A list of associated datasets that contained annotated test records.
hyperparams A dictionary that maps hyperparameter names to their values.
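
Because each table stores its column names in headers and its data in rows, it can be convenient to pair them up. The following is a sketch under that reading of the schema. The helper names table_to_records and find_table are hypothetical, and the sample dictionary is a trimmed copy of the confusion matrix table from the classification example in this section.

```python
def table_to_records(table):
    """Turn one metrics table into a list of {header: value} dictionaries."""
    headers = table["headers"]
    return [dict(zip(headers, row)) for row in table["rows"]]


def find_table(response, title):
    """Return the first table in response["metrics"] with this title, or None."""
    for table in response.get("metrics", []):
        if table.get("title") == title:
            return table
    return None


# Trimmed sample response, for illustration only.
sample_response = {
    "metrics": [
        {
            "title": "Confusion Matrix",
            "subtitle": "",
            "headers": ["class name", "W2", "ADP"],
            "rows": [["W2", 5, 1], ["ADP", 0, 4]],
            "show": False,
        }
    ]
}

records = table_to_records(find_table(sample_response, "Confusion Matrix"))
for record in records:
    print(record)
```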

To receive complete results, you might need to retrain, reprune, or reevaluate models whose jobs were originally run before 23.07. For older jobs that are not rerun, the API attempts a best-effort retrieval but might return incomplete results, such as a metrics list with no confusion matrix table or an empty platform_version value.
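
One way to detect such incomplete results on the client side is sketched below. The function name find_missing_fields is hypothetical, and the table title "Confusion Matrix" is taken from the sample response in this section, so it is an assumption that every classification job uses that exact title.

```python
def find_missing_fields(response, expect_confusion_matrix=False):
    """List the parts of a metrics response that look incomplete.

    Checks only the two symptoms named above: an empty platform_version
    and, for classification jobs, a missing confusion matrix table.
    """
    missing = []
    if not response.get("platform_version"):
        missing.append("platform_version")
    titles = [table.get("title") for table in response.get("metrics", [])]
    # Assumed title, copied from the sample classification response.
    if expect_confusion_matrix and "Confusion Matrix" not in titles:
        missing.append("confusion matrix table")
    return missing


# A made-up response from an older job, for illustration only.
old_job = {"metrics": [{"title": "Train-Validation Curve"}], "platform_version": ""}
print(find_missing_fields(old_job, expect_confusion_matrix=True))
```

A non-empty result suggests that the job is a candidate for rerunning.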


Example for a classification job


import json, requests, time

# url_base and token are placeholders: set them to your Instabase base URL and API token.
model_project_path = 'user1/my-repo/fs/Instabase Drive/PaystubClassifier'
job_id = 'dc991a28-bd92-47bb-ajj8-e0f2ca930e51'
job_type = 'training'
url = url_base + '/api/v2/model/metrics'
# Pass the parameters separately so that requests URL-encodes them (the path contains a space).
params = {
  'model_project_path': model_project_path,
  'job_id': job_id,
  'job_type': job_type
}

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

still_running = True
while still_running:
    r = requests.get(url, headers=headers, params=params)
    resp = json.loads(r.content)
    # status and state are not listed in the response schema above; .get() ends the loop if they are absent.
    still_running = (resp.get('status') == 'OK') and (resp.get('state') != 'DONE')
    if still_running:
        print('still running')
        time.sleep(1)  # pause between polls

print('Request finished!')


{
  "metrics": [
    {
      "title": "Record-wise Classifier Metrics",
      "subtitle": "Performance of the Classifier model measured on the Record level. Measure how accurately model is classifing each Record to the given class",
      "headers": ["Class Type", "F1 Score", "Precision", "Recall", "Support", "ECE"],
      "rows": [
        ["W2", "90.91%", "100.00%", "83.33%", 6.0, "0.051"],
        ["ADP", "88.89%", "80.00%", "100.00%", 4.0, "0.125"],
        ["macro avg", "89.90%", "90.00%", "91.67%", 10.0, "N/A"],
        ["weighted avg", "90.10%", "92.00%", "90.00%", 10.0, "N/A"]
      ]
    },
    {
      "title": "Chunk-wise Classifier Metrics",
      "subtitle": "Performance of the Classifier model measured on the Chunk level. Measure how accurately model is classifing each Chunk to the given class",
      "headers": ["Class Type", "F1 Score", "Precision", "Recall", "Support", "ECE"],
      "rows": [
        ["W2", "90.91%", "100.00%", "83.33%", 6.0, "0.051"],
        ["ADP", "88.89%", "80.00%", "100.00%", 4.0, "0.125"],
        ["macro avg", "89.90%", "90.00%", "91.67%", 10.0, "N/A"],
        ["weighted avg", "90.10%", "92.00%", "90.00%", 10.0, "N/A"]
      ]
    },
    {
      "title": "Train-Validation Curve",
      "subtitle": "",
      "headers": ["Dataset Split", "Epoch 1", "2", "3"],
      "rows": [
        ["Train", 0.54, 0.21, 0.15],
        ["Eval", 0.25, 0.31, 0.29]
      ],
      "show": false
    },
    {
      "title": "Confusion Matrix",
      "subtitle": "",
      "headers": ["class name", "W2", "ADP"],
      "rows": [
        ["W2", 5, 1],
        ["ADP", 0, 4]
      ],
      "show": false
    },
    {
      "title": "Hyperparameter search results",
      "subtitle": "Top 5 runs from hyperparameter search, sorted by the objective value on validation dataset",
      "headers": ["Trial number", "learning_rate", "num_train_epochs", "class_weights_ins_power", "gradient_accumulation_steps", "Objective value"],
      "rows": [
        ["0", "4.565751263925242e-06", "18", "0.6417572172915128", "2", "1.0000"],
        ["1", "1.7003557680081774e-05", "6", "0.5791602253545789", "2", "1.0000"],
        ["2", "6.981284268085946e-06", "13", "0.47444889629309955", "1", "1.0000"],
        ["3", "6.801683402386447e-06", "21", "0.4066896735271678", "2", "1.0000"],
        ["5", "1.8975639033091587e-05", "14", "0.4435692137254761", "2", "1.0000"]
      ]
    }
  ],
  "platform_version": "23.05.0",
  "ibformers_version": "2.1.0",
  "train_count": {
    "W2": 10,
    "ADP": 14
  },
  "train_total": 24,
  "train_datasets": ["Paystubs"],
  "test_count": {
    "W2": 4,
    "ADP": 3
  },
  "test_total": 7,
  "test_datasets": ["Paystubs"],
  "hyperparams": {
    "model_name": "layoutlm-base-uncased",
    "model_source": "instabase",
    "batch_size": 4,
    "gradient_accumulation_steps": 1,
    "max_length": 512,
    "chunk_overlap": 64,
    "learning_rate": 5e-05,
    "use_mixed_precision": true,
    "loss_type": "ce_ins",
    "num_train_epochs": 5,
    "task_type": "classification",
    "npages_to_filter": 20,
    "class_weights_ins_power": 0.2,
    "do_hyperparam_optimization": true,
    "do_calibration": true,
    "class_weights_ins_power": 0.3,
    "hp_search_num_trials": 20,
    "calibration_model": "PlattScalingCalibrationModel",
    "hp_search_param_space": []
  }
}

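The confusion matrix in the sample response above can be related to the per-class scores with a short calculation. The sketch below assumes that each row holds the actual class and each column the predicted class. That orientation is an assumption, not something the schema states. The function name per_class_scores is hypothetical.

```python
def per_class_scores(table):
    """Compute precision, recall, and F1 per class from a confusion matrix table.

    Assumes rows are actual classes and columns are predicted classes.
    """
    labels = table["headers"][1:]  # first header is the row-label column
    matrix = {row[0]: row[1:] for row in table["rows"]}
    scores = {}
    for j, label in enumerate(labels):
        true_pos = matrix[label][j]
        actual = sum(matrix[label])                        # row total
        predicted = sum(matrix[other][j] for other in labels)  # column total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Confusion matrix table copied from the sample response above.
confusion = {
    "title": "Confusion Matrix",
    "headers": ["class name", "W2", "ADP"],
    "rows": [["W2", 5, 1], ["ADP", 0, 4]],
}

scores = per_class_scores(confusion)
for label, values in scores.items():
    print(label, {name: f"{value:.2%}" for name, value in values.items()})
```

Under the stated orientation, the computed values agree with the W2 and ADP rows of the Record-wise Classifier Metrics table in the sample response.
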
Example for an extraction job


import json, requests, time

# url_base and token are placeholders: set them to your Instabase base URL and API token.
model_project_path = 'user1/my-repo/fs/Instabase Drive/W2ExtractionModel'
job_id = 'dc991a28-bd92-47bb-ajj8-e0f2ca930e38'
job_type = 'training'
url = url_base + '/api/v2/model/metrics'
# Pass the parameters separately so that requests URL-encodes them (the path contains a space).
params = {
  'model_project_path': model_project_path,
  'job_id': job_id,
  'job_type': job_type
}

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

still_running = True
while still_running:
    r = requests.get(url, headers=headers, params=params)
    resp = json.loads(r.content)
    # status and state are not listed in the response schema above; .get() ends the loop if they are absent.
    still_running = (resp.get('status') == 'OK') and (resp.get('state') != 'DONE')
    if still_running:
        print('still running')
        time.sleep(1)  # pause between polls

print('Request finished!')


{
  "metrics": [
    {
      "title": "Individual fields level metrics",
      "subtitle": "Accuracy scores for individual fields learned by the model",
      "headers": [
          "Field Name",
          "F1 Score"
      ],
      "rows": [
          ["Micro Average"],
          ["Macro Average"],
          ["Gross Pay"],
          ["Pay Date"]
      ]
    },
    {
      "title": "Individual token level metrics",
      "subtitle": "Accuracy scores for individual fields learned by the model on token level",
      "headers": [
          "Field Name",
          "F1 Score"
      ],
      "rows": [
          ["Micro Average"],
          ["Macro Average"],
          ["Gross Pay"],
          ["Pay Date"]
      ]
    },
    {
      "title": "Train-Validation Curve",
      "subtitle": "",
      "headers": ["Dataset Split", "Epoch 1", "2", "3"],
      "rows": [
        ["Train", 0.54, 0.21, 0.15],
        ["Eval", 0.25, 0.31, 0.29]
      ],
      "show": false
    },
    {
      "title": "Hyperparameter search results",
      "subtitle": "Top 5 runs from hyperparameter search, sorted by the objective value on validation dataset",
      "headers": ["Trial number", "learning_rate", "num_train_epochs", "class_weights_ins_power", "gradient_accumulation_steps", "Objective value"],
      "rows": [
        ["0", "4.565751263925242e-06", "18", "0.6417572172915128", "2", "1.0000"],
        ["1", "1.7003557680081774e-05", "6", "0.5791602253545789", "2", "1.0000"],
        ["2", "6.981284268085946e-06", "13", "0.47444889629309955", "1", "1.0000"],
        ["3", "6.801683402386447e-06", "21", "0.4066896735271678", "2", "1.0000"],
        ["5", "1.8975639033091587e-05", "14", "0.4435692137254761", "2", "1.0000"]
      ]
    }
  ],
  "platform_version": "23.05.0",
  "ibformers_version": "2.1.0",
  "train_count": {
    "W2": 10
  },
  "train_total": 10,
  "train_datasets": ["Paystubs"],
  "test_count": {
    "W2": 4
  },
  "test_total": 4,
  "test_datasets": ["Paystubs"],
  "hyperparams": {
    "model_name": "instalm-base-draft",
    "model_source": "instabase",
    "task_name": "TOKEN_CLASSIFICATION",
    "batch_size": 4,
    "gradient_accumulation_steps": 1,
    "max_length": 512,
    "chunk_overlap": 64,
    "learning_rate": 5e-05,
    "use_mixed_precision": true,
    "loss_type": "ce_ins",
    "num_train_epochs": 5,
    "task_type": "classification",
    "npages_to_filter": 20,
    "class_weights_ins_power": 0.2,
    "do_hyperparam_optimization": true,
    "do_calibration": true,
    "class_weights_ins_power": 0.3,
    "hp_search_num_trials": 20,
    "calibration_model": "PlattScalingCalibrationModel",
    "hp_search_param_space": []
  }
}