22.10 Release notes

Release 22.10 is a major release that introduces new features, improvements, and bug fixes.

Release 22.10.29

Bug fixes

ML Studio

Annotations could be lost when you moved or exported them due to timestamp mismatches in a now-obsolete backwards compatibility check.

Release 22.10.22

Bug fixes

ML Studio

Erroneous dataset modification warnings appeared if data was stored on NFS drives or if users were editing different datasets in the same model.

Refiner

A refiner with more than one dropdown output field incorrectly contained the same value in all dropdown output fields.
When running flows with an extraction model step, an incorrect FieldExtraction error occurred.

Release 22.10.0

New features and enhancements

Platform

A new Diagnostics app (beta) lets you test basic file-system functionality and measure the performance of your file storage system. The Diagnostics app is available to admins only.
You can create and manage custom HTTP endpoints using a series of new APIs. Custom endpoints follow the format URL_BASE/api/v1/http-endpoint/u/<custom-endpoint>. Any incoming request to the endpoint runs a user-defined handler function and returns the HTTP response.

Custom HTTP endpoints are useful for integrating Instabase with external systems. For example, you can define a custom HTTP endpoint to implement a webhook handler to integrate with an upstream system like Blend, or to return the output response to downstream systems in a specific format.

See the custom HTTP endpoint APIs documentation to learn more.
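
As a rough illustration of the endpoint format described above, the following Python sketch builds the URL for a custom endpoint. The host name and the endpoint name blend-webhook are placeholders, and the helper custom_endpoint_url is hypothetical, not part of any Instabase API. The handler function itself is defined through the custom HTTP endpoint APIs and is not shown here.

```python
from urllib.parse import quote


def custom_endpoint_url(url_base: str, endpoint_name: str) -> str:
    """Build URL_BASE/api/v1/http-endpoint/u/<custom-endpoint>.

    Hypothetical helper for illustration only.
    """
    base = url_base.rstrip("/")  # avoid a double slash when joining
    # Percent-encode the endpoint name so it is safe as a single path segment.
    return f"{base}/api/v1/http-endpoint/u/{quote(endpoint_name, safe='')}"


# Placeholder values; substitute your own base URL and endpoint name.
print(custom_endpoint_url("https://instabase.example.com", "blend-webhook"))
```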
This release adds support for client-side encryption and decryption of large files (greater than 1 GB), such as models. Previously, the maximum file size supported for client-side encryption was 100 MB.
Instabase now provides log aggregation tooling for collecting and storing logs of Instabase services. You can aggregate logs across Instabase services and store them for purposes including querying, debugging, activity analysis, and alerting. See the log aggregation tooling documentation to learn more.
This release includes several improvements to distributed tracing setup and coverage. Key features include:
- Tracing coverage has been added across services and at different layers, including traces for HTTP calls, RPC calls, and job creation.
- The tracing backend now initializes Elasticsearch on startup to store trace data. This is run through an init container for Jaeger deployment.
- A new Jaeger datasource in Grafana lets you query traces.
You can now monitor storage usage across pods, including persistent storage mounted to containers, using the following new metrics: instabase_container_mount_path_free_bytes, instabase_container_mount_path_size_bytes, and instabase_container_mount_path_used_bytes.
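
One way these metrics might be combined is to derive a usage percentage per mount path. The sketch below assumes you have already read the metric values from your monitoring stack; the sample numbers and the helper mount_usage_percent are hypothetical.

```python
def mount_usage_percent(used_bytes: float, size_bytes: float) -> float:
    """Return used space as a percentage of the total mount size."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return 100.0 * used_bytes / size_bytes


# Hypothetical sample values for one mount path, keyed by metric name.
sample = {
    "instabase_container_mount_path_size_bytes": 10 * 1024**3,
    "instabase_container_mount_path_used_bytes": 4 * 1024**3,
    "instabase_container_mount_path_free_bytes": 6 * 1024**3,
}

percent = mount_usage_percent(
    sample["instabase_container_mount_path_used_bytes"],
    sample["instabase_container_mount_path_size_bytes"],
)
print(f"{percent:.1f}% of the mount is in use")
```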
You can add an expiration date when creating an access token in OAuth management.

Services

The model service now supports Open Neural Network Exchange (ONNX) Runtime with entity detection models. This update brings model inference optimizations, including reduced inference time and memory usage. All new versions of checkbox and signature detection models shipped with 22.10 support ONNX Runtime.

Note: To use ONNX Runtime with model-service, set the environment variable ENABLE_ONNX_MODEL_INFERENCE to true.
You can view storage-level performance statistics for Azure Blob storage in the Grafana file service dashboard. This improves file-service observability and can help you understand if slow file-service performance is a result of external storage performance. To view the statistics in Grafana, select the storage-level-stats row of the file service dashboard.
The Grafana model service dashboard now displays model-specific statistics. This addition provides visibility into how individual models are performing and makes it possible to discover and troubleshoot model-specific problems. Previously, only statistics for model-service as a whole were available.

Flow

This release introduces the run extraction model step to Flow v3. This step runs model inference from a trained ML Studio model to extract relevant output fields from the input records.

The run extraction model step enables you to chain multiple extraction steps and run multiple models on the same set of documents in a single flow. Both text and table extraction are supported.

To use the run extraction model step in your flows, train an extraction model in ML Studio and import the resulting extraction module into Flow so that you can attach it to this step. We recommend that you use the run extraction model step in Flow instead of running models with Refiner.
Flow v3 supports adding Pre flow UDF and Post flow UDF events, letting you run user-defined functions at the start or end of a flow.
The apply refiner and apply classifier steps now adjust their timeouts based on page count, removing the need to set custom timeouts when working with large documents.
You can create if/then rules in an apply checkpoint step’s validation module now that the Validations app supports branching logic.
The flow dashboard now displays both the number of files and the number of records in the job status column.
Flow review has several usability and design improvements, including:
- You can click and drag to reorder pages within a record in the records grid tab.
- You can click and drag or use arrow keys to move pages from one record to another, in any sequence or order.
- You can delete pages from a record while on the records grid tab.
- An updated table editor provides an improved user experience, including support for multi-row and multi-column selection and inserting rows at a specific location in the table.
When importing a module into a flow from ML Studio, you’ll see a warning if the flow already contains the module. This avoids the risk of overwriting a module with the same name in the selected flow.
You can view a plaintext version of table annotations in ML Studio, helping you see how words are mapped to cells.
Clear, actionable error messages display in ML Studio when:
- An input directory cannot be found.
- An image fails to load.
- You add a field that already exists in the schema.

ML Studio

ML Studio can now automatically select a randomized set of records in a dataset to be test records. By default, 30% of the dataset’s records are randomly selected, though you can customize the percentage when training a model. To enable automatic test record selection, turn on the Automatically assign test records toggle when creating a dataset. You can also enable the setting for existing datasets on the dataset’s Info settings (Settings > Info > Edit > Automatically assign test records).
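
The idea behind automatic test record selection can be sketched in Python as follows. This is only an illustration of picking a random 30% subset, not ML Studio's implementation; the function name, the rounding rule, and the seed parameter are assumptions.

```python
import random


def pick_test_records(record_ids, test_fraction=0.30, seed=None):
    """Randomly choose a fraction of records to hold out as test records.

    Illustrative only: the rounding behavior is an assumption.
    """
    ids = list(record_ids)
    rng = random.Random(seed)  # a seed makes the selection reproducible
    count = round(len(ids) * test_fraction)
    return set(rng.sample(ids, count))


test_ids = pick_test_records(range(10), seed=1)
print(f"{len(test_ids)} of 10 records selected as test records")
```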
This release includes several usability and design improvements when working with table entities and annotations in ML Studio, including:
- You can rearrange table segments within and between fields.
- Table annotation suggestions display only when contextually relevant.
- Table entities display only when contextually relevant.
You can add header labels to columns and rows when annotating a table. If a table extraction model is trained with table annotations that contain labels, the model then outputs tables with predicted row and column labels.

Reader

Reader has several optimizations to improve runtime:
- Support for chunking non-PDF files (without attachments) for parallel processing in the Process Files step. Previously, this was only supported for PDF files. To enable this functionality, set the ENABLE_PROCESS_FILES_SPLIT_FOR_NON_PDF_FILES environment variable to true.
- Improved entity runner execution: entity model requests are now triggered as each page completes OCR, and additional parallelism has been added.
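
The second optimization follows a common pipelining pattern: start downstream work as soon as each page finishes OCR instead of waiting for the whole document. The Python sketch below illustrates that pattern with a thread pool. The functions run_ocr and detect_entities are stand-ins invented for this example and do not reflect Reader internals.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_ocr(page: str) -> str:
    """Stand-in for OCR on one page (hypothetical)."""
    return page.upper()


def detect_entities(ocr_text: str) -> list[str]:
    """Stand-in for an entity model request (hypothetical)."""
    return [word for word in ocr_text.split() if word.isdigit()]


def process_pages(pages: list[str], workers: int = 4) -> dict[int, list[str]]:
    results: dict[int, list[str]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Run OCR on all pages in parallel.
        ocr_futures = {pool.submit(run_ocr, p): i for i, p in enumerate(pages)}
        entity_futures = {}
        # Submit the entity request for a page as soon as its OCR completes.
        for future in as_completed(ocr_futures):
            index = ocr_futures[future]
            entity_futures[pool.submit(detect_entities, future.result())] = index
        for future in as_completed(entity_futures):
            results[entity_futures[future]] = future.result()
    return results


print(process_pages(["invoice 42", "no numbers"]))
```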

Refiner

Import logging support has been added to Refiner. To log values and view them in the Refiner log panel, add import logging to a user-defined function (UDF) and use any native Python logging function, such as logging.info or logging.warning. This also means you no longer have to use the kwargs[LOGGER] approach to view logs in Refiner.
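
A minimal sketch of a UDF that logs through the standard Python logging module is shown below. The function name clean_amount and its signature are hypothetical, and UDF registration is omitted. Only import logging, logging.info, and logging.warning come from the description above.

```python
import logging


def clean_amount(value: str, **kwargs) -> str:
    """Hypothetical UDF that strips currency formatting from a value."""
    # Standard logging calls; no kwargs[LOGGER] lookup is needed.
    logging.info("clean_amount received %r", value)
    cleaned = value.replace("$", "").replace(",", "").strip()
    if not cleaned:
        logging.warning("clean_amount produced an empty result")
    return cleaned


print(clean_amount(" $1,200.50 "))
```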
Refiner now supports showing text page-by-page. Previously, Refiner displayed all text on a single page, even if the selected record had multiple pages.
When you reference a Dev Exchange package in a UDF script file, a version of that package is now pinned to that script. When a Refiner linked to the UDF script executes, the script references the pinned version of the package, even if a newer version of the package has been published.

Note: The package version is pinned by adding a new requirement in the ib_requirements.txt file in the scripts directory.
The following Refiner functions now have provenance-tracked versions:
- scan_right_repeated and scan_line_repeated.
- assert_true, assert_not_empty, and assert_not_blurry.
- map_create, map_keys, map_delete, map_copy, map_update, map_values.
Refiner has three new functions focused on tables:
- table_list_get: Gets a specific table from a list of tables.
- merge_tables: Concatenates multiple tables into one table.
- table_get_range: Slices a table using a specified row and column range.
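
To make the behavior of the three table functions concrete, here is a plain-Python sketch that models a table as a list of rows. It is not the Refiner implementation. The underscore names, the row-wise concatenation, and the half-open (end-exclusive) ranges are all assumptions made for illustration.

```python
Table = list[list[str]]


def table_list_get(tables: list[Table], index: int) -> Table:
    """Get one table from a list of tables."""
    return tables[index]


def merge_tables(tables: list[Table]) -> Table:
    """Concatenate tables row-wise into a single table (assumed behavior)."""
    merged: Table = []
    for table in tables:
        merged.extend(list(row) for row in table)  # copy rows, keep inputs intact
    return merged


def table_get_range(table: Table, row_start: int, row_end: int,
                    col_start: int, col_end: int) -> Table:
    """Slice a table by row and column range (end-exclusive, an assumption)."""
    return [row[col_start:col_end] for row in table[row_start:row_end]]


first = [["item", "qty"], ["apple", "2"]]
second = [["pear", "5"]]
combined = merge_tables([first, second])
print(table_get_range(combined, 1, 3, 0, 1))
```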
You’ll see speed improvements when running Refiner functions, particularly with long documents. For example, the runtime for the scan_line function on a 100-page document is reduced from three minutes to three seconds.

Scheduler

Scheduler has an updated user interface and several usability and design improvements:
- You can view the time at which a scheduled job will run next.
- You can click the new Trigger button to immediately run a scheduled job.

Test Runner

The Test Runner app has an updated user interface and several usability and design improvements:
- From the test library tab, you can view, manage, and run all test suites.
- You can select multiple test suites to run at the same time.
- From the execution history tab you can view test run results, statistics, and errors.

Bug fixes

Platform

Previously, downloading large folders as a ZIP file could result in excessive memory usage and a corrupted file. This has been fixed.
You can no longer create a folder with an empty or spaces-only name in the Create a new folder dialog. An error alert now appears if you try to create a folder with an empty name.
This release resolves a bug that prevented users from resetting their passwords on initial login in environments with email verification disabled.

Services

If an error occurs during asynchronous file-service operations, the length of the error message can no longer prevent the job status from updating in the database. Previously, the job status update could fail if the error message was too long.
This release fixes a bug in release 22.08 that affected model-service stability. Failure to load models will no longer make model-service unresponsive.

Flow

This release fixes issues with the redactor step in Flow v3 where a configuration might not be applied correctly to all redactor steps. This also fixes an issue where if you ran a flow containing redactor steps from a compiled binary, you might get an error saying the configuration file could not be found.
If a document failed to process, for example because of a memory exception, the document might appear to be missing from your flow results. This has been fixed.
Previously, if empty pages were passed into an apply classifier step, the output for the empty pages had invalid metadata that caused downstream model inference to fail. This issue is fixed in this release.

Now, the apply classifier step in v3 flows always skips empty pages. In v2 flows, the apply classifier step has an option to skip empty pages, which defaults to True. In v1 flows, the apply classifier step is not fixed and still exhibits this bug. If you are still using v1 flows, migrate them to a current version of Flow.
When resuming a flow after a checkpoint in Flow v2, the .ibflowresults file is no longer missing files.

ML Studio

If you click on a table segment in ML Studio, it now automatically scrolls to the table annotation.
Accepting annotation suggestions on a record that has been split or merged after being trained no longer results in faulty annotations.
The table annotation suggestion button no longer appears when annotating text fields.
Clicking and dragging over a table no longer adds or changes a table annotation.
When annotating one field over multiple pages, ML Studio no longer scrolls to the location of the previous annotation when adding a new annotation.

Refiner

When changing the Refiner input folder to a new folder with the same name, Refiner no longer fails to load read-only fields in the new folder.
You can now use the merge_tables function with only one table as the input without encountering an error.
You can now concatenate tables with merged cells without encountering an error.
The scan_right_repeated function now returns all matches when multiple labels are on the same line.
Images now load properly in Refiner when files are processed through Flow with write_converted_image set to False.

Deployment guide

If you are upgrading to release 22.10 from any release prior to 22.08, before upgrading you must delete the service-model-service and service-core-platform-service services. As part of recent work to improve load balancing across pods, both model-service and core-platform-service run in headless mode in releases 22.08 through 22.10.
- To delete service-model-service, run the following command: kubectl delete service service-model-service -n [your namespace]
- To delete service-core-platform-service, run the following command: kubectl delete service service-core-platform-service -n [your namespace]
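
If you script your upgrades, the two deletions above could be generated as in the following Python sketch. It only builds and prints the commands; the namespace my-namespace is a placeholder and the helper delete_service_commands is hypothetical.

```python
import shlex

SERVICES = ("service-model-service", "service-core-platform-service")


def delete_service_commands(namespace: str) -> list[list[str]]:
    """Build the kubectl argument lists for deleting both services."""
    return [
        ["kubectl", "delete", "service", name, "-n", namespace]
        for name in SERVICES
    ]


# Print the commands for review; replace the placeholder namespace.
for command in delete_service_commands("my-namespace"):
    print(shlex.join(command))
```

To run the commands rather than print them, each argument list could be passed to subprocess.run(command, check=True).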
After upgrading to release 22.10, you must complete the following steps:
1. From the Instabase desktop, open All apps > Admin > Configuration.
2. Under Service setup, check whether the Database tables are set up message is displayed.
3. If you don’t see the message, click Set up. If setup succeeds, the Database tables are set up message is displayed.

Deprecations

The PDF service is deprecated in release 22.10. By default, all pdf-service PDF processing operations are instead handled by alternate libraries serviced by celery-app-tasks. This is a breaking change. We recommend verifying the performance of your solutions against release 22.10 before upgrading.

To verify performance, upgrade your testing or staging environment to 22.10 and validate all solutions in the environment. You can use the Comparison app to validate solution accuracy by comparing actual results against expected field values.
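
The kind of check described above, comparing actual results against expected field values, can be approximated outside the Comparison app with a simple field-level accuracy calculation. The Python sketch below is an illustration under assumed inputs (flat dictionaries of field names to string values, exact-match comparison); it does not reflect how the Comparison app scores results.

```python
def field_accuracy(expected: dict[str, str], actual: dict[str, str]) -> float:
    """Fraction of expected fields whose actual value matches exactly."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    matches = sum(1 for name, value in expected.items() if actual.get(name) == value)
    return matches / len(expected)


# Hypothetical field values for one document, before and after upgrading.
expected = {"invoice_number": "INV-001", "total": "1200.50"}
actual = {"invoice_number": "INV-001", "total": "1200.5"}
print(f"Field accuracy: {field_accuracy(expected, actual):.0%}")
```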

Based on the results of your performance testing, you may need to create a solution migration plan and modify your solutions before upgrading your production environment to 22.10. For some solutions, small adjustments may be needed for post processing. In other cases, re-annotating the dataset and retraining the model may be required.

Note: A solution-level override flag is available to continue using pdf-service for specific solutions. This override is available only until the 23.01 release and exists to support solution migration plans that require additional time for testing and validation. To continue using pdf-service, set the ENABLE_APP_TASKS_PDF_UTILS and ENABLE_APP_TASKS_IMAGE_AND_TEXT_UTILS environment variables to false during migration.
The ib_intelligence package is deprecated and no longer needs to be installed during deployment to run models on Instabase. Previously, ib_intelligence was available as a downloadable Marketplace package; it’s now included in the deployment by default. As the ib_intelligence package is deprecated, it should not be used in UDFs. Existing models will continue to run without the package.
(Upcoming) The Instabase filesystem v1 APIs are scheduled to be deprecated in release 23.10, at the earliest. This follows the Instabase filesystem v2 APIs made available with release 22.08. We recommend beginning to transition your Instabase integrations to the newer filesystem v2 API endpoints.