Scan Box
The scan_box Refiner function extracts text from a rectangle (box) in the image domain based on a label within that rectangle.
How to use Scan Box
In the image above, you can use scan_box to extract the employer’s name and address from the rectangle by using the label 'c Employer\'s name'. Make sure that the label specified is unique, or restrict the input text to the area surrounding your label. If the label is not unique, scan_box finds the box based on the first occurrence of the label, similar to other scan functions.
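The first-occurrence behavior can be illustrated with a minimal plain-Python sketch. This is not Refiner code: find_first_label is a hypothetical helper, and the sample page text is invented for illustration. It only shows why a repeated label can make the scan pick up a different box than the one you intended.

```python
from typing import Tuple


def find_first_label(text: str, label: str) -> Tuple[int, int]:
    """Return (offset of the first occurrence, total number of occurrences)."""
    return text.find(label), text.count(label)


# Invented sample text in which the label appears twice.
page = (
    "c Employer's name, address, and ZIP code\n"
    "The Big Company\n"
    "d Control number\n"
    "c Employer's name (copy)\n"
)

index, occurrences = find_first_label(page, "c Employer's name")
if occurrences > 1:
    # Only the first match would anchor the box, so narrow the input
    # text or choose a more specific label.
    print(f"Label found {occurrences} times; first match is at offset {index}")
```

Checking the label count before scanning is one way to catch ambiguous labels early.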
A best practice is to use the label argument with scan_box. The specified label can span one line at most.
The resulting text and the found box are provenance tracked. In Refiner, you can view the found box for your label.
Example of basic usage
scan_box(INPUT_COL, label='c Employer\'s name')
Expected output:
c Employer's name, address, and ZIP code
The Big Company
123 Main Street
Anywhere, PA 12345
Accepted arguments
The scan_box function accepts the same arguments as other scan functions, which let you control how the input text is scanned.
The following arguments are unique to scan_box:
pixel_tolerance
The pixel_tolerance argument accepts an integer that specifies how many pixels words can extend past the region box borders, or how many pixels the region box borders can fall past or before the found label.
By default, this value is set to 2.
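One way to picture the tolerance is as slack added to each side of the region box before testing whether a word belongs to it. The sketch below is plain Python, not the Refiner implementation: the Box type, the within_box helper, and the coordinates are all hypothetical, and the exact way scan_box applies the tolerance is an assumption.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def within_box(word: Box, region: Box, pixel_tolerance: int = 2) -> bool:
    """True if word fits in region, allowing pixel_tolerance pixels of overshoot per side."""
    return (
        word.left >= region.left - pixel_tolerance
        and word.top >= region.top - pixel_tolerance
        and word.right <= region.right + pixel_tolerance
        and word.bottom <= region.bottom + pixel_tolerance
    )


region = Box(left=100, top=50, right=400, bottom=150)
slightly_outside = Box(left=99, top=60, right=180, bottom=80)  # 1 pixel past the left border
far_outside = Box(left=90, top=60, right=180, bottom=80)       # 10 pixels past the left border

print(within_box(slightly_outside, region))                  # accepted with the default of 2
print(within_box(far_outside, region))                       # rejected with the default of 2
print(within_box(far_outside, region, pixel_tolerance=10))   # accepted with a larger tolerance
```

Under this reading, raising the value helps when OCR word boxes slightly overlap the drawn lines, at the cost of possibly pulling in text from neighboring boxes.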
exclude_label_line
The exclude_label_line argument accepts a boolean value. If set to true, the line that contains your label is excluded from the output.
By default, this value is false.
An example:
scan_box(INPUT_COL, label='c Employer\'s name', exclude_label_line=true)
The line that contains the label 'c Employer\'s name' is removed from the output before it is returned. The resulting output is:
The Big Company
123 Main Street
Anywhere, PA 12345
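The effect of exclude_label_line on the extracted text can be mimicked in plain Python. This is only an illustration of the observable result, not how Refiner implements it; drop_label_line is a hypothetical helper.

```python
def drop_label_line(box_text: str, label: str) -> str:
    """Remove the first line that contains label and return the remaining text."""
    lines = box_text.splitlines()
    for i, line in enumerate(lines):
        if label in line:
            del lines[i]
            break  # only the first matching line is treated as the label line
    return "\n".join(lines)


box_text = (
    "c Employer's name, address, and ZIP code\n"
    "The Big Company\n"
    "123 Main Street\n"
    "Anywhere, PA 12345"
)

print(drop_label_line(box_text, "c Employer's name"))
```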
Enable line detection and OCR Config settings
For scan_box to work, you must enable line detection by setting "find_lines": true in the Process Files settings. If you create a new Refiner Project with the project creation wizard, enable line detection in the Process Files step of your post_install_script.flow. Then run the post-install script on your input folder. Your Refiner program is then ready to be used with scan_box.
Enable the following OCR Config settings along with "find_lines", so that the resulting OCR Config looks like this:
{
  "produce_word_metadata": true,
  "produce_metadata_list": true,
  "force_image_ocr": true,
  "write_converted_image": true,
  "find_lines": true
}
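If you keep the OCR Config in a file or generate it from a script, a small helper can make sure the settings listed above are always present. The following is a sketch in plain Python under that assumption: with_scan_box_settings is a hypothetical helper, not part of Refiner, and the setting names are taken from the config above.

```python
import json

# Settings that the config above enables for scan_box.
REQUIRED_OCR_SETTINGS = {
    "produce_word_metadata": True,
    "produce_metadata_list": True,
    "force_image_ocr": True,
    "write_converted_image": True,
    "find_lines": True,
}


def with_scan_box_settings(ocr_config: dict) -> dict:
    """Return a copy of ocr_config with the scan_box settings switched on."""
    merged = dict(ocr_config)  # leave the caller's dict untouched
    merged.update(REQUIRED_OCR_SETTINGS)
    return merged


existing = {"force_image_ocr": False}
merged = with_scan_box_settings(existing)
print(json.dumps(merged, indent=2))
```

Returning a copy rather than mutating the input keeps the original config available for comparison.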