PDF functions
get_pdf_fonts
get_pdf_fonts(ibocr)
Gets the PDF fonts associated with the provided input.
NOTE: The flavour of the function that takes INPUT_IBOCR will be deprecated
after September 30th 2019. Please use INPUT_IBOCR_RECORD instead.
Args:
ibocr (Union[IBOCRRecordDict, IBOCRRecord]): Could be either a:
- Dictionary with info about one ibocr record
- The IBOCRRecord itself
Returns:
The PDF fonts used across the entire document, as a list with one dictionary per font (see Examples)
Examples:
get_pdf_fonts(INPUT_IBOCR) -> [{'name': 'TimesNewRoman', 'type': 'Type1', 'encoding': 'PDFEncoding'}]
get_pdf_fonts(INPUT_IBOCR_RECORD) -> [{'name': 'TimesNewRoman', 'type': 'Type1', 'encoding': 'PDFEncoding'}]
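The result shown in the examples above is a list of dictionaries, so it can be post-processed with ordinary Python. The following is a minimal sketch, not part of the documented API: the helper name distinct_font_names is hypothetical, and it assumes every entry is a dictionary with an optional 'name' key, as in the example output. The sample list is copied from the example rather than obtained from a live call.

```python
from typing import Dict, List


def distinct_font_names(fonts: List[Dict[str, str]]) -> List[str]:
    """Return the unique font names, preserving first-seen order.

    Hypothetical helper: assumes each entry looks like the dictionaries
    shown in the get_pdf_fonts examples ('name', 'type', 'encoding').
    """
    seen = set()
    names = []
    for font in fonts:
        name = font.get('name')
        # Skip entries without a name and names already collected.
        if name is not None and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Sample value copied from the documented example, not a live call.
sample_fonts = [
    {'name': 'TimesNewRoman', 'type': 'Type1', 'encoding': 'PDFEncoding'},
]
print(distinct_font_names(sample_fonts))
```

In a real pipeline the sample list would be replaced by the return value of get_pdf_fonts(INPUT_IBOCR_RECORD).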
get_pdf_metadata
get_pdf_metadata(ibocr, field_name)
Gets the PDF metadata associated with the provided input.
NOTE: The flavour of the function that takes INPUT_IBOCR will be deprecated
after September 30th 2019. Please use INPUT_IBOCR_RECORD instead.
Args:
ibocr (Union[IBOCRRecordDict, IBOCRRecord]): Could be either a:
- Dictionary with info about one ibocr record
- The IBOCRRecord itself
field_name (string): PDF metadata field name to retrieve. Valid field names
are: title, author, subject, keywords_str, creator, producer,
creation_timestamp, modification_timestamp, trapped_str.
Timestamps are provided in seconds since epoch. See PDDocumentInformation
for information about what each field indicates.
Returns:
The PDF metadata value for the specified field
Examples:
get_pdf_metadata(INPUT_IBOCR, 'title') -> "title of the PDF"
get_pdf_metadata(INPUT_IBOCR_RECORD, 'title') -> "title of the PDF"
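Because the timestamp fields are given in seconds since epoch, they usually need converting before display. The sketch below is illustrative only: the helper format_metadata_value and the two constant sets are hypothetical names, the field list is copied from the Args description above, and it assumes the timestamp value is numeric (or a numeric string) and should be interpreted in UTC.

```python
from datetime import datetime, timezone

# Field names copied from the Args description of get_pdf_metadata.
VALID_FIELDS = {
    'title', 'author', 'subject', 'keywords_str', 'creator', 'producer',
    'creation_timestamp', 'modification_timestamp', 'trapped_str',
}
TIMESTAMP_FIELDS = {'creation_timestamp', 'modification_timestamp'}


def format_metadata_value(field_name, value):
    """Validate a field name and render timestamp fields as ISO 8601.

    Hypothetical helper: assumes timestamps are seconds since epoch
    and that UTC is the desired display time zone.
    """
    if field_name not in VALID_FIELDS:
        raise ValueError('Unknown PDF metadata field: %r' % field_name)
    if field_name in TIMESTAMP_FIELDS and value is not None:
        moment = datetime.fromtimestamp(float(value), tz=timezone.utc)
        return moment.isoformat()
    # Non-timestamp fields are passed through unchanged.
    return value


# Stand-in values instead of live get_pdf_metadata calls.
print(format_metadata_value('title', 'title of the PDF'))
print(format_metadata_value('creation_timestamp', 0))
```

In practice the second argument would be the result of get_pdf_metadata(INPUT_IBOCR_RECORD, field_name).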