Digitization support reference
These references outline supported file types, settings, and languages for document digitization.
Supported file types
These are the supported file types in Reader and in the Flow process files and fetch files steps.
File type | OCR Required | Spatial Layout | Process Attachments | Metadata Generation | Password-protected Attachment Support | Segmentation |
---|---|---|---|---|---|---|
Machine readable PDF | No | Yes | Fetch-files step only | Yes | Fetch-files step only | Reader app only |
Scanned PDF/Images* | Yes | Yes | Fetch-files step only | Yes | Fetch-files step only | Reader app only |
.xlsx | No | Yes | Yes | No | Process-files and fetch-files step only | Reader app only |
.docx | No | Yes | No | No | Process-files and fetch-files step only | Reader app only |
.rtf | No | No | No | No | No | Reader app only |
.pptx | No | No | No | No | No | Reader app only |
.eml | No | Yes | Yes | No | Process-files and fetch-files step only | Reader app only |
.msg | No | Yes | Yes | No | Process-files and fetch-files step only | Reader app only |
.html | No | Yes | N/A | No | N/A | Reader app only |
.mht | No | Yes | Yes | No | No | Reader app only |
.csv | No | N/A | N/A | N/A | N/A | Reader app only |
.txt | No | N/A | N/A | N/A | N/A | Reader app only |
* Images include .bmp, .gif (single frame), .ico, .jpeg, .jpg, .png, .tif and .tiff file types.
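The OCR Required column above can be turned into a simple pre-check. The following is a minimal sketch, not part of the product: the function name `ocr_required` and its return values are hypothetical, and matching extensions case-insensitively is an assumption. A `.pdf` extension alone cannot distinguish a machine readable PDF from a scanned one, so the sketch reports that case separately.

```python
from pathlib import Path

# Image extensions listed in the footnote above; the table marks these as requiring OCR.
IMAGE_EXTENSIONS = {".bmp", ".gif", ".ico", ".jpeg", ".jpg", ".png", ".tif", ".tiff"}

# File types the table marks as not requiring OCR.
NO_OCR_EXTENSIONS = {
    ".xlsx", ".docx", ".rtf", ".pptx", ".eml",
    ".msg", ".html", ".mht", ".csv", ".txt",
}


def ocr_required(filename: str) -> str:
    """Return "yes", "no", "depends", or "unsupported" for a file name.

    Hypothetical helper: extension matching is case-insensitive by assumption.
    """
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        return "yes"
    if ext in NO_OCR_EXTENSIONS:
        return "no"
    if ext == ".pdf":
        # Machine readable PDFs need no OCR, scanned PDFs do;
        # the extension alone does not say which one this is.
        return "depends"
    return "unsupported"


if __name__ == "__main__":
    for name in ("invoice.TIFF", "report.pdf", "mail.eml", "archive.zip"):
        print(name, "->", ocr_required(name))
```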
Supported settings by file type
These are the supported settings by file type in Reader and in the Flow process files step.
File type | Write Converted Image | Write Thumbnail | Correct Orientation | Correct Resolution | Find Lines** | Find Barcodes |
---|---|---|---|---|---|---|
Machine readable PDF | Yes | Yes | N/A | N/A | Yes | Yes |
Scanned PDF/Images* | Yes | Yes | Yes | Yes | Yes | Yes |
.xlsx | Yes | Yes | N/A | N/A | Yes | Yes |
.docx | Yes | Yes | N/A | N/A | Yes | Yes |
.pptx | No | No | N/A | N/A | No | No |
.eml | Yes | Yes | N/A | N/A | Yes | Yes |
.msg | No | No | N/A | N/A | N/A | N/A |
.html | Yes | Yes | N/A | N/A | Yes | Yes |
.mht | No | No | N/A | N/A | Yes | Yes |
.csv | No | No | N/A | N/A | N/A | N/A |
.txt | No | No | N/A | N/A | N/A | N/A |
* Images include .bmp, .gif (single frame), .ico, .jpeg, .jpg, .png, .tif and .tiff file types.
** Find-lines is supported through the force_image_ocr setting.
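To look up which settings apply to a given file type, the table above can be parsed into a dictionary. This is an illustrative sketch only: the helper names `parse_table` and `supported_settings` are hypothetical, the first column is treated as the file type, and the footnote markers are dropped from the labels for readability.

```python
# The settings table from this section, with footnote markers removed.
SETTINGS_TABLE = """\
File type | Write Converted Image | Write Thumbnail | Correct Orientation | Correct Resolution | Find Lines | Find Barcodes
Machine readable PDF | Yes | Yes | N/A | N/A | Yes | Yes
Scanned PDF/Images | Yes | Yes | Yes | Yes | Yes | Yes
.xlsx | Yes | Yes | N/A | N/A | Yes | Yes
.docx | Yes | Yes | N/A | N/A | Yes | Yes
.pptx | No | No | N/A | N/A | No | No
.eml | Yes | Yes | N/A | N/A | Yes | Yes
.msg | No | No | N/A | N/A | N/A | N/A
.html | Yes | Yes | N/A | N/A | Yes | Yes
.mht | No | No | N/A | N/A | Yes | Yes
.csv | No | No | N/A | N/A | N/A | N/A
.txt | No | No | N/A | N/A | N/A | N/A
"""


def parse_table(text: str) -> dict[str, dict[str, str]]:
    """Map each file type to a {setting name: "Yes" / "No" / "N/A"} dict."""
    rows = [[cell.strip() for cell in line.split("|")]
            for line in text.strip().splitlines()]
    header, *body = rows
    return {row[0]: dict(zip(header[1:], row[1:])) for row in body}


SUPPORT = parse_table(SETTINGS_TABLE)


def supported_settings(file_type: str) -> list[str]:
    """Return the settings marked "Yes" for a file type, in table order."""
    return [name for name, value in SUPPORT[file_type].items() if value == "Yes"]


if __name__ == "__main__":
    print(supported_settings(".mht"))
    print(supported_settings(".msg"))
```

Keeping the table as text and parsing it means the lookup stays in step with the reference if a row changes; only the string needs updating.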
Supported languages and language codes
These are the supported languages in Reader and in the Flow process files step.
Languages supported by Google Vision (Cloud)
Code | Language |
---|---|
af | Afrikaans |
sq | Albanian |
ar | Arabic |
hy | Armenian |
be | Belarusian |
bn | Bengali |
bg | Bulgarian |
ca | Catalan; Valencian |
zh | Chinese |
hr | Croatian |
cs | Czech |
da | Danish |
nl | Dutch |
en | English |
et | Estonian |
fil | Filipino |
fi | Finnish |
fr | French |
de | German |
el | Greek, Modern |
gu | Gujarati |
iw | Hebrew |
hi | Hindi |
hu | Hungarian |
is | Icelandic |
id | Indonesian |
it | Italian |
ja | Japanese |
kn | Kannada |
km | Khmer |
ko | Korean |
lo | Lao |
lv | Latvian |
lt | Lithuanian |
mk | Macedonian |
ms | Malay |
ml | Malayalam |
mr | Marathi (Marāṭhī) |
ne | Nepali |
no | Norwegian |
fa | Persian |
pl | Polish |
pt | Portuguese |
pa | Panjabi, Punjabi |
ro | Romanian |
ru | Russian |
ru-PETR1708 | Russian (old orthography) |
sr | Serbian |
sr-Latn | Serbian (Latin) |
sk | Slovak |
sl | Slovene |
es | Spanish; Castilian |
sv | Swedish |
ta | Tamil |
te | Telugu |
th | Thai |
tr | Turkish |
uk | Ukrainian |
vi | Vietnamese |
yi | Yiddish |
Languages supported by Microsoft OCR
Code | Language |
---|---|
ar | Arabic |
zh-Hans | Chinese (Simplified) |
zh-Hant | Chinese (Traditional) |
cs | Czech |
da | Danish |
nl | Dutch |
en | English |
fi | Finnish |
fr | French |
de | German |
el | Greek, Modern |
hu | Hungarian |
it | Italian |
ja | Japanese |
ko | Korean |
pl | Polish |
pt | Portuguese |
ro | Romanian |
ru | Russian |
sr-cyrl | Serbian (Cyrillic) |
sr-latn | Serbian (Latin) |
sk | Slovak |
es | Spanish; Castilian |
sv | Swedish |
tr | Turkish |
Languages supported by Tesseract
Code | Language |
---|---|
af | Afrikaans |
am | Amharic |
ar | Arabic |
as | Assamese |
az | Azerbaijani |
be | Belarusian |
bg | Bulgarian |
bn | Bengali |
br | Breton |
bs | Bosnian |
ca | Catalan; Valencian |
cs | Czech |
cy | Welsh |
da | Danish |
de | German |
dz | Dzongkha |
el | Greek, Modern |
en | English |
eo | Esperanto |
es | Spanish; Castilian |
et | Estonian |
fi | Finnish |
fr | French |
ga | Irish |
gl | Galician |
gu | Gujarati |
he | Hebrew (modern) |
hi | Hindi |
hr | Croatian |
ht | Haitian; Haitian Creole |
hu | Hungarian |
id | Indonesian |
is | Icelandic |
it | Italian |
iu | Inuktitut |
ja | Japanese |
jv | Javanese |
ka | Georgian |
kk | Kazakh |
km | Khmer |
kn | Kannada |
ko | Korean |
ku | Kurdish |
ky | Kirghiz, Kyrgyz |
la | Latin |
lb | Luxembourgish, Letzeburgesch |
lo | Lao |
lt | Lithuanian |
lv | Latvian |
mi | Māori |
mk | Macedonian |
ml | Malayalam |
mn | Mongolian |
mr | Marathi (Marāṭhī) |
ms | Malay |
mt | Maltese |
my | Burmese |
ne | Nepali |
nl | Dutch |
no | Norwegian |
oc | Occitan |
or | Oriya |
pa | Panjabi, Punjabi |
pl | Polish |
ps | Pashto, Pushto |
pt | Portuguese |
qu | Quechua |
ru | Russian |
sa | Sanskrit (Saṁskṛta) |
sd | Sindhi |
si | Sinhala, Sinhalese |
sk | Slovak |
sl | Slovene |
sr | Serbian |
su | Sundanese |
sv | Swedish |
sw | Swahili |
ta | Tamil |
te | Telugu |
tg | Tajik |
th | Thai |
ti | Tigrinya |
tl | Tagalog |
to | Tongan |
tr | Turkish |
tt | Tatar |
ug | Uighur, Uyghur |
uk | Ukrainian |
ur | Urdu |
uz | Uzbek |
vi | Vietnamese |
yi | Yiddish |
yo | Yoruba |
zh-Hans | Chinese (Simplified) |
zh-Hant | Chinese (Traditional) |
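Language codes are not interchangeable across engines: for example, Hebrew is `iw` in the Google Vision table and `he` in the Tesseract table, and Chinese is `zh` for Google Vision but `zh-Hans` or `zh-Hant` for Microsoft OCR and Tesseract. The sketch below checks a code against the three tables above. The helper name `engines_supporting` is hypothetical, and codes are compared exactly as written in the tables, since this reference does not say whether matching is case-sensitive.

```python
# Codes copied from the three language tables in this section.
LANGUAGE_CODES = {
    "Google Vision": set(
        "af sq ar hy be bn bg ca zh hr cs da nl en et fil fi fr de el gu iw "
        "hi hu is id it ja kn km ko lo lv lt mk ms ml mr ne no fa pl pt pa "
        "ro ru ru-PETR1708 sr sr-Latn sk sl es sv ta te th tr uk vi yi".split()
    ),
    "Microsoft OCR": set(
        "ar zh-Hans zh-Hant cs da nl en fi fr de el hu it ja ko pl pt ro ru "
        "sr-cyrl sr-latn sk es sv tr".split()
    ),
    "Tesseract": set(
        "af am ar as az be bg bn br bs ca cs cy da de dz el en eo es et fi "
        "fr ga gl gu he hi hr ht hu id is it iu ja jv ka kk km kn ko ku ky "
        "la lb lo lt lv mi mk ml mn mr ms mt my ne nl no oc or pa pl ps pt "
        "qu ru sa sd si sk sl sr su sv sw ta te tg th ti tl to tr tt ug uk "
        "ur uz vi yi yo zh-Hans zh-Hant".split()
    ),
}


def engines_supporting(code: str) -> list[str]:
    """Return the engines whose table lists this exact code."""
    return [engine for engine, codes in LANGUAGE_CODES.items() if code in codes]


if __name__ == "__main__":
    for code in ("en", "he", "iw", "zh", "zh-Hans"):
        print(code, "->", engines_supporting(code))
```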