Document processing models

This content applies to: v4.0 (GA) | Previous versions: v3.1 (GA), v3.0 (GA), v2.1 (GA)

Azure AI Document Intelligence supports various models that you can use to add intelligent document processing to your apps and flows. You can use a prebuilt domain-specific model or train a custom model tailored to your specific business needs and use cases. You can use Document Intelligence with the REST API or Python, C#, Java, and JavaScript client libraries.

Note

Document processing projects that involve financial data, protected health data, personal data, or highly sensitive data require careful attention. Be sure to comply with all national/regional and industry-specific requirements.

Model overview

The following table shows the generally available (GA) models for each stable API.

| Model type | Model | 2024-11-30 (GA) | 2023-07-31 (GA) | 2022-08-31 (GA) | v2.1 (GA) |
| --- | --- | --- | --- | --- | --- |
| Document analysis models | Read | ✔️ | ✔️ | ✔️ | Not available |
| Document analysis models | Layout | ✔️ | ✔️ | ✔️ | ✔️ |
| Document analysis models | General document** | Supported in layout model | ✔️ | ✔️ | Not available |
| Prebuilt models | Bank check | ✔️ | Not available | Not available | Not available |
| Prebuilt models | Bank statement | ✔️ | Not available | Not available | Not available |
| Prebuilt models | payStub | ✔️ | Not available | Not available | Not available |
| Prebuilt models | Contract | ✔️ | ✔️ | Not available | Not available |
| Prebuilt models | Health insurance card | ✔️ | ✔️ | ✔️ | Not available |
| Prebuilt models | ID document | ✔️ | ✔️ | ✔️ | ✔️ |
| Prebuilt models | Invoice | ✔️ | ✔️ | ✔️ | ✔️ |
| Prebuilt models | Receipt | ✔️ | ✔️ | ✔️ | ✔️ |
| Prebuilt models | US unified tax* | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US 1040 tax* | ✔️ | ✔️ | Not available | Not available |
| Prebuilt models | US 1095 tax* | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US 1098 tax* | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US 1099 tax* | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US W2 tax | ✔️ | ✔️ | ✔️ | Not available |
| Prebuilt models | US W4 tax | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US mortgage 1003 URLA | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US mortgage 1004 URAR | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US mortgage 1005 | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US mortgage 1008 summary | ✔️ | Not available | Not available | Not available |
| Prebuilt models | US mortgage closing disclosure | ✔️ | Not available | Not available | Not available |
| Prebuilt models | Marriage certificate | ✔️ | Not available | Not available | Not available |
| Prebuilt models | Credit card | ✔️ | Not available | Not available | Not available |
| Prebuilt models | Business card | Deprecated | ✔️ | ✔️ | ✔️ |
| Custom classification model | Custom classifier | ✔️ | ✔️ | Not available | Not available |
| Custom extraction model | Custom neural | ✔️ | ✔️ | ✔️ | Not available |
| Custom extraction model | Custom template | ✔️ | ✔️ | ✔️ | ✔️ |
| Custom extraction model | Custom composed | ✔️ | ✔️ | ✔️ | ✔️ |
| All models | Add-on capabilities | ✔️ | ✔️ | Not available | Not available |

* Contains submodels. See the model-specific information for supported variations and subtypes.
** All the capabilities for the general document model are available in the layout model. The general model is no longer supported.

Latency

Latency is the amount of time it takes for an API server to handle and process an incoming request and deliver the outgoing response to the client. The time to analyze a document depends on its size (for example, the number of pages) and the content on each page. Document Intelligence is a multitenant asynchronous service, so latency for similar documents is comparable but not always identical. Occasional variability in latency and performance is inherent in any microservice-based, stateless service that processes images and large documents at scale. Although we continuously scale up hardware, capacity, and scaling capabilities, you might still experience latency issues at runtime.
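Because analysis is asynchronous and latency varies, a client usually polls for the result instead of assuming a fixed completion time. The following Python sketch polls with capped exponential backoff. It assumes the operation body carries a `status` field that ends as `succeeded` or `failed`; `fetch_status` is a hypothetical callable standing in for the HTTP GET on the operation URL, and the delay and timeout values are illustrative, not service recommendations.

```python
import time
from typing import Any, Callable, Dict

# Terminal states assumed from the v3.0+ REST API's operation status values.
TERMINAL_OK = "succeeded"
TERMINAL_FAILED = "failed"


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll an asynchronous analyze operation until it succeeds or fails.

    fetch_status is a hypothetical callable that performs one GET on the
    operation URL and returns the parsed JSON body.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == TERMINAL_OK:
            return body
        if status == TERMINAL_FAILED:
            raise RuntimeError(f"Analysis failed: {body.get('error')}")
        if waited >= timeout:
            raise TimeoutError(f"Gave up after {waited:.1f}s (last status: {status})")
        sleep(delay)
        waited += delay
        # Exponential backoff, capped, so large documents don't cause a flood of status requests.
        delay = min(delay * 2, max_delay)
```

Injecting `sleep` keeps the function testable without real waiting; in production you'd leave the default `time.sleep`.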

Add-on capability

For all models except the business card model, Document Intelligence supports optional add-on capabilities that allow for more sophisticated analysis. You can enable or disable these capabilities depending on your document extraction scenario. The following add-on capabilities are available in the 2023-07-31 (GA) and later API versions:

| Add-on capability | Add-on/Free | 2024-11-30 (GA) | 2023-07-31 (GA) | 2022-08-31 (GA) | v2.1 (GA) |
| --- | --- | --- | --- | --- | --- |
| Font property extraction | Add-on | ✔️ | ✔️ | Not available | Not available |
| Formula extraction | Add-on | ✔️ | ✔️ | Not available | Not available |
| High-resolution extraction | Add-on | ✔️ | ✔️ | Not available | Not available |
| Barcode extraction | Free | ✔️ | ✔️ | Not available | Not available |
| Language detection | Free | ✔️ | ✔️ | Not available | Not available |
| Key/value pairs | Free | ✔️ | Not available | Not available | Not available |
| Query fields | Add-on* | ✔️ | Not available | Not available | Not available |
| Searchable PDF | Add-on* | ✔️ | Not available | Not available | Not available |
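Add-on capabilities are requested per analyze call. The following Python sketch builds an analyze request URL with a `features` query parameter and checks the request against the table above. The feature flag names (`styleFont`, `formulas`, `ocrHighResolution`, `barcodes`, `languages`, `keyValuePairs`, `queryFields`), the `queryFields` parameter, the URL path prefixes, and the `prebuilt-businessCard` model ID are assumptions to verify against the REST API reference for your API version. Searchable PDF output isn't covered.

```python
from typing import Iterable
from urllib.parse import urlencode

# Assumed `features` flag names, mapped to API versions per the add-on table above.
_BASE_FEATURES = {"styleFont", "formulas", "ocrHighResolution", "barcodes", "languages"}
FEATURES_BY_VERSION = {
    "2023-07-31": _BASE_FEATURES,
    "2024-11-30": _BASE_FEATURES | {"keyValuePairs", "queryFields"},
}
# Assumed URL path prefixes for each API version.
PATH_PREFIX = {"2023-07-31": "formrecognizer", "2024-11-30": "documentintelligence"}
BUSINESS_CARD_MODEL = "prebuilt-businessCard"  # assumed model ID


def build_analyze_url(
    endpoint: str,
    model_id: str,
    api_version: str,
    features: Iterable[str] = (),
    query_fields: Iterable[str] = (),
) -> str:
    """Build an analyze URL with optional add-on features (hypothetical helper)."""
    if api_version not in FEATURES_BY_VERSION:
        raise ValueError(f"Add-on capabilities need 2023-07-31 or later, not {api_version}")
    requested = list(features)
    fields = list(query_fields)
    if fields and "queryFields" not in requested:
        requested.append("queryFields")  # assumed: query fields also need the feature flag
    if requested and model_id == BUSINESS_CARD_MODEL:
        raise ValueError("The business card model doesn't support add-on capabilities")
    unsupported = sorted(set(requested) - FEATURES_BY_VERSION[api_version])
    if unsupported:
        raise ValueError(f"Not available in {api_version}: {unsupported}")

    params = {"api-version": api_version}
    if requested:
        params["features"] = ",".join(requested)
    if fields:
        params["queryFields"] = ",".join(fields)
    path = f"/{PATH_PREFIX[api_version]}/documentModels/{model_id}:analyze"
    # safe="," keeps the comma-separated lists readable instead of encoding them as %2C.
    return endpoint.rstrip("/") + path + "?" + urlencode(params, safe=",")
```

Failing early on an unsupported combination gives a clearer error than a rejected request, and it keeps the business card exception in one place.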

Model analysis features

Model ID Content extraction Query fields Paragraphs Paragraph roles Selection marks Tables Key/value pairs Languages Barcodes Document analysis Formulas* Style font* High resolution* Searchable PDF
prebuilt-read O O O O O O
prebuilt-layout O O O O O O
prebuilt-contract O O O O
prebuilt-healthInsuranceCard.us O O O O O
prebuilt-idDocument O O O O O
prebuilt-invoice O O O O O O
prebuilt-receipt O O O O O
prebuilt-marriageCertificate.us O O O O O
prebuilt-creditCard O O O O O
prebuilt-check.us O O O O O
prebuilt-payStub.us O O O O O
prebuilt-bankStatement O O O O O
prebuilt-mortgage.us.1003 O O O O O
prebuilt-mortgage.us.1004 O O O O O
prebuilt-mortgage.us.1005 O O O O O
prebuilt-mortgage.us.1008 O O O O O
prebuilt-mortgage.us.closingDisclosure O O O O O
prebuilt-tax.us O O O O O
prebuilt-tax.us.w2 O O O O O
prebuilt-tax.us.w4 O O O O O
prebuilt-tax.us.1040 (various) O O O O O
prebuilt-tax.us.1095A O O O O O
prebuilt-tax.us.1095C O O O O O
prebuilt-tax.us.1098 O O O O O
prebuilt-tax.us.1098E O O O O O
prebuilt-tax.us.1098T O O O O O
prebuilt-tax.us.1099 (various) O O O O O
prebuilt-tax.us.1099SSA O O O O O
{ customModelName } O O O O O

✓ - Enabled
O - Optional
* - Premium features incur extra costs

Query fields are priced differently from the other add-on features. For more information, see Pricing.

Bounding box and polygon coordinates

A bounding box (a polygon in v3.0 and later versions) is an abstract rectangle that surrounds text elements in a document. It's used as a reference point for object detection:

  • The bounding box specifies position by using an x and y coordinate plane presented in an array of four numerical pairs. Each pair represents a corner of the box in the following order: upper left, upper right, lower right, lower left.
  • Image coordinates are presented in pixels. For a PDF, coordinates are presented in inches.
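The fixed corner order makes polygon coordinates easy to work with in code. The following Python sketch assumes the polygon arrives as a flat list of eight numbers (x1, y1, ..., x4, y4), as in the v3.0 and later JSON output. The helper names are hypothetical, and the DPI you pass when converting PDF coordinates is your own rendering choice, not a value the service returns.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_points(polygon: Sequence[float]) -> List[Point]:
    """Split a flat [x1, y1, ..., x4, y4] list into four (x, y) corners.

    Corner order: upper left, upper right, lower right, lower left.
    """
    if len(polygon) != 8:
        raise ValueError("expected four (x, y) pairs, that is, eight numbers")
    return [(polygon[i], polygon[i + 1]) for i in range(0, 8, 2)]


def enclosing_rect(polygon: Sequence[float]) -> Tuple[float, float, float, float]:
    """Axis-aligned rectangle (left, top, right, bottom) that contains the polygon.

    For rotated or skewed text, this rectangle is larger than the polygon itself.
    """
    xs, ys = zip(*polygon_points(polygon))
    return min(xs), min(ys), max(xs), max(ys)


def inches_to_pixels(polygon: Sequence[float], dpi: float) -> List[float]:
    """Convert PDF coordinates (inches) to pixels at a chosen rendering resolution."""
    return [value * dpi for value in polygon]
```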

Language support

The universal models in Document Intelligence that are based on deep learning support many languages. The models can extract multilingual text from your images and documents, including text lines with mixed languages. Language support varies by Document Intelligence service functionality. For a complete list, see the following articles:

Regional availability

Document Intelligence is generally available in many of the 60+ Azure global infrastructure regions.

To help choose the region that's best for you and your customers, see Azure geographies.

Model details

This section describes the output that you can expect from each model. You can extend the output of most models with add-on features.

Read OCR

The Read API uses optical character recognition (OCR) to analyze and extract lines and words, their locations, detected languages, and handwriting style, if detected.

This sample document was processed by using Document Intelligence Studio.

Screenshot that shows a sample document processed by using Document Intelligence Studio Read.
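To show what the Read output contains, the following Python sketch walks a parsed analyze result and collects text lines, detected locales, and whether any handwriting was reported. It assumes the v3.0 and later JSON shape (`pages[].lines[].content`, `languages[].locale`, `styles[].isHandwritten`); because language detection is an add-on in 2023-07-31 and later, `languages` might be absent unless you request it. The function name is hypothetical.

```python
from typing import Any, Dict, List


def summarize_read(analyze_result: Dict[str, Any]) -> Dict[str, Any]:
    """Collect text lines, detected locales, and a handwriting flag from a Read result."""
    lines: List[str] = [
        line["content"]
        for page in analyze_result.get("pages", [])
        for line in page.get("lines", [])
    ]
    # Assumed: detected languages are listed under "languages", each with a "locale".
    locales = sorted({lang["locale"] for lang in analyze_result.get("languages", [])})
    # Assumed: handwriting is reported as style entries with "isHandwritten": true.
    handwritten = any(
        style.get("isHandwritten", False) for style in analyze_result.get("styles", [])
    )
    return {"lines": lines, "locales": locales, "has_handwriting": handwritten}
```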

Layout analysis

The layout analysis model analyzes and extracts text, tables, selection marks, and other structure elements like titles, section headings, page headers, and page footers.

This sample document was processed by using Document Intelligence Studio.

Screenshot that shows a sample newspaper page processed by using Document Intelligence Studio.
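Layout returns tables as a list of cells rather than as a ready-made grid. The following Python sketch rebuilds a row-major grid of cell text. It assumes each table object has `rowCount`, `columnCount`, and `cells`, where each cell carries `rowIndex`, `columnIndex`, `content`, and optionally `rowSpan` and `columnSpan` (treated as 1 when missing). Copying a spanning cell's text into every slot it covers is a design choice that keeps rows the same length.

```python
from typing import Any, Dict, List


def table_to_grid(table: Dict[str, Any]) -> List[List[str]]:
    """Rebuild a row-major grid of cell text from a layout table object."""
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        row, col = cell["rowIndex"], cell["columnIndex"]
        # Cells that span several rows or columns fill every slot they cover.
        for dr in range(cell.get("rowSpan", 1)):
            for dc in range(cell.get("columnSpan", 1)):
                grid[row + dr][col + dc] = cell["content"]
    return grid
```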

Health insurance card

The health insurance card model combines powerful OCR capabilities with deep learning models to analyze and extract key information from US health insurance cards.

This sample US health insurance card was processed by using Document Intelligence Studio.

Screenshot that shows a sample US health insurance card analysis in Document Intelligence Studio.

US tax documents

The US tax document models analyze and extract key fields and line items from a select group of tax documents. The API supports the analysis of English-language US tax documents of various formats and quality, including phone-captured images, scanned documents, and digital PDFs. The following models are currently supported:

| Model | Description | Model ID |
| --- | --- | --- |
| US tax W-2 | Extract taxable compensation details. | prebuilt-tax.us.w2 |
| US tax W-4 | Extract employee withholding certificate details. | prebuilt-tax.us.w4 |
| US tax 1040 | Extract individual income tax return details. | prebuilt-tax.us.1040 (variations) |
| US tax 1095 | Extract health insurance details. | prebuilt-tax.us.1095 (variations) |
| US tax 1098 | Extract mortgage interest details. | prebuilt-tax.us.1098 (variations) |
| US tax 1099 | Extract income received from sources other than an employer. | prebuilt-tax.us.1099 (variations) |

This sample W-2 document was processed by using Document Intelligence Studio.

Screenshot that shows a sample W-2 document.

US mortgage documents

The US mortgage document models analyze and extract key fields that include borrower, loan, and property information from a select group of mortgage documents. The API supports the analysis of English-language US mortgage documents of various formats and quality, including phone-captured images, scanned documents, and digital PDFs. The following models are currently supported.

| Model | Description | Model ID |
| --- | --- | --- |
| 1003 Uniform Residential Loan Application (URLA) | Extract loan, borrower, and property details. | prebuilt-mortgage.us.1003 |
| 1004 Uniform Residential Appraisal Report (URAR) | Extract loan, borrower, and property details. | prebuilt-mortgage.us.1004 |
| 1005 Verification of employment | Extract loan, borrower, and property details. | prebuilt-mortgage.us.1005 |
| 1008 Summary document | Extract borrower, seller, property, mortgage, and underwriting details. | prebuilt-mortgage.us.1008 |
| Closing Disclosure | Extract closing, transaction cost, and loan details. | prebuilt-mortgage.us.closingDisclosure |

This sample Closing Disclosure document was processed by using Document Intelligence Studio.

Screenshot that shows a sample closing disclosure.

Contract

The contract model analyzes and extracts key fields and line items from contractual agreements, including parties, jurisdictions, contract ID, and title. The model currently supports English-language contract documents.

This sample contract was processed by using Document Intelligence Studio.

Screenshot that shows contract model extraction using Document Intelligence Studio.

US bank check

The bank check model analyzes and extracts key fields from US bank checks, including check details, account details, amount, and memo.

This bank check sample was processed by using Document Intelligence Studio.

Screenshot that shows bank check model extraction by using Document Intelligence Studio.

US bank statement

The bank statement model analyzes and extracts key fields and line items from US bank statements, including account number, bank details, statement details, and transaction details.

This sample bank statement was processed by using Document Intelligence Studio.

Screenshot that shows bank statement model extraction by using Document Intelligence Studio.

payStub

The payStub model analyzes and extracts key fields and line items from documents and files with payroll-related information.

This sample pay stub was processed by using Document Intelligence Studio.

Screenshot that shows payStub model extraction by using Document Intelligence Studio.

Invoice

The invoice model automates the processing of invoices to extract the customer name, billing address, due date, amount due, line items, and other key data.

This sample invoice was processed by using Document Intelligence Studio.

Screenshot that shows a sample invoice.
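Prebuilt models such as invoice return each extracted value with a confidence score, so a common pattern is to accept high-confidence fields automatically and route the rest to a person. The following Python sketch assumes the v3.0 and later JSON shape (`documents[].fields`, each field with `content` and `confidence`). The 0.8 threshold is an arbitrary example, and the field names and values in the test data are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def split_by_confidence(
    analyze_result: Dict[str, Any], min_confidence: float = 0.8
) -> Tuple[Dict[str, str], List[str]]:
    """Accept field text at or above min_confidence; list the rest for manual review."""
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for document in analyze_result.get("documents", []):
        for name, field in document.get("fields", {}).items():
            if field.get("confidence", 0.0) >= min_confidence and "content" in field:
                accepted[name] = field["content"]
            else:
                needs_review.append(name)
    return accepted, needs_review
```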

Receipt

Use the receipt model to scan sales receipts for the merchant name, dates, line items, quantities, and totals from printed and handwritten receipts. Version v3.0 also supports single-page hotel receipt processing.

This sample receipt was processed by using Document Intelligence Studio.

Screenshot that shows a sample receipt.

Identity document

Use the identity document (ID) model to process US driver's licenses (all 50 states and District of Columbia) and biographical pages from international passports (excluding visa and other travel documents) to extract key fields.

This sample US driver's license was processed by using Document Intelligence Studio.

Screenshot that shows a sample identification card.

Marriage certificate

Use the marriage certificate model to process US marriage certificates to extract key fields, including the individuals, date, and location.

This sample US marriage certificate was processed by using Document Intelligence Studio.

Screenshot that shows a sample marriage certificate.

Credit card

Use the credit card model to process credit and debit cards to extract key fields.

This sample credit card was processed by using Document Intelligence Studio.

Screenshot that shows a sample credit card.

Custom models

Custom models fall into two broad types: custom classification models, which classify a document by "document type," and custom extraction models, which extract a defined schema from a specific document type.

Diagram that shows types of custom models and associated model build modes.

Custom document models analyze and extract data from forms and documents specific to your business. They recognize form fields within your distinct content and extract key/value pairs and table data. You need only one example of the form type to get started.

Version v3.0 and later custom models support signature detection in custom template (form) and cross-page tables in both template and neural models. Signature detection looks for the presence of a signature, not the identity of the person who signs the document. If the model returns unsigned for signature detection, the model didn't find a signature in the defined field.

This sample custom template was processed by using Document Intelligence Studio.

Screenshot that shows Document Intelligence analyzing a custom form.
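Because signature detection reports only presence, a typical check is to flag signature fields that came back unsigned. The following Python sketch assumes signature fields appear in `documents[].fields` with `"type": "signature"` and a `valueSignature` of `signed` or `unsigned`; verify that shape against the JSON your API version returns. The function name and sample field names are hypothetical.

```python
from typing import Any, Dict, List


def unsigned_signature_fields(analyze_result: Dict[str, Any]) -> List[str]:
    """Names of signature fields where no signature was detected.

    Detection reports presence only; it never identifies who signed.
    """
    missing: List[str] = []
    for document in analyze_result.get("documents", []):
        for name, field in document.get("fields", {}).items():
            # Assumed shape: {"type": "signature", "valueSignature": "signed" | "unsigned"}
            if field.get("type") == "signature" and field.get("valueSignature") != "signed":
                missing.append(name)
    return missing
```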

Custom extraction

The custom extraction model comes in two types: custom template and custom neural. To create a custom extraction model, label a dataset of documents with the values you want extracted and train the model on the labeled dataset. You need only five examples of the same form or document type to get started.

This sample custom extraction was processed by using Document Intelligence Studio.

Screenshot that shows custom extraction model analysis in Document Intelligence Studio.
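Before you train, you can check a labeled dataset against the minimum example count above and the training limits listed under input requirements (500 pages and 50 MB for template models; 50,000 pages and 1 GB for neural models). The following Python sketch is a local pre-check only; the `LabeledDocument` type is hypothetical, and treating 1 MB as 2^20 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence

MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes
GB = 1024 * MB

# Training limits taken from the input requirements in this article.
LIMITS = {
    "template": {"max_pages": 500, "max_bytes": 50 * MB},
    "neural": {"max_pages": 50_000, "max_bytes": 1 * GB},
}
MIN_LABELED_DOCS = 5


@dataclass
class LabeledDocument:
    name: str
    pages: int
    size_bytes: int


def check_extraction_training_set(docs: Sequence[LabeledDocument], build_mode: str) -> List[str]:
    """Return human-readable problems; an empty list means the set passes these checks."""
    limits = LIMITS[build_mode]
    problems: List[str] = []
    if len(docs) < MIN_LABELED_DOCS:
        problems.append(f"need at least {MIN_LABELED_DOCS} labeled documents, have {len(docs)}")
    total_pages = sum(doc.pages for doc in docs)
    if total_pages > limits["max_pages"]:
        problems.append(f"{total_pages} pages exceeds the {build_mode} limit of {limits['max_pages']}")
    total_bytes = sum(doc.size_bytes for doc in docs)
    if total_bytes > limits["max_bytes"]:
        problems.append(f"{total_bytes} bytes exceeds the {build_mode} size limit")
    return problems
```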

Custom classifier

With the custom classification model, you can identify the document type before you invoke the extraction model. The classification model is available starting with the 2023-07-31 (GA) API. Training a custom classification model requires at least two distinct classes and a minimum of five samples per class.
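The classifier's minimums are easy to verify before you submit a training job. The following Python sketch checks a mapping of class label to sample count against the two rules above (at least two classes, at least five samples per class). The function name and input shape are hypothetical.

```python
from typing import Dict, List

MIN_CLASSES = 2
MIN_SAMPLES_PER_CLASS = 5


def check_classifier_training_set(samples_per_class: Dict[str, int]) -> List[str]:
    """Check class and sample counts for a custom classifier training set."""
    problems: List[str] = []
    if len(samples_per_class) < MIN_CLASSES:
        problems.append(f"need at least {MIN_CLASSES} classes, have {len(samples_per_class)}")
    for label, count in sorted(samples_per_class.items()):
        if count < MIN_SAMPLES_PER_CLASS:
            problems.append(f"class '{label}' has {count} samples; need {MIN_SAMPLES_PER_CLASS}")
    return problems
```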

Composed models

A composed model is created by taking a collection of custom models and assigning them to a single model built from your form types. You can assign multiple custom models to a composed model and call them with a single model ID. You can assign up to 200 trained custom models to a single composed model.

This sample composed model is shown in Document Intelligence Studio.

Screenshot that shows the Document Intelligence Studio Compose custom model pane.
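If you assemble the list of component models programmatically, you can enforce the 200-model limit before you call the compose operation. The following Python sketch is hypothetical; dropping repeated model IDs is an assumption (listing the same model twice adds nothing), not a documented service rule.

```python
from typing import Iterable, List

MAX_COMPONENT_MODELS = 200  # limit stated above for composed models


def plan_composed_model(component_model_ids: Iterable[str]) -> List[str]:
    """Return the de-duplicated component list, or raise if it can't be composed."""
    seen = set()
    components: List[str] = []
    for model_id in component_model_ids:
        if model_id not in seen:  # assumption: a repeated model ID is dropped
            seen.add(model_id)
            components.append(model_id)
    if not components:
        raise ValueError("a composed model needs at least one custom model")
    if len(components) > MAX_COMPONENT_MODELS:
        raise ValueError(f"{len(components)} models exceeds the limit of {MAX_COMPONENT_MODELS}")
    return components
```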

Input requirements

The following file formats are supported.

Model PDF Image:
JPEG/JPG, PNG, BMP, TIFF, HEIF
Office:
Word (DOCX), Excel (XLSX), PowerPoint (PPTX), HTML
Read
Layout
General document
Prebuilt
Custom extraction
Custom classification
  • Photos and scans: For best results, provide one clear photo or high-quality scan per document.
  • PDFs and TIFFs: For PDFs and TIFFs, up to 2,000 pages can be processed. (With a free-tier subscription, only the first two pages are processed.)
  • File size: The maximum file size for analyzing documents is 500 MB for the paid (S0) tier and 4 MB for the free (F0) tier.
  • Image dimensions: The dimensions must be between 50 pixels x 50 pixels and 10,000 pixels x 10,000 pixels.
  • Password locks: If your PDFs are password-locked, you must remove the lock before submission.
  • Text height: The minimum height of the text to be extracted is 12 pixels for a 1024 x 768-pixel image. This dimension corresponds to about 8-point text at 150 dots per inch.
  • Custom model training: The maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
  • Custom extraction model training: The total size of training data is 50 MB for the template model and 1 GB for the neural model.
  • Custom classification model training: The total size of training data is 1 GB with a maximum of 10,000 pages. For 2024-11-30 (GA), the total size of training data is 2 GB with a maximum of 10,000 pages.
  • Office file types (DOCX, XLSX, PPTX): The maximum string length limit is 8 million characters.
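Several of these requirements can be checked locally before upload. The following Python sketch checks file size by tier, image dimensions, and password locks, and reports how many PDF or TIFF pages would be processed (2 on the free tier, up to 2,000 otherwise). The function name is hypothetical, and treating 1 MB as 2^20 bytes is an assumption; the service remains the final authority on what it accepts.

```python
from typing import List, Optional, Tuple

MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes
MAX_FILE_BYTES = {"S0": 500 * MB, "F0": 4 * MB}
PAGE_CAP = {"S0": 2000, "F0": 2}  # pages processed per PDF/TIFF
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000


def check_input(
    size_bytes: int,
    tier: str,
    pages: int = 1,
    width_px: Optional[int] = None,
    height_px: Optional[int] = None,
    password_locked: bool = False,
) -> Tuple[List[str], int]:
    """Return (problems, pages that would be processed) for one input file."""
    problems: List[str] = []
    if size_bytes > MAX_FILE_BYTES[tier]:
        problems.append(f"file is larger than the {tier} limit of {MAX_FILE_BYTES[tier] // MB} MB")
    if width_px is not None and height_px is not None:
        for side in (width_px, height_px):
            if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
                problems.append(f"image side of {side} px is outside {MIN_SIDE_PX}-{MAX_SIDE_PX} px")
    if password_locked:
        problems.append("remove the PDF password lock before submitting")
    return problems, min(pages, PAGE_CAP[tier])
```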

Note

The Sample Labeling tool doesn't support the BMP file format. The limitation comes from the tool, not from the Document Intelligence service.

Version migration

Learn how to use Document Intelligence v3.0 in your applications by following the steps in the Document Intelligence v3.1 migration guide.

This content applies to: v2.1 | Latest version: v4.0 (GA)

| Model type | Model | Description |
| --- | --- | --- |
| Document analysis | Layout | Extract text and layout information from documents. |
| Prebuilt | Invoice | Extract key information from English-language and Spanish-language invoices. |
| Prebuilt | Receipt | Extract key information from English-language receipts. |
| Prebuilt | ID document | Extract key information from US driver's licenses and international passports. |
| Prebuilt | Business card | Extract key information from English-language business cards. |
| Custom | Custom | Extract data from forms and documents specific to your business. Custom models are trained for your distinct data and use cases. |
| Custom | Composed | Compose a collection of custom models and assign them to a single model built from your form types. |

Layout

The Layout API analyzes and extracts text, tables and headers, selection marks, and structure information from documents.

This sample document was processed by using the Sample Labeling tool.

Screenshot that shows layout analysis by using the Sample Labeling tool.

Invoice

The invoice model analyzes and extracts key information from sales invoices. The API analyzes invoices in various formats and extracts key information such as customer name, billing address, due date, and amount due.

This sample invoice was processed by using the Sample Labeling tool.

Screenshot that shows a sample invoice analysis by using the Sample Labeling tool.

Receipt

The receipt model analyzes and extracts key information from printed and handwritten sales receipts.

This sample receipt was processed by using the Sample Labeling tool.

Screenshot that shows a sample receipt.

ID document

The ID document model analyzes and extracts key information from the following documents:

  • US driver's licenses (all 50 states and District of Columbia)
  • Biographical pages from international passports (excluding visa and other travel documents)

This sample US driver's license was processed by using the Sample Labeling tool.

Screenshot that shows a sample identification card.

Business card

The business card model analyzes and extracts key information from business card images.

This sample business card was processed by using the Sample Labeling tool.

Screenshot that shows a sample business card.

Custom

Custom models analyze and extract data from forms and documents specific to your business. The API is a machine-learning program trained to recognize form fields within your distinct content and extract key/value pairs and table data. You need only five examples of the same form type to get started. You can train your custom model with or without labeled datasets.

This sample custom model was processed by using the Sample Labeling tool.

Screenshot that shows the Document Intelligence tool analyzing a custom form pane.

Composed custom model

A composed model is created by taking a collection of custom models and assigning them to a single model built from your form types. You can assign multiple custom models to a composed model and call them with a single model ID. You can assign up to 100 trained custom models to a single composed model.

This composed model pane is shown in the Sample Labeling tool.

Screenshot that shows the Document Intelligence Studio Compose custom model pane.

Model data extraction

Model Text extraction Language detection Selection marks Tables Paragraphs Paragraph roles Key/value pairs Fields
Layout
Invoice
Receipt
ID Document
Business Card
Custom Form
