Azure Content Understanding Classifier Support Confidence Scores?

Question

Cameron Kenny 0

We're testing the 2025-05-01-preview classifier, and it's working well, but there’s no confidence score in the classification/splitting output. Is this expected? And is there a roadmap to include per-category confidence like Document Intelligence has for field extraction?

Example output:

```json
{
  "id": "8a62e2b6-d021-4b6a-b767-7fdbf9c34227",
  "status": "Succeeded",
  "result": {
    "classifierId": "mortgage_classifier",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "2025-05-26T10:34:30.0071401Z",
    "contents": [
      {"category": "company_filing", "kind": "document", "startPageNumber": 1, "endPageNumber": 22},
      {"category": "employment_contract", "kind": "document", "startPageNumber": 23, "endPageNumber": 35},
      {"category": "company_filing", "kind": "document", "startPageNumber": 36, "endPageNumber": 36},
      {"category": "payslips", "kind": "document", "startPageNumber": 37, "endPageNumber": 37}
    ]
  }
}
```
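To make the gap concrete, here is a minimal sketch that flattens the `result.contents` array from the output above into page-range segments. It reads a `confidence` key with `.get()` so a missing score shows up as `None`; that key name is an assumption borrowed from the Document Intelligence classify output, since this preview response does not contain one. The helper name `summarize_segments` is hypothetical. The category count also shows how multiple instances of one document type in a single file appear as repeated categories.

```python
import json
from collections import Counter

# The sample response from the question (Content Understanding classifier,
# api-version 2025-05-01-preview), re-wrapped for readability.
RAW = '''{"id":"8a62e2b6-d021-4b6a-b767-7fdbf9c34227","status":"Succeeded",
"result":{"classifierId":"mortgage_classifier","apiVersion":"2025-05-01-preview",
"createdAt":"2025-05-26T10:34:30.0071401Z","contents":[
{"category":"company_filing","kind":"document","startPageNumber":1,"endPageNumber":22},
{"category":"employment_contract","kind":"document","startPageNumber":23,"endPageNumber":35},
{"category":"company_filing","kind":"document","startPageNumber":36,"endPageNumber":36},
{"category":"payslips","kind":"document","startPageNumber":37,"endPageNumber":37}]}}'''


def summarize_segments(response: dict) -> list[dict]:
    """Flatten result.contents into simple segment records.

    'confidence' is an assumed key name; this preview output omits it,
    so .get() returns None for every segment.
    """
    segments = []
    for item in response["result"]["contents"]:
        segments.append({
            "category": item["category"],
            "pages": (item["startPageNumber"], item["endPageNumber"]),
            "confidence": item.get("confidence"),  # None when absent
        })
    return segments


segments = summarize_segments(json.loads(RAW))
for seg in segments:
    print(seg["category"], seg["pages"], seg["confidence"])

# Repeated categories = multiple instances of the same type in one file.
print(Counter(seg["category"] for seg in segments))
```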

Ravada Shivaprasad 1,115 Reputation points Microsoft External Staff Moderator

2025-05-26T23:47:53.6333333+00:00

Hi Cameron Kenny,

The absence of confidence scores in the classification and splitting output of the 2025-05-01-preview classifier is expected behavior: confidence scores are typically associated with field extraction rather than classification. Azure AI Document Intelligence provides confidence scores for field extraction tasks, but those serve a different purpose and are not currently part of the classifier's output. The classifier is still designed for both single-document and multi-document classification. It can analyze single-file or multi-file input to determine whether it fits a predefined category, and it handles single-document classification, identification of multiple documents within a single file, and multiple instances of the same document type within one file (Microsoft documentation).

Currently, there is no explicit roadmap announcement about adding per-category confidence scores to the classifier. Users who want more insight into classification performance can consider several alternatives: regularly reviewing the latest Azure AI Document Intelligence release notes for updates, checking whether confidence scores are available in other API versions, and exploring custom models that provide confidence scores for extracted fields (Microsoft documentation).

Azure AI Document Intelligence continues to evolve, and the 2025-05-01-preview release introduces new capabilities for document classification and splitting. For the most up-to-date details on its features and improvements, please refer to the official Microsoft documentation.

Thanks
Ravada Shivaprasad 1,115 Reputation points Microsoft External Staff Moderator

2025-05-27T23:05:33.01+00:00

Hi Cameron Kenny

Just checking in to see whether the above answer helped. If it answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful?". If you have any further questions, let us know.

Thanks
Ravada Shivaprasad 1,115 Reputation points Microsoft External Staff Moderator

2025-05-28T20:05:35.3233333+00:00

Hi Cameron Kenny

Following up to see whether the above answer was helpful. If it answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful?". If you have any further questions, let us know.

Thanks

Answer 1

Hello Brikesh Kumar,

Thank you for posting your question in the Microsoft Q&A forum.

This is a critical observation, and the behavior you're describing points to a specific and important characteristic of the Azure Document Intelligence classifier when operating in splitMode="auto" on very large files.

The key is to understand that the process involves two distinct steps:

Splitting: First, the service must find the boundaries between documents within your large PDF. It analyzes pages to determine where one document ends and the next begins.
Classification: Second, for each identified document span (page range), it must classify it into one of your trained categories.

Your problem is likely occurring in Step 1 (Splitting), not Step 2 (Classification).

Why is confidence zero for many splits? When you submit a 750-page PDF, the splitMode="auto" algorithm has to find logical break points across the entire file. It looks for signals like:

Changes in layout, fonts, and formatting.
The presence of what look like document headers or footers.
Patterns that suggest a natural boundary.

For a file of that size, especially if it contains many similar-looking documents (like thousands of payslips or invoices from the same company), the splitting algorithm can become less confident about the exact boundaries.

When the splitting service has low confidence that it has found a true, distinct document, it still must return something. It will often return a page range, but it assigns a classification confidence of 0. This is a clear indicator that: "I found a block of pages, but I am not confident enough that it is a coherent document to even try classifying it against your trained models."

This is very different from the service being confident that the content is not one of your classes. It is saying that the foundational step, splitting, failed for that segment.

The "confidence": 0 is not a classification confidence problem per se; it's a splitting confidence problem. The service is telling you that it could not reliably isolate a discrete document from that specific page range in your massive file, so it doesn't trust the subsequent classification result enough to give it a score.
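Following that interpretation, a zero score is better treated as "segment not trusted" than as a real classification result. The sketch below separates such segments for review or reprocessing. It assumes the 2024-11-30 classify response exposes `analyzeResult.documents[]` with `docType`, `confidence`, and `boundingRegions[].pageNumber` (pass the `analyzeResult` object in); the `SAMPLE` values, the `0.5` threshold, and the function names are hypothetical and not taken from the thread.

```python
from typing import Optional

# Hypothetical sample shaped like a trimmed classify analyzeResult
# (api-version 2024-11-30); values are illustrative only.
SAMPLE = {
    "documents": [
        {"docType": "payslip", "confidence": 0.0,
         "boundingRegions": [{"pageNumber": 12}, {"pageNumber": 13}]},
        {"docType": "invoice", "confidence": 0.91,
         "boundingRegions": [{"pageNumber": 14}]},
    ]
}


def page_range(doc: dict) -> Optional[tuple[int, int]]:
    """First and last page a classified document spans, if reported."""
    pages = [region["pageNumber"] for region in doc.get("boundingRegions", [])]
    return (min(pages), max(pages)) if pages else None


def split_by_confidence(analyze_result: dict, threshold: float = 0.5):
    """Separate segments the service scored from those it did not trust.

    threshold is a hypothetical cut-off; a missing confidence counts as 0.
    """
    trusted, needs_review = [], []
    for doc in analyze_result.get("documents", []):
        confidence = doc.get("confidence", 0.0)
        record = {"docType": doc.get("docType"),
                  "pages": page_range(doc),
                  "confidence": confidence}
        (trusted if confidence >= threshold else needs_review).append(record)
    return trusted, needs_review


trusted, needs_review = split_by_confidence(SAMPLE)
print("trusted:", trusted)
print("needs review:", needs_review)
```

The page ranges in `needs_review` are the spans you would resubmit or inspect manually.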

Please let me know whether this response answers your question. If the above answer helped, please do not forget to "Accept Answer", as this may help other community members facing a similar issue. 🙂

Brikesh Kumar 0 Reputation points

2025-08-28T15:16:12.2233333+00:00

Hi @Suwarna S Kale

Thank you for the swift response; I appreciate it. Given what you explained, what alternative solution would you propose, or what solution have you seen working? I have a use case where a file will have 1,000 pages and I need to identify the documents in it.

Thanks,

Brikesh
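One possible workaround, not proposed in the thread, follows from the observation elsewhere in this discussion that roughly 100-page inputs do return confidence scores: classify the large file in fixed-size page windows and map the results back. The sketch below only plans the windows and merges results. It assumes each window is saved as its own sub-PDF and classified separately, and that each returned segment is reduced to hypothetical `category`, `start`, and `end` fields counted from page 1 of that sub-PDF. The window size of 100 and all function names are assumptions.

```python
def page_windows(total_pages: int, window: int = 100) -> list[tuple[int, int]]:
    """Inclusive 1-based (start, end) page ranges of at most `window` pages."""
    return [(start, min(start + window - 1, total_pages))
            for start in range(1, total_pages + 1, window)]


def merge_window_results(results: list[tuple[int, list[dict]]]) -> list[dict]:
    """Map per-window segments back to page numbers in the original file.

    Each item is (window_start, segments); every segment has 'category',
    'start' and 'end' local to that window's sub-PDF. Windows must be in order.
    """
    merged: list[dict] = []
    for window_start, segments in results:
        for seg in segments:
            start = seg["start"] + window_start - 1
            end = seg["end"] + window_start - 1
            prev = merged[-1] if merged else None
            # A document cut by a window boundary comes back as two pieces;
            # rejoin them when the category matches. This heuristic can also
            # wrongly join two separate same-type documents, so review it.
            if (prev is not None and prev["category"] == seg["category"]
                    and prev["end"] == window_start - 1 and start == window_start):
                prev["end"] = end
            else:
                merged.append({"category": seg["category"], "start": start, "end": end})
    return merged


print(page_windows(250))
# [(1, 100), (101, 200), (201, 250)]

merged = merge_window_results([
    (1, [{"category": "invoice", "start": 1, "end": 60},
         {"category": "payslip", "start": 61, "end": 100}]),
    (101, [{"category": "payslip", "start": 1, "end": 20},
           {"category": "contract", "start": 21, "end": 100}]),
])
print(merged)
```

Fixed windows keep each request close to the size that behaved well, at the cost of a boundary heuristic that needs checking.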

Answer 2

Hello Cameron Kenny,

Thank you for posting your question in the Microsoft Q&A forum.

The absence of confidence scores in the classification/splitting output of the 2025-05-01-preview classifier appears to be expected behavior, as confidence scores are typically provided for field extraction rather than classification. Currently, there is no explicit mention of per-category confidence scores for classification in the roadmap, but improvements to Document Intelligence are continuously being made.

To work around this, you may consider the following:

Reviewing the latest Azure AI Document Intelligence release notes for updates.
Checking if confidence scores are available in a different API version.
Exploring alternative methods, such as custom models, which provide confidence scores for extracted fields.

Some reference documentation that may help:

If the above answer helped, please do not forget to "Accept Answer" as this may help other community members to refer the info if facing a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated.

Brikesh Kumar 0 Reputation points

2025-08-27T21:00:14.5833333+00:00

Hello @Suwarna S Kale

I am using the custom classifier with split mode set to auto. The API version being used is 2024-11-30. When I give it a large file to find all the trained labels, I don't get a confidence level for each of the documents identified. I gave it a 750-page PDF file and it identified some 200 documents in it, but all of them show a confidence of 0.

The same does not happen when I give it a smaller document, such as 100 pages. It comes back with the labels it finds and also gives the confidence score.

Because no confidence is returned, my classification fails to assign those sections of the file to any of the known types the classifier is trained on.

So it doesn't seem to be related to the API version. Can you please check and let me know what could be wrong?

Thanks

Brikesh
Brikesh Kumar 0 Reputation points

2025-08-28T15:05:06.8+00:00

Hi @Suwarna S Kale

Thank you for the swift response; I appreciate it. Given what you explained, what alternative solution would you propose, or what solution have you seen working? I have a use case where a file will have 1,000 pages and I need to identify the documents in it.

Thanks,

Brikesh
