Custom Model Not Extracting Correct Data from Scorecard PDFs

Vaibhav 0 Reputation points
2025-08-25T19:39:13.9666667+00:00

While training a custom model for scorecard PDFs, I get inconsistent output. Despite correctly labeled training data, the model often fails to pick up values, sometimes extracts no data, and sometimes retrieves incorrect data. The issue persists across multiple documents, and I need guidance on how to fix it.

Azure AI Document Intelligence

1 answer

  1. Vinodh247 37,216 Reputation points MVP Volunteer Moderator
    2025-08-26T01:26:16.5333333+00:00

    If your Azure AI Document Intelligence custom model is not extracting the correct data from scorecard PDFs, the cause usually comes down to a few practical factors. Start by checking the quality of the PDFs: the text must be cleanly readable, so if they are scans or images, preprocess them so that they are clean and properly aligned before OCR. Next, review your labeled data in the labeling tool for consistency and accuracy, and make sure you have provided enough samples, ideally 15 to 20 per layout variation. If your scorecards have fixed layouts, a custom template model is the best fit; for variable layouts, use a custom neural model and make sure the training set covers every variation.

    When testing, enable the includeFieldElements=true parameter, if your API or SDK version supports it, to inspect the raw OCR output and confirm whether the text is detected correctly. If the text is recognized but mapped to the wrong fields, adjust your labels so that they are anchored to keywords instead of fixed positions. For documents that come in multiple formats, train a classification model first and route each type to its corresponding extraction model.

    After retraining the model, check the field confidence scores and use basic post-processing rules or regex to validate and clean the extracted data. If the problem persists, export your labeled data, try retraining in a new project, and escalate the issue to Microsoft support with sample documents and your model ID for deeper troubleshooting.
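The post-processing step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`PlayerName`, `TotalScore`, `MatchDate`), the regex patterns, and the 0.80 confidence threshold are hypothetical and need to be replaced with the labels and tolerances of your own model. The `ExtractedField` class is a stand-in for whatever your SDK returns, so you would first copy each field's text and confidence into it.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ExtractedField:
    """Stand-in for one extracted field (text plus model confidence)."""
    value: Optional[str]
    confidence: float


# Hypothetical field names and formats; replace with your own labels.
FIELD_PATTERNS = {
    "PlayerName": re.compile(r"[A-Za-z][A-Za-z .'-]*"),
    "TotalScore": re.compile(r"\d{1,3}"),
    "MatchDate": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

# Assumed threshold; tune it against your own validation documents.
MIN_CONFIDENCE = 0.80


def validate_fields(
    fields: Dict[str, ExtractedField],
    min_confidence: float = MIN_CONFIDENCE,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split fields into accepted values and fields flagged for review."""
    accepted: Dict[str, str] = {}
    needs_review: Dict[str, str] = {}
    for name, pattern in FIELD_PATTERNS.items():
        field = fields.get(name)
        if field is None or field.value is None:
            needs_review[name] = "missing"
            continue
        text = field.value.strip()
        if field.confidence < min_confidence:
            needs_review[name] = "low confidence"
        elif not pattern.fullmatch(text):
            needs_review[name] = "unexpected format"
        else:
            accepted[name] = text
    return accepted, needs_review
```

Fields that end up in the review dictionary can be sent to a manual check instead of being silently accepted. The reasons recorded there (missing, low confidence, unexpected format) also hint at whether the underlying problem is OCR quality, labeling, or layout coverage.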

    Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.

