Production Impact – Microsoft Purview Custom SIT Exclusions Not Honored
In Microsoft Purview, a custom Sensitive Information Type (SIT) configured with exclusion logic is not functioning as expected. Despite defining exclusions for specific keywords and phrases, files containing those excluded terms are still being flagged during both SIT console testing and Data Explorer scans. This behavior is resulting in false positives, raising concerns about the reliability of Purview’s exclusion logic or its implementation. The issue is currently impacting the production environment and requires urgent investigation to determine whether this is due to limitations in exclusion processing, regex/proximity-matching behavior, or a potential regression in the Purview SIT framework.
Microsoft Security | Microsoft Purview
-
Venkat Reddy Navari • 5,815 Reputation points • Microsoft External Staff • Moderator
2025-08-20T11:52:47.1933333+00:00 Hi ZTS, when exclusion logic in custom Sensitive Information Types (SITs) doesn’t behave as expected, either during SIT console testing or in Data Explorer, it often comes down to how the rules are structured or how proximity and confidence levels are configured.
Here are a few things you can review:
Verify how exclusions are defined: If you’re using keyword-based exclusions, ensure they are scoped properly, either within the regex pattern itself (via negative lookaheads/lookbehinds) or by defining supporting elements correctly. Misconfigured logic may still trigger a match even when exclusions are present.
Review proximity and confidence settings: Sometimes matches may be flagged due to supporting elements nearby scoring higher. Try adjusting the proximity distance or lowering the confidence level temporarily to observe if this affects the behavior.
Inspect the SIT configuration: Export and check your SIT definition (the rule package is exported as XML) to make sure:
- Exclusions are being applied at the right match level
- Regex patterns aren’t too broad or overlapping
- Supporting elements are properly nested and ordered
You can refer to this Microsoft Learn doc for examples on exclusion patterns: Sensitive information type regex validators and additional checks
Use minimal test cases: To isolate whether the issue lies with the exclusion logic or the rule structure itself, test a simplified version of the SIT with just a base pattern and a single exclusion condition.
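As a rough illustration of such a minimal test case, the sketch below models the intended behavior locally: one base pattern plus one exclusion list, applied within a character proximity window. It is not Purview's engine (Python's `re` module is a different regex implementation), and the 9-digit base pattern, the `find_matches` helper, and the 50-character window are hypothetical values chosen for illustration. The excluded terms are the ones mentioned later in this thread.

```python
import re

def find_matches(text, base_pattern, excluded_terms, proximity=50):
    """Return base-pattern matches with no excluded term within `proximity` characters.

    Local model of the *intended* exclusion behavior only; not Purview's engine.
    """
    kept = []
    for m in re.finditer(base_pattern, text):
        # Build the window of text surrounding the match.
        start = max(0, m.start() - proximity)
        end = min(len(text), m.end() + proximity)
        window = text[start:end].lower()
        # Plain substring check, so "test" would also hit "latest".
        if any(term.lower() in window for term in excluded_terms):
            continue
        kept.append(m.group(0))
    return kept

BASE = r"\b\d{9}\b"                    # hypothetical base pattern: any 9-digit number
EXCLUDED = ["test", "sample", "demo"]  # example exclusion terms

print(find_matches("Employee ID 123456789 on file.", BASE, EXCLUDED))  # ['123456789']
print(find_matches("Sample record: 123456789", BASE, EXCLUDED))        # []
```

If the simplified SIT in Purview disagrees with what this model predicts for the same input, the difference is more likely in rule structure or file handling than in the pattern itself.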
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
-
ZTS • 60 Reputation points
2025-08-21T10:23:29.82+00:00 Can you share the steps for how you performed the checks listed in your answer above (verifying how exclusions are defined, reviewing proximity and confidence settings, inspecting the SIT configuration, and using minimal test cases)?
-
ZTS • 60 Reputation points
2025-08-22T04:14:34.47+00:00 Can you provide an example configuration?
-
ZTS • 60 Reputation points
2025-08-22T04:17:44.9133333+00:00 Additional information: the exclusion works in other file formats, just not in PDF. Is this a limitation?
-
ZTS • 60 Reputation points
2025-08-22T09:45:00.11+00:00 Hi, any updates?
-
Venkat Reddy Navari • 5,815 Reputation points • Microsoft External Staff • Moderator
2025-08-22T10:32:25.8266667+00:00 ZTS Thanks for the update. Yes, that behavior is expected: not all Purview information protection and classification features are fully supported for PDF files. While formats like Word, Excel, and PowerPoint handle sensitivity labels and inspection consistently, PDF support has certain limitations depending on whether the file is a text-based PDF or an image-based (scanned) PDF.
For reference, Microsoft document: Supported file types for sensitivity labels in Microsoft Purview.
If you specifically need classification/labeling to work with PDFs, you may want to check whether the files are OCR-processed (searchable text), since image-only PDFs often won’t be scanned/classified.
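One quick way to triage test files before uploading them is to check whether a PDF appears to contain a text layer at all. The sketch below is a standard-library heuristic only: it looks for PDF text-showing operators (`Tj`/`TJ`) in raw or Flate-compressed streams. The function name `has_text_layer` is hypothetical, and it does not handle encrypted files or other stream filters. It says nothing about how Purview itself extracts text.

```python
import re
import zlib

# Content between "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A string, hex string, or array operand followed by a text-showing operator.
TEXT_OP_RE = re.compile(rb"[)>\]]\s*(?:Tj|TJ)\b")

def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream seems to contain text-showing operators."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        candidates = [raw]
        try:
            # Most content streams use FlateDecode (zlib); try to inflate.
            candidates.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            pass  # not zlib data; only the raw bytes are scanned
        if any(TEXT_OP_RE.search(data) for data in candidates):
            return True
    return False

# Usage (path is a placeholder):
# with open("sample.pdf", "rb") as fh:
#     print(has_text_layer(fh.read()))
```

A `False` result suggests an image-only PDF that would need OCR before classification; a `True` result only means text operators were found, not that the extracted text is what the SIT expects.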
-
Venkat Reddy Navari • 5,815 Reputation points • Microsoft External Staff • Moderator
2025-08-25T12:14:16.1166667+00:00 ZTS We haven’t heard back from you on the last response and were just checking to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, please respond with more details and we will try to help.
-
ZTS • 60 Reputation points
2025-08-26T06:44:03.7066667+00:00 Was the limitation confirmed and tested? Are there any additional Microsoft URLs that document this limitation?
-
Venkat Reddy Navari • 5,815 Reputation points • Microsoft External Staff • Moderator
2025-08-26T12:29:19.2433333+00:00 ZTS Yes, the limitation with exclusions not being honored in PDF files has been confirmed. Microsoft has documented this behavior, and the references below specifically call out how support differs for PDFs compared to Word, Excel, and PowerPoint:
These cover the official scope of PDF handling. At present, this is considered a product limitation rather than a misconfiguration.
-
ZTS • 60 Reputation points
2025-08-27T12:42:08.1033333+00:00 The test file we were using was text-searchable, and detection of the primary element was showing.
Can you test this in your environment and confirm?
-
Venkat Reddy Navari • 5,815 Reputation points • Microsoft External Staff • Moderator
2025-08-28T07:19:55.73+00:00 ZTS Thanks for confirming the file was text-searchable. Could you please share the details requested in the private message?
-
ZTS • 60 Reputation points
2025-08-29T11:48:27.2166667+00:00 Here are the steps.
Custom SIT Implementation in Microsoft Purview
1. Create the Custom SIT
- Go to Microsoft Purview portal.
- Navigate to Information Protection > Sensitive info types.
- Select Create sensitive info type.
- Provide a name and description.
2. Define Detection Criteria
- Add a pattern using:
  - Regular expression (RegEx) to match specific data formats.
  - Optional keyword list to strengthen detection.
  - Confidence level (Low, Medium, High) based on match accuracy.
3. Add Exclusion Filters
- Use built-in exclusion functionality to ignore certain keywords or phrases.
- Example: Exclude terms like “test”, “sample”, or “demo” that should not trigger a match.
- This is done by adding a TextMatchFilter with logic set to Exclude.
4. Save and Publish
- Review the SIT configuration.
- Save and publish it for use in DLP policies or auto-labeling.
Validation Process
A. Positive Match Testing
- Create sample content with valid patterns.
- Confirm that the SIT detects these correctly.
B. Exclusion Testing
- Create content with valid patterns that include excluded keywords.
- Confirm that these are not flagged.
C. Review in Content Explorer
- Use Content Explorer in Purview to verify detection across sample files.
- Check match accuracy and confirm exclusions are working.
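For anyone wanting to repeat this validation, the sketch below generates a small, repeatable set of test files: one positive case per valid value, and one exclusion case per excluded term, plus a manifest of expected outcomes. The helper names, file names, and the sample 9-digit value are hypothetical. It writes plain-text files only, so producing PDF or Office versions for a format comparison would be a separate manual step.

```python
from pathlib import Path
import tempfile

def build_cases(valid_values, excluded_terms):
    """Return (filename, content, should_be_flagged) tuples for A and B testing."""
    cases = []
    for i, value in enumerate(valid_values):
        # A. Positive match: valid pattern, no excluded keyword.
        cases.append((f"positive_{i}.txt", f"Reference number: {value}", True))
        # B. Exclusion: same valid pattern next to each excluded keyword.
        for term in excluded_terms:
            cases.append(
                (f"excluded_{i}_{term}.txt", f"{term} reference number: {value}", False)
            )
    return cases

def write_cases(cases, out_dir):
    """Write each case to disk plus an expected.tsv manifest; return the folder."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = []
    for name, content, expect_flag in cases:
        (out / name).write_text(content, encoding="utf-8")
        lines.append(f"{name}\t{'flag' if expect_flag else 'no flag'}")
    (out / "expected.tsv").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out

if __name__ == "__main__":
    cases = build_cases(["123456789"], ["test", "sample", "demo"])
    folder = write_cases(cases, tempfile.mkdtemp(prefix="sit_cases_"))
    print(f"Wrote {len(cases)} test files to {folder}")
```

Comparing the manifest against what Content Explorer reports for each file, per format, makes it easier to show exactly which format and which excluded term misbehaves.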
-
ZTS • 60 Reputation points
2025-08-29T11:49:43.28+00:00 Are you able to create a simple rule and test it with a PDF?