Hello Routine User!
Thanks for posting your query on Microsoft Q&A and for explaining the issue in detail.
This is a common but tricky issue when loading Excel files generated by Apache POI into Azure Synapse Analytics. The error message suggests that Synapse detects the file as encrypted, even though it's not password-protected and opens fine in Excel.
Below is the likely root cause, followed by practical workarounds with references.
Even though the file opens in Excel without a password, Synapse still treats it as encrypted because Apache POI does not always write .xlsx files the same way Microsoft Excel does.
Is This Related to Apache POI vs. Microsoft Excel?
Yes, absolutely. While both produce .xlsx files, Apache POI does not always generate files identical to those created by Microsoft Excel, especially when:
- Using older versions of POI
- Writing large files with complex formatting
- Including custom properties or hidden sheets
- Not properly closing streams or flushing buffers
Excel tolerates minor ZIP header issues, but Synapse may reject them outright.
This is not true encryption, just a false positive triggered by non-standard ZIP formatting.
Learn Doc: Open XML Format Specification – ZIP Structure
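To check which case you are in before trying the fixes, you can inspect the file's container format yourself. The sketch below is only a rough diagnostic, and the function name `classify_workbook` and its return labels are my own, not part of any Synapse or POI API. It relies on two facts: a normal .xlsx is a ZIP archive (it starts with `PK`), while a genuinely password-protected workbook is wrapped in an OLE2 compound file with a different signature. How Synapse itself makes this decision is an assumption on my side.

```python
import zipfile

ZIP_MAGIC = b"PK\x03\x04"
# OLE2 compound-file signature; password-protected OOXML workbooks are
# wrapped in this container instead of being stored as a plain ZIP.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def classify_workbook(path):
    """Return a coarse label for the container format of the file at `path`.

    Hypothetical helper for illustration; the labels are arbitrary.
    """
    with open(path, "rb") as fh:
        header = fh.read(8)
    if header.startswith(OLE2_MAGIC):
        return "ole2"  # legacy .xls or an encrypted OOXML package
    if not header.startswith(ZIP_MAGIC):
        return "unknown"
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the first entry with a bad CRC, or None.
            if zf.testzip() is not None:
                return "zip-corrupt"
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return "zip-corrupt"
    # Every OOXML package carries this part at the archive root.
    if "[Content_Types].xml" not in names:
        return "zip-not-ooxml"
    return "ooxml"
```

If this returns "ole2", the file really is encrypted (or is a legacy .xls) and the workarounds below will not help until the protection is removed at the source. If it returns "ooxml" and Synapse still complains, the container is readable by Python's ZIP reader, which points towards a stricter check on the Synapse side.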
Workarounds & Fixes
1. Re-save the File Using Microsoft Excel
- Open the file in Microsoft Excel.
- Save it again as .xlsx (File → Save As).
- Upload the re-saved version to your storage.
Excel rewrites the ZIP structure on save, so Synapse should then read the file without errors. This is the fastest fix and works in most cases.
If you cannot re-save, convert to CSV or use pandas in a notebook to clean the file before ingestion.
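If opening every file in Excel is not practical, the container-level part of a re-save can be approximated in code. This is only a sketch under the assumption that the problem sits in the ZIP headers and not in the workbook XML itself; `repack_xlsx` is a hypothetical helper name, and this is not equivalent to a real Excel re-save because the sheet contents are copied byte for byte.

```python
import zipfile


def repack_xlsx(src_path, dst_path):
    """Copy every entry of the workbook at src_path into a fresh ZIP archive.

    Hypothetical helper: the entry contents are unchanged, only the ZIP
    container (local headers, central directory) is rewritten.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Re-create each entry by name so the headers are produced by
            # Python's zipfile module rather than carried over from the source.
            dst.writestr(info.filename, src.read(info.filename))
```

For example, `repack_xlsx("errorapache.xlsx", "repacked_errorapache.xlsx")` on a local copy, then upload the repacked file and retry the load. If Synapse still rejects it, the cause is probably not the ZIP container and one of the other options is a better bet.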
2. Use Power Query or Pandas to Convert Before Loading
If you can't modify the source system, use an intermediate step:
- Use Azure Data Factory or a Python notebook to read the file using pandas.
- Write it back to a new .xlsx file with standard formatting.
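import pandas as pd
# Note: pandas resolves remote URLs through fsspec, so the URL scheme
# must be supported by a filesystem driver in your notebook environment.
# Read the first sheet of the POI-generated workbook
df = pd.read_excel("wasbs://******@storage.blob.core.windows.net/errorapache.xlsx", sheet_name=0)
# Write the data back out as a fresh .xlsx file
df.to_excel("wasbs://******@storage.blob.core.windows.net/cleaned_errorapache.xlsx", index=False)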
Reference: Pandas read_excel documentation
3. Use CSV Instead of XLSX
- Export the data from Apache POI as CSV instead of .xlsx.
- Synapse handles CSV files robustly and doesn't have ZIP parsing issues.
4. Upgrade Apache POI Version
- If possible, upgrade to Apache POI 5.x+.
- Newer versions have better compatibility with Office standards.
- Avoid using legacy versions like 3.17 or earlier.
Reference: Apache POI and https://learn.microsoft.com/en-us/azure/data-factory/format-excel
Let me know if that workaround works.
Please "Accept as Answer" if the answer provided is useful, so that you can help others in the community looking for remediation for similar issues.
Thanks
Pratyush