Hello — I'm hitting a blocking error when running an ADF pipeline that reads from Oracle (using Oracle connector 2.0) and writes to Parquet. I suspect it’s related to decimal/float precision mapping, but I can't find a definitive fix. Any guidance appreciated.
ErrorCode=ParquetJavaInvocationException,
Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=An error occurred when invoking java, message: java.lang.ArrayIndexOutOfBoundsException:
Index 126 out of bounds for length 38
total entry:1
com.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.addDecimalColumn(ParquetWriterBuilderBridge.java:107)
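For context on why the numbers in the message are 126 and 38: Parquet writers that back DECIMAL with a 16-byte value cap precision at 38 decimal digits, while Oracle `FLOAT(b)` declares a *binary* precision of up to 126 bits. A plausible reading of the stack trace (this is an assumption, not a confirmed root cause) is that the connector passes the raw binary precision 126 straight through as a decimal precision, overflowing a length-38 table. A stdlib-only sketch of the documented Oracle conversion, `decimal_digits = ceil(b * log10(2))`:

```python
import math

# Limit for DECIMAL backed by a 16-byte (int128 / fixed-len) value.
PARQUET_MAX_DECIMAL_PRECISION = 38

def oracle_float_decimal_digits(binary_precision: int) -> int:
    """Decimal digits equivalent to an Oracle FLOAT(b) binary precision."""
    return math.ceil(binary_precision * math.log10(2))

# FLOAT(126), the Oracle maximum, is equivalent to exactly 38 decimal digits,
# so the *converted* precision would fit; the raw 126 does not.
print(oracle_float_decimal_digits(126))   # 38
# For comparison, BINARY_DOUBLE's 53-bit mantissa is ~16 decimal digits.
print(oracle_float_decimal_digits(53))    # 16
```

This matches "Index 126 out of bounds for length 38": 126 used as an index into something sized for the 38-digit maximum.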
Context:
- Azure Data Factory Copy Activity (Oracle connector v2.0) → sink Parquet (ADLS).
Source DB: Oracle (TASY schema, many functions used in SELECT).
Some source numeric columns are Oracle FLOAT / NUMBER and appear to map to a very large precision (example: FLOAT(126)).
The Java Parquet writer fails when trying to create a decimal column with precision > 38.
What I already tried:
In the SELECT I wrapped numeric columns in CAST(... AS BINARY_DOUBLE) to force a double instead of a decimal.
As an alternative, I converted numbers to strings using REPLACE(TO_CHAR(...), ',', '.') to avoid decimal types in Parquet.
Ensured column mapping in Copy Activity shows Double (or String) for those columns.
Tested the SELECT directly in Oracle (works).
Temporarily wrote results to CSV (works) — so problem seems to be Parquet/decimal mapping.
Verified permissions and removed @item() dynamic references (running as a static query).
Despite these changes, I still hit the ArrayIndexOutOfBoundsException when writing Parquet in some runs (sometimes after other small adjustments).
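On the TO_CHAR workaround: the decimal separator in TO_CHAR output depends on the session's NLS settings, which is presumably why the REPLACE is needed. For any downstream consumer that receives those strings, the equivalent normalization looks like this (a hypothetical helper, and it assumes no group/thousands separators are present in the strings):

```python
def normalize_number_string(s: str) -> float:
    """Parse an Oracle TO_CHAR number string whose decimal separator may be a comma.

    Assumes the string contains no group (thousands) separators; if your NLS
    settings emit those, strip them before replacing the decimal comma.
    """
    return float(s.strip().replace(",", "."))

print(normalize_number_string(" 1234,5678 "))  # 1234.5678
```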
Questions / requests:
Is this a known bug in ADF Oracle connector v2.0 (or Parquet writer) when encountering Oracle FLOAT with very large precision (e.g. FLOAT(126))?
What is the recommended way to force a safe schema for Parquet output from ADF when the source may have FLOAT/NUMBER with undefined precision? (e.g. best practice: CAST -> NUMBER(p,s) / BINARY_DOUBLE / string?)
Are there specific Copy Activity / Sink settings (Parquet format options, enable decimal conversion, or advanced mapping) that prevent the Parquet writer from trying to create DECIMAL(>38)?
Any driver / connector versions, JVM flags, or hotfixes I should apply? (I’m currently using Oracle connector v2.0).
If I must keep numeric types (not strings) for downstream consumers, what’s the safest approach to preserve numeric semantics while avoiding this error? (e.g. cast to NUMBER(38,10) in SQL, use Data Flow, use Azure Databricks job to transform before writing Parquet?)
What additional logs or diagnostic info would be helpful to share (pipeline JSON, full stack trace, sample row) to help troubleshoot?
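On the "keep numeric types" question: one common approach (a general sketch, not an ADF-specific fix) is to cap everything at a fixed precision/scale in the source SELECT, e.g. CAST(col AS NUMBER(38,10)), and pre-flag any values that cannot fit that target before the copy runs. A stdlib-only fit check, where DECIMAL(38,10) is an assumed target you should adjust to your data:

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

# Hypothetical target type: NUMBER(38,10) in Oracle, DECIMAL(38,10) in Parquet.
PRECISION, SCALE = 38, 10

def fits_decimal(value: str, precision: int = PRECISION, scale: int = SCALE) -> bool:
    """True if `value` can be stored as DECIMAL(precision, scale) without overflow."""
    with localcontext() as ctx:
        # Headroom so quantize() itself doesn't raise for mildly oversized inputs.
        ctx.prec = precision + scale
        quantized = Decimal(value).quantize(
            Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP
        )
    return len(quantized.as_tuple().digits) <= precision

print(fits_decimal("12345.6789"))    # True: well within 38 digits at scale 10
print(fits_decimal("1" + "0" * 30))  # False: 31 integer + 10 fraction digits > 38
```

If rejecting rows is not acceptable, the usual fallback is exactly what you tried: BINARY_DOUBLE (16 significant digits, but a true numeric type) or strings with a documented format for consumers.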
What I can share if helpful:
Full error stack trace from ADF run.
The SELECT query used (with CASTs / REPLACE).
Pipeline + Copy Activity JSON.
A small sample of problematic values (if needed).
Thanks in advance — any pointers, known issues, or concrete config examples will help me move forward.