: Error during execution: Azure Data Factory Oracle connector 2.0 → Parquet write failure (ArrayIndexOutOfBounds: Index 126 out of bounds for length 38)

Maria Gabriele Alves Ramos 0 Reputation points
2025-08-21T19:20:47.48+00:00

Hello — I'm hitting a blocking error when running an ADF pipeline that reads from Oracle (using Oracle connector 2.0) and writes to Parquet. I suspect it’s related to decimal/float precision mapping, but I can't find a definitive fix. Any guidance appreciated.

ErrorCode=ParquetJavaInvocationException,
Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=An error occurred when invoking java, message: java.lang.ArrayIndexOutOfBoundsException:
Index 126 out of bounds for length 38
total entry:1
com.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.addDecimalColumn(ParquetWriterBuilderBridge.java:107)

Context:

  • Azure Data Factory Copy Activity (Oracle connector v2.0) → sink Parquet (ADLS).
  • Source DB: Oracle (TASY schema, many functions used in the SELECT).
  • Some source numeric columns are Oracle FLOAT / NUMBER and appear to map to a very large precision (example: FLOAT(126)).
  • The Java Parquet writer fails when trying to create a decimal column with precision > 38.
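For context, Oracle's FLOAT(b) precision is binary, not decimal; per Oracle's documentation the decimal equivalent is CEIL(b * 0.30103), so FLOAT(126) — the default and maximum — corresponds to about 38 decimal digits. The error message ("Index 126 out of bounds for length 38") is consistent with the raw binary precision 126 being passed where a decimal precision (max 38 for Parquet DECIMAL) is expected, though that is my reading of the stack trace, not a confirmed diagnosis. A quick check in Oracle:

```sql
-- Oracle FLOAT(b) stores b binary digits of precision. Its decimal
-- equivalent is CEIL(b * 0.30103); FLOAT(126) ~ 38 decimal digits,
-- which is exactly the Parquet DECIMAL precision limit.
SELECT CEIL(126 * 0.30103) AS decimal_digits FROM dual;  -- 38
```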

What I already tried:

  • In the SELECT, forced numeric columns with CAST(... AS BINARY_DOUBLE) to produce a double instead of a decimal.
  • As an alternative, converted numbers to strings using REPLACE(TO_CHAR(...), ',', '.') to avoid decimal types in Parquet.
  • Ensured the Copy Activity column mapping shows Double (or String) for those columns.
  • Tested the SELECT directly in Oracle (works).
  • Temporarily wrote results to CSV (works) — so the problem appears to be the Parquet/decimal mapping.
  • Verified permissions and removed @item() dynamic references (running a static query).
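For reference, the two cast workarounds above look roughly like this (the column and table names are placeholders, not my actual schema):

```sql
-- Workaround 1: force a double instead of a decimal
SELECT CAST(nr_valor AS BINARY_DOUBLE) AS nr_valor
FROM   tasy.my_table;

-- Workaround 2: emit the value as a string, converting the
-- decimal comma produced by TO_CHAR under my NLS settings to a dot
SELECT REPLACE(TO_CHAR(nr_valor), ',', '.') AS nr_valor
FROM   tasy.my_table;
```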

Despite these changes, I still hit the ArrayIndexOutOfBoundsException when writing Parquet in some runs (sometimes after other small adjustments).

Questions / requests:

  1. Is this a known bug in the ADF Oracle connector v2.0 (or the Parquet writer) when it encounters Oracle FLOAT with very large precision (e.g. FLOAT(126))?
  2. What is the recommended way to force a safe schema for Parquet output from ADF when the source may have FLOAT/NUMBER with undefined precision? (e.g. best practice: CAST to NUMBER(p,s), BINARY_DOUBLE, or string?)
  3. Are there specific Copy Activity / sink settings (Parquet format options, decimal conversion, or advanced mapping) that prevent the Parquet writer from trying to create DECIMAL with precision > 38?
  4. Are there driver/connector versions, JVM flags, or hotfixes I should apply? (I'm currently using Oracle connector v2.0.)
  5. If I must keep numeric types (not strings) for downstream consumers, what is the safest approach to preserve numeric semantics while avoiding this error? (e.g. cast to NUMBER(38,10) in SQL, use a Data Flow, or transform with an Azure Databricks job before writing Parquet?)
  6. What additional logs or diagnostic info would be helpful to share (pipeline JSON, full stack trace, sample row)?

What I can share if helpful:

  • Full error stack trace from the ADF run.
  • The SELECT query used (with CASTs / REPLACE).
  • Pipeline + Copy Activity JSON.
  • A small sample of problematic values (if needed).

Thanks in advance — any pointers, known issues, or concrete config examples will help me move forward.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Maria Gabriele Alves Ramos 0 Reputation points
    2025-08-26T16:45:07.9433333+00:00

    I encountered the same issue when using the Oracle connector in Azure Data Factory. To resolve it, I used the CAST function with the NUMBER type, which is standard in Oracle. However, it's important to explicitly define the precision and scale when casting, like CAST(column AS NUMBER(15,2)).

    If you use CAST(column AS NUMBER) without specifying precision, the error may still occur, especially when writing to formats like Parquet. Defining the precision ensures that ADF can correctly map the data types during transformation and sink operations.
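    A minimal sketch of that cast (the column and table names below are placeholders, not from the original query):

    ```sql
    -- Explicit precision and scale: maps cleanly to Parquet DECIMAL(15,2)
    SELECT CAST(vl_total AS NUMBER(15,2)) AS vl_total
    FROM   tasy.my_table;

    -- Avoid: NUMBER without (p,s) leaves precision undefined, and the
    -- Parquet writer may still fail when inferring the decimal type
    -- SELECT CAST(vl_total AS NUMBER) AS vl_total FROM tasy.my_table;
    ```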

    Let me know if you need an example or further clarification!

