Timestamp column from Parquet file (created with Pandas) becomes NULL values in Data Flow

hh 0 Reputation points
2025-08-26T04:25:25.26+00:00

Hello,

I'm facing an issue where a timestamp column in a Parquet file is consistently read as NULL within an Azure Data Factory Data Flow. Critically, the Data Flow does not throw any errors, which makes the problem difficult to debug.

Here is a summary of my environment and the problem:

Scenario:

  • I am using a Data Flow within Azure Data Factory to read a Parquet file.
  • In the Data Flow's data preview and subsequent transformations, all values in the target date/timestamp column are NULL.

Parquet File Generation Details:

  • The Parquet file is generated using a Python script.
  • I use a Pandas DataFrame and export it to a Parquet file using the to_parquet() function.
  • The source column in the Pandas DataFrame has a datetime64[ns] data type.
  • When creating the file, I am providing a PyArrow schema that maps this column to pyarrow.timestamp('s').
  • The date format is YYYY-MM-DD hh:mm:ss.

How Azure Data Factory Interprets the Schema:

  • In the source Dataset within Data Factory, the schema for the column is inferred as TIMESTAMP_MILLIS.
  • Within the Data Flow's source projection, the column is recognized as a timestamp type.

My Question:

Given that I'm writing the timestamp with second-level precision (timestamp('s')) but Data Factory seems to be interpreting it as millisecond-level precision (TIMESTAMP_MILLIS), could this mismatch be causing the values to be silently dropped and replaced with NULLs?

How can I configure my process to ensure the date values are loaded correctly? Should I change my PyArrow schema to use timestamp('ms') to align with Data Factory, or is there a setting within the Data Flow that I need to adjust?

This is a critical issue for us as it blocks all our date-based data processing. Any help or guidance would be greatly appreciated.

Thank you.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.