Data Pipeline to push files from External System (Adobe) to ADLS Gen2

David 60 Reputation points
2025-08-25T10:44:18.69+00:00

Hello,

Could someone please share detailed, step-by-step guidance for the scenario below?

Adobe Campaign pushes Email Delivery and Email Tracking data objects (2 files) to file storage in the Bronze layer.

Which component should I use to poll for the new files, grab them, and load them into the ADLS Gen2 Bronze layer?

I don't have any control over the Adobe environment.

Azure Databricks will check on a schedule. If any new files are available, Databricks pulls the 2 files (Email Delivery & Email Tracking).

Can we use Azure Blob Storage events to trigger workflows when new files land? If so, could someone please help with the steps?

We have access to Azure Event Grid and Azure Functions in our environment and want to implement event-driven processing.

A cron job will run inside a service endpoint within our private subnet.

Please help me understand what the flow will look like.


Accepted answer
  Marcin Policht 54,995 Reputation points MVP Volunteer Moderator
    2025-08-25T11:38:09.4966667+00:00

    Since you don't have control over the Adobe environment, you'll need to poll/pull the files into your ADLS Gen2 Bronze layer yourself. Once the files land in ADLS, you can use Azure Event Grid plus an Azure Function app to react to new file arrivals.

    1. Enable Event Grid on the Storage account
    • Go to your Storage Account in Azure Portal.
    • Under Events, click + Event Subscription.
    • Choose Event Grid Schema.
    • Set Event Types → Blob Created (Microsoft.Storage.BlobCreated), since you care about new files only.
    • Choose the destination → Azure Function App.
    2. Create an Azure Function app (event-triggered)
    • In Azure Portal, create a Function App.
    • Choose Event Grid Trigger as the template.
    • This function will fire whenever a new file lands in your ADLS Gen2 container.

    Example function (Python):

    import logging
    import azure.functions as func

    def main(event: func.EventGridEvent):
        # get_json() returns the "data" payload of the Event Grid event
        data = event.get_json()
        file_url = data['url']

        logging.info(f"New file detected: {file_url}")

        # (Optional) Add validation: only process Adobe Campaign files
        if "email_delivery" in file_url or "email_tracking" in file_url:
            logging.info("Processing Adobe Campaign file...")
            # Trigger a Databricks job, move the file, or parse metadata here
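
    For the v1 Python programming model used in the example, the Event Grid trigger is declared in a function.json file alongside the code — a minimal sketch (the binding name must match the function's event parameter):

```json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "eventGridTrigger",
      "name": "event",
      "direction": "in"
    }
  ]
}
```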
    
    3. Trigger a Databricks job from the Function App

    Instead of polling, you can call the Databricks REST API (or an Azure Data Factory pipeline) from inside the Function App when the file lands:

    import logging
    import requests

    DATABRICKS_JOB_ID = "12345"
    DATABRICKS_TOKEN = "your_pat_token"  # store in Key Vault / app settings, not in code
    DATABRICKS_URL = "https://<databricks-instance>/api/2.0/jobs/run-now"

    def trigger_databricks_job():
        response = requests.post(
            DATABRICKS_URL,
            headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
            json={"job_id": DATABRICKS_JOB_ID},
            timeout=30,
        )
        response.raise_for_status()
        logging.info(f"Databricks job triggered: {response.json()}")
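
    Since the Databricks job should process the delivery and tracking files as a pair, the function can wait until both have arrived before calling run-now. A minimal in-memory sketch of that gating logic (file names are illustrative; in practice the seen-set would live in durable storage such as a blob or table, since function instances are stateless):

```python
REQUIRED_PREFIXES = ("email_delivery", "email_tracking")
seen: set[str] = set()

def record_arrival(file_name: str) -> bool:
    """Record an arriving file; return True once both required files are present."""
    for prefix in REQUIRED_PREFIXES:
        if file_name.startswith(prefix):
            seen.add(prefix)
    return all(p in seen for p in REQUIRED_PREFIXES)

record_arrival("email_delivery_20250825.csv")          # False: tracking file not yet seen
ready = record_arrival("email_tracking_20250825.csv")  # True: safe to trigger the job
```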
    
    4. Secure within a private subnet
    • Since you mentioned the cron job runs inside a private subnet, make sure:
      • Storage account has Private Endpoint enabled.
      • Function App runs in Premium Plan with VNET integration.
      • Event Grid delivery is secured — note that Event Grid supports private endpoints for publishing events, not for delivery, so use managed identity for delivery authentication or route through an intermediary if the destination must stay private.

    So effectively, the workflow summary would be:

    1. Adobe Campaign pushes files → ADLS Gen2 (Bronze Layer).
    2. ADLS fires BlobCreated Event → Event Grid.
    3. Event Grid → triggers Function App.
    4. Function App → triggers Databricks Job (to process delivery & tracking files).
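
    For step 3's filtering, the event's data.url typically looks like https://<account>.blob.core.windows.net/<container>/<path>, so the container and blob path can be recovered with standard URL parsing. A hedged helper sketch (the account, container, and file names are illustrative):

```python
from urllib.parse import urlparse

def split_blob_url(url: str) -> tuple[str, str]:
    """Split a blob URL into (container, blob_path)."""
    parsed = urlparse(url)
    container, _, blob_path = parsed.path.lstrip("/").partition("/")
    return container, blob_path

container, path = split_blob_url(
    "https://myaccount.blob.core.windows.net/bronze/adobe/email_delivery_20250825.csv"
)
# container == "bronze", path == "adobe/email_delivery_20250825.csv"
```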

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin

