Azure Databricks

Azure Databricks offers a unified platform for scalable data management, governance, and analytics, combining streamlined workflows with the ability to handle diverse data types efficiently.

This connector is available in the following products and regions:

Service          Class    Regions
Copilot Studio   Premium  All Power Automate regions except the following:
                          - US Government (GCC)
                          - US Government (GCC High)
                          - China Cloud operated by 21Vianet
                          - US Department of Defense (DoD)
Power Apps       Premium  All Power Apps regions except the following:
                          - US Government (GCC)
                          - US Government (GCC High)
                          - China Cloud operated by 21Vianet
                          - US Department of Defense (DoD)
Power Automate   Premium  All Power Automate regions except the following:
                          - US Government (GCC)
                          - US Government (GCC High)
                          - China Cloud operated by 21Vianet
                          - US Department of Defense (DoD)
Contact
Name Databricks Support
URL https://help.databricks.com
Email eng-partner-eco-help@databricks.com
Connector Metadata
Publisher Databricks Inc.
Website https://www.databricks.com/
Privacy policy https://www.databricks.com/legal/privacynotice
Categories Data

Connect to Azure Databricks from Microsoft Power Platform

This page explains how to connect to Azure Databricks from Microsoft Power Platform by adding Azure Databricks as a data connection. When connected, you can use your Azure Databricks data from the following platforms:

  • Power Apps: Build applications that can read from and write to Azure Databricks, while preserving your Azure Databricks governance controls.
  • Power Automate: Build flows with actions that execute custom SQL statements or existing jobs and return the results.
  • Copilot Studio: Build custom agents using your Azure Databricks data as a knowledge source.

Before you begin

Before you connect to Azure Databricks from Power Platform, you must meet the following requirements:

  • You have a Microsoft Entra ID (formerly Azure Active Directory) account.
  • You have a premium Power Apps license.
  • You have an Azure Databricks account.
  • You have access to a SQL warehouse in Azure Databricks.

Optional: Connect with Azure Virtual Networks

If your Azure Databricks workspace uses Virtual Networks, there are two ways to connect:

  1. Integrate Power Platform with resources inside your virtual network without exposing them over the public internet. To connect to the private endpoint of your Azure Databricks workspace, first configure private connectivity to Azure Databricks.

    For more information about virtual networks, see Virtual Network support overview.

  2. Enable access with hybrid deployment, where a front-end private link with a public endpoint is protected by a Workspace IP Access List. To enable access, do the following:

    1. Enable public access at workspace level. For more details, see Configure IP access lists for workspaces.
    2. Add the AzureConnectors IP range, or the specific Power Platform IP range for your environment's region, to your workspace IP access list.
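As a sketch, the allow-list entry in step 2 can also be created programmatically through the Databricks IP access list REST API (`POST /api/2.0/ip-access-lists`). The helper name and the CIDR range below are illustrative placeholders; use the AzureConnectors ranges published for your region:

```python
import json

def build_ip_access_list_request(label, ip_addresses):
    """Build the JSON body for POST /api/2.0/ip-access-lists.

    ip_addresses is a list of CIDR blocks, such as the AzureConnectors
    ranges for your region. The range used below is a documentation
    placeholder, not a real Power Platform range.
    """
    return {
        "label": label,
        "list_type": "ALLOW",  # allow-list entry (the API also supports BLOCK)
        "ip_addresses": ip_addresses,
    }

body = build_ip_access_list_request(
    "power-platform-connectors",
    ["203.0.113.0/24"],  # placeholder CIDR; substitute your region's ranges
)
print(json.dumps(body, indent=2))
```

Send this body to the workspace's `/api/2.0/ip-access-lists` endpoint with a workspace admin token to register the range.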

Optional: Create a Microsoft Entra Service Principal

Important

If Azure Databricks and Power Platform are in different tenants, you must use Service Principals for authentication.

Before connecting, complete the following steps to create, set up, and assign a Microsoft Entra Service Principal to your Azure Databricks account or workspace:

Step 1: Add an Azure Databricks connection to Power Platform

Note: If you're using Copilot Studio, we recommend creating the Databricks connection in Power Apps or Power Automate. Then it can be used in Copilot Studio.

To add an Azure Databricks connection, do the following:

  1. In Power Apps or Power Automate, from the sidebar, click Connections.

  2. Click + New connection in the upper-left corner.

  3. Search for "Azure Databricks" using the search bar in the upper-right.

  4. Select the Azure Databricks tile.

  5. Select your Authentication type from the drop-down menu.

  6. Select your authentication method and enter your authentication information.

    • If your Power Platform deployment and Azure Databricks account are in the same Microsoft Entra tenant, you can use an OAuth connection. Enter the following information:

      • For Server Hostname, enter the Azure Databricks SQL warehouse hostname.
      • For HTTP Path, enter the SQL warehouse HTTP path.
      • Click Create.
      • Sign in with your Microsoft Entra ID.
    • A service principal connection can be used in any scenario. Before connecting, create a Microsoft Entra service principal. Enter the following information:

      • For Client ID, enter the service principal ID.
      • For Client Secret, enter the service principal secret.
      • For Tenant, enter the service principal tenant.
      • For Hostname, enter the Azure Databricks SQL warehouse hostname.
      • For HTTP Path, enter the SQL warehouse HTTP path.
      • (Optional) You can rename or share the service principal connection with your team members after the connection is created.
    • To find your Azure Databricks SQL warehouse connection details, see Get connection details for an Azure Databricks compute resource.

  7. Click Create.

Step 2: Use the Azure Databricks connection

After you create an Azure Databricks connection in Power Apps or Power Automate, you can use your Azure Databricks data to create Power canvas apps, Power Automate flows, and Copilot Studio agents.

Use your Azure Databricks data to build Power canvas apps

Important

You can use canvas apps only when the app connects directly to Azure Databricks. Virtual tables are not supported in canvas apps.

To add your Azure Databricks data to your application, do the following:

  1. From the leftmost navigation bar, click Create.
  2. Click Start with a blank canvas and select your desired canvas size to create a new canvas app.
  3. From your application, click Add data > Connectors > Azure Databricks. Select the Azure Databricks connection you created.
  4. Select a catalog from the Choose a dataset sidebar.
  5. From the Choose a dataset sidebar, select all the tables you want to connect your canvas app to.
  6. Click Connect.

Data operations in Power Apps:

The connector supports create, update, and delete operations, but only for tables that have a primary key defined. When performing create operations, you must always specify the primary key.

Note: Azure Databricks supports generated identity columns. In this case, primary key values are automatically generated on the server during row creation and cannot be manually specified.

Use your Azure Databricks data to build Power Automate flows

The Statement Execution API and the Jobs API are exposed within Power Automate, allowing you to write SQL statements and run existing jobs. To create a Power Automate flow that uses an Azure Databricks action, do the following:

  1. From the leftmost navigation bar, click Create.
  2. Create a flow and add any trigger type.
  3. From your new flow, click + and search for "Databricks" to see the available actions.

To write SQL, select one of the following actions:

  • Execute a SQL Statement: Write and run a SQL statement. Enter the following:

    • For Body/warehouse_id, enter the ID of the warehouse upon which to execute the SQL statement.
    • For Body/statement, enter the SQL statement to execute.
    • For more about the advanced parameters, see here.
  • Check status and get results: Check the status of a SQL statement and gather results. Enter the following:

    • For Statement ID, enter the ID returned when the SQL statement was executed.
    • For more about the parameter, see here.
  • Cancel the execution of a statement: Terminate execution of a SQL statement. Enter the following:

    • For Statement ID, enter the ID of the SQL statement to terminate.
    • For more about the parameter, see here.
  • Get result by chunk index: Get results by chunk index, which is suitable for large result sets. Enter the following:

    • For Statement ID, enter the ID of the SQL statement whose results you want to retrieve.
    • For Chunk index, enter the target chunk index.
    • For more about the parameters, see here.
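These actions wrap the Statement Execution API (`POST /api/2.0/sql/statements`). As a minimal sketch of the request body the Execute a SQL Statement action sends, using a placeholder warehouse ID and statement:

```python
import json

def build_execute_statement_request(warehouse_id, statement,
                                    wait_timeout="30s",
                                    on_wait_timeout="CONTINUE"):
    """Build the JSON body for POST /api/2.0/sql/statements.

    With on_wait_timeout="CONTINUE", a statement still running after
    wait_timeout returns a statement_id that you can poll with the
    "Check status and get results" action.
    """
    return {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": wait_timeout,
        "on_wait_timeout": on_wait_timeout,
    }

body = build_execute_statement_request(
    "a9c4e781bd29f315",  # placeholder in the warehouse ID format shown on this page
    "SELECT 1",
)
print(json.dumps(body, indent=2))
```

Optional parameters such as catalog, schema, disposition, and row_limit (listed under the Execute a SQL statement action below) can be added to the same body.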

To interact with an existing Databricks Job, select one of the following actions:

  • List Jobs: Retrieves a list of jobs. For more information, see here.
  • Trigger a new job run: Runs a job and returns the run_id of the triggered run. For more information, see here.
  • Get a single Job run: Returns metadata about a run, including run status (e.g., RUNNING, SUCCESS, FAILED), start and end times, execution durations, cluster information, and so on. For more information, see here.
  • Cancel a Job run: Cancels a job run or a task run. For more information, see here.
  • Get the output for a single job run: Retrieves the output and metadata of a single task run. For more information, see here.
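These actions wrap the Jobs API. As a sketch, the body that Trigger a new job run sends to `POST /api/2.1/jobs/run-now`, including an optional idempotency token so retries cannot launch duplicate runs (the job ID and token below are placeholders):

```python
import json

def build_run_now_request(job_id, job_parameters=None, idempotency_token=None):
    """Build the JSON body for POST /api/2.1/jobs/run-now.

    Reusing the same idempotency_token on retry returns the run_id of
    the already-launched run instead of starting a second one.
    """
    body = {"job_id": job_id}
    if job_parameters is not None:
        body["job_parameters"] = job_parameters  # e.g. {"param": "overriding_val"}
    if idempotency_token is not None:
        body["idempotency_token"] = idempotency_token  # at most 64 characters
    return body

body = build_run_now_request(12345, {"param": "overriding_val"}, "retry-safe-001")
print(json.dumps(body, indent=2))
```

The response carries the run_id of the triggered run, which you can pass to Get a single Job run or Cancel a Job run.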

Use Azure Databricks as a knowledge source in Copilot Studio

To add your Azure Databricks data as a knowledge source to a Copilot Studio agent, do the following:

  1. From the sidebar, click Agent.
  2. Select an existing agent or create a new agent by clicking + New agent.
    • Describe the agent by inputting a message and then click Create.
    • Or, click Skip to manually specify the agent's information.
  3. In the Knowledge tab, click + Knowledge.
  4. Click Advanced.
  5. Select Azure Databricks as the knowledge source.
  6. Input the catalog name your data is in.
  7. Click Connect.
  8. Select the tables you want your agent to use as a knowledge source and click Add.

Create Dataverse virtual tables with your Azure Databricks data

You can also create Dataverse virtual tables with the Azure Databricks connector. Virtual tables, also known as virtual entities, integrate data from external systems with Microsoft Dataverse. A virtual table defines a table in Dataverse without storing the physical table in the Dataverse database. To learn more about virtual tables, see Get started with virtual tables (entities).

Note

Although virtual tables do not consume Dataverse storage capacity, Databricks recommends using direct connections for better performance.

You must have the System Customizer or System Admin role. For more information, see security roles for Power Platform.

Follow these steps to create a Dataverse virtual table:

  1. In Power Apps, from the sidebar, click Tables.

  2. Click + New Table from the menu bar and select Create a virtual table.

  3. Select an existing Azure Databricks connection or create a new connection to Azure Databricks. To add a new connection, see Step 1: Add an Azure Databricks connection to Power Platform.

    Databricks recommends using a service principal connection to create a virtual table.

  4. Click Next.

  5. Select the tables to represent as a Dataverse virtual table.

    • Dataverse virtual tables require a primary key. Therefore, views cannot be virtual tables, but materialized views can.
  6. Click Next.

  7. Configure the virtual table by updating the details of the table, if necessary.

  8. Click Next.

  9. Confirm the details of the data source and click Finish.

  10. Use the Dataverse virtual table in Power Apps, Power Automate, and Copilot Studio.

For a list of known limitations of Dataverse virtual tables, see Known limitations and troubleshooting.

Conduct batch updates

If you need to perform bulk create, update, or delete operations in response to Power Apps inputs, Databricks recommends implementing a Power Automate flow. To accomplish this, do the following:

  1. Create a canvas app using your Azure Databricks connection in Power Apps.

  2. Create a Power Automate flow using the Azure Databricks connection and use Power Apps as the trigger.

  3. In the Power Automate trigger, add the input fields that you want to pass from Power Apps to Power Automate.

  4. Create a collection object within Power Apps to collect all of your changes.

  5. Add the Power Automate flow to your canvas app.

  6. Call the Power Automate flow from your canvas app and iterate over the collection using a ForAll command.

    ForAll(collectionName, FlowName.Run(inputField1, inputField2, inputField3, …))
    

Concurrent writes

Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving conflicts that occur when concurrent writes update or delete different rows in the same data file.

Row-level concurrency is included in Databricks Runtime 14.2 or above. Row-level concurrency is supported by default for the following types of tables:

  • Tables with deletion vectors enabled and without partitioning
  • Tables with liquid clustering, unless deletion vectors are disabled

To enable deletion vectors, run the following SQL command:

ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);

For more information about concurrent write conflicts in Azure Databricks, see Isolation levels and write conflicts on Azure Databricks.

Add Azure Databricks to a data policy

Adding Azure Databricks to the Business group of a data policy prevents it from sharing data with connectors in other groups. This protects your data from being shared with those who should not have access to it. For more information, see Manage data policies.

To add the Azure Databricks connector to a Power Platform data policy:

  1. From any Power Platform application, click the settings gear in the upper-right side, and select Admin Center.
  2. From the sidebar, click Policies > Data Policies.
  3. If you are using the new admin center, click Security > Data and Privacy > Data Policy.
  4. Click + New Policy or select an existing policy.
  5. If creating a new policy, enter a name.
  6. Select an environment to add to your policy and click + Add to policy above.
  7. Click Next.
  8. Search for and select the Azure Databricks connector.
  9. Click Move to Business and click Next.
  10. Review your policy and click Create policy.

Limitations

  • The Power Platform connector does not support government clouds.

Power App limitations

The following PowerFx formulas calculate values using only the data that has been retrieved locally:

Category        Formulas
Table function  GroupBy, Distinct
Aggregation     CountRows, StdevP, StdevS

Creating a connection

The connector supports the following authentication types:

Name                          Applicable   Shareable
OAuth Connection              All regions  Not shareable
Service Principal Connection  All regions  Shareable
Default [DEPRECATED]          All regions  Not shareable

The Default [DEPRECATED] option is only for older connections without an explicit authentication type, and is provided only for backward compatibility.

OAuth Connection

Auth ID: oauth2-auth

Applicable: All regions

OAuth Connection

This connection is not shareable. If the Power App is shared with another user, that user is prompted to create a new connection explicitly.

Name Type Description Required
Server Hostname (Example: adb-3980263885549757139.2.azuredatabricks.net) string Server name of Databricks workspace True
HTTP Path (Example: /sql/1.0/warehouses/a9c4e781bd29f315) string HTTP Path of Databricks SQL Warehouse True

Service Principal Connection

Auth ID: oAuthClientCredentials

Applicable: All regions

Service Principal Connection

This connection is shareable. If the Power App is shared with another user, the connection is shared as well. For more information, see Connectors overview for canvas apps - Power Apps | Microsoft Docs.

Name Type Description Required
Client ID string True
Client Secret securestring True
Tenant string True
Server Hostname (Example: adb-3980263885549757139.2.azuredatabricks.net) string Server name of Databricks workspace True
HTTP Path (Example: /sql/1.0/warehouses/a9c4e781bd29f315) string HTTP Path of Databricks SQL Warehouse True

Default [DEPRECATED]

Applicable: All regions

This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility.

This connection is not shareable. If the Power App is shared with another user, that user is prompted to create a new connection explicitly.

Throttling Limits

Name Calls Renewal Period
API calls per connection 100 60 seconds

Actions

Cancel a run

Cancels a job run or a task run. The run is canceled asynchronously, so it may still be running when this request completes.

Cancel statement execution

Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.

Check status and get results

Get the status, manifest and results of the statement

Execute a SQL statement

Execute a SQL statement and optionally await its results for a specified time.

Get a single job run

Retrieves the metadata of a run. Large arrays in the results will be paginated when they exceed 100 elements. A request for a single run will return all properties for that run, and the first 100 elements of array properties (tasks, job_clusters, job_parameters and repair_history). Use the next_page_token field to check for more results and pass its value as the page_token in subsequent requests. If any array properties have more than 100 elements, additional results will be returned on subsequent requests. Arrays without additional results will be empty on later pages.

Get result by chunk index

After the statement execution has SUCCEEDED, this request can be used to fetch any chunk by index.
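Chunks are linked by the next_chunk_index field in each response (see SqlResultData under Definitions). A sketch of the pagination logic, with a stand-in fetch_chunk function in place of the real HTTP call to fetch a chunk by statement ID and index:

```python
def collect_rows(fetch_chunk):
    """Walk result chunks starting at index 0, following
    next_chunk_index until it is absent, and concatenate the
    data_array rows from each chunk.

    fetch_chunk(index) stands in for the real HTTP call and must
    return a dict shaped like SqlResultData on this page.
    """
    rows, index = [], 0
    while index is not None:
        chunk = fetch_chunk(index)
        rows.extend(chunk.get("data_array", []))
        index = chunk.get("next_chunk_index")  # absent on the last chunk
    return rows

# Two fake chunks emulating the response shape:
fake = {
    0: {"data_array": [["a"], ["b"]], "next_chunk_index": 1},
    1: {"data_array": [["c"]]},
}
print(collect_rows(lambda i: fake[i]))  # → [['a'], ['b'], ['c']]
```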

Get the output for a single run

Retrieve the output and metadata of a single task run. When a notebook task returns a value through the dbutils.notebook.exit() call, you can use this endpoint to retrieve that value. Azure Databricks restricts this API to returning the first 5 MB of the output. To return a larger result, you can store job results in a cloud storage service. This endpoint validates that the run_id parameter is valid and returns an HTTP status code 400 if the run_id parameter is invalid. Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you must save old run results before they expire.

List jobs

Retrieves a list of jobs.

Trigger a new job run

Run a job and return the run_id of the triggered run.

Cancel a run

Cancels a job run or a task run. The run is canceled asynchronously, so it may still be running when this request completes.

Parameters

Name Key Required Type Description
run_id
run_id True integer

This field is required.

Cancel statement execution

Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.

Parameters

Name Key Required Type Description
Statement ID
statement_id True string

Statement ID

Check status and get results

Get the status, manifest and results of the statement

Parameters

Name Key Required Type Description
Statement ID
statement_id True string

Statement ID

Returns

Statement execution response

Execute a SQL statement

Execute a SQL statement and optionally await its results for a specified time.

Parameters

Name Key Required Type Description
warehouse_id
warehouse_id True string

Target warehouse ID

statement
statement True string

The SQL statement to execute. The statement can optionally be parameterized, see parameters

name
name True string

Parameter marker name

type
type string

Parameter data type

value
value string

Parameter value

catalog
catalog string

Default catalog for execution

schema
schema string

Default schema for execution

disposition
disposition string

Result fetching mode

format
format string

Result set format

on_wait_timeout
on_wait_timeout string

Action on timeout

wait_timeout
wait_timeout string

Result wait timeout

byte_limit
byte_limit integer

Result byte limit

row_limit
row_limit integer

Result row limit

Returns

Statement execution response

Get a single job run

Retrieves the metadata of a run. Large arrays in the results will be paginated when they exceed 100 elements. A request for a single run will return all properties for that run, and the first 100 elements of array properties (tasks, job_clusters, job_parameters and repair_history). Use the next_page_token field to check for more results and pass its value as the page_token in subsequent requests. If any array properties have more than 100 elements, additional results will be returned on subsequent requests. Arrays without additional results will be empty on later pages.

Parameters

Name Key Required Type Description
Run ID
run_id True integer

The canonical identifier of the run for which to retrieve the metadata. This field is required.

Include History
include_history boolean

Whether to include the repair history in the response.

Include Resolved Values
include_resolved_values boolean

Whether to include resolved parameter values in the response.

Page Token
page_token string

Use next_page_token returned from the previous GetRun response to request the next page of the run's array properties.

Returns

Body
JobsRun

Get result by chunk index

After the statement execution has SUCCEEDED, this request can be used to fetch any chunk by index.

Parameters

Name Key Required Type Description
Statement ID
statement_id True string

Statement ID

Chunk index
chunk_index True string

Chunk index

Returns

Get the output for a single run

Retrieve the output and metadata of a single task run. When a notebook task returns a value through the dbutils.notebook.exit() call, you can use this endpoint to retrieve that value. Azure Databricks restricts this API to returning the first 5 MB of the output. To return a larger result, you can store job results in a cloud storage service. This endpoint validates that the run_id parameter is valid and returns an HTTP status code 400 if the run_id parameter is invalid. Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you must save old run results before they expire.

Parameters

Name Key Required Type Description
Run ID
run_id True integer

The canonical identifier for the run.

Returns

List jobs

Retrieves a list of jobs.

Parameters

Name Key Required Type Description
Limit
limit integer

The number of jobs to return. This value must be greater than 0 and less than or equal to 100. The default value is 20.

Expand Tasks
expand_tasks boolean

Whether to include task and cluster details in the response. Note that only the first 100 elements will be shown. Use :method:jobs/get to paginate through all tasks and clusters.

Job Name
name string

A filter on the list based on the exact (case insensitive) job name.

Page Token
page_token string

Use next_page_token or prev_page_token returned from the previous request to list the next or previous page of jobs respectively.

Returns

Trigger a new job run

Run a job and return the run_id of the triggered run.

Parameters

Name Key Required Type Description
idempotency_token
idempotency_token string

An optional token to guarantee the idempotency of job run requests. If a run with the provided token already exists, the request does not create a new run but returns the ID of the existing run instead. If a run with the provided token is deleted, an error is returned. If you specify the idempotency token, upon failure you can retry until the request succeeds. Azure Databricks guarantees that exactly one run is launched with that idempotency token. This token must have at most 64 characters. For more information, see How to ensure idempotency for jobs.

job_id
job_id True integer

The ID of the job to be executed

job_parameters
job_parameters object

Job-level parameters used in the run. For example, "param": "overriding_val".

only
only array of string

A list of task keys to run inside of the job. If this field is not provided, all tasks in the job will be run.

performance_target
performance_target string
full_refresh
full_refresh boolean

If true, triggers a full refresh on the delta live table.

enabled
enabled True boolean

If true, enable queueing for the job. This is a required field.

Returns

Definitions

Object

SqlBaseChunkInfo

Metadata for a result set chunk

Name Path Type Description
byte_count
byte_count integer

Number of bytes in the result chunk

chunk_index
chunk_index integer

Position in the sequence of result set chunks

row_count
row_count integer

Number of rows in the result chunk

row_offset
row_offset integer

Starting row offset in the result set

SqlColumnInfo

Name Path Type Description
name
name string

Column name

position
position integer

Column position (0-based)

type_interval_type
type_interval_type string

Interval type format

type_name
type_name SqlColumnInfoTypeName

The name of the base data type. This doesn't include details for complex types such as STRUCT, MAP or ARRAY.

type_precision
type_precision integer

Number of digits for DECIMAL type

type_scale
type_scale integer

Number of decimal places for DECIMAL type

type_text
type_text string

Full SQL type specification

SqlColumnInfoTypeName

The name of the base data type. This doesn't include details for complex types such as STRUCT, MAP or ARRAY.

The name of the base data type. This doesn't include details for complex types such as STRUCT, MAP or ARRAY.

SqlStatementResponse

Statement execution response

Name Path Type Description
manifest
manifest SqlResultManifest

Result set schema and metadata

result
result SqlResultData
statement_id
statement_id string

Statement ID

status
status SqlStatementStatus

Statement execution status

SqlResultManifest

Result set schema and metadata

Name Path Type Description
chunks
chunks array of SqlBaseChunkInfo

Result chunk metadata

format
format string
schema
schema SqlResultSchema

Result set column definitions

total_byte_count
total_byte_count integer

Total bytes in result set

total_chunk_count
total_chunk_count integer

Total number of chunks

total_row_count
total_row_count integer

Total number of rows

truncated
truncated boolean

Result truncation status

SqlStatementStatus

Statement execution status

Name Path Type Description
error
error SqlServiceError
state
state SqlStatementState

Statement execution state

SqlStatementState

Statement execution state

Statement execution state

SqlServiceError

Name Path Type Description
error_code
error_code string
message
message string

Error message

SqlResultSchema

Result set column definitions

Name Path Type Description
column_count
column_count integer
columns
columns array of SqlColumnInfo

SqlResultData

Name Path Type Description
byte_count
byte_count integer

Bytes in result chunk

chunk_index
chunk_index integer

Chunk position

data_array
data_array SqlJsonArray

Array of arrays with string values

external_links
external_links array of SqlExternalLink
next_chunk_index
next_chunk_index integer

Next chunk index

next_chunk_internal_link
next_chunk_internal_link string

Next chunk link

row_count
row_count integer

Rows in chunk

row_offset
row_offset integer

Starting row offset

SqlJsonArray

Array of arrays with string values

Name Path Type Description
Items
array of
Name Path Type Description
byte_count
byte_count integer

Bytes in chunk

chunk_index
chunk_index integer

Chunk position

expiration
expiration date-time

Link expiration time

external_link
external_link string
http_headers
http_headers object

Required HTTP headers

next_chunk_index
next_chunk_index integer

Next chunk index

next_chunk_internal_link
next_chunk_internal_link string

Next chunk link

row_count
row_count integer

Rows in chunk

row_offset
row_offset integer

Starting row offset

JobsRunNowResponse

Name Path Type Description
run_id
run_id integer

The globally unique ID of the newly triggered run.

JobsPerformanceTarget

JobsPipelineParams

Name Path Type Description
full_refresh
full_refresh boolean

If true, triggers a full refresh on the delta live table.

JobsQueueSettings

Name Path Type Description
enabled
enabled boolean

If true, enable queueing for the job. This is a required field.

JobsListJobsResponse

Name Path Type Description
jobs
jobs array of JobsBaseJob

The list of jobs. Only included in the response if there are jobs to list.

next_page_token
next_page_token string

A token that can be used to list the next page of jobs (if applicable).

prev_page_token
prev_page_token string

A token that can be used to list the previous page of jobs (if applicable).

JobsBaseJob

Name Path Type Description
created_time
created_time integer

The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC).

creator_user_name
creator_user_name string

The creator user name. This field won’t be included in the response if the user has already been deleted.

effective_budget_policy_id
effective_budget_policy_id uuid

The id of the budget policy used by this job for cost attribution purposes. This may be set through (in order of precedence): 1. Budget admins through the account or workspace console 2. Jobs UI in the job details page and Jobs API using budget_policy_id 3. Inferred default based on accessible budget policies of the run_as identity on job creation or modification.

has_more
has_more boolean

Indicates if the job has more array properties (tasks, job_clusters) that are not shown. They can be accessed via :method:jobs/get endpoint. It is only relevant for API 2.2 :method:jobs/list requests with expand_tasks=true.

job_id
job_id integer

The canonical identifier for this job.

settings
settings JobsJobSettings
trigger_state
trigger_state JobsTriggerStateProto

JobsJobSettings

Name Path Type Description
budget_policy_id
budget_policy_id uuid

The id of the user specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. See effective_budget_policy_id for the budget policy used by this workload.

continuous
continuous JobsContinuous
deployment
deployment JobsJobDeployment
description
description string

An optional description for the job. The maximum length is 27700 characters in UTF-8 encoding.

edit_mode
edit_mode JobsJobEditMode
email_notifications
email_notifications JobsJobEmailNotifications
environments
environments array of JobsJobEnvironment

A list of task execution environment specifications that can be referenced by serverless tasks of this job. An environment is required to be present for serverless tasks. For serverless notebook tasks, the environment is accessible in the notebook environment panel. For other serverless tasks, the task environment is required to be specified using environment_key in the task settings.

git_source
git_source JobsGitSource
health
health JobsJobsHealthRules
job_clusters
job_clusters array of JobsJobCluster

A list of job cluster specifications that can be shared and reused by tasks of this job. Libraries cannot be declared in a shared job cluster. You must declare dependent libraries in task settings.

max_concurrent_runs
max_concurrent_runs integer

An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs which differ by their input parameters. This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs are skipped unless there are fewer than 3 active runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped.

name
name string

An optional name for the job. The maximum length is 4096 bytes in UTF-8 encoding.

notification_settings
notification_settings JobsJobNotificationSettings
parameters
parameters array of JobsJobParameterDefinition

Job-level parameter definitions

performance_target
performance_target JobsPerformanceTarget
queue
queue JobsQueueSettings
run_as
run_as JobsJobRunAs
schedule
schedule JobsCronSchedule
tags
tags object

A map of tags associated with the job. These are forwarded to the cluster as cluster tags for jobs clusters, and are subject to the same limitations as cluster tags. A maximum of 25 tags can be added to the job.

tasks
tasks array of JobsTask

A list of task specifications to be executed by this job. It supports up to 1000 elements in write endpoints (jobs/create, jobs/reset, jobs/update, jobs/submit). Read endpoints return only 100 tasks. If more than 100 tasks are available, you can paginate through them using jobs/get. Use the next_page_token field at the object root to determine if more results are available.
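The pagination behavior described above can be sketched as a small helper. The fetch_page callable is a hypothetical stand-in for a call to the jobs/get endpoint, not part of any SDK:

```python
def collect_tasks(fetch_page):
    """Collect all tasks from a paginated jobs/get-style response.

    fetch_page(page_token) is a hypothetical stand-in for the API call;
    it returns a dict with "tasks" and an optional "next_page_token".
    """
    tasks, token = [], None
    while True:
        page = fetch_page(token)
        tasks.extend(page.get("tasks", []))
        token = page.get("next_page_token")
        if not token:  # no more pages available
            return tasks
```

The same loop applies to any Jobs API response that carries a next_page_token at the object root.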

timeout_seconds
timeout_seconds integer

An optional timeout applied to each run of this job. A value of 0 means no timeout.

trigger
trigger JobsTriggerSettings
webhook_notifications
webhook_notifications JobsWebhookNotifications

JobsContinuous

Name Path Type Description
pause_status
pause_status JobsPauseStatus

JobsPauseStatus

JobsJobDeployment

Name Path Type Description
kind
kind JobsJobDeploymentKind
metadata_file_path
metadata_file_path string

Path of the file that contains deployment metadata.

JobsJobDeploymentKind

JobsJobEditMode

JobsJobEmailNotifications

Name Path Type Description
on_duration_warning_threshold_exceeded
on_duration_warning_threshold_exceeded array of string

A list of email addresses to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. If no rule for the RUN_DURATION_SECONDS metric is specified in the health field for the job, notifications are not sent.

on_failure
on_failure array of string

A list of email addresses to be notified when a run unsuccessfully completes. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a FAILED or TIMED_OUT result_state. If this is not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_start
on_start array of string

A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_streaming_backlog_exceeded
on_streaming_backlog_exceeded array of string

A list of email addresses to notify when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes.

on_success
on_success array of string

A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.
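Taken together, a JobsJobEmailNotifications object is a set of address lists keyed by event. A minimal sketch (all addresses illustrative):

```python
email_notifications = {
    "on_start": ["ops@example.com"],
    "on_success": ["ops@example.com"],
    "on_failure": ["ops@example.com", "oncall@example.com"],
    # Only sent if a RUN_DURATION_SECONDS rule exists in the health field.
    "on_duration_warning_threshold_exceeded": ["oncall@example.com"],
}
```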

JobsJobEnvironment

Name Path Type Description
environment_key
environment_key string

The key of an environment. It has to be unique within a job.

spec
spec ComputeEnvironment

ComputeEnvironment

Name Path Type Description
dependencies
dependencies array of string

List of pip dependencies, as supported by the version of pip in this environment. Each dependency is a valid pip requirements file line per https://pip.pypa.io/en/stable/reference/requirements-file-format/. Allowed dependencies include a requirement specifier, an archive URL, a local project path (such as WSFS or UC Volumes in Azure Databricks), or a VCS project URL.

environment_version
environment_version string

Required. Environment version used by the environment. Each version comes with a specific Python version and a set of Python packages. The version is a string, consisting of an integer. See https://learn.microsoft.com/azure/databricks/release-notes/serverless/#serverless-environment-versions.
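As a sketch, a JobsJobEnvironment pairs an environment_key with a ComputeEnvironment spec; each dependency line follows pip requirements syntax (all values illustrative):

```python
environment = {
    "environment_key": "default_env",  # must be unique within the job
    "spec": {
        "environment_version": "2",  # serverless environment version, as a string
        "dependencies": [
            "simplejson==3.8.0",  # requirement specifier
            # UC Volumes path to a wheel (hypothetical path):
            "/Volumes/main/libs/my_pkg-1.0-py3-none-any.whl",
        ],
    },
}
```

Serverless tasks then reference this environment via environment_key in their task settings.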

JobsGitSource

Name Path Type Description
git_branch
git_branch string

Name of the branch to be checked out and used by this job. This field cannot be specified in conjunction with git_tag or git_commit.

git_commit
git_commit string

Commit to be checked out and used by this job. This field cannot be specified in conjunction with git_branch or git_tag.

git_provider
git_provider JobsGitProvider
git_snapshot
git_snapshot JobsGitSnapshot
git_tag
git_tag string

Name of the tag to be checked out and used by this job. This field cannot be specified in conjunction with git_branch or git_commit.

git_url
git_url string

URL of the repository to be cloned by this job.
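Because git_branch, git_tag, and git_commit are mutually exclusive, a client-side check can catch invalid JobsGitSource payloads before submission. This validator is a sketch, not part of any SDK (URL and provider value illustrative):

```python
def validate_git_source(src):
    """Return True if git_url is set and at most one ref field is used."""
    refs = [k for k in ("git_branch", "git_tag", "git_commit") if src.get(k)]
    return bool(src.get("git_url")) and len(refs) <= 1

git_source = {
    "git_url": "https://github.com/example/repo",
    "git_provider": "gitHub",
    "git_branch": "main",
}
```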

JobsGitProvider

JobsGitSnapshot

Name Path Type Description
used_commit
used_commit string

Commit that was used to execute the run. If git_branch was specified, this points to the HEAD of the branch at the time of the run; if git_tag was specified, this points to the commit the tag points to.

JobsJobsHealthRules

Name Path Type Description
rules
rules array of JobsJobsHealthRule

JobsJobsHealthRule

Name Path Type Description
metric
metric JobsJobsHealthMetric
op
op JobsJobsHealthOperator
value
value integer

Specifies the threshold value that the health metric should obey to satisfy the health rule.
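Putting the three fields together, a JobsJobsHealthRules payload might look like this (metric and operator values are illustrative examples of the documented enums):

```python
health = {
    "rules": [
        {
            "metric": "RUN_DURATION_SECONDS",
            "op": "GREATER_THAN",
            "value": 3600,  # alert when a run exceeds one hour
        }
    ]
}
```

A rule like this is also what enables the on_duration_warning_threshold_exceeded notifications described earlier.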

JobsJobsHealthMetric

JobsJobsHealthOperator

JobsJobCluster

Name Path Type Description
job_cluster_key
job_cluster_key string

A unique name for the job cluster. This field is required and must be unique within the job. JobTaskSettings may refer to this field to determine which cluster to launch for the task execution.

new_cluster
new_cluster ComputeClusterSpec

ComputeClusterSpec

Name Path Type Description
apply_policy_default_values
apply_policy_default_values boolean

When set to true, fixed and default values from the policy will be used for fields that are omitted. When set to false, only fixed values from the policy will be applied.

autoscale
autoscale ComputeAutoScale
autotermination_minutes
autotermination_minutes integer

Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination.

azure_attributes
azure_attributes ComputeAzureAttributes
cluster_log_conf
cluster_log_conf ComputeClusterLogConf
cluster_name
cluster_name string

Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name will be an empty string. For job clusters, the cluster name is automatically set based on the job and job run IDs.

custom_tags
custom_tags object

Additional tags for cluster resources. Azure Databricks tags all cluster resources (such as VMs and disk volumes) with these tags in addition to default_tags. Notes: currently, Azure Databricks allows at most 45 custom tags; clusters can only reuse cloud resources if the resources' tags are a subset of the cluster tags.

data_security_mode
data_security_mode ComputeDataSecurityMode
docker_image
docker_image ComputeDockerImage
driver_instance_pool_id
driver_instance_pool_id string

The optional ID of the instance pool to which the driver of the cluster belongs. The cluster uses the instance pool with ID instance_pool_id if the driver pool is not assigned.

driver_node_type_id
driver_node_type_id string

The node type of the Spark driver. This field is optional; if unset, the driver node type is set to the same value as node_type_id defined above. This field, along with node_type_id, should not be set if virtual_cluster_size is set. If driver_node_type_id, node_type_id, and virtual_cluster_size are all specified, driver_node_type_id and node_type_id take precedence.

enable_elastic_disk
enable_elastic_disk boolean

Autoscaling Local Storage: when enabled, this cluster dynamically acquires additional disk space when its Spark workers are running low on disk space. This feature requires specific cloud permissions to function correctly; refer to the User Guide for more details.

enable_local_disk_encryption
enable_local_disk_encryption boolean

Whether to enable LUKS on cluster VMs' local disks

init_scripts
init_scripts array of ComputeInitScriptInfo

The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

instance_pool_id
instance_pool_id string

The optional ID of the instance pool to which the cluster belongs.

is_single_node
is_single_node boolean

This field can only be used when kind = CLASSIC_PREVIEW. When set to true, Azure Databricks automatically sets single-node related custom_tags, spark_conf, and num_workers.

kind
kind ComputeKind
node_type_id
node_type_id string

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the clusters/listNodeTypes API call.

num_workers
num_workers integer

Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes. Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info will gradually increase from 5 to 10 as the new nodes are provisioned.

policy_id
policy_id string

The ID of the cluster policy used to create the cluster if applicable.

runtime_engine
runtime_engine ComputeRuntimeEngine
single_user_name
single_user_name string

Single user name if data_security_mode is SINGLE_USER

spark_conf
spark_conf object

An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

spark_env_vars
spark_env_vars object

An object containing a set of optional, user-specified environment variable key-value pairs. Note that a key-value pair of the form (X, Y) is exported as is (i.e., export X='Y') while launching the driver and workers. To specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the example below. This ensures that all default Databricks-managed environment variables are included as well. Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

spark_version
spark_version string

The Spark version of the cluster, e.g. 3.3.x-scala2.11. A list of available Spark versions can be retrieved by using the clusters/sparkVersions API call.

ssh_public_keys
ssh_public_keys array of string

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified.

use_ml_runtime
use_ml_runtime boolean

This field can only be used when kind = CLASSIC_PREVIEW. effective_spark_version is determined by spark_version (DBR release), this field use_ml_runtime, and whether node_type_id is a GPU node.

workload_type
workload_type ComputeWorkloadType

ComputeAutoScale

Name Path Type Description
max_workers
max_workers integer

The maximum number of workers to which the cluster can scale up when overloaded. Note that max_workers must be strictly greater than min_workers.

min_workers
min_workers integer

The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation.
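The max_workers > min_workers constraint can be expressed as a small check on a ComputeAutoScale payload. This validator is a sketch, not an SDK API:

```python
def valid_autoscale(a):
    """Check that max_workers is strictly greater than min_workers."""
    return 0 <= a["min_workers"] < a["max_workers"]

autoscale = {"min_workers": 2, "max_workers": 8}
```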

ComputeAzureAttributes

Name Path Type Description
availability
availability ComputeAzureAvailability
first_on_demand
first_on_demand integer

The first first_on_demand nodes of the cluster will be placed on on-demand instances. This value should be greater than 0, to make sure the cluster driver node is placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on availability instances. Note that this value does not affect cluster size and cannot currently be mutated over the lifetime of a cluster.

log_analytics_info
log_analytics_info ComputeLogAnalyticsInfo
spot_bid_max_price
spot_bid_max_price double

The max bid price to be used for Azure spot instances. The max price for the bid cannot be higher than the on-demand price of the instance. If not specified, the default value is -1, which specifies that the instance cannot be evicted on the basis of price, only on the basis of availability. Further, the value must be greater than 0 or exactly -1.

ComputeAzureAvailability

ComputeLogAnalyticsInfo

Name Path Type Description
log_analytics_primary_key
log_analytics_primary_key string
log_analytics_workspace_id
log_analytics_workspace_id string

ComputeClusterLogConf

Name Path Type Description
dbfs
dbfs ComputeDbfsStorageInfo
volumes
volumes ComputeVolumesStorageInfo

ComputeDbfsStorageInfo

Name Path Type Description
destination
destination string

DBFS destination, e.g. dbfs:/my/path

ComputeVolumesStorageInfo

Name Path Type Description
destination
destination string

UC Volumes destination, e.g. /Volumes/catalog/schema/vol1/init-scripts/setup-datadog.sh or dbfs:/Volumes/catalog/schema/vol1/init-scripts/setup-datadog.sh

ComputeDataSecurityMode

ComputeDockerImage

Name Path Type Description
basic_auth
basic_auth ComputeDockerBasicAuth
url
url string

URL of the docker image.

ComputeDockerBasicAuth

Name Path Type Description
password
password string

Password of the user

username
username string

Name of the user

ComputeInitScriptInfo

Name Path Type Description
abfss
abfss ComputeAdlsgen2Info
file
file ComputeLocalFileInfo
gcs
gcs ComputeGcsStorageInfo
volumes
volumes ComputeVolumesStorageInfo
workspace
workspace ComputeWorkspaceStorageInfo

ComputeAdlsgen2Info

Name Path Type Description
destination
destination string

abfss destination, e.g. abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>.

ComputeLocalFileInfo

Name Path Type Description
destination
destination string

local file destination, e.g. file:/my/local/file.sh

ComputeGcsStorageInfo

Name Path Type Description
destination
destination string

GCS destination/URI, e.g. gs://my-bucket/some-prefix

ComputeWorkspaceStorageInfo

Name Path Type Description
destination
destination string

WSFS destination, e.g. workspace:/cluster-init-scripts/setup-datadog.sh

ComputeKind

ComputeRuntimeEngine

ComputeWorkloadType

Name Path Type Description
clients
clients ComputeClientsTypes

ComputeClientsTypes

Name Path Type Description
jobs
jobs boolean

With jobs set, the cluster can be used for jobs

notebooks
notebooks boolean

With notebooks set, this cluster can be used for notebooks

JobsJobNotificationSettings

Name Path Type Description
no_alert_for_canceled_runs
no_alert_for_canceled_runs boolean

If true, do not send notifications to recipients specified in on_failure if the run is canceled.

no_alert_for_skipped_runs
no_alert_for_skipped_runs boolean

If true, do not send notifications to recipients specified in on_failure if the run is skipped.

JobsJobParameterDefinition

Name Path Type Description
default
default string

Default value of the parameter.

name
name string

The name of the defined parameter. May only contain alphanumeric characters, _, -, and .

JobsJobRunAs

Name Path Type Description
service_principal_name
service_principal_name string

Application ID of an active service principal. Setting this field requires the servicePrincipal/user role.

user_name
user_name string

The email of an active workspace user. Non-admin users can only set this field to their own email.

JobsCronSchedule

Name Path Type Description
pause_status
pause_status JobsPauseStatus
quartz_cron_expression
quartz_cron_expression string

A Cron expression using Quartz syntax that describes the schedule for a job. See Cron Trigger for details. This field is required.

timezone_id
timezone_id string

A Java timezone ID. The schedule for a job is resolved with respect to this timezone. See Java TimeZone for details. This field is required.
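A complete JobsCronSchedule combines the required Quartz expression and timezone ID with an optional pause status, for example (values illustrative):

```python
schedule = {
    # Quartz syntax: second minute hour day-of-month month day-of-week
    "quartz_cron_expression": "0 0 6 * * ?",  # daily at 06:00
    "timezone_id": "Europe/Berlin",           # Java timezone ID
    "pause_status": "UNPAUSED",
}
```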

JobsTask

Name Path Type Description
clean_rooms_notebook_task
clean_rooms_notebook_task Object
condition_task
condition_task JobsConditionTask
dashboard_task
dashboard_task JobsDashboardTask
dbt_task
dbt_task Object
depends_on
depends_on array of JobsTaskDependency

An optional array of objects specifying the dependency graph of the task. All tasks specified in this field must complete before executing this task. The task will run only if the run_if condition is true. The key is task_key, and the value is the name assigned to the dependent task.

description
description string

An optional description for this task.

disable_auto_optimization
disable_auto_optimization boolean

An option to disable auto optimization in serverless compute.

email_notifications
email_notifications JobsTaskEmailNotifications
environment_key
environment_key string

The key that references an environment spec in a job. This field is required for Python script, Python wheel and dbt tasks when using serverless compute.

existing_cluster_id
existing_cluster_id string

If set, the ID of an existing cluster that is used for all runs. When running jobs or tasks on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs and tasks on new clusters for greater reliability.

for_each_task
for_each_task JobsForEachTask
health
health JobsJobsHealthRules
job_cluster_key
job_cluster_key string

If set, this task is executed reusing the cluster specified in job.settings.job_clusters.

libraries
libraries array of ComputeLibrary

An optional list of libraries to be installed on the cluster. The default value is an empty list.

max_retries
max_retries integer

An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry.

min_retry_interval_millis
min_retry_interval_millis integer

An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried.

new_cluster
new_cluster ComputeClusterSpec
notebook_task
notebook_task JobsNotebookTask
notification_settings
notification_settings JobsTaskNotificationSettings
pipeline_task
pipeline_task JobsPipelineTask
power_bi_task
power_bi_task Object
python_wheel_task
python_wheel_task JobsPythonWheelTask
retry_on_timeout
retry_on_timeout boolean

An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout.

run_if
run_if JobsRunIf
run_job_task
run_job_task JobsRunJobTask
spark_jar_task
spark_jar_task JobsSparkJarTask
spark_python_task
spark_python_task JobsSparkPythonTask
spark_submit_task
spark_submit_task JobsSparkSubmitTask
sql_task
sql_task Object
task_key
task_key string

A unique name for the task. This field is used to refer to this task from other tasks. This field is required and must be unique within its parent job. On Update or Reset, this field is used to reference the tasks to be updated or reset.

timeout_seconds
timeout_seconds integer

An optional timeout applied to each run of this job task. A value of 0 means no timeout.

webhook_notifications
webhook_notifications JobsWebhookNotifications

JobsConditionTask

Name Path Type Description
left
left string

The left operand of the condition task. Can be either a string value or a job state or parameter reference.

op
op JobsConditionTaskOp
right
right string

The right operand of the condition task. Can be either a string value or a job state or parameter reference.
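For example, a JobsConditionTask that compares a job parameter against a literal might be written as follows; the {{job.parameters...}} reference syntax and EQUAL_TO operator are illustrative of the documented forms:

```python
condition_task = {
    "left": "{{job.parameters.env}}",  # parameter reference as left operand
    "op": "EQUAL_TO",
    "right": "production",             # literal string as right operand
}
```

Downstream tasks can then depend on this task with an outcome of "true" or "false" in their depends_on entries.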

JobsConditionTaskOp

JobsDashboardTask

Name Path Type Description
dashboard_id
dashboard_id string

The identifier of the dashboard to refresh.

subscription
subscription JobsSubscription
warehouse_id
warehouse_id string

Optional: The warehouse id to execute the dashboard with for the schedule. If not specified, the default warehouse of the dashboard will be used.

JobsSubscription

Name Path Type Description
custom_subject
custom_subject string

Optional: Allows users to specify a custom subject line on the email sent to subscribers.

paused
paused boolean

When true, the subscription will not send emails.

subscribers
subscribers array of JobsSubscriptionSubscriber

The list of subscribers to send the snapshot of the dashboard to.

JobsSubscriptionSubscriber

Name Path Type Description
destination_id
destination_id string

A snapshot of the dashboard will be sent to the destination when the destination_id field is present.

user_name
user_name string

A snapshot of the dashboard will be sent to the user's email when the user_name field is present.

JobsSource

JobsTaskDependency

Name Path Type Description
outcome
outcome string

Can only be specified on condition task dependencies. The outcome of the dependent task that must be met for this task to run.

task_key
task_key string

The name of the task this task depends on.

JobsTaskEmailNotifications

Name Path Type Description
on_duration_warning_threshold_exceeded
on_duration_warning_threshold_exceeded array of string

A list of email addresses to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. If no rule for the RUN_DURATION_SECONDS metric is specified in the health field for the job, notifications are not sent.

on_failure
on_failure array of string

A list of email addresses to be notified when a run unsuccessfully completes. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a FAILED or TIMED_OUT result_state. If this is not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_start
on_start array of string

A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_streaming_backlog_exceeded
on_streaming_backlog_exceeded array of string

A list of email addresses to notify when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes.

on_success
on_success array of string

A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

ComputeLibrary

Name Path Type Description
cran
cran ComputeRCranLibrary
jar
jar string

URI of the JAR library to install. Supported URIs include Workspace paths, Unity Catalog Volumes paths, and ADLS URIs. For example: { "jar": "/Workspace/path/to/library.jar" }, { "jar" : "/Volumes/path/to/library.jar" } or { "jar": "abfss://my-bucket/library.jar" }. If ADLS is used, please make sure the cluster has read access on the library. You may need to launch the cluster with a Microsoft Entra ID service principal to access the ADLS URI.

maven
maven ComputeMavenLibrary
pypi
pypi ComputePythonPyPiLibrary
requirements
requirements string

URI of the requirements.txt file to install. Only Workspace paths and Unity Catalog Volumes paths are supported. For example: { "requirements": "/Workspace/path/to/requirements.txt" } or { "requirements" : "/Volumes/path/to/requirements.txt" }

whl
whl string

URI of the wheel library to install. Supported URIs include Workspace paths, Unity Catalog Volumes paths, and ADLS URIs. For example: { "whl": "/Workspace/path/to/library.whl" }, { "whl" : "/Volumes/path/to/library.whl" } or { "whl": "abfss://my-bucket/library.whl" }. If ADLS is used, please make sure the cluster has read access on the library. You may need to launch the cluster with a Microsoft Entra ID service principal to access the ADLS URI.
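A task's libraries field mixes these source types freely; one illustrative list (all paths and coordinates hypothetical):

```python
libraries = [
    {"whl": "/Volumes/main/libs/my_pkg-1.0-py3-none-any.whl"},  # UC Volumes wheel
    {"pypi": {"package": "simplejson==3.8.0"}},                 # pinned PyPI package
    {"maven": {"coordinates": "org.jsoup:jsoup:1.7.2"}},        # Maven coordinates
    {"requirements": "/Workspace/path/to/requirements.txt"},    # requirements file
]
```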

JobsForEachTask

Name Path Type Description
concurrency
concurrency integer

An optional maximum allowed number of concurrent runs of the task. Set this value if you want to be able to execute multiple runs of the task concurrently.

inputs
inputs string

Array for task to iterate on. This can be a JSON string or a reference to an array parameter.

task
task Object

ComputeRCranLibrary

Name Path Type Description
package
package string

The name of the CRAN package to install.

repo
repo string

The repository where the package can be found. If not specified, the default CRAN repo is used.

ComputeMavenLibrary

Name Path Type Description
coordinates
coordinates string

Gradle-style maven coordinates. For example: "org.jsoup:jsoup:1.7.2".

exclusions
exclusions array of string

List of dependences to exclude. For example: ["slf4j:slf4j", "*:hadoop-client"]. Maven dependency exclusions: https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

repo
repo string

Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched.

ComputePythonPyPiLibrary

Name Path Type Description
package
package string

The name of the pypi package to install. An optional exact version specification is also supported. Examples: "simplejson" and "simplejson==3.8.0".

repo
repo string

The repository where the package can be found. If not specified, the default pip index is used.

JobsNotebookTask

Name Path Type Description
base_parameters
base_parameters object

Base parameters to be used for each run of this job. If the run is initiated by a call to jobs/runNow with parameters specified, the two parameter maps are merged. If the same key is specified in base_parameters and in run-now, the value from run-now is used. Use Task parameter variables to set parameters containing information about job runs. If the notebook takes a parameter that is not specified in the job's base_parameters or the run-now override parameters, the default value from the notebook is used. Retrieve these parameters in a notebook using dbutils.widgets.get. The JSON representation of this field cannot exceed 1MB.

notebook_path
notebook_path string

The path of the notebook to be run in the Azure Databricks workspace or remote repository. For notebooks stored in the Azure Databricks workspace, the path must be absolute and begin with a slash. For notebooks stored in a remote repository, the path must be relative. This field is required.

source
source JobsSource
warehouse_id
warehouse_id string

Optional warehouse_id to run the notebook on a SQL warehouse. Classic SQL warehouses are NOT supported, please use serverless or pro SQL warehouses. Note that SQL warehouses only support SQL cells; if the notebook contains non-SQL cells, the run will fail.
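The base_parameters merge rule for notebook tasks (run-now values win over base_parameters) can be sketched as:

```python
def effective_parameters(base_parameters, run_now_parameters):
    """Merge notebook base parameters with run-now overrides.

    Keys present in both maps take the run-now value, matching the
    documented merge behavior for notebook tasks.
    """
    merged = dict(base_parameters)
    merged.update(run_now_parameters)
    return merged
```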

JobsTaskNotificationSettings

Name Path Type Description
alert_on_last_attempt
alert_on_last_attempt boolean

If true, do not send notifications to recipients specified in on_start for the retried runs and do not send notifications to recipients specified in on_failure until the last retry of the run.

no_alert_for_canceled_runs
no_alert_for_canceled_runs boolean

If true, do not send notifications to recipients specified in on_failure if the run is canceled.

no_alert_for_skipped_runs
no_alert_for_skipped_runs boolean

If true, do not send notifications to recipients specified in on_failure if the run is skipped.

JobsPipelineTask

Name Path Type Description
full_refresh
full_refresh boolean

If true, triggers a full refresh of the pipeline.

pipeline_id
pipeline_id string

The full name of the pipeline task to execute.

JobsPythonWheelTask

Name Path Type Description
entry_point
entry_point string

Named entry point to use. If it does not exist in the metadata of the package, the function is executed from the package directly using $packageName.$entryPoint().

named_parameters
named_parameters object

Command-line parameters passed to Python wheel task in the form of ["--name=task", "--data=dbfs:/path/to/data.json"]. Leave it empty if parameters is not null.

package_name
package_name string

Name of the package to execute

parameters
parameters array of string

Command-line parameters passed to Python wheel task. Leave it empty if named_parameters is not null.
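Since parameters and named_parameters are mutually exclusive, a JobsPythonWheelTask typically sets one or the other. A sketch with hypothetical package and parameter names:

```python
python_wheel_task = {
    "package_name": "my_pkg",  # hypothetical package name
    "entry_point": "main",
    # named_parameters is set, so "parameters" is omitted:
    "named_parameters": {"name": "task", "data": "dbfs:/path/to/data.json"},
}
```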

JobsRunIf

JobsRunJobTask

Name Path Type Description
job_id
job_id integer

ID of the job to trigger.

job_parameters
job_parameters object

Job-level parameters used to trigger the job.

pipeline_params
pipeline_params JobsPipelineParams

JobsSparkJarTask

Name Path Type Description
main_class_name
main_class_name string

The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code must use SparkContext.getOrCreate to obtain a Spark context; otherwise, runs of the job fail.

parameters
parameters array of string

Parameters passed to the main method. Use Task parameter variables to set parameters containing information about job runs.

JobsSparkPythonTask

Name Path Type Description
parameters
parameters array of string

Command line parameters passed to the Python file. Use Task parameter variables to set parameters containing information about job runs.

python_file
python_file string

The Python file to be executed. Cloud file URIs (such as dbfs:/, s3:/, adls:/, gcs:/) and workspace paths are supported. For Python files stored in the Azure Databricks workspace, the path must be absolute and begin with /. For files stored in a remote repository, the path must be relative. This field is required.

source
source JobsSource

JobsSparkSubmitTask

Name Path Type Description
parameters
parameters array of string

Command-line parameters passed to spark submit. Use Task parameter variables to set parameters containing information about job runs.

JobsWebhookNotifications

| Name | Path | Type | Description |
|------|------|------|-------------|
| on_duration_warning_threshold_exceeded | on_duration_warning_threshold_exceeded | array of JobsWebhook | An optional list of system notification IDs to call when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. A maximum of 3 destinations can be specified for the on_duration_warning_threshold_exceeded property. |
| on_failure | on_failure | array of JobsWebhook | An optional list of system notification IDs to call when the run fails. A maximum of 3 destinations can be specified for the on_failure property. |
| on_start | on_start | array of JobsWebhook | An optional list of system notification IDs to call when the run starts. A maximum of 3 destinations can be specified for the on_start property. |
| on_streaming_backlog_exceeded | on_streaming_backlog_exceeded | array of JobsWebhook | An optional list of system notification IDs to call when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes. A maximum of 3 destinations can be specified for the on_streaming_backlog_exceeded property. |
| on_success | on_success | array of JobsWebhook | An optional list of system notification IDs to call when the run completes successfully. A maximum of 3 destinations can be specified for the on_success property. |
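
Each notification property shares the same limit of at most 3 destinations. A sketch of that constraint, with placeholder webhook IDs:

```python
# Sketch of the "maximum of 3 destinations per property" rule from the
# JobsWebhookNotifications schema. The webhook IDs are placeholders.
webhook_notifications = {
    "on_start":   [{"id": "wh-1"}],
    "on_success": [{"id": "wh-1"}, {"id": "wh-2"}],
    "on_failure": [{"id": "wh-1"}, {"id": "wh-2"}, {"id": "wh-3"}],
}

for prop, hooks in webhook_notifications.items():
    assert len(hooks) <= 3, f"{prop}: a maximum of 3 destinations is allowed"
```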

JobsWebhook

| Name | Path | Type | Description |
|------|------|------|-------------|
| id | id | string | |

JobsTriggerSettings

| Name | Path | Type | Description |
|------|------|------|-------------|
| file_arrival | file_arrival | JobsFileArrivalTriggerConfiguration | |
| pause_status | pause_status | JobsPauseStatus | |
| periodic | periodic | JobsPeriodicTriggerConfiguration | |

JobsFileArrivalTriggerConfiguration

| Name | Path | Type | Description |
|------|------|------|-------------|
| min_time_between_triggers_seconds | min_time_between_triggers_seconds | integer | If set, the trigger starts a run only after the specified amount of time has passed since the last time the trigger fired. The minimum allowed value is 60 seconds. |
| url | url | string | URL to be monitored for file arrivals. The path must point to the root or a subpath of the external location. |
| wait_after_last_change_seconds | wait_after_last_change_seconds | integer | If set, the trigger starts a run only after no file activity has occurred for the specified amount of time. This makes it possible to wait for a batch of incoming files to arrive before triggering a run. The minimum allowed value is 60 seconds. |
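
A file-arrival trigger sketch under the schema above; the external location URL is a placeholder, and both timing fields share the documented 60-second minimum:

```python
# Hypothetical file-arrival trigger, per JobsFileArrivalTriggerConfiguration.
# The URL is a placeholder external location path.
file_arrival = {
    "url": "abfss://landing@example.dfs.core.windows.net/incoming/",
    "min_time_between_triggers_seconds": 300,  # >= 60: throttles trigger frequency
    "wait_after_last_change_seconds": 120,     # >= 60: waits for a quiet period
}

# Both timing fields share a 60-second minimum.
for key in ("min_time_between_triggers_seconds", "wait_after_last_change_seconds"):
    assert file_arrival[key] >= 60
```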

JobsPeriodicTriggerConfiguration

| Name | Path | Type | Description |
|------|------|------|-------------|
| interval | interval | integer | The interval at which the trigger should run. |
| unit | unit | JobsPeriodicTriggerConfigurationTimeUnit | |
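
For illustration, a periodic trigger that fires every 4 hours might look like this; the "HOURS" value is an assumption about the JobsPeriodicTriggerConfigurationTimeUnit enum:

```python
# Hypothetical periodic trigger, per JobsPeriodicTriggerConfiguration.
# "HOURS" is an assumed value of JobsPeriodicTriggerConfigurationTimeUnit.
periodic = {"interval": 4, "unit": "HOURS"}

assert periodic["interval"] > 0
```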

JobsPeriodicTriggerConfigurationTimeUnit

JobsTriggerStateProto

| Name | Path | Type | Description |
|------|------|------|-------------|
| file_arrival | file_arrival | JobsFileArrivalTriggerState | |

JobsFileArrivalTriggerState

| Name | Path | Type | Description |
|------|------|------|-------------|
| using_file_events | using_file_events | boolean | Indicates whether the trigger leverages file events to detect file arrivals. |

JobsRun

| Name | Path | Type | Description |
|------|------|------|-------------|
| attempt_number | attempt_number | integer | The sequence number of this run attempt for a triggered job run. The initial attempt of a run has an attempt_number of 0. If the initial run attempt fails, and the job has a retry policy (max_retries > 0), subsequent runs are created with an original_attempt_run_id of the original attempt's ID and an incrementing attempt_number. Runs are retried only until they succeed, and the maximum attempt_number is the same as the max_retries value for the job. |
| cleanup_duration | cleanup_duration | integer | The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The cleanup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
| cluster_instance | cluster_instance | JobsClusterInstance | |
| cluster_spec | cluster_spec | JobsClusterSpec | |
| creator_user_name | creator_user_name | string | The creator user name. This field won't be included in the response if the user has already been deleted. |
| description | description | string | Description of the run. |
| effective_performance_target | effective_performance_target | JobsPerformanceTarget | |
| end_time | end_time | integer | The time at which this run ended, in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field is set to 0 if the job is still running. |
| execution_duration | execution_duration | integer | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The execution_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
| git_source | git_source | JobsGitSource | |
| has_more | has_more | boolean | Indicates if the run has more array properties (tasks, job_clusters) that are not shown. They can be accessed via the :method:jobs/getrun endpoint. It is only relevant for API 2.2 :method:jobs/listruns requests with expand_tasks=true. |
| job_clusters | job_clusters | array of JobsJobCluster | A list of job cluster specifications that can be shared and reused by tasks of this job. Libraries cannot be declared in a shared job cluster. You must declare dependent libraries in task settings. If more than 100 job clusters are available, you can paginate through them using :method:jobs/getrun. |
| job_id | job_id | integer | The canonical identifier of the job that contains this run. |
| job_parameters | job_parameters | array of JobsJobParameter | Job-level parameters used in the run. |
| job_run_id | job_run_id | integer | ID of the job run that this run belongs to. For legacy and single-task job runs the field is populated with the job run ID. For task runs, the field is populated with the ID of the job run that the task run belongs to. |
| next_page_token | next_page_token | string | A token that can be used to list the next page of array properties. |
| original_attempt_run_id | original_attempt_run_id | integer | If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id. |
| overriding_parameters | overriding_parameters | JobsRunParameters | |
| queue_duration | queue_duration | integer | The time in milliseconds that the run has spent in the queue. |
| repair_history | repair_history | array of JobsRepairHistoryItem | The repair history of the run. |
| run_duration | run_duration | integer | The time in milliseconds it took the job run and all of its repairs to finish. |
| run_id | run_id | integer | The canonical identifier of the run. This ID is unique across all runs of all jobs. |
| run_name | run_name | string | An optional name for the run. The maximum length is 4096 bytes in UTF-8 encoding. |
| run_page_url | run_page_url | string | The URL to the detail page of the run. |
| run_type | run_type | JobsRunType | |
| schedule | schedule | JobsCronSchedule | |
| setup_duration | setup_duration | integer | The time in milliseconds it took to set up the cluster. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The setup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
| start_time | start_time | integer | The time at which this run was started, in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing; for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |
| status | status | JobsRunStatus | |
| tasks | tasks | array of JobsRunTask | The list of tasks performed by the run. Each task has its own run_id, which you can use to call JobsGetOutput to retrieve the run results. If more than 100 tasks are available, you can paginate through them using :method:jobs/getrun. Use the next_page_token field at the object root to determine if more results are available. |
| trigger | trigger | JobsTriggerType | |
| trigger_info | trigger_info | JobsTriggerInfo | |
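
The duration and timestamp fields above relate to each other in a fixed way, which can be sketched with made-up values:

```python
# Sketch of the duration relationships in the JobsRun schema, with made-up values.
# For a task run: total duration = setup + execution + cleanup.
# For a multitask job run those three fields are 0 and run_duration holds the total.
from datetime import datetime, timezone

task_run = {
    "setup_duration": 120_000,      # cluster setup, ms
    "execution_duration": 600_000,  # command execution, ms
    "cleanup_duration": 30_000,     # cluster teardown, ms
}
task_total_ms = sum(task_run.values())

# start_time / end_time are epoch milliseconds, so divide by 1000 for datetime.
start = datetime.fromtimestamp(1_700_000_000_000 / 1000, tz=timezone.utc)

print(task_total_ms)  # 750000
```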

JobsClusterInstance

| Name | Path | Type | Description |
|------|------|------|-------------|
| cluster_id | cluster_id | string | The canonical identifier for the cluster used by a run. This field is always available for runs on existing clusters. For runs on new clusters, it becomes available once the cluster is created. This value can be used to view logs by browsing to /#setting/sparkui/$cluster_id/driver-logs. The logs continue to be available after the run completes. The response won't include this field if the identifier is not available yet. |
| spark_context_id | spark_context_id | string | The canonical identifier for the Spark context used by a run. This field is filled in once the run begins execution. This value can be used to view the Spark UI by browsing to /#setting/sparkui/$cluster_id/$spark_context_id. The Spark UI continues to be available after the run has completed. The response won't include this field if the identifier is not available yet. |
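
Composing the driver-log and Spark UI paths described above is straightforward string substitution; the IDs here are placeholders:

```python
# Sketch: building the driver-log and Spark UI paths from a JobsClusterInstance.
# The cluster_id and spark_context_id values are placeholders.
cluster_instance = {
    "cluster_id": "0101-123456-abcde123",
    "spark_context_id": "5678901234567890123",
}

driver_logs = f"/#setting/sparkui/{cluster_instance['cluster_id']}/driver-logs"
spark_ui = (f"/#setting/sparkui/{cluster_instance['cluster_id']}/"
            f"{cluster_instance['spark_context_id']}")

print(driver_logs)
```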

JobsClusterSpec

| Name | Path | Type | Description |
|------|------|------|-------------|
| existing_cluster_id | existing_cluster_id | string | If existing_cluster_id, the ID of an existing cluster that is used for all runs. When running jobs or tasks on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs and tasks on new clusters for greater reliability. |
| job_cluster_key | job_cluster_key | string | If job_cluster_key, this task is executed reusing the cluster specified in job.settings.job_clusters. |
| libraries | libraries | array of ComputeLibrary | An optional list of libraries to be installed on the cluster. The default value is an empty list. |
| new_cluster | new_cluster | ComputeClusterSpec | |

JobsJobParameter

| Name | Path | Type | Description |
|------|------|------|-------------|
| default | default | string | The optional default value of the parameter. |
| name | name | string | The name of the parameter. |
| value | value | string | The value used in the run. |

JobsRunParameters

| Name | Path | Type | Description |
|------|------|------|-------------|
| pipeline_params | pipeline_params | JobsPipelineParams | |

JobsRepairHistoryItem

| Name | Path | Type | Description |
|------|------|------|-------------|
| effective_performance_target | effective_performance_target | JobsPerformanceTarget | |
| end_time | end_time | integer | The end time of the (repaired) run. |
| id | id | integer | The ID of the repair. Only returned for the items that represent a repair in repair_history. |
| start_time | start_time | integer | The start time of the (repaired) run. |
| status | status | JobsRunStatus | |
| task_run_ids | task_run_ids | array of integer | The run IDs of the task runs that ran as part of this repair history item. |
| type | type | JobsRepairHistoryItemType | |

JobsRunStatus

| Name | Path | Type | Description |
|------|------|------|-------------|
| queue_details | queue_details | JobsQueueDetails | |
| state | state | JobsRunLifecycleStateV2State | |
| termination_details | termination_details | JobsTerminationDetails | |

JobsQueueDetails

| Name | Path | Type | Description |
|------|------|------|-------------|
| code | code | JobsQueueDetailsCodeCode | |
| message | message | string | A descriptive message with the queuing details. This field is unstructured, and its exact format is subject to change. |

JobsQueueDetailsCodeCode

JobsRunLifecycleStateV2State

JobsTerminationDetails

| Name | Path | Type | Description |
|------|------|------|-------------|
| code | code | JobsTerminationCodeCode | |
| message | message | string | A descriptive message with the termination details. This field is unstructured and the format might change. |
| type | type | JobsTerminationTypeType | |

JobsTerminationCodeCode

JobsTerminationTypeType

JobsRepairHistoryItemType

JobsRunType

JobsRunTask

| Name | Path | Type | Description |
|------|------|------|-------------|
| attempt_number | attempt_number | integer | The sequence number of this run attempt for a triggered job run. The initial attempt of a run has an attempt_number of 0. If the initial run attempt fails, and the job has a retry policy (max_retries > 0), subsequent runs are created with an original_attempt_run_id of the original attempt's ID and an incrementing attempt_number. Runs are retried only until they succeed, and the maximum attempt_number is the same as the max_retries value for the job. |
| clean_rooms_notebook_task | clean_rooms_notebook_task | Object | |
| cleanup_duration | cleanup_duration | integer | The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The cleanup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
| cluster_instance | cluster_instance | JobsClusterInstance | |
| condition_task | condition_task | JobsRunConditionTask | |
| dashboard_task | dashboard_task | Object | |
| dbt_task | dbt_task | Object | |
| depends_on | depends_on | array of JobsTaskDependency | An optional array of objects specifying the dependency graph of the task. All tasks specified in this field must complete successfully before executing this task. The key is task_key, and the value is the name assigned to the dependent task. |
| description | description | string | An optional description for this task. |
| effective_performance_target | effective_performance_target | JobsPerformanceTarget | |
| email_notifications | email_notifications | JobsJobEmailNotifications | |
| end_time | end_time | integer | The time at which this run ended, in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field is set to 0 if the job is still running. |
| environment_key | environment_key | string | The key that references an environment spec in a job. This field is required for Python script, Python wheel, and dbt tasks when using serverless compute. |
| execution_duration | execution_duration | integer | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The execution_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
| existing_cluster_id | existing_cluster_id | string | If existing_cluster_id, the ID of an existing cluster that is used for all runs. When running jobs or tasks on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs and tasks on new clusters for greater reliability. |
| for_each_task | for_each_task | Object | |
| git_source | git_source | JobsGitSource | |
| job_cluster_key | job_cluster_key | string | If job_cluster_key, this task is executed reusing the cluster specified in job.settings.job_clusters. |
| libraries | libraries | array of Object | An optional list of libraries to be installed on the cluster. The default value is an empty list. |
| new_cluster | new_cluster | Object | |
| notebook_task | notebook_task | JobsNotebookTask | |
| notification_settings | notification_settings | Object | |
| pipeline_task | pipeline_task | Object | |
| power_bi_task | power_bi_task | Object | |
| python_wheel_task | python_wheel_task | Object | |
| queue_duration | queue_duration | integer | The time in milliseconds that the run has spent in the queue. |
| resolved_values | resolved_values | JobsResolvedValues | |
| run_duration | run_duration | integer | The time in milliseconds it took the job run and all of its repairs to finish. |
| run_id | run_id | integer | The ID of the task run. |
| run_if | run_if | JobsRunIf | |
| run_job_task | run_job_task | JobsRunJobTask | |
| run_page_url | run_page_url | string | |
| setup_duration | setup_duration | integer | The time in milliseconds it took to set up the cluster. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The setup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
| spark_jar_task | spark_jar_task | Object | |
| spark_python_task | spark_python_task | Object | |
| spark_submit_task | spark_submit_task | Object | |
| sql_task | sql_task | Object | |
| start_time | start_time | integer | The time at which this run was started, in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing; for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |
| status | status | JobsRunStatus | |
| task_key | task_key | string | A unique name for the task. This field is used to refer to this task from other tasks. This field is required and must be unique within its parent job. On Update or Reset, this field is used to reference the tasks to be updated or reset. |
| timeout_seconds | timeout_seconds | integer | An optional timeout applied to each run of this job task. A value of 0 means no timeout. |
| webhook_notifications | webhook_notifications | Object | |

JobsRunConditionTask

| Name | Path | Type | Description |
|------|------|------|-------------|
| left | left | string | The left operand of the condition task. Can be either a string value or a job state or parameter reference. |
| op | op | JobsConditionTaskOp | |
| outcome | outcome | string | The condition expression evaluation result. Filled in if the task was successfully completed. Can be "true" or "false". |
| right | right | string | The right operand of the condition task. Can be either a string value or a job state or parameter reference. |
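
For intuition, the outcome field could be derived from the operands roughly as below. This is only a sketch: the operator names are assumptions about the JobsConditionTaskOp enum, and the operands arrive as strings per the schema:

```python
# Sketch of how a condition task's outcome relates to its operands.
# The operator names are assumed values of JobsConditionTaskOp; left/right
# are strings, and outcome is the string "true" or "false".
def evaluate_condition(left: str, op: str, right: str) -> str:
    ops = {
        "EQUAL_TO":     lambda a, b: a == b,
        "NOT_EQUAL":    lambda a, b: a != b,
        "GREATER_THAN": lambda a, b: float(a) > float(b),
        "LESS_THAN":    lambda a, b: float(a) < float(b),
    }
    return "true" if ops[op](left, right) else "false"

print(evaluate_condition("10", "GREATER_THAN", "5"))  # true
```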

JobsTriggerType

JobsTriggerInfo

| Name | Path | Type | Description |
|------|------|------|-------------|
| run_id | run_id | integer | The run ID of the Run Job task run. |

JobsRunOutput

| Name | Path | Type | Description |
|------|------|------|-------------|
| clean_rooms_notebook_output | clean_rooms_notebook_output | Object | |
| dashboard_output | dashboard_output | Object | |
| dbt_output | dbt_output | Object | |
| error | error | string | An error message indicating why a task failed or why output is not available. The message is unstructured, and its exact format is subject to change. |
| error_trace | error_trace | string | If there was an error executing the run, this field contains any available stack traces. |
| info | info | string | |
| logs | logs | string | The output from tasks that write to standard streams (stdout/stderr), such as spark_jar_task, spark_python_task, and python_wheel_task. It's not supported for the notebook_task, pipeline_task, or spark_submit_task. Azure Databricks restricts this API to return the last 5 MB of these logs. |
| logs_truncated | logs_truncated | boolean | Whether the logs are truncated. |
| metadata | metadata | Object | |
| notebook_output | notebook_output | JobsNotebookOutput | |
| run_job_output | run_job_output | JobsRunJobOutput | |
| sql_output | sql_output | Object | |
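
When consuming a JobsRunOutput-shaped response, a common pattern is to prefer the task-specific output and fall back to the error fields. The response dict below is a made-up example, not real API output:

```python
# Sketch of inspecting a JobsRunOutput-shaped response. The dict is a
# made-up example; error is an unstructured message per the schema.
run_output = {
    "error": "Task failed: table not found",
    "error_trace": "...",   # stack trace, when available
    "logs": "last 5 MB of stdout/stderr ...",
    "logs_truncated": True,
}

if "error" in run_output:
    summary = run_output["error"]
else:
    summary = run_output.get("notebook_output", {}).get("result", "")

print(summary)
```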

JobsNotebookOutput

| Name | Path | Type | Description |
|------|------|------|-------------|
| result | result | string | The value passed to dbutils.notebook.exit(). Azure Databricks restricts this API to return the first 5 MB of the value. For a larger result, your job can store the results in a cloud storage service. This field is absent if dbutils.notebook.exit() was never called. |
| truncated | truncated | boolean | Whether or not the result was truncated. |
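
Since result is always a string (and may be absent or truncated), callers typically parse it defensively. A sketch, with a made-up JSON result:

```python
# Sketch of reading a JobsNotebookOutput: result holds the first 5 MB of the
# value passed to dbutils.notebook.exit() and may be absent entirely.
import json

notebook_output = {"result": '{"rows_written": 42}', "truncated": False}

payload = (json.loads(notebook_output["result"])
           if "result" in notebook_output else None)

if notebook_output.get("truncated"):
    # For larger results, the job should write to cloud storage instead.
    print("result was truncated to 5 MB")
```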

JobsRunJobOutput

| Name | Path | Type | Description |
|------|------|------|-------------|
| run_id | run_id | integer | The run ID of the triggered job run. |

JobsResolvedValues

| Name | Path | Type | Description |
|------|------|------|-------------|
| condition_task | condition_task | JobsResolvedConditionTaskValues | |
| dbt_task | dbt_task | JobsResolvedDbtTaskValues | |
| notebook_task | notebook_task | JobsResolvedNotebookTaskValues | |
| python_wheel_task | python_wheel_task | JobsResolvedPythonWheelTaskValues | |
| run_job_task | run_job_task | JobsResolvedRunJobTaskValues | |
| simulation_task | simulation_task | JobsResolvedParamPairValues | |
| spark_jar_task | spark_jar_task | JobsResolvedStringParamsValues | |
| spark_python_task | spark_python_task | JobsResolvedStringParamsValues | |
| spark_submit_task | spark_submit_task | JobsResolvedStringParamsValues | |
| sql_task | sql_task | JobsResolvedParamPairValues | |

JobsResolvedConditionTaskValues

| Name | Path | Type | Description |
|------|------|------|-------------|
| left | left | string | |
| right | right | string | |

JobsResolvedDbtTaskValues

| Name | Path | Type | Description |
|------|------|------|-------------|
| commands | commands | array of string | |

JobsResolvedNotebookTaskValues

| Name | Path | Type | Description |
|------|------|------|-------------|
| base_parameters | base_parameters | object | |

JobsResolvedPythonWheelTaskValues

| Name | Path | Type | Description |
|------|------|------|-------------|
| named_parameters | named_parameters | object | |
| parameters | parameters | array of string | |

JobsResolvedRunJobTaskValues

| Name | Path | Type | Description |
|------|------|------|-------------|
| job_parameters | job_parameters | object | |
| parameters | parameters | object | |

JobsResolvedParamPairValues

| Name | Path | Type | Description |
|------|------|------|-------------|
| parameters | parameters | object | |

JobsResolvedStringParamsValues

| Name | Path | Type | Description |
|------|------|------|-------------|
| parameters | parameters | array of string | |