Azure Databricks

Azure Databricks offers a unified platform for scalable data management, governance, and analytics, combining streamlined workflows with the ability to handle diverse data types efficiently.
This connector is available in the following products and regions:
Service | Class | Regions |
---|---|---|
Copilot Studio | Premium | All Power Automate regions except: US Government (GCC), US Government (GCC High), China Cloud operated by 21Vianet, US Department of Defense (DoD) |
Power Apps | Premium | All Power Apps regions except: US Government (GCC), US Government (GCC High), China Cloud operated by 21Vianet, US Department of Defense (DoD) |
Power Automate | Premium | All Power Automate regions except: US Government (GCC), US Government (GCC High), China Cloud operated by 21Vianet, US Department of Defense (DoD) |
Contact | |
---|---|
Name | Databricks Support |
URL | https://help.databricks.com |
Email | eng-partner-eco-help@databricks.com |
Connector Metadata | |
---|---|
Publisher | Databricks Inc. |
Website | https://www.databricks.com/ |
Privacy policy | https://www.databricks.com/legal/privacynotice |
Categories | Data |
Connect to Azure Databricks from Microsoft Power Platform
This page explains how to connect to Azure Databricks from Microsoft Power Platform by adding Azure Databricks as a data connection. When connected, you can use your Azure Databricks data from the following platforms:
- Power Apps: Build applications that can read from and write to Azure Databricks, while preserving your Azure Databricks governance controls.
- Power Automate: Build flows with actions that execute custom SQL statements or existing jobs and return the results.
- Copilot Studio: Build custom agents using your Azure Databricks data as a knowledge source.
Before you begin
Before you connect to Azure Databricks from Power Platform, you must meet the following requirements:
- You have a Microsoft Entra ID (formerly Azure Active Directory) account.
- You have a premium Power Apps license.
- You have an Azure Databricks account.
- You have access to a SQL warehouse in Azure Databricks.
Optional: Connect with Azure Virtual Networks
If your Azure Databricks workspace uses Virtual Networks, there are two ways to connect:
- Integrate Power Platform with resources inside your virtual network without exposing them over the public internet. After you configure private connectivity to Azure Databricks, connect to the private endpoint of your Azure Databricks workspace. If your Power Platform virtual network (whether Primary or Secondary) is different from your Azure Databricks virtual network, use virtual network peering to connect it to the Azure Databricks virtual network. For more information about virtual networks, see Virtual Network support overview.
- Enable access with a hybrid deployment, where a front-end private link with a public endpoint is protected by a workspace IP access list. To enable access, do the following:
  - Enable public access at the workspace level. For more details, see Configure IP access lists for workspaces.
  - Add the AzureConnectors IP range, or the specific Power Platform IP range for your environment's region, to your workspace IP access list.
Optional: Create a Microsoft Entra Service Principal
Important
If Azure Databricks and Power Platform are in different tenants, you must use Service Principals for authentication.
Before connecting, complete the following steps to create, set up, and assign a Microsoft Entra Service Principal to your Azure Databricks account or workspace:
- Register a new service principal in Microsoft Entra ID.
- Add service principals to your account.
- Assign a service principal to a workspace.
Step 1: Add an Azure Databricks connection to Power Platform
Note: If you're using Copilot Studio, we recommend creating the Databricks connection in Power Apps or Power Automate. Then it can be used in Copilot Studio.
To add an Azure Databricks connection, do the following:
- In Power Apps or Power Automate, from the sidebar, click Connections.
- Click + New connection in the upper-left corner.
- Search for "Azure Databricks" using the search bar in the upper-right.
- Select the Azure Databricks tile.
- Select your authentication type from the drop-down menu, then enter your authentication information.
If your Power Platform deployment and Azure Databricks account are in the same Microsoft Entra tenant, you can use an OAuth connection. Enter the following information:
- For Server Hostname, enter the Azure Databricks SQL warehouse hostname.
- For HTTP Path, enter the SQL warehouse HTTP path.
- Click Create.
- Sign in with your Microsoft Entra ID.
A service principal connection can be used in any scenario. Before connecting, create a Microsoft Entra service principal, and then enter the following information:
- For Client ID, enter the service principal ID.
- For Client Secret, enter the service principal secret.
- For Tenant, enter the service principal tenant.
- For Hostname, enter the Azure Databricks SQL warehouse hostname.
- For HTTP Path, enter the SQL warehouse HTTP path.
- (Optional) You can rename or share the service principal connection with your team members after the connection is created.
To find your Azure Databricks SQL warehouse connection details, see Get connection details for an Azure Databricks compute resource.
Click Create.
Step 2: Use the Azure Databricks connection
After you create an Azure Databricks connection in Power Apps or Power Automate, you can use your Azure Databricks data to create Power canvas apps, Power Automate flows, and Copilot Studio agents.
Use your Azure Databricks data to build Power canvas apps
Important
You can only use canvas apps by connecting directly to Azure Databricks in the app. You can't use virtual tables.
To add your Azure Databricks data to your application, do the following:
- From the leftmost navigation bar, click Create.
- Click Start with a blank canvas and select your desired canvas size to create a new canvas app.
- From your application, click Add data > Connectors > Azure Databricks. Select the Azure Databricks connection you created.
- Select a catalog from the Choose a dataset sidebar.
- From the Choose a dataset sidebar, select all the tables you want to connect your canvas app to.
- Click Connect.
Data operations in Power Apps:
The connector supports create, update, and delete operations, but only for tables that have a primary key defined. When performing create operations, you must always specify the primary key.
Note: Azure Databricks supports generated identity columns. In this case, primary key values are automatically generated on the server during row creation and cannot be manually specified.
Use your Azure Databricks data to build Power Automate flows
The Statement Execution API and the Jobs API are exposed within Power Automate, allowing you to write SQL statements and execute existing Jobs. To create a Power Automate flow using Azure Databricks as an action, do the following:
- From the leftmost navigation bar, click Create.
- Create a flow and add any trigger type.
- From your new flow, click + and search for "Databricks" to see the available actions.
To write SQL, select one of the following actions:
Execute a SQL Statement: Write and run a SQL statement. Enter the following:
- For Body/warehouse_id, enter the ID of the warehouse upon which to execute the SQL statement.
- For Body/statement, enter the SQL statement to execute.
- For more about the advanced parameters, see here.
Check status and get results: Check the status of a SQL statement and gather results. Enter the following:
- For Statement ID, enter the ID returned when the SQL statement was executed.
- For more about the parameter, see here.
Cancel the execution of a statement: Terminate execution of a SQL statement. Enter the following:
- For Statement ID, enter the ID of the SQL statement to terminate.
- For more about the parameter, see here.
Get result by chunk index: Get results by chunk index, which is suitable for large result sets. Enter the following:
- For Statement ID, enter the ID of the SQL statement whose results you want to retrieve.
- For Chunk index, enter the target chunk index.
- For more about the parameters, see here.
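The four SQL actions above wrap the Databricks SQL Statement Execution REST API. As an illustrative sketch, not the connector itself, the functions below only build the requests that correspond to each action, assuming the public /api/2.0/sql/statements endpoints; the warehouse and statement IDs shown in the usage are placeholders.

```python
# Illustrative sketch of the REST requests behind the four SQL actions.
# Endpoint paths follow the public Databricks SQL Statement Execution API;
# this builds request tuples only and does not call a workspace.

def execute_statement_request(warehouse_id, statement, wait_timeout="30s"):
    """'Execute a SQL Statement': run a statement on a SQL warehouse."""
    return ("POST", "/api/2.0/sql/statements", {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": wait_timeout,  # how long to await results inline
    })

def check_status_request(statement_id):
    """'Check status and get results': poll a statement by its ID."""
    return ("GET", f"/api/2.0/sql/statements/{statement_id}", None)

def cancel_statement_request(statement_id):
    """'Cancel the execution of a statement'."""
    return ("POST", f"/api/2.0/sql/statements/{statement_id}/cancel", None)

def get_chunk_request(statement_id, chunk_index):
    """'Get result by chunk index': fetch one chunk of a large result set."""
    return ("GET",
            f"/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}",
            None)
```

Note how the statement ID returned by the execute request is the value you feed to the status, cancel, and chunk actions.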
To interact with an existing Databricks Job, select one of the following actions:
- List Jobs: Retrieves a list of jobs. For more information see here.
- Trigger a new job run: Runs a job and returns the run_id of the triggered run. For more information see here.
- Get a single Job run: Returns metadata about a run, including run status (e.g., RUNNING, SUCCESS, FAILED), start and end time, execution durations, cluster information, etc. For more information see here.
- Cancel a Job run: Cancels a job run or a task run. For more information, see here.
- Get the output for a single job run: Retrieves the output and metadata of a single task run. For more information, see here.
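The Job actions above correspond to endpoints of the Databricks Jobs REST API. The following is a hedged sketch of the request shapes only, assuming the public Jobs API 2.2 paths; it is not the connector implementation.

```python
# Illustrative sketch of the REST requests behind the Job actions.
# Builds request tuples only; does not contact a workspace.

def trigger_run_request(job_id, job_parameters=None, idempotency_token=None):
    """'Trigger a new job run': returns the run_id of the triggered run."""
    body = {"job_id": job_id}
    if job_parameters:
        body["job_parameters"] = job_parameters
    if idempotency_token:
        # At most 64 characters; reusing a token returns the existing run
        # instead of launching a new one.
        body["idempotency_token"] = idempotency_token
    return ("POST", "/api/2.2/jobs/run-now", body)

def get_run_request(run_id, page_token=None):
    """'Get a single Job run': status, timings, and cluster information."""
    query = {"run_id": run_id}
    if page_token:
        query["page_token"] = page_token  # pages array properties past 100 elements
    return ("GET", "/api/2.2/jobs/runs/get", query)

def cancel_run_request(run_id):
    """'Cancel a Job run' (asynchronous; the run may still be running after)."""
    return ("POST", "/api/2.2/jobs/runs/cancel", {"run_id": run_id})
```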
Use Azure Databricks as a knowledge source in Copilot Studio
To add your Azure Databricks data as a knowledge source to a Copilot Studio agent, do the following:
- From the sidebar, click Agent.
- Select an existing agent or create a new agent by clicking + New agent.
- Describe the agent by inputting a message and then click Create.
- Or, click Skip to manually specify the agent's information.
- In the Knowledge tab, click + Knowledge.
- Click Advanced.
- Select Azure Databricks as the knowledge source.
- Input the catalog name your data is in.
- Click Connect.
- Select the tables you want your agent to use as a knowledge source and click Add.
Create Dataverse virtual tables with your Azure Databricks data
You can also create Dataverse virtual tables with the Azure Databricks connector. Virtual tables, also known as virtual entities, integrate data from external systems with Microsoft Dataverse. A virtual table defines a table in Dataverse without storing the physical table in the Dataverse database. To learn more about virtual tables, see Get started with virtual tables (entities).
Note
Although virtual tables do not consume Dataverse storage capacity, Databricks recommends using direct connections for better performance.
You must have the System Customizer or System Admin role. For more information, see security roles for Power Platform.
Follow these steps to create a Dataverse virtual table:
In Power Apps, from the sidebar, click Tables.
Click + New Table from the menu bar and select Create a virtual table.
Select an existing Azure Databricks connection or create a new connection to Azure Databricks. To add a new connection, see Step 1: Add an Azure Databricks connection to Power Platform.
Databricks recommends using a service principal connection to create a virtual table.
Click Next.
Select the tables to represent as a Dataverse virtual table.
- Dataverse virtual tables require a primary key. Therefore, views cannot be virtual tables, but materialized views can.
Click Next.
Configure the virtual table by updating the details of the table, if necessary.
Click Next.
Confirm the details of the data source and click Finish.
Use the Dataverse virtual table in Power Apps, Power Automate, and Copilot Studio.
For a list of known limitations of Dataverse virtual tables, see Known limitations and troubleshooting.
Conduct batch updates
If you need to perform bulk create, update, or delete operations in response to Power Apps inputs, Databricks recommends implementing a Power Automate flow. To accomplish this, do the following:
Create a canvas app using your Azure Databricks connection in Power Apps.
Create a Power Automate flow using the Azure Databricks connection and use Power Apps as the trigger.
In the Power Automate trigger, add the input fields that you want to pass from Power Apps to Power Automate.
Create a collection object within Power Apps to collect all of your changes.
Add the Power Automate flow to your canvas app.
Call the Power Automate flow from your canvas app and iterate over the collection using a ForAll function: ForAll(collectionName, FlowName.Run(input field 1, input field 2, input field 3, …))
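To illustrate the batching idea, the sketch below (hypothetical table and column names) folds changes collected from Power Apps into a single parameterized INSERT that could be passed to the Execute a SQL Statement action, rather than issuing one connector call per row:

```python
# Hypothetical sketch: build one multi-row INSERT with named parameter
# markers (:p0_id, :p0_qty, ...) from a list of collected row changes.
# Table and column names are illustrative, not from the original document.

def build_batch_insert(table, rows):
    """Return a statement plus parameter list in the shape the
    Statement Execution API expects for parameterized SQL."""
    columns = sorted(rows[0])
    value_groups, parameters = [], []
    for i, row in enumerate(rows):
        markers = []
        for col in columns:
            name = f"p{i}_{col}"
            markers.append(f":{name}")
            parameters.append({"name": name, "value": str(row[col]),
                               "type": "STRING"})
        value_groups.append("(" + ", ".join(markers) + ")")
    statement = (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
                 + ", ".join(value_groups))
    return {"statement": statement, "parameters": parameters}

# One statement covers both rows instead of two separate connector calls.
payload = build_batch_insert("main.sales.orders",
                             [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}])
```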
Concurrent writes
Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving conflicts that occur when concurrent writes update or delete different rows in the same data file.
Row-level concurrency is included in Databricks Runtime 14.2 or above. Row-level concurrency is supported by default for the following types of tables:
- Tables with deletion vectors enabled and without partitioning
- Tables with liquid clustering, unless deletion vectors are disabled
To enable deletion vectors, run the following SQL command:
ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
For more information about concurrent write conflicts in Azure Databricks, see Isolation levels and write conflicts on Azure Databricks.
Add Azure Databricks to a data policy
Adding Azure Databricks to the Business group of a data policy prevents it from sharing data with connectors in other groups. This protects your data and prevents it from being shared with those who should not have access to it. For more information, see Manage data policies.
To add the Azure Databricks connector to a Power Platform data policy:
- From any Power Platform application, click the settings gear in the upper-right side, and select Admin Center.
- From the sidebar, click Policies > Data Policies.
- If you are using the new admin center, click Security > Data and Privacy > Data Policy.
- Click + New Policy or select an existing policy.
- If creating a new policy, enter a name.
- Select an environment to add to your policy and click + Add to policy above.
- Click Next.
- Search for and select the Azure Databricks connector.
- Click Move to Business and click Next.
- Review your policy and click Create policy.
Limitations
- The Power Platform connector does not support government clouds.
Power App limitations
The following Power Fx formulas calculate values using only the data that has been retrieved locally:
Category | Formulas |
---|---|
Table function | GroupBy, Distinct |
Aggregation | CountRows, StdevP, StdevS |
Creating a connection
The connector supports the following authentication types:
Name | Description | Regions | Shareable |
---|---|---|---|
OAuth Connection | OAuth Connection | All regions | Not shareable |
Service Principal Connection | Service Principal Connection | All regions | Shareable |
Default [DEPRECATED] | This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility. | All regions | Not shareable |
OAuth Connection
Auth ID: oauth2-auth
Applicable: All regions
OAuth Connection
This connection is not shareable. If the power app is shared with another user, that user is prompted to create a new connection explicitly.
Name | Type | Description | Required |
---|---|---|---|
Server Hostname (Example: adb-3980263885549757139.2.azuredatabricks.net) | string | Server name of Databricks workspace | True |
HTTP Path (Example: /sql/1.0/warehouses/a9c4e781bd29f315) | string | HTTP Path of Databricks SQL Warehouse | True |
Service Principal Connection
Auth ID: oAuthClientCredentials
Applicable: All regions
Service Principal Connection
This connection is shareable. If the power app is shared with another user, the connection is shared as well. For more information, see Connectors overview for canvas apps - Power Apps | Microsoft Docs.
Name | Type | Description | Required |
---|---|---|---|
Client ID | string | | True |
Client Secret | securestring | | True |
Tenant | string | | True |
Server Hostname (Example: adb-3980263885549757139.2.azuredatabricks.net) | string | Server name of Databricks workspace | True |
HTTP Path (Example: /sql/1.0/warehouses/a9c4e781bd29f315) | string | HTTP Path of Databricks SQL Warehouse | True |
Default [DEPRECATED]
Applicable: All regions
This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility.
This connection is not shareable. If the power app is shared with another user, that user is prompted to create a new connection explicitly.
Throttling Limits
Name | Calls | Renewal Period |
---|---|---|
API calls per connection | 100 | 60 seconds |
Actions
Action | Description |
---|---|
Cancel a run | Cancels a job run or a task run. The run is canceled asynchronously, so it may still be running when this request completes. |
Cancel statement execution | Requests that an executing statement be canceled. Callers must poll for status to see the terminal state. |
Check status and get results | Get the status, manifest, and results of the statement. |
Execute a SQL statement | Execute a SQL statement and optionally await its results for a specified time. |
Get a single job run | Retrieves the metadata of a run. Large arrays in the results will be paginated when they exceed 100 elements. A request for a single run will return all properties for that run, and the first 100 elements of array properties (tasks, job_clusters, job_parameters and repair_history). Use the next_page_token field to check for more results and pass its value as the page_token in subsequent requests. If any array properties have more than 100 elements, additional results will be returned on subsequent requests. Arrays without additional results will be empty on later pages. |
Get result by chunk index | After the statement execution has SUCCEEDED, this request can be used to fetch any chunk by index. |
Get the output for a single run | Retrieve the output and metadata of a single task run. When a notebook task returns a value through the dbutils.notebook.exit() call, you can use this endpoint to retrieve that value. Azure Databricks restricts this API to returning the first 5 MB of the output. To return a larger result, you can store job results in a cloud storage service. This endpoint validates that the run_id parameter is valid and returns an HTTP status code 400 if the run_id parameter is invalid. Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you must save old run results before they expire. |
List jobs | Retrieves a list of jobs. |
Trigger a new job run | Run a job and return the run_id of the triggered run. |
Cancel a run
Cancels a job run or a task run. The run is canceled asynchronously, so it may still be running when this request completes.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
run_id | run_id | True | integer | This field is required. |
Cancel statement execution
Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Statement ID | statement_id | True | string | Statement ID |
Check status and get results
Get the status, manifest, and results of the statement.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Statement ID | statement_id | True | string | Statement ID |
Returns
Statement execution response
- Body
- SqlStatementResponse
Execute a SQL statement
Execute a SQL statement and optionally await its results for a specified time.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
warehouse_id | warehouse_id | True | string | Target warehouse ID |
statement | statement | True | string | The SQL statement to execute. The statement can optionally be parameterized, see parameters |
name | name | True | string | Parameter marker name |
type | type | | string | Parameter data type |
value | value | | string | Parameter value |
catalog | catalog | | string | Default catalog for execution |
schema | schema | | string | Default schema for execution |
disposition | disposition | | string | Result fetching mode |
format | format | | string | Result set format |
on_wait_timeout | on_wait_timeout | | string | Action on timeout |
wait_timeout | wait_timeout | | string | Result wait timeout |
byte_limit | byte_limit | | integer | Result byte limit |
row_limit | row_limit | | integer | Result row limit |
Returns
Statement execution response
- Body
- SqlStatementResponse
Get a single job run
Retrieves the metadata of a run. Large arrays in the results will be paginated when they exceed 100 elements. A request for a single run will return all properties for that run, and the first 100 elements of array properties (tasks, job_clusters, job_parameters and repair_history). Use the next_page_token field to check for more results and pass its value as the page_token in subsequent requests. If any array properties have more than 100 elements, additional results will be returned on subsequent requests. Arrays without additional results will be empty on later pages.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Run ID | run_id | True | integer | The canonical identifier of the run for which to retrieve the metadata. This field is required. |
Include History | include_history | | boolean | Whether to include the repair history in the response. |
Include Resolved Values | include_resolved_values | | boolean | Whether to include resolved parameter values in the response. |
Page Token | page_token | | string | Use next_page_token returned from the previous GetRun response to request the next page of the run's array properties. |
Returns
- Body
- JobsRun
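The pagination behavior described above can be sketched as follows; fetch_run is a hypothetical stand-in for whatever issues the actual Get a single job run request:

```python
# Hedged sketch of paging a run's array properties: pass next_page_token back
# as page_token until it is absent, concatenating e.g. the 'tasks' arrays.

def collect_all_tasks(fetch_run, run_id):
    """fetch_run(run_id, page_token) -> one GetRun response as a dict."""
    tasks, page_token = [], None
    while True:
        run = fetch_run(run_id, page_token)   # one 'Get a single job run' call
        tasks.extend(run.get("tasks", []))    # first/next batch (<= 100 elements)
        page_token = run.get("next_page_token")
        if not page_token:                    # no more pages
            return tasks
```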
Get result by chunk index
After the statement execution has SUCCEEDED, this request can be used to fetch any chunk by index.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Statement ID | statement_id | True | string | Statement ID |
Chunk index | chunk_index | True | string | Chunk index |
Returns
- Body
- SqlResultData
Get the output for a single run
Retrieve the output and metadata of a single task run. When a notebook task returns a value through the dbutils.notebook.exit() call, you can use this endpoint to retrieve that value. Azure Databricks restricts this API to returning the first 5 MB of the output. To return a larger result, you can store job results in a cloud storage service. This endpoint validates that the run_id parameter is valid and returns an HTTP status code 400 if the run_id parameter is invalid. Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you must save old run results before they expire.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Run ID | run_id | True | integer | The canonical identifier for the run. |
Returns
- Body
- JobsRunOutput
List jobs
Retrieves a list of jobs.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Limit | limit | | integer | The number of jobs to return. This value must be greater than 0 and less than or equal to 100. The default value is 20. |
Expand Tasks | expand_tasks | | boolean | Whether to include task and cluster details in the response. Note that only the first 100 elements will be shown. Use :method:jobs/get to paginate through all tasks and clusters. |
Job Name | name | | string | A filter on the list based on the exact (case-insensitive) job name. |
Page Token | page_token | | string | Use next_page_token or prev_page_token returned from the previous request to list the next or previous page of jobs, respectively. |
Returns
- Body
- JobsListJobsResponse
Trigger a new job run
Run a job and return the run_id of the triggered run.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
idempotency_token | idempotency_token | | string | An optional token to guarantee the idempotency of job run requests. If a run with the provided token already exists, the request does not create a new run but returns the ID of the existing run instead. If a run with the provided token is deleted, an error is returned. If you specify the idempotency token, upon failure you can retry until the request succeeds. Azure Databricks guarantees that exactly one run is launched with that idempotency token. This token must have at most 64 characters. For more information, see How to ensure idempotency for jobs. |
job_id | job_id | True | integer | The ID of the job to be executed |
job_parameters | job_parameters | | object | Job-level parameters used in the run, for example "param": "overriding_val" |
only | only | | array of string | A list of task keys to run inside of the job. If this field is not provided, all tasks in the job will be run. |
performance_target | performance_target | | string | |
full_refresh | full_refresh | | boolean | If true, triggers a full refresh on the delta live table. |
enabled | enabled | True | boolean | If true, enable queueing for the job. This is a required field. |
Returns
- Body
- JobsRunNowResponse
Definitions
Object
SqlBaseChunkInfo
Metadata for a result set chunk
Name | Path | Type | Description |
---|---|---|---|
byte_count | byte_count | integer | Number of bytes in the result chunk |
chunk_index | chunk_index | integer | Position in the sequence of result set chunks |
row_count | row_count | integer | Number of rows in the result chunk |
row_offset | row_offset | integer | Starting row offset in the result set |
SqlColumnInfo
Name | Path | Type | Description |
---|---|---|---|
name | name | string | Column name |
position | position | integer | Column position (0-based) |
type_interval_type | type_interval_type | string | Interval type format |
type_name | type_name | SqlColumnInfoTypeName | The name of the base data type. This doesn't include details for complex types such as STRUCT, MAP or ARRAY. |
type_precision | type_precision | integer | Number of digits for DECIMAL type |
type_scale | type_scale | integer | Number of decimal places for DECIMAL type |
type_text | type_text | string | Full SQL type specification |
SqlColumnInfoTypeName
The name of the base data type. This doesn't include details for complex types such as STRUCT, MAP or ARRAY.
SqlStatementResponse
Statement execution response
Name | Path | Type | Description |
---|---|---|---|
manifest | manifest | SqlResultManifest | Result set schema and metadata |
result | result | SqlResultData | |
statement_id | statement_id | string | Statement ID |
status | status | SqlStatementStatus | Statement execution status |
SqlResultManifest
Result set schema and metadata
Name | Path | Type | Description |
---|---|---|---|
chunks | chunks | array of SqlBaseChunkInfo | Result chunk metadata |
format | format | string | |
schema | schema | SqlResultSchema | Result set column definitions |
total_byte_count | total_byte_count | integer | Total bytes in result set |
total_chunk_count | total_chunk_count | integer | Total number of chunks |
total_row_count | total_row_count | integer | Total number of rows |
truncated | truncated | boolean | Result truncation status |
SqlStatementStatus
Statement execution status
Name | Path | Type | Description |
---|---|---|---|
error | error | SqlServiceError | |
state | state | SqlStatementState | Statement execution state |
SqlStatementState
SqlServiceError
Name | Path | Type | Description |
---|---|---|---|
error_code | error_code | string | |
message | message | string | Error message |
SqlResultSchema
Result set column definitions
Name | Path | Type | Description |
---|---|---|---|
column_count | column_count | integer | |
columns | columns | array of SqlColumnInfo | |
SqlResultData
Name | Path | Type | Description |
---|---|---|---|
byte_count | byte_count | integer | Bytes in result chunk |
chunk_index | chunk_index | integer | Chunk position |
data_array | data_array | SqlJsonArray | Array of arrays with string values |
external_links | external_links | array of SqlExternalLink | |
next_chunk_index | next_chunk_index | integer | Next chunk index |
next_chunk_internal_link | next_chunk_internal_link | string | Next chunk link |
row_count | row_count | integer | Rows in chunk |
row_offset | row_offset | integer | Starting row offset |
SqlJsonArray
Array of arrays with string values
Name | Path | Type | Description |
---|---|---|---|
Items | | array of | |
SqlExternalLink
Name | Path | Type | Description |
---|---|---|---|
byte_count | byte_count | integer | Bytes in chunk |
chunk_index | chunk_index | integer | Chunk position |
expiration | expiration | date-time | Link expiration time |
external_link | external_link | string | |
http_headers | http_headers | object | Required HTTP headers |
next_chunk_index | next_chunk_index | integer | Next chunk index |
next_chunk_internal_link | next_chunk_internal_link | string | Next chunk link |
row_count | row_count | integer | Rows in chunk |
row_offset | row_offset | integer | Starting row offset |
JobsRunNowResponse
Name | Path | Type | Description |
---|---|---|---|
run_id | run_id | integer | The globally unique ID of the newly triggered run. |
JobsPerformanceTarget
JobsPipelineParams
Name | Path | Type | Description |
---|---|---|---|
full_refresh | full_refresh | boolean | If true, triggers a full refresh on the delta live table. |
JobsQueueSettings
Name | Path | Type | Description |
---|---|---|---|
enabled | enabled | boolean | If true, enable queueing for the job. This is a required field. |
JobsListJobsResponse
Name | Path | Type | Description |
---|---|---|---|
jobs | jobs | array of JobsBaseJob | The list of jobs. Only included in the response if there are jobs to list. |
next_page_token | next_page_token | string | A token that can be used to list the next page of jobs (if applicable). |
prev_page_token | prev_page_token | string | A token that can be used to list the previous page of jobs (if applicable). |
JobsBaseJob
Name | Path | Type | Description |
---|---|---|---|
created_time | created_time | integer | The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC). |
creator_user_name | creator_user_name | string | The creator user name. This field won't be included in the response if the user has already been deleted. |
effective_budget_policy_id | effective_budget_policy_id | uuid | The id of the budget policy used by this job for cost attribution purposes. This may be set through (in order of precedence): 1. Budget admins through the account or workspace console 2. Jobs UI in the job details page and Jobs API using budget_policy_id 3. Inferred default based on accessible budget policies of the run_as identity on job creation or modification. |
has_more | has_more | boolean | Indicates if the job has more array properties (tasks, job_clusters) that are not shown. They can be accessed via the :method:jobs/get endpoint. It is only relevant for API 2.2 :method:jobs/list requests with expand_tasks=true. |
job_id | job_id | integer | The canonical identifier for this job. |
settings | settings | JobsJobSettings | |
trigger_state | trigger_state | JobsTriggerStateProto | |
JobsJobSettings
Name | Path | Type | Description |
---|---|---|---|
budget_policy_id | budget_policy_id | uuid | The id of the user-specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. See effective_budget_policy_id for the budget policy used by this workload. |
continuous | continuous | JobsContinuous | |
deployment | deployment | JobsJobDeployment | |
description | description | string | An optional description for the job. The maximum length is 27700 characters in UTF-8 encoding. |
edit_mode | edit_mode | JobsJobEditMode | |
email_notifications | email_notifications | JobsJobEmailNotifications | |
environments | environments | array of JobsJobEnvironment | A list of task execution environment specifications that can be referenced by serverless tasks of this job. An environment is required to be present for serverless tasks. For serverless notebook tasks, the environment is accessible in the notebook environment panel. For other serverless tasks, the task environment must be specified using environment_key in the task settings. |
git_source | git_source | JobsGitSource | |
health | health | JobsJobsHealthRules | |
job_clusters | job_clusters | array of JobsJobCluster | A list of job cluster specifications that can be shared and reused by tasks of this job. Libraries cannot be declared in a shared job cluster. You must declare dependent libraries in task settings. |
max_concurrent_runs | max_concurrent_runs | integer | An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful, for example, if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs that differ by their input parameters. This setting affects only new runs. For example, suppose the job's concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won't kill any of the active runs. However, from then on, new runs are skipped unless there are fewer than 3 active runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. |
name | name | string | An optional name for the job. The maximum length is 4096 bytes in UTF-8 encoding. |
notification_settings | notification_settings | JobsJobNotificationSettings | |
parameters | parameters | array of JobsJobParameterDefinition | Job-level parameter definitions. |
performance_target | performance_target | JobsPerformanceTarget | |
queue | queue | JobsQueueSettings | |
run_as | run_as | JobsJobRunAs | |
schedule | schedule | JobsCronSchedule | |
tags | tags | object | A map of tags associated with the job. These are forwarded to the cluster as cluster tags for jobs clusters, and are subject to the same limitations as cluster tags. A maximum of 25 tags can be added to the job. |
tasks | tasks | array of JobsTask | A list of task specifications to be executed by this job. It supports up to 1000 elements in write endpoints (:method:jobs/create, :method:jobs/reset, :method:jobs/update, :method:jobs/submit). Read endpoints return only 100 tasks. If more than 100 tasks are available, you can paginate through them using :method:jobs/get. Use the next_page_token field at the object root to determine if more results are available. |
timeout_seconds | timeout_seconds | integer | An optional timeout applied to each run of this job. A value of 0 means no timeout. |
trigger | trigger | JobsTriggerSettings | |
webhook_notifications | webhook_notifications | JobsWebhookNotifications | |
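As a minimal sketch of how the JobsJobSettings fields above compose, the following builds a small job payload as a Python dict and serializes it to JSON. The job name, notebook path, and tag values are hypothetical.

```python
import json

# Minimal JobsJobSettings payload sketch; field names follow the table above.
# The name, notebook path, and tag are illustrative placeholders.
job_settings = {
    "name": "nightly-etl",                # optional, max 4096 bytes in UTF-8
    "max_concurrent_runs": 1,             # 0 skips all new runs; cannot exceed 1000
    "timeout_seconds": 3600,              # 0 means no timeout
    "tags": {"team": "data-eng"},         # forwarded as cluster tags, max 25
    "tasks": [
        {
            "task_key": "ingest",         # must be unique within the job
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
        }
    ],
}

# The payload must serialize cleanly to JSON before being sent to the Jobs API.
payload = json.dumps(job_settings)
```

The `tasks` array here holds a single JobsTask; see the JobsTask table later in this reference for the full set of task fields.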
JobsContinuous
Name | Path | Type | Description |
---|---|---|---|
pause_status | pause_status | JobsPauseStatus | |
JobsPauseStatus
JobsJobDeployment
Name | Path | Type | Description |
---|---|---|---|
kind | kind | JobsJobDeploymentKind | |
metadata_file_path | metadata_file_path | string | Path of the file that contains deployment metadata. |
JobsJobDeploymentKind
JobsJobEditMode
JobsJobEmailNotifications
Name | Path | Type | Description |
---|---|---|---|
on_duration_warning_threshold_exceeded | on_duration_warning_threshold_exceeded | array of string | A list of email addresses to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. If no rule for the RUN_DURATION_SECONDS metric is specified in the health field for the job, notifications are not sent. |
on_failure | on_failure | array of string | A list of email addresses to be notified when a run completes unsuccessfully. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a FAILED or TIMED_OUT result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |
on_start | on_start | array of string | A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |
on_streaming_backlog_exceeded | on_streaming_backlog_exceeded | array of string | A list of email addresses to notify when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes. |
on_success | on_success | array of string | A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |
JobsJobEnvironment
Name | Path | Type | Description |
---|---|---|---|
environment_key | environment_key | string | The key of an environment. It must be unique within a job. |
spec | spec | ComputeEnvironment | |
ComputeEnvironment
Name | Path | Type | Description |
---|---|---|---|
dependencies | dependencies | array of string | List of pip dependencies, as supported by the version of pip in this environment. Each dependency is a valid pip requirements file line per https://pip.pypa.io/en/stable/reference/requirements-file-format/. Allowed dependencies include a requirement specifier, an archive URL, a local project path (such as WSFS or UC Volumes in Azure Databricks), or a VCS project URL. |
environment_version | environment_version | string | Required. Environment version used by the environment. Each version comes with a specific Python version and a set of Python packages. The version is a string, consisting of an integer. See https://learn.microsoft.com/azure/databricks/release-notes/serverless/#serverless-environment-versions. |
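A JobsJobEnvironment entry pairs an `environment_key` with a ComputeEnvironment spec, and serverless tasks reference it by key. The sketch below assumes hypothetical package pins and a hypothetical Volumes wheel path; each `dependencies` entry is a pip requirements line as described above.

```python
# Sketch of a JobsJobEnvironment entry for a serverless task.
# The dependency pins and the Volumes path are hypothetical placeholders.
environment = {
    "environment_key": "default-python",   # must be unique within the job
    "spec": {
        "environment_version": "3",        # a string consisting of an integer
        "dependencies": [
            "simplejson==3.8.0",           # requirement specifier
            "/Volumes/catalog/schema/vol1/wheels/mylib-0.1-py3-none-any.whl",
        ],
    },
}

# A serverless task references the environment by key instead of
# embedding the spec inline (see environment_key in the JobsTask table).
task_ref = {"task_key": "train", "environment_key": environment["environment_key"]}
```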
JobsGitSource
Name | Path | Type | Description |
---|---|---|---|
git_branch | git_branch | string | Name of the branch to be checked out and used by this job. This field cannot be specified in conjunction with git_tag or git_commit. |
git_commit | git_commit | string | Commit to be checked out and used by this job. This field cannot be specified in conjunction with git_branch or git_tag. |
git_provider | git_provider | JobsGitProvider | |
git_snapshot | git_snapshot | JobsGitSnapshot | |
git_tag | git_tag | string | Name of the tag to be checked out and used by this job. This field cannot be specified in conjunction with git_branch or git_commit. |
git_url | git_url | string | URL of the repository to be cloned by this job. |
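As the field descriptions above state, exactly one of git_branch, git_tag, and git_commit may be set. A small sketch with a hypothetical repository URL, including a helper that enforces the mutual exclusivity client-side:

```python
# Sketch of a JobsGitSource block pinned to a branch; the repository URL
# is a hypothetical placeholder.
git_source = {
    "git_url": "https://github.com/example/etl-jobs",
    "git_provider": "gitHub",
    "git_branch": "main",
}

def validate_git_source(src):
    # git_branch, git_tag, and git_commit are mutually exclusive per the
    # table above; exactly one must be present.
    refs = [k for k in ("git_branch", "git_tag", "git_commit") if k in src]
    if len(refs) != 1:
        raise ValueError("exactly one of git_branch, git_tag, git_commit is required")
    return src

validate_git_source(git_source)
```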
JobsGitProvider
JobsGitSnapshot
Name | Path | Type | Description |
---|---|---|---|
used_commit | used_commit | string | Commit that was used to execute the run. If git_branch was specified, this points to the HEAD of the branch at the time of the run; if git_tag was specified, this points to the commit the tag points to. |
JobsJobsHealthRules
Name | Path | Type | Description |
---|---|---|---|
rules | rules | array of JobsJobsHealthRule | |
JobsJobsHealthRule
Name | Path | Type | Description |
---|---|---|---|
metric | metric | JobsJobsHealthMetric | |
op | op | JobsJobsHealthOperator | |
value | value | integer | Specifies the threshold value that the health metric should obey to satisfy the health rule. |
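A health block holds a list of rules, each combining a metric, an operator, and a threshold. The sketch below uses the RUN_DURATION_SECONDS metric referenced by the email-notification fields earlier in this reference; the two-hour threshold is an arbitrary example value.

```python
# Sketch of a JobsJobsHealthRules block with a single rule on run duration.
# RUN_DURATION_SECONDS and GREATER_THAN follow the Jobs API health-rule
# enums; the threshold value is illustrative.
health = {
    "rules": [
        {
            "metric": "RUN_DURATION_SECONDS",
            "op": "GREATER_THAN",
            "value": 7200,   # notify when a run exceeds two hours
        }
    ]
}
```

Recipients listed in on_duration_warning_threshold_exceeded are only notified when a rule like this exists in the health field.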
JobsJobsHealthMetric
JobsJobsHealthOperator
JobsJobCluster
Name | Path | Type | Description |
---|---|---|---|
job_cluster_key | job_cluster_key | string | A unique name for the job cluster. This field is required and must be unique within the job. JobTaskSettings may refer to this field to determine which cluster to launch for the task execution. |
new_cluster | new_cluster | ComputeClusterSpec | |
ComputeClusterSpec
Name | Path | Type | Description |
---|---|---|---|
apply_policy_default_values | apply_policy_default_values | boolean | When set to true, fixed and default values from the policy are used for fields that are omitted. When set to false, only fixed values from the policy are applied. |
autoscale | autoscale | ComputeAutoScale | |
autotermination_minutes | autotermination_minutes | integer | Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster is not automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination. |
azure_attributes | azure_attributes | ComputeAzureAttributes | |
cluster_log_conf | cluster_log_conf | ComputeClusterLogConf | |
cluster_name | cluster_name | string | Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name is an empty string. For job clusters, the cluster name is automatically set based on the job and job run IDs. |
custom_tags | custom_tags | object | Additional tags for cluster resources. Azure Databricks tags all cluster resources (such as VMs and disk volumes) with these tags in addition to default_tags. Notes: Currently, Azure Databricks allows at most 45 custom tags. Clusters can only reuse cloud resources if the resources' tags are a subset of the cluster tags. |
data_security_mode | data_security_mode | ComputeDataSecurityMode | |
docker_image | docker_image | ComputeDockerImage | |
driver_instance_pool_id | driver_instance_pool_id | string | The optional ID of the instance pool to use for the driver node of the cluster. If the driver pool is not assigned, the driver node uses the pool specified by instance_pool_id. |
driver_node_type_id | driver_node_type_id | string | The node type of the Spark driver. This field is optional; if unset, the driver node type is set to the same value as node_type_id defined above. This field, along with node_type_id, should not be set if virtual_cluster_size is set. If driver_node_type_id, node_type_id, and virtual_cluster_size are all specified, driver_node_type_id and node_type_id take precedence. |
enable_elastic_disk | enable_elastic_disk | boolean | Autoscaling local storage: when enabled, this cluster dynamically acquires additional disk space when its Spark workers are running low on disk space. Refer to the autoscaling local storage documentation for more details. |
enable_local_disk_encryption | enable_local_disk_encryption | boolean | Whether to enable LUKS on cluster VMs' local disks. |
init_scripts | init_scripts | array of ComputeInitScriptInfo | The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts. |
instance_pool_id | instance_pool_id | string | The optional ID of the instance pool to which the cluster belongs. |
is_single_node | is_single_node | boolean | This field can only be used when kind = CLASSIC_PREVIEW. When set to true, Azure Databricks automatically sets single-node related custom_tags, spark_conf, and num_workers. |
kind | kind | ComputeKind | |
node_type_id | node_type_id | string | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call. |
num_workers | num_workers | integer | Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes. Note: when reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field is immediately updated to reflect the target size of 10 workers, whereas the workers listed in spark_info gradually increase from 5 to 10 as the new nodes are provisioned. |
policy_id | policy_id | string | The ID of the cluster policy used to create the cluster, if applicable. |
runtime_engine | runtime_engine | ComputeRuntimeEngine | |
single_user_name | single_user_name | string | Single user name if data_security_mode is SINGLE_USER. |
spark_conf | spark_conf | object | An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively. |
spark_env_vars | spark_env_vars | object | An object containing a set of optional, user-specified environment variable key-value pairs. A key-value pair of the form (X,Y) is exported as is (that is, export X='Y') while launching the driver and workers. To specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as in the example below; this ensures that all default Databricks-managed environment variables are also included. Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"} |
spark_version | spark_version | string | The Spark version of the cluster, e.g. 3.3.x-scala2.11. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call. |
ssh_public_keys | ssh_public_keys | array of string | SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified. |
use_ml_runtime | use_ml_runtime | boolean | This field can only be used when kind = CLASSIC_PREVIEW. effective_spark_version is determined by spark_version (the DBR release), this use_ml_runtime field, and whether node_type_id is a GPU node. |
workload_type | workload_type | ComputeWorkloadType | |
ComputeAutoScale
Name | Path | Type | Description |
---|---|---|---|
max_workers | max_workers | integer | The maximum number of workers to which the cluster can scale up when overloaded. Note that max_workers must be strictly greater than min_workers. |
min_workers | min_workers | integer | The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation. |
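The two constraints above (max_workers strictly greater than min_workers, and min_workers as the initial size) can be sketched as a small validation helper; the worker counts are arbitrary example values:

```python
# Sketch of a ComputeAutoScale block with a client-side validity check.
autoscale = {"min_workers": 2, "max_workers": 8}

def validate_autoscale(a):
    # max_workers must be strictly greater than min_workers, per the table above
    if a["max_workers"] <= a["min_workers"]:
        raise ValueError("max_workers must be strictly greater than min_workers")
    return a

# min_workers is the initial worker count; per the num_workers description,
# total Spark nodes = one driver + that many executors.
initial_nodes = 1 + validate_autoscale(autoscale)["min_workers"]
```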
ComputeAzureAttributes
Name | Path | Type | Description |
---|---|---|---|
availability | availability | ComputeAzureAvailability | |
first_on_demand | first_on_demand | integer | The first first_on_demand nodes of the cluster will be placed on on-demand instances. This value should be greater than 0 to make sure the cluster driver node is placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on availability instances. Note that this value does not affect cluster size and cannot currently be mutated over the lifetime of a cluster. |
log_analytics_info | log_analytics_info | ComputeLogAnalyticsInfo | |
spot_bid_max_price | spot_bid_max_price | double | The max bid price to be used for Azure spot instances. The max price for the bid cannot be higher than the on-demand price of the instance. If not specified, the default value is -1, which specifies that the instance cannot be evicted on the basis of price, only on the basis of availability. The value must be greater than 0, or exactly -1. |
ComputeAzureAvailability
ComputeLogAnalyticsInfo
Name | Path | Type | Description |
---|---|---|---|
log_analytics_primary_key | log_analytics_primary_key | string | |
log_analytics_workspace_id | log_analytics_workspace_id | string | |
ComputeClusterLogConf
Name | Path | Type | Description |
---|---|---|---|
dbfs | dbfs | ComputeDbfsStorageInfo | |
volumes | volumes | ComputeVolumesStorageInfo | |
ComputeDbfsStorageInfo
Name | Path | Type | Description |
---|---|---|---|
destination | destination | string | DBFS destination, e.g. dbfs:/my/path |
ComputeVolumesStorageInfo
Name | Path | Type | Description |
---|---|---|---|
destination | destination | string | UC Volumes destination, e.g. /Volumes/catalog/schema/vol1/init-scripts/setup-datadog.sh or dbfs:/Volumes/catalog/schema/vol1/init-scripts/setup-datadog.sh |
ComputeDataSecurityMode
ComputeDockerImage
Name | Path | Type | Description |
---|---|---|---|
basic_auth | basic_auth | ComputeDockerBasicAuth | |
url | url | string | URL of the Docker image. |
ComputeDockerBasicAuth
Name | Path | Type | Description |
---|---|---|---|
password | password | string | Password of the user. |
username | username | string | Name of the user. |
ComputeInitScriptInfo
Name | Path | Type | Description |
---|---|---|---|
abfss | abfss | ComputeAdlsgen2Info | |
file | file | ComputeLocalFileInfo | |
gcs | gcs | ComputeGcsStorageInfo | |
volumes | volumes | ComputeVolumesStorageInfo | |
workspace | workspace | ComputeWorkspaceStorageInfo | |
ComputeAdlsgen2Info
Name | Path | Type | Description |
---|---|---|---|
destination | destination | string | abfss destination, e.g. abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>. |
ComputeLocalFileInfo
Name | Path | Type | Description |
---|---|---|---|
destination | destination | string | Local file destination, e.g. file:/my/local/file.sh |
ComputeGcsStorageInfo
Name | Path | Type | Description |
---|---|---|---|
destination | destination | string | GCS destination/URI, e.g. gs://my-bucket/some-prefix |
ComputeWorkspaceStorageInfo
Name | Path | Type | Description |
---|---|---|---|
destination | destination | string | WSFS destination, e.g. workspace:/cluster-init-scripts/setup-datadog.sh |
ComputeKind
ComputeRuntimeEngine
ComputeWorkloadType
Name | Path | Type | Description |
---|---|---|---|
clients | clients | ComputeClientsTypes | |
ComputeClientsTypes
Name | Path | Type | Description |
---|---|---|---|
jobs | jobs | boolean | With jobs set, the cluster can be used for jobs. |
notebooks | notebooks | boolean | With notebooks set, this cluster can be used for notebooks. |
JobsJobNotificationSettings
Name | Path | Type | Description |
---|---|---|---|
no_alert_for_canceled_runs | no_alert_for_canceled_runs | boolean | If true, do not send notifications to recipients specified in on_failure if the run is canceled. |
no_alert_for_skipped_runs | no_alert_for_skipped_runs | boolean | If true, do not send notifications to recipients specified in on_failure if the run is skipped. |
JobsJobParameterDefinition
Name | Path | Type | Description |
---|---|---|---|
default | default | string | Default value of the parameter. |
name | name | string | The name of the defined parameter. May only contain alphanumeric characters, _, -, and . |
JobsJobRunAs
Name | Path | Type | Description |
---|---|---|---|
service_principal_name | service_principal_name | string | Application ID of an active service principal. Setting this field requires the servicePrincipal/user role. |
user_name | user_name | string | The email of an active workspace user. Non-admin users can only set this field to their own email. |
JobsCronSchedule
Name | Path | Type | Description |
---|---|---|---|
pause_status | pause_status | JobsPauseStatus | |
quartz_cron_expression | quartz_cron_expression | string | A cron expression using Quartz syntax that describes the schedule for a job. See Cron Trigger for details. This field is required. |
timezone_id | timezone_id | string | A Java timezone ID. The schedule for a job is resolved with respect to this timezone. See Java TimeZone for details. This field is required. |
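A JobsCronSchedule can be sketched as follows; the schedule time and timezone are arbitrary examples. Quartz cron expressions have six or seven space-separated fields (seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year):

```python
# Sketch of a JobsCronSchedule; quartz_cron_expression and timezone_id
# are both required. The specific schedule here is illustrative.
schedule = {
    "quartz_cron_expression": "0 30 7 * * ?",   # every day at 07:30
    "timezone_id": "Europe/Berlin",             # Java timezone ID
    "pause_status": "UNPAUSED",
}

# Quartz expressions carry 6 or 7 fields; note that day-of-month and
# day-of-week cannot both be "*" -- one of them must be "?".
field_count = len(schedule["quartz_cron_expression"].split())
assert 6 <= field_count <= 7
```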
JobsTask
Name | Path | Type | Description |
---|---|---|---|
clean_rooms_notebook_task | clean_rooms_notebook_task | Object | |
condition_task | condition_task | JobsConditionTask | |
dashboard_task | dashboard_task | JobsDashboardTask | |
dbt_task | dbt_task | Object | |
depends_on | depends_on | array of JobsTaskDependency | An optional array of objects specifying the dependency graph of the task. All tasks specified in this field must complete before executing this task. The task will run only if the run_if condition is true. The key is task_key, and the value is the name assigned to the dependent task. |
description | description | string | An optional description for this task. |
disable_auto_optimization | disable_auto_optimization | boolean | An option to disable auto optimization in serverless. |
email_notifications | email_notifications | JobsTaskEmailNotifications | |
environment_key | environment_key | string | The key that references an environment spec in a job. This field is required for Python script, Python wheel, and dbt tasks when using serverless compute. |
existing_cluster_id | existing_cluster_id | string | If set, the ID of an existing cluster that is used for all runs. When running jobs or tasks on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs and tasks on new clusters for greater reliability. |
for_each_task | for_each_task | JobsForEachTask | |
health | health | JobsJobsHealthRules | |
job_cluster_key | job_cluster_key | string | If set, this task is executed by reusing the cluster specified in job.settings.job_clusters. |
libraries | libraries | array of ComputeLibrary | An optional list of libraries to be installed on the cluster. The default value is an empty list. |
max_retries | max_retries | integer | An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. |
min_retry_interval_millis | min_retry_interval_millis | integer | An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. |
new_cluster | new_cluster | ComputeClusterSpec | |
notebook_task | notebook_task | JobsNotebookTask | |
notification_settings | notification_settings | JobsTaskNotificationSettings | |
pipeline_task | pipeline_task | JobsPipelineTask | |
power_bi_task | power_bi_task | Object | |
python_wheel_task | python_wheel_task | JobsPythonWheelTask | |
retry_on_timeout | retry_on_timeout | boolean | An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. |
run_if | run_if | JobsRunIf | |
run_job_task | run_job_task | JobsRunJobTask | |
spark_jar_task | spark_jar_task | JobsSparkJarTask | |
spark_python_task | spark_python_task | JobsSparkPythonTask | |
spark_submit_task | spark_submit_task | JobsSparkSubmitTask | |
sql_task | sql_task | Object | |
task_key | task_key | string | A unique name for the task. This field is used to refer to this task from other tasks. This field is required and must be unique within its parent job. On Update or Reset, this field is used to reference the tasks to be updated or reset. |
timeout_seconds | timeout_seconds | integer | An optional timeout applied to each run of this job task. A value of 0 means no timeout. |
webhook_notifications | webhook_notifications | JobsWebhookNotifications | |
JobsConditionTask
Name | Path | Type | Description |
---|---|---|---|
left | left | string | The left operand of the condition task. Can be either a string value or a job state or parameter reference. |
op | op | JobsConditionTaskOp | |
right | right | string | The right operand of the condition task. Can be either a string value or a job state or parameter reference. |
JobsConditionTaskOp
JobsDashboardTask
Name | Path | Type | Description |
---|---|---|---|
dashboard_id | dashboard_id | string | The identifier of the dashboard to refresh. |
subscription | subscription | JobsSubscription | |
warehouse_id | warehouse_id | string | Optional: the warehouse id to execute the dashboard with for the schedule. If not specified, the default warehouse of the dashboard is used. |
JobsSubscription
Name | Path | Type | Description |
---|---|---|---|
custom_subject | custom_subject | string | Optional: allows users to specify a custom subject line on the email sent to subscribers. |
paused | paused | boolean | When true, the subscription does not send emails. |
subscribers | subscribers | array of JobsSubscriptionSubscriber | The list of subscribers to send the snapshot of the dashboard to. |
JobsSubscriptionSubscriber
Name | Path | Type | Description |
---|---|---|---|
destination_id | destination_id | string | A snapshot of the dashboard is sent to the destination when the destination_id field is present. |
user_name | user_name | string | A snapshot of the dashboard is sent to the user's email when the user_name field is present. |
JobsSource
JobsTaskDependency
Name | Path | Type | Description |
---|---|---|---|
outcome | outcome | string | Can only be specified on condition task dependencies. The outcome of the dependent task that must be met for this task to run. |
task_key | task_key | string | The name of the task this task depends on. |
JobsTaskEmailNotifications
Name | Path | Type | Description |
---|---|---|---|
on_duration_warning_threshold_exceeded | on_duration_warning_threshold_exceeded | array of string | A list of email addresses to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. If no rule for the RUN_DURATION_SECONDS metric is specified in the health field for the job, notifications are not sent. |
on_failure | on_failure | array of string | A list of email addresses to be notified when a run completes unsuccessfully. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a FAILED or TIMED_OUT result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |
on_start | on_start | array of string | A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |
on_streaming_backlog_exceeded | on_streaming_backlog_exceeded | array of string | A list of email addresses to notify when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes. |
on_success | on_success | array of string | A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |
ComputeLibrary
Name | Path | Type | Description |
---|---|---|---|
cran | cran | ComputeRCranLibrary | |
jar | jar | string | URI of the JAR library to install. Supported URIs include Workspace paths, Unity Catalog Volumes paths, and ADLS URIs. For example: { "jar": "/Workspace/path/to/library.jar" }, { "jar": "/Volumes/path/to/library.jar" } or { "jar": "abfss://my-bucket/library.jar" }. If ADLS is used, make sure the cluster has read access to the library. You may need to launch the cluster with a Microsoft Entra ID service principal to access the ADLS URI. |
maven | maven | ComputeMavenLibrary | |
pypi | pypi | ComputePythonPyPiLibrary | |
requirements | requirements | string | URI of the requirements.txt file to install. Only Workspace paths and Unity Catalog Volumes paths are supported. For example: { "requirements": "/Workspace/path/to/requirements.txt" } or { "requirements": "/Volumes/path/to/requirements.txt" } |
whl | whl | string | URI of the wheel library to install. Supported URIs include Workspace paths, Unity Catalog Volumes paths, and ADLS URIs. For example: { "whl": "/Workspace/path/to/library.whl" }, { "whl": "/Volumes/path/to/library.whl" } or { "whl": "abfss://my-bucket/library.whl" }. If ADLS is used, make sure the cluster has read access to the library. You may need to launch the cluster with a Microsoft Entra ID service principal to access the ADLS URI. |
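A task's libraries field is a list where each entry names exactly one library kind from the table above. A sketch mixing several kinds, with the example paths and coordinates taken from or modeled on the descriptions above:

```python
# Sketch of a task-level libraries list; the paths and package versions
# are illustrative placeholders.
libraries = [
    {"pypi": {"package": "simplejson==3.8.0"}},
    {"maven": {"coordinates": "org.jsoup:jsoup:1.7.2"}},
    {"whl": "/Volumes/path/to/library.whl"},
    {"requirements": "/Workspace/path/to/requirements.txt"},
]

# Each ComputeLibrary entry specifies exactly one library kind.
assert all(len(lib) == 1 for lib in libraries)
```

Note that libraries cannot be declared on a shared job cluster; they belong in the task settings, as the job_clusters description earlier states.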
JobsForEachTask
Name | Path | Type | Description |
---|---|---|---|
concurrency | concurrency | integer | An optional maximum allowed number of concurrent runs of the task. Set this value if you want to be able to execute multiple runs of the task concurrently. |
inputs | inputs | string | Array for the task to iterate on. This can be a JSON string or a reference to an array parameter. |
task | task | Object | |
ComputeRCranLibrary
Name | Path | Type | Description |
---|---|---|---|
package | package | string | The name of the CRAN package to install. |
repo | repo | string | The repository where the package can be found. If not specified, the default CRAN repo is used. |
ComputeMavenLibrary
Name | Path | Type | Description |
---|---|---|---|
coordinates | coordinates | string | Gradle-style Maven coordinates. For example: "org.jsoup:jsoup:1.7.2". |
exclusions | exclusions | array of string | List of dependencies to exclude. For example: ["slf4j:slf4j", "*:hadoop-client"]. Maven dependency exclusions: https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html. |
repo | repo | string | Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched. |
ComputePythonPyPiLibrary
Name | Path | Type | Description |
---|---|---|---|
package | package | string | The name of the PyPI package to install. An optional exact version specification is also supported. Examples: "simplejson" and "simplejson==3.8.0". |
repo | repo | string | The repository where the package can be found. If not specified, the default pip index is used. |
JobsNotebookTask
Name | Path | Type | Description |
---|---|---|---|
base_parameters | base_parameters | object | Base parameters to be used for each run of this job. If the run is initiated by a call to :method:jobs/runNow with parameters specified, the two parameter maps are merged. If the same key is specified in base_parameters and in run-now, the value from run-now is used. Use task parameter variables to set parameters containing information about job runs. If the notebook takes a parameter that is not specified in the job's base_parameters or the run-now override parameters, the default value from the notebook is used. Retrieve these parameters in a notebook using dbutils.widgets.get. The JSON representation of this field cannot exceed 1MB. |
notebook_path | notebook_path | string | The path of the notebook to be run in the Azure Databricks workspace or remote repository. For notebooks stored in the Azure Databricks workspace, the path must be absolute and begin with a slash. For notebooks stored in a remote repository, the path must be relative. This field is required. |
source | source | JobsSource | |
warehouse_id | warehouse_id | string | Optional warehouse_id to run the notebook on a SQL warehouse. Classic SQL warehouses are NOT supported; use serverless or pro SQL warehouses instead. Note that SQL warehouses only support SQL cells; if the notebook contains non-SQL cells, the run fails. |
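The merge rule for base_parameters described above (run-now values win on duplicate keys) can be sketched with plain dict merging; the notebook path and parameter names are hypothetical:

```python
# Sketch of a JobsNotebookTask with base_parameters; path and parameter
# names are illustrative placeholders.
notebook_task = {
    "notebook_path": "/Workspace/etl/ingest",   # workspace paths must be absolute
    "base_parameters": {"date": "2024-01-01", "mode": "incremental"},
}

# Parameters supplied to a run-now call override base_parameters on
# duplicate keys; keys missing from run-now keep their base values.
run_now_params = {"date": "2024-02-01"}
effective = {**notebook_task["base_parameters"], **run_now_params}
```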
JobsTaskNotificationSettings
Name | Path | Type | Description |
---|---|---|---|
alert_on_last_attempt | alert_on_last_attempt | boolean | If true, do not send notifications to recipients specified in on_start for the retried runs and do not send notifications to recipients specified in on_failure until the last retry of the run. |
no_alert_for_canceled_runs | no_alert_for_canceled_runs | boolean | If true, do not send notifications to recipients specified in on_failure if the run is canceled. |
no_alert_for_skipped_runs | no_alert_for_skipped_runs | boolean | If true, do not send notifications to recipients specified in on_failure if the run is skipped. |
JobsPipelineTask
Name | Path | Type | Description |
---|---|---|---|
full_refresh | full_refresh | boolean | If true, triggers a full refresh of the Delta Live Tables pipeline. |
pipeline_id | pipeline_id | string | The full name of the pipeline task to execute. |
JobsPythonWheelTask
Name | Path | Type | Description |
---|---|---|---|
entry_point | entry_point | string | Named entry point to use. If it does not exist in the metadata of the package, the function is executed from the package directly using $packageName.$entryPoint(). |
named_parameters | named_parameters | object | Command-line parameters passed to the Python wheel task in the form of ["--name=task", "--data=dbfs:/path/to/data.json"]. Leave it empty if parameters is not null. |
package_name | package_name | string | Name of the package to execute. |
parameters | parameters | array of string | Command-line parameters passed to the Python wheel task. Leave it empty if named_parameters is not null. |
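The table notes that `parameters` and `named_parameters` are mutually exclusive. A hedged sketch of a wheel-task payload and a check for that rule (package and entry-point names are hypothetical):

```python
# Hypothetical python_wheel_task payload; field names follow the table above.
python_wheel_task = {
    "package_name": "my_wheel",   # assumed package name
    "entry_point": "main",        # assumed entry point
    "named_parameters": {"name": "task", "data": "dbfs:/path/to/data.json"},
}

def params_valid(task: dict) -> bool:
    # Per the descriptions above: leave one of the two parameter fields empty.
    return not (task.get("parameters") and task.get("named_parameters"))

print(params_valid(python_wheel_task))  # True
```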
JobsRunIf
JobsRunJobTask
Name | Path | Type | Description |
---|---|---|---|
job_id | job_id | integer | ID of the job to trigger. |
job_parameters | job_parameters | object | Job-level parameters used to trigger the job. |
pipeline_params | pipeline_params | JobsPipelineParams | |
JobsSparkJarTask
Name | Path | Type | Description |
---|---|---|---|
main_class_name | main_class_name | string | The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code must use SparkContext.getOrCreate to obtain a Spark context; otherwise, runs of the job fail. |
parameters | parameters | array of string | Parameters passed to the main method. Use Task parameter variables to set parameters containing information about job runs. |
JobsSparkPythonTask
Name | Path | Type | Description |
---|---|---|---|
parameters | parameters | array of string | Command-line parameters passed to the Python file. Use Task parameter variables to set parameters containing information about job runs. |
python_file | python_file | string | The Python file to be executed. Cloud file URIs (such as dbfs:/, s3:/, adls:/, gcs:/) and workspace paths are supported. For Python files stored in the Azure Databricks workspace, the path must be absolute and begin with /. For files stored in a remote repository, the path must be relative. This field is required. |
source | source | JobsSource | |
JobsSparkSubmitTask
Name | Path | Type | Description |
---|---|---|---|
parameters | parameters | array of string | Command-line parameters passed to spark-submit. Use Task parameter variables to set parameters containing information about job runs. |
JobsWebhookNotifications
Name | Path | Type | Description |
---|---|---|---|
on_duration_warning_threshold_exceeded | on_duration_warning_threshold_exceeded | array of JobsWebhook | An optional list of system notification IDs to call when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. A maximum of 3 destinations can be specified for the on_duration_warning_threshold_exceeded property. |
on_failure | on_failure | array of JobsWebhook | An optional list of system notification IDs to call when the run fails. A maximum of 3 destinations can be specified for the on_failure property. |
on_start | on_start | array of JobsWebhook | An optional list of system notification IDs to call when the run starts. A maximum of 3 destinations can be specified for the on_start property. |
on_streaming_backlog_exceeded | on_streaming_backlog_exceeded | array of JobsWebhook | An optional list of system notification IDs to call when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes. A maximum of 3 destinations can be specified for the on_streaming_backlog_exceeded property. |
on_success | on_success | array of JobsWebhook | An optional list of system notification IDs to call when the run completes successfully. A maximum of 3 destinations can be specified for the on_success property. |
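Each event list above holds JobsWebhook objects (`{"id": ...}`) and is capped at 3 destinations. A minimal sketch of such a payload with a cap check; the webhook IDs are hypothetical:

```python
# Hypothetical webhook_notifications object; each list holds JobsWebhook
# objects and, per the table above, at most 3 destinations per event.
webhook_notifications = {
    "on_start":   [{"id": "wh-aaa"}],
    "on_failure": [{"id": "wh-aaa"}, {"id": "wh-bbb"}],
    "on_success": [{"id": "wh-ccc"}],
}

MAX_DESTINATIONS = 3
within_cap = all(len(hooks) <= MAX_DESTINATIONS
                 for hooks in webhook_notifications.values())
print(within_cap)  # True
```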
JobsWebhook
Name | Path | Type | Description |
---|---|---|---|
id | id | string | |
JobsTriggerSettings
Name | Path | Type | Description |
---|---|---|---|
file_arrival | file_arrival | JobsFileArrivalTriggerConfiguration | |
pause_status | pause_status | JobsPauseStatus | |
periodic | periodic | JobsPeriodicTriggerConfiguration | |
JobsFileArrivalTriggerConfiguration
Name | Path | Type | Description |
---|---|---|---|
min_time_between_triggers_seconds | min_time_between_triggers_seconds | integer | If set, the trigger starts a run only after the specified amount of time has passed since the last time the trigger fired. The minimum allowed value is 60 seconds. |
url | url | string | URL to be monitored for file arrivals. The path must point to the root or a subpath of the external location. |
wait_after_last_change_seconds | wait_after_last_change_seconds | integer | If set, the trigger starts a run only after no file activity has occurred for the specified amount of time. This makes it possible to wait for a batch of incoming files to arrive before triggering a run. The minimum allowed value is 60 seconds. |
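Both timing fields above share a documented 60-second minimum. A hedged sketch of a file-arrival trigger config with that validation; the storage URL is a made-up example:

```python
# Hypothetical file-arrival trigger configuration; field names follow the
# table above, the URL is an illustrative external-location path.
file_arrival = {
    "url": "abfss://landing@example.dfs.core.windows.net/incoming/",
    "min_time_between_triggers_seconds": 120,
    "wait_after_last_change_seconds": 60,
}

MIN_SECONDS = 60  # documented minimum for both timing fields
timing_ok = all(
    file_arrival[key] >= MIN_SECONDS
    for key in ("min_time_between_triggers_seconds",
                "wait_after_last_change_seconds")
)
print(timing_ok)  # True
```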
JobsPeriodicTriggerConfiguration
Name | Path | Type | Description |
---|---|---|---|
interval | interval | integer | The interval at which the trigger should run. |
unit | unit | JobsPeriodicTriggerConfigurationTimeUnit | |
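A periodic trigger pairs an integer `interval` with a time unit. In this sketch the unit enum values ("HOURS", "DAYS", "WEEKS") are an assumption for illustration; see JobsPeriodicTriggerConfigurationTimeUnit for the actual set:

```python
# Hypothetical periodic trigger: run every 6 hours. Unit names here are
# assumed, not taken from the source.
periodic = {"interval": 6, "unit": "HOURS"}

UNIT_SECONDS = {"HOURS": 3600, "DAYS": 86400, "WEEKS": 604800}  # assumed units
interval_seconds = periodic["interval"] * UNIT_SECONDS[periodic["unit"]]
print(interval_seconds)  # 21600
```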
JobsPeriodicTriggerConfigurationTimeUnit
JobsTriggerStateProto
Name | Path | Type | Description |
---|---|---|---|
file_arrival | file_arrival | JobsFileArrivalTriggerState | |
JobsFileArrivalTriggerState
Name | Path | Type | Description |
---|---|---|---|
using_file_events | using_file_events | boolean | Indicates whether the trigger leverages file events to detect file arrivals. |
JobsRun
Name | Path | Type | Description |
---|---|---|---|
attempt_number | attempt_number | integer | The sequence number of this run attempt for a triggered job run. The initial attempt of a run has an attempt_number of 0. If the initial run attempt fails, and the job has a retry policy (max_retries > 0), subsequent runs are created with an original_attempt_run_id of the original attempt’s ID and an incrementing attempt_number. Runs are retried only until they succeed, and the maximum attempt_number is the same as the max_retries value for the job. |
cleanup_duration | cleanup_duration | integer | The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The cleanup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
cluster_instance | cluster_instance | JobsClusterInstance | |
cluster_spec | cluster_spec | JobsClusterSpec | |
creator_user_name | creator_user_name | string | The creator user name. This field won’t be included in the response if the user has already been deleted. |
description | description | string | Description of the run. |
effective_performance_target | effective_performance_target | JobsPerformanceTarget | |
end_time | end_time | integer | The time at which this run ended in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field is set to 0 if the job is still running. |
execution_duration | execution_duration | integer | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The execution_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
git_source | git_source | JobsGitSource | |
has_more | has_more | boolean | Indicates if the run has more array properties (tasks, job_clusters) that are not shown. They can be accessed via the :method:jobs/getrun endpoint. It is only relevant for API 2.2 :method:jobs/listruns requests with expand_tasks=true. |
job_clusters | job_clusters | array of JobsJobCluster | A list of job cluster specifications that can be shared and reused by tasks of this job. Libraries cannot be declared in a shared job cluster. You must declare dependent libraries in task settings. If more than 100 job clusters are available, you can paginate through them using :method:jobs/getrun. |
job_id | job_id | integer | The canonical identifier of the job that contains this run. |
job_parameters | job_parameters | array of JobsJobParameter | Job-level parameters used in the run. |
job_run_id | job_run_id | integer | ID of the job run that this run belongs to. For legacy and single-task job runs the field is populated with the job run ID. For task runs, the field is populated with the ID of the job run that the task run belongs to. |
next_page_token | next_page_token | string | A token that can be used to list the next page of array properties. |
original_attempt_run_id | original_attempt_run_id | integer | If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id. |
overriding_parameters | overriding_parameters | JobsRunParameters | |
queue_duration | queue_duration | integer | The time in milliseconds that the run has spent in the queue. |
repair_history | repair_history | array of JobsRepairHistoryItem | The repair history of the run. |
run_duration | run_duration | integer | The time in milliseconds it took the job run and all of its repairs to finish. |
run_id | run_id | integer | The canonical identifier of the run. This ID is unique across all runs of all jobs. |
run_name | run_name | string | An optional name for the run. The maximum length is 4096 bytes in UTF-8 encoding. |
run_page_url | run_page_url | string | The URL to the detail page of the run. |
run_type | run_type | JobsRunType | |
schedule | schedule | JobsCronSchedule | |
setup_duration | setup_duration | integer | The time in milliseconds it took to set up the cluster. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The setup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
start_time | start_time | integer | The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing; for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |
status | status | JobsRunStatus | |
tasks | tasks | array of JobsRunTask | The list of tasks performed by the run. Each task has its own run_id which you can use to call JobsGetOutput to retrieve the run results. If more than 100 tasks are available, you can paginate through them using :method:jobs/getrun. Use the next_page_token field at the object root to determine if more results are available. |
trigger | trigger | JobsTriggerType | |
trigger_info | trigger_info | JobsTriggerInfo | |
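The duration fields above compose as described: for a task run, the total is setup_duration + execution_duration + cleanup_duration (all in milliseconds), while for multitask job runs those three are 0 and run_duration holds the total. A small sketch with made-up values:

```python
# Illustrative duration fields for a single task run (milliseconds);
# values are hypothetical.
run = {
    "setup_duration": 45_000,       # e.g. cluster creation time
    "execution_duration": 600_000,  # command execution
    "cleanup_duration": 5_000,      # cluster teardown and artifact cleanup
}

# Per the field descriptions, a task run's duration is the sum of the three.
task_total_ms = (run["setup_duration"]
                 + run["execution_duration"]
                 + run["cleanup_duration"])
print(task_total_ms)  # 650000
```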
JobsClusterInstance
Name | Path | Type | Description |
---|---|---|---|
cluster_id | cluster_id | string | The canonical identifier for the cluster used by a run. This field is always available for runs on existing clusters. For runs on new clusters, it becomes available once the cluster is created. This value can be used to view logs by browsing to /#setting/sparkui/$cluster_id/driver-logs. The logs continue to be available after the run completes. The response won’t include this field if the identifier is not available yet. |
spark_context_id | spark_context_id | string | The canonical identifier for the Spark context used by a run. This field is filled in once the run begins execution. This value can be used to view the Spark UI by browsing to /#setting/sparkui/$cluster_id/$spark_context_id. The Spark UI continues to be available after the run has completed. The response won’t include this field if the identifier is not available yet. |
JobsClusterSpec
Name | Path | Type | Description |
---|---|---|---|
existing_cluster_id | existing_cluster_id | string | If existing_cluster_id, the ID of an existing cluster that is used for all runs. When running jobs or tasks on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs and tasks on new clusters for greater reliability. |
job_cluster_key | job_cluster_key | string | If job_cluster_key, this task is executed reusing the cluster specified in job.settings.job_clusters. |
libraries | libraries | array of ComputeLibrary | An optional list of libraries to be installed on the cluster. The default value is an empty list. |
new_cluster | new_cluster | ComputeClusterSpec | |
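The three compute fields above (existing_cluster_id, job_cluster_key, new_cluster) are alternative ways to specify where a task runs; a spec would typically set at most one of them. A hedged sketch of that check, with a hypothetical shared-cluster key:

```python
# Count which compute sources a hypothetical cluster spec sets; per the
# field descriptions, these are alternatives, so more than one is suspect.
def compute_sources(spec: dict) -> int:
    keys = ("existing_cluster_id", "job_cluster_key", "new_cluster")
    return sum(1 for k in keys if spec.get(k))

spec = {"job_cluster_key": "shared_cluster", "libraries": []}  # assumed key
print(compute_sources(spec))  # 1
```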
JobsJobParameter
Name | Path | Type | Description |
---|---|---|---|
default | default | string | The optional default value of the parameter. |
name | name | string | The name of the parameter. |
value | value | string | The value used in the run. |
JobsRunParameters
Name | Path | Type | Description |
---|---|---|---|
pipeline_params | pipeline_params | JobsPipelineParams | |
JobsRepairHistoryItem
Name | Path | Type | Description |
---|---|---|---|
effective_performance_target | effective_performance_target | JobsPerformanceTarget | |
end_time | end_time | integer | The end time of the (repaired) run. |
id | id | integer | The ID of the repair. Only returned for the items that represent a repair in repair_history. |
start_time | start_time | integer | The start time of the (repaired) run. |
status | status | JobsRunStatus | |
task_run_ids | task_run_ids | array of integer | The run IDs of the task runs that ran as part of this repair history item. |
type | type | JobsRepairHistoryItemType | |
JobsRunStatus
Name | Path | Type | Description |
---|---|---|---|
queue_details | queue_details | JobsQueueDetails | |
state | state | JobsRunLifecycleStateV2State | |
termination_details | termination_details | JobsTerminationDetails | |
JobsQueueDetails
Name | Path | Type | Description |
---|---|---|---|
code | code | JobsQueueDetailsCodeCode | |
message | message | string | A descriptive message with the queuing details. This field is unstructured, and its exact format is subject to change. |
JobsQueueDetailsCodeCode
JobsRunLifecycleStateV2State
JobsTerminationDetails
Name | Path | Type | Description |
---|---|---|---|
code | code | JobsTerminationCodeCode | |
message | message | string | A descriptive message with the termination details. This field is unstructured and the format might change. |
type | type | JobsTerminationTypeType | |
JobsTerminationCodeCode
JobsTerminationTypeType
JobsRepairHistoryItemType
JobsRunType
JobsRunTask
Name | Path | Type | Description |
---|---|---|---|
attempt_number | attempt_number | integer | The sequence number of this run attempt for a triggered job run. The initial attempt of a run has an attempt_number of 0. If the initial run attempt fails, and the job has a retry policy (max_retries > 0), subsequent runs are created with an original_attempt_run_id of the original attempt’s ID and an incrementing attempt_number. Runs are retried only until they succeed, and the maximum attempt_number is the same as the max_retries value for the job. |
clean_rooms_notebook_task | clean_rooms_notebook_task | Object | |
cleanup_duration | cleanup_duration | integer | The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The cleanup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
cluster_instance | cluster_instance | JobsClusterInstance | |
condition_task | condition_task | JobsRunConditionTask | |
dashboard_task | dashboard_task | Object | |
dbt_task | dbt_task | Object | |
depends_on | depends_on | array of JobsTaskDependency | An optional array of objects specifying the dependency graph of the task. All tasks specified in this field must complete successfully before executing this task. The key is task_key, and the value is the name assigned to the dependent task. |
description | description | string | An optional description for this task. |
effective_performance_target | effective_performance_target | JobsPerformanceTarget | |
email_notifications | email_notifications | JobsJobEmailNotifications | |
end_time | end_time | integer | The time at which this run ended in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field is set to 0 if the job is still running. |
environment_key | environment_key | string | The key that references an environment spec in a job. This field is required for Python script, Python wheel and dbt tasks when using serverless compute. |
execution_duration | execution_duration | integer | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The execution_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
existing_cluster_id | existing_cluster_id | string | If existing_cluster_id, the ID of an existing cluster that is used for all runs. When running jobs or tasks on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs and tasks on new clusters for greater reliability. |
for_each_task | for_each_task | Object | |
git_source | git_source | JobsGitSource | |
job_cluster_key | job_cluster_key | string | If job_cluster_key, this task is executed reusing the cluster specified in job.settings.job_clusters. |
libraries | libraries | array of Object | An optional list of libraries to be installed on the cluster. The default value is an empty list. |
new_cluster | new_cluster | Object | |
notebook_task | notebook_task | JobsNotebookTask | |
notification_settings | notification_settings | Object | |
pipeline_task | pipeline_task | Object | |
power_bi_task | power_bi_task | Object | |
python_wheel_task | python_wheel_task | Object | |
queue_duration | queue_duration | integer | The time in milliseconds that the run has spent in the queue. |
resolved_values | resolved_values | JobsResolvedValues | |
run_duration | run_duration | integer | The time in milliseconds it took the job run and all of its repairs to finish. |
run_id | run_id | integer | The ID of the task run. |
run_if | run_if | JobsRunIf | |
run_job_task | run_job_task | JobsRunJobTask | |
run_page_url | run_page_url | string | |
setup_duration | setup_duration | integer | The time in milliseconds it took to set up the cluster. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The setup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field. |
spark_jar_task | spark_jar_task | Object | |
spark_python_task | spark_python_task | Object | |
spark_submit_task | spark_submit_task | Object | |
sql_task | sql_task | Object | |
start_time | start_time | integer | The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing; for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |
status | status | JobsRunStatus | |
task_key | task_key | string | A unique name for the task. This field is used to refer to this task from other tasks. This field is required and must be unique within its parent job. On Update or Reset, this field is used to reference the tasks to be updated or reset. |
timeout_seconds | timeout_seconds | integer | An optional timeout applied to each run of this job task. A value of 0 means no timeout. |
webhook_notifications | webhook_notifications | Object | |
JobsRunConditionTask
Name | Path | Type | Description |
---|---|---|---|
left | left | string | The left operand of the condition task. Can be either a string value or a job state or parameter reference. |
op | op | JobsConditionTaskOp | |
outcome | outcome | string | The condition expression evaluation result. Filled in if the task was successfully completed. Can be "true" or "false". |
right | right | string | The right operand of the condition task. Can be either a string value or a job state or parameter reference. |
JobsTriggerType
JobsTriggerInfo
Name | Path | Type | Description |
---|---|---|---|
run_id | run_id | integer | The run ID of the Run Job task run. |
JobsRunOutput
Name | Path | Type | Description |
---|---|---|---|
clean_rooms_notebook_output | clean_rooms_notebook_output | Object | |
dashboard_output | dashboard_output | Object | |
dbt_output | dbt_output | Object | |
error | error | string | An error message indicating why a task failed or why output is not available. The message is unstructured, and its exact format is subject to change. |
error_trace | error_trace | string | If there was an error executing the run, this field contains any available stack traces. |
info | info | string | |
logs | logs | string | The output from tasks that write to standard streams (stdout/stderr) such as spark_jar_task, spark_python_task, python_wheel_task. It's not supported for the notebook_task, pipeline_task or spark_submit_task. Azure Databricks restricts this API to return the last 5 MB of these logs. |
logs_truncated | logs_truncated | boolean | Whether the logs are truncated. |
metadata | metadata | Object | |
notebook_output | notebook_output | JobsNotebookOutput | |
run_job_output | run_job_output | JobsRunJobOutput | |
sql_output | sql_output | Object | |
JobsNotebookOutput
Name | Path | Type | Description |
---|---|---|---|
result | result | string | The value passed to dbutils.notebook.exit(). Azure Databricks restricts this API to return the first 5 MB of the value. For a larger result, your job can store the results in a cloud storage service. This field is absent if dbutils.notebook.exit() was never called. |
truncated | truncated | boolean | Whether or not the result was truncated. |
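A consumer of this object should check `truncated` before parsing `result`, since only the first 5 MB of the exit value is returned. A hedged sketch with a made-up JSON result:

```python
import json

# Hypothetical JobsNotebookOutput object; `result` holds the value the
# notebook passed to dbutils.notebook.exit() (example value shown).
notebook_output = {
    "result": '{"rows_written": 1234}',
    "truncated": False,
}

# Only trust a parsed result when it was not truncated at 5 MB.
payload = (json.loads(notebook_output["result"])
           if not notebook_output["truncated"] else None)
print(payload)  # {'rows_written': 1234}
```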
JobsRunJobOutput
Name | Path | Type | Description |
---|---|---|---|
run_id | run_id | integer | The run ID of the triggered job run. |
JobsResolvedValues
Name | Path | Type | Description |
---|---|---|---|
condition_task | condition_task | JobsResolvedConditionTaskValues | |
dbt_task | dbt_task | JobsResolvedDbtTaskValues | |
notebook_task | notebook_task | JobsResolvedNotebookTaskValues | |
python_wheel_task | python_wheel_task | JobsResolvedPythonWheelTaskValues | |
run_job_task | run_job_task | JobsResolvedRunJobTaskValues | |
simulation_task | simulation_task | JobsResolvedParamPairValues | |
spark_jar_task | spark_jar_task | JobsResolvedStringParamsValues | |
spark_python_task | spark_python_task | JobsResolvedStringParamsValues | |
spark_submit_task | spark_submit_task | JobsResolvedStringParamsValues | |
sql_task | sql_task | JobsResolvedParamPairValues | |
JobsResolvedConditionTaskValues
Name | Path | Type | Description |
---|---|---|---|
left | left | string | |
right | right | string | |
JobsResolvedDbtTaskValues
Name | Path | Type | Description |
---|---|---|---|
commands | commands | array of string | |
JobsResolvedNotebookTaskValues
Name | Path | Type | Description |
---|---|---|---|
base_parameters | base_parameters | object | |
JobsResolvedPythonWheelTaskValues
Name | Path | Type | Description |
---|---|---|---|
named_parameters | named_parameters | object | |
parameters | parameters | array of string | |
JobsResolvedRunJobTaskValues
Name | Path | Type | Description |
---|---|---|---|
job_parameters | job_parameters | object | |
parameters | parameters | object | |
JobsResolvedParamPairValues
Name | Path | Type | Description |
---|---|---|---|
parameters | parameters | object | |
JobsResolvedStringParamsValues
Name | Path | Type | Description |
---|---|---|---|
parameters | parameters | array of string | |