# Serverless GPU compute

2025-08-29

Important

This feature is in Beta.

This article describes serverless GPU compute on Databricks and provides recommended use cases, guidance for how to set up GPU compute resources, and feature limitations.

## What is serverless GPU compute?

Serverless GPU compute is part of the Serverless compute offering and is specialized for custom single-node and multi-node deep learning workloads. You can use serverless GPU compute to train and fine-tune custom models using your preferred frameworks, with state-of-the-art efficiency, performance, and quality.

Serverless GPU compute includes:

- An integrated experience across Notebooks, Unity Catalog, and MLflow: You can develop your code interactively using Notebooks.
- A10 GPU support: You can run workloads on A10 GPUs.
- Hyperparameter sweeps: You can perform hyperparameter sweep fine-tuning.
- Multi-node support: You can run distributed training workloads.

The pre-installed packages on serverless GPU compute are not a replacement for Databricks Runtime ML. While there are common packages, not all Databricks Runtime ML dependencies and libraries are reflected in the serverless GPU compute environment.

## Recommended use cases

Databricks recommends serverless GPU compute for any model training use case that requires training customizations and GPUs.

For example:

- LLM fine-tuning
- Computer vision
- Recommender systems
- Reinforcement learning
- Deep-learning-based time series forecasting

## Requirements

A workspace in one of the following Azure-supported regions:
- eastus
- eastus2
- centralus
- northcentralus
- westcentralus
- westus

## What's installed

Serverless GPU compute for notebooks uses environment versions, which provide a stable client API to ensure application compatibility. This allows Databricks to upgrade the server independently, delivering performance improvements, security enhancements, and bug fixes without requiring any code changes to workloads.

Serverless GPU compute uses environment version 3 in addition to the following packages:

- CUDA 12.4
- torch 2.6.0
- torchvision 0.21.0

See Serverless environment version 3 for the packages included in system environment version 3.
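
To confirm that a notebook is running against the package versions listed above, you can query the installed distributions. The following is a minimal sketch that uses only the Python standard library. The names `EXPECTED` and `check_versions` are illustrative and not part of any Databricks API, and the CUDA version is not checked here.

```python
from importlib import metadata

# Versions listed in this article; adjust these if your environment differs.
EXPECTED = {"torch": "2.6.0", "torchvision": "0.21.0"}


def check_versions(expected: dict) -> dict:
    """Return 'ok', 'missing', or 'mismatch (found X)' for each package name."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Ignore local build tags such as '+cu124' when comparing versions.
        base = found.split("+")[0]
        report[name] = "ok" if base == wanted else f"mismatch (found {found})"
    return report


for pkg, status in check_versions(EXPECTED).items():
    print(f"{pkg}: {status}")
```

A `mismatch` result does not necessarily indicate a problem. It can simply mean that the environment has been updated since this article was written.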

Note

Base environments are not supported for serverless GPU compute. To set up dependencies for serverless GPU compute, specify them directly in the Environments side panel or pip install them.

## Add libraries to the environment

You can install additional libraries to the serverless GPU compute environment. See Add dependencies to the notebook.

Note

Adding dependencies using the Environments panel, as described in Add dependencies to the notebook, is not supported for serverless GPU compute scheduled jobs.

## Set up serverless GPU compute

You can select serverless GPU compute from the notebook environment in your workspace.

After you open your notebook:

1. Select the environment icon to open the Environment side panel.
2. Select A10 from the Accelerator field.
3. Select 3 as the Environment version.
4. Select Apply and then Confirm to apply serverless GPU compute to your notebook environment. After connecting to a resource, notebooks immediately begin using the available compute.

Note

Connection to your compute auto-terminates after 60 minutes of inactivity.
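
After the notebook connects, a quick check can confirm that PyTorch sees the accelerator. The following is a minimal sketch; `describe_accelerator` is an illustrative helper name, and the exact device name reported depends on the hardware that is attached.

```python
def describe_accelerator() -> str:
    """Summarize the GPUs visible to PyTorch, or explain why none are visible."""
    try:
        import torch  # listed as preinstalled on serverless GPU compute
    except ImportError:
        return "PyTorch is not installed in this environment."
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible."
    # Collect the name of every CUDA device that PyTorch can see.
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return f"{len(names)} CUDA device(s): " + ", ".join(names)


print(describe_accelerator())
```

If the helper reports that no CUDA device is visible, confirm that A10 is selected in the Accelerator field and that the environment change was applied.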

## Create and schedule a job

The following steps show how to create and schedule jobs for your serverless GPU compute workloads. See Create and manage scheduled notebook jobs for more details.

After you open the notebook you want to use:

1. Select the Schedule button on the top right.
2. Select Add schedule.
3. Populate the New schedule form with the Job name, Schedule, and Compute.
4. Select Create.

You can also create and schedule jobs from the Jobs and pipelines UI. See Create a new job for step-by-step guidance.

## Distributed training

You can launch distributed training across multiple GPUs and/or nodes using the Serverless GPU Python API. It provides a simple, unified interface for running both single-GPU and multi-GPU jobs, automatically handling GPU provisioning, environment setup, and workload distribution behind the scenes. Whether you’re working with local GPUs or scaling out across nodes with remote GPUs, serverless_gpu makes distributed execution straightforward, requiring only minimal code changes.

For example, the code below uses serverless_gpu to distribute the hello_world function to execute across 8 remote A10 GPUs.

```python
# Import the distributed decorator
from serverless_gpu import distributed

# Decorate the function with @distributed and specify the number of GPUs,
# the GPU type, and whether or not the GPUs are remote
@distributed(gpus=8, gpu_type='A10', remote=True)
def hello_world(s: str) -> None:
    print('hello_world ', s)

# Trigger the distributed execution of the hello_world function
hello_world.distributed(s='abc')
```

After running the above code, the results and logs appear in the **Experiment** section of your workspace.

Start by importing the [starter notebook](#distributed-training-with-serverless-gpu-api) to get hands-on with the API, then explore the [notebook examples](#notebook-examples) to see how it’s used in real distributed training scenarios.

For full details, refer to the [Serverless GPU Python API](https://api-docs.databricks.com/python/serverless_gpu/index.html) documentation.
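
A function that runs on several GPUs often needs to know which worker it is, for example so that only one worker writes logs or checkpoints. The sketch below assumes that each worker process receives the torchrun-style environment variables `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`. That is an assumption: this article does not state which variables `serverless_gpu` exports, so verify it against the API documentation. `WorkerInfo` and `worker_info` are hypothetical helper names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerInfo:
    rank: int        # global index of this worker
    world_size: int  # total number of workers
    local_rank: int  # index of this worker on its node


def worker_info(env: Optional[Mapping[str, str]] = None) -> WorkerInfo:
    """Read torchrun-style variables (assumed), defaulting to a single worker."""
    env = os.environ if env is None else env
    return WorkerInfo(
        rank=int(env.get("RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
    )


# Simulated environment for one of eight workers; inside a real worker,
# call worker_info() with no arguments to read os.environ.
info = worker_info({"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "3"})
print(f"worker {info.rank} of {info.world_size}")
```

Defaulting to a single worker when the variables are absent lets the same function body run unchanged in an ordinary single-GPU notebook session.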

## Limitations

:::aws

- Serverless GPU compute only supports A10 or similar compute.
- [PrivateLink](/security/network/classic/privatelink.md) is not supported. Storage or pip repos behind PrivateLink are not supported.
- Serverless GPU compute is **not supported for compliance security profile workspaces (like HIPAA or PCI)**. Processing regulated data is not supported at this time.
- Serverless GPU compute is only supported on interactive environments.
- Scheduled jobs on Serverless GPU compute:

  - Only supported for a single task.
  - Auto recovery behavior for incompatible package versions that are associated with your notebook is not supported.

:::

:::azure

- Serverless GPU compute only supports A10 or similar compute.
- [Private Link](/security/network/classic/private-link.md) is not supported. Storage or pip repos behind Private Link are not supported.
- Serverless GPU compute is **not supported for compliance security profile workspaces (like HIPAA or PCI)**. Processing regulated data is not supported at this time.
- Scheduled jobs on Serverless GPU compute:

  - Only supported for a single task.
  - Auto recovery behavior for incompatible package versions that are associated with your notebook is not supported.

:::

## Notebook examples

The notebooks in this section are examples to help demonstrate how to use Serverless GPU compute for different scenarios.

### Deep learning with PyTorch

The following notebook provides a simple example of how to run deep learning training using PyTorch and serverless GPU compute.

::notebook[Deep learning training using PyTorch notebook]{file='serverless-gpu-compute-notebook.html'}

### Distributed training with Serverless GPU API

The following notebook provides a basic example of how to use the Serverless GPU Python API to launch multiple A10 GPUs for distributed training.

::notebook[Serverless GPU API: A10 starter notebook]{file='sgc-api-a10-starter-notebook.html'}

### Distributed training and hyperparameter sweeps

The following notebook provides an example of distributed training and hyperparameter sweep fine-tuning using the Serverless GPU Python API.

::notebook[Serverless GPU compute sweeps and distributed training notebook]{file='sgc-sweep-unsloth-llama3.2.html'}

### Fine-tune Qwen2-0.5B model

The following notebook provides an example of how to efficiently fine-tune the Qwen2-0.5B model with PyTorch on serverless GPU compute using:

- Transformer reinforcement learning (TRL) for supervised fine-tuning.
- Liger Kernels for memory-efficient training with optimized Triton kernels.
- LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning.

::notebook[Fine-tune the Qwen2-0.5B model notebook]{file='serverless-gpu-fine-tuning.html'}

### Fine-tune an embedding model

The following notebook provides an example of how to fine-tune an embedding model. This example uses contrastive learning to fine-tune the `gte-large-en-v1.5` embedding model on a single A10G.

::notebook[Fine-tune an embedding model notebook]{file='serverless-gpu-fine-tune-embedding-model.html'}

### Fine-tune Llama-3.2-3B with Unsloth

This notebook demonstrates how to fine-tune Llama-3.2-3B using the Unsloth library.

::notebook[Fine-tune Llama model with Unsloth notebook]{file='sgc-finetune-llama-unsloth.html'}

### Object detection custom fine-tuning

This notebook demonstrates how to train an object detection model using a Hugging Face example on one A10 GPU.

::notebook[Object detection custom fine-tuning notebook]{file='sgc-object-detection-1-gpu.html'}

### XGBoost model training

This notebook demonstrates how to train an XGBoost regression model on a single GPU.

::notebook[XGBoost model training notebook]{file='sgc-xgboost.html'}

### Two-tower recommendation model

These notebooks demonstrate how to convert your recommendation data into MDS format and then use that data to create a two-tower recommendation model.

- ::notebook[Convert recommendation model dataset to MDS format notebook]{file='sgc-recommendation-system-mds-conversion.html'}

- ::notebook[Two-tower recommendation model notebook]{file='sgc-recommender-system-1-gpu.html'}

### Distributed supervised fine-tuning using TRL

This notebook demonstrates how to use Databricks serverless GPU compute to run supervised fine-tuning (SFT) using the TRL library with DeepSpeed ZeRO Stage 3 optimization on a single-node A10 GPU.

::notebook[Distributed TRL SFT Training notebook]{file='sgc-sft-trl-deepspeed-llama-1b.html'}

### Time series forecasting with GluonTS

This notebook demonstrates an end-to-end workflow for probabilistic time-series forecasting of electricity-consumption data with GluonTS’s DeepAR model on a serverless GPU cluster, covering data ingestion, resampling, model training, prediction, visualization, and evaluation.

::notebook[Time series forecasting with GluonTS notebook]{file='sgc-time-series-gluonts-101.html'}
