Cost Comparison Between Databricks, ADLS (Archive/Cold), and Azure SQL Hyperscale

Janice Chi 620 Reputation points
2025-08-28T13:01:24.7466667+00:00
We are evaluating three options:

1. Databricks Serverless SQL Warehouse + ADLS Archive tier
2. Databricks Serverless SQL Warehouse + ADLS Cold tier
3. Azure SQL Database Hyperscale (Serverless)
1. **How should we do a fair, apples-to-apples compute comparison between Databricks Serverless and Azure SQL Hyperscale** — for example, when we assume 16 vCores' worth of compute?
   - Can we consider the "X-Small" Databricks SQL Warehouse (6 DBU/hr) to be roughly equivalent to 16 vCores in Hyperscale?
   - Or is there a better reference or guideline to ensure CPU, memory, and throughput are comparable across the two services?

2. **What are the correct billing parameters we should consider for cost modeling across the three options?** For example, should we estimate:
   - Compute cost in DBU/hour for Databricks
   - vCore-seconds for Hyperscale
   - Read/rehydration charges for ADLS Archive or Cold?
   - Should backup storage, HA replicas, or other infrastructure components be included in these comparisons?

3. **How is cost metered in all three services — per second or per hour?** For instance:
   - Is Databricks Serverless billed per second, per minute, or per hour?
   - Is Hyperscale serverless billed strictly per second of usage?
   - Are reads from the Cold or Archive tier charged per GB read, or are there session-level charges?

4. **Are there any hidden or indirect costs we should take into account?** For example:
   - ADLS Archive has a rehydration delay and early-deletion charges — what other hidden charges exist in the Archive and Cold tiers?
   - In Hyperscale, are there additional costs due to write-ahead logging, autoscaling behavior, or background tasks (e.g., checkpointing, backups)?
   - In Databricks, are there minimum active times, startup delays, or idle-time billing patterns to be aware of?

We are not asking you to verify exact price numbers — but rather seeking guidance on how to do this cost modeling correctly, using appropriate billing units and assumptions. We want to ensure our client is making a sound cost decision with full transparency across all layers — storage and compute.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Smaran Thoomu 29,500 Reputation points Microsoft External Staff Moderator
    2025-08-28T14:03:08.9566667+00:00

    Hi @Janice Chi
    A fair cost comparison across Databricks Serverless SQL Warehouse, Azure SQL Hyperscale, and ADLS Archive/Cold requires normalizing against three dimensions: compute units, storage charges, and billing granularity.

    Compute equivalency (DBUs vs vCores)

    • Databricks Serverless SQL Warehouse: Billed in DBUs per hour (pro-rated per second of usage). The X-Small size (6 DBUs/hr) is not directly equivalent to a fixed number of SQL Hyperscale vCores - Databricks abstracts CPU/memory allocation and manages elasticity internally. There is no official 1:1 mapping (e.g., “X-Small = 16 vCores”) because execution engines differ (Spark vs. SQL Server engine).
    • Azure SQL Hyperscale (Serverless): Billed in vCore-seconds, with compute automatically paused/resumed. Here, compute is explicitly tied to vCores + memory per vCore. For comparisons, the recommended approach is to use benchmark workload performance (TPC-H or internal queries) instead of attempting a direct DBU↔vCore equivalence.
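Since there is no official DBU↔vCore mapping, one way to compare is to measure the same benchmark query on both services and normalize to cost per run. The sketch below illustrates that normalization; the per-DBU and per-vCore-second rates, and the runtimes, are placeholder assumptions, not published prices — substitute your region's actual rates.

```python
# Sketch: normalize compute cost per benchmark run instead of trying a
# direct DBU-to-vCore equivalence. All prices and runtimes below are
# placeholder assumptions -- check the Azure and Databricks pricing pages.

DBU_PRICE_USD = 0.70                # assumed $/DBU for Serverless SQL (placeholder)
VCORE_SECOND_PRICE_USD = 0.000145   # assumed $/vCore-second for Hyperscale (placeholder)

def databricks_cost_per_run(dbus_per_hour: float, runtime_seconds: float) -> float:
    """Cost of one benchmark run on a given Databricks SQL Warehouse size."""
    return dbus_per_hour * (runtime_seconds / 3600) * DBU_PRICE_USD

def hyperscale_cost_per_run(vcores: int, runtime_seconds: float) -> float:
    """Cost of one benchmark run on Hyperscale serverless at a given vCore count."""
    return vcores * runtime_seconds * VCORE_SECOND_PRICE_USD

# Hypothetical measurements of the same TPC-H-style query on each service:
xsmall_run = databricks_cost_per_run(dbus_per_hour=6, runtime_seconds=120)
hs16_run = hyperscale_cost_per_run(vcores=16, runtime_seconds=95)
```

With this framing, the comparison is driven by measured runtime on your own workload rather than by a size-label equivalence that the two engines do not actually share.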

    Storage tiers and costs

    • ADLS Archive/Cold: Very low storage cost, but with rehydration latency (Archive) and early deletion penalties. Reads are charged per GB, and retrieval can add noticeable delay.
    • Hyperscale: Includes built-in HA replicas, write-ahead logging, checkpointing, and backups in the cost. Storage is billed separately for data + log + backup pages.
    • Databricks: Storage cost is externalized (your ADLS account). You only pay compute (DBUs) + underlying ADLS storage tier costs.

    Billing models and granularity

    • Databricks Serverless SQL Warehouse: Pro-rated per second, with a minimum billing time for active sessions. Startup/idle time may incur cost until the warehouse suspends.
    • SQL Hyperscale (serverless): Per-second billing, with no charge when compute is auto-paused.
    • ADLS Archive/Cold: Storage is billed per GB/month, with additional per-GB read and rehydration charges. Archive also has early deletion charges if data is deleted before 180 days.

    Hidden/indirect costs to account for

    • Databricks: Minimum cluster warm-up time, DBU/hr rounding if using classic (vs serverless).
    • Hyperscale: Background operations (auto statistics, backups, HA replicas) are included but may slightly increase storage cost.
    • ADLS Archive/Cold: Rehydration delay, egress/read charges, and early deletion costs.

    Best practice for cost modeling

    • Normalize workloads by expected query concurrency, frequency, and data scanned per query.
    • Compare total monthly TCO, not just per-hour compute: (Compute + Storage + Data Access + HA/replica overhead).
    • Use Azure Pricing Calculator and Databricks pricing documentation with workload assumptions to model apples-to-apples.
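The TCO formula above — compute + storage + data access + HA/replica overhead — can be sketched as a side-by-side monthly comparison. Every rate and workload figure below is a placeholder assumption (hours of activity, data volume, per-unit prices), intended only to show the structure of the model, not real pricing.

```python
# Sketch: monthly TCO per option, following the formula
# (Compute + Storage + Data Access + HA/replica overhead).
# All rates and workload figures are placeholder assumptions; substitute
# values from the Azure Pricing Calculator and Databricks pricing pages.

def monthly_tco(compute_usd: float, storage_usd: float,
                data_access_usd: float, ha_overhead_usd: float = 0.0) -> float:
    """Total monthly cost across all four components."""
    return compute_usd + storage_usd + data_access_usd + ha_overhead_usd

# Hypothetical workload: 4 active hours/day, 30 days/month, 500 GB of data.
ACTIVE_SECONDS = 4 * 3600 * 30

options = {
    # X-Small warehouse (6 DBU/hr) at an assumed $0.70/DBU; data in ADLS Hot.
    "databricks_plus_adls_hot": monthly_tco(
        compute_usd=6 * (ACTIVE_SECONDS / 3600) * 0.70,
        storage_usd=500 * 0.018,          # assumed $/GB-month, Hot tier
        data_access_usd=10.0,             # assumed read/transaction charges
    ),
    # 16 vCores at an assumed $0.000145/vCore-second; HA replicas priced in.
    "hyperscale_serverless_16vcore": monthly_tco(
        compute_usd=16 * ACTIVE_SECONDS * 0.000145,
        storage_usd=500 * 0.10,           # assumed $/GB-month data + log + backup
        data_access_usd=0.0,
    ),
    # Archive storage at rest with one 100 GB restore during the month.
    "adls_archive_at_rest": monthly_tco(
        compute_usd=0.0,
        storage_usd=500 * 0.00099,        # assumed $/GB-month, Archive tier
        data_access_usd=100 * 0.02,       # assumed $/GB rehydration read
    ),
}

cheapest = min(options, key=options.get)
```

The point of the structure is that each option's costs land in the same four buckets, so the comparison stays apples-to-apples even though the underlying billing units (DBU-hours, vCore-seconds, GB-months) differ.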

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

