Merge into Azure SQL Hyperscale main tables

Janice Chi 620 Reputation points
2025-08-27T15:14:51.32+00:00

Context of our project

Source: IBM DB2 iSeries CDC into Kafka

Phases: one-time historical, catch-up (bounded CDC), and near-real-time streaming

Target: Azure SQL Database Hyperscale

Constraints: Healthcare data; correctness and replay are top priority; sustained average ~3k EPS with peaks up to ~70k EPS during backfills

Current design

Catch-up: CDC is first merged in ADLS Bronze (Delta). The already-merged result is then loaded from Bronze into Hyperscale main.

Streaming: Flattened CDC rows are landed into Hyperscale staging, then MERGE from staging into Hyperscale main.

Controls: RunID, watermarks, and per-run audit are tracked in control tables. We aim for full idempotency and atomicity.

Guidance requested

a) Hyperscale staging → Hyperscale main (streaming)

Best practice for transaction scoping, isolation levels, lock hints, and use of application locks to prevent overlapping merges on the same table.

Recommended MERGE template for high-throughput upserts/deletes, including MERGE OUTPUT to audit, and guardrails against accidental mass deletes.

Guidance on batching strategy: when to favor set-based MERGE vs batched parameterized updates at our volumes.

b) ADLS merged layer (Bronze) → Hyperscale main (catch-up)

Best practice for loading an already-merged upsert/delete set into Hyperscale with minimal latency and predictable locking.

Tuning guidance on minimal logging options that are still safe, recommended table/index settings, and handling constraints.

c) ADLS raw → Hyperscale main (if required)

If raw events carry reliable I/U/D flags and commit timestamps, what safeguards are recommended to preserve correctness at high throughput?

Any official examples on event ordering, late arrivals, and idempotent replays.

Failure semantics and “partial failure”

Our understanding is that a single MERGE in Hyperscale is atomic and will roll back on error. Please confirm.

Recommended settings for XACT_ABORT, retry patterns on deadlocks/timeouts, and how to align control-table state changes with data transactions to avoid “data committed but RunID not marked” drift.

Avoiding wrong merges

We are guarding against join predicate errors, cut-over boundary mistakes, late/out-of-order events, isolation mismatches, mass-delete patterns, schema drift, and collation mismatches.

Do you have a published checklist or reference implementation specific to Hyperscale that we should align with?

Rollback and revoke

For table-scope rollback: plan is to capture MERGE OUTPUT with before/after hashes and programmatically reverse actions for a given RunID.

For disaster recovery: plan is to restore point-in-time copies and rehydrate.

Please confirm if this is aligned with best practice, and share any Hyperscale-specific optimizations for faster restores or table-level undo.

Performance and cost

Recommended Hyperscale configurations, index patterns, or concurrency settings for sustained 3k EPS with spikes up to 70k EPS.

Guidance on balancing rowversion, filtered indexes, or columnstore for heavily updated tables.
Azure SQL Database

1 answer

  1. Smaran Thoomu 29,500 Reputation points Microsoft External Staff Moderator
    2025-08-27T17:41:18.37+00:00

    Hi @Janice Chi
    Thanks for pulling all the moving parts together here - this nicely builds on the earlier discussions around staging vs. Bronze, reconciliation, and MERGE design. Let me address each of your areas directly:

    a) Hyperscale staging → Hyperscale main (streaming)

    • A single MERGE in Hyperscale is atomic (as noted before), so you can rely on rollback semantics if the statement fails.
    • To prevent overlapping merges on the same table, take an application lock (sp_getapplock) or strictly sequence staging → main runs per table. READ COMMITTED SNAPSHOT is usually sufficient; escalate to SERIALIZABLE only if you must enforce strict ordering.
    • At your throughput, a set-based MERGE pattern is still recommended, with batching in the 50k–200k row range if you see blocking. Guardrail clauses (e.g. WHEN MATCHED AND ...) can help avoid mass accidental deletes. Use MERGE OUTPUT to capture audit state per RunID.
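
    The bullets above can be sketched as a single guarded MERGE. This is illustrative only: the table names (dbo.Main, dbo.Staging, dbo.MergeAudit), column names (BusinessKey, OpFlag, Payload, CommitTs), and the 100k delete threshold are placeholder assumptions, not part of this thread.

    ```sql
    -- Sketch: names and thresholds are placeholders; adapt to your schema.
    DECLARE @RunID bigint = 42;  -- supplied by your orchestration layer

    SET XACT_ABORT ON;
    BEGIN TRAN;

    -- Serialize merges per target table so two runs never overlap.
    EXEC sp_getapplock @Resource = 'merge_dbo_Main', @LockMode = 'Exclusive',
                       @LockOwner = 'Transaction', @LockTimeout = 30000;

    -- Guardrail: abort if this batch would delete an implausible number of rows.
    IF (SELECT COUNT(*) FROM dbo.Staging WHERE OpFlag = 'D') > 100000
    BEGIN
        ROLLBACK TRAN;
        THROW 50001, 'Mass-delete guardrail tripped; investigate before merging.', 1;
    END;

    MERGE dbo.Main WITH (HOLDLOCK) AS t
    USING dbo.Staging AS s
        ON t.BusinessKey = s.BusinessKey
    WHEN MATCHED AND s.OpFlag = 'D' THEN
        DELETE
    WHEN MATCHED AND s.OpFlag IN ('I','U') AND s.CommitTs > t.CommitTs THEN
        -- Timestamp predicate makes replays idempotent: stale events are no-ops.
        UPDATE SET t.Payload = s.Payload, t.CommitTs = s.CommitTs
    WHEN NOT MATCHED BY TARGET AND s.OpFlag IN ('I','U') THEN
        INSERT (BusinessKey, Payload, CommitTs)
        VALUES (s.BusinessKey, s.Payload, s.CommitTs)
    OUTPUT $action, inserted.BusinessKey, deleted.BusinessKey, @RunID
        INTO dbo.MergeAudit (MergeAction, NewKey, OldKey, RunID);

    COMMIT TRAN;  -- also releases the applock taken with @LockOwner = 'Transaction'
    ```

    Taking the applock with @LockOwner = 'Transaction' means COMMIT or ROLLBACK releases it automatically, so a failed run can never leave the table permanently locked.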

    b) Bronze (already merged) → Hyperscale main (catch-up)

    • Minimal logging (BULK_LOGGED-style) isn’t available in Hyperscale, so the practical approach is a parallel bulk load (COPY INTO or BULK INSERT) into a staging table, followed by index maintenance.
    • Consider disabling nonclustered indexes during bulk catch-up, then reapplying. Partitioning by business key or timestamp can make incremental loads safer.
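
    A minimal sketch of that catch-up load, assuming an external data source named AdlsBronze has been registered for the Bronze container; index, table, and file names are placeholders:

    ```sql
    -- Skip nonclustered index maintenance while the bulk load runs.
    ALTER INDEX IX_Main_Covering ON dbo.Main DISABLE;

    -- Load the already-merged Bronze extract into a staging table.
    BULK INSERT dbo.CatchupStage
    FROM 'bronze/run_0042/merged.csv'
    WITH (DATA_SOURCE = 'AdlsBronze', FORMAT = 'CSV',
          BATCHSIZE = 100000, TABLOCK);

    -- ...apply the upsert/delete set from dbo.CatchupStage to dbo.Main, then:
    ALTER INDEX IX_Main_Covering ON dbo.Main REBUILD;
    ```

    Disabling rather than dropping the index preserves its definition, so REBUILD restores it without redeploying DDL.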

    c) Raw → Hyperscale main (if required)

    If you bypass Bronze, correctness hinges on reliable I/U/D flags. In that case, late-arriving or out-of-order events must be re-merged by a deterministic key + commit timestamp. The same idempotency logic we discussed earlier still applies.
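
    One way to make that re-merge deterministic, assuming a BusinessKey, a CommitTs, and a source sequence number (SeqNo, hypothetical) as tie-breaker, is to collapse the raw events to the latest version per key before merging:

    ```sql
    -- Sketch: keep only the newest event per business key for this run,
    -- so late or replayed events cannot regress the target row.
    ;WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY BusinessKey
                                  ORDER BY CommitTs DESC, SeqNo DESC) AS rn
        FROM dbo.RawEvents
        WHERE RunID = @RunID
    )
    SELECT BusinessKey, OpFlag, Payload, CommitTs
    INTO #latest
    FROM ranked
    WHERE rn = 1;
    -- #latest then feeds a timestamp-guarded MERGE into main, keeping replays idempotent.
    ```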

    Failure semantics / partial failure

    Correct: a single MERGE is atomic. With SET XACT_ABORT ON, any runtime error (including being chosen as a deadlock victim, or a timeout) aborts and rolls back the whole transaction. For retries, update the RunID state in your control table inside the same transaction as the data change, so the commit and the state marking can never drift apart.
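
    A sketch of that pattern, assuming a control table ctl.RunLog (hypothetical name) and retrying only on deadlock (error 1205):

    ```sql
    -- Deadlock-aware retry loop; the data change and the RunID state update
    -- commit together or not at all.
    DECLARE @attempt int = 0, @maxAttempts int = 3;
    WHILE @attempt < @maxAttempts
    BEGIN
        BEGIN TRY
            SET XACT_ABORT ON;
            BEGIN TRAN;
            -- 1. the data change (e.g. the MERGE from staging into main)
            -- 2. mark the run complete in the SAME transaction:
            UPDATE ctl.RunLog
            SET Status = 'Committed', CompletedAt = SYSUTCDATETIME()
            WHERE RunID = @RunID;
            COMMIT TRAN;
            BREAK;
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0 ROLLBACK TRAN;
            SET @attempt += 1;
            IF ERROR_NUMBER() <> 1205 OR @attempt >= @maxAttempts THROW;
            WAITFOR DELAY '00:00:02';  -- simple backoff before retrying
        END CATCH;
    END;
    ```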

    Avoiding wrong merges

    You’ve already listed the main risks. There isn’t a published Hyperscale-only checklist, but the SQL Server Data Loading Patterns and Idempotent Merge examples are the closest references.

    Rollback and revoke

    Capturing MERGE OUTPUT with before/after values per RunID is the standard approach if you want programmatic undo. For DR, Hyperscale supports fast point-in-time restore (with named restore points if you want more control). That aligns with best practice.

    Performance and cost

    • Hyperscale compute size (vCores) drives sustained throughput; page servers scale out automatically with data size rather than being user-configured. For sustained ~3k EPS with 70k spikes, size compute accordingly (e.g. a service objective of HS_Gen5_8 or above).
    • For heavily updated tables, rowstore with filtered indexes is safer than clustered columnstore (which favors append-heavy patterns). Rowversion columns can help in change detection but don’t replace CDC.
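
    For example, a filtered index that covers only the active subset of a hot table keeps index maintenance cost proportional to the hot rows; the names and the predicate column are illustrative:

    ```sql
    -- Sketch: index only the rows that the streaming merge actually touches.
    CREATE NONCLUSTERED INDEX IX_Main_Hot
    ON dbo.Main (BusinessKey, CommitTs)
    WHERE IsActive = 1;
    ```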

    I hope this information helps. Please do let us know if you have any further queries.


    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

