Context of our project
- Source: IBM DB2 iSeries CDC into Kafka
- Phases: one-time historical, catch-up (bounded CDC), and near-real-time streaming
- Target: Azure SQL Database Hyperscale
- Constraints: Healthcare data; correctness and replay are top priority; sustained average ~3k EPS with peaks up to ~70k EPS during backfills
Current design
- Catch-up: CDC is first merged in ADLS Bronze (Delta). The already-merged result is then loaded from Bronze into Hyperscale main.
- Streaming: Flattened CDC rows are landed into Hyperscale staging, then MERGE from staging into Hyperscale main.
- Controls: RunID, watermarks, and per-run audit are tracked in control tables. We aim for full idempotency and atomicity.
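To make the control-table design concrete, this is the minimal shape we are working from (table and column names are our own placeholders, not an established schema):

```sql
-- Illustrative control/audit table; one row per run per target table.
CREATE TABLE ctl.RunAudit (
    RunID          BIGINT IDENTITY PRIMARY KEY,
    TableName      SYSNAME      NOT NULL,
    Phase          VARCHAR(20)  NOT NULL,  -- 'historical' | 'catchup' | 'streaming'
    WatermarkFrom  DATETIME2(3) NOT NULL,  -- inclusive lower bound of this run
    WatermarkTo    DATETIME2(3) NOT NULL,  -- exclusive upper bound of this run
    Status         VARCHAR(20)  NOT NULL DEFAULT 'started', -- 'started' | 'committed' | 'failed'
    RowsInserted   BIGINT       NULL,
    RowsUpdated    BIGINT       NULL,
    RowsDeleted    BIGINT       NULL,
    CompletedAtUtc DATETIME2(3) NULL
);
```

A run is only considered replayable-safe once Status = 'committed' is written in the same transaction as the data change.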
Guidance requested
a) Hyperscale staging → Hyperscale main (streaming)
- Best practice for transaction scoping, isolation levels, lock hints, and use of application locks to prevent overlapping merges on the same table.
- Recommended MERGE template for high-throughput upserts/deletes, including MERGE OUTPUT to audit, and guardrails against accidental mass deletes.
- Guidance on batching strategy: when to favor set-based MERGE vs batched parameterized updates at our volumes.
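For reference, this is the working template we are validating; object names (dbo.Patient, stg.Patient, audit.MergeActions) and the 10% delete threshold are placeholders. Please correct anything that conflicts with Hyperscale best practice:

```sql
SET XACT_ABORT ON;
BEGIN TRAN;

-- Serialize merges per target table (resource name is a placeholder convention).
EXEC sp_getapplock @Resource = 'merge:dbo.Patient', @LockMode = 'Exclusive',
                   @LockOwner = 'Transaction', @LockTimeout = 30000;

-- Guardrail: abort if this batch would delete more than 10% of the target.
DECLARE @deletes BIGINT = (SELECT COUNT(*) FROM stg.Patient WHERE Op = 'D'),
        @total   BIGINT = (SELECT COUNT(*) FROM dbo.Patient);
IF @total > 0 AND @deletes * 100.0 / @total > 10.0
    THROW 50001, 'Mass-delete guardrail tripped; manual review required.', 1;

MERGE dbo.Patient WITH (HOLDLOCK) AS t
USING stg.Patient AS s
   ON t.PatientKey = s.PatientKey
WHEN MATCHED AND s.Op = 'D' THEN DELETE
WHEN MATCHED AND s.Op IN ('I','U') AND t.RowHash <> s.RowHash THEN
    UPDATE SET t.Col1 = s.Col1, t.RowHash = s.RowHash, t.RunID = s.RunID
WHEN NOT MATCHED BY TARGET AND s.Op IN ('I','U') THEN
    INSERT (PatientKey, Col1, RowHash, RunID)
    VALUES (s.PatientKey, s.Col1, s.RowHash, s.RunID)
OUTPUT $action, s.RunID, inserted.PatientKey, deleted.PatientKey
INTO audit.MergeActions (MergeAction, RunID, NewKey, OldKey);

COMMIT;  -- the application lock is released with the transaction
```

WITH (HOLDLOCK) and the transaction-owned applock are our current answers to overlapping merges; we would like confirmation that this is the recommended combination on Hyperscale.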
b) ADLS merged layer (Bronze) → Hyperscale main (catch-up)
- Best practice for loading an already-merged upsert/delete set into Hyperscale with minimal latency and predictable locking.
- Tuning guidance on minimal logging options that are still safe, recommended table/index settings, and handling constraints.
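Our current approach drains the staged changeset in fixed-size batches so each transaction stays short and the lock footprint stays predictable. A sketch (batch size and object names are illustrative):

```sql
-- Empty shell with the same shape as staging, to receive each drained batch.
SELECT TOP (0) * INTO #chunk FROM stg.BronzeChanges;

DECLARE @batch INT = 50000;   -- illustrative batch size, to be tuned
WHILE 1 = 1
BEGIN
    TRUNCATE TABLE #chunk;
    BEGIN TRAN;
    DELETE TOP (@batch) s
    OUTPUT deleted.* INTO #chunk
    FROM stg.BronzeChanges AS s;
    IF @@ROWCOUNT = 0 BEGIN COMMIT; BREAK; END;
    -- apply #chunk to the main table via the same guarded MERGE as streaming
    COMMIT;
END
```

Is this batch-drain pattern reasonable for Hyperscale, or is a single large set-based apply preferred at these volumes?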
c) ADLS raw → Hyperscale main (if required)
- If raw events carry reliable I/U/D flags and commit timestamps, what safeguards are recommended to preserve correctness at high throughput?
- Any official examples on event ordering, late arrivals, and idempotent replays.
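Our working safeguard is to collapse raw events to the latest verdict per key before applying, so replays and late arrivals converge to the same final state. A sketch (column names and the tiebreaker SeqNo are illustrative):

```sql
DECLARE @WatermarkTo DATETIME2(3) = '2024-01-01';  -- run watermark (placeholder)

;WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PatientKey
                              ORDER BY CommitTs DESC, SeqNo DESC) AS rn
    FROM stg.RawEvents
    WHERE CommitTs <= @WatermarkTo        -- respect the cut-over boundary
)
SELECT * INTO #latest FROM ranked WHERE rn = 1;
-- #latest now holds one I/U/D row per key and can feed the guarded MERGE.
```

We would like to know whether this last-writer-wins collapse is the recommended pattern, or whether events should instead be applied strictly in commit order.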
Failure semantics and “partial failure”
- Our understanding is that a single MERGE in Hyperscale is atomic and will roll back on error. Please confirm.
- Recommended settings for XACT_ABORT, retry patterns on deadlocks/timeouts, and how to align control-table state changes with data transactions to avoid “data committed but RunID not marked” drift.
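Our current pattern couples the control-table update to the data transaction and retries only on deadlock/lock-timeout errors. A sketch (RunID value, retry count, and delay are placeholders):

```sql
SET XACT_ABORT ON;              -- any runtime error dooms and rolls back the tran
DECLARE @RunID BIGINT = 1001;   -- placeholder; supplied by the orchestrator
DECLARE @attempt INT = 0;

retry:
BEGIN TRY
    BEGIN TRAN;
    -- 1) data change: guarded MERGE from staging into main
    -- 2) control-state change in the SAME transaction, so "data committed
    --    but RunID not marked" cannot occur
    UPDATE ctl.RunAudit
       SET Status = 'committed', CompletedAtUtc = SYSUTCDATETIME()
     WHERE RunID = @RunID;
    COMMIT;
END TRY
BEGIN CATCH
    IF XACT_STATE() <> 0 ROLLBACK;
    SET @attempt += 1;
    IF ERROR_NUMBER() IN (1205, 1222) AND @attempt < 3  -- deadlock / lock timeout
    BEGIN
        WAITFOR DELAY '00:00:02';
        GOTO retry;
    END;
    THROW;  -- rethrow so the orchestrator marks the run failed
END CATCH;
```

Is retrying only on 1205/1222 the right scope, and is there an official list of transient error numbers we should honor for Hyperscale?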
Avoiding wrong merges
- We are guarding against join predicate errors, cut-over boundary mistakes, late/out-of-order events, isolation mismatches, mass-delete patterns, schema drift, and collation mismatches.
- Do you have a published checklist or reference implementation specific to Hyperscale that we should align with?
Rollback and revoke
- For table-scope rollback: the plan is to capture MERGE OUTPUT with before/after hashes and programmatically reverse actions for a given RunID.
- For disaster recovery: the plan is to restore point-in-time copies and rehydrate.
- Please confirm whether this is aligned with best practice, and share any Hyperscale-specific optimizations for faster restores or table-level undo.
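To illustrate the table-scope undo we have in mind (assuming our audit sink, audit.MergeActions, keeps keys and before-images per RunID; all names are placeholders):

```sql
DECLARE @RunID BIGINT = 1001;   -- placeholder; the run being reversed

-- Reverse the run's INSERTs by deleting the rows it created.
DELETE t
FROM dbo.Patient AS t
JOIN audit.MergeActions AS a
  ON a.NewKey = t.PatientKey
 AND a.RunID  = @RunID
WHERE a.MergeAction = 'INSERT';
-- DELETEs and UPDATEs would be reversed from the stored before-images in the
-- same pass; the before/after hashes let us verify each row before touching it.
```

Is there a supported Hyperscale alternative (e.g. restoring a point-in-time copy and repairing at table scope) that you would recommend over this application-level undo?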
Performance and cost
- Recommended Hyperscale configurations, index patterns, or concurrency settings for sustained 3k EPS with spikes up to 70k EPS.
- Guidance on balancing rowversion, filtered indexes, or columnstore for heavily updated tables.