Hello MOHAMED AKOUR,
Thank you for posting your question in the Microsoft Q&A forum.
First of all, I appreciate your very well-structured question. This is a sophisticated and increasingly common DR strategy, moving beyond simple regional failover to account for tenant-level and cloud-level outages.
I have broken your questions down and provided detailed guidance below:
Multi-tenant feasibility - Yes, this strategy is absolutely feasible and is a robust approach to Disaster Recovery. It's often referred to as a "Multi-Cloud Backup and Redeploy" or "Cold Standby" DR model. You are correctly prioritizing different recovery scenarios:
- Tenant-Level Disaster (Tenant B): Protects against administrative catastrophe (e.g., compromised root account, billing suspension, accidental tenant-wide deletion).
- Cloud-Level Disaster (AWS): Protects against a total Azure region or platform outage.
The key to feasibility is automation. Your plan to use Infrastructure-as-Code (Terraform/Bicep) and automated backup pipelines is exactly the right way to make this manageable.
Keeping Infrastructure in Sync Between Azure Tenants
This is the core of your tenant-level DR plan. The goal is not to have resources running 24/7 in Tenant B (which would be expensive) but to be able to deploy them quickly and identically.
Recommended Approach: Unified CI/CD Pipeline with Terraform
- Source Control: Store all your Terraform code (or Bicep) in a Git repository (e.g., Azure DevOps, GitHub). This is your single source of truth.
- Use Modules Heavily: Structure your code so that your core infrastructure (Networking, AKS, App Service config, etc.) is defined in reusable Terraform modules.
- Pipeline Design: Create a CI/CD pipeline (e.g., in Azure DevOps, GitHub Actions) that can authenticate and deploy to multiple tenants.
- Service Principals: Create a Service Principal (SPN) in Tenant A and another in Tenant B. Grant them the necessary permissions via Azure RBAC.
- Pipeline Variables: Use pipeline variables or different Terraform workspaces to manage environment-specific configurations (e.g., tenant ID, subscription ID, some resource names).
- Execution Flow:
- Tenant A (Prod): Your pipeline deploys changes to Tenant A automatically upon a merge to the main branch (after a successful PR).
- Tenant B (DR): Add a manual approval gate in the same pipeline to promote the exact same code to Tenant B. You could also run a scheduled job (e.g., nightly) that executes terraform plan against Tenant B, confirming the DR definition still applies cleanly without actually deploying resources. The critical part is that the Terraform state file for Tenant B is kept in sync with the code.
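As a sketch of that flow, assuming GitHub Actions (the same shape works in Azure DevOps with environments and approval checks), where all secret names, environment names, and tfvars/backend files are placeholders:

```yaml
# Hypothetical workflow: auto-deploy to Tenant A on merge to main,
# then gate Tenant B behind an environment with required reviewers.
name: deploy-infra
on:
  push:
    branches: [main]
jobs:
  tenant-a:
    runs-on: ubuntu-latest
    environment: tenant-a-prod
    env:
      ARM_TENANT_ID: ${{ secrets.TENANT_A_ID }}
      ARM_CLIENT_ID: ${{ secrets.TENANT_A_SPN_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.TENANT_A_SPN_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.TENANT_A_SUB_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -backend-config=tenant-a.backend.hcl
      - run: terraform apply -auto-approve -var-file=tenant-a.tfvars
  tenant-b:
    needs: tenant-a
    runs-on: ubuntu-latest
    # The "tenant-b-dr" environment's required-reviewers protection
    # rule acts as the manual approval gate.
    environment: tenant-b-dr
    env:
      ARM_TENANT_ID: ${{ secrets.TENANT_B_ID }}
      ARM_CLIENT_ID: ${{ secrets.TENANT_B_SPN_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.TENANT_B_SPN_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.TENANT_B_SUB_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -backend-config=tenant-b.backend.hcl
      - run: terraform apply -auto-approve -var-file=tenant-b.tfvars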
Key Tool: Use Terraform Cloud or a remote backend (like an Azure Storage Account in Tenant B for Tenant B's state, and in Tenant A for Tenant A's state) to securely manage state files for each environment. This is crucial.
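For example (placeholder names throughout), you can keep a single empty backend block in code and point Terraform at the right storage account per tenant at init time:

```hcl
# Remote state in an Azure Storage Account; the concrete values come
# from a per-tenant file passed via:
#   terraform init -backend-config=tenant-a.backend.hcl
terraform {
  backend "azurerm" {}
}
```

```hcl
# tenant-a.backend.hcl (placeholder names; Tenant B gets its own file
# pointing at a storage account that lives in Tenant B)
resource_group_name  = "rg-tfstate"
storage_account_name = "sttfstatetenanta"
container_name       = "tfstate"
key                  = "prod.terraform.tfstate"
```

Keeping each tenant's state in that tenant's own storage account means a disaster in Tenant A cannot take your DR state file down with it.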
Backup Approach for Managed Databases for Multi-Cloud - Your approach is correct for a "cold backup" multi-cloud scenario. However, let's refine it and present the best practice.
Azure SQL Database
- .bacpac Exports: This is a valid and supported method. It creates a snapshot of the database schema and data in a standard format.
- Pros: Portable, standard format.
- Cons: Can be very slow for large databases, and the export is not guaranteed to be transactionally consistent if writes occur while it runs; export from a database copy to get a consistent snapshot. Your RPO is bounded by how often you run the export.
- Better Approach: Scheduled .bacpac Exports + Keep LTR for In-Azure Recovery
- Keep Long-Term Retention (LTR) enabled for recovery within Azure, but note that LTR backups live inside the Azure SQL service and cannot be downloaded as files, so they cannot be copied to S3.
- For the cross-cloud copy, schedule az sql db export (ideally run against a database copy so the .bacpac is transactionally consistent) to write .bacpac files to Azure Blob Storage, then copy them to your AWS S3 bucket with the AWS CLI or a tool such as rclone. AzCopy supports S3 only as a source, not as a destination.
- If you were on Azure SQL Managed Instance, native COPY_ONLY .bak backups to a storage URL would also be an option; single databases on Azure SQL Database do not support native .bak backups.
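A minimal sketch of the nightly export-and-copy job, with placeholder resource, storage, and bucket names throughout; the script builds the command strings first and only prints them unless you set DRY_RUN=false:

```shell
#!/usr/bin/env sh
# Sketch of a nightly Azure SQL export-and-copy job.
# All names below are placeholders; set DRY_RUN=false to execute.
DRY_RUN="${DRY_RUN:-true}"

RG="rg-prod"                          # placeholder resource group
SERVER="sql-prod-weu"                 # placeholder logical server
DB="appdb"                            # placeholder database
STAMP="$(date -u +%Y%m%d)"
# In practice a SAS token would be appended to this URI for azcopy.
BLOB_URI="https://stdrbackups.blob.core.windows.net/sql-exports/${DB}-${STAMP}.bacpac"

# Build the commands first so they can be inspected or dry-run.
EXPORT_CMD="az sql db export -g $RG -s $SERVER -n $DB \
  -u \$SQL_ADMIN -p \$SQL_PASSWORD \
  --storage-key-type StorageAccessKey --storage-key \$STORAGE_KEY \
  --storage-uri $BLOB_URI"
DOWNLOAD_CMD="azcopy copy $BLOB_URI ./${DB}-${STAMP}.bacpac"
# AzCopy cannot write to S3, so the S3 leg uses the AWS CLI.
UPLOAD_CMD="aws s3 cp ./${DB}-${STAMP}.bacpac s3://dr-backups/sql/${DB}-${STAMP}.bacpac"

for CMD in "$EXPORT_CMD" "$DOWNLOAD_CMD" "$UPLOAD_CMD"; do
  if [ "$DRY_RUN" = "true" ]; then echo "$CMD"; else eval "$CMD"; fi
done
```

In a real runbook the credentials would come from Key Vault rather than environment variables.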
Azure Database for MySQL
- Logical Dumps (mysqldump): This is also a valid method.
- Pros: Simple, standard tooling.
- Cons: Slow for large databases. Single-threaded. Can cause performance impact on the source server during the dump.
- Better Approach: Physical Backups + Copy to AWS
- Set the "Backup redundancy" option for your Azure Database for MySQL flexible server to Geo-redundant so the automated backups are stored geo-redundantly (Zone-redundant protects only against zone failures within the region).
- If your server supports it, use the backup export capability to write physical backup files to a Blob Storage container you specify (feature availability and the exact CLI surface vary by service tier, so verify this for your deployment); otherwise fall back to compressed logical dumps taken with mysqldump --single-transaction.
- Then copy the backup files from Azure Blob Storage to AWS S3 using the AWS CLI or a tool such as rclone. AzCopy supports S3 only as a source, not as a destination.
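If you stay on the logical-dump route, streaming the dump straight to S3 avoids staging a large file locally. A sketch with placeholder host, database, and bucket names; the command is built as a string so it can be reviewed before being run:

```shell
#!/usr/bin/env sh
# Sketch: stream a consistent logical dump of an Azure Database for
# MySQL server straight to S3. All names are placeholders.
HOST="mysql-prod.mysql.database.azure.com"   # placeholder server
DB="appdb"                                   # placeholder database
STAMP="$(date -u +%Y%m%d)"

# --single-transaction takes a consistent snapshot of InnoDB tables
# without locking; piping through gzip into `aws s3 cp -` streams the
# compressed dump to S3 with no local temp file.
DUMP_CMD="mysqldump -h $HOST -u \$MYSQL_USER -p\$MYSQL_PASSWORD \
  --single-transaction --set-gtid-purged=OFF $DB \
  | gzip | aws s3 cp - s3://dr-backups/mysql/${DB}-${STAMP}.sql.gz"

echo "$DUMP_CMD"   # inspect first; execute with: eval "$DUMP_CMD"
```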
For both databases, the process should be fully automated using:
- Azure Automation Runbooks (PowerShell) or a Logic App to trigger the export/backup copy process.
- A managed identity with access to the databases and storage.
- A schedule (e.g., daily) to perform the operation.
Overall Architecture & Best Practices:
- Automate Everything: The success of this DR plan hinges on 100% automation. Manual steps will fail during a real disaster.
- Document the Recovery Process: Have runbooks that detail:
- Failover to Tenant B: 1) Run Terraform Apply to Tenant B, 2) Restore latest database backups from AWS S3 to the newly created databases in Tenant B, 3) Update DNS/CNAME records.
- Failover to AWS: 1) Run CloudFormation/Terraform for AWS, 2) Deploy SQL Server/MySQL on EC2 or RDS, 3) Restore from the .bacpac/dump files in S3 (import the .bacpac with SqlPackage; load the MySQL dump with the mysql client).
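The first runbook (failover to Tenant B) could itself live in source control as a reviewable script. A sketch, with placeholder resource, DNS, and bucket names throughout; the steps are printed rather than executed so the runbook can be inspected:

```shell
#!/usr/bin/env sh
# Sketch of the Tenant B failover runbook. All names are placeholders;
# the steps are echoed for review rather than executed.
STEPS="
terraform init -backend-config=tenant-b.backend.hcl
terraform apply -auto-approve -var-file=tenant-b.tfvars
aws s3 cp s3://dr-backups/sql/appdb-20250101.bacpac ./appdb.bacpac
azcopy copy ./appdb.bacpac https://stdrtenantb.blob.core.windows.net/imports/appdb.bacpac
az sql db import -g rg-dr -s sql-dr-weu -n appdb \
  -u \$SQL_ADMIN -p \$SQL_PASSWORD \
  --storage-key-type StorageAccessKey --storage-key \$STORAGE_KEY \
  --storage-uri https://stdrtenantb.blob.core.windows.net/imports/appdb.bacpac
az network dns record-set cname set-record -g rg-dns -z example.com \
  -n app -c app-dr.azurewebsites.net
"
echo "$STEPS"
```

Note the .bacpac must be staged back into Blob Storage before az sql db import can consume it, which is why the script uploads it again after pulling it from S3.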
- Plan Proper Tests: Regularly conduct DR drills.
- Test Tenant B Deployment: Monthly, run the Terraform apply to Tenant B to ensure it still works without errors. You can destroy it right after to minimize cost.
- Test Data Restoration: Quarterly, restore your database backups from AWS S3 to a test environment to validate their integrity.
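The monthly deployment drill can itself be a scheduled pipeline job. A dry-run sketch, with placeholder backend and tfvars file names:

```shell
#!/usr/bin/env sh
# Sketch of a monthly DR drill: deploy to Tenant B, smoke-test,
# tear down. Steps are printed for review rather than executed.
DRILL="
terraform init -backend-config=tenant-b.backend.hcl
terraform apply -auto-approve -var-file=tenant-b.tfvars
# smoke tests would run here (e.g., curl health endpoints)
terraform destroy -auto-approve -var-file=tenant-b.tfvars
"
echo "$DRILL"
```

The destroy step at the end is what keeps the drill's cost close to zero.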
- Security:
- Use Azure Key Vault and AWS Secrets Manager to handle all connection strings, SAS tokens, and credentials for your scripts.
- The permissions for the Service Principals and managed identities should follow the principle of least privilege.
- Cost Optimization: Since Tenant B will largely be idle, your main costs will be storage (for VM disks, Terraform state, and database backups in AWS S3). Use appropriate storage tiers (e.g., the Azure Cool access tier; S3 Glacier Instant Retrieval for older backups).
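On the S3 side, a lifecycle rule can demote and expire older backups automatically. A sketch (placeholder prefix and retention periods) of the JSON you would pass to aws s3api put-bucket-lifecycle-configuration:

```json
{
  "Rules": [
    {
      "ID": "demote-old-db-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "sql/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER_IR" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```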
This is a solid plan. By implementing it with the automated, code-first approach described, you will achieve a highly resilient multi-cloud disaster recovery posture.
Please let me know if this response helps answer your question. If the above answer helped, please do not forget to "Accept Answer", as this may help other community members facing a similar issue. 🙂