Raj Murugan

Posted on Feb 13

AWS DMS Fails After Azure SQL Managed Instance Failover? Here’s How to Fix It

#aws #sqlserver #dms

Problem Statement

A customer is using AWS Database Migration Service (DMS) to migrate data from Azure SQL Managed Instance to AWS RDS PostgreSQL over a VPN in Full Load + Change Data Capture (CDC) mode. The migration runs smoothly until Azure performs maintenance, triggering a failover. When this happens, DMS fails to locate the last tracked LSN (Log Sequence Number), causing a fatal error that stops the task.

Currently, the only workaround is to manually restart the DMS task and perform a full load again, which is inefficient and time-consuming.

Understanding the Issue

When an Azure SQL Managed Instance failover occurs:

The old primary is demoted, and a new primary takes over.
The transaction logs on the new primary may not have the same LSN sequence as the old primary.
AWS DMS attempts to resume CDC from the last tracked LSN, but if that LSN doesn’t exist on the new primary, DMS throws an error and stops the task.
This issue arises because SQL Server Availability Groups do not synchronize backup history across replicas.

Potential Solutions & Trade-offs

Since this scenario is easy to replicate, testing these solutions in a non-production environment first is recommended.

1️⃣ Temporary Fix – Manual Action Required on Every Failover

Find the latest LSN on the new primary:

  SELECT sys.fn_cdc_get_max_lsn() AS CurrentMaxLSN;

Restart DMS using the new LSN via AWS CLI or console.

Trade-offs:
✅ Simple and requires no architectural changes.

❌ Requires manual intervention every time a failover occurs.

❌ Increases downtime and operational overhead.

2️⃣ Permanent Fix – Adjust DMS Settings (May Not Work for Azure SQL MI)

Set:

  "AlwaysOnSharedSynchedBackupIsEnabled": false

This allows DMS to poll all nodes in the Always On cluster for transaction backups.
Works for on-prem SQL Server, but may prevent reading from the old primary in Azure SQL MI.

Trade-offs:
✅ Fully automated once configured.

✅ No need for manual intervention.

❌ May not be supported in Azure SQL MI, requiring additional testing.

❌ Potential data consistency risks if backups are not fully synchronized.

3️⃣ Permanent Fix – Use Transaction Log Backups Instead of Live Logs

Azure SQL Managed Instance creates transaction log backups every ~10 minutes.
These backups are consistent across failovers, avoiding LSN loss.
DMS v3.5.3+ supports reading from log backups for Amazon RDS for SQL Server.
Grant necessary permissions for DMS to read log backups:

  GRANT EXEC ON msdb.dbo.rds_dms_tlog_download TO rds_user;
  GRANT EXEC ON msdb.dbo.rds_dms_tlog_read TO rds_user;
  GRANT EXEC ON msdb.dbo.rds_dms_tlog_list_current_lsn TO rds_user;
  GRANT EXEC ON msdb.dbo.rds_task_status TO rds_user;

More details: AWS DMS SQL Server Permissions.

📌 Supported for Amazon RDS for SQL Server from DMS v3.5.3+ but may not work for Azure SQL MI—needs testing.

Trade-offs:
✅ Provides a seamless failover experience without breaking CDC.

✅ No manual intervention required after failovers.

❌ Might not be supported in Azure SQL MI, requiring thorough testing.

❌ Backup frequency (~10 min) could introduce minor delays in CDC.

Alternative SQL Managed Instance Solution

4️⃣ Synchronize Backup History Across Replicas (If Solution #2 Fails)

If the secondary replica is not configured for read access, set:

  "AlwaysOnSharedSynchedBackupIsEnabled": true

This forces DMS to use the primary replica as a standalone SQL Server.
However, SQL Server does not automatically synchronize backup metadata across replicas.
Workaround: Run a script on the replica where backups occur to register backup history on other replicas.
Example script & details: Qlik Replicate & SQL Server AG HA Handling.

Trade-offs:
✅ Ensures backup history is consistent across replicas.

✅ Works well with Always On Availability Groups.

❌ Requires additional scripting and maintenance overhead.

❌ May not work in all configurations, depending on permissions and replication settings.

Conclusion

Failovers in Azure SQL Managed Instance disrupt AWS DMS CDC due to LSN mismatches. Using transaction log backups or modifying DMS settings can help mitigate the issue. Testing these solutions in a staging environment before deploying to production is essential.

Let me know if you’ve encountered similar issues or tested any of these solutions! 🚀

DEV Community

AWS DMS Fails After Azure SQL Managed Instance Failover? Here’s How to Fix It

Problem Statement

Understanding the Issue

Potential Solutions & Trade-offs

1️⃣ Temporary Fix – Manual Action Required on Every Failover

2️⃣ Permanent Fix – Adjust DMS Settings (May Not Work for Azure SQL MI)

3️⃣ Permanent Fix – Use Transaction Log Backups Instead of Live Logs

Alternative SQL Managed Instance Solution

4️⃣ Synchronize Backup History Across Replicas (If Solution #2 Fails)

Conclusion

References

Top comments (0)

Read next

DynamoDB Accelerator (DAX): Enhancing Performance with Caching

Ensuring Data Consistency with the Transactional Outbox Pattern on AWS

Is DeepSeek Really a Game Changer in 2025? Unpacking the AI Revolution

💻 Step-by-Step: Hosting a Static Website on AWS EC2 🌟