When running AWS RDS Microsoft SQL Server you may run into a configuration issue that may trip you up either during failover or instance upgrades.
This is taken from the Microsoft SQL Server Multi-AZ Deployment Notes and Recommendations section under the “Multi-AZ Deployments for Microsoft SQL Server with Database Mirroring” document:
If you have SQL Server Agent jobs, you need to recreate them in the secondary, as these jobs are stored in the msdb database, and this database can’t be replicated via Mirroring. Create the jobs first in the original primary, then fail over, and create the same jobs in the new primary.
This is one of the weaknesses in the Multi-AZ for RDS Server Server service.
They use mirroring to keep two RDS instances loaded with identical user table data, but they can’t mirror MSDB because it’s a system database.
One of the reasons jobs are so confusing on Multi-AZ SQL Server is, if you start off as Single-AZ, and move to Multi-AZ, all of your jobs are copied as part of the move to Multi-AZ. That’s because AWS takes a snapshot of all your data (including MSDB) and recreates it on the mirrored instance. This is where it can get confusing: people who look at a multi-AZ instance, and at a “was Single-AZ, now is Multi-AZ” and see inconsistent behavior in the jobs. But it can all be understood if you apply two rules:
- Jobs created when you’re Single-AZ will be copied when you move to Multi-AZ, because AWS takes a snapshot of all databases (including MSDB), but
- Other than that, no changes to jobs will ever be copied to the mirror unless the changes are done manually on both servers.
Here’s a scenario that you may find yourself with:
- At some time (maybe days/weeks/months ago), you modified your jobs on the principal, but didn’t failover and make those changes on the mirror.
- Then, during an update, AWS updated the mirror, and made sure mirroring was recovering.
- Then the database failed over. Now, you’re on the secondary, which hasn’t had the changes applied to the jobs.
- Then, AWS updated the former-primary-now-secondary. If you fail over to that, you should see your jobs are as you set them up.
To failover RDS SQL Server manually, you should use the console or the CLI (or other method of calling the API) and do a reboot, choosing to failover. (In the console, when you ask to reboot the instance, it won’t have that option, but it will ask you once you submit the reboot request.)
In most cases where you might want to failover to deal with problems – say, SQL Server seems locked up – it’s often better to just reboot, without a failover. (Why? Because if SQL Server is locked up, it can’t take action to move the databases to the mirror. If you reboot, the mirror and witness will see that SQL Server isn’t running, and trigger the failover themselves.) You shouldn’t make any changes to the partner of a database (including ALTER…FAILOVER”), unless you need to, e.g., drop or rename a database – and there, you’ll be turning the partner off, which will force the instance to remirror.
Of course, there’s a final tidbit hidden in those rules: if you’re on the server where the jobs are correctly set up, and you can’t afford a failover, but you’re sure you can handle re-mirroring, then modifying your RDS instance to make it Single-AZ, then moving it back to Multi-AZ, will copy the jobs over. Remember to consider the impact of both possibilities (failing over, versus MAZ->SAZ->MAZ) if you find yourself in that situation.
One warning: if you fail over, while running SQL Server Management Studio (SSMS), open a fresh copy of SSMS to check your jobs. If you open an SSMS instance, and connect to RDS SQL Server, and it fails over, that copy of SSMS might connect to former-principal/now-mirror – it remembers the server it first connected to, and uses that.
So: failover; open a fresh copy of SSMS, and check your jobs there; now, they should look normal.
This scenario was taken from a real-world example with the helpful hints from AWS Support. Thanks John P.!