Home  |  Articles  |  Scripts  |  Consultancy  |  FAQs  |  Links  |  SSIS

 

Implementing High Availability for Merge Replication in SQL Server 2005

This article describes how to set up database mirroring as a high-availability (HA) solution for merge replication in SQL Server 2005. There are actually 2 recommended methods for HA - database mirroring and log-shipping - and I'll do another article about log shipping later on. First of all let's consider what our normal disaster recovery solution is i.e. if the production system goes down what will we do? Hopefully as a thorough DBA we have a script of the replication publication hived away somewhere, and copies of recent database backups of the published database on a tape system. We can edit the script to make it work on another server, restore the production database, apply the script and then reinitialize the subscribers. It's all very painful but it does work.

So, are there any less-painful solutions out there? Well in SQL Server 2000 we could enable something called "alternative synchronization partners" which was unfortunately deprecated in SQL Server 2005. For a subscriber we could allow it to point at another node apart from the publisher and although manual intervention was required, it worked well. Now that this is deprecated, database mirroring is the preferred alternative. In this article I'll go over the main issues I encountered when setting up merge publications and database mirroring and main points of interest.

Test Topology

I have added a diagram below of the topology I used to test all this.

As far as the replication setup goes, we have a merge publisher and a single subscriber on a separate box. This box also has the (remote) distributor. This is pretty much as expected because if the publisher's box goes down and it contains the distribution database, bang goes the replication setup! So, for the sake of safety we have a remote distributor. The subscriber database is on the same server as the distributor. Usually this is not the case, but it is not entirely uncommon and anyway it makes testing easier :).

For the Mirror Setup I used High Protection mode so there is no witness being used - just a principal and a mirror. The mirror is on the same box as the subscriber and distributor - no problem there and again done this way purely for simplicity.

Setting it all up

I pretty much followed BOL for the setup but with a few changes:

When configuring the distributor, I placed this it on the subscriber server and had it use a snapshot share also on this server (i.e. I didn't use the publisher server for the snapshots!). I then enabled both the publisher and the mirror servers as publishers for this distributor.

I created the publication and subscription pretty much as per normal.

When configuring database mirroring of the published database, I set it up without having a witness - for the sake of simplicity - but apparently any configuration is fine. There are plenty of details about setting up mirroring in BOL and don't be put off - it's probably easier to set up than log-shipping. Also, apparently the setup is totally flexible in the implementation order - publication first then mirroring, or vice versa.

To configure the replication agents for failover is the most important modification to a normal replication setup. This is pretty much the crux of the issue. You'll need to create separate profiles for the snapshot and merge agents and enter values for the –PublisherFailoverPartner parameter. This should be set to be the name of the mirror. If you don't see this parameter, you'll need to disable the "Show only parameters used in this profile" checkbox. Once this is set, then make sure this profile is selected and then restart the merge agent if it is running continuously, so it picks up the new profile as shown below.

Finally, configure the Replication Monitor so it has both the principal and the mirror registered.

Once it is completed, it should look like the figure below on the LHS. This takes a bit of getting used to because so much detail is displayed in the GUI so bear with me here....

The database "TestMirroringDB" is the mirrored database. It appears under both server nodes (XXXXXX03 and XXXXXX02) and everything is running fine as the principal and mirror are showing that they are both "Synchronized".

The server XXXXXX03 is the mirror, has a subscription in the database "testSubscriber" and is also the remote distributor.

The server XXXXXX02 is the principal and holds the publication in the database TestMirroringDB - the mirrored database.

What Happens if various bits are removed or become unavailable?

Let's sort out the terminology first of all. After a failover, the mirror becomes the principal, which can be confusing when explaining things so in this section the terms "principal" and "mirror" always refer to the original principal and mirror.

After a successful failover to the mirror, the replication publication will appear under the node of the mirror server, and the mirror will become the new publisher. It's quite incredible the first time you have it all set up and see it working! This final state is shown on the RHS of the figure above but if you are implementing this remember that you might need to do a refresh after failover to see these changes reflected in the GUI. However, the crucial thing is what happens if various parts of the setup are removed/disabled? A nice clean failover represents the ideal state but might not always be possible....

(1) remove mirroring without doing a failover. Replication continues to work unchanged.

(2) failover and then turn off the publisher (or the SQL Server service). To do this I switched to High Protection mode so I could initiate the failover from SSMS. After failover, the replication synchronization at the mirror works ok.

(3) turn off the publisher and then failover to the mirror (in this case I tested using high performance mode). Obviously a standard failover is not possible because there's no principal to initiate the failover, so you have to use:

 ALTER DATABASE yourdatabase SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS

After that, replication at the mirror works fine.

(4) failover, then completely remove database mirroring. This failover was done using the High Protection mode and after removing mirroring the database at the principal was left in the restoring state. There was the following error when running the merge agent:

"The merge process could not connect to the Publisher 'PublisherServerName:TestMirroringDB'. Check to ensure that the server is running."

If you recover the principal database and make it available, you'll have the following cryptic error:

"{call sp_MSchecksnapshotstatus (N'TestMirroringDR')}"

So, removing mirroring after failover is fatal to the replication setup, but turning off the principal server is actually ok!!!

This is not a bug and is documented in BOL, even though it seems odd. I can appreciate that the server name is actually stored as a part of the publication metadata as this has always been an issue when moving a published database to a new server. However I still find this behaviour rather strange. If the system works in the absence of the principal, it seems that the server naming issues are not insurmountable and it should be possible to have a maintenance stored procedure released at some stage to allow the metadata to be reset. Hopefully we'll see something along these lines soon.

Conclusions

Anyway, hopefully this shows that the setup is not nearly as complicated as it seems and if you have a really mission-critical system it's definitely worth investigating.