Home  |  Articles  |  Scripts  |  Consultancy  |  FAQs  |  Links  |  SSIS

 
Merge Replication and Various Causes of Non-Convergence

article by Paul Ibison


This article looks at various scenarios where new data or data changes can "disappear" in merge replication. In each case this is by design according to the rules of the merge replication engine but if you are unaware of these scenarios you might think that there is something wrong with your setup, and perhaps even lose confidence in the technology. So, it is very important when designing your merge replication topology to understand each of the scenarios below. There may turn out to be others which I have missed and if this is the case please email me details and I'll add them in.

(1) "Normal" Conflicts

Suppose one subscriber deletes a record and another one updates the same record. By default the first subscriber to synchronize will "win" the conflict. So, if the delete statement wins, the updated record will disappear and in such cases the data will be removed as a result of a straightforward conflict and nothing more arcane.

In trying to understand why the data has disappeared, we must always check 2 GUIs: the Replication Monitor and the Conflict Viewer to first try to understand what has happened. Assuming we have a HistoryVerboseLevel of 2 in the profile, we'll see the detailed log of each table in the Replication Monitor and notice that there have been conflicts. The Conflict Viewer will record the conflict details as below.

We can track the cause of such conflicts by looking at the MSmerge_conflicts_info table which gives the details of the cause of the conflict and the actual row identifier. The reason_text summarises the situation nicely: "The same row was updated at 'ServerX.testsub' and deleted at 'ServerY.AdventureWorks'. The resolver chose the delete as the winner."

(2) Conflicts Which Cannot be Resolved

Suppose we have the publisher adds PK = '1' and a subscriber adds PK = '1' and then they synchronize. In the case of a conflicting PK value, the conflict will be raised and the data not converged. In the replication monitor the uploaded and downloaded record will both register a conflict. Such a conflict cannot be resolved using the conflict viewer - submitting either the winner or the loser won't work. So, you'll have to manually update the PK value at a particular replica so as to remove the cause of conflict or just delete one of the records. After that you can sync up again in order to allow for convergence. By the way for this particular type of conflict in Profiler I notice that it regenerates a conflict on each synchronization. After attempting and failing the insert, there is a subsequent call to sp_MSsetgentozero which resets the row and then it gets picked up once again for the next sync. Let me reiterate here that your application must provide code which reacts appropriately once such conflicts are raised. This applies to all the conflicts which are not resolved by a winner (conflict_type = 5 to 10 inclusive in the MSmerge_conflicts_info table).

I guess because these conflicts are not solved by the default mechanism and hence lead to non-convergence, they are considered an exception to the normal rules.
 

(3) Compensating Changes

When the merge agent synchronizes it might be the case that an insert, update or a delete cannot be applied at a destination replica. This could be because of a particular constraint or trigger e.g. a check constraint violation at the replica. The merge agent then may send a compensating change back to the source replica to undo the failed change i.e. delete the source record in the case of a failed insert. This all occurs in the same sync session.

There is an option which came into being in a service pack (I forget which) of SQL Server 2000 which controls this behaviour by setting the @compensate_for_errors parameter for sp_addmergearticle and this has been well-documented. In the above scenario a delete would be automatically sent to one of the subscribers during the same synchronization session.

You might now wonder "how do I know I have 'experienced' a compensating change"? Good question! This is not so straightforward. There will be a conflict registered in the conflict viewer. In the case of an insert being compensated by a delete, the conflict viewer will show "A row insert at 'YourServername.testsub' could not be propagated to 'YourServername.AdventureWorks'. This failure can be caused by a constraint violation. ...". Unfortunately this message is not particular to compensating changes so we won't know this is the case explicitly. I'm looking into ways to provide a query which can determine that compensating changes have occurred and will add it to this article if I succeed.

(4) Partition Realignment

Suppose we have three tables - Region, Company and Orders. These are related by PK-FK relationships such that Orders belong to Companies and Companies belong to Regions. We define dynamic partitions such that Subscriber "Sub_North" gets the North Region records and subscriber "Sub_South" gets the "South" records, by using a reference to HOST_NAME() in the filter and adding join filters to the FK tables:

The fact that there is a join filter defined in the publication between Regions and Companies and Orders means that Sub_North will only receive Companies and Orders from the North. Now, suppose we change a company from the North to the South region....what happens? Well first of all the Company data is realigned. Subscriber Sub_South will now also get all the associated orders. While subscriber North will have the Company deleted and all the associated orders will be deleted. If you are new to merge replication this might catch you by surprise. Actually it is quite logical, but we have to be aware that a change near the top of a hierarchy can result in the same amount of traffic as a reinitialization!

Such deletes are visible in the Replication Monitor but there are no conflicts recorded in the conflict viewer. As with (3) we have no easy way of definitively determining that partition re-alignment is the cause of the deletes!
 

(5) Join Filters which "oppose" Foreign Keys (1 inserts)

I wasn't aware of this issue until recently when a client showed me some research done by Steve Wright and James Rowland Jones. As it turned out it wasn't relevant to my particular client but I was aware that this is definitely something to watch out for.

Articles are processed in a certain order which is well-documented. However in the case of merge replication with filters, things can go awry. In the example below, the Sales data will be processed prior to the Company and Region data because the ordering will be derived according to the order of the join filters. Although it's rare, sometimes it makes logical sense to have the join filters run contrary to the PK-FK order.

In the above example, suppose for the North subscriber a company is added to the North region and also an associated Northern sale is added to this company. We then synchronize. We would expect the subscriber who has the defined "North" filter to keep the new sales data. However the views which evaluate the partition contents will not allow sales which don't belong to the partition, and the sale data will be processed prior to the company because of the ordering of the join filters, so the merge agent ensures a corresponding delete will be sent to the subscriber. You can avoid this by having the filter order correspond to the PK-FK order but might not be the correct order for your business requirements. The foolproof method is to  explicitly set the order using the @processing_order parameter of sp_addmergearticle and this'll override the derived order. Plenty more work to do, and this is not supported in SSMS so it's back to scripting.

(5) Join Filters which "oppose" Foreign Keys (2 deletes)

Please see this article for more info.

(7) Merge users not in the PAL

Fellow MVP James Rowland-Jones did some nice research on this section which I've linked to in the title. Basically if the executing user is not in the PAL and doesn't have the correct permissions on the table, the merge view which determines what is replicated will filter out the record, so we have non-convergence. The permissions check is done between the user and the table, so having EXEC rights on a stored proc that inserts into the table won't be sufficient.

(8) Using Joins with Multiple Filters

I'm sure filters and joins are where the complication and confusion of merge lies. Here is another example for this article: if you have a join filter between 2 tables, and a different static table filter on each table in the join, the results are not filtered correctly, or at least not at all intuitively.

E.G. consider 2 tables: Customers and Orders. These are joined by a PK - FK relationship on CustomerID so that as we'd expect a customer can place several orders. If I create a join filter to link the tables, all 'ordering' Customers and their associated Orders are replicated. I then add a filter clause on the Customers table (parent) to restrict the output to a particular customer: CustomerID = 'Joe'. The resulting (automatically created) merge view for Orders restricts the results to the orders belonging to customer 'Joe'. So, on initialization one customer and his associated orders are replicated. However, what if I add another filter now to the Orders table: CustomerID = 'Harry'? Logically this should further restrict the replicated orders and we'd expect zero records returned. However, on initialization we find that more orders are now replicated! In this case the merge view is created which UNIONs the 2 clauses, rather than using a logical AND clause! This is quite different to the above scenarios - we now observe extra rows rather than seeing rows unexpectedly deleted.

(9) Adding New Records without firing the Merge Insert Trigger

This can happen in 2 circumstances:

(a) you BULK INSERT the rows and accept the defaults, then "FIRE_TRIGGERS" is false and consequently details of the new rows are not added to MSmerge_contents. Merge replication relies on entries in MSmerge_contents to determine which records need synchronizing with a replica, so if there are no entries in this table, there'll be no syncing of these newly-added records.

(b) you do a "fast-load" using the Transform Data task using DTS/SSIS.

In either case to fix the issue you need to subsequently run sp_addtabletocontents to include the rows and then resynchronise. Alternatively you can use sp_mergedummyupdate for each single row. For the fast load case, in future if you deselect the check box in the SSIS/DTS data flow, the merge insert trigger will fire and you'll not have the issue.