Peer-to-Peer Transactional Replication: How Conflict Detection WorksArticle by Paul IbisonIntroductionI was recently asked to take a look at a Peer-to-Peer (P2P) implementation for someone who had set it up and who was encountering all sorts of issues. It struck me that I assumed that P2P is a fairly simple technology when compared to merge but nevertheless I didn't fully grasp how it worked at a more fundamental level. So I did some digging and the result is this article :). Main ScenarioBOL / Technet gives the following advice "To avoid potential data inconsistency, make sure that you avoid conflicts in a peer-to-peer topology, even with conflict detection enabled.". So, we clearly see that this is not intended as a replacement for merge, but my main interest is how does this conflict detection work anyway? Can we really see it at work at the lowest level in order to get an understanding of this technology? Suppose I have a table ("TestTable") as follows:
This table exists on 2 nodes in my P2P setup: UK and France. The TextData in row 1 for the UK node is updated to become "AAA". This propagates to France and all is ok. Remember that this update is read by the log reader agent and placed in a transformed format into the distribution database where the France distribution agent later picks it up and applies it to the France node. There is really nothing especially different in this pattern of events if we compare P2P to ordinary transactional replication. So, what if we change the same row at the 2 different nodes simultaneously? For this test I stop the distribution agents temporarily because I don't want any possibility of a change being propagated as I flick between my query windows, in which case I'll negate the test. I make the data changes so we have the following: UK
France
In this case a conflict is detected when I run the distribution agents. Quite how this is resolved depends on what strategy you have selected (see "Handling Conflicts" here for more info) but here I am interested in how the conflict is detected itself. Looking a bit deeperFirst of all a bit of background....In both merge replication and transactional replication with updating subscribers (now sadly deprecated!) there is a schema change when we create a publication where an extra column is added to the replicated table. In both cases this extra column holds an GUID. In merge replication the GUID value allocated to a row never changes. while changed merge rows have their GUID values noted in external meta data tables and a lineage column in these tables indicates who changed the row. In transactional replication with updating subscribers the GUID value itself changes when the row changes, but we have no record of who actually made the change as the simplified conflict resolution options have no need to distinguish between subscribers. So, I expected that in some way the conflict resolution in P2P to be based on some of the previous methodologies mentioned above. Sure enough, there is an extra column which is added to the table "TestTable", although it doesn't contain a GUID. It is called "$sys_p2p_cd_id" but if I try to refer to it directly I get the following message: Invalid pseudocolumn "$sys_p2p_cd_id". I can however see this extra column if I use the DAC to open up a query editor window and run the following: select *, $sys_p2p_cd_id as HiddenColumn from TestTableThis now shows:
This is becoming more interesting! The value in this column changes when I do an update, so we begin to see that the technology more resembles transactional replication with updating subscribers as perhaps we could have expected. To get further information I run the following query: select originator_node, res.* from( select *, $sys_p2p_cd_id as HiddenColumn ,sys.fn_replvarbintoint($sys_p2p_cd_id) as OriginatorID ,sys.fn_replp2pversiontotranid($sys_p2p_cd_id) as TranID fromtesttable ) res inner join dbo.MSpeer_originatorid_history poh on res.PeerID = poh.originator_id
The query above reflects the fact that the hidden column $sys_p2p_cd_id actually holds 2 separate bits of info: the node which made the change and the id of the transaction itself. The internal function sys.fn_replvarbintoint is particularly useful here as this returns the Originator_ID. In the query above I have joined the inner results to the MSpeer_originatorid_history table in order to get the actual peer hostname so you can see what is going on, but the numerical Originator_ID is still important itself. This is because it is used in the conflict resolution if p2p_continue_onconflict is set to true - the Originator_ID with the highest value wins in a conflict. This is quite neat because it means an additional metadata logging table is not required (unlike in merge). Where this info comes from...At this stage you might ask how can this possibly work? If the TranID changes when a row changes, the value in $sys_p2p_cd_id must change, so the value of $sys_p2p_cd_id between one changed node and another (not changed) will naturally be different. If we look for differences in $sys_p2p_cd_id as an indication that there is a conflict, then all data changes should produce such a conflict!!! Actually, the log reader here does something quite clever. When the data change is first made, the old and the new values of the $sys_p2p_cd_id column are stored in the MSRepl_Commands table. We can use sp_browsereplcmds to see that these values are added to the arguments for the internal stored procedure sp_MSupd_dbotesttable. This way the old value of $sys_p2p_cd_id can be checked against the value at the destination, because this should still be identical. If it isn't then there is a conflict and we then go on to take a look at the 2 Originator_IDs and check for the highest value which will predominate. So, the info for this article comes from sp_browsereplcmds and from code in the system-generated update proc sp_MSupd_dbotesttable :). ConclusionsSo, hopefully this makes P2P a little clearer. There have been one or two issues where this type of analysis is essential to understand some bugs (see here for example) but it's quite likely that you won't need to know all this to this level of detail to work as a DBA using P2P. Even so, if you're anything like me you'll feel generally more comfortable with a technology knowing that you can delve deeper to troubleshoot should the need arise :). |
||||||||||||||||||||||||||||||
Paul Ibison, Copyright © 2011 |