Home  |  Articles  |  Scripts  |  Consultancy  |  FAQs  |  Links  |  SSIS

 

Implementing NoSync Initializations (and variations) on SQL Server 2005/8

I'd like to do a brief article on doing "nosync initializations" for SQL Server 2005/8. These are initializations which don't use a snapshot. Why would we like to do things this way?

(1) Usually the reason is the size of the snapshot. This is particularly relevant for one of my main clients where we replicate to a fleet of worldwide vessels over narrow satellite links. If we use a snapshot, one particular publication would take ~1 week to initialize. So, the easiest alternative is to zip up the database backup and send it over, unzip, restore and then use this as a starting point.

(2) taking the database ensures that all supporting objects are taken at the same time. Not all objects are able to be added as articles so using this methodology means that we don't have to use post-snapshot files and it is therefore less maintenance.

(3) we might have the data already existing at the subscriber but the application of a snapashot or a database restore from the publisher would break the system. This is clearly not a recommended setup but I have had the dubious pleasure of working on such a system at a bank where standard initialization is impossible. If the subscriber database contains data which only exists on the subscriber then you will have to use this as a starting point. Non-convergence especially in the case of merge replication is something we try to avoid but strangely in some systems it is an accepted state.

Note that in some cases we could also zip up the snapshot files and use an alternative snapshot location rather than using a database backup, however taking a backup is often a simpler method.

How to Add the Subscription - the Wizard

When adding a subscription, the option in the wizard to declare that 'No, the Subscriber already has the schema and data' which existed in SQL Server 2000 has been replaced by the screen below which is fairly self-explanatory. We still need to get the data to the subscriber as before, and there are several modifications we might need to make to the schema on the subscriber as explained in the section below, but this sets us up for a nosync initialization.
For merge replication this is much the same as the previous methodology for a nosync initialization and running the merge agent will generate all the extra system tables, triggers etc on the subscriber.
For transactional replication this is not the same as the SQL Server 2000 nosync initialization as in SQL Server 2005/8 we are using the new 'replication support only' option which is explained further below.

How to Add the Subscription - Scripts

Using scripts we can more easily see what all the available options are and what they mean. To add a subscription in transactional replication we use the following procedures: sp_addsubscription and sp_addpushsubscription_agent / sp_addpullsubscription_agent. In sp_addsubscription there are now 4 options for @sync_type. Previously there were just 2 options: 'automatic' (the default) and 'none' which was used to make a nosync initialization. This is still the case for merge replication in SQL Server 2005/8 so I'll not mention that any more, but for transactional we now have several new options which aren't explicitly available in the wizard. To paraphrase my SQL Server 2000 article, the latter option ('none') had to be used in conjunction with a manual generation of replication stored procedures which are then applied to each subscriber. In SQL Server 2005/8 we now have alternative options which are: 

(1) none - assumes that the subscriber already has the schema and initial data for published tables - NOW DEPRECATED. (however sp_scriptpublicationcustomprocs is not marked as deprecated for manual use in BOL!!!)

(2) automatic (default) - automatic transfer of schema, data, replication stored procedures (and triggers where necessary).

(3) replication support only - what does the term "replication support" mean in this context? It is the article's internal replication stored procedures, the replication triggers that support updating subscriptions and the required replication system tables. So, this is like the SQL Server 2000 option, without the headache of doing the manual creation and implementation of the stored procedures. This option assumes that the subscriber already has the schema and all the initial data for published tables. Here we must be extremely careful:

- there must be NO difference in data between the publisher and subscriber. This is not ensured by the setup and as DBAs we must guarantee that this is the case. In some terminology this is referred to as "Quiescing the system" while the data is transferred. Obviously if you work on a system where there is 24 hour access, this is not going to be possible, so option (2) or (4) would be required.

- if you are setting this up using a backup, the backup must be taken AFTER the publication has been created. Why is this so critical? Because creating the publication makes some changes to the schema which are essential for replication to work. In particular, the identity columns are set to "NOT FOR REPLICATION".

- be aware that this setup is not reflected accurately in the GUI. The normal way of setting up articles is to add them in the GUI and then to edit the article properties. This allows us to decide which permissions, triggers, indexes etc to take for a particular replicated article. Changes afterwards to eg the index setup are not propagated, but it allows for a granularity in the initialization process. In the case of a "replication support only" option, the article properties are MEANINGLESS! You are bypassing this functionality and taking the whole database.

(4) initialize with backup. The complete schema and initial data for published tables are obtained from a recent full backup of the publication database. As mentioned above, this backup must be taken AFTER the publication was created. Once the backup of the publication database is created, we must ensure that the subscriber has access to the backup file - either the backup is transferred to the subscriber or is put on a fileshare that the subscriber can read. The location of the backup and media type for the backup are specified by 2 new parameters of sp_addsubscription: @backupdevicename and @backupdevicetype.

At this point I'll note that you'll need to do 3 things: 

(a) remove redundant objects and tables. There's no point cluttering up the database on the subscriber with nothing that wouldn't be there is you had done an automatic initialization. This will speed up the backup and clarify to other administrators what the subscriber is being used for. 
(b) replace timestamp columns with binary(8) columns. You'll have to use a temporary holding table to achieve this. 
(c) remember that the backup includes all data, even if row or column filters are specified on a table - so some columns might need to be dropped. 

How "initialize with backup" Works, and How to avoid Pitfalls

When using "initialize with backup", a transactional replication topology does not need to be quiesced during configuration. With "initialize with backup", sp_addsubscription internally calls RESTORE HEADERONLY to extract the last LSN of backup to be used as the starting point of the new subscription. A subsequent comparison is made to the xact_seqno value in distribution..MSrepl_commands, and the necessary commands transferred. To ensure that all the required commands actually exist in the MSrepl_commands table we must restore the backup before the retention period is reached and the distribution cleanup agent removes the rows. BOL recommends that we disable the distribution cleanup agent during this period to ensure it all goes smoothly. If you get this wrong, you'll receive the following message:

Msg 21397, Level 16, State 1, Procedure sp_MSsetupnosyncsubwithlsnatdist, Line 213 The transactions required for synchronizing the nosync subscription created from the specified backup are unavailable at the Distributor. Retry the operation again with a more up-to-date log, differential, or full database backup. The Subscriber was dropped.

Adding New tables

If you add a new article to a publication which has a subscriber configured as "replication support only" you have to be a little careful. For a normal (default) subscription we would run the snapshot agent to create the scripts for the new article and this would then be applied at the subscriber. In this case things are different. After adding and propagating the new article, there will be a message in the distribution agent history saying "Applied the snapshot to the Subscriber". However when you look on the subscriber you'll see no such table. What has been created at the subscriber during this process are the internal stored procedures to apply commands to the new table. Creation of the actual table and the data at the subscriber is up to you. So, make sure that you use something like SSIS to transfer the table and data. Make sure that the identity columns are set to NOT FOR REPLICATION and any related objects are created there also - eg non-clustered indexes and triggers. Really you need to do this before subscribing to the new article. Essentially all of this means that the system needs to be quiesced while this happens so we don't lose any data. Therefore we could say that the initial reason we used this method breaks down when we add articles!

Conclusions

Hopefully this article will help you to take advantage of the useful new options in setting up a nosync initialization. This option has always involved some necessary scripting requirements, and it could never be done entirely using the GUI. Now with SQL Server 2005/8 this is even more so, but the additional options present a necessary alternative to automatic initializations.

Thanks to Nigel Maneffa for pointing out some improvements on this article.