Implementing NoSync Initializations (and variations) on SQL Server 2005/8
I'd like to do a brief article on doing "nosync
initializations" for SQL Server 2005/8. These are initializations which don't use a snapshot. Why would we like to do things this way?
(1) Usually the reason is the size of the snapshot. This is particularly relevant for one of my main clients where we replicate to a fleet of worldwide vessels over narrow satellite links. If we use a snapshot, one particular publication would take ~1 week to initialize. So, the easiest alternative is to zip up the database backup and send it over, unzip, restore and then use this as a starting point.
(2) taking the database ensures that all supporting objects are taken at the same time. Not all objects are able to be added as articles so using this methodology means that we don't have to use post-snapshot files and it is therefore less maintenance.
(3) we might have the data already existing at the subscriber but the application of a snapashot or a database restore from the publisher would break the system. This is clearly not a recommended setup but I have had the dubious pleasure of working on such a system at a bank where standard initialization is impossible. If the subscriber database contains data which only exists on the subscriber then you will have to use this as a starting point. Non-convergence especially in the case of merge replication is something we try to avoid but strangely in some systems it is an accepted state.
Note that in some cases we could also zip up the snapshot files and use an alternative snapshot location rather than using a database backup, however taking a backup is often a simpler method.
How to Add the Subscription - the Wizard
When
adding a subscription, the option in the wizard to declare that 'No, the
Subscriber already has the schema and data' which existed in SQL Server 2000 has been replaced by the screen
below which is fairly self-explanatory. We still need to get the data to the
subscriber as before, and there are several modifications we might need to make
to the schema on the subscriber as explained in the section below, but this sets
us up for a nosync initialization.
For merge replication this is much the same
as the previous methodology for a nosync initialization and running the merge
agent will generate all the extra system tables, triggers etc on the subscriber.
For transactional replication this is not the same as the SQL Server
2000 nosync initialization as in SQL Server 2005/8 we are using the new 'replication support only'
option which is explained further below.
How to Add the Subscription - Scripts
Using scripts we can more easily see what all the
available options are and what they mean. To add a subscription in transactional
replication we use the following
procedures: sp_addsubscription and sp_addpushsubscription_agent /
sp_addpullsubscription_agent.
In sp_addsubscription there are now 4 options for @sync_type. Previously there
were just 2 options: 'automatic' (the default) and 'none' which was used to make
a nosync initialization. This is still the case for merge replication in SQL
Server 2005/8 so I'll not mention that any more, but for transactional we now have several new
options which aren't explicitly available in the wizard. To
paraphrase my SQL Server 2000 article, the latter option ('none') had to be used in conjunction
with a manual generation of replication stored procedures which are then
applied to each subscriber. In SQL Server 2005/8 we now have alternative options which are:
(1) none -
assumes that the subscriber already has the schema and initial data for
published tables - NOW DEPRECATED. (however sp_scriptpublicationcustomprocs is
not marked as deprecated for manual use in BOL!!!)
(2) automatic (default)
- automatic transfer of schema, data, replication stored procedures (and
triggers where necessary).
(3) replication support only
- what does the term "replication support" mean in this context? It is
the article's internal replication stored procedures, the replication triggers that support updating
subscriptions and the required replication system tables. So, this is like the SQL Server
2000 option, without the headache of doing the manual creation and
implementation of the stored procedures. This option assumes that the
subscriber already has the schema and all the initial data for published tables.
Here we must be extremely careful:
- there must be NO difference in data between
the publisher and subscriber. This is not ensured by the setup and as DBAs we
must guarantee that this is the case. In some terminology this is referred to as
"Quiescing the system" while the data is transferred. Obviously if you
work on a system where there is 24 hour access, this is not going to be
possible, so option (2) or (4) would be required.
- if you are setting this up using a backup, the backup must be taken AFTER the publication has been created. Why is this so critical? Because creating the publication makes some changes to the schema which are essential for replication to work. In particular, the identity columns are set to "NOT FOR REPLICATION".
- be aware that this setup is not reflected accurately in the GUI. The normal way of setting up articles is to add them in the GUI and then to edit the article properties. This allows us to decide which permissions, triggers, indexes etc to take for a particular replicated article. Changes afterwards to eg the index setup are not propagated, but it allows for a granularity in the initialization process. In the case of a "replication support only" option, the article properties are MEANINGLESS! You are bypassing this functionality and taking the whole database.
(4) initialize with backup.
The complete schema and initial data for published tables are obtained from a
recent full backup of the publication database. As mentioned above, this backup must be taken AFTER the publication was created.
Once the backup of the publication database is created, we must ensure that the
subscriber has access to the backup file - either
the backup is transferred to the subscriber or is put on a fileshare that the
subscriber can read. The location of the backup and media type for the backup
are specified by 2 new parameters of sp_addsubscription: @backupdevicename and @backupdevicetype.
At this point I'll note that you'll need to do 3 things:
(a) remove redundant objects and tables. There's
no point cluttering up the database on the subscriber with nothing that wouldn't
be there is you had done an automatic initialization. This will speed up the
backup and clarify to other administrators what the subscriber is being used
for.
(b) replace timestamp columns with binary(8) columns. You'll have to use a
temporary holding table to achieve this.
(c) remember that the backup includes all data, even if row or column filters
are specified on a table - so some columns might need to be dropped.
How "initialize with backup" Works,
and How to avoid Pitfalls
When using "initialize with backup", a
transactional replication topology does not need to be quiesced during
configuration. With "initialize with backup", sp_addsubscription
internally calls RESTORE HEADERONLY to extract the last LSN of backup to be used
as the starting point of the new subscription. A subsequent comparison is made
to the xact_seqno value in distribution..MSrepl_commands, and the necessary
commands transferred. To ensure that all the required commands actually exist in
the MSrepl_commands table we must restore the backup before the retention period
is reached and the distribution cleanup agent removes the rows. BOL recommends
that we disable the distribution cleanup agent during this period to
ensure it all goes smoothly. If you get this wrong, you'll receive the following
message:
Msg 21397, Level 16, State 1, Procedure
sp_MSsetupnosyncsubwithlsnatdist, Line 213 The transactions required for
synchronizing the nosync subscription created from the specified backup are
unavailable at the Distributor. Retry the operation again with a more up-to-date
log, differential, or full database backup. The Subscriber was dropped.
Adding New tables
If you add a new article to a publication which has a subscriber configured as "replication support only" you have to be a little careful.
For a normal (default) subscription we would run the snapshot agent to create the scripts for the new article and this would then be applied at the subscriber. In this case things are different. After adding and propagating the new article, there will be a message in the distribution agent history saying "Applied the snapshot to the Subscriber". However when you look on the subscriber you'll see no such table. What has been created at the subscriber during this process are the internal stored procedures to apply commands to the new table. Creation of the actual table and the data at the subscriber is up to you. So, make sure that you use something like SSIS to transfer the table and data. Make sure that the identity columns are set to NOT FOR REPLICATION and any related objects are created there also - eg non-clustered indexes and triggers. Really you need to do this before subscribing to the new article. Essentially all of this means that the system needs to be quiesced while this happens so we don't lose any data. Therefore we could say that the initial reason we used this method breaks down when we add articles!
Conclusions
Hopefully this article will help you to take
advantage of the useful new options in setting up a nosync initialization. This
option has always involved some necessary scripting requirements, and it could
never be done entirely using the GUI. Now with SQL Server 2005/8 this is even more
so, but the additional options present a necessary alternative to automatic
initializations.
Thanks to Nigel
Maneffa for pointing out some improvements on this article.
|