How to Optimize Database Snapshots In Parameterized Merge Publications in SQL Server 2005BackgroundIn a typical merge publication, each subscriber is
essentially equivalent. When we initialize the subscribers they all get the same
data, and afterwards if all the subscribers were fully synchronized they would
again all have identical data. I said that this is the case for "typical" merge
replication, because there is also the possibility of partitioning the data
according to each particular subscriber. In SQL Server 2000 this was referred to
as "dynamic filters" while in SQL Server 2005 this is called "parameterized
filters". Adding the FilterAll you need to have in order to set up data partitions is a valid WHERE clause in order to split the data according to the subscribers. You create a merge publication as per usual and in the wizard select the option to add a filter. For this particular example, I used a single table with one column called ComputerName. In the table there are 4 million rows like this: "IBM" The parameterized filter is shown in the wizard screen below ("Filter statement:"):
For parameterized filters, the partition must be defined
using either the HOST_NAME() function or the SUSER_SNAME() function. Don't be
put off by having to use these functions if they seem to be completely unrelated
to your own data - a relevant business value for the function is hard-coded for
the subscriber when the subscription is created ie the actual value of HOST_NAME()
is not itself used. Once the partition clause is defined against one table,
publication join filters can be used to propagate the filtering down to the
other child tables. Now for the clever bit. Instead of creating a whole snapshot and having each subscriber doing the filtering and the consequential slow separate insert statements etc, we pre-create the snapshots - one for each subscriber - which have just the data required for that particular subscriber. Using the Data Partitions TabTo pre-create the partitioned snapshots you use the "Data Partitions" page of the Publication Properties dialog box, after having already completed an initial (normal) snapshot.
In the figure above you can see that I have already entered
2 values for the HOST_NAME()
function - 'IBM' and 'Compaq'.
Note that a lot of this was already available in SQL
Server 2000. There have been some improvements though. In SQL Server 2005 we can set it all up using the GUI rather than
by scripts, which is one advantage. Another improvement is that the jobs can be
initiated from the GUI. Also, the subscription no longer needs to be pointed at the
snapshot location manually. If you're wondering how the merge agent in SQL
Server 2005 "knows" where
the snapshot actually is, then the data is stored in the following tables: Time Benefits IllustratedSo, now you know how to set it up, what are the benefits?
Remember in the simple table I had added 4 million rows - 1 million for each computer
name.
The times taken initialization of the "Compaq" partition are shown below:
Partitioned Snapshot
So, the overall time for initialization has reduced from 18 mins to 1 min. Hopefully this article will encourage you to investigate the use of dynamic snapshots! |
Paul Ibison, Copyright © 2011 |