How to Massively Optimise Transactional Replication
Article by Paul Ibison
Of course, we SQL Server replicators all know that transactional replication uses the distribution database. The Log Reader Agent stores the replicated commands in the MSrepl_commands and MSrepl_transactions tables, and the Distribution Agent then reads these commands and applies them at the subscribers. After that they are removed by the SQL Server Agent job “Distribution clean up: distribution” in order to keep the process lean. But are the commands really cleaned up? Have a look in the MSrepl_commands table (using sp_browsereplcmds) and see if there are commands which have already been sent to all the subscribers. While you are there, have a look at the size of this table – is it huge? If so, it is quite likely that there is a big problem with your system!
Reason for the Problem
The problem will relate to a single setting of your publication: 'immediate_sync'!
Basically the cleanup job internally calls sp_MSdelete_publisherdb_trans to remove records from the MSrepl_commands table. If a publication has "immediate_sync" set to 1 in the MSpublications table, then the MSrepl_commands records will not be removed until the commands are older than the max retention setting (3 days by default).
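To see which publications are affected, you can query the distribution database directly. A quick sketch (run on the distributor; the columns shown are those documented for MSpublications):

```sql
-- Run against the distributor.
-- Lists publications whose commands will be kept for the full
-- retention period because immediate_sync is enabled.
SELECT publication, immediate_sync, allow_anonymous, retention
FROM distribution.dbo.MSpublications
WHERE immediate_sync = 1;
```

Any rows returned here are publications whose delivered commands will linger in MSrepl_commands until the retention period expires.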
To see what your publication is currently set as, run the following proc:
exec sp_helppublication @publication = 'your_publication_name'
If you see a 1 for “immediate_sync” then this is the problem. This setting means that any DML commands run against replicated articles at the publisher after the snapshot was created must remain in the distribution database until the end of your max retention period, even if they have already been delivered to every subscriber.
Another way of determining if there is a problem is running the following:
select * from distribution.dbo.MSsubscriptions
This will show rows with ‘virtual’ as the value for subscriber_db, which is an additional way this setting is recorded in the replication metadata tables.
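To pick out just those rows, you can filter on subscriber_db. A sketch (run on the distributor; the 'virtual' literal is simply how the setting surfaces in this table):

```sql
-- Run against the distributor: any rows returned indicate
-- publications carrying a virtual subscription, i.e.
-- publications with immediate_sync enabled.
SELECT publisher_db, publication_id, subscriber_db, subscriber_id
FROM distribution.dbo.MSsubscriptions
WHERE subscriber_db = 'virtual';
```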
But what is "immediate_sync" for and where did this setting come from?
First things first: what is this setting for? From BOL: “Specifies if the synchronization files for the publication are created each time the Snapshot Agent runs”. This sounds odd, almost self-evident in fact, but it is actually quite subtle. Suppose I add a new article to an existing publication and then run the Snapshot Agent: should I get a snapshot of the new article on its own, or a complete snapshot of everything? In my experience I usually only want the new article, and if a new subscriber comes along I'll run the Snapshot Agent manually to get all the articles together. However, I have seen questions relating to this in the newsgroups: some publications seem to produce an entire snapshot when new articles are added and others do not. It is this setting, 'immediate_sync', which controls the behaviour. If 'immediate_sync' is set to 1, the Snapshot Agent always produces an entire snapshot. This snapshot is available for new subscribers until the end of the retention period, and likewise the commands in the distribution database also remain until the end of the retention period. If you have a lot of subscribers coming on board and a lot of schema changes then I suppose it could be useful, but generally it is not.
So, where did it come from if I didn’t set it myself? A newsgroup poster (James Brake) pointed out a while back that when using the Create Publication wizard there is an innocent-looking check box on the Snapshot Agent dialog.
The label says 'Create a snapshot immediately and keep the snapshot available to initialize subscriptions'. If it is checked then the resulting publication will have immediate_sync set to 1 (true).
So, how can we change it to something reasonable and remove all those unwanted commands that are slowing down our synchronisations and clogging up the system? Turn off 'immediate_sync' (and 'allow_anonymous', which must be disabled first) using sp_changepublication in the publication database.
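A minimal sketch using sp_changepublication, run in the publication database on the publisher ('your_publication_name' is a placeholder for your own publication; allow_anonymous has to be turned off before immediate_sync can be):

```sql
-- Run in the publication database on the publisher.
-- allow_anonymous must be disabled first, otherwise
-- immediate_sync cannot be set to false.
EXEC sp_changepublication
    @publication = 'your_publication_name',
    @property = 'allow_anonymous',
    @value = 'false';

EXEC sp_changepublication
    @publication = 'your_publication_name',
    @property = 'immediate_sync',
    @value = 'false';
```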
Once that is done, running:
Select * from distribution.dbo.MSsubscriptions
will show no virtual subscriptions.
After that, running the “Distribution clean up: distribution” job should make a huge difference to the size of the distribution database.
If you are using the wizards to create your publications, you might be inadvertently throttling performance for no extra gain. Check the details above to see if this applies to you, and if so, use the simple scripts to modify the publication!
Paul Ibison, Copyright © 2013