

SQL Server 2005 BCP Partitioning

Introduction

In SQL Server 2000, when an article is BCPd to the file system (the distribution working folder) during snapshot generation, a single file is always used for the data. In SQL Server 2005, by contrast, when you look in the distribution working folder after creating a snapshot you might be surprised to find many such files for each article, each containing a part of the table data.

Clearly there has been a big change in the processing rules, which is not documented in BOL in any detail. I'll refer to this overall process as "BCP Partitioning" - a term taken from developers posting in the Microsoft Replication Newsgroup. This article explains why BCP Partitioning exists, what to expect, and how to troubleshoot if it all goes wrong.

Why was it created?

There are several benefits to BCP Partitioning. Firstly, when the snapshot is being applied to the subscriber there might be a network outage. In SQL Server 2000 this would mean that the complete snapshot had to be reapplied, and in the case of the "concurrent snapshot" this would all be in one transaction. With a SQL Server 2005 distributor and SQL Server 2005 subscribers, however, there is now much greater granularity in the process. Each partition is applied in a separate transaction, meaning that after an outage the snapshot delivery is able to continue at the partition level where it left off and complete just the remaining partitions. For a table containing a lot of rows this can lead to a huge saving in time. Other useful side-effects are that it can cause less expansion of the transaction log (assuming the delivery spans a log backup, or the simple recovery model is in use) and it allows parallel execution of the BCP process on machines with more than one processor (parallel execution did exist in SQL Server 2000, but only across several articles, never within a single table). The same benefits apply when the snapshot agent creates the initial snapshot.

Testing It

To test the number of files produced and to try to investigate the algorithm, I created a simple table (below) and populated it.

CREATE TABLE TestBCP(
    id int NOT NULL,
    fullname varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    CONSTRAINT PK_TestBCP PRIMARY KEY CLUSTERED (id ASC) ON [PRIMARY]
) ON [PRIMARY]

-- Populate the table with 1500 rows
DECLARE @id int
SET @id = 1
WHILE @id < 1501
BEGIN
    INSERT INTO TestBCP(id, fullname)
    VALUES(@id, 'Name ' + CAST(@id AS varchar(100)))
    SET @id = @id + 1
END
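
To actually generate the snapshot files, the table then needs to be published and the snapshot agent run. Here is a minimal sketch, assuming a transactional publication on a database already enabled for publishing with a distributor configured; the publication name is purely illustrative:

-- Assumes the database is already enabled for transactional publishing
-- (sp_replicationdboption) and a distributor is configured
EXEC sp_addpublication
     @publication = N'TestBCPPub',   -- hypothetical publication name
     @sync_method = N'concurrent'    -- the SQL Server 2005 default

-- Create the snapshot agent job for the publication
EXEC sp_addpublication_snapshot @publication = N'TestBCPPub'

-- Add the test table as an article; once the agent runs, the partitioned
-- .bcp files for it appear in the distribution working folder
EXEC sp_addarticle
     @publication   = N'TestBCPPub',
     @article       = N'TestBCP',
     @source_object = N'TestBCP'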

On my machine I initially found a simple dependency on the number of rows, which on further investigation was completely unrepeatable! I.e. the number of .bcp files seemed to vary with the size of the table in unpredictable ways. Quite why this occurred is explained in the troubleshooting section.

Results

(a) For 8 processors or fewer, the formula used to calculate the number of partitions is:

default # partitions = NUMBER_OF_PROCESSORS x 4

So, in my case, as I have 4 processors, I would expect there to be 16 partitions, which is exactly what I saw. Even for 1 processor there could be 4 partitions, which reinforces the fact that multithreading is only part of the reason behind the algorithm. Note that the -BcpBatchSize parameter of the snapshot and distribution agents simply governs how often progress messages are logged and has no bearing at all on the number of partitions. (A quick way to estimate the expected partition count on a given server is sketched after these results.)

(b) there is a minimum threshold of 1000 rows, below which no partitioning occurs.

(c) the data was found to be evenly distributed amongst the partitions (data files).
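
Putting rules (a) and (b) together, here is a small script to estimate the number of partitions to expect for a given table. It assumes the NUMBER_OF_PROCESSORS x 4 formula and the 1000-row threshold observed above hold on your build:

-- Estimate the expected number of BCP partitions for the test table,
-- assuming the rules above (processors x 4, observed for <= 8 processors,
-- and a minimum of 1000 rows before partitioning kicks in)
DECLARE @cpus int, @rows bigint

SELECT @cpus = cpu_count FROM sys.dm_os_sys_info
SELECT @rows = COUNT_BIG(*) FROM dbo.TestBCP

IF @rows < 1000
    SELECT 1 AS ExpectedPartitions   -- below the threshold: one data file
ELSE
    SELECT @cpus * 4 AS ExpectedPartitions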

Troubleshooting

(a) "Why is the number of partitions incorrect?"

In many cases, as mentioned above, what I found wasn't in accordance with the 3 rules. In some cases I had expected to see many partitions yet all the data was BCPd to just one file, and in others the table had fewer than 1000 rows yet many partitions were created, most of which were empty.
Running DBCC SHOW_STATISTICS - eg DBCC SHOW_STATISTICS ('testbcp','PK_TestBCP') - revealed the problem. It turned out that the statistics were out of date, and the replication processes don't update the stats themselves.

In the stats output, the "Rows" total should be identical to that returned from "select count(*) from testbcp". Running UPDATE STATISTICS testbcp in all cases resulted in these totals being consistent, and subsequently the number of partitions created was in agreement with the 3 rules.
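
As a quick check, the following compares the row count recorded in the statistics header against the actual table count, and then refreshes the statistics:

-- The "Rows" value in the statistics header should match the real count
DBCC SHOW_STATISTICS ('testbcp', 'PK_TestBCP') WITH STAT_HEADER
SELECT COUNT(*) AS ActualRows FROM testbcp

-- If the totals differ, refresh the statistics before generating the snapshot
UPDATE STATISTICS testbcp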

(b) Disabling BCP Partitioning

To disable BCP Partitioning, you can add the unofficial "-EnableArticleBcpPartitioning 0" switch to the snapshot agent as shown below, and a single data file will be produced, just like in SQL Server 2000:
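
For example, the command line of the snapshot agent job step might end up looking something like this (the server, database and publication names are placeholders):

-Publisher [MYSERVER] -PublisherDB [TestDB] -Distributor [MYSERVER]
-Publication [TestBCPPub] -DistributorSecurityMode 1
-EnableArticleBcpPartitioning 0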


Why would you want to turn off such a useful feature? Well, anecdotally, things may get worse for folks who don't start off with empty tables (archiving or roll-up scenarios), or who use the concurrent snapshot (the default for SQL 2005), and any of CPU, disk I/O, or network bandwidth can become the bottleneck in the attempt to extract more snapshot-processing throughput using BCP Partitioning.

(c) Ensuring Bulk-Logging

For those tables which really expand the transaction log, some DBAs like to enable the bulk-logged recovery model to minimise logging, but this won't always work when we are dealing with multiple partitions. To maximise the chance of going down the bulk-logged path, either use -MaxBcpThreads > 1 for the distribution agent and ensure that the target table has no indexes on it before the distribution agent delivers the snapshot, or use -MaxBcpThreads = 1 to turn off parallelism altogether, although the latter option obviously might reduce performance.
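
For reference, switching the subscriber database to the bulk-logged recovery model looks like this (the database name is a placeholder; if point-in-time restore matters to you, return to full recovery and take a log backup once the snapshot has been delivered):

-- Switch to bulk-logged recovery before the snapshot is delivered
ALTER DATABASE SubscriberDB SET RECOVERY BULK_LOGGED

-- ...snapshot delivery happens here...

-- Afterwards, return to full recovery and take a log backup
ALTER DATABASE SubscriberDB SET RECOVERY FULL
BACKUP LOG SubscriberDB TO DISK = 'C:\Backups\SubscriberDB.trn'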

Conclusions

The new BCP Partitioning functionality offers real improvements - initialization is generally faster and certainly more resilient. Mostly we wouldn't even need to know that this mechanism had changed in SQL Server 2005 and could simply accept that the system seems improved and faster, but hopefully this article has shed a little more light on what goes on behind the scenes and will help if troubleshooting is ever called for.