When working with a large table, an UPDATE statement can easily bring even the most robust server to a screeching halt. In this post I will show an alternative so that even the largest tables can be updated without negatively impacting server performance.

Updating a million rows

We’ve all been there: writing a simple UPDATE statement that just sets one column’s value to a new value, maybe using a simple join to determine which rows need to be updated. Without thinking about it you click execute and wait… and wait… and wait…

The transaction is trying to update more than a million rows and begins writing heavily to the local disk system or, worse, the SAN. The query may eventually finish if you are lucky, or it may time out and have to roll back if you are unlucky.

What to do? What to do?

Fortunately, T-SQL supports a simple looping structure: a WHILE loop, with BEGIN and END bracketing the SQL commands to execute.
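For instance, a bare-bones WHILE loop looks like this (the counter and the limit of five passes are arbitrary, just to show the syntax):

    DECLARE @i INT;
    SET @i = 0;

    WHILE @i < 5                                 -- repeat until the condition is false
    BEGIN                                        -- BEGIN and END bracket the statements to repeat
        SET @i = @i + 1;
        PRINT 'Pass ' + CONVERT(varchar(10), @i);
    END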

Example

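Below is a minimal sketch of the batching pattern. The table and column names (dbo.BigTable, NewValue, OldValue) are placeholders for illustration, and the WHERE clause assumes the rows still needing work can be recognized because their target column is still NULL.

    SET ROWCOUNT 10000;          -- limit every data-modifying statement to 10,000 rows

    DECLARE @rows INT;
    SET @rows = 1;               -- prime the loop

    WHILE @rows > 0
    BEGIN
        -- Running in autocommit mode, each pass updates and commits at most 10,000 rows.
        UPDATE dbo.BigTable
        SET    NewValue = OldValue
        WHERE  NewValue IS NULL; -- only the rows that have not been updated yet

        SET @rows = @@ROWCOUNT;  -- 0 once there is nothing left to update
    END

    SET ROWCOUNT 0;              -- remove the limit for the rest of the session

On SQL Server 2005 and later, UPDATE TOP (10000) gives the same batching behavior and is the preferred form, since SET ROWCOUNT is deprecated for INSERT, UPDATE, and DELETE statements.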
Explanation

In the example above, I set ROWCOUNT to 10,000. We then loop through the UPDATE statement in batches of 10,000 rows while the row count from the previous pass (@@ROWCOUNT) is greater than zero.

Each pass executes the UPDATE and commits the transaction covering those 10,000 rows, then the next 10,000 rows are updated, and so on until every row has been updated.

Caveat

Each batch in the query above is still ACID compliant, so the process can be interrupted mid-stream and simply restarted without any issues. If one of the batches of 10,000 does not complete, the transaction for those 10,000 rows is never committed. And since the WHERE clause looks only for values that have not been updated yet (in this example, rows that are still NULL), the query just picks up where it left off.
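If the loop is interrupted, a quick count of the remaining work (using the same placeholder names as above) shows exactly how much is left for the restarted loop to pick up:

    SELECT COUNT(*) AS RowsRemaining
    FROM   dbo.BigTable
    WHERE  NewValue IS NULL;     -- rows that no committed batch has touched yet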

Real World Statistics

By updating in batches of 10,000 I was able to update a table with more than 100 million rows in under 4 minutes; attempting to update all 100 million rows in a single statement was simply not feasible. In another example I updated 10 million rows in under a minute and a half, while the same query updating all 10 million rows at once took more than 25 minutes. Your results may vary.