A pint of ACID.

First of all I'd like to assure non-developers accidentally reading this post that it has nothing to do with lysergic acid diethylamide, an A-class substance commonly known as LSD or "acid". It's all about the ACID properties of database transactions in general (boring stuff), and the "I" property and scalability in particular. Now that we have the legalities behind us, let's get back to business.

According to Wikipedia the "I" in the acronym has the following meaning:

Isolation refers to the ability of the application to make operations in a transaction appear isolated from all other operations. This means that no operation outside the transaction can ever see the data in an intermediate state; a bank manager can see the transferred funds on one account or the other, but never on both - even if he ran his query while the transfer was still being processed. More formally, isolation means the transaction history (or schedule) is serializable. This ability is the constraint which is most frequently relaxed for performance reasons.

The last sentence is of particular interest as it implies that isolation comes at a cost (concurrency vs. consistency), and this basic fact prompted me to do some experiments and, as a result, write this post.

The Problem

The main problem with maintaining isolation is that resources which are supposed to be isolated have to be locked. As I am sure you are aware, locking in general, and in databases in particular, means lower concurrency and throughput.

To see exactly how database performance is affected by concurrent reads and updates I devised a simple "bank" scenario: while one thread moves money between hypothetical accounts, the other tries to get the total amount of money held in the bank (which has to remain constant). Various approaches to this problem are the subject of the rest of this post and, as you'll hopefully see, they produce very different results. Although the topic may seem trivial (and rightly so), it represents a wider class of problems where the same table is read and updated concurrently.

All that's needed to test the "bank" scenario is a table defined as follows:

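A minimal sketch of such a table (the names Accounts, Id and Balance are my own illustration, not necessarily the originals):

    CREATE TABLE dbo.Accounts
    (
        Id      INT   NOT NULL PRIMARY KEY,  -- account identifier
        Balance MONEY NOT NULL               -- current balance of the account
    );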

Along with two stored procedures: one for moving the money and the other for getting the accumulated total:

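Roughly along these lines, assuming the Accounts table above (procedure and parameter names are illustrative):

    CREATE PROCEDURE dbo.TransferMoney
        @FromId INT,
        @ToId   INT,
        @Amount MONEY
    AS
    BEGIN
        -- Move money between two accounts inside a single transaction.
        BEGIN TRANSACTION;
            UPDATE dbo.Accounts SET Balance = Balance - @Amount WHERE Id = @FromId;
            UPDATE dbo.Accounts SET Balance = Balance + @Amount WHERE Id = @ToId;
        COMMIT TRANSACTION;
    END;
    GO

    CREATE PROCEDURE dbo.GetTotal
    AS
    BEGIN
        -- Total amount of money held in the "bank"; it should always stay constant.
        SELECT SUM(Balance) AS Total FROM dbo.Accounts;
    END;
    GO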

Once the table had been populated with 10000 rows (and an equal amount of money in each account) I ran the test app to see how many times I could concurrently execute both stored procedures within 20 seconds and what results I would get (see the attachment for the test app and SQL setup script).

First run

The first run of the program with default SQL Server settings (isolation level set to READ_COMMITTED) produced the following results:

Operation | Total # of executions (average of 3 runs)
--- | ---
Reads (totals retrieved) | 1656
Writes (transfers executed) | 19300
Inconsistent results | 1120

The interesting thing in this case is the number of inconsistent results, i.e. cases where the database reported an incorrect total of all the balances held in our "bank". In spite of the fact that the movement of money happens within a transaction, the "reader" is clearly able to see "half-committed" data. In case you were wondering why this happens, have a look at the following table, which illustrates the cause of the problem.

 

Time | Reader | Writer
--- | --- | ---
1 | Reads row ID=1, gets the balance = 100 and releases the lock. | 
2 | | Updates row ID=1, sets balance = balance - 10 (90) and exclusively locks the row.
3 | | Updates row ID=2, sets balance = balance + 10 (110) and exclusively locks the row.
4 | | Commits and releases both locks.
5 | Reads row ID=2, gets the balance of 110, producing a total of 210! | 

Due to different locking strategies the reader hits rows updated by the other statement (and quite possibly the other way around). The reader never reads uncommitted data, so in principle the database obeys the READ_COMMITTED isolation level; the end result, however, may be far from desired and many people will find it surprising.

 

Approach no 1

The first possible approach to producing consistent results which came to my mind was to set the isolation level for the GetTotal stored procedure to REPEATABLE READ. In this case locks are held on the data for the duration of the transaction in order to prevent updates. The outcome, however, was an immediate and somewhat surprising deadlock.
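The change itself is just a different isolation level in the reader; a sketch, assuming the illustrative GetTotal procedure from earlier:

    ALTER PROCEDURE dbo.GetTotal
    AS
    BEGIN
        -- Shared locks taken by the SELECT are now held until the end of the
        -- transaction, so rows already read cannot be updated underneath us.
        SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
        BEGIN TRANSACTION;
            SELECT SUM(Balance) AS Total FROM dbo.Accounts;
        COMMIT TRANSACTION;
    END;
    GO

The following table explains how the deadlock arises: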

Time | Reader | Writer
--- | --- | ---
1 | Reads row ID=1, gets the balance = 100 and holds a shared lock on the row. | 
2 | | Updates row ID=2 with balance = balance - 10 and holds the lock.
3 | Tries to read row ID=2 but it is already exclusively locked by the writer, so the statement waits for the lock to be released. | 
4 | | Tries to update row ID=1 but the row is locked by the other statement. The transaction deadlocks.
5 | Reader is chosen as the deadlock victim as there is less work to "rollback". | 

Approach no 2

The reason for the deadlock above is the "progressive" locking of the rows as the SELECT query proceeds along the table. The simple solution is to use the TABLOCK hint in the query to make sure that the entire table is locked in one "go".

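A sketch of what the hinted query might look like (again using the illustrative GetTotal procedure):

    ALTER PROCEDURE dbo.GetTotal
    AS
    BEGIN
        -- WITH (TABLOCK) takes a shared lock on the whole table up front,
        -- so no transfer can run while the SUM is being computed.
        SELECT SUM(Balance) AS Total
        FROM dbo.Accounts WITH (TABLOCK);
    END;
    GO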

This approach works exactly as expected (no inconsistent results); the downside, however, is an immediate drop in "writer" performance, as the following table illustrates:

Operation | Total # of executions (average of 3 runs)
--- | ---
Reads (totals retrieved) | 3823
Writes (transfers executed) | 3072
Inconsistent results | 0

Approach no 3

The third approach was something I was very keen to test: in SQL Server 2005 there is a magic option (actually a couple of them) which enables row versioning.

Row versioning is not a new thing (in fact the Oracle RDBMS has used it "by definition" for as far back as I care to remember) and the whole concept works (very roughly) as follows:

Time | Reader | Writer
--- | --- | ---
1 | | Writer transaction starts.
2 | Reader transaction starts. | 
3 | | Row ID=1 updated with balance + 10 = 110; a new version number is applied to the row and the old version of the row is stored in the "version store".
4 | Tries to retrieve row ID=1; the engine realizes that the row version is different than it was at T=2, so it retrieves the previous version of the row from the "version store" with balance = 100. | 
5 | | Row ID=2 updated with balance - 10 = 90; a new version number is applied and the old version of the row is stored in the "version store".
6 | Tries to retrieve row ID=2; the engine again realizes that the row has been updated after the statement started, so it retrieves the previous version from the "version store" with balance = 100. | 
7 | Produces the correct total of 200, exactly as it was at the time the statement started. | 

The simplest approach to enable statement-level row versioning is to use the READ_COMMITTED_SNAPSHOT database option.

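A sketch, assuming the database is called Bank (switching the option on requires exclusive access to the database, so it is easiest done with no other connections open):

    -- Make READ COMMITTED statements use row versions instead of shared locks.
    ALTER DATABASE Bank SET READ_COMMITTED_SNAPSHOT ON;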

Executing the above statement enables row versioning for statements (not transactions) executing at the READ_COMMITTED isolation level, which incidentally is the default isolation level. This means that usually no changes in the application are necessary to benefit from this type of row versioning. After setting the option the test app produced the following results:

Operation | Total # of executions (average of 3 runs)
--- | ---
Reads (totals retrieved) | 3103
Writes (transfers executed) | 24855
Inconsistent results | 0

As you can see this approach not only solves the problem of inconsistent results but also substantially increases the throughput of the app. This is because one of the interesting side effects of row versioning is that no locks are applied during the execution of SELECT statements. When READ_COMMITTED_SNAPSHOT is active only "writers" block "writers", which is in stark contrast to the default SQL Server behaviour where just about everybody blocks everybody else (only readers do not block readers).

Approach no 4

READ_COMMITTED_SNAPSHOT works at the statement level, i.e. a consistent view of the data is maintained relative to the start of the statement. In the case of our test application this is perfectly sufficient because the GetTotal() stored procedure executes a single SELECT statement.

If we wanted to maintain consistency at the transaction level, the ALLOW_SNAPSHOT_ISOLATION option has to be set to ON. The downside is that in such a case row versioning has to be explicitly requested by setting the transaction isolation level to SNAPSHOT.

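A sketch of both steps, again assuming the Bank database and the illustrative GetTotal procedure:

    -- Allow transactions to request SNAPSHOT isolation in this database.
    ALTER DATABASE Bank SET ALLOW_SNAPSHOT_ISOLATION ON;
    GO

    ALTER PROCEDURE dbo.GetTotal
    AS
    BEGIN
        -- Every statement in the transaction now sees the data as of a single
        -- point in time, courtesy of the version store.
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        BEGIN TRANSACTION;
            SELECT SUM(Balance) AS Total FROM dbo.Accounts;
        COMMIT TRANSACTION;
    END;
    GO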

Once the reader procedure had been modified the test app produced the following results:

Operation | Results (average)
--- | ---
Reads (totals retrieved) | 3156
Writes (transfers executed) | 22697
Inconsistent results | 0

The differences in performance between the two row versioning approaches should be considered negligible, as performance varied quite widely between runs.

Wrap Up

As these results hopefully illustrate, row versioning not only solves some annoying result "consistency" problems but also substantially increases the performance of our sample app (see the following graph).

[Graph: comparison of read and write throughput across the approaches]

I would not recommend blindly applying row versioning strategies in every case as they put additional stress on TEMPDB (all updates have to save old row versions in there) and have some other surprising properties, but it is clearly an option worth considering for scenarios where concurrency (locking and/or deadlocks) becomes an issue. The following two graphs illustrate the number of lock requests with row versioning disabled and enabled. As you can clearly see the differences are substantial, and so will be the scalability of a system with frequent and concurrent reads and updates.

[Graph: lock requests and waits with the READ_COMMITTED isolation level]

[Graph: lock requests and waits with READ_COMMITTED_SNAPSHOT enabled; the red line shows processor usage during the test, and the number of lock requests is minimal]
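As a side note on the TEMPDB point above, one rough way to keep an eye on how much space the version store is consuming is to query the TEMPDB space usage DMV, for example:

    -- Approximate size of the version store in TEMPDB (pages are 8 KB each).
    SELECT SUM(version_store_reserved_page_count) * 8 AS version_store_kb
    FROM sys.dm_db_file_space_usage;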

March 6 2008