ReSharper 4.0 from JetBrains is released

The JetBrains team have been working hard at getting functionality and bug fixes into version 4.0 of ReSharper.  I'm very excited to see that it's finally reached RTM status and now fully supports Visual Studio 2008 and all the C# 3.0 goodness!  Go get it here.  For those who have been involved in the EAP program, it's actually build 819.

For those who've never tried R#, I strongly suggest that you take it for a spin.  The improvements it brings to productivity are just immense.  For the full list of features, the JetBrains site has all the info, but here are a few of my favourites:

Go To File / Member
This functionality is brilliant for quickly navigating around large solutions.  You can type in the full name of the class / method / file that you're after, just its capital letters, or a wildcard pattern, or combine all of these techniques.

Refactoring Options
Yes I know that Visual Studio has included some refactoring functionality for a while now, but this is one of ReSharper's primary strengths.  It includes options such as:

  • Rename class / method / variable / parameter / field / property (optionally searches within comments and strings)
  • Change signature
  • Extract / inline a method
  • Extract interface / superclass
  • Push members up / pull members down
  • Introduce parameter
  • Introduce field
  • Introduce variable (I especially like the way it detects multiple occurrences of an expression and offers to replace them all)
  • Convert anonymous delegates to / from lambdas
  • Set field values from constructor arguments

There are so many of these that the ones above are just what I can think of right now.  In fact, there are so many that I always keep a copy of the ReSharper keyboard mappings on my desk.

Coding in Reverse
Undoubtedly the most powerful feature available within ReSharper, this is the one JetBrains picked out as the primary skill of the ReSharper Jedi.  It's a technique where you use variables / methods / classes before they exist.  ReSharper will then infer their definitions from your usage and allow you to actualize them with a couple of keystrokes.  It's immensely powerful, but it really has to be seen to be fully appreciated.

And More
I strongly recommend ReSharper to all developers who want to improve their productivity ... which should be all developers.  I also strongly suggest that you take some time out to learn the featureset and some of the options available within ReSharper.  Joe White's blog series The 31 Days of ReSharper is a great place to start this journey.  Just be warned that you might get seriously hooked.

June 9 2008

A pint of ACID.

First of all I'd like to assure non-developers accidentally reading this post that it has nothing to do with lysergic acid diethylamide, a Class A substance commonly known as LSD or "acid". It's all about the ACID properties of database transactions in general (boring stuff), and the "I" property and scalability in particular. Now that we have the legalities behind us, let's get back to business.

According to Wikipedia the "I" in the acronym has the following meaning:

Isolation refers to the ability of the application to make operations in a transaction appear isolated from all other operations. This means that no operation outside the transaction can ever see the data in an intermediate state; a bank manager can see the transferred funds on one account or the other, but never on both - even if he ran his query while the transfer was still being processed. More formally, isolation means the transaction history (or schedule) is serializable. This ability is the constraint which is most frequently relaxed for performance reasons.

The last sentence is of particular interest as it implies that isolation comes at a cost (concurrency vs. consistency), and this basic fact prompted me to do some experiments and, as a result, write this post.

The Problem

The main problem with maintaining isolation is that resources which are supposed to be isolated have to be locked. As I am sure you are aware, locking in general, and in databases in particular, means lower concurrency and throughput.

To see exactly how database performance is affected by concurrent reads and updates I devised a simple "bank" scenario: while one thread moves money between hypothetical accounts, the other tries to get the total amount of money held in the bank (which has to remain constant). Various approaches to this problem are the subject of the rest of this post and, as you'll hopefully see, they produce very different results. Although the topic may seem trivial (and rightly so), it represents a wider class of problems where the same table is read and updated concurrently.

All that's needed to test the "bank" scenario is a table defined as follows:

[Image: table definition]
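
Something along these lines would do the job (a minimal sketch; the table and column names here are my assumptions, not necessarily those used in the original script):

    CREATE TABLE Accounts
    (
        ID      INT   NOT NULL PRIMARY KEY,  -- account identifier
        Balance MONEY NOT NULL               -- current balance of the account
    );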

Along with two stored procedures: one for moving the money and another that gets the accumulated total:

[Image: stored procedure definitions]
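
Roughly speaking, the two procedures boil down to the following (a sketch only; GetTotal is the name used later in the post, while MoveMoney and the parameter names are my assumptions):

    -- Moves money between two accounts inside a single transaction.
    CREATE PROCEDURE MoveMoney
        @From   INT,
        @To     INT,
        @Amount MONEY
    AS
    BEGIN
        BEGIN TRANSACTION;
        UPDATE Accounts SET Balance = Balance - @Amount WHERE ID = @From;
        UPDATE Accounts SET Balance = Balance + @Amount WHERE ID = @To;
        COMMIT TRANSACTION;
    END
    GO

    -- Returns the total amount of money held in the "bank".
    CREATE PROCEDURE GetTotal
    AS
    BEGIN
        SELECT SUM(Balance) AS Total FROM Accounts;
    END
    GO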

Once the table had been populated with 10000 rows (and an equal amount of money in each account) I ran the test app to see how many times I could concurrently execute both stored procedures within 20 seconds and what results I would get (see attachment for the test app and SQL setup script).
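
The attached setup script does the seeding for you, but conceptually it is something like this (a sketch; the starting balance of 100 per account is assumed from the timelines further down):

    -- Populate 10000 accounts with an equal starting balance.
    DECLARE @i INT;
    SET @i = 1;
    WHILE @i <= 10000
    BEGIN
        INSERT INTO Accounts (ID, Balance) VALUES (@i, 100);
        SET @i = @i + 1;
    END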

First run

The first run of the program with default SQL Server settings (isolation level set to READ_COMMITTED) produced the following results:

Operation                     Total # of executions (average of 3 runs)
Reads (totals retrieved)      1656
Writes (transfers executed)   19300
Inconsistent results          1120

 

The interesting thing in this case is the number of inconsistent results, i.e. the database reported an incorrect total of all balances held in our "bank". In spite of the fact that the movement of money happens within a transaction, the "reader" is clearly able to see "half-committed" data. In case you were wondering why this happens, have a look at the following table, which illustrates the cause of the problem.

 

Time 1  Reader: reads row ID=1, gets the balance = 100 and releases the lock.
Time 2  Writer: updates row ID=1, sets balance = balance - 10 (90) and exclusively locks the row.
Time 3  Writer: updates row ID=2, sets balance = balance + 10 (110) and exclusively locks the row.
Time 4  Writer: commits and releases both locks.
Time 5  Reader: reads row ID=2, gets the balance of 110, producing a total of 210!

 

 

Due to different locking strategies the reader hits rows updated by the other statement (and quite possibly the other way around). The reader never reads uncommitted data, so in principle the database obeys the READ_COMMITTED isolation level; the end result, however, may be far from desired and many people will find it surprising.

 

Approach no 1

The first possible approach to producing consistent results that came to mind was to set the isolation level for the GetTotal stored procedure to REPEATABLE READ (the change itself is sketched after the table). In this case locks are held on the data for the duration of the transaction in order to prevent updates. The outcome, however, was an immediate and somewhat surprising deadlock, which the following table explains:

Time 1  Reader: reads row ID=1, gets the balance = 100 and holds a shared lock on the row.
Time 2  Writer: updates row ID=2 with balance = balance - 10 and holds the lock.
Time 3  Reader: tries to read row ID=2, but it is already exclusively locked by the writer, so the statement waits for the lock to be released.
Time 4  Writer: tries to update row ID=1, but the row is locked by the other statement. The transaction deadlocks.
Time 5  Reader: the reader is chosen as the deadlock victim as there is less work to roll back.
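
For reference, the change amounts to raising the isolation level inside the reader, roughly like this (a sketch based on the GetTotal shape assumed earlier):

    ALTER PROCEDURE GetTotal
    AS
    BEGIN
        -- Shared locks are now held until the end of the transaction,
        -- preventing concurrent updates to rows that have already been read.
        SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
        SELECT SUM(Balance) AS Total FROM Accounts;
    END
    GO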

 

 

Approach no 2

The reason for the deadlock above is the "progressive" locking of the rows as the SELECT query continues along the table.  The simple solution is to use the TABLOCK hint in the query to make sure that the entire table is locked in one "go".

[Image: GetTotal query with the TABLOCK hint]
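
In other words, the reader's query picks up a table-level lock hint along these lines (a sketch, assuming the Accounts table from earlier):

    -- Lock the whole table up front instead of acquiring row locks progressively.
    SELECT SUM(Balance) AS Total
    FROM Accounts WITH (TABLOCK);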

This approach works exactly as expected (no inconsistent results); the downside, however, is an immediate drop in "writer" performance, as the following table illustrates:

Operation                     Total # of executions (average of 3 runs)
Reads (totals retrieved)      3823
Writes (transfers executed)   3072
Inconsistent results          0

 

Approach no 3

The third approach was something I was very keen to test: in SQL Server 2005 there is a magic option (actually a couple of them) which enables row versioning.

Row versioning is not a new thing (in fact Oracle RDBMS has used it "by definition" for as long as I can remember) and the whole concept works (very roughly) as follows:

Time 1  Writer: writer transaction starts.
Time 2  Reader: reader transaction starts.
Time 3  Writer: row ID=1 is updated with balance + 10 = 110, a new version number is applied to the row and the old version of the row is stored in the "version store".
Time 4  Reader: tries to retrieve row ID=1; the engine realizes that the row version is different than it was at T=2, so it retrieves the previous version of the row from the "version store" with balance = 100.
Time 5  Writer: row ID=2 is updated with balance - 10 = 90, a new version number is applied and the old version of the row is stored in the "version store".
Time 6  Reader: tries to retrieve row ID=2; the engine again realizes that the row has been updated after the statement started, so it retrieves the previous version from the "version store" with balance = 100.
Time 7  Reader: produces the correct total of 200, exactly as it was at the time the statement started.

 

 

The simplest approach to enabling statement-level row versioning is to use the READ_COMMITTED_SNAPSHOT database option.

[Image: enabling the READ_COMMITTED_SNAPSHOT database option]
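
The statement itself is a one-liner along these lines (the database name Bank is my assumption; note that switching this option typically requires that there are no other active connections to the database):

    -- Enable statement-level row versioning for READ COMMITTED readers.
    ALTER DATABASE Bank SET READ_COMMITTED_SNAPSHOT ON;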

Executing the above statement enables row versioning for statements (not transactions) executing at the READ_COMMITTED isolation level, which incidentally is the default isolation level. This means that no changes in the application are usually necessary to benefit from this type of row versioning. After setting the option the test app produced the following results:

Operation                     Total # of executions (average of 3 runs)
Reads (totals retrieved)      3103
Writes (transfers executed)   24855
Inconsistent results          0

 

As you can see this approach not only solves the problem of inconsistent results but also substantially increases the throughput of the app. This is because one of the interesting side effects of row versioning is that no locks are applied during the execution of SELECT statements. When READ_COMMITTED_SNAPSHOT is active only "writers" block "writers", which is in stark contrast to default SQL Server behaviour where everybody blocks just about everybody else (only readers do not block readers).

Approach no 4

READ_COMMITTED_SNAPSHOT works at the statement level, i.e. a consistent view of the data is maintained relative to the start of the statement. In the case of our test application this is perfectly sufficient because the GetTotal() stored procedure executes a single SELECT statement.

If we want to maintain consistency at the transaction level, the ALLOW_SNAPSHOT_ISOLATION option has to be set to ON. The downside is that in this case row versioning has to be explicitly requested by setting the transaction isolation level to SNAPSHOT.

[Image: enabling ALLOW_SNAPSHOT_ISOLATION and setting the SNAPSHOT isolation level]
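
The change boils down to something like the following (a sketch; the database name Bank and the GetTotal body are assumptions carried over from earlier):

    -- Allow transactions to request SNAPSHOT isolation in this database.
    ALTER DATABASE Bank SET ALLOW_SNAPSHOT_ISOLATION ON;
    GO

    -- The reader now has to opt in to row versioning explicitly.
    ALTER PROCEDURE GetTotal
    AS
    BEGIN
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        SELECT SUM(Balance) AS Total FROM Accounts;
    END
    GO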

Once the reader procedure had been modified the test app produced the following results:

Operation                     Results (average)
Reads (totals retrieved)      3156
Writes (transfers executed)   22697
Inconsistent results          0

 

The difference in performance between the two row versioning approaches should be considered negligible, as performance varied quite widely between runs.

Wrap Up

As these results hopefully illustrate, row versioning not only solves some annoying result "consistency" problems but also substantially increases the performance of our sample app (see the following graph).

[Graph: total executions per approach]

I would not recommend blindly applying row versioning strategies in every case as they put additional stress on TEMPDB (all updates have to save old row versions in there) and have some other surprising properties, but it's clearly an option worth considering for scenarios where concurrency (locking and/or deadlocks) becomes an issue. The following two graphs illustrate the number of lock requests with row versioning disabled and enabled. As you can clearly see the differences are substantial, and so will be the scalability of a system with frequent and concurrent reads and updates.

Lock requests and waits with READ_COMMITTED isolation level.

[Graph: lock requests and waits under READ_COMMITTED]

Lock requests and waits with READ_COMMITTED_SNAPSHOT isolation level (the red line shows processor usage during the test, as the number of lock requests and waits is minimal).

[Graph: lock requests and waits under READ_COMMITTED_SNAPSHOT]

March 6 2008

Design-time perf improvements in Visual Studio 2008

Some metrics that we captured on my project (I think the numbers speak for themselves):

Operation                 VS 2005   VS 2008   % Improvement
Start the IDE             00:30     00:09     70%
Load solution             03:45     00:45     80%
Get Latest (no changes)   00:13     00:05     62%
Rebuild solution          02:21     01:58     16%
Run all tests             05:29     03:44     32%
Start the application     00:19     00:18     5%

FYI our solution has 33 projects in it (including a database project, a web application project, numerous DLLs, an EXE and a WiX installer) and there are almost 1,600 unit and integration tests.

January 28 2008

Altnetconf UK: First round of registration is closed

Registration for the first ALT.Net UK conference was recently announced by Michelle.  Unfortunately it's already full, so if your name isn't on the list of participants who have registered then I'm afraid you're a bit late for this round.  Ian Cooper says that another batch of registrations will open soon, so those who watch closely may still be able to attend.  I'm pleased to announce that I will be attending the conference as a Conchango representative.
January 3 2008