Performance Matters

The very definition of software performance varies depending on whom you ask. An end user will immediately mention the "speed" of the application they have to work with. A CIO will probably define performance as "throughput", measured in transactions per second. Finally, the IT person who deals with the hardware end of the system will say they need scalability, so that their duties are limited to provisioning more hardware when demand increases. All of these elements, response time, throughput and scalability, are desired components of software performance.

I have spent the last 12 months working pretty much continuously on performance optimisation, and James Saul asked me to share some of my findings with a wider audience. To start somewhere I went to dig up some resources on Wikipedia and came across an interesting article on performance engineering. According to the article, one of the objectives of this discipline is to "Increase business revenue by ensuring the system can process transactions within the requisite timeframe". In other words, performance is money, and there is probably no better example of how it is lost than the total meltdown of the Debenhams website which took place just before last Christmas. I have to admit that I have no idea what went wrong at Debenhams, but I can easily imagine a number of ways to build a software product which breaks under heavy load. As they say, there is more than one way to skin a cat and build poor quality software, but this time round I will focus primarily on the "process" issues rather than particular technical aspects.

Small database syndrome (aka SDS)

Personally I think that SDS is the major contributor to poorly performing programmes: if the development team works against a tiny database, they are very likely to get into serious trouble further down the line, and there are a number of reasons for it. The most obvious is the fact that production will hold more data (surprise, surprise), so naturally more work will be required to get whatever you want out of the database. Secondly, query plans will be turned upside down by larger tables, and the distribution of the data will influence them heavily as well. And last but not least, when working against a small dataset it is impossible to spot potential performance problems, as everything will (or at least should) execute rather quickly.

The best example of a spectacular "volume related" failure I have witnessed not so long ago is an application which, when fired for the first time against a fully populated database, executed 40,000 SQL queries during its start-up; the whole procedure took the better part of 40 minutes. To add insult to injury, some of the tables involved in the queries were missing rather crucial indexes while others were indexed incorrectly (not that it matters a lot when you execute 40,000 queries to start one instance of the app). This potential fiasco made everyone involved in the project somewhat embarrassed, and steps were taken to avoid such mishaps in the future. Luckily for the team the accident happened early enough in the project lifecycle and fixing it was relatively cheap and easy. But as you can hopefully see from this example, SDS is a serious risk, and I find it somewhat difficult to understand that people often try to find every possible excuse not to use a properly sized database for development and/or testing. The one I hear most often is related to cost, measured in terms of either time or money: resources which someone has to spend to produce the data. But given the availability of data generation tools like the one provided by Redgate this is a very poor excuse. It is even worse considering that the cost of maintaining such a dataset is just a fraction of the total cost of the project.

“We will have better hardware in production”

This is another one of my favourites, which I hear a lot when people testing an application realise that something is not quite right performance-wise. Accepting that the app is sluggish usually means that someone has to admit to a failure of sorts, and nobody likes that. So people go into denial and try to find excuses not to tackle the problem there and then. If you consider that most of us developers work on single-processor machines, it is not hard to see how people fall into this trap, but even so, basic calculations often prove that hoping to kill the problem with hardware is nothing more than wishful thinking. Let me illustrate it with an example: consider an operation which takes 10 seconds on a modern single-processor PC with plenty of RAM. It is easy to imagine that the production hardware may be 20 times more powerful, leading to the false conclusion that in production the same process will take 1/20th of 10 seconds, i.e. 500 ms. Job done. The first flaw in such reasoning is the assumption that the production hardware will be serving one user at a time, or that concurrent user load will have no influence on performance. Secondly, the more powerful hardware may indeed have 20 times the capacity of the PC, but that capacity is only available when you are able to parallelise the algorithm. If the original process is sequential (single-threaded) in nature, adding more processors to the server will not change the response time at all. So the only conclusion we can draw from running software on inferior hardware is that if it works on a PC, there is a chance that it will work on a big server, provided of course that the software is scalable. On the other hand, if it does not work well on your PC, the chances that it will ever work anywhere else under substantially heavier user load are close to zero.
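To put some numbers behind that argument, here is a minimal sketch (not from any real project) of Amdahl's law, which caps the speedup extra processors can give you by the fraction of the work that can actually run in parallel:

```csharp
using System;

class AmdahlDemo
{
    // Amdahl's law: speedup = 1 / ((1 - p) + p / n)
    // p = fraction of the work that can run in parallel, n = number of processors.
    static double Speedup(double parallelFraction, int processors) =>
        1.0 / ((1.0 - parallelFraction) + parallelFraction / processors);

    static void Main()
    {
        const double originalSeconds = 10.0;

        // A fully sequential operation gains nothing from a 20-way box: still 10 s.
        Console.WriteLine(originalSeconds / Speedup(0.0, 20));

        // Even if 90% of the work parallelises perfectly, the result is ~1.45 s,
        // nowhere near the hoped-for 500 ms.
        Console.WriteLine(originalSeconds / Speedup(0.9, 20));
    }
}
```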

“We have no [time|requirement|money|resources] for performance testing”

Some wise people say that if you have no time for testing, then you had better have time for fixing last-minute bugs and patching the app. The same is pretty much true when it comes to performance. When building systems which will potentially face high user load, it is absolutely imperative that load and stress testing be executed, unless you want to face a similar fate to the website I mentioned earlier. I may be biased here because I like to load test software, but load testing an app is probably the best way to make sure that it actually works. Let me give you an example: about 18 months ago I participated in a POC at Microsoft, working on a website for an airline. Together with another guy from EMC I was responsible for the back end of the system: the database and the WCF-based app server. As we finished our job earlier than expected, I decided that it would not be a bad idea to make use of the available resources, take the thing for a spin and see what it could do. The app server was running on an 8-way 64-bit machine with ample amounts of RAM, so I whacked together some unit tests simulating users' journeys through the website, plugged the whole lot into a VSTS load testing machine and pressed the green button. As soon as I pressed it we discovered that the whole thing ground to a rather embarrassing halt within several seconds of being started... After a bit of head scratching we decided that it is a rather good idea to close server connections once you are done with them, and repeated the test, scoring a rather measly result of 100 method invocations per second. To cut a long story short, over the next few days we discovered that from a performance point of view it is actually wiser to use ADO.NET rather than LINQ to SQL, that when building high performance systems it is better to have network cables which work at the full capacity of the switch they are connected to, and that SQL Server 2008 rocks: it would take the load from 3 app servers before it became fully saturated. In the meantime our load testing machine ran out of puff and we needed two more 4-way boxes in order to generate enough load to saturate the app server. The end result of this exercise was an 18-fold (sic!) increase in system performance, not to mention the fact that the system was happy working for hours on end. And when it came to the presentation of the finished website, everyone was raving about how quick the whole thing was. The moral of the story, however, is that things will inevitably break under heavy load. If you load test them before handing them to the users, chances are that the users will get a much more robust system and you as the application developer will save yourself potentially huge embarrassment.

PS: I know that this post is barely technical, but I promise to improve next time round :)

August 8 2009

Integration Testing WCF Services with TransactionScope

Integration testing WCF services can be a right pain, and I have experienced it firsthand on my current project. If you imagine a hypothetical person repository service exposing 4 CRUD operations, it would make sense to test all four of them. If the operations were tested in random order, it would be perfectly feasible that testing an update after a delete may fail because the object to be updated has been deleted by a previous test. The same obviously applies to reads following updates, etc. In other words the tests are dependent on each other, and this dependency is really evil for a number of reasons: first of all it usually means that you cannot run tests in isolation, as they may depend on modifications made by other tests. Secondly, one failing test may cause a number of failures in the tests which follow. And thirdly, by the end of the test run the underlying database is in a right mess as it contains modified data, forcing you to redeploy it should you wish to re-run the tests. Considering the fact that running integration tests is usually a time-consuming exercise, this vicious circle of deploy/run/fix becomes extremely expensive as the project goes on.

TransactionScope to the rescue

Fortunately for us WCF supports distributed transactions, and if there is one place where they make perfect sense it is integration testing. Imagine a test class written along the following lines:

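Something along these lines, assuming NUnit; the class and proxy names (TransactionalTestBase, PersonRepositoryClient) are illustrative rather than the exact code from the project:

```csharp
using System.Transactions;
using NUnit.Framework;

// A base class which every integration test fixture derives from.
public abstract class TransactionalTestBase
{
    private TransactionScope _scope;

    [SetUp]
    public void BeginTransaction()
    {
        // Every test runs inside its own ambient transaction.
        _scope = new TransactionScope();
    }

    [TearDown]
    public void RollbackTransaction()
    {
        // Disposing without calling Complete() rolls back whatever the test did.
        _scope.Dispose();
    }
}

[TestFixture]
public class PersonRepositoryTests : TransactionalTestBase
{
    [Test]
    public void DeletePerson_RemovesThePerson()
    {
        var client = new PersonRepositoryClient();   // hypothetical generated WCF proxy
        client.DeletePerson(1);
        Assert.IsNull(client.GetPerson(1));
        client.Close();
    }
}
```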

The idea is that whenever a test starts, a new transaction gets initiated. When the test completes, regardless of its outcome, the changes are rolled back, leaving the underlying database in pristine condition. This means that we can break the dependency between tests, run them in any order and rerun the whole lot without the need to redeploy the database. The holy grail of integration testing :) To make it work, however, the service needs to support distributed transactions (which is usually not a bad idea anyhow). Having said that, you have to be aware of various and potentially serious gotchas which I will cover later.

To make a service "transaction aware" the following changes have to be made (I assume a default, out-of-the-box WCF project): first of all the service has to expose an endpoint which uses a binding which supports distributed transactions (e.g. wsHttpBinding), and the binding has to be configured to allow transaction flow. This configuration has to be applied on both the client (unit test project) and the server side:

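A sketch of the relevant configuration (service and contract names are placeholders); the important bit is transactionFlow="true" on the binding, and the same binding settings have to appear in the test project's config as well:

```xml
<system.serviceModel>
  <bindings>
    <wsHttpBinding>
      <!-- transactionFlow="true" lets the caller's transaction flow to the service -->
      <binding name="transactionalBinding" transactionFlow="true" />
    </wsHttpBinding>
  </bindings>
  <services>
    <service name="PersonService.PersonRepository">
      <endpoint address="http://localhost:8080/PersonRepository"
                binding="wsHttpBinding"
                bindingConfiguration="transactionalBinding"
                contract="PersonService.IPersonRepository" />
    </service>
  </services>
</system.serviceModel>
```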

Secondly, all operations which are supposed to participate in a transaction have to be marked as such in the service contract:

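For a person repository it could look like this (operation names and the Person data contract are illustrative):

```csharp
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class Person
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

[ServiceContract]
public interface IPersonRepository
{
    [OperationContract]
    [TransactionFlow(TransactionFlowOption.Allowed)]
    Person GetPerson(int id);

    [OperationContract]
    [TransactionFlow(TransactionFlowOption.Allowed)]
    void UpdatePerson(Person person);

    [OperationContract]
    [TransactionFlow(TransactionFlowOption.Allowed)]
    void DeletePerson(int id);
}
```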

The TransactionFlowOption enumeration includes the NotAllowed, Allowed and Mandatory values, which I hope are self-explanatory. Using Allowed is usually the safest bet, as the operation will accept calls both with and without a transaction scope. Making the service transaction aware as illustrated above is usually enough to make this whole idea work.

The third change, which is optional but which I highly recommend, is to decorate all methods which accept inbound transactions with [OperationBehaviorAttribute(TransactionScopeRequired = true, TransactionAutoComplete = true)]. By doing so we state that regardless of whether the client "flows" a transaction or not, the method will execute within a transaction scope. If the scope is not provided by the client, WCF will simply create one for us, which means that the code remains identical regardless of whether a client-side transaction is provided. The TransactionAutoComplete option means that unless the method throws an exception, the transaction will commit. This also means that we do not have to worry about making calls to BeginTransaction/Commit/Rollback any more. The default for TransactionAutoComplete is true, so strictly speaking it is not necessary to set it, but I did it here for illustration purposes.

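A matching implementation sketch:

```csharp
using System.ServiceModel;

public class PersonRepository : IPersonRepository
{
    // TransactionScopeRequired=true: run inside a transaction scope whether or not
    // the caller flowed one. TransactionAutoComplete=true: commit unless an
    // exception is thrown.
    [OperationBehavior(TransactionScopeRequired = true, TransactionAutoComplete = true)]
    public Person GetPerson(int id)
    {
        // ... data access code ...
        return null;
    }

    [OperationBehavior(TransactionScopeRequired = true, TransactionAutoComplete = true)]
    public void UpdatePerson(Person person)
    {
        // ... data access code ...
    }

    [OperationBehavior(TransactionScopeRequired = true, TransactionAutoComplete = true)]
    public void DeletePerson(int id)
    {
        // ... data access code ...
    }
}
```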

The attached sample solution contains a working example of a person repository and may be useful to get you started.

The small print

An important feature of WCF is the default isolation level for distributed transactions, which is Serializable. This means that more often than not your service is likely to suffer badly from scalability problems should the isolation level remain at the default value. Luckily for us WCF allows us to adjust it; the service implementation simply has to specify the required level using the ServiceBehaviorAttribute. Unless you know exactly what you are doing I would strongly recommend setting the isolation level to ReadCommitted. This is the default isolation level in most SQL Server installations and it also gives you some interesting options.

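Sticking with the hypothetical PersonRepository class:

```csharp
using System.ServiceModel;
using System.Transactions;

[ServiceBehavior(TransactionIsolationLevel = IsolationLevel.ReadCommitted)]
public class PersonRepository : IPersonRepository
{
    // ... operations decorated as shown earlier ...
}
```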

Having done this, the caller has to explicitly specify its required isolation level as well when constructing the transaction scope.

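For example, the TransactionScope created in the test base class could be constructed like this:

```csharp
using System;
using System.Transactions;

var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadCommitted,   // matches the service's setting
    Timeout = TimeSpan.FromSeconds(60)
};

using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    // ... call the service ...
    // no scope.Complete() in a test: everything rolls back when the scope is disposed
}
```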

An interesting "feature" of using transaction scope, in testing in particular, is the fact that your test may deadlock on itself if not all operations being executed within the transaction scope participate in it. The main reason for which this may happen is lack of TransactionFlowAttribute decorating the operation in service contract. In the test below if the GetPerson operation was not supporting transactions, yet the DeletePerson was, then an attempt to read the value deleted by another transaction would cause a deadlock. Feel free to modify the code and try it for yourself. 

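Roughly like this (again with the hypothetical proxy): DeletePerson flows the test's transaction and locks the row, and if GetPerson is not marked with [TransactionFlow] it runs in a separate transaction and ends up waiting for the lock held by our own uncommitted delete:

```csharp
[Test]
public void DeleteThenRead_BlocksIfGetPersonDoesNotFlowTheTransaction()
{
    // Runs inside the TransactionScope created by the test base class.
    var client = new PersonRepositoryClient();   // hypothetical generated proxy

    client.DeletePerson(1);        // flows the ambient transaction, locks the row

    // Without [TransactionFlow] on GetPerson this call executes outside the
    // ambient transaction and waits on the lock held by the delete above.
    var person = client.GetPerson(1);

    Assert.IsNull(person);
    client.Close();
}
```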

Distributed transactions require MSDTC to be running on all machines participating in the transaction, i.e. the client, the WCF server and the database server. This is usually the first stumbling block, as MSDTC may be disabled or may be configured in a way which prevents it from accepting distributed transactions. To configure MSDTC you will have to use the "Administrative Tools\Component Services" applet from the control panel. The MSDTC configuration is hidden in the context menu of "My Computer\Properties". Once you activate this option you will have to navigate to the MSDTC tab and make sure that the security settings allow "Network access" as well as "Inbound/Outbound transactions".


Performance

One issue which people usually raise with regard to distributed transactions is performance: these concerns are absolutely valid and have to be given some serious consideration. The first problem is the fact that if the service has to involve a transaction manager (MSDTC) in order to get the job done, this usually means some overhead. Luckily, a transaction initiated in a TransactionScope does not always need to use MSDTC. Microsoft provides the Lightweight Transaction Manager, which will be used by default as long as the transaction meets some specific criteria: transactions involving just one database will remain local, incurring almost no overhead (~1% according to my load tests). As soon as your transaction involves other resources (databases or WCF services) it will be promoted to a distributed transaction and will take a performance hit (in my test case a 25% decrease in performance, but your mileage may vary). To check whether a method executes within a local or a distributed transaction you may inspect Transaction.Current.TransactionInformation.DistributedIdentifier: a value equal to Guid.Empty means that the transaction is local. The second issue affecting performance is the fact that transactions will usually take longer to commit/roll back, meaning that database locks will be held for longer. In the case of WCF services the commit happens once the results have been serialized back to the client, which can introduce serious scalability issues due to locking. This problem can usually be alleviated by using the ReadCommitted isolation level and row versioning in the database.
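For example, a quick check along these lines inside a service operation (or a test) tells you whether the transaction has been promoted:

```csharp
using System;
using System.Transactions;

bool isDistributed =
    Transaction.Current != null &&
    Transaction.Current.TransactionInformation.DistributedIdentifier != Guid.Empty;

Console.WriteLine(isDistributed
    ? "Transaction has been promoted to MSDTC"
    : "Transaction is still local");
```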

Parting shots

The project I am currently working on contains some 2500 integration tests, 600 of which test our WCF repository. In order to make sure that every test obeys the same rules with regard to transactions, we have a unit test in place which inspects all test classes in the project and makes sure that all of them derive from the common base class responsible for setting up and cleaning up the transaction. I would strongly recommend following this approach in any non-trivial project, as otherwise you may end up with some misbehaving tests breaking the whole concept.
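A rough sketch of such a guard test, assuming NUnit and a base class named TransactionalTestBase (both names are illustrative):

```csharp
using System.Linq;
using System.Reflection;
using NUnit.Framework;

[TestFixture]
public class TestConventions
{
    [Test]
    public void AllTestFixturesDeriveFromTransactionalTestBase()
    {
        var offenders = Assembly.GetExecutingAssembly()
            .GetTypes()
            .Where(t => t.GetCustomAttributes(typeof(TestFixtureAttribute), false).Any())
            .Where(t => t != typeof(TestConventions))
            .Where(t => !typeof(TransactionalTestBase).IsAssignableFrom(t))
            .ToList();

        Assert.IsEmpty(offenders,
            "These fixtures do not derive from TransactionalTestBase: " +
            string.Join(", ", offenders.Select(t => t.FullName)));
    }
}
```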

Happy testing!

November 16 2008

A pint of ACID.

First of all I'd like to assure non-developers accidentally reading this post that it has nothing to do with lysergic acid diethylamide, an A-class substance commonly known as LSD or "acid". It's all about ACID properties of database transactions in general (boring stuff), and the "I" property and scalability in particular. Now that we have the legalities behind us let's get back to business.

According to Wikipedia the "I" in the acronym has the following meaning:

Isolation refers to the ability of the application to make operations in a transaction appear isolated from all other operations. This means that no operation outside the transaction can ever see the data in an intermediate state; a bank manager can see the transferred funds on one account or the other, but never on both - even if he ran his query while the transfer was still being processed. More formally, isolation means the transaction history (or schedule) is serializable. This ability is the constraint which is most frequently relaxed for performance reasons.

The last sentence is of particular interest, as it implies that isolation comes at a cost (concurrency vs. consistency); this basic fact prompted me to do some experiments and, as a result, write this post.

The Problem

The main problem with maintaining isolation is that resources which are supposed to be isolated have to be locked. As I am sure you are aware, locking in general, and in databases in particular, means lower concurrency and throughput.

To see exactly how database performance is affected by concurrent reads and updates I devised a simple "bank" scenario: while one thread moves money between hypothetical accounts, the other tries to get the total amount of money held in the bank (which has to remain constant). Various approaches to this problem are the subject of the rest of this post, and as you'll hopefully see they produce very different results. Although the topic may seem trivial (and rightly so), it represents a wider class of problems where the same table is read and updated concurrently.

All that's needed to test the "bank" scenario is a table defined as follows:

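Something as simple as this (names and exact types are my shorthand; the attached script has the real definition):

```sql
CREATE TABLE dbo.Account
(
    ID      int   NOT NULL PRIMARY KEY,
    Balance money NOT NULL
);
```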

With two stored procedures, one for moving the money and the other which gets the total accumulated:

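Roughly along these lines (again, see the attached script for the real definitions):

```sql
CREATE PROCEDURE dbo.MoveMoney
    @FromID int,
    @ToID   int,
    @Amount money
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        UPDATE dbo.Account SET Balance = Balance - @Amount WHERE ID = @FromID;
        UPDATE dbo.Account SET Balance = Balance + @Amount WHERE ID = @ToID;
    COMMIT TRANSACTION;
END;
GO

CREATE PROCEDURE dbo.GetTotal
AS
BEGIN
    SET NOCOUNT ON;
    SELECT SUM(Balance) AS Total FROM dbo.Account;
END;
GO
```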

Once the table had been populated with 10,000 rows (and an equal amount of money in each account) I ran the test app to see how many times I could concurrently execute both stored procedures within 20 seconds and what results I would get (see the attachment for the test app and the SQL setup script).

First run

The first run of the program with the default SQL Server settings (isolation level set to READ_COMMITTED) produced the following results:

Operation                   | Total # of executions (average of 3 runs)
--------------------------- | -----------------------------------------
Reads (totals retrieved)    | 1656
Writes (transfers executed) | 19300
Inconsistent results        | 1120

The interesting thing in this case is the number of inconsistent results, i.e. the database reported an incorrect total of all the balances held in our "bank". In spite of the fact that the movement of money happens within a transaction, the "reader" is clearly able to see "half-committed" data. In case you were wondering why this happens, have a look at the following table, which illustrates the cause of the problem.

 

Time | Reader | Writer
---- | ------ | ------
1 | Reads row ID=1, gets the balance = 100 and releases the lock. |
2 | | Updates row ID=1, sets the balance = balance - 10 (90) and exclusively locks the row.
3 | | Updates row ID=2, sets the balance = balance + 10 (110) and exclusively locks the row.
4 | | Commits and releases both locks.
5 | Reads row ID=2, gets the balance of 110, producing a total of 210! |

Due to the different locking strategies the reader hits rows updated by the other statement (and quite possibly the other way around). The reader never reads uncommitted data, so in principle the database obeys the READ_COMMITTED isolation level; the end result, however, may be far from desired and many people will find it surprising.

 

Approach no 1

The first possible approach to producing consistent results which came to my mind was to set the isolation level for the GetTotal stored procedure to REPEATABLE READ. In this case locks are held on the data for the duration of the transaction in order to prevent updates. The outcome, however, was an immediate and somewhat surprising deadlock, which the following table explains:

Time | Reader | Writer
---- | ------ | ------
1 | Reads row ID=1, gets the balance = 100 and holds a shared lock on the row. |
2 | | Updates row ID=2 with balance = balance - 10 and holds the lock.
3 | Tries to read row ID=2 but it's already exclusively locked by the writer, so the statement waits for the lock to be released. |
4 | | Tries to update row ID=1 but the row is locked by the other statement. The transaction deadlocks.
5 | Reader is chosen as the deadlock victim as there is less work to "rollback". |

Approach no 2

The reason for the deadlock above is the "progressive" locking of the rows as the SELECT query works its way along the table. The simple solution is to use the TABLOCK hint in the query to make sure that the entire table is locked in one "go".

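Using the sketched Account table, the reader query with the hint would look something like this:

```sql
-- TABLOCK takes a table-level shared lock up front, so the writer cannot modify
-- rows the SELECT has not reached yet.
SELECT SUM(Balance) AS Total
FROM dbo.Account WITH (TABLOCK);
```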

This approach works exactly as expected (no inconsistent results); the downside, however, is an immediate drop in "writer" performance, as the following table illustrates:

Operation                   | Total # of executions (average of 3 runs)
--------------------------- | -----------------------------------------
Reads (totals retrieved)    | 3823
Writes (transfers executed) | 3072
Inconsistent results        | 0

Approach no 3

The third approach was something I was very keen to test: in SQL Server 2005 there is a magic option (actually a couple of them) which enables row versioning.

Row versioning is not a new thing (in fact Oracle RDBMS has used it "by definition" for as long as I can remember) and the whole concept works (very roughly) as follows:

Time | Reader | Writer
---- | ------ | ------
1 | | Writer transaction starts.
2 | Reader transaction starts. |
3 | | Row ID=1 updated with balance+10=110; a new version number is applied to the row and the old version of the row gets stored in the "version store".
4 | Tries to retrieve row ID=1; the engine realizes that the row version is different than it was at T=2, so it retrieves the previous version of the row from the "version store" with balance = 100. |
5 | | Row ID=2 updated with balance-10=90; a new version number is applied and the old version of the row gets stored in the "version store".
6 | Tries to retrieve row ID=2; the engine again realizes that the row has been updated after the statement started, so it retrieves the previous version from the "version store" with balance = 100. |
7 | Produces the correct total of 200, exactly as it was at the time the statement started. |

The simplest approach to enabling statement-level row versioning is to use the READ_COMMITTED_SNAPSHOT database option.

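Something like the following, where Bank is my made-up database name (switching the option on needs exclusive access to the database):

```sql
ALTER DATABASE Bank
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;   -- drops other connections so the option can be applied
```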

Executing the above statement enables row versioning for statements (not transactions) executing at the READ_COMMITTED isolation level, which incidentally is the default isolation level. This means that usually no changes in the application are necessary to benefit from this type of row versioning. After setting the option the test app produced the following results:

Operation                   | Total # of executions (average of 3 runs)
--------------------------- | -----------------------------------------
Reads (totals retrieved)    | 3103
Writes (transfers executed) | 24855
Inconsistent results        | 0

As you can see, this approach not only solves the problem of inconsistent results but also substantially increases the throughput of the app. This is because one of the interesting side effects of row versioning is the fact that no locks are applied during the execution of SELECT statements. When READ_COMMITTED_SNAPSHOT is active only "writers" block "writers", which is in stark contrast to the default SQL Server behaviour where everybody blocks just about everybody else (only readers do not block readers).

Approach no 4

READ_COMMITTED_SNAPSHOT works at the statement level, i.e. a consistent view of the data is maintained relative to the start of the statement. In the case of our test application this is perfectly sufficient because the GetTotal() stored procedure executes a single SELECT statement.

If we wanted to maintain consistency at the transaction level, the ALLOW_SNAPSHOT_ISOLATION option has to be set to ON. The downside is that in such a case row versioning has to be explicitly requested by setting the transaction isolation level to SNAPSHOT.

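A sketch of both steps, again using the made-up Bank database and the GetTotal procedure:

```sql
ALTER DATABASE Bank
SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

ALTER PROCEDURE dbo.GetTotal
AS
BEGIN
    SET NOCOUNT ON;
    -- Unlike READ_COMMITTED_SNAPSHOT, snapshot isolation has to be requested explicitly.
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    SELECT SUM(Balance) AS Total FROM dbo.Account;
END;
GO
```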

Once the reader procedure had been modified, the test app produced the following results:

Operation                   | Results (average)
--------------------------- | -----------------
Reads (totals retrieved)    | 3156
Writes (transfers executed) | 22697
Inconsistent results        | 0

The difference in performance between the two row versioning approaches should be considered negligible, as performance varied quite widely between runs.

Wrap Up

As these results hopefully illustrate, row versioning not only solves some annoying result "consistency" problems but also substantially increases the performance of our sample app (see the following graph).

[Graph: throughput of the sample app under each of the approaches described above]

I would not recommend blindly applying row versioning strategies in every case, as they put additional stress on tempdb (all updates have to save old row versions there) and have some other surprising properties, but it is clearly an option worth considering for scenarios where concurrency (locking and/or deadlocks) becomes an issue. The following two graphs illustrate the number of lock requests with row versioning disabled and enabled. As you can clearly see the differences are substantial, and so will be the scalability of a system with frequent and concurrent reads and updates.

[Graph: lock requests and waits with the READ_COMMITTED isolation level]

[Graph: lock requests and waits with READ_COMMITTED_SNAPSHOT enabled; the red line shows processor usage during the test, and the number of lock requests is minimal]

March 6 2008