Getting started with Redis on Windows and a look at performance for logging

There are a few scenarios in a few different projects where I need a fast dumping ground for data. In my deployments I need somewhere quick to pump logging messages to. These messages do not need to be written to disk or read back immediately, so they can be processed out of process shortly after the fact. I have the same requirement when rendering machine performance on a UI; I need a fast and non-intrusive way to collect data from different machines and processes which can be read quickly later.

Redis has been coming up frequently when talking Linux, and much has been said about its performance, so I wanted to take a quick look at that performance in the context of logging. But first let's get it up and running. I'll be using PowerShell, Chocolatey and npm, so get those set up first.

Let's begin by using Chocolatey to install Redis.

cinst redis

The package should have installed a Windows service running Redis; let's double-check using PowerShell.

C:\> get-service *redis*

Status   Name               DisplayName
------   ----               -----------
Running  redis              Redis Server

Great, now it's time to get a UI to make using Redis a little easier; this time the package is on npm.

C:\> npm install -g redis-commander
C:\> redis-commander
No config found.
Using default configuration.
path.existsSync is now called `fs.existsSync`.
listening on  8081
Redis Connection 127.0.0.1:6379 Using Redis DB #0

The manager runs on port 8081 by default; let's take a look in a browser (use another shell, as redis-commander will block your current one).

start http://localhost:8081

If you can see the UI then you are all set up and it's time to push some data in and run some performance tests. Our build and deployment framework logs to the filesystem. This worked well for builds and simple deployments, but for scaled-out environments and production, which run on many machines, it is problematic. To avoid deadlocks and waits we manage a log file per machine and cat the tail into the main log when we can. We could aggregate these files post-deployment, but it's time to rethink how we do our logging. For now I just want to check the raw speed and basic features of a few different technologies; today it's Redis.

We'll use Measure-Command to find out how quickly we can log a bunch of messages to a file and then to Redis. Using our framework to deploy to an environment for performance testing generates around 100K lines of logs across 123 log files and 27 machines. These get aggregated onto the deployment server for convenience, but it's still a lot of log data being written during the deployment.

measure-command { 1..100000 | %{ Add-Content -Path C:\log1.txt -Value $_ } }
TotalSeconds : 319.9256509

This command is analogous to what we do in our PowerShell framework: since we do not keep a handle open on any one file, we have to open it before every write. It takes over five minutes to write that many lines to the file system using Add-Content. That is a long time to spend logging, and because the benchmark only writes numbers it involves less IO than real deployment messages would. We will benchmark Redis for comparison, using the PowerRedis module.

We will make a single connection to Redis, as the cmdlet takes care of the global state of the connection.

Connect-RedisServer

And the benchmark:

measure-command {1..100000 | %{ Add-RedisListItem -Name "deployment:log" -ListItem $_ } }
TotalSeconds      : 14.0009325

A fantastic improvement, and the ability to log from multiple machines at the same time makes it a good option to consider. I can't help but wonder how fast we could log to a file if we did it more responsibly, though.

C:\> $writer = New-Object System.IO.StreamWriter C:\log3.log
C:\> measure-command { 1..100000 | %{ $writer.WriteLine($_); } }
TotalSeconds      : 2.5816333
C:\> $writer.close()

So there are a few lessons to learn here. Avoid Add-Content if you're writing more than a few messages. Always measure your code so you do not waste minutes(!) logging to files. And Redis is certainly fast enough for logging.
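
For completeness, the messages pushed during the benchmark stay in the "deployment:log" list and can be read back out of process later; after the run above, the length should come back as 100000. A quick sanity check from another shell (assuming the Windows Redis package puts redis-cli on the path) looks something like:

C:\> redis-cli llen deployment:log
C:\> redis-cli lrange deployment:log 0 9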

Catching the PowerShell error stream

One of the challenges with PowerShell is the merging of concepts into a single DSL. With regard to error messaging, we have $LASTEXITCODE, PowerShell/.NET exceptions (throw) and the error stream (Write-Error). I tend to avoid using Write-Error and prefer throw instead; however, I do use PowerShell modules which use Write-Error, and I want to be able to handle those errors when it makes sense.

The following code:

Write-Host "Standard catch:" -ForegroundColor Cyan
try {
    Write-Error "writing to the error stream"
}
catch{
    Write-Host "I've been caught" 
}

Produces the following output:

Standard catch:C:\users\naeemk\desktop\error_stream.ps1 : writing to the error stream
At line:1 char:1
+ .\error_stream.ps1
+ ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,error_stream.ps1

This behaviour was unexpected when I first encountered it. Today I stumbled upon a tidy way to handle the error: stopping and catching. If we set an ErrorAction of Stop on the problem call, we are able to catch the error stream properly.

Write-Host "Stop and Catch:" -ForegroundColor Cyan
try {
    Write-Error "writing to the error stream" -ErrorAction Stop
}
catch{
    Write-Host ("I've been caught: {0}" -f $_)
}

Which produces the expected output:

Stop and Catch:
I've been caught: writing to the error stream

In our case we were running this code inside a parallel runspace, and the unhandled write to the error stream was crashing the runspace and giving us the wrong stacktrace. With this approach we can raise a PowerShell exception and marshal the correct stacktrace back to help with debugging.
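
A minimal sketch of the pattern (Invoke-DeploymentStep is a made-up placeholder for whichever module cmdlet writes to the error stream; $_.ScriptStackTrace needs PowerShell 3.0 or later):

try {
    # force the module's non-terminating error into a terminating one
    Invoke-DeploymentStep -ErrorAction Stop
}
catch {
    # rethrow as a proper exception, keeping the script stack trace
    # so the runspace reports where things actually went wrong
    throw ("{0}`n{1}" -f $_.Exception.Message, $_.ScriptStackTrace)
}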

Compile cannot resolve System.Net.Http in the currently targeted framework

<assembly> could not be resolved because it has an indirect dependency on the framework assembly "System.Net.Http, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" which could not be resolved in the currently targeted framework. ".NETFramework,Version=v4.0"

On my machine I have .NET 4.5 installed along with VS2012, which has mostly been working fine. Out of the blue I started getting the above error, despite the build passing on the build server. A quick Google search reveals that the assemblies in question are actually missing from the v4.0 reference assemblies folder for some reason.

I installed ASP.NET MVC 4 using the Web Platform Installer to try to get the assemblies put into the correct folder; however, they were still only in the v4.5 folder!

The fix for me was unfortunately to copy them over from:

C:\Program Files (x86)\Microsoft ASP.NET\ASP.NET MVC 4\Assemblies

To the following folder:

C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0.
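
If you would rather script the copy than do it by hand, something along these lines from an elevated PowerShell prompt should work (a sketch which assumes it is only the System.Net.Http assemblies that are missing):

$source = "C:\Program Files (x86)\Microsoft ASP.NET\ASP.NET MVC 4\Assemblies"
$target = "C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0"
# copy only the System.Net.Http reference assemblies across
Copy-Item -Path (Join-Path $source "System.Net.Http*.dll") -Destination $target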

Now my build is passing but the fix isn’t satisfactory in my mind.

Why it is important to use HTTP verbs appropriately

On the morning of Thursday 29th August it became apparent that our wiki was empty and missing all of its pages. We took it offline shortly afterwards and started to investigate why the data was missing and how we might restore it.

Background

The previous day a team member came over to ask why they didn't have permission to edit an attachment they had uploaded so they could replace it with a different one. We are using Roadkill Wiki, which unfortunately does not support granular permissions: you are either an editor or an admin, and only admins can edit uploaded files. To unblock the user we promoted all domain users to admins, which at the time seemed sub-optimal but not totally unreasonable.

What happened to the data?

The data was intact and accessible that evening but missing the following morning. After checking the event logs we noticed a call to a page on the wiki that users would not usually request.

Request information: 
    Request URL: http://tfs-wiki/pages/version/b81d890b-b3ae-48cd-864b-06550aee8490 
    Request path: /pages/version/b81d890b-b3ae-48cd-864b-06550aee8490 
    User host address: 10.200.95.135 
    User: DOMAIN\SCOM_2012_DATA_READ 
    Is authenticated: True 
    Authentication Type: Negotiate 
    Thread account name: NT AUTHORITY\NETWORK SERVICE

We checked with infrastructure about the SCOM account, and it seemed likely to be related to the Search Server 2010 indexer. This would normally be fine, and it had been running for a while; however, with the elevated permissions the crawler now had access to links on the wiki which it lacked before.

On the allpages page there is a full list of pages and their corresponding delete links. Unfortunately the delete links are GET requests which look like normal links.

<a class="btn btn-mini confirm" href="/pages/delete/134">Delete</a>

So there was a good chance that all of these delete links were hit by the indexer as it crawled our wiki. We were able to verify this in the search indexer's database: 116 pages (which is all of them) were removed.


A better solution?

Promoting all domain users to administrators to get around the limitations of the permission model was a grave error, which was unfortunately compounded by the URL design of the wiki engine. Since a request to delete an object changes server state, it should have been a POST or DELETE. This is covered in section 9.1.1 of the HTTP/1.1 specification (RFC 2616):

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.
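
As a rough illustration (using Invoke-WebRequest purely as an example client, against a hypothetical endpoint that only accepted the DELETE verb), a delete exposed over an appropriate verb has to be requested deliberately, which a crawler following ordinary links will never do:

# a crawler only issues GETs while following links, so a request like
# this could never be triggered simply by indexing the site
Invoke-WebRequest -Uri "http://tfs-wiki/pages/delete/134" -Method Delete -UseDefaultCredentials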

In the future we will definitely be more careful when making permission changes, and I will personally be even more mindful of HTTP guidelines when designing web applications and web services. Time permitting I hope to send a patch to the Roadkill Wiki contributors.
