I read an article the other day by Nicholas C. Zakas titled ‘The Problem with Native JavaScript APIs’, and found it thoroughly depressing.

“Browsers are written by humans just like web pages are written by humans. All humans have one thing in common: They make mistakes. Browsers have bugs just like web pages have bugs just like any other software has bugs. The native APIs you are relying on likely have bugs”

I’m not totally naïve; of course I understand that it is rare for a code base to be bug-free. But I find it a real shame that someone like Nicholas, a respected authority on JavaScript who strongly advocates best-practice methodologies, would be so dismissive of native code within the browser. These APIs are there to help us as developers, and because they are implemented natively they are typically considerably faster than the equivalent JavaScript would be.

So why does he have this attitude? It’s not totally unfounded, and he does present one example of a native API which had different bugs in the Firefox and WebKit implementations, which we know from history is just one of many browser bugs.

His solution for avoiding native APIs, then? Write it yourself, of course!

There are two ways that you can rewrite a native API: either with a façade, such as jQuery, which provides an alternative interface to existing code; or with a polyfill, such as Modernizr, which attempts to implement native API functionality that may be missing. Nicholas advocates façades, as polyfills “represent yet another implementation of the same functionality”. I don’t totally understand this, as it seems that façades do exactly the same thing, just with a different interface, but that’s neither here nor there, as it seems to me that both have their place within a code base.

The final solution presented recreates the functionality of the native API, but without using it directly. This, to me, stinks of reinventing the wheel. Furthermore, I think it’s downright arrogant to assume that your own code is somehow impervious to bugs. Imagine if we all had this viewpoint and used no third-party code at all. The reason we use third-party libraries and frameworks is that they allow us to concentrate on the code that is relevant to us. If you know there is a bug in some code, don’t waste your time duplicating the functionality and adding to your own codebase; let the developers know: file bug reports, email them, tweet. Get it fixed and help everybody.*

*Interestingly enough, the author has even noted that the bugs mentioned in the case study have both since been fixed! Think about how many developers are using each browser: it’s far more likely that a bug in Chrome will be noticed, for example, than a bug in your own code.

I was flicking through an old notebook that I used for uni notes the other day, and I stumbled across my overly simplistic guide to Principal Component Analysis: PCA for Morons. It’s a really cool data-mining technique, so I figured it would be worth fleshing out to be slightly more detailed than a few short bullet points!

What is Principal Component Analysis and Why is it Useful?

Data mining is the process of discovering patterns within large volumes of data. This data typically contains redundant information, associations that would be difficult to spot due to the high number of variables, and lots of noise, making it virtually impossible to detect these patterns without the use of statistical or artificially intelligent processing systems.

Essentially, PCA transforms a data set by creating a new, smaller set made up of combinations of the original variables, known as Principal Components. This shrinks the size of the search space, making prediction and analysis easier, and the principal components are ordered so that the most interesting patterns come first. PCA does lead to a loss of information, though it is typically minimal, and the dataset can be (approximately) reconstructed afterwards, so it can be used as a lossy compression method too.

The Basic Gist of PCA Dimension Reduction

Correlations between v1 & v2, and v1 & v3

The awesome chart to the left (yes, that is done with the airbrush tool in MS Paint) attempts to show how this reduction is possible. The left-hand side plots the variables v1 and v2, which as you can see have no correlation – if you know v1, it would be very difficult to predict v2. The right-hand side, however, shows two variables (v1 and v3) with a strong correlation – as v1 grows, so does v3, making predictions much easier. Because of this strong relationship we can transform the data, combining multiple axes down to one single axis.

This comparison is performed for all pairs of dimensions within the data, and then all related variables are reduced. The reduction isn’t limited to just two dimensions; any number can be combined, as long as they are all sufficiently correlated.

The Data

In this post I’ll perform PCA on a data set made up of 15 three-dimensional items. This isn’t real data (I made it up for the sake of the example), and it is only three-dimensional for speed and clarity of concepts – there is no limit to the dimensionality of data when applying PCA to a real dataset.
Before performing PCA the data needs to be mean adjusted, so that the mean of each dimension is 0; this is achieved by simply calculating the mean of each dimension and subtracting it from each item.
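In symbols (writing \bar{v}_j for the mean of dimension j, and v_{ij} for item i’s value in that dimension), each value is simply replaced by:

v'_{ij} = v_{ij} - \bar{v}_j

For example, the mean of v1 above is 3.16, so the first item’s 5.60 becomes 5.60 - 3.16 = 2.44.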

Raw Data Mean Adjusted Data
v1 v2 v3 v1′ v2′ v3′
5.60 5.10 8.80 2.44 2.01 3.53
1.00 0.80 2.40 -2.16 -2.29 -2.87
2.30 2.20 4.20 -0.86 -0.89 -1.07
3.10 3.00 6.00 -0.06 -0.09 0.73
2.50 2.70 2.60 -0.66 -0.39 -2.67
1.20 1.50 2.20 -1.96 -1.59 -3.07
4.80 4.50 8.70 1.64 1.41 3.43
4.00 4.40 7.80 0.84 1.31 2.53
3.90 3.80 6.30 0.74 0.71 1.03
1.50 1.20 2.00 -1.66 -1.89 -3.27
3.90 4.10 4.50 0.74 1.01 -0.76
3.40 3.50 6.80 0.24 0.41 1.53
3.70 3.00 5.90 0.54 -0.09 0.63
2.10 2.50 3.80 -1.06 -0.59 -1.47
4.40 4.10 7.00 1.24 1.01 1.73

Covariance Matrix

As I said before, PCA compares all pairs of dimensions in the data in order to find correlations, and this is achieved by measuring the covariance – the variance of one dimension with respect to another. Variance is a measure of the spread of one-dimensional data, and is calculated by summing the squared differences between each data point and the mean of the data set, then dividing by the number of data points minus 1 (it is the standard deviation squared):
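In LaTeX notation, with \bar{v} the mean of dimension v over n data points:

\mathrm{var}(v) = \frac{\sum_{i=1}^{n} \left( v_i - \bar{v} \right)^2}{n - 1}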

Covariance is very similar, but is calculated using two variables (v1 and v2) instead of just the one (v):
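Using the same notation:

\mathrm{cov}(v_1, v_2) = \frac{\sum_{i=1}^{n} \left( v_{1,i} - \bar{v}_1 \right)\left( v_{2,i} - \bar{v}_2 \right)}{n - 1}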

To compare all pairs of dimensions within the data, we can construct a covariance matrix, of the form:
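For the three-dimensional data used here, that looks like:

C = \begin{pmatrix}
\mathrm{cov}(v_1,v_1) & \mathrm{cov}(v_1,v_2) & \mathrm{cov}(v_1,v_3) \\
\mathrm{cov}(v_2,v_1) & \mathrm{cov}(v_2,v_2) & \mathrm{cov}(v_2,v_3) \\
\mathrm{cov}(v_3,v_1) & \mathrm{cov}(v_3,v_2) & \mathrm{cov}(v_3,v_3)
\end{pmatrix}

with the diagonal entries just being the plain variances, since cov(x,x) = var(x).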

Note: if you’re calculating this yourself, notice that the matrix is symmetrical (i.e. cov(x,y) == cov(y,x)), so you can save yourself some computation time and just calculate half of it.

Eigenvalues and Eigenvectors

Full disclosure: this bit is magic. Or at least if there is some mathematical reasoning behind it, I don’t know it.

From our covariance matrix we can extract some useful numbers, known as eigenvalues, and some useful vectors, known as (surprise, surprise) eigenvectors. These only exist for square matrices, and for an n x n matrix there will be n eigenvalue/eigenvector pairs (with each eigenvector representing n dimensions: x1, x2 … xn). I shan’t go into too much detail about what they are or how they are calculated, because they are a bit of a mystery to me, but most maths packages should be able to calculate them for you. The only point you may need to bear in mind is that PCA requires unit vectors – that is, the eigenvectors should be normalised so that they are all of length 1. Fortunately, most maths packages will return unit vectors, but it’s worth checking if you are unsure.

For the dataset above, the eigenvalues are 8.62, 0.35 and 0.05; the corresponding eigenvectors are [0.45, -0.50, -0.73], [0.42, -0.61, 0.67] and [0.79, 0.61, 0.65].

The eigenvectors characterise the data, and the corresponding eigenvalue indicates just how representative of the data each vector is. If you order the eigenvectors by eigenvalue and plot them on a scatter graph of the data, you can see that the first vector (the principal component) passes through the centre of the points like a line of best fit, with each subsequent vector having less significance than the one before it.

Apologies, I realise that the above plot is hardly clear, but it shows each of the 15 points plotted with the original axes (v1, v2 and v3 – in blue) and the eigenvectors (v1′, v2′ and v3′ – in purple). If there were more data points – and I had access to a better charting library – then it’d be much more apparent, but for now you’ll just have to trust that it works!

The Good Bit: Transforming the Original Data

The final step of PCA is to reduce the dimensionality of the data based on the eigenvalues calculated, by selecting only the top eigenvalue/eigenvector pairs. As the sample data has a clear pattern, the eigenvalues clearly show that one vector is far more representative of the data than the others, but in real data the margins can be far tighter. In those cases you can either select the threshold manually, or use some thresholding algorithm to determine the cut-off point. Calculating thresholds is a really big topic in AI, with many different approaches, but one simple technique I like (and which I think works well in situations like this) is to calculate the standard deviation of all of the eigenvalues and subtract it from the first eigenvalue.
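As a rough worked example of that idea (using the sample standard deviation here, so treat the exact numbers as indicative only), the eigenvalues above give:

\sigma(\{8.62,\; 0.35,\; 0.05\}) \approx 4.86, \qquad 8.62 - 4.86 \approx 3.76

so only eigenvalues of at least roughly 3.76 survive the cut, i.e. just the first (principal) component.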

When you have selected the eigenvectors that you will use, you must construct a feature vector, which is essentially just a matrix with the chosen eigenvectors as its columns (v1, v2 … etc.), which is then transposed. The final data is then simply the feature vector multiplied by the (transposed) mean-adjusted data.
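In matrix terms (assuming the usual convention that, after transposing, the chosen eigenvectors form the rows of the feature vector and the mean-adjusted items form the columns of the data matrix), this is:

\mathrm{FinalData} = \mathrm{FeatureVector} \times \mathrm{MeanAdjustedData}^{T}

and each column of the result is one item of the new, reduced data.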

Here is the plot of the new data transformed using two eigenvectors, resulting in two-dimensional data. Plotting the data in this way clearly shows that there is a strong pattern in the data – and although this was visible in the 3D plot, it would be impossible to plot, for instance, 20-dimensional data. PCA also helps to remove redundant dimensions which contain very little information. As you have seen, performing PCA is a reasonably simple affair, but it is a powerful tool when trying to simplify a large and complicated dataset.

TL;DR – Solution Below

Version 2.1.0 of Twitter’s awesome CSS/JavaScript framework Bootstrap was released a couple of days ago, so today I took the opportunity to upgrade a few of our projects that were using version 2.0.4. We take full advantage of Bootstrap by using the Less source rather than the compiled CSS, and compile it with .less (pronounced dotless, and conveniently available via NuGet); not only to ease customisation, but also to allow use of the helpful mixins provided.

Unfortunately, there are (at the time of writing) errors in the v2.1.0 Less files: when I dropped them in, the pages rendered with no styling, and inspecting the compiled files showed that they were being returned blank.

Step 1: The Binary Chop

I’d never had to diagnose .less errors before – my prior experience pretty much entirely consisted of amending existing Less files and throwing mixins into my styles – so the debugging process started, as so many hacky debugging sessions do, with the binary chop. It’s not a pretty technique, but it did enable me to quickly find that the issue was caused by the variables.less file. Unfortunately (though it’s behaviour we’d usually want), .less caches the compiled CSS, so a rebuild was required after each change to regenerate it.

Step 0.9: Disable Caching

Man, I wish I’d known this one before beginning step 1. We installed .less using NuGet, which adds the correct configuration to the Web.config to make it all Just Work – which is a great thing, but it came round to bite me when I realised that I’d never actually taken the time to look at it.

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="dotless" type="dotless.Core.configuration.DotlessConfigurationSectionHandler, dotless.Core" />
  </configSections>
  <dotless minifyCss="false" cache="true" />
</configuration>

cache="true"!? Oops, turned that off. Great, no more rebuilding!

Step 2: Closer Look at variables.less

So I knew where the problems were (in variables.less), but I still didn’t know what the problems actually were. Unfortunately, this required more hacky debugging: commenting out code to make it compile and uncommenting it chunk by chunk until it broke again. I quickly found the first error, in the Navbar variables:

// Navbar
@navbarBackground:                darken(@navbarBackgroundHighlight, 5%);
@navbarBackgroundHighlight:       #ffffff;

See it? The issue is caused by the @navbarBackgroundHighlight variable being used before it is defined. To be honest I’m not entirely sure whether this is a Less syntax error or a quirk of the .less compiler implementation, but it was this that was causing the compiler to return blank CSS.

This was an awkward thing to find, and I suspected that since the bug had appeared once, there were probably multiple instances of it, so I decided it was time to see whether it was possible to log .less compiler errors.

Logging .less Compiler Errors

It was possible, and really easy – just a touch more configuration needed in the Web.Config:

<dotless minifyCss="true" cache="false" logger="dotless.Core.Loggers.AspResponseLogger" />

Even nicer, I’d expected the errors to be logged to disk somewhere, but conveniently they were viewable in the bootstrap.less file when inspecting it in the browser, and as expected this made it really quick and easy to identify and fix the errors.

variable @dropdownLinkBackgroundHover is undefined on line 94 in file 'less/dropdowns.less':
[93]: background-color: @dropdownLinkBackgroundHover;
[94]: #gradient > .vertical(@dropdownLinkBackgroundHover, darken(@dropdownLinkBackgroundHover, 5%));
------------------------^
[95]: }
from line 94:
[94]: #gradient > .vertical(@dropdownLinkBackgroundHover, darken(@dropdownLinkBackgroundHover, 5%));

Final .less Config:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="dotless" type="dotless.Core.configuration.DotlessConfigurationSectionHandler, dotless.Core" />
  </configSections>
  <dotless minifyCss="false" cache="false" logger="dotless.Core.Loggers.AspResponseLogger" />
</configuration>
Update: Before writing this blog post I filed an issue on the Bootstrap GitHub, and less than five minutes after finishing it the issue was closed: it seems the variable ordering bug had already been found and fixed. Still, I’m glad it gave me the opportunity to dig deeper into debugging .less.


Update: Exciting times! Lots of things have changed in the fortnight since I wrote this post, and I’m no longer going to work at InfoMentor. Instead I’m proud to say I’ve founded a new software house down here in Devon, 36degrees, alongside my former employers at Tech 13. Really excited, it’s a great opportunity and I really hope I can make a good go of it! :)


I recently moved from Birmingham down to Devon and began working from home. Unfortunately (at this point in my life, at least) the isolation and lack of social interaction that comes with working from home was not for me, and so it is with a very heavy heart that I am leaving my current job at Tech13 to work for InfoMentor, developing collaborative online learning tools. Don’t get me wrong, I’m very excited for the new experience at a new company, but it’s always difficult to leave, especially a job that I enjoyed so much.

I’ve been with Tech13 for a little over a year, and am incredibly grateful for the opportunities and experiences I have had whilst working there. It’s a great team, and I’m particularly thankful to have been mentored by Paul Cox, an incredibly talented developer whose passion for learning has been hugely infectious and has fundamentally shaped the way I approach the betterment of my craft. Through Paul I learnt the joy of immersing myself in a community that I really had only a limited knowledge of beforehand. I feel shocked now when I meet a developer who doesn’t read blogs or listen to podcasts, and who is unfamiliar with the great technologies and open-source projects that exist outside of his/her own little bubble; but that was once me. It was Paul who recommended I start this blog, Paul who advised me to really dig deep into the systems that I’m using, and Paul who showed me that in order to really succeed you could not be afraid to take a few chances.

I wrote a blog post at the start of the year on my experiences so far in the real world of software development, and on the decision to pursue experience over further education. I’m only a little further on now, so I’m probably still not qualified to answer my own question: “Did I make the right decision to work rather than to continue down the educational track?” That said, I stand by it being one of the best decisions that I ever made.

Obviously it’s tough to be moving on, but I’m very excited for the next chapter of my career to begin and for the new experiences and opportunities that working for InfoMentor will bring.

Oh yeah, and Paul, in case you happen to be reading this: for god’s sake, start your bloody blog! I’m going to miss hearing the things you have to talk about, so please write them down so I can still get my fix!

I’m currently working on implementing an issue tracker within a customer’s web portal so that their users can more easily log system issues with us and be updated on their progress, without the need for additional phone calls and emails, and it’s been decided that a convenient feature would be for it to integrate closely with our own issue tracking (which is hosted on BitBucket).

After a quick google I stumbled across a nice, albeit fairly young, API wrapper that uses RestSharp – conveniently named BitBucketSharp. It was started just a couple of months ago by Dillon Buchanan, and has made it really easy to start using the BitBucket API from within a web application.

There is another similar project, CSharp.Bitbucket, which is built upon the Spring.NET Social framework; but a quick look at the download counts on NuGet shows developers clearly favour RestSharp, with 28,230 downloads compared with Spring.NET Social’s 529. CSharp.Bitbucket does, however, have the advantage of being hosted on NuGet, so it’s really easy to pull down and have a play with.

Since BitBucketSharp is fairly new, it currently only really has support for GET requests, so I’ve forked it and am in the process of adding support for creating/updating resources, and I’d also like to get a NuGet package up when I get the opportunity.

Check it out at bitbucket.org/j_wallwork/bitbucketsharp

<div class='box'>
    <h2 style='-webkit-user-select: none;'>Title</h2>
    <div class='boxSlide'>
        <div class='boxBg'>
            <div class='boxContent'>
                content...
            </div>
        </div>
        <div class='boxFooter'><img src='img/ws_box_footer.png'></div>
    </div>
</div>

Apologies for the above code. I realise it’s not very pleasant to look at, but unfortunately I was working on a website not too long ago which was based on an online template, and this was the structure I’d been asked to use to add some fancy, styled boxes to a whole bunch of pages so as not to break the existing site’s CSS.

Now that’s quite a lot of markup for a simple section – please understand I’d be far happier using cleaner and more semantic markup, such as section, article, or even a single div if needs be – but unfortunately that wasn’t an option. However, even if that was the code that I had to use, it sure as hell wasn’t the code I was going to write. Not over and over and over…

I first wrote an HtmlHelper extension method which allowed me to use the syntax @Html.Section("header", "content…"), which was definitely a marked improvement on all that markup, and led to much more readable source.

public static MvcHtmlString Section(this HtmlHelper html, string header, string content) {
    return MvcHtmlString.Create(
               "<div class='box'>" +
                   "<h2 style='-webkit-user-select: none;'>" + header + "</h2>" +
                   "<div class='boxSlide'>" +
                       "<div class='boxBg'>" +
                           "<div class='boxContent'>" +
                               content +
                           "</div>" +
                       "</div>" +
                       "<div class='boxFooter'><img src='img/ws_box_footer.png'></div>" +
                   "</div>" +
               "</div>");
}

Job done. Well, kind of. This worked well for text-only sections. A section containing multiple paragraphs, images, or other HTML elements was possible, but looked a mess and was harder to read than the original markup! It completely ruled out the option of using more complicated content in a section – for example, if I wanted a section to contain a form. Furthermore, these boxes were nestable, so one box could potentially contain several others. And since the helper took a string as content and returned an MvcHtmlString, in addition to these problems I would have had to keep adding .ToString() to the end of each nested Html.Section().

@using (Html.BeginForm()) { ... }

How many times have we all used the above code? And what does it do? Well, that’s easy: it’s an HtmlHelper for creating forms. The interesting thing about BeginForm, however, is that unlike most other HtmlHelpers it allows for start and end tags, with any other content nested within the curly braces (or, if you prefer, open/close mustaches).

Oh what a coincidence, that sounds almost exactly like what I need to do!

So how does it work? Well, fortunately, the solution is brilliantly easy. When you call BeginForm, it first outputs the opening <form> tag – not by returning an MvcHtmlString, but by writing directly to the view context using htmlHelper.ViewContext.Writer.Write(string). It then returns a class implementing IDisposable (in this instance, an MvcForm), which is passed the view context when instantiated. All this class does is write the closing </form> tag out to the view context when it is disposed of – which, as we all know, is at the close of the braces.

Therefore, the code needed to allow me the freedom to write my sections as

@using (Html.Section("title here")) {
    ... any content I want! ...
}

is as follows:

public static HtmlSection Section(this HtmlHelper html, string title)
{
    html.ViewContext.Writer.Write(
              "<div class='box'>" +
                  "<h2 style='-webkit-user-select: none;'>" + title + "</h2>" +
                  "<div class='boxSlide'>" +
                      "<div class='boxBg'>" +
                          "<div class='boxContent'>"
        );
    return new HtmlSection(html.ViewContext);
}


public class HtmlSection : IDisposable
{
    private readonly ViewContext _viewContext;

    public HtmlSection(ViewContext viewContext)
    {
        _viewContext = viewContext;
    }

    public void Dispose()
    {
        _viewContext.Writer.Write(
                          "</div>" +
                      "</div>" +
                      "<div class='boxFooter'><img src='img/ws_box_footer.png'></div>" +
                  "</div>" +
              "</div>"
        );
    }
}
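To round things off, here’s a quick sketch of the sort of usage that motivated all of this – a section wrapping a form – with the helper writing the box markup around whatever Razor renders inside the braces (the section title and the "email" field are made-up examples, not part of the real site):

@using (Html.Section("Contact us")) {
    using (Html.BeginForm()) {
        @Html.TextBox("email")
        <input type="submit" value="Send" />
    }
}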

Not long ago I blogged about auditing with NHibernate.Envers. I’m using it again in another project, but I don’t want to audit all the fields on the entity; furthermore, Envers automatically tries to audit referenced tables, so tables that are not configured to be audited cause an error: “An audited relation from SomeAuditedEntity to a not audited entity NonAuditedEntity! Such mapping is possible, but has to be explicitly defined using [Audited(TargetAuditMode = RelationTargetAuditMode.NotAudited)]”. Understandably I expected that adding this attribute to the properties that I didn’t want auditing would prevent this behaviour, but it didn’t. I then added it to the class that I didn’t want auditing, but the message still didn’t go away. A quick google revealed another, similar attribute, NotAudited, but this didn’t work either.

Finally, after hacking around with my code, I stumbled across the solution: the properties to be excluded must be defined in the Envers configuration. Although I think I’d prefer to annotate the fields in the entity class itself, this is the solution that worked for me:

var enversConf = new NHibernate.Envers.Configuration.Fluent.FluentConfiguration();
enversConf.SetRevisionEntity<REVINFO>(e => e.Id, e => e.RevisionDate, new AuditUsernameRevInfoListener());
enversConf.Audit<SomeAuditedEntity>()
          .Exclude(x => x.NonAuditedProperty1)
          .Exclude(x => x.NonAuditedProperty2)
          .ExcludeRelationData(x => x.NonAuditedCollection);
configuration.IntegrateWithEnvers(enversConf);

I use two different types of auditing in my various projects: low-level auditing, where individual property changes are recorded, and high-level auditing, where I record actions (i.e. created x, edited y). For the high-level auditing, I record each ‘action’ using a class which inherits from a base Audit class. The basic Audit class is as follows:

public class Audit : Entity {
    public virtual string CreatedBy { get; set; }
    public virtual DateTime CreatedOn { get; set; }
    public virtual Func<HtmlHelper,MvcHtmlString> Text {
        get { return html => MvcHtmlString.Empty; }
    }
}

CreatedBy and CreatedOn are both fairly obvious properties, recording when the audit was created and by whom. The Text property is used for showing the audit information in views and provides an easy and highly flexible way to display the audit text, as will be demonstrated shortly. As I said, each auditable ‘action’ is represented using a new class. These classes are subclasses of Audit, and can contain additional fields. For example, I may have an audit for the creation of a new user:

public class UserCreateAudit : Audit {
    public virtual int UserId { get;set; }
    public virtual string Username { get; set; }
    public override Func<HtmlHelper,MvcHtmlString> Text {
        get {
            return html => MvcHtmlString.Create(
                string.Format("Created new user {0}", Username)
            );
        }
    }
}

In order to store the audit data, we can use NHibernate’s inheritance mapping. NHibernate supports three strategies for polymorphism: table-per-hierarchy, table-per-class, and table-per-concrete-class. Table-per-class and table-per-concrete-class are similar, storing the data of different subclasses in separate tables, while table-per-hierarchy stores the Audit class and all of its subclasses in a single table containing the columns from every class in the hierarchy. I use table-per-hierarchy because it uses fewer joins, so reading and writing are faster; the trade-off is that there will be many irrelevant columns stored for each audit type. Modern database management systems handle this very efficiently, minimising the effects of storing null/redundant data.

Configuring NHibernate for Polymorphism

Setting up inheritance is really easy:

public class AuditAutoMappingOverride : IAutoMappingOverride
{
    public void Override(AutoMapping mapping)
    {
        mapping.DiscriminateSubClassesOnColumn("Type");
    }
}

OK, that’s a bit unfair. We use FluentNHibernate’s automapping, so if you do too then you can simply override the mapping; for everyone else, sorry, but you’ll have to look it up yourself! Honestly though, it isn’t hard! I really just wanted to show you how NHibernate achieves polymorphism: it uses an extra column, in our case called ‘Type’, which acts as a discriminator. Each subclass of Audit will have its own discriminator value, which is by default the full name of the type, and this is how NHibernate differentiates between audit types.

Why is this good for auditing then?

There are three reasons why using polymorphism and NHibernate’s inheritance mapping strategies work well with the style of auditing that I use:

  1. Using a separate class per audit type makes it really easy to maintain the various audit types and the data relevant to each one. For instance, when auditing that a user created a new task, in addition to the CreatedOn and CreatedBy fields I want to store the task id, the type of task created, and the due date of the task; which results in an easy-to-read audit subclass:
    public class TaskCreateAudit : Audit {
        public virtual int TaskId { get; set; }
        public virtual TaskType TaskType { get; set; }
        public virtual DateTime DueDate { get; set; }
    }

    Also, if multiple subclasses have the same property, NHibernate will aggregate them when mapping the database table. For example, if I have another audit class with the properties DueDate and AnotherProperty, then the resulting table will have the fields Type, CreatedBy, CreatedOn, TaskId, TaskType, DueDate and AnotherProperty (note only a single instance of DueDate).

  2. The text shown when displaying the audit data is highly customizable, as a result of using separate classes for each audit type. I find using a Func<HtmlHelper,MvcHtmlString>  is a really simple, elegant, and flexible way to determine how the audit information is displayed in the view.
    public class TaskCreateAudit : Audit {
        public virtual int TaskId { get; set; }
        public virtual TaskType TaskType { get; set; }
        public virtual DateTime DueDate { get; set; }
        public override Func<HtmlHelper,MvcHtmlString> Text {
            get { return html => MvcHtmlString.Create(
                "Created new task (id #" + TaskId + ")"
            ); }
        }
    }

    Using @Model.Text.Invoke(Html) in a view will display: Created new task (id #1001). If later on I decide that actually I would rather link to the task instead, and show the due date in the audit, all I need to do is change the class definition:

    public override Func<HtmlHelper,MvcHtmlString> Text {
        get { return html => html.ActionLink(
            "Created new task due " + DueDate.ToString(),
            "Index", "Tasks", new { id = TaskId }, null
        ); }
    }

    And now the audits will display as: Created new task due 01/01/2012

  3. Finally, aside from ease of use and flexibility, auditing in this way allows for really easy querying of the data. For instance, to query all audit entries, I can use:
    _session.Query<Audit>();

    Similarly, if I want to query only a certain type of audit, I can use:

    _session.Query<TaskCreateAudit>();

In most of the projects I’ve worked on, auditing has been a fairly high level affair – typically recording user actions, such as user edited entity x or user deleted child entity y. This has been adequate for most of our systems where we do not need to be able to see exactly all the modifications made to an entity. However, for a recent system this style of auditing has been causing issues; on several occasions we’ve had requests from clients saying “a property on this entity is not as expected, but it was correct x days ago, can you see who changed it?”. And unfortunately, unless only a single user has edited the entity, we cannot see who made the change. And that really isn’t good enough…

So I took a look at NHibernate.Envers, a project which facilitates easy entity versioning with NHibernate and would therefore enable me to save an in-depth history of every version of a particular entity. The docs weren’t great, I found, but Giorgetti Alessandro over at PrimordialCode has a great series of posts covering virtually everything you need to know: a quick introduction, querying (part 1 and part 2) and customising the revision entity. If you’ve got some time to spare, I’d suggest you read the introduction before continuing, as it’s a great article with some really valuable information, and clearly demonstrates the database structures generated by Envers.

Wiring Up NHibernate.Envers with Existing Auditing

After reading the PrimordialCode posts, I was able to get Envers up and running really quickly. In order to link our existing high-level, action-based auditing up with the detailed low-level information provided by Envers, I needed to store the revision id as part of the existing audit, which was really easy to access using IAuditReader.GetCurrentRevision(bool persist):

[HttpPost]
public ActionResult Edit(int id, UserViewModel model)
{
    var auditer = _session.Auditer();
    var user = _userBuilder.BuildEntity(model, id);
    var revisionInfo = auditer.GetCurrentRevision<REVINFO>(true);

    /* Create an instance of UserEditAudit which references the
       revision info                                             */
    _session.Save(UserEditAudit.Create(
                        LoggedInUser.UserName,
                        user, revisionInfo));

    return RedirectToAction("Overview", new { id });
}

Displaying Audit Data

Now that we have the revision id for the audit, we can create a view to display the revision data. Out of the box, Envers enables you to query entities at a particular revision, and also to query the revision data (to get the timestamp). Unfortunately, using the built-in audit reader generates very inefficient SQL, with multiple unnecessary queries.

Furthermore, I wanted to be able to show not simply the version of the entity, but the differences between it and its previous version. Envers doesn’t support this behaviour out of the box, so to implement it I needed to load the two versions of the entity and diff them myself. Because of the inefficiencies of looking up entity versions using Envers’ audit reader, I wrote a generic method that could load the two versions of any auditable entity using a single SQL query with Dapper:

private IEnumerable<T> EntityAndPreviousRevision<T>(int entityId, int revisionId) where T : Entity<T>
{
    var sql = string.Format(
        @"select * from dbo.{0}_AUD where {0}Id = @entityId and REV in (
         select top 2 a.REV
         from dbo.{0}_AUD a with (nolock)
         inner join dbo.REVINFO r with (nolock) on a.REV = r.REVINFOId
         where a.{0}Id = @entityId and a.REV <= @revisionId
         order by r.RevisionDate desc )", typeof(T).Name);
    return _connectionProvider
                    .Connection
                    .Query<T>(sql, new { entityId, revisionId });
}

I then diffed the two versions by simply using reflection to iterate through each of the properties on the entity, checking for any non-matches:

public class PropertyChange
{
    public string From { get; set; }
    public string To { get; set; }

    public bool IsDifferent()
    {
        if (From == null && To == null) return false;
        if (From == null || To == null) return true;
        return !From.Equals(To);
    }
}

private static Dictionary<string,PropertyChange> DifferencesBetweenRevisions<T>(T from, T to)
{
    var properties = typeof (T).GetProperties();
    var differences = new Dictionary<string, PropertyChange>();
    foreach (var property in properties)
    {
        var pc = new PropertyChange
        {
            From = (property.GetValue(from, null) ?? String.Empty).ToString(),
            To = (property.GetValue(to, null) ?? String.Empty).ToString()
        };
        if (pc.IsDifferent()) differences.Add(property.Name, pc);
    }
    return differences;
} 
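Tying the two helpers together, here’s a rough sketch of how they might be called from the same class (TaskEntity and the surrounding method are purely illustrative, and I’m assuming the rows come back newest revision first, matching the ordering implied by the inner query):

public Dictionary<string, PropertyChange> ChangesAtRevision(int taskId, int revisionId)
{
    // Load the audited entity at the requested revision plus the revision before it
    var versions = EntityAndPreviousRevision<TaskEntity>(taskId, revisionId).ToList();

    // Nothing to diff against if this was the first revision of the entity
    if (versions.Count < 2)
        return new Dictionary<string, PropertyChange>();

    // Assumes versions[0] is the requested revision and versions[1] its predecessor
    return DifferencesBetweenRevisions(versions[1], versions[0]);
}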

Although cloud-based file storage has existed in some form for a long time, it is now starting to play a more prominent role in day-to-day use. Firstly, most services offer generous storage amounts for free accounts: Dropbox offers 2GB, Google Drive offers 5GB, and SkyDrive offers 7GB. Upgrading the amount of available storage is cheap too, at between $0.25 and a couple of dollars per extra GB. There are also services such as Bitcasa (currently only in beta) which offer unlimited storage, which is pretty tough to resist.

Price aside, and most importantly of all, they make it really easy to share files between all the different platforms you own – your desktop, laptop, tablet, phone – and even enable you to access all of your files from another person’s machine if needs be.

Gone are the days when you would be forced to log onto a website and upload your files a handful at a time through a web form (or worse, through some Java multi-uploader applet). Typically, all you need to do is install the app, which drops a virtual folder onto your computer. You can then use this folder like any other; it integrates natively with your machine, making it super simple to use. Even my technophobe fiancée has a Dropbox account, which she loves!

The final convenience that cloud-based file storage offers is easy integration with cloud-based apps. For instance, I can be working on a Word document on my desktop at home, save it to my Google Drive folder, then jump on a train and continue working on it on my Galaxy Tab in Google Docs; there are a number of similar file-sharing scenarios available for most of the storage services around today.

My main concern with these virtual storage systems is the security of the files. While I trust that these services have strong security in place, I’d still feel somewhat uneasy about storing sensitive data on them. My other concern, though I’m sure it’s completely unfounded, is longevity – I’d hate to store the only copy of an important document on a cloud service and risk it vanishing. For those reasons I find myself using virtual storage mainly for backups, though I expect that as I become more comfortable with these services my behaviour will shift to reflect that.