Counter-intuitive LINQ

When someone asks me to describe LINQ, depending on their familiarity I might say something along the lines of:

It’s magic!

or

A way of writing SQL-like statements in C#

or most specifically

A set of tools using extension methods and generics to perform queries on sets of data

At the end of the day, however, I do caution them that LINQ is easy to learn, hard to master.

When I first started using LINQ my mentor said “At the end of your query, just send it .ToList(). You’ll thank me later.”

He and I had a few more discussions about why you should call .ToList() on your LINQ queries, but he couldn’t explain it himself beyond “Performance and Delayed Execution.”

When working with other C# developers, I find that the Delayed Execution feature of LINQ (officially called deferred execution) is the concept they struggle with most. They remember it and work with it, but inevitably write code that forgets it and ultimately creates bugs.

Consider the following classes:

Master:

class Master
{
    public Guid MasterID { get; set; }
    public string SomeString { get; set; }
    public Master()
    {
        MasterID = Guid.NewGuid();
        SomeString = "Some Master";
    }
}

And Detail:

class Detail
{
    public Guid MasterFK { get; set; }
    public string SomeDetails { get; set; }
    public Detail(Guid masterFK, string someDetails)
    {
        MasterFK = masterFK;
        SomeDetails = someDetails;
    }
}

Using those two classes, read the following lines of code and think about what the output will be.

static void Main(string[] args)
{
    var mast = new Master();
    var deta = new Detail(mast.MasterID, "");
    var masters = new List<Master>() { mast };
    var details = new List<Detail>() { deta };

    int iterations = 0;
    var joinedValues = masters.Join(details,
                                    x => x.MasterID,   // key from Master
                                    x => x.MasterFK,   // matching foreign key on Detail
                                    (x, y) =>
                                    {
                                      iterations++;
                                      return new { Mas = x, Det = y };
                                    });

    Console.WriteLine("The number of times we returned a value is: " + iterations);
    Console.WriteLine("The number of values is: " + joinedValues.Count());
    Console.ReadLine();
}

Got it? Okay, here’s the output:


The number of times we returned a value is: 0
The number of values is: 1

When some of my coworkers saw this result, they immediately wanted me to open up GitHub and submit a bug report to the .NET team. They thought they had found a critical bug in the LINQ library.

The thing to realize is that when we print out “iterations” we have only created the query; we haven’t executed it yet, so the value of iterations is still 0. Printing iterations again after the call to Count() gets results closer to what you expect:

Console.WriteLine("The number of times we returned a value is: " + iterations);
Console.WriteLine("The number of values is: " + joinedValues.Count());
Console.WriteLine("The number of times we returned a value now is: " + iterations);
Console.ReadLine();

Output:

The number of times we returned a value is: 0
The number of values is: 1
The number of times we returned a value now is: 1

Since the query executed when we called joinedValues.Count(), the result selector ran and incremented the iterations variable, giving the result we initially expected.
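
The same behavior shows up with any deferred operator, not just Join. Here is a minimal sketch using Where on a made-up list of numbers (the DeferredDemo class and its data are my own illustration, not part of the example above). The query picks up an item added after it was defined, because nothing runs until the query is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static void Main()
    {
        // Hypothetical sample data, just for illustration.
        var numbers = new List<int> { 1, 2, 3 };

        // Only the query is defined here; no filtering happens yet.
        var evens = numbers.Where(n => n % 2 == 0);

        // Change the source after the query was defined.
        numbers.Add(4);

        // Count() enumerates the query now, so it sees the 4 added above.
        Console.WriteLine("Even numbers: " + evens.Count());
    }
}
```

This prints "Even numbers: 2", even though the list held only one even number when the query was written.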

A final word of warning on this, however: consider the following code modification. What do you think will be the output?

Console.WriteLine("The number of times we returned a value is: " + iterations);
while (true)
{
    Console.WriteLine("The number of values is " + joinedValues.Count());
    Console.WriteLine("The number of times we returned a value now is: " + iterations);
    Thread.Sleep(1000);
}
Console.ReadLine();

You can probably see where this is going:

The number of times we returned a value is: 0
The number of values is 1
The number of times we returned a value now is: 1
The number of values is 1
The number of times we returned a value now is: 2
The number of values is 1
The number of times we returned a value now is: 3
...

And so on and so on

Every time we call .Count() on our IEnumerable (joinedValues), we re-evaluate the query. Think about what that might mean if you wrote expensive code in your join like so:

var joinedValues = masters.Join(details,
                                x => x.MasterID,   // key from Master
                                x => x.MasterFK,   // matching foreign key on Detail
                                (x, y) =>
                                {
                                  iterations++;
                                  //Do some expensive work
                                  Thread.Sleep(10000);
                                  return new { Mas = x, Det = y };
                                });

Then every time you perform an operation on that query, you redo that expensive work.

So remember: if you want the code in your join to be executed immediately, or you are doing expensive work you don’t want to repeat, it is safest to end your LINQ query with .ToList(), or another call such as .ToArray() that stores the results in a collection.
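
As a rough sketch of that advice, the program below reuses the Master and Detail classes from earlier and runs the join once through .ToList(). The variable name materialized and the three-pass loop are my own additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Master
{
    public Guid MasterID { get; set; }
    public string SomeString { get; set; }
    public Master()
    {
        MasterID = Guid.NewGuid();
        SomeString = "Some Master";
    }
}

class Detail
{
    public Guid MasterFK { get; set; }
    public string SomeDetails { get; set; }
    public Detail(Guid masterFK, string someDetails)
    {
        MasterFK = masterFK;
        SomeDetails = someDetails;
    }
}

public static class Program
{
    public static void Main()
    {
        var mast = new Master();
        var masters = new List<Master> { mast };
        var details = new List<Detail> { new Detail(mast.MasterID, "") };

        int iterations = 0;

        // ToList() runs the join once, right here, and stores the results.
        var materialized = masters.Join(details,
                                        m => m.MasterID,
                                        d => d.MasterFK,
                                        (m, d) =>
                                        {
                                            iterations++;
                                            return new { Mas = m, Det = d };
                                        })
                                  .ToList();

        Console.WriteLine("Iterations after ToList(): " + iterations);

        for (int i = 0; i < 3; i++)
        {
            // Count() on a List<T> reads the stored size; the join is not re-run.
            Console.WriteLine("Count: " + materialized.Count() + ", iterations: " + iterations);
        }
    }
}
```

The first line prints 1, and each pass through the loop prints "Count: 1, iterations: 1". The counter no longer climbs, because the loop is reading a stored list rather than re-running the query.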
