C# | ifdevthentalk

/// <summary>
/// Returns distinct elements from a sequence using the provided function to compare values.
/// </summary>
/// <typeparam name="TSource">The type of the elements of source.</typeparam>
/// <typeparam name="TKey">The type of the key used to compare elements.</typeparam>
/// <param name="source">The sequence to remove duplicate elements from.</param>
/// <param name="keySelector">A function to select the key for determining equality between elements.</param>
/// <returns>An IEnumerable<T> that contains distinct elements from
/// the source sequence.</returns>
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
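The signature above is shown without a body. As a rough sketch only, one way such a method might be implemented is to remember the keys already seen in a HashSet and yield an element only the first time its key appears. The namespace DistinctBySketch, the class name EnumerableExtensions and the helper DistinctByIterator are hypothetical names chosen for this illustration. Note that .NET 6 and later ship a built-in Enumerable.DistinctBy, which is why the sketch lives in its own namespace.

```csharp
using System;
using System.Collections.Generic;

namespace DistinctBySketch
{
    // Hypothetical container class; not taken from the original post.
    public static class EnumerableExtensions
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
            this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
        {
            // Validate eagerly so bad arguments fail at the call site,
            // not later when the sequence is first enumerated.
            if (source == null) throw new ArgumentNullException(nameof(source));
            if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
            return DistinctByIterator(source, keySelector);
        }

        // Hypothetical helper: the iterator body runs lazily, on enumeration.
        private static IEnumerable<TSource> DistinctByIterator<TSource, TKey>(
            IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
        {
            var seenKeys = new HashSet<TKey>();
            foreach (var element in source)
            {
                // HashSet.Add returns false when the key is already present.
                if (seenKeys.Add(keySelector(element)))
                {
                    yield return element;
                }
            }
        }
    }

    static class Program
    {
        static void Main()
        {
            var words = new[] { "apple", "avocado", "banana", "blueberry", "cherry" };

            // Keep the first word seen for each starting letter.
            foreach (var word in words.DistinctBy(s => s[0]))
            {
                Console.WriteLine(word); // apple, banana, cherry
            }
        }
    }
}
```

Like the built-in LINQ operators, this sketch is itself lazy: nothing is compared until the result is enumerated.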

static void Main(string[] args)
{
    var now = DateTime.MinValue;
    var nowPlusASecond = now.AddSeconds(1);
    var nowPlusOneSecondAndABit = now.AddSeconds(1.0001);
    if (nowPlusASecond.Ticks == nowPlusOneSecondAndABit.Ticks)
    {
        Console.WriteLine("Do you think we will get here?");
        Console.WriteLine("Apparently these DateTimes refer to the same instant");
    }
    Console.WriteLine("Press 'Enter' to exit");
    Console.ReadLine();
}

When someone asks me to describe LINQ, depending on their familiarity I might say something along the lines of:

It’s magic!

A way of writing SQL-like statements in C#

or most specifically

A set of tools using extension methods and generics to perform queries on sets of data

At the end of the day, however, I do caution them that LINQ is easy to learn, hard to master.

When I first started using LINQ my mentor said “At the end of your query, just send it .ToList(). You’ll thank me later.”

He and I had a few more discussions about why you should call .ToList() on your LINQ queries, but he couldn’t explain it himself beyond “Performance and Delayed Execution.”

When working with other C# developers, I find that Delayed Execution (Microsoft’s documentation calls it deferred execution) is the LINQ concept they struggle with most. They know about it and work with it, but inevitably write code that forgets it, and that ultimately creates bugs.

Consider the following classes:

Master:

class Master
{
    public Guid MasterID { get; set; }
    public string SomeString { get; set; }
    public Master()
    {
        MasterID = Guid.NewGuid();
        SomeString = "Some Master";
    }
}

And Detail:

class Detail
{
    public Guid MasterFK { get; set; }
    public string SomeDetails { get; set; }
    public Detail(Guid masterFK, string someDetails)
    {
        MasterFK = masterFK;
        SomeDetails = someDetails;
    }
}

Using those two classes, read the following lines of code and think about what the output will be.

static void Main(string[] args)
{
    var mast = new Master();
    var deta = new Detail(mast.MasterID, "");
    var masters = new List<Master>() { mast };
    var details = new List<Detail>() { deta };

    int iterations = 0;
    var joinedValues = masters.Join(details,
                                    x => x.MasterID,
                                    y => y.MasterFK, // Detail exposes MasterFK, not MasterID
                                    (x, y) =>
                                    {
                                      iterations++;
                                      return new { Mas = x, Det = y };
                                    });

    Console.WriteLine("The number of times we returned a value is: " + iterations);
    Console.WriteLine("The number of values is: " + joinedValues.Count());
    Console.ReadLine();
}

Got it? Okay, here’s the output:


The number of times we returned a value is: 0
The number of values is: 1

When some of my coworkers saw this result, they immediately wanted me to open up GitHub and submit a bug report to the .NET team. They thought they had found a critical bug in the LINQ library.

The thing to realize is that when we print “iterations” the first time, we have only created the query; we haven’t executed it yet, so the value of iterations is still 0. Printing iterations again after the call to Count() gives results closer to what you expect:

Console.WriteLine("The number of times we returned a value is: " + iterations);
Console.WriteLine("The number of values is: " + joinedValues.Count());
Console.WriteLine("The number of times we returned a value now is: " + iterations);
Console.ReadLine();

Output:

The number of times we returned a value is: 0
The number of values is: 1
The number of times we returned a value now is: 1

Since we executed the query when we called joinedValues.Count(), the result selector ran and incremented the iterations variable, giving the result we initially expected.

A final word of warning on this, however: consider the following code modification. What do you think will be the output?

Console.WriteLine("The number of times we returned a value is: " + iterations);
// Loops forever; stop the program with Ctrl+C. Thread.Sleep needs "using System.Threading;".
while (true)
{
    Console.WriteLine("The number of values is " + joinedValues.Count());
    Console.WriteLine("The number of times we returned a value now is: " + iterations);
    Thread.Sleep(1000);
}

You can probably see where this is going:

The number of times we returned a value is: 0
The number of values is 1
The number of times we returned a value now is: 1
The number of values is 1
The number of times we returned a value now is: 2
The number of values is 1
The number of times we returned a value now is: 3
...

And so on and so on

Every time we call .Count() on our IEnumerable (joinedValues), we re-evaluate the query. Think about what that might mean if you wrote expensive code in your join like so:

var joinedValues = masters.Join(details,
                                x => x.MasterID,
                                y => y.MasterFK, // Detail exposes MasterFK, not MasterID
                                (x, y) =>
                                {
                                  iterations++;
                                  //Do some expensive work
                                  Thread.Sleep(10000);
                                  return new { Mas = x, Det = y };
                                });

Then every time you perform an operation on that query, you redo that expensive work.
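Re-evaluation has another consequence that a small sketch can make concrete: because the query runs again on each enumeration, it also picks up changes made to the source list after the query was defined. The example below is a simplified illustration, not code from the classes above; it uses a plain list of integers and a hypothetical predicateCalls counter in place of the join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReevaluationDemo
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };
        int predicateCalls = 0; // hypothetical counter, plays the role of "iterations"

        // Defining the query runs nothing yet.
        var evens = numbers.Where(n =>
        {
            predicateCalls++;
            return n % 2 == 0;
        });

        Console.WriteLine(evens.Count());  // 1: only 2 is even (predicate ran 3 times)

        numbers.Add(4);

        Console.WriteLine(evens.Count());  // 2: the query re-ran and now sees 4 (4 more calls)
        Console.WriteLine(predicateCalls); // 7
    }
}
```

Whether that behavior is a feature or a bug depends on whether you expected a live view of the data or a snapshot.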

So remember: if you want the code in your join to be executed immediately, or you are doing expensive work you don’t want to repeat, it is safest to call .ToList() on your LINQ query, or otherwise materialize it into a persistent collection such as an array.
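To make that advice concrete, here is a minimal sketch of the same kind of join with .ToList() appended. It is a simplification: plain integer keys stand in for the Master and Detail classes, and the names masterIds and detailFks are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ToListDemo
{
    static void Main()
    {
        // Simplified stand-ins for the masters and details lists.
        var masterIds = new List<int> { 1 };
        var detailFks = new List<int> { 1 };
        int iterations = 0;

        // ToList() executes the join once, right here, and stores the results.
        var joinedValues = masterIds.Join(detailFks,
                                          m => m,
                                          d => d,
                                          (m, d) =>
                                          {
                                              iterations++;
                                              return new { Mas = m, Det = d };
                                          })
                                    .ToList();

        Console.WriteLine(iterations);           // 1: the selector has already run
        Console.WriteLine(joinedValues.Count()); // 1: counts the stored list
        Console.WriteLine(joinedValues.Count()); // 1
        Console.WriteLine(iterations);           // still 1: nothing was re-evaluated
    }
}
```

The trade-off is that the list is a snapshot: later changes to the source lists will not show up in joinedValues unless you run the query again.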

