Gotchas with lazy evaluation

Lazy evaluation can be really useful.  It means that code isn't executed until its result is actually needed, so you can save the system from doing unnecessary work.  There are times, however, when you need to be aware of what is going to be evaluated lazily.  Take for example the following database reader.

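using System.Collections.Generic;
using MySql.Data.MySqlClient; // assumes the MySQL Connector/NET package

public class DatabaseReader
{
    // Iterator method: nothing below runs until the caller starts enumerating.
    public IEnumerable<string> ReadNamesFromDatabase()
    {
        using (var connection = new MySqlConnection("Server=myServerAddress;Database=myDataBase;Uid=myUsername;Pwd=myPassword;"))
        {
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = "SELECT name FROM products;";
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Each name is handed to the caller while the connection is still open.
                    yield return reader.GetString(reader.GetOrdinal("name"));
                }
            }
        }
    }
}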

I hope you wouldn't actually write code like this, as it holds the database connection open for as long as the calling code takes to work through the results.  The same pattern could equally be reading from a file on disk or performing any other relatively expensive operation.  Let's say we then use this DatabaseReader in the following code.

public class ProductRepository
{
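    // Initialized here so that GetProducts() does not throw a NullReferenceException.
    public DatabaseReader DatabaseReader = new DatabaseReader();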

    public IEnumerable<Product> GetProducts()
    {
        return DatabaseReader.ReadNamesFromDatabase().Select(name => new Product(name));
    }
}

public class Product
{
    public string Name { get; private set; }

    public Product(string name)
    {
        Name = name;
    }
}

We then consume the repository in the following code.

public void Consume()
{
    var products = new ProductRepository().GetProducts();
    foreach (var product in products)
    {
        Console.WriteLine(product.Name);
    }
    Console.WriteLine("Total number " + products.Count());
}

This will actually cause the database to be read twice: once when we print out the name of each product and once when we count the total number.  This is because LINQ uses lazy evaluation by default.  The Select query is not evaluated until something enumerates its result, and it is evaluated again every time the result is enumerated.  We can fix this by forcing eager evaluation of the LINQ query.  An easy way to do this is to use the ToList() method, which iterates the sequence completely and collects every item in a List<T>.  It is a good choice because it is generic and a List<T> can be returned as an IEnumerable<T> directly.  This leaves us with the following code in our product repository.

public class ProductRepository
{
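    // Initialized here so that GetProducts() does not throw a NullReferenceException.
    public DatabaseReader DatabaseReader = new DatabaseReader();

    public IEnumerable<Product> GetProducts()
    {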
        return DatabaseReader.ReadNamesFromDatabase().Select(name => new Product(name)).ToList();
    }
}
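
To see the difference without a real database, here is a minimal sketch that swaps the reader for an in-memory iterator and counts how often it is started.  The names MultipleEnumerationDemo, ReadNames and CountReads are made up for this illustration and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // How many times the (pretend) expensive source has been read from the start.
    public static int ReadCount;

    // Hypothetical stand-in for ReadNamesFromDatabase(); no database is involved.
    public static IEnumerable<string> ReadNames()
    {
        ReadCount++; // runs each time enumeration starts, not when the method is called
        yield return "apple";
        yield return "banana";
    }

    // Enumerates the same query twice and reports how many reads that caused.
    public static int CountReads(bool eager)
    {
        ReadCount = 0;
        IEnumerable<string> names = ReadNames().Select(n => n.ToUpperInvariant());
        if (eager)
        {
            names = names.ToList(); // enumerate once and keep the results in memory
        }

        foreach (var name in names) { } // first pass
        names.Count();                  // second pass over the same variable

        return ReadCount;
    }

    public static void Main()
    {
        Console.WriteLine("Lazy reads:  " + CountReads(eager: false)); // prints "Lazy reads:  2"
        Console.WriteLine("Eager reads: " + CountReads(eager: true));  // prints "Eager reads: 1"
    }
}
```

The trade-off is memory: ToList() holds every item at once, so it suits results that are small enough to keep around.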

Lazy evaluation is an important concept to be aware of, but it is also what allows LINQ to perform very well in certain situations.  It lets us write expressive queries by chaining operators together without paying for each step separately.  Let's look at an example of this.

public void GetSum()
{
    var sa = new[] {
      "#33",
      "#22",
      "% this is text",
      "#18"
    };

    var sum = sa.Where(s => s.StartsWith("#")).Select(s => s.Substring(1)).Select(s => int.Parse(s)).Sum();

    Console.WriteLine(sum);
}

This is a very simple example, but already you can see the problem we can get into by chaining together multiple LINQ queries: we soon end up with one line of code that, although functionally correct, is extremely hard to read.  Fortunately, because LINQ is lazily evaluated, we can split this out into multiple lines and it is just as efficient as if we had written it on a single line.  Although the queries are now separate statements, each one returns a lazy sequence rather than a collection, so no intermediate collections are materialized.

public void GetSum()
{
    var sa = new[] {
      "#33",
      "#22",
      "% this is text",
      "#18"
    };

    var justNumbersAsStrings = sa.Where(s => s.StartsWith("#")).Select(s => s.Substring(1));

    var numbersAsInts = justNumbersAsStrings.Select(s => int.Parse(s));

    var sum = numbersAsInts.Sum();

    Console.WriteLine(sum);
}
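
One way to convince yourself that the split version does no extra work is to count the calls to the parsing step.  The sketch below does that; DeferredSumDemo, ParseAndCount and Run are names invented for this illustration.

```csharp
using System;
using System.Linq;

public static class DeferredSumDemo
{
    // Number of times the parsing step has actually executed.
    public static int ParseCalls;

    private static int ParseAndCount(string s)
    {
        ParseCalls++;
        return int.Parse(s);
    }

    // Returns the parse-call count before and after Sum(), plus the sum itself.
    public static (int before, int after, int sum) Run()
    {
        ParseCalls = 0;
        var sa = new[] { "#33", "#22", "% this is text", "#18" };

        var justNumbersAsStrings = sa.Where(s => s.StartsWith("#")).Select(s => s.Substring(1));
        var numbersAsInts = justNumbersAsStrings.Select(s => ParseAndCount(s));

        int before = ParseCalls;       // still 0: the queries have only been described
        int sum = numbersAsInts.Sum(); // the whole pipeline runs here, in a single pass
        int after = ParseCalls;        // one call per line that starts with '#'

        return (before, after, sum);
    }

    public static void Main()
    {
        var (before, after, sum) = Run();
        Console.WriteLine($"before={before}, after={after}, sum={sum}");
        // prints "before=0, after=3, sum=73"
    }
}
```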

LINQ is where you will come across lazy evaluation most often, because its query operators defer execution until the result is enumerated.  However, you will also see it whenever you chain together your own methods that return IEnumerable<T> by using the yield keyword.
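
As a rough illustration of hand-written iterators behaving the same way, the sketch below chains two yield-based methods.  YieldChainDemo, NaturalNumbers and OnlyEven are hypothetical names; the endless loop only terminates because each value is produced on demand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YieldChainDemo
{
    // An endless sequence: only safe to use because it is produced lazily.
    public static IEnumerable<int> NaturalNumbers()
    {
        int n = 1;
        while (true)
        {
            yield return n++;
        }
    }

    // A hand-written filter using yield, similar in spirit to Where().
    public static IEnumerable<int> OnlyEven(IEnumerable<int> source)
    {
        foreach (var value in source)
        {
            if (value % 2 == 0)
            {
                yield return value;
            }
        }
    }

    // Take(3) stops asking for values once it has three, so the chain ends.
    public static List<int> FirstThreeEven()
    {
        return OnlyEven(NaturalNumbers()).Take(3).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstThreeEven())); // prints "2, 4, 6"
    }
}
```

Calling ToList() directly on NaturalNumbers() would never finish, which is the same gotcha seen from the other side: eager evaluation is only safe when the sequence is known to end.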

Fortunately, if you have ReSharper installed, it's clever enough to work out when lazy evaluation could be an issue and will give you a helpful hint to warn you about it ("Possible multiple enumeration of IEnumerable").

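An interesting bit of related trivia: databases have a famous bug of their own in this area, known as the Halloween problem, in which an update that modifies rows while it is still scanning them can end up visiting the same row more than once.  The name has nothing to do with the bug being spooky; the IBM researchers who first noticed it simply happened to do so on October 31st, 1976.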