Infinite Lists With C# Yield

Writing

First I have to give props because I saw a post about the yield keyword on the ytechie blog and it is what spurned me to elaborate a bit on the benefits and uses of the yield keyword. I have actually blogged before about the yield keyword, but I don’t feel bad bringing it up again since it is truly one of the coolest features in the C# language.

So anyways, what we are about to talk about here may seem like a purely academic exercise, but hopefully I will convince you that this isn’t exactly the case. But first, in order to discuss this topic we are going to need to define exactly what we mean by an infinite list. Instead of giving you a definition, I am just going to give you the example of the set of natural numbers. Natural numbers are simply the set of counting numbers 1, 2, 3, 4, 5, etc…

If we wanted to write a small program to produce this list, then you could write something like this (I’m using a BigInt class provided by Carl Johansen):

static void Main()
{
    foreach (BigInt value in NaturalNumbers())
    {
        Console.WriteLine(value);
    }

}

public static IEnumerable<BigInt> NaturalNumbers()
{
    var result = new List<BigInt>();
    BigInt value = 0;

    for(;;)
    {
        value++;
        result.Add(value);
    }

    return result;
}

Which while being somewhat logically correct, will never do anything because not only will we run out of memory, but even if we did have infinite memory the loop would still never complete. And yes, this code will compile, in case you were going to ask. All parts of the “for” loop are optional.

Now if we want this to actually run, then we could make a few small modifications like this:

static void Main()
{
    foreach (BigInt value in NaturalNumbers())
    {
        Console.WriteLine(value);
    }
}

public static IEnumerable<BigInt> NaturalNumbers()
{            
    BigInt value = 0;
    for(;;)
    {
        value++;
        yield return value;
    }
}

That is quite delicious, it is rare that we get to remove code in order to get it to work! But how is this working? Well, the “yield return” statement essentially returns one value at a time back from the function. This is a form of continuation, or a really weak form of coroutine.

Now let’s say that we want to filter our numbers to only even numbers, we can do this:

static void Main()
{
    foreach (BigInt value in EvenNumbers(NaturalNumbers()))
    {
        Console.WriteLine(value);
    }
}

public static IEnumerable<BigInt> NaturalNumbers()
{            
    BigInt value = 0;
    for(;;)
    {
        value++;
        yield return value;
    }
}

public static IEnumerable<BigInt> EvenNumbers(IEnumerable<BigInt> numbers)
{
    foreach (BigInt number in numbers)
    {
        if (number % 2 == 0)
        {
            yield return number;
        }
    }
}

One by one each of our natural numbers will be passed into “EvenNumbers” where they will be skipped if they are odd and returned if they are even. But we can take this a bit further by using extension methods…

public static class Extensions
{
    public static IEnumerable<BigInt> EvenNumbers(this IEnumerable<BigInt> numbers)
    {
        foreach (BigInt number in numbers)
        {
            if (number % 2 == 0)
            {
                yield return number;
            }
        }
    }
}

So now our filtering looks more natural:

static void Main()
{
    foreach (BigInt value in NaturalNumbers().EvenNumbers())
    {
        Console.WriteLine(value);
    }
}

But we don’t necessarily have to write our own extension, it is almost as easy to just use the Linq “Where” method:

static void Main()
{
    foreach (BigInt value in NaturalNumbers().Where(n => n % 2 == 0))
    {
        Console.WriteLine(value);
    }
}

See how easy that is! Now what if we don’t want to take every single even number, we only need the first thousand? Well, that is extremely easy as well:

static void Main()
{
    foreach (BigInt value in NaturalNumbers().Where(n => n % 2 == 0).Take(1000))
    {
        Console.WriteLine(value);
    }
}

Almost seems wrong how easy it is. But I originally said that this wasn’t going to be a purely academic exercise, so either I have something else to show you or I lied. Well, I do have something else to show you.

So let’s think about where in real life we hit infinite lists. The first thing that comes to my mind is when processing streams or when checking queues for work items. We are going to look at the queue, but first we are going to have to implement a simple thread-safe queue:

public class ThreadSafeQueue<T>
{
    private Queue<T> queue = new Queue<T>();

    public  int Count
    {
        get
        {
            lock (queue)
            {
                return queue.Count;
            }    
        }            
    }

    public void Enqueue(T item)
    {
        lock (queue)
        {
            queue.Enqueue(item);    
        }            
    }

    public T Dequeue()
    {
        lock (queue)
        {
            return queue.Dequeue();    
        }            
    }

    public T Peek()
    {
        lock (queue)
        {
            return queue.Peek();
        }
    }        
}

Very simple and written in 2 seconds, so don’t critique 🙂 Now let’s think about what we want to do here, we need to start processing all items in the queue and then wait if we have nothing left to do. Using “yield return” this is an incredibly simple thing to do. But first we have to get items on the queue, so we will define a method which will add items at random intervals:

public static void AddItemsToQueue(Object state)
{
    var rand = new Random();
    
    var queue = (ThreadSafeQueue<BigInt>) state;
    BigInt value = 0;
    for (;;)
    {
        value++;
        queue.Enqueue(value);
        Thread.Sleep(rand.Next(100, 1000));
    }
}

Now we can define our method which will process the items in the queue:

public static IEnumerable<T> ProcessQueue<T>(ThreadSafeQueue<T> queue)
{
    for(;;)
    {
        if (queue.Count == 0)
        {
            Thread.Sleep(3000);
        }
        yield return queue.Dequeue();
    }
}

Now all we have to do is use them:

static void Main()
{
    var queue = new ThreadSafeQueue<BigInt>();
    
    ThreadPool.QueueUserWorkItem(AddItemsToQueue, queue);
   
    foreach (BigInt value in ProcessQueue(queue))
    {
        Console.WriteLine(value);
    }
}

And there we go! Let’s walk through it. First we create our queue and we pass it to a background thread in order to start randomly adding items to it. Then when we hit the foreach loop we start pulling items one by one off of the queue, pausing when the queue is empty and then resuming when more items are in the queue. We are currently pausing three seconds when the queue is empty, but there really isn’t any reason to do that in a production application.

So what we have seen here is that if you think of an infinite list as not just a huge list of repeating numbers and think of it as any process that has a potentially unending stream of data that needs to be processed in discreet chunks, then using the yield keyword may be for you. Even though what we implemented above wouldn’t have been terribly complicated to implement in other ways, you can start thinking about how much easier it would make your code if you needed to do something like pull off all data out of a stream until you saw a particular character and then return that data, but keep monitoring the stream in order to return data once the character is seen again. There are numerous clever ways in which yield return can be used, it is just a matter of finding them. Let me know the best uses you have come up with!

Comments (10)

  1. This is not an academic excercise. This concept was Actually introduced as a concept for uncrackable encryption. Essentially one party sends a continuous stream of data masked in graphics I’d audio etc nfinitely and inside the continuous stream there are message signatures inside that can be picked off. So this actually has uses.

  2. Just found your blog and have read through some entries. Love it.

    I can’t really not keep quiet about your Thread Safe queue 🙂

    Naming it thread safe implies that it’s thread safe, but it’s really not.

    Take this code sample for example:

    ThreadSafeQueue<User> queue = new ThreadSafeQueue<User>();

    User user;
    if (queue.Count > 0)
    user = queue.Dequeue();
    else
    user = null;

    That code could throw an exception, since another thread might Dequeue between the Count and and the Dequeue call.

    You could provide a SyncRoot property to let the user do syncronization, or simply change your Dequeue to return null if there are no more objects. Won’t work with value types, but who gives a shit? 😉

  3. @Jonas Thanks for the comment! And yes, you are right, it isn’t truly thread-safe. I should have exposed the SyncRoot! Changing Dequeue to return null wouldn’t work, I’d have to return "default" since T could be a primitive type. And that would mean that it would return 0 for things like int, which is probably not acceptable! 🙂

    Nice catch!

  4. I am still coming to grips with the yield.

    So what is the advantage of a yield over a normal queue?
    Why not just pop items out if a queue in a loop and have another threat add them to the queue?
    What advantage is there of of putting the whole thing in an IEnumerator/ yield?

  5. Everyone who writes higehr order functions (=functions that take other functions as parameters and/or return functions) must have written this ForEach extension method on IEnumerable for himself already.Think I saw a blog post about the reason why it isn’t in the BCL already (something to do with duplicating the foreach statement or something, giving people more than one way to do the exact same thing, which could get confusing), but I think it should have been in the BCL. Or List.ForEach() should not have existed, right?

Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *

More Insights

View All