.NET 4.0 and ConcurrentBag

Inside of .NET 4.0 there are numerous new namespaces, but the one that we are here to talk about today is System.Collections.Concurrent. This namespace contains a handful of types that implement different kinds of thread-safe collections. The thread-safe collection type that we are going to take a look at today is the ConcurrentBag<T>.

The ConcurrentBag<T> is one of the simplest of the concurrent types which have been introduced. It is a typical bag data structure (also known as a multiset), meaning that it is an unordered collection of items that allows duplicates. It is called a bag because if you were to draw a diagram of the data structure, it would look something like this:

[Figure: a bag holding an unordered jumble of items]

Just a jumble of items, with no order at all. It has three important methods: “Add”, “TryTake”, and “TryPeek”. Add works exactly as you would expect: you simply instantiate a ConcurrentBag instance and then call Add in order to put items into it:

var cb = new ConcurrentBag<string>();
cb.Add("test");

“TryTake” allows us to remove an item from the bag and return it to the caller. It has the signature “bool TryTake(out T result)”, which means that if the bag is empty, false will be returned. If we wanted to take a single item from the ConcurrentBag, we would do it like this:

string val;
if (cb.TryTake(out val))
{
    Console.WriteLine(val);
}

“TryPeek” allows us to look at the next item without removing it from the bag. This isn’t quite as useful in a multi-threaded scenario, since you could peek at an item only to have another thread remove it before you get to it. Keep that in mind while you are implementing. But if you wanted to peek at an item, you could do it like this:

string val;
if (cb.TryPeek(out val))
{
    Console.WriteLine(val);
}

ConcurrentBag also implements IEnumerable<T>, so you can iterate over it in the same way that you would any class that supports IEnumerable (and also execute LINQ queries against it). One big difference is that this operation is thread-safe as well, and you can enumerate the ConcurrentBag while some other thread is adding items to it. (I am using Tasks from the System.Threading.Tasks namespace in order to fire off async operations. Go read my post on Tasks if you have never used them.)

var cb = new ConcurrentBag<string>();

// Producer: keeps adding items while the other task enumerates.
Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 1000; i++)
    {
        cb.Add("test");
    }
    cb.Add("Last");
});

// Consumer: enumerates a snapshot of the bag and writes it out.
Task.Factory.StartNew(() =>
{
    foreach (string item in cb)
    {
        Console.WriteLine(item);
    }
}).Wait();

In the above code we are firing off one task which will start adding items to the bag, and then we fire off a second task which starts writing them out to the console. In order to make enumeration thread-safe, the GetEnumerator method returns a moment-in-time snapshot of the ConcurrentBag as it was when you started iterating over it. This way, any items added after the enumeration started won’t be seen while iterating over the data structure. The snapshot is made by traversing the entire bag and copying each item into an array, effectively a “ToArray” call. In the code above, you are unlikely to ever see “Last” written out to the console, since the enumeration will almost certainly start before it is added to the bag.
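
To make the snapshot behavior concrete, here is a small sketch (the variable names and the final LINQ query are mine, and I’m assuming .NET 4.0 with System.Linq available): items added after the foreach begins are never seen inside that loop, even when they are added from the very same thread.

using System;
using System.Collections.Concurrent;
using System.Linq;

class SnapshotDemo
{
    static void Main()
    {
        var bag = new ConcurrentBag<int>();
        bag.Add(1);
        bag.Add(2);

        foreach (int item in bag)
        {
            // Adding while enumerating is safe, but the new items won't show
            // up in this loop because we are walking a snapshot of the bag.
            bag.Add(item + 10);
            Console.WriteLine(item); // prints only 1 and 2 (in some order)
        }

        // LINQ works against the bag as well; this sees all four items.
        Console.WriteLine(bag.Count(x => x > 0)); // 4
    }
}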

Two final operations that you can perform on the ConcurrentBag are checking whether it is empty and checking the number of items currently contained inside it. These are performed using the IsEmpty and Count properties respectively. I would explain these, but I think that you can probably figure them out. It is worth noting that when a ConcurrentBag is shared between threads, you can’t rely on the results of these checks, since in the time between when you check and when you act on the result, another thread could have added or removed items.
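
As a quick sketch of how you might use them (the names here are my own), the snippet below shows both properties, followed by the pattern I would lean on instead when other threads might be touching the bag: letting TryTake do the check and the removal in a single call avoids the race described above.

var bag = new ConcurrentBag<string>();
bag.Add("one");
bag.Add("two");

Console.WriteLine(bag.IsEmpty); // False
Console.WriteLine(bag.Count);   // 2

// Racy when the bag is shared: it could be drained by another
// thread between the IsEmpty check and the TryTake call.
if (!bag.IsEmpty)
{
    string item;
    bag.TryTake(out item);
}

// Safer: TryTake checks and removes in one step, so this loop
// simply stops once the bag runs dry.
string value;
while (bag.TryTake(out value))
{
    Console.WriteLine(value);
}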

Another very interesting fact about the ConcurrentBag is the way in which it implements its thread safety. The ConcurrentBag is optimized for multiple threads to all be adding and removing items at the same time. Because of this, internally the ConcurrentBag keeps a local queue of items for each thread that is accessing it. This is done using an internal type called ThreadLocalList. So while the conceptual diagram above showed the ConcurrentBag as a big jumble of items, in reality it looks more like this:

[Figure: the ConcurrentBag internally, with one thread-local list of items per accessing thread]

This way, when multiple threads are accessing the bag, as long as each thread is hitting its own queue, no locking needs to happen, since only one thread ever touches that queue. However, if one thread goes to take an item and finds that its local queue is empty, it will then perform some locking and look at one of the queues belonging to another thread in order to try to find an item. In this way, if each thread is producing and consuming its own items, you can get very high throughput with no locking. People will often refer to this as a “work stealing algorithm”, since once a thread finishes up its own items, it will try to steal items from other threads’ queues.
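
To round this out, here is a rough sketch of the usage pattern the bag is optimized for (the number of tasks and the item counts are arbitrary choices of mine, and this illustrates the usage pattern, not the library internals): each task adds its own items and then consumes them, so most operations hit that task’s thread-local list without locking, and stealing only kicks in when a task’s own list runs dry.

var bag = new ConcurrentBag<int>();

var workers = new Task[4];
for (int w = 0; w < workers.Length; w++)
{
    workers[w] = Task.Factory.StartNew(() =>
    {
        // Produce onto this thread's local list...
        for (int i = 0; i < 1000; i++)
        {
            bag.Add(i);
        }

        // ...then consume, mostly from that same local list. If it runs
        // dry, TryTake falls back to stealing from another thread's list,
        // which is where the locking happens.
        int item;
        while (bag.TryTake(out item)) { }
    });
}

Task.WaitAll(workers);
Console.WriteLine(bag.Count); // 0 - every item ends up taken by some task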

Summary

The ConcurrentBag has a very simple API, but its ability to hold an unordered, arbitrary number of items and allow efficient thread-safe access makes it an invaluable item in your parallel programming tool belt. Stay tuned; in my next post we will be looking at another new member of the System.Collections.Concurrent namespace, the ConcurrentStack.

Comments (13)

  1. The MSDN docs disagree with your statement:

    "The ConcurrentBag is optimized for multiple threads to all be adding and removing items at the same time."

    MSDN says:

    "ConcurrentBag(T) is a thread-safe bag implementation, optimized for scenarios where the same thread will be both producing and consuming data stored in the bag."

    As you write later, the bag is indeed efficient when more than one thread is accessing it – if they are working on a private set of items – but MSDN insists it’s been optimised for single threaded access.

  2. @Rik It hasn’t been optimized for single-threaded access. It has been optimized for a bunch of threads both adding and removing their own items. So a bunch of threads that are each working with their own data set. Once threads start stealing work from each other, performance will suffer. It is worded a bit oddly on MSDN.

  3. @Justin That’s good to know – their documentation made me feel slightly depressed.

    Also, I’m becoming more prone to writing things incorrectly as time goes on. Ten years ago I’m sure I’d never have accidentally written ‘loose’ instead of ‘lose’. I blame other people; it can’t possibly be my brain that’s faulty.

  4. Hey Justin,
    just wanna thank you for your great article and the whole blog. Also I love your screencasts at TekPub.

    Thanks for your effort.

    Greets from Germany 🙂
