The TekPub LINQ Challenge And The Sieve Of Eratosthenes

In my last post about the TekPub LINQ Challenge I asked if anyone wanted to post an algorithm which was faster than the solution that Nate had posted up on his blog. There are a few known algorithms for finding positive prime numbers, and one of those is called the Sieve of Eratosthenes. Well, it turns out that Steve Strong had already stepped up to the challenge and implemented this algorithm. Well done Steve!

Explanation

The Sieve of Eratosthenes rules goes as follows (as explained by Wikipedia):

To find all the prime numbers less than or equal to a given integer n by Eratosthenes’ method:

Create a list of consecutive integers from two to n: (2, 3, 4, …, n).

Initially, let p equal 2, the first prime number.

Strike from the list all multiples of p greater than p.

Find the first number remaining on the list greater than p (this number is the next prime); let p equal this number.

Repeat steps 3 and 4 until p² is greater than n.

All the remaining numbers on the list are prime

Steve’s implementation is as follows:

int max = 100;
var primes = Enumerable.Range(1, max)
    .Where(i => i > 1)
    .Aggregate(Enumerable.Range(2, max - 1).ToArray(), (sieve, i) =>
        {
            if ((i > Math.Sqrt(max)) || (sieve[i - 2] == 0)) return sieve;
            for (int m = 2; m <= max / i; m++)
                sieve[i * m - 2] = 0;
            return sieve;
        })
    .Where(n => n != 0);

We start with a range from 1 to n, and then we get all of the numbers greater than 1. Next we create a range from 2 to max – 1. We pass this into the aggregate method as a seed (we will use this array to accumulate our result), and then we run an accumulator method which takes each number in our initial range and runs it through the sieve.

The sieve first checks to see if the current number of greater than the square root of max. Since max doesn’t change we could certainly pull this value out of the LINQ query so that we don’t keep calling Math.Sqrt. Since Steve was following the rules of the challenge, he just left it in like this. We then check to see if the current number in the array, which is at index "i – 2", is equal to zero. If it is zero, this means that it has already been removed from the list and doesn’t need to be checked. Next we loop through and remove all multiples of "i" in the list. And then return the sieve. We just keep doing this over and over until eventually we have removed all non-prime numbers. Then we just filter the list to return only the non-zero numbers, and there we have a list of primes. Pretty awesome!

Initial Results

So, at first glance it is hard to tell if this algorithm is going to be faster. And so we could analyze it and use big O notation to represent the performance of this algorithm, but I think in this case it’ll be easier to just run it. 🙂 Besides, the Wikipedia page lists the complexity of this algorithm as O(n(logn)(loglogn)) which I’m going to go out on a limb here and say that I wouldn’t be able to reach this. So, here are the numbers from Steve’s algorithms run up against the optimized brute force approach from my previous post (I ran the brute force approach numbers again, to make sure that everything was fair, and because we didn’t run a single threaded test on 20 million with the brute force approach last time):

Primes Up To	Brute Force	Sieve Of Eratosthenes
2000	.001 seconds	.007 seconds
200,000	.193 seconds	.041 seconds
2,000,000	3.3 seconds	.309 seconds
20,000,000	79 seconds	3.5 seconds

The numbers look great! It starts off a bit slower at the very beginning, but then quickly pulls ahead at the second test. By the time we reach 20 million, there is really no comparison. In this case, a little bit of research and forethought has improved the algorithm by an absurd amount. But let’s see if we can speed this up a bit more.

Optimizations

The first thing I am going to do is ditch the call to Math.Sqrt and move it outside of the LINQ statement. Steve was following the rules of the contest and so he left this inside of the LINQ expression. I am however not bound by any such rules. Since I created them!

var primes = Enumerable.Range(1, max)
    .Where(i => i > 1)
    .Aggregate(Enumerable.Range(2, max - 1).ToArray(), (sieve, i) =>
    {
        if ((i > maxSquaredRoot || (sieve[i - 2] == 0)) return sieve;
        for (int m = 2; m <= max / i; m++)
            sieve[i * m - 2] = 0;
        return sieve;
    })
    .Where(n => n != 0);

Next I am going to remove that check against the square root of max up into the range creation. This way we aren’t even feeding in values which are too big into the aggregate:

var primes = Enumerable.Range(1, (int)maxSquareRoot + 2)
    .Where(i => i > 1)
    .Aggregate(Enumerable.Range(2, max - 1).ToArray(), (sieve, i) =>
    {
        if (sieve[i - 2] == 0) return sieve;
        for (int m = 2; m <= max / i; m++)
            sieve[i * m - 2] = 0;
        return sieve;
    })
    .Where(n => n != 0);

Then I’ll change my range to start at 2 so that I can remove the "Where" filter completely:

var primes = Enumerable.Range(2, (int)maxSquareRoot + 2)
    .Aggregate(Enumerable.Range(2, max - 1).ToArray(), (sieve, i) =>
    {
        if (sieve[i - 2] == 0) return sieve;
        for (int m = 2; m <= max / i; m++)
            sieve[i * m - 2] = 0;
        return sieve;
    })
    .Where(n => n != 0);

Optimized Results

So, we have implemented a few simple optimizations, let’s look at some new numbers:

Primes Up To	Sieve Of Eratosthenes (Original)	Sieve Of Eratosthenes (With Optimizations)
2000	.007 seconds	.011 seconds
200,000	.041 seconds	.029 seconds
2,000,000	.309 seconds	.190 seconds
20,000,000	3.5 seconds	2.2 seconds

Great! It shaved even more off the already very quick times. Do you see anymore quick optimizations that we could make to this to speed it up even more?

Summary

I guess this goes to show you that while premature optimization may be the root of all evil, if you need performance in your application, it may behoove you to do a bit of research before you jump right in to implementing a naive solution in your application. The key is to test and benchmark to make sure that you really do need the optimizations before you start spending time on it. I hope you have found this little exercise interesting, I know that I sure have.

Loved the article? Hated it? Didn’t even read it?

We’d love to hear from you.

Reach Out

Comments (10)

Sidney Andrews says:

January 12, 2010 at 1:05 pm

That’s pretty awesome, never heard of that technique to find Prime numbers before.

Conker says:

January 12, 2010 at 2:25 pm

Justin, can you post results by using parallel LINQ ?

Gene says:

January 12, 2010 at 3:33 pm

I’m also curious about the parallel results.

alwin says:

January 13, 2010 at 6:33 pm

You could cache the
[quote]Enumerable.Range(2, max – 1).ToArray()[/quote]
just as you did with
[quote]Math.Sqrt(max)[/quote]

Justin Etheredge says:

January 13, 2010 at 7:05 pm

@alwin It might be deceptive, but the range is being passed in as a seed to the aggregate method, so it is only being executed once.

Jonas Elfström says:

January 14, 2010 at 2:56 pm

I tried to parallelize the inner for loop like this:

Parallel.For(2, max / i + 1, m => { sieve[i * m – 2] = 0; });

Before doing that the load was on one CPU only and by using Parallel.For it was almost 100% on both (Core 2 Duo). Sadly the speedup was marginal, from 2.2s to 2.1s. Maybe we’ve hit the memory bandwidth wall?

Justin Etheredge says:

January 14, 2010 at 3:12 pm

@Jonas I think that the operation happening in this loop is too fast, and so the overhead of making it parallel pretty much cancels out any performance gains you might get.

Jonas Elfström says:

January 14, 2010 at 5:45 pm

I implemented The Sieve of Atkin. Not in LINQ but it scales.
See:
http://alicebobandmallory.com/articles/2010/01/14/prime-factorization-in-parallel

Jason Clark says:

January 15, 2010 at 12:13 am

You could change the inner for loop to not use multiplication. The for loop is really just marking ever "i" element 0.

int max = 100;
int maxSquareRoot = (int)Math.Sqrt(max);

var primes = Enumerable.Range(2, maxSquareRoot + 2)
.Aggregate(Enumerable.Range(2, max – 1).ToArray(), (sieve, i) =>
{
int m = i – 2;
if (sieve[m] == 0) return sieve;
while((m += i) <= max – 1)
sieve[m] = 0;
return sieve;
})
.Where(n => n != 0);

guest says:

October 13, 2011 at 12:48 pm

this is my solution..i know it includes some hardcoding values but it is only one line

var primes = Enumerable.Range(1, 20150).Where(x => x == 2 || x == 3 || x == 5 || x == 7 || (x> 10 && x%2 != 0 && x%3 != 0 && x%5 !=0) );

what do you think?

The TekPub LINQ Challenge And The Sieve Of Eratosthenes

Explanation

Initial Results

Optimizations

Optimized Results

Summary

Comments (10)

Leave a Reply Cancel reply

Leave a comment

More Insights

The Philosopher King, the Demiurge and the Programmer

The Path for Energy Transformation: Vision and Purpose

Running Celery 5 on Windows

We're here to help.