The TekPub LINQ Challenge Part 2 – Faster Algorithms


Before challenging anyone else to the TekPub LINQ challenge, I felt I had to create my own solution first. What I came up with is an algorithm that works, but is terribly inefficient. There are a few optimizations that can be made to speed it up, though.

My original solution looked like this:

// Brute force: a number i is prime if nothing from 2 up to i - 1 divides it evenly.
var primes = Enumerable.Range(1, 20)
    .Where(i => i != 1 && !Enumerable.Range(2, i - 2).Any(j => i % j == 0));

Here we take our range, first check that the number isn't 1 (since 1 is not prime!), then take the range from 2 up to one less than the number and use the "Any" LINQ extension method to check whether any of those numbers divides evenly into it. A brute force approach that works, but it is far from efficient.
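
As a quick sanity check (assuming a console app with System and System.Linq in scope), enumerating the query should produce the eight primes up to 20; the query itself is lazy, so nothing runs until it is enumerated:

Console.WriteLine(string.Join(", ", primes));
// prints: 2, 3, 5, 7, 11, 13, 17, 19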

Let’s Pull In Someone A Bit Smarter

Well, my friend Nate over at Kohari.org created an implementation very similar to mine, but with one major difference: it leverages the fact that to determine whether a number is prime, you only need to check for divisors no larger than the square root of that number. Like this:

var primes = Enumerable.Range(1, 20)
    .Where(x => x != 1 &&
      !Enumerable.Range(2, (int)Math.Sqrt(x)).Any(y => x != y && x % y == 0));

This is a very interesting property, so let's look at it a bit closer. Say we want to check whether 10 is prime. The square root of 10 is about 3.2, so we only need to check whether 10 is evenly divisible by 2 or by 3. So why is this? Well, it is fairly simple, but not immediately obvious.

If we are only checking 2 and 3 to see if 10 is prime, then what about 5? Well, 10 is evenly divisible by 5, so there must be some number n where 5 x n = 10; that number is 2. Since 2 is smaller than 5, we never need to try dividing 10 by 5, because we will already discover that 10 is not prime when we divide by 2. More generally, divisors come in pairs: if a x b = x, the smaller of a and b can never be larger than the square root of x, otherwise their product would exceed x. So if x has any divisor at all, it has one that is no larger than its square root, and that is the only range we need to search. Pretty cool, huh?
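
Here is a small illustration of that pairing (a sketch of mine, again assuming System and System.Linq are in scope); 36 is a handy example because it is a perfect square, which is exactly why the bound has to include the square root itself:

int n = 36;
// The smaller member of every divisor pair (a, n / a) satisfies a * a <= n,
// so checking candidates up to Math.Sqrt(n) is enough to find a factor if one exists.
var pairs = Enumerable.Range(2, n - 3)
    .Where(a => n % a == 0 && a * a <= n)
    .Select(a => a + " x " + (n / a));
Console.WriteLine(string.Join(", ", pairs)); // 2 x 18, 3 x 12, 4 x 9, 6 x 6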

As you can imagine, this has a huge impact on the performance of this algorithm. How much? Well, I’m glad you asked.
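
A simple way to time these queries is a Stopwatch harness along these lines (my rough sketch, not the original benchmark code; the Count() call is there to force the lazy query to actually execute):

var limit = 2000000;
var watch = System.Diagnostics.Stopwatch.StartNew();
var count = Enumerable.Range(1, limit)
    .Where(x => x != 1 &&
      !Enumerable.Range(2, (int)Math.Sqrt(x)).Any(y => x != y && x % y == 0))
    .Count();
watch.Stop();
Console.WriteLine(count + " primes found in " + watch.Elapsed);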

Primes Up To    My Algorithm                  Nate's Algorithm
2,000           .011 seconds                  .001 seconds
200,000         31 seconds                    .2 seconds
2,000,000       2,400 seconds (40 minutes)    4.5 seconds

Let’s Use All Of Those Cores

Wow. You really start to see the difference when you hit 2 million! Hmmm, now what would happen to these numbers if we were to use Nate’s algorithm, but with parallel LINQ? All we would have to do is add “AsParallel” to the LINQ query like this:

var primes = Enumerable.Range(1, 20).AsParallel()
    .Where(x => x != 1 &&
      !Enumerable.Range(2, (int)Math.Sqrt(x)).Any(y => x != y && x % y == 0));

And look at what we get!

Primes Up To    Nate's Algorithm With Parallel LINQ
2,000           .029 seconds
200,000         .155 seconds
2,000,000       2.6 seconds
20,000,000      65 seconds

As expected, the parallel version is quite a bit slower for the smaller inputs because of the overhead of parallelization. Once we get into larger numbers, though, my dual-core machine sees an almost 2x speedup.
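
One practical aside: unlike plain LINQ to Objects, a PLINQ query does not guarantee that results come back in their original order, and it is still lazily evaluated. If ordering matters, AsOrdered() restores it at a small cost, and for timing purposes the query needs to be forced to run, for example with ToList() or Count(). A sketch of what that might look like:

var primes = Enumerable.Range(1, 20000000).AsParallel().AsOrdered()
    .Where(x => x != 1 &&
      !Enumerable.Range(2, (int)Math.Sqrt(x)).Any(y => x != y && x % y == 0))
    .ToList(); // forces the parallel query to actually execute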

Summary

So there you have it. A tiny bit of research and a little bit of parallel LINQ goodness has led to a speedy algorithm that beats my naive implementation by an absolutely absurd margin. Now, there are other algorithms out there that are even more efficient, especially for larger primes. What do you all think? Are you up for creating a LINQ query that will beat this algorithm? What are your trade-offs? How does your algorithm work? Let us all know!


Comments (9)

  1. Why not enumerate from 2 so that you don't have to check x != 1 20,000,000 times?

    var primes = Enumerable.Range(2,20000000).AsParallel()
    .Where(x => !Enumerable.Range(2, (int)Math.Sqrt(x)).Any(y => x != y && x % y == 0));

    I’m not sure of the overhead of AsParallel() but you could also try to filter out the even numbers before AsParallel with something like
    Enumerable.Range(2,20000000).Where(n => n % 2 != 0) – see http://alicebobandmallory.com/articles/2009/10/06/the-thrush-combinator-in-c for more examples.

    You will lose the only even prime number from the result but that could be added manually.

  2. I’m happy you discovered that there is no reason to look for divisors past sqrt(x). This is basic 6th grade math, dude 🙂

    The second query can easily be simplified and will be a lot faster, especially for larger numbers.

    The initial range can start from 2 – this removes the x != 1 check. The x != y check can be removed from the inner range as well:

    var primes = Enumerable.Range(2, 200-1).Where(x => !Enumerable.Range(2, (int)Math.Sqrt(x)-1).Any(y =>x % y == 0));

    Now, is it possible to write a query that reuses the prime numbers found in the "previous" searches, using the fact that you only need to check whether a number is divisible by prime numbers?

  3. @Jonas Good idea, it’ll be interesting to see at what point filtering the even numbers is less overhead than performing the operation. And yes, the initial range could have started at 2, but the rules of the challenge were that it had to start at 1.

    @Conker I’m glad that you remember every rule from your 6th grade math 🙂 My immediate reaction was that I only needed to go up to n/2, but obviously this isn’t the case. And yes, I suppose that we could implement a system where we could store previous primes and use those to divide against. Care to take a stab at it?

  4. I found a solution that uses such an approach at http://tunatoksoz.com/post/Tekpube28099s-Mastering-Linq-Challange.aspx.
    Unfortunately, while it looks good on paper, in reality it’s far, far worse than your and Nate’s approaches. I’m not sure if it’s poor optimization by the LINQ engine in this case, or some kind of memory allocation problem.
    Another related link: http://blogs.microsoft.co.il/blogs/sasha/.
    He suggests using the Sieve of Eratosthenes (http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) – maybe someone can implement it strictly according to the rules?

  5. @Justin: You should try the Memoization technique I mentioned at http://www.jprl.com/Blog/archive/development/2010/Jan-08.html.

    In particular, you’ll want Bart’s Memoize() extension method: http://bartdesmet.net/blogs/bart/archive/2008/10/21/memoization-for-dummies.aspx

    Specifically:

    static Func<T, R> Memoize<T, R>(this Func<T, R> f)
    {
       Dictionary<T, R> cache = new Dictionary<T, R>();
       return t => {
          if (cache.ContainsKey(t))
             return cache[t];
          else
             return (cache[t] = f(t));
       };
    }

    The downside here is that when you’re processing 20million elements, that Dictionary is going to be large — it will contain ~1189680 keys (as per http://primes.utm.edu/howmany.shtml), requiring at minimum 9.5MB of memory (likely more). This might actually be a reasonable tradeoff here, assuming that the dictionary lookup (~O(1)) is faster than the computation it replaces.

    Care to test the performance of that? 🙂
