HashSet versus Dictionary Cage Match

Writing

In my last post I introduced you to the new HashSet class in .net 3.5. I showed you how easy it was to do a plethora of set operations using this class. Well, Dave over at Encosia saw this class and thought it might be a good fit for a program he had wrote that searched through variable length letter permutations to find items in a list of valid strings. He wondered if it might be faster to load up a HashSet and do an intersect operation than the current method he had of using the Dictionary class to lookup valid words. So, I asked him to send me over the source so that I could do a bit of performance testing. He happily sent me the source, and so here is my performance comparison between the two.

Method 1: This method uses a Dictionary. It first opens up the word file and loads up the Dictionary with the words. It then takes the input string and generates all the permutations of it. After each permutation is generated it checks the Dictionary of valid words using the "ContainsKey" method.

Method 2: This method uses a HashSet. It starts off like Method 1 and opens the word file and loads the HashSet with the valid words. It then generates all of the permutations and loads them into a second HashSet. It then calls the "IntersectWith" method on the first HashSet passing in the second HashSet which returns the valid words.

The tests were run in a loop of 10 passes with 500 lookups per pass. Lets look at the results and then discuss:

HashSet versus Dictionary

As you can see, in the 4, 5, and 6 letter combinations the numbers look very similar. For 6 letters it took roughly 650 milliseconds to do 500 looks-ups. When we hit the 7 letter mark things start looking very different. The Dictionary begins to pull ahead by a good margin. The HashSet had an average time of 5529 milliseconds and the Dictionary averaged 4495 milliseconds. A different of more than 20%! So, why the sudden difference? Well, the HashSet method goes through all of the possible permutations and loads up a giant HashSet object with all of those permutations. So, at 6 letters we have 1956 possible permutations, but since it jumps up to 13700 permutations at 7 letters, it begins to seriously affect the performance of the HashSet method.

Now, at this point you may be saying "Heyyyyy! 6 letters only has 6! (720) possible combinations!" And you would be partially correct, but we are doing all combinations including lengths of 5,4,3,2, and 1. The formula for calculating this is:

Permutation Formula

In this forumla n is the set size and r is the size of the permutation. So 6 letters is actually 6!/(6-6)! + 6!/(6-5)! + 6!/(6-4)! etc…

So, looking at this you might think that the choice between Dictionary and HashSet is easy if you are dealing with large sets, and you’d be wrong. 🙂 The issue is that the HashSet spent most of its time loading up the second HashSet in order to do the "Intersection" operation. Well, the HashSet also has a "Contains" method so we really didn’t need to load up the second HashSet at all! If we switch out the HashSet in the Dictionary method and replace "ContainsKey" with the "Contains" method we end up with almost identical performance even at 7 characters.

So, in the end, these two classes have almost identical lookup speeds, the real question is whether you need the ability to assign a key to a piece of data (the dictionary has a seperate key and value type, the HashSet does not) or whether you need the robust set operations that are given to you by the HashSet object.

And finally, if you want to see the code that was generating the permutations for the HashSet and the Dictionary, here it is:

  //HashSet Permutations
  public void GeneratePermutations(HashSet<string> permutations, 
    string prefix, List<char> letters)
  {
    for (int i = 0; i < letters.Count; i++)
    {
      _permutations++;
      string word = prefix + letters[i];
      List<char> j = new List<char>(letters);
      permutations.Add(word);
      j.RemoveAt(i);
      GeneratePermutations(permutations, word, j);
    }
  }
 
  //Dictionary Permutations
  protected void GeneratePermutations(List<string> results, 
    string prefix, List<char> letters)
  {
    for (int i = 0; i < letters.Count; i++)
    {
      _permutations++;
      string word = prefix + letters[i];
      List<char> j = new List<char>(letters);
 
      if (CheckWord(word))
        results.Add(word);
 
      j.RemoveAt(i);
      WordSearch(results, word, j);
    }
  }

More Insights

View All