Friday, October 13, 2017

A brief study in optimization (Part 3)

You're back!  

Of course I am. We are the same person after all.

Touche. So when we left off in part 2, we had an algorithm that could get us all triangle square numbers below 100,000, but it was very slow. How slow? Not really all that slow. (About 4 milliseconds on my PC on average.)

But you said that it was slow! 

Yes, it is. Try modifying it so that we find all triangle square numbers below 10,000,000 or even 100,000,000, and you'll see how slow it gets. Which leads us to the first rule of optimization.

What's the first rule of optimization, smart guy? 

The first rule of optimization is don't.

Wait, so this series is a sham? 

No. The general rule with optimization is that you don't optimize unless it's necessary. If we only wanted to generate the triangle square numbers below 10,000 then the algorithm we implemented in part 2 is just fine. But we want more. So the first step is to actually time our code, and get an idea how fast or slow it is, and where there are bottlenecks. So that it's clearer, I modified the code to look for all triangle square numbers below 10,000,000. I ran the code several times, and here are the results:

Average Time to Complete: 0.443 seconds
Generate Triangle Numbers: 0.0
Generate Square Numbers: 0.0
Find Triangle Square Numbers: 0.443
Obviously, the merge step at the end is the dominant bottleneck in this process, and a little analysis will tell us why.

For each item in the list of triangle numbers, we look at every single number in the list of square numbers until we're found a matching number. In the worst case (a triangle number that's not a square number) we have to iterate through the entire list of square numbers.

That sounds inefficient.

Because it is. Lets look at another way of merging these lists, and see if we can eliminate the nested for loop. Here's the code for the new merge:

triangle_square_numbers = []
count_triangles = len(triangle_numbers)
count_squares = len(square_numbers)
idx_triangle = 0
idx_square = 0
while idx_triangle < count_triangles and idx_square < count_squares:
  if triangle_numbers[idx_triangle] == square_numbers[idx_square]:
    triangle_square_numbers.append(triangle_numbers[idx_triangle])
    idx_triangle = idx_triangle + 1
    idx_square = idx_square + 1
  elif triangle_numbers[idx_triangle] < square_numbers[idx_square]:
    idx_triangle = idx_triangle + 1
  else: # triangle_numbers[idx_triangle] > square_numbers[idx_square]
    idx_square = idx_square + 1

The key of this approach is the while loop:

while idx_triangle < count_triangles and idx_square < count_squares:

This while loop looks a bit complicated, but it allows us to loop through both arrays simultaneously. It's like a for loop with two index variables.

The actual functionality is the three conditions. We know that given two numbers A and B, that one of the following conditions will be true:
  • A = B
  • A < B
  • A > B
The first condition is the easiest. If the two numbers are equal, then it's a triangle square number. We store the number in our list of results, and increment both indices, since we've 'used' the number from both.

The second condition means that the current triangle number is less than the current square number, and so we can move to the next triangle number.

The third condition is the opposite of the second. We have finished with this square number, so we move on to the next.

We know at the end that we haven't missed any, because the two lists are in numeric order.

That makes zero sense. Can you explain it so I can try it with real physical objects?

Ok, try this:
  1. Take two suits from a deck of cards. 
  2. Separate the two suits into two piles. 
  3. Randomly remove five cards from each pile.
  4. Sort each pile into ascending order (A - 2 - 3 - ... - J - Q - K), face up.
  5. If the card's values match, take the card from the left pile and place it face down in a third pile. (This is the result pile)
  6. If the value of the card on the left is less than the value of the card on the right, discard the card on the left.
  7. If the value of the card on the right is less than the value of the card on the left, discard the card on the right.
  8. If you have cards left in both piles, go back to step 5. If not, you're done.
 (For bonus points, use the same card method to try out the previous implementation.)

I see how it works now, but what did it do to our execution time?

I'm glad you asked. Looking just at the merge part of the algorithm, we can see a marked change in execution time:

Old Method 0.446
New Method 0.002

You can see that the time is a bit better. Next time we'll look at a better way

No comments:

Post a Comment

All comments are moderated. I have a life, so it may take some time to show up. I reserve the right to delete any comment for any reason or no reason at all. Be nice. Racist, homophobic, transphobic, misogynist, or rude comments will get you banned.

Programmer vs Software Engineer: The Interview

A common question presented in interviews for developer positions goes something like this: Given an array of numbers, write a function th...