Monday, October 15, 2018

Programmer vs Software Engineer: The Interview

A common question presented in interviews for developer positions goes something like this:
Given an array of numbers, write a function that returns the three largest values. Is your approach the optimal approach?
A programmer and a software engineer will give you two different answers to this question. Let's look at each hypothetical answer, then discuss a bit which is better, and why.

The Programmer Answer

The approach begins by checking whether the array has at least 3 values,  if it doesn't, return a copy of the array, which has been sorted. Otherwise, grab the first three items from the array, sort them into descending order, and then iterate through the rest of the input, looking for values which are larger than the three items you've already obtained. In code (JavaScript today) it looks something like this:

function findTopThree(arr) {
    // Handle trivial case
    if (arr.length <= 3)
        return arr.slice(0,3).sort();
    let result = [arr[0], arr[1], arr[2]];
    result.sort((a, b) => b - a);
    for (let i = 3; i < arr.length; i++)
        if (arr[i] > result[2])
            if (arr[i] > result[1])
                if (arr[i] > result[0])
                    result.splice(1, 0, arr[i]);
                result[2] = arr[i];
    return result;
For the worst case, an array sorted into ascending order, this solution runs in 3n comparisons. (Plus a constant for sorting the first three items.)  There are several other ways to do this, like Math.max, but this method will tend to run less comparisons for randomly ordered data than repeatedly finding the maximum. In the best case, an array in descending order, it runs n comparisons.

But is this the optimal solution? The short answer is maybe, but maybe not. To understand why, let's look at the software engineer's answer.

The Software Engineer's Answer

The 'best' solution, is not necessarily the answer given above. To determine how to implement this algorithm we need to consider some additional details that the question doesn't consider.

1. Will we be doing this a lot? Do we occasionally want the top 4? If we sort this data, and keep it sorted, finding the top n values in the array is a constant-time algorithm. If we are going to be doing this a lot, it may make sense to sort the data.
2. Later on in the code, do we need the median value? If so, we'll have to sort the data anyway to find the midpoint, so we can combine the two operations, sort the array once, and find all the values we care about at the same time.
3. What is the source of the values in this array? If it's a list of anonymized test scores, we can sort the values without a problem. If the data is an image, we can't sort it without destroying the data's spacial correlation. We can't decide the best algorithm without knowing what's in the array. 
4. Is the data even something that can be sorted? Let's say our array is a set of process objects representing a group of things running on our server, and the top three values we're interested in are the memory use of the three processes that are using the most memory. In this case, it doesn't make sense to try to sort the data and keep it sorted, since the values can change at any time. We'd have to re-sort the array any time we wanted to deal with it.
5. Is the array small? If it's small then the trivial solution that makes a copy, sorts it, and returns the top three items may be best. It's certainly the easiest logic to understand and maintain in the future.
6. Does this code run in the main time-critical code path, or is it diagnostic code that runs after our process has suffered a fatal exception? If it's time-critical we need to worry about making it fast. If it's not, we can (and should) write it to be easy to understand and maintain, which may come at the expense of its execution speed. If the process has already crashed, it probably doesn't matter if we take an extra 500 ms to process some information.

These are just the first 6 things I could come up with when thinking about the question. There are certainly more things that would change how the algorithm is implemented.

The Takeaway

The lesson from all of this is:
Before deciding if a solution is optimal, make sure you understand the problem and the environment that your code will be running in. An O(n) algorithm is not inherently better than an O(n2) algorithm.
As a bonus, when you better understand how your code fits into the project, you can better reason about possible edge-cases, but that's a topic for another day.

Monday, March 12, 2018

Monday Inspiration: Jack Delano

When studying other art forms, like painting, you may be asked to do a study, or a copy of a painting in order to learn about the techniques that were used in its construction. We're going to do something similar today, and look at some work done by a master, which will, hopefully, give you some inspiration for your future work.

Today's master is Jack Delano. Jack moved to the US from what is now Ukraine with his parents in 1923 when he was 9 years old. He studied art and music at the Curtis Institute and the Pennsylvania Academy of the Fine Arts (PAFA). Eventually he came to work for the Farm Security Administration's Photography program, which in 1943 was eliminated, and transferred to the Office of War Information.

Library of Congress Call Number: LC-USF35-86 [P&P]

This photo was taken sometime around 1940, along the Skyline Drive in Virginia. Some of the composition techniques in use here include:

  • The rule of thirds: the placement of the sign in the foreground, and the road, map out the right and left thirds of the image. 
  • Perspective: this image demonstrates the single-point perspective technique, where all of the lines moving away from the camera appear to converge on a single point.
  • Fill the frame: By framing the shot so that the mountains fill nearly the entire sky, they are made to appear larger. This effect could cause the mountains to overpower the foreground, but the perspective of the road helps to push them off to the background.
  • Depth of field: The entire image, from the foreground to the background, is in focus. (This is a digitized color slide, so there are some resolution issues with the digital representation.)

Library of Congress Call Number: LC-USF35-7 [P&P]
This next photo, taken at the end of 1940 or the beginning of 1941, is some detail of an industrial building in Massachusetts. Here we can see:
  • Lines: the pipes on the side of the building effectively lead the eye around the photograph, and add interest.
  • The rule of threes: not to be confused with the rule of thirds, the three stacks on the top of the building add interest to the top right of the image.
  • Pattern: the bricks give the photograph texture.
  • Centering: the division between the two parts of the building is centered within the image. By centering the split, it helps to balance the asymmetry of the building and piping. 

Library of Congress Call Number: LC-UWS36-561 [P&P]
Our final photo was taken in 1941 inside the roundhouse at the Chicago and Northwestern Railroad's yard in Chicago, Illinois. This photo is a complex shot, and is probably my favorite of the three we've looked at today. Techniques include:

  • Light and shadow: the light from the left of the photo is ethereal and soft, which balances the dark mechanical aspect of the right side.
  • Color: this photo is clearly in color, but the framing and the lighting make it monochromatic. The only color in the photo is the fire at the bottom right and the light bulb at the top. The single color helps to highlight those details.
  • Curves: the way that the subject of the image curves through the frame, starting to the right, curving into the center, and finally curving back to the right, adds visual interest. Think about how the image would be different if, like the first image we looked at, all of the lines leading away from the camera went straight to a vanishing point in the distance.
I hope this quick look at three photos from an expert is helpful, and inspires your photography as much as it inspires me. I hope to do several more of these focusing on other photographers who worked for the FSA/OWI. If you like this sort of thing, please let me know.

The famous photo Migrant Mother was taken by Dorothea Lange as part of the FSA program.

All of the photos used in this blog entry are part of the Library of Congress' collection of photographs from the Farm Security Administration and Office of War Information, and were taken as part of the photographer's official duties. Since they are a work of the federal government, they are in the public domain in the United States. That said, if you use one of the images from the collection, be sure to credit the photographer, even if you aren't technically required to.

Wednesday, February 28, 2018

Phebruary Day 28: Bright

Day 28: Bright

A bit more experimental today. Edited to make it more ethereal and to soften the edges.

And with that Phebruary is over. That was a buttload of work, so I'm going to take the rest of this week off from blogging, and may take next week too.

Tuesday, February 27, 2018

Phebruary Day 27: Below

Day 27: Below

Metro North Waterbury Line train heading north through Seymour, CT.

Monday, February 26, 2018

Saturday, February 24, 2018

Programmer vs Software Engineer: The Interview

A common question presented in interviews for developer positions goes something like this: Given an array of numbers, write a function th...