Useful Python Koans for Interviews
This post has been updated thanks to a reader who prefers to remain anonymous.
I’ve been back on the interview circuit which means solving contrived programming problems that have nothing to do with the CSS fixes to React components that will be required of me in my day job. Here are some useful tricks I’ve picked up along the way.
Consider the following problem:
You are given a list of points on a Cartesian plane and some number K. Return the K closest points to the origin.
This problem can be solved manually through a combination of iteration and temp variables without too much trouble. Or, you can leverage the power of Python! Specifically, using a custom sort method.
The sorted
function has changed in Python3. It now uses a key
function which computes some numerical value to use for the comparison for each item in a list. This generalization can help us greatly. Let’s take a look at easily you can solve this problem using a custom sort:
'''
You are given a list of points on a Cartesian plane and some number K.
Return the K closest points to the origin. Point is a list in [x,y] format.
Input: points = [[1,3],[-2,2]], K = 1
Output: [[-2,2]]
'''
def k_closest_points(points, K):
def euclidian_distance(item):
x = item[0]
y = item[1]
return math.sqrt(x**2 + y**2)
return sorted(points, key=euclidian_distance)[:K]
Let’s look consider another simplified problem:
You are on the security team and you want to better track DDOS attacks. Write a function that ingests a log file (a collection of IP addresses, one per line) and a threshold N. Return all IPs that have a number of accesses above the threshold.
Anytime you need to accumulate data for later lookup, you should immediately think of a Hashmap given its O(1) lookup time. However, many languages implement this so that if you try to lookup a key that doesn’t exist, an exception is thrown. This means you end up writing some boilerplate like:
access_counts = {}
for ip_addr in log_file:
if ip_addr in access_counts.keys():
access_counts[ip_addr] += 1
else:
access_counts[ip_addr] = 1
Not only is this pretty verbose, but it also complicates your logic by increasing the number of cases you have to keep in your mental model of the code. Fortunately, Python can help us simplify this greatly with its built-in DefaultDict
. DefaultDict
is just hashmap that returns a default value (that you provide on initialization) if they key doesn’t exist. The previous logic using DefaultDict
now looks like this:
access_counts = collections.defaultdict(int)
for ip_addr in log_file:
access_counts[ip_addr]+=1
EDIT: As it turns out, there is an even simpler way to handle the above problem using Python built-ins. Observe collection.Counter
:
# Note: readlines() not suitable for large files
access_counts = collection.Counter(logfile.readlines())
access_counts.most_common(20)
Much simpler to reason about! DefaultDict
also very useful for cases where you want to place things in buckets since you can have a default value of an empty list.