Parallelizing single-threaded batch jobs using Python’s multiprocessing library.

Suppose you have to run some program with 100 different sets of parameters. You might automate this job using a bash script like this:

ARGS=("-foo 123" "-bar 456" "-baz 789")
for a in "${ARGS[@]}"; do
  my-program $a
done

The problem with this type of construction in bash is that only one process will run at a time. If your program isn’t already parallel, you can speed up execution by running multiple jobs at a time. This isn’t easy in bash, but fortunately Python’s multiprocessing library makes it quite simple.
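Here is a minimal sketch of the idea in Python, not necessarily the approach the full post takes. `my-program` and the argument strings come from the bash snippet above; `run_job`, `run_all`, and the `workers` parameter are names made up for illustration. The sketch uses `ThreadPool` from `multiprocessing.pool`, since each job already runs as its own OS process. `multiprocessing.Pool` offers the same `map` interface when the work itself is done in Python.

```python
import shlex
import subprocess
from functools import partial
from multiprocessing.pool import ThreadPool

# Argument sets from the bash example above.
ARGS = ["-foo 123", "-bar 456", "-baz 789"]


def run_job(arg_string, program=("my-program",)):
    """Run one job to completion and return its exit code."""
    # shlex.split mirrors the word splitting bash applies to the unquoted $a.
    cmd = list(program) + shlex.split(arg_string)
    return subprocess.run(cmd).returncode


def run_all(arg_strings, program=("my-program",), workers=4):
    """Run every job, keeping at most `workers` of them running at once."""
    with ThreadPool(processes=workers) as pool:
        # map() blocks until all jobs finish and preserves the input order.
        return pool.map(partial(run_job, program=program), arg_strings)


# Example call (assumes my-program is on PATH):
#     exit_codes = run_all(ARGS, workers=4)
```

Setting `workers` to roughly the number of CPU cores is a reasonable starting point for CPU-bound programs.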

Continue reading Parallelizing single-threaded batch jobs using Python’s multiprocessing library.

The fruits of some recent Arduino mischief.

I recently consulted on a project involving embedded devices. Like most early-stage embedded endeavors, it currently consists of an Arduino and a bunch of off-the-shelf peripherals. During the project, I developed two small libraries (unrelated to the main focus of the project) which I’m open-sourcing today.

Continue reading The fruits of some recent Arduino mischief.

A simple recommender system in Python.

Inspired by this post I found about clustering analysis over a dataset of Scotch tasting notes, I decided to try my hand at writing a recommender that works with the same dataset. The dataset conveniently rates each whisky on a scale from 0 to 4 in each of 12 flavor categories.
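To give a feel for what such a recommender could look like, here is a small nearest-neighbor sketch. It is an assumption about one possible approach, not the method from the full post. The whisky names and scores are invented placeholders, and `recommend` is a hypothetical helper.

```python
import math

# Invented scores (0-4 in each of 12 flavor categories); NOT from the real dataset.
PROFILES = {
    "whisky_a": [2, 2, 2, 0, 0, 2, 1, 2, 2, 2, 2, 2],
    "whisky_b": [3, 3, 1, 0, 0, 4, 3, 2, 2, 3, 3, 2],
    "whisky_c": [2, 2, 3, 1, 0, 2, 1, 2, 2, 2, 2, 1],
    "whisky_d": [4, 1, 4, 4, 0, 0, 2, 0, 1, 2, 1, 0],
}


def recommend(name, profiles, k=2):
    """Return the k whiskies whose flavor vectors are closest to `name`."""
    target = profiles[name]
    # Euclidean distance between 12-dimensional flavor vectors.
    ranked = sorted(
        (math.dist(target, scores), other)
        for other, scores in profiles.items()
        if other != name
    )
    return [other for _, other in ranked[:k]]
```

With these placeholder scores, `recommend("whisky_a", PROFILES)` ranks `whisky_c` first and `whisky_b` second.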

Continue reading A simple recommender system in Python.

Optimizing MySQL and Apache for a low-memory VPS.

Diagnosing the problem.

My last post had a plug about the migration of our WordPress instance to a new server. However, it didn’t go completely smoothly. The site had gone down a few times in the first day after the migration, with WordPress throwing “Error establishing a database connection.” Sure enough, MySQL had gone down. A simple restart of MySQL would bring the site back up, but what caused the crash in the first place?

Continue reading Optimizing MySQL and Apache for a low-memory VPS.

This blog is illegal!

At Zeall, we offer our employees the courtesy of free hosting for their personal blogs, in hopes of furthering their professional image. Today, we completed the migration of the employee WordPress instance from a shared hosting provider to its own VPS, and simultaneously deployed TLS certificates (thanks, Let’s Encrypt!) for all domains hosted there (including this one).

Continue reading This blog is illegal!

Information-centric networking for laymen.

The design of the current Internet is based on the concept of connections between “hosts”, or individual computers. For example, when you visit a website, your computer (a host) always connects to a particular server (another host) and retrieves content through a session-oriented pipe. However, the amount of content hosted on the Internet and the number of connected devices are both growing. This is a crisis scenario for the current Internet architecture — it won’t scale.

Several Next-Generation Network (NGN) architectures have been proposed in recent years, aimed at better handling immense amounts of traffic and orders of magnitude more pairwise connections. Information-Centric Networking (ICN) is one NGN paradigm that eschews the concept of connections entirely, removing the host as the basic “unit” of the network and replacing it with content objects.

In other words, the defining feature of an ICN is that instead of asking the network to connect you to a particular server (where you hope to find the content you want), you ask the network for the content itself.
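The contrast can be shown with a toy model. This is only an illustrative sketch, not a real ICN protocol: `Node`, `fetch_from_host`, and `fetch_by_name` are hypothetical names, and the "network" is just a dictionary of nodes.

```python
class Node:
    """A network node holding some named content objects."""

    def __init__(self, name, store=None):
        self.name = name
        self.store = dict(store or {})


def fetch_from_host(nodes, host, content_name):
    """Host-centric: the caller must say which server to ask."""
    # Raises KeyError if that particular host does not hold the content.
    return nodes[host].store[content_name]


def fetch_by_name(nodes, content_name):
    """Information-centric: the caller names only the content."""
    # Any node holding a copy may answer; the caller never picks a host.
    for node in nodes.values():
        if content_name in node.store:
            return node.store[content_name], node.name
    raise KeyError(content_name)
```

In the second function the request carries no host at all, so a nearby cache can satisfy it just as well as the original publisher.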

Continue reading Information-centric networking for laymen.

Why are tuples greater than lists?

I pose this question in quite a literal sense. Why does Python 2.7 have this behavior?

>>> (1,) > [2]
True

No matter what the tuple, and no matter what the list, the tuple will always be considered greater. On the other hand, Python 3 gives us an error, which actually makes a bit more sense:

>>> (1,) > [2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: tuple() > list()

The following post is a journey into some CPython internals, with a goal of finding out why 2.7 gives us such a weird comparison result.
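As a rough preview, the Python 2 documentation notes as a CPython implementation detail that objects of different non-numeric types are ordered by their type names. The sketch below emulates that rule in Python 3. `py2_style_greater` is a hypothetical helper, and it deliberately ignores the special handling of numbers and the fallback to object addresses.

```python
def py2_style_greater(a, b):
    """Approximate CPython 2's `a > b` for non-numeric objects (simplified)."""
    if type(a) is type(b):
        # Same type: the ordinary comparison applies.
        return a > b
    # Different types: CPython 2 falls back to comparing the type names,
    # and the string "tuple" sorts after the string "list".
    return type(a).__name__ > type(b).__name__
```

Under this rule the contents never matter: `py2_style_greater((), [99])` is still true because only the names `"tuple"` and `"list"` are compared.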

Continue reading Why are tuples greater than lists?

The single most hideous line of code I’ve ever seen

Have you ever used a ternary expression inside a condition? I hope not. It seems that whoever wrote the Java drivers for MongoDB didn’t have this much sense.

The offending line of code can be found here:

It basically goes like this:

try {
    // a bunch of stuff
} catch (Exception e) {
    if (!((_ok) ? true : (Math.random() > 0.1))) {
        // silently ignore error
    } else {
        // log error
    }
}

The intent appears to be to log just 10% of errors that result in an “okay” status, while logging all “not okay” errors. However, this condition is utterly unreadable, and I believe this awful implementation actually yields the opposite result.
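One way to check what the condition really does is to transcribe it into Python and count how often the "silently ignore" branch is taken. This is a sketch under the assumption that `_ok` is a plain boolean, as the excerpt suggests. `silently_ignored` and `ignored_fraction` are names invented for the experiment.

```python
import random


def silently_ignored(ok, draw=random.random):
    """Python transcription of !((_ok) ? true : (Math.random() > 0.1))."""
    return not (True if ok else draw() > 0.1)


def ignored_fraction(ok, trials=100_000, seed=0):
    """Estimate how often the 'silently ignore' branch is taken."""
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    hits = sum(silently_ignored(ok, rng.random) for _ in range(trials))
    return hits / trials
```

When `ok` is true the inner expression is always true, so the error is never ignored and is always logged. When `ok` is false, the error is ignored only when the random draw is at most 0.1, so roughly 10% of those errors are dropped and about 90% are logged.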

Continue reading The single most hideous line of code I’ve ever seen

Quick postfix & dovecot config with virtual hosts (Ubuntu 16.04)

This morning, I received an email from my VPS host notifying me that they will no longer accept PayPal. Instead, my only payment option would be Bitcoin. Not willing to go through this trouble, I decided to migrate from this host (which I had been using for my personal servers for about five years now) to DigitalOcean (which fortunately accepts normal forms of payment).

Part of my server migration was to move email for two of my domains: and Setting up a new mailserver is a notoriously arduous task, so I’m documenting the process in this post — mostly for my future reference, but also to benefit anyone who might stumble upon my blog in their own confusion.

Since I’m serving mail for two domains, I will be using a simple “virtual hosts” configuration. I’ll talk about the process in four parts: local setup, postfix, dovecot, and DNS configuration.

Continue reading Quick postfix & dovecot config with virtual hosts (Ubuntu 16.04)

An easy way to visualize git activity

Today, I wrote gitply — a fairly simple Python script for visualizing the weekly activity of each contributor to a git repository.

It started out as a run-once script to get some statistics for one of my projects, but I ended up improving it incrementally until it turned into something friendly enough for other people to use.

Continue reading An easy way to visualize git activity