The single most hideous line of code I’ve ever seen

Have you ever used a ternary expression inside a condition? I hope not. It seems that whoever wrote the Java drivers for MongoDB didn’t have that much sense.

The offending line of code can be found here: ConnectionStatus.java

It basically goes like this:

try {
    // a bunch of stuff
} catch (Exception e) {
    if (!((_ok) ? true : (Math.random() > 0.1))) {
        // silently ignore error
    }
    else {
        // log error
    }
}

The intent appears to be to log just 10% of errors that result in an “okay” status, while logging all “not okay” errors. However, this condition is utterly unreadable, and I believe this awful implementation actually yields the opposite result.

I’m not even going to talk about how revolting this line of code is. Instead, I will dedicate the rest of my post to trying to figure out its logic. Let’s start by breaking down that ternary statement into something that makes a little more sense.

boolean condition;

if(_ok) {
    condition = true;
}
else {
    condition = Math.random() > 0.1;
}

if(!condition) {
    // silently ignore error
}
else {
    // log error
}

Now we can see that “condition” is true if either the “ok” flag is true, or we drew a random number greater than 0.1 (90% chance). Therefore, we can write this:

if(!(_ok || Math.random() > 0.1)) {
    // silently ignore error
}
else {
    // log error
}

Now, applying the negation (using De Morgan’s law):

if(!_ok && Math.random() <= 0.1) {
    // silently ignore error
}
else {
    // log error
}

So what is the result? We silently ignore 10% of “not-okay” error conditions. We log 90% of “not-okay” errors, and all of the “okay” errors. Is there a chance this was the intended behavior? Sure. Does it make sense? Hell no.
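If you don’t trust the Boolean algebra, a quick simulation confirms it. Here’s a sketch in Python, with random.random() standing in for Math.random():

import random

def is_logged(ok):
    # Mirror of the driver's logic: the error is silently ignored when
    # !((_ok) ? true : (Math.random() > 0.1)) is true; otherwise it is logged.
    condition = True if ok else (random.random() > 0.1)
    return condition

trials = 100000
logged_ok = sum(is_logged(True) for _ in range(trials))
logged_not_ok = sum(is_logged(False) for _ in range(trials))
print("okay errors logged:     %.1f%%" % (100.0 * logged_ok / trials))      # ~100%
print("not-okay errors logged: %.1f%%" % (100.0 * logged_not_ok / trials))  # ~90%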

The moral of this story: code review is important.

Quick postfix & dovecot config with virtual hosts (Ubuntu 16.04)

This morning, I received an email from my VPS host notifying me that they will no longer accept PayPal. Instead, my only payment option would be Bitcoin. Not willing to go through this trouble, I decided to migrate from this host (which I had been using for my personal servers for about five years now) to DigitalOcean (which fortunately accepts normal forms of payment).

Part of my server migration was to move email for two of my domains: le1.ca and lo.calho.st. Setting up a new mailserver is a notoriously arduous task, so I’m documenting the process in this post — mostly for my future reference, but also to benefit anyone who might stumble upon my blog in their own confusion.

Since I’m serving mail for two domains, I will be using a simple “virtual hosts” configuration. I’ll talk about the process in four parts: local setup, postfix, dovecot, and DNS configuration.

Local Setup

We must prepare a few things before we’re ready to configure the actual mail services. The first thing I’d like to do is install an SSL certificate, so that we can access the mail server securely. This is easy with Let’s Encrypt:

$ sudo apt-get install letsencrypt # (if you don't have it installed already)
$ letsencrypt certonly --standalone -d mail.le1.ca -d mail.lo.calho.st

I’m using the standalone option here, because I do not have an HTTP daemon installed on this server. If you’re also serving HTTP from your mail machine, you will need to follow the appropriate instructions for your configuration.

Now that we have an SSL cert, we can set up the directory structure for our virtual mailboxes. I’m going to make the root for this environment /var/spool/mail, and in that directory, I’m going to create one folder for each domain I’m serving:

$ mkdir -p /var/spool/mail/le1.ca
$ mkdir -p /var/spool/mail/lo.calho.st

The next step is to configure passwords for each of your users. I’m doing this the easy way, since I’m the only person using the mail server. If you need users to be able to change their own passwords, this is going to be more difficult; you’ll need to find an alternate authentication configuration.

You will need to calculate a password hash for each user you want to serve:

$ doveadm pw -u foo@le1.ca

Each time, it should spit out a hash that looks like this:

{CRAM-MD5}a7cb902940b3f6662c48ace840a4e3e410241e875d720cb45b2d95a3e1ddfc8b

For each of your domains, create a ‘shadow’ file in the corresponding /var/spool/mail/* directory. For example, I will create /var/spool/mail/le1.ca/shadow. Add a username:password line for each virtual user you want to create.

foo@le1.ca:{CRAM-MD5}a7cb902940b3f6662c48ace840a4e3e410241e875d720cb45b2d95a3e1ddfc8b

Next, let’s create a “virtual mail” user and configure it as the owner of these mail directories.

$ sudo adduser vmail
$ sudo chown -R vmail:vmail /var/spool/mail/

Find out the UID and GID of this new user and remember it for later. An easy way to do this is grep vmail /etc/passwd. The UID and GID are the third and fourth columns, respectively.
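If you’d rather not count columns by eye, a couple of lines of Python will pull the same numbers (assuming the user is literally named vmail):

import pwd

vmail = pwd.getpwnam("vmail")      # raises KeyError if the user doesn't exist
print("UID:", vmail.pw_uid, "GID:", vmail.pw_gid)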

That’s it for initial setup. Now let’s deal with postfix.

Configuring Postfix

Postfix is responsible for delivering received mail to the correct locations on our server, as well as sending mail on our behalf.

There’s some initial configuration we can do interactively before we start poking around in config files. If you haven’t installed postfix yet, this configuration will start automatically when you do apt-get install postfix. If you do already have it installed, then you can run dpkg-reconfigure postfix to get there.

Here are the responses you should enter to this wizard:

  • Type of mail configuration: Internet Site
  • System mail name: Use your “primary” mail domain here. I used “le1.ca”
  • Root and postmaster recipient: Enter your primary email address here. It will be used as the destination for administrative mail.
  • Destinations to accept mail for: Enter any aliases of your machine here, but don’t enter the domains you’re going to use in your virtual mailboxes. A good list is something like “my-hostname, localhost.localdomain, localhost”
  • Everything else: Leave it at the default value

The main configuration file for postfix is /etc/postfix/main.cf. Let’s break down the config into three sections: SSL, virtual mailboxes, and SASL. I’m only including lines that need to be modified here. Everything else should stay at its default value.

First, SSL-related configs:

smtpd_tls_cert_file=/etc/letsencrypt/live/your_mailserver_name/fullchain.pem
smtpd_tls_key_file=/etc/letsencrypt/live/your_mailserver_name/privkey.pem
smtpd_use_tls=yes

Virtual mailbox configs:

virtual_mailbox_domains = /etc/postfix/virtual_domains
virtual_mailbox_maps = hash:/etc/postfix/virtual_mailbox
virtual_mailbox_base = /var/spool/mail
# Replace 1001 with the UID and GID of your vmail user
virtual_uid_maps = static:1001
virtual_gid_maps = static:1001

SASL configs (to pass authentication duties to Dovecot):

smtpd_sasl_auth_enable = yes
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/auth
smtpd_sasl_security_options = noanonymous
smtpd_sasl_local_domain =
smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
smtpd_recipient_restrictions = permit_sasl_authenticated, permit_mynetworks, reject_unauth_destination

Hopefully you noticed that the virtual mailbox configs refer to some new files. We’re going to have to create /etc/postfix/virtual_domains and /etc/postfix/virtual_mailbox in order to create the mappings for our domains.

The virtual_domains file is easy, just a list of the domains you’re serving. In my case:

le1.ca
lo.calho.st

The virtual_mailbox file is also pretty easy. It just maps email addresses to mailbox folders, relative to the virtual_mailbox_base directory. For example, if I want both foo@le1.ca and bar@le1.ca to go to the mailbox le1.ca/foobar, and baz@lo.calho.st to go to lo.calho.st/baz:

foo@le1.ca le1.ca/foobar/
bar@le1.ca le1.ca/foobar/
baz@lo.calho.st lo.calho.st/baz/

You can create as many aliases as you want here. Whatever is on the right will end up determining your login username (in this case, foobar@le1.ca and baz@lo.calho.st are my usernames), and the addresses on the left create aliases.

Postfix now requires us to hash this map file. Run the following command from within /etc/postfix:

$ postmap virtual_mailbox

This should create a file called virtual_mailbox.db.

The last part of the postfix config is to enable the “user-facing” SMTP submission port, 587. Just paste this block into master.cf somewhere (note that the -o lines must be indented so that Postfix treats them as part of the submission entry):

submission inet n - y - - smtpd
  -o smtpd_tls_security_level=encrypt
  -o smtpd_sasl_auth_enable=yes
  -o smtpd_sasl_type=dovecot
  -o smtpd_sasl_path=private/auth
  -o smtpd_sasl_security_options=noanonymous
  -o smtpd_sasl_local_domain=$myhostname
  -o smtpd_client_restrictions=permit_sasl_authenticated,reject
  -o smtpd_sender_login_maps=hash:/etc/postfix/virtual_mailbox
  -o smtpd_recipient_restrictions=reject_non_fqdn_recipient,reject_unknown_recipient_domain,permit_sasl_authenticated,reject

The next step is to configure dovecot.

Configuring Dovecot

Dovecot is much easier to configure than postfix. Replace the contents of /etc/dovecot/dovecot.conf with the following (or create the file if it doesn’t exist):

protocols = imap
log_path = /var/log/dovecot.log

ssl = required
ssl_cert = </etc/letsencrypt/live/albatross.le1.ca/fullchain.pem
ssl_key = </etc/letsencrypt/live/albatross.le1.ca/privkey.pem
ssl_dh_parameters_length=2048
ssl_cipher_list = ALL:!LOW:!SSLv2:!EXP:!aNULL

disable_plaintext_auth = no
mail_location = maildir:/var/spool/mail/%d/%n

auth_verbose = yes
auth_mechanisms = plain login

passdb {
 driver = passwd-file
 args = /var/spool/mail/%d/shadow
}

userdb {
 driver = static
 args = uid=vmail gid=vmail home=/var/spool/mail/%d/%n
}

protocol lda {
 postmaster_address = root@le1.ca
}

service auth {
 unix_listener /var/spool/postfix/private/auth {
 mode = 0660
 user = postfix
 group = postfix
 }
}

Now you should be ready to restart postfix and dovecot:

$ sudo service dovecot restart
$ sudo service postfix restart

If everything went smoothly, we’re ready to configure DNS records.

DNS Records

For each of your domains, you will need three DNS records:

  • Either a CNAME or an A record pointing to the mailserver
  • An MX record designating the mailserver as the destination for the domain’s mail
  • A SPF record designating the server as a permitted origin for mail

The A record is easy. Just create an A record for “mail.yourdomain.com” pointing to the IP address of the server.

The MX record might be even easier: set the mailserver for “yourdomain.com” to “mail.yourdomain.com”

The SPF record is also very straightforward. Many people overlook its importance, but without it, mail from your domain is likely to go into recipients’ spam folders. There exist wizards to create custom SPF records, but I find the one below to be sufficient for most purposes. Just insert this as a TXT record for “yourdomain.com”:

"v=spf1 mx a ptr ~all"

And that’s it. Allow a few minutes for your DNS records to propagate, then test out your new mailserver.

An easy way to visualize git activity

Today, I wrote gitply — a fairly simple Python script for visualizing the weekly activity of each contributor to a git repository.

It started out as a run-once script to get some statistics for one of my projects, but I ended up improving it incrementally until it turned into something friendly enough for other people to use.


What it does:

In short, gitply will show you how much code (and how many commits) a repository’s contributors have written each week. Its output looks something like this:

[Image: example gitply output graph]

You’ll get a graph like this for each contributor. What all do we see here?

  • Lines inserted (green bars, left axis, log scale)
  • Lines deleted (red bars, left axis, log scale)
  • Commits made (blue line, right axis, linear scale)

You also get a textual representation of the same data:

History for anon2
  2016, week  6:  2 commits, +366  -26  
  2016, week  7:  4 commits, +325  -5   
  2016, week  8:  2 commits, +224  -3   
  2016, week  9: 21 commits, +2617 -219 
  -- Gap of 2 weeks
  2016, week 12: 10 commits, +4066 -614 
  2016, week 13:  6 commits, +2480 -432 
  2016, week 14:  2 commits, +654  -490 
  -- Gap of 1 week
  2016, week 16:  2 commits, +661  -229 
  -- Gap of 2 weeks
  2016, week 19:  3 commits, +1508 -47  
  -- Gap of 2 weeks
  2016, week 22:  1 commits, +1490 -29  

History for anon1
  2016, week 12:  1 commits, +143  -89  
  2016, week 13:  2 commits, +58   -16  
  2016, week 14:  3 commits, +137  -17  
  2016, week 15:  2 commits, +403  -106 
  -- Gap of 2 weeks
  2016, week 18:  5 commits, +134  -38  
  -- Gap of 3 weeks
  2016, week 22:  1 commits, +13   -1   
  2016, week 23:  1 commits, +485  -1   

If you just want to download it and use it already, check out the project on GitHub. For those who are interested, some implementation details follow.


How does it work?

The first thing gitply does is invoke the standard git client to retrieve a log:

git log --numstat --date=short --no-notes

Then, it analyzes the log line-by-line in real time (thanks, Python iterators) in order to get the data we need. Essentially, we create an iterator that yields tuples of (email, date, stats) for each commit, where stats is the pair (insertions, deletions).
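In case it’s useful, here is a stripped-down sketch of what such an iterator can look like; the real iterate_commits in gitply handles more edge cases, and its internals differ:

import re
import subprocess
from datetime import date

def iterate_commits(repo_path="."):
    # Yield (email, date, (insertions, deletions)) for each commit,
    # parsed from the output of `git log --numstat`.
    log = subprocess.check_output(
        ["git", "log", "--numstat", "--date=short", "--no-notes"],
        cwd=repo_path).decode("utf-8", "replace")

    email, day, ins, dels = None, None, 0, 0
    for line in log.splitlines():
        if line.startswith("commit "):
            if email is not None:
                yield email, day, (ins, dels)
            email, day, ins, dels = None, None, 0, 0
        elif line.startswith("Author:"):
            match = re.search(r"<(.+)>", line)
            email = match.group(1) if match else line.split()[-1]
        elif line.startswith("Date:"):
            day = date(*map(int, line.split()[1].split("-")))
        else:
            match = re.match(r"^(\d+)\t(\d+)\t", line)
            if match:   # numstat line; binary files show "-" and are skipped
                ins += int(match.group(1))
                dels += int(match.group(2))
    if email is not None:
        yield email, day, (ins, dels)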

Once we create this iterator over the commit history, we can attribute each commit to a user and ultimately fill in a dictionary that will help us print the analysis. We sum the insertions, deletions, and commits for each ISO week, then iterate through the weeks and print out the final results.

Now, if for some reason you need an iterator over the commits in a repository, you can import my iterate_commits function from this script and use it yourself.


What’s next?

Right now, gitply supports aggregating statistics from several repositories into one report. One planned feature is to use stacked bars and lines to show the amount of activity per-project, per-user, per-week. Until then, the data from multiple repositories will all be displayed together.

Adventures in image glitching

Databending is a type of glitch art wherein image files are intentionally corrupted in order to produce an aesthetic effect. Traditionally, these effects are produced by manually manipulating the compressed data in an image file. As a result, this is a trial-and-error process; often, edits will result in the file being completely corrupted and unopenable.

Someone recently asked me whether I knew why databending different types of image files produces different effects — and particularly, why PNG glitches are the most interesting. I didn’t know the answer, but the question inspired me to do a little research (mostly reading the Wikipedia articles about the compression techniques used in different image formats). I discovered that most compression techniques are not all that different. Most of them just employ some kind of run-length encoding or dictionary encoding, and then a prefix-free coding step. The subtle differences between the compression algorithms could not explain the wildly different effects we observed (except perhaps for JPEGs, since their compression is done in the frequency domain). However, PNG uses a pre-filtering step that makes it stand out.

PNG’s pre-filtering is basically a heuristic to try to make the image data more compressible. It tries to translate each pixel of the image into some delta based on pixels in the previous row or column. Since the core compression algorithms have no idea about the two-dimensional nature of an image, this two-dimensional filtering step actually makes a pretty big impact.

There are five main line filters defined by the PNG format. Most PNG encoders will try to optimize their choices of filters for each line in the image, in order to reduce the compressed size of the image. However, each filter also yields its own unique glitches.

Thus, an idea was spawned, and I wrote a Python script that glitches images not by manipulating the compressed file data, but by applying one of the PNG filters and modifying the pixels in that filtered representation.


PNG Filter Glitches

Since this article is about glitching — not image compression — we’re going to focus on the different visual effects we get from each filter. PNG codecs usually alternate between filters from line to line; for the sake of demonstration, however, I configured my script to use one filter throughout the whole image.

All of the following glitches are made from a scaled-down version of this wallpaper from Wallhaven.

Two of the simplest PNG filters are the Up filter and the Sub filter. While Up creates a delta from the adjacent pixel in the previous row, Sub creates a delta from the previous pixel in the same row. The results of glitching on the filters are similar — Up gives us vertical lines, and Sub gives us horizontal lines.

[Image: glitch result from the Up filter]
[Image: glitch result from the Sub filter]

A more interesting one is the Average filter. This filter looks at both the above pixel and the left pixel, takes their average, and then computes a delta using that value. Glitching on the Average filter results in smears of altered color that slowly diffuse across the image.

[Image: glitch result from the Average filter]

My favorite is actually the Paeth filter, which heuristically chooses either the above pixel, the left pixel, or the above-left pixel as the basis for the delta, depending on a function of all three. Glitches applied to this filter give us rectangular disturbances which slowly fade out, but stack together to create harsher effects.

[Image: glitch result from the Paeth filter]

I mentioned that there are five filters, but so far I’ve only covered four. The reason for this is simple — the fifth filter is the Null filter, which has no effect whatsoever.
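For reference, here is roughly what the five filters compute, sketched in Python per the byte-wise definitions in the PNG spec (my script applies the same arithmetic to whole pixels rather than to the raw byte stream):

def paeth_predictor(left, up, up_left):
    # Paeth: estimate the byte as left + up - up_left, then use whichever
    # of the three neighbors is closest to that estimate.
    p = left + up - up_left
    pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return up
    return up_left

# Each filter replaces a raw byte with its delta (mod 256) from a prediction
# based on its neighbors; neighbors outside the image count as 0.
FILTERS = {
    "none":    lambda raw, left, up, up_left: raw,
    "sub":     lambda raw, left, up, up_left: (raw - left) % 256,
    "up":      lambda raw, left, up, up_left: (raw - up) % 256,
    "average": lambda raw, left, up, up_left: (raw - (left + up) // 2) % 256,
    "paeth":   lambda raw, left, up, up_left:
                   (raw - paeth_predictor(left, up, up_left)) % 256,
}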


Mixing Filters

To simulate the mixed effects one can observe in a typical PNG glitch, I implemented a glitching procedure which uses a random filter on each line. The results are pretty intriguing:

[Image: random line filters]

In this example, I’ve only allowed a 5% chance for the filter to change from one line to the next; without this provision, the results are quite boring.

To make the glitches even cooler, I started stacking filters on top of each other. Here’s an example with an extra pass through the Average filter applied after the randomized filtering stage:

[Image: randomized filters plus an extra Average pass]

Glitch Glitches

Overall, my favorite filters were the ones that had bugs when I wrote them. My original implementation of the Paeth filter had an error due to the use of unsigned integers in the predictor (when really I needed signed ones). However, this version resulted in some cool effects that acted like a combination of the normal Paeth and Average behaviors:

[Image: buggy Paeth filter]

I also wrote a buggy Average filter that resulted in some bold vertical lines and some subtler horizontal artifacts:

[Image: buggy Average filter]

Try it Yourself

I’ve posted my glitch scripts on GitHub, so you can try them out too. The original version was pure Python and incredibly slow, but I’ve now converted the core functionality into a C module and it is several orders of magnitude faster. This is also my first foray into the intricacies of the Python C API, but that’s a topic for another post…

To conclude, have two more glitches achieved by mixing various filters:

[Images: two more glitches from mixed filters]

Blue Coat’s position as a Certificate Authority, and what it means for you.

Recently, Blue Coat Systems has been approved as an intermediate certificate authority. If you aren’t versed in network security, this means nothing to you. However, be assured that it is a big deal.

Blue Coat is primarily known as a vendor of application-layer (deep packet inspection) firewalls. In other words, they help people sniff your data — primarily in order to censor the Internet. Maybe your company’s firewall blocks access to YouTube and Facebook while you’re at work. That’s no big deal — Blue Coat delivers something a bit more sinister.

Countries such as Burma, Syria, Bahrain, China, Egypt, Kuwait, Russia, and Saudi Arabia have utilized Blue Coat’s products (or similar offerings) to enable Internet censorship and tracking. This is not a good reputation to carry. However, Blue Coat’s impact has been limited to unencrypted traffic thus far. Now that they are a trusted CA, they’re capable of much more.

In this article, I will give an overview of how our Internet communications are normally secured, then explain why it is a problem that Blue Coat now has this authority.


What is a Certificate Authority?

To encrypt data on the Internet, we use a protocol called Transport Layer Security, or TLS (you might be more familiar with an older but similar protocol called SSL, or Secure Sockets Layer). Basically, if a URL starts with https://, then you’re accessing it through TLS; if your browser shows you a green lock icon (or something similar), you can typically be assured that your connection is secure.

A browser will trust a website and display this lock icon if it presents a valid certificate signed by a trusted certificate authority (CA). A certificate is a pretty simple thing — it is essentially just a public key for the website, accompanied by the signature of a CA. Your operating system (or web browser, depending on which one you use) trusts a handful of CAs by default. These are called root CAs. However, root CAs can often delegate the authority to sign certificates to other entities called intermediate CAs. Intermediate CAs don’t have to be explicitly trusted by your computer; the fact that a root CA has vouched for them makes them trusted by extension. Because another CA has vouched for them, Blue Coat is now capable of issuing certificates.

How can a CA be harmful?

To understand why Blue Coat’s new role might be harmful, it is important to know a few more details about certificates and how they’re used.

A certificate is always restricted to some domain name (effectively representing a single website). Normally, a CA only gives a website a certificate if it can prove that it is legitimately operating under the name it has requested; consequently, the certificate is tied to that name and won’t be valid for any other website.

Normally, a rogue website can’t impersonate a legitimate website because it will be unable to obtain a certificate for the website’s domain name. If the rogue site does somehow obtain a certificate, there is essentially no way for a normal user to know they aren’t visiting the legitimate site. We rely on CAs to check that any entity requesting a certificate can prove that it is authorized to represent the domain that it wants a certificate for; this is the entire basis of our security and privacy on the Internet.

Okay, so what?

Since Blue Coat produces hardware and software that enables Internet censorship, we can only imagine how badly they’re itching to sniff our TLS-encrypted Internet traffic so they can filter it. In order to sniff traffic arbitrarily, they would need to essentially impersonate whatever website you want to reach. Impersonating a website in this way is called a man-in-the-middle (MiTM) attack.

In general, a man-in-the-middle attack works like this:

  • You try to connect to example.com.
  • I intercept your connection so that you connect to my server instead.
  • I act as a middleman between you and example.com, and I am able to filter and manipulate your conversation however I want.

TLS normally prevents this. In the case that TLS is being used, a normal connection would work like this:

  • You try to connect to example.com.
  • Example.com sends you a certificate, signed by some CA.
  • You check to see if the certificate is valid, and decide that it is okay; you then proceed to communicate with example.com.

A MiTM against a TLS-secured website would typically go like this:

  • You try to connect to example.com.
  • I intercept the connection so that you connect to my server instead.
  • I send you a bogus certificate.
  • You look at my certificate and decide it is fake, and disconnect.

This is great; this is what should happen if someone manages to intercept your connection to a website. However, if a company like Blue Coat is a CA, they can actually create a fake certificate in order to pull off the MiTM successfully:

  • You try to connect to example.com.
  • Blue Coat intercepts the connection so that you connect to their server instead.
  • They send you a certificate that looks legitimate, since they have signed it and you trust them as an intermediate CA.
  • You accept the certificate, then proceed to communicate with them instead of the website you intended.
  • Blue Coat can monitor and filter your conversation with example.com however they see fit.

It should be clear why this is a huge problem for security and privacy on the Internet. Essentially, a de-facto lord-of-the-web (a CA) has given Blue Coat free rein to impersonate whatever website they want to. As a result, you can be easily fooled into thinking your connection is secure when really it can be monitored or filtered by a government, corporation, or some other potentially-malicious entity.

What can I do about it?

If you run a website where security and privacy are a concern, you should be employing a mechanism called Key Pinning. This will essentially allow a user’s browser to detect if your site’s purported certificate isn’t signed by the entity you’d expect. It is not foolproof, but will work most of the time unless the user has been subjected to the MiTM ever since their first visit to your site.

If you’re just an end-user, you might want to explicitly distrust Blue Coat. This will essentially tell your computer/browser that despite its certificate being signed by an otherwise-trustworthy CA, you should not trust the certificates that it signs. Apparently, it is pretty easy on OS X but is a little more difficult on Linux because it has a more varied trust ecosystem. I have not yet been able to obtain any information about untrusting the certificate on Windows.

Unnecessary mathematics: queuing delay for two types of network traffic

This morning I asked myself what I thought was an interesting question:

  • If there are two types of traffic in a network, where one is much less common than the other, will the two types experience different queuing delays?

If you have more common sense than me, you’ve already figured out that the answer is “No.” But it took me about an hour to figure this one out.

I designed a simple model of the scenario: there is a queue of N packets, and one of them is of a different type than the rest. Let’s call the two types A and B.

We can assume that a packet’s position in the queue reflects its queuing delay, so we only need to be interested in the average position in the queue for each type of traffic: E[P(A)] and E[P(B)].

We assume that there are N-1 packets of type A, and one packet of type B. We can place the type-B packet at any of the N positions in the queue with equal probabilities:

  • E[P(A)] = \frac{1}{N} \sum_{i=0}^{N-1} \left( \frac{1}{N-1} \left( \left( \sum_{j=0}^{N-1} j\right) - i \right) \right)
  • E[P(B)] = \frac{1}{N} \sum_{i=0}^{N-1} \left( i \right)

The term summed in the formula for E[P(A)] represents the average position of all N-1 type-A packets. In E[P(B)], there is only one packet so the term doesn’t need another sum inside of it.

Well, it turns out that E[P(A)] and E[P(B)] both simplify to \frac{N-1}{2}.

This should be immediately obvious in the formula for E[P(B)], and it takes just a little more work to derive for E[P(A)]. Yep, it couldn’t have been simpler.
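For the curious, here is the little bit of algebra, using \sum_{j=0}^{N-1} j = \frac{N(N-1)}{2}:

\begin{aligned}
E[P(A)] &= \frac{1}{N} \sum_{i=0}^{N-1} \frac{1}{N-1} \left( \frac{N(N-1)}{2} - i \right) \\
&= \frac{1}{N(N-1)} \left( N \cdot \frac{N(N-1)}{2} - \frac{N(N-1)}{2} \right) \\
&= \frac{1}{N(N-1)} \cdot \frac{N(N-1)}{2} \cdot (N-1) \\
&= \frac{N-1}{2}
\end{aligned}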

If you think about it, this result makes a lot of sense and I shouldn’t have needed to compute it — if I’m at a random position in line, then so is everyone else. So it doesn’t matter whether I’m averaging over a group of N-1 people, or only worrying about myself. We’ll all wait the same amount of time (on average, of course).

What’s inside a PEM file?

As you probably know, PEM is a base64-encoded format with human-readable headers, so you can kind of figure out what you’re looking at if you open it in a text editor.

For example, let’s look at an RSA public key:

-----BEGIN PUBLIC KEY-----
MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA4YFgwNrEkMdynjtDsM0q
b+Hedk8p4pySfxakYTfSQPEyGxxnQGcVMV2ZEjPR4nZeqJrtNTlixhK2YWqunE6I
KopVDq3WvtPKweNEeZ8B2lA2I8FFrpZSjI/Tosq8/MbTd/Y/C4Q8Qcz78MF/NH17
/E82K3ca9/LM2b4KGTEIhsLUff7OGrJM7lPcQZN3EOdUeQnzT9uTh8Z9oFqChfJP
pLwwSebfrRB7VMXjeKHZmubSO5pULHLdZLbkgLSmnhbgBjO6apG0tkYyOeWd6L8F
MzA21WkXJdANrr1s/yv5zS9hx1q9jSM8Me9QA2/iaAbgem7VwQ2YlPiXEvUq48oB
VsKXMpHQ6A2cUygs+PiSFuUzNjTIebWFTWmKKuoRx0O2m63fAZJaT2aJA4G0HqdJ
ZQ2Aqr4Acs1+28IhLxUbMAlHJ4N2XPnE2WpQYbtUR4zZMXU+bVIToXuqHCLo4pf/
qEIK/xzr/S8WdvMvRVSOtVIIQwyaMDUxsnnKozYSVHvzYsxQo3b3VD5OOqmg1mx1
+Z/PLFViLkBjo+ZMkl5dFbsgYyHmkn/uvCV19IpjkdDNfFgdrOlSdNTnlGU7su5L
L31k/IwSvD0PR0egxiv8HhegaYwqgujVylB0gntyBsrVVHfE3Wr2+aJlR3YmrdCZ
lsAiSbnFxgGtfB6INHepFdkCAwEAAQ==
-----END PUBLIC KEY-----

We can see that we have a public key just from the header. But what’s all the base64 stuff? If it’s a public key, we know that there should be a modulus and an exponent hidden in there. And there’s also probably some kind of hint that this is an RSA key, as opposed to some other type of key.

That base-64 payload actually has the potential to contain a lot of information. It’s an ASN.1 (Abstract Syntax Notation One) data structure, encoded in a format called DER (Distinguished Encoding Rules), and then finally base64-encoded before the header and footer are attached.

ASN.1 is used for a lot of stuff besides keys and certificates; it is a generic file format that can be used to serialize any kind of hierarchical data. DER is just one of many encoding formats for an ASN.1 structure — e.g. there is an older format called BER (Basic Encoding Rules) and an XML-based format called XER (you can probably guess what that stands for).

But anyway, what is inside that public key? How can we find out?

Parsing PEM files

OpenSSL comes with a utility called asn1parse. It can understand plain DER or PEM-encoded ASN.1. Let’s feed our public key into it and take a look:

$ openssl asn1parse -in public.pem -i
    0:d=0  hl=4 l= 546 cons: SEQUENCE          
    4:d=1  hl=2 l=  13 cons:  SEQUENCE          
    6:d=2  hl=2 l=   9 prim:   OBJECT            :rsaEncryption
   17:d=2  hl=2 l=   0 prim:   NULL              
   19:d=1  hl=4 l= 527 prim:  BIT STRING

The column that contains “cons” and “prim” gives us information about the hierarchical structure of the data. “Cons” stands for a “constructed” field, i.e., a field that encapsulates other fields; on the other hand, “prim” means “primitive.”

The column after “cons” or “prim” tells us what type of data is in that field. The -i flag I’ve supplied causes that column to be indented according to how deep we are in the hierarchical structure. So overall, what are we looking at?

There is one root SEQUENCE object. That SEQUENCE contains another SEQUENCE and a BIT STRING. That internal SEQUENCE has an OBJECT and a NULL terminator. The OBJECT field is actually an Object Identifier — it contains some constant that tells us what kind of object we’re decoding. As we may have suspected, it tells us that we’re decoding an RSA key.

Here’s where stuff gets really interesting: That BIT STRING field actually contains more ASN.1 data. Let’s jump in and parse it:

$ openssl asn1parse -in public.pem -strparse 19 -i 
    0:d=0  hl=4 l= 522 cons: SEQUENCE          
    4:d=1  hl=4 l= 513 prim:  INTEGER           :E18160C0DAC490C7729E...
  521:d=1  hl=2 l=   3 prim:  INTEGER           :010001

(The -strparse 19 flag means “parse the data in the field located at offset 19 in the original structure.” If there was another BIT STRING inside this one, we could add another -strparse argument to recurse into it.)

What do we have here? A SEQUENCE containing two INTEGERs. It turns out that the first INTEGER is our modulus, and the second INTEGER is the exponent.

You can actually infer the effective length of the key from this information. The l= column indicates that the modulus is 513 bytes long; one of those bytes is a leading zero (DER integers are signed, so a zero byte is prepended to keep the modulus positive), which means our key is 512 bytes (or 4096 bits) in strength.
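If you would rather pull those numbers out programmatically, here is a minimal Python 3 sketch that walks the same structure by hand (assuming the public.pem from above; for real applications, use a proper ASN.1 or crypto library instead):

import base64
import re

def pem_to_der(pem_text):
    # Strip the "-----BEGIN/END ...-----" lines and base64-decode the payload.
    body = re.sub(r"-----(BEGIN|END) [^-]+-----", "", pem_text)
    return base64.b64decode("".join(body.split()))

def read_tlv(data, offset=0):
    # Read one DER tag-length-value triple; returns (tag, value, next_offset).
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:                  # long form: low bits = length of the length
        nbytes = length & 0x7F
        length = int.from_bytes(data[offset:offset + nbytes], "big")
        offset += nbytes
    return tag, data[offset:offset + length], offset + length

der = pem_to_der(open("public.pem").read())

# Outer SEQUENCE -> AlgorithmIdentifier SEQUENCE, then a BIT STRING
_, outer, _ = read_tlv(der)
_, algorithm_id, after_alg = read_tlv(outer)       # contains the rsaEncryption OID
_, bit_string, _ = read_tlv(outer, after_alg)

# The BIT STRING starts with an "unused bits" byte, then more DER:
# SEQUENCE { INTEGER modulus, INTEGER publicExponent }
_, rsa_seq, _ = read_tlv(bit_string[1:])
_, modulus, after_mod = read_tlv(rsa_seq)
_, exponent, _ = read_tlv(rsa_seq, after_mod)

print("modulus bits:", int.from_bytes(modulus, "big").bit_length())   # 4096
print("public exponent:", int.from_bytes(exponent, "big"))            # 65537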

Reverting to DER

As I explained earlier, PEM is just a wrapper around DER. So can we dump the raw DER and parse it that way? You sure can. You just cut out the PEM headers and base64-decode the payload, and what comes out is DER.

$ grep -v "PUBLIC KEY" public.pem | openssl base64 -d > public.der

Now we can inspect the raw DER:

$ hexdump public.der 
0000000 8230 2202 0d30 0906 862a 8648 0df7 0101
0000010 0501 0300 0282 000f 8230 0a02 8202 0102
0000020 e100 6081 dac0 90c4 72c7 3b9e b043 2acd
0000030 e16f 76de 294f 9ce2 7f92 a416 3761 40d2
0000040 32f1 1c1b 4067 1567 5d31 1299 d133 76e2
0000050 a85e ed9a 3935 c662 b612 6a61 9cae 884e
0000060 8a2a 0e55 d6ad d3be c1ca 44e3 9f79 da01
0000070 3650 c123 ae45 5296 8f8c a2d3 bcca c6fc
...

Or, we can use asn1parse if we actually want to understand it:

$ openssl asn1parse -in public.der -inform DER -i
    0:d=0  hl=4 l= 546 cons: SEQUENCE          
    4:d=1  hl=2 l=  13 cons:  SEQUENCE          
    6:d=2  hl=2 l=   9 prim:   OBJECT            :rsaEncryption
   17:d=2  hl=2 l=   0 prim:   NULL              
   19:d=1  hl=4 l= 527 prim:  BIT STRING   
$ openssl asn1parse -in public.der -inform DER -strparse 19 -i
    0:d=0  hl=4 l= 522 cons: SEQUENCE          
    4:d=1  hl=4 l= 513 prim:  INTEGER           :E18160C0DAC490C7729E...
  521:d=1  hl=2 l=   3 prim:  INTEGER           :010001

This is the exact same data we saw earlier, but there’s your proof that PEM really is just DER underneath a layer of base64.

My first adventure with Let’s Encrypt on nginx, dovecot, and postfix

Let’s Encrypt is old news by now. It launched back in December, so it has been giving away free DV certificates for nearly four months now. As a TA for a Computer Security course, I figured it was about time I actually tried it out.

Prologue

Let’s Encrypt is a free certificate authority. They grant TLS certificates that you can use to secure your webserver. They are Domain Validated (DV) certificates, which means they will verify that you control the domain name you are trying to certify.

To test out Let’s Encrypt, I decided to deploy new certificates on some internal Zeall.us services (Gitlab and Roundcube). Previously, we were using self-signed certs, so we would have to approve a security exception on every visit.

Unfortunately, I chose the wrong day to test out Let’s Encrypt. For some reason, their DNS client decided to die at the very moment I tried to get a certificate. I wasn’t the only one having this problem, but Google hadn’t yet crawled the support site so I didn’t see these other reports until the problem had already subsided. Once it cleared up, everything else was a breeze.

Obtaining certificates

Obtaining certificates for our Gitlab and webmail instances was incredibly easy. I just cloned the Let’s Encrypt repository from Github and used their super-simple command-line utility. It doesn’t have built-in support for nginx (our webserver of choice), but you can still easily use the webroot plugin to validate your domain. You just tell it the directory from which your webserver serves, and it puts a file there which it uses for the validation. I ran it once for each of the services:

./letsencrypt-auto certonly --email example@example.com --text --agree-tos --renew-by-default --webroot -w /opt/gitlab/embedded/service/gitlab-rails/public -d <our_gitlab_domain>
./letsencrypt-auto certonly --email example@example.com --text --agree-tos --renew-by-default --webroot -w /srv/roundcube -d <our_webmail_domain>

It handles everything automatically, and sticks your new certificates and keys in their respective /etc/letsencrypt/live/<domain_name>/ directories.

Configuring nginx

Prior to setting up the new certificates in nginx, I generated strong Diffie-Hellman key exchange parameters, as suggested by this guide to nginx hardening:

sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 4096

Then I just put the following lines in each of my nginx site configs:

ssl on;
ssl_certificate /etc/letsencrypt/live/<domain>/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/<domain>/privkey.pem;
ssl_session_timeout 5m;
ssl_session_cache shared:SSL:10m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/ssl/certs/dhparam.pem;

Follow up with a sudo service nginx reload, and you’re good to go. With this configuration, my domain got an A+ from Qualys’ SSL Labs Server Test.

Configuring Dovecot

To supply the new certificate to my Dovecot IMAP server, I set the following configs in /etc/dovecot/conf.d/10-ssl.conf:

ssl_cert = </etc/letsencrypt/live/<domain>/fullchain.pem
ssl_key = </etc/letsencrypt/live/<domain>/privkey.pem

This is the same certificate I used for webmail, since the domain is the same. I have not yet investigated hardening techniques for Dovecot, but at a glance the settings seem to be quite similar to nginx.

This configuration was followed up with a sudo service dovecot restart, and then everything worked.

Configuring Postfix

Finally, I supplied the same certificate to Postfix (SMTP) by putting the following in /etc/postfix/main.cf:

smtpd_tls_cert_file = /etc/letsencrypt/live/<domain>/fullchain.pem
smtpd_tls_key_file = /etc/letsencrypt/live/<domain>/privkey.pem

After a sudo service postfix restart, the certificates are live.

Auto-renew

Let’s Encrypt provides certificates with a term of only 90 days, so it is necessary to write a script to renew them automatically. I wrote the following script:

#!/bin/sh
/opt/letsencrypt/letsencrypt-auto renew | tee -a ./lerenew.log
service nginx reload
service postfix reload
service dovecot reload

And a corresponding crontab:

0 0 * * 0 ~/lerenew.sh

Now once a week, any eligible certificates will be renewed and the services which use them will be reloaded.

Conclusion

It has never been this easy to deploy TLS certificates. I highly recommend that anyone who is currently using a self-signed certificate switch to Let’s Encrypt. If you’re currently paying for a DV certificate, consider switching to Let’s Encrypt to save some money — but keep compatibility issues in mind.

Demonstrating the double-DES meet-in-the-middle attack

Introduction

DES (Data Encryption Standard) is an old-school block cipher which has been around since the 1970s. It only uses a 56-bit key, which is undeniably too short for use in the modern day. Between the realization in the late ’90s that DES is weak and the invention of AES in the early 2000s, Triple-DES had a brief time to shine.

Triple-DES is just what it sounds like: you run the DES algorithm three times. You use two 56-bit keys, K1 and K2, and apply them in the order K1K2K1. The result is a cipher with 112 bits of key strength.

Students often ask me, why not just encrypt twice: K1, K2, done? The reason is that this construction is vulnerable to a known-plaintext attack, which we call the meet-in-the-middle attack. That is, if the attacker knows your plaintext in addition to your ciphertext, he doesn’t have to try all 2^112 keys. The amount of work he has to perform is actually on the order of 2^56 — not much more than to break single DES.

Notation

Before I proceed to an explanation of the meet-in-the-middle attack, let’s get some basic cryptography notation out of the way.

Let’s call our plaintext P, and our ciphertext C. Let’s call our DES encryption function  E. It takes in a plaintext and a key; in return, it produces a ciphertext. Similarly, there is a decryption function D which takes a ciphertext and a key and spits out a plaintext.

We can then say:

  • E(P, K) = C
  • D(C, K) = P
  • D(E(P, K), K) = P
  • E(D(C, K), K) = C

You get the idea. Let’s move on.

Double-DES

Double DES uses two keys, K1 and K2. We produce a ciphertext using the following formula:

  • C = E(E(P, K1), K2)

We just encrypt twice, using our two keys. Easy.

Since each key is 56 bits, an attacker that doesn’t know the plaintext would need to guess a total of 112 bits of key to break the cipher. Therefore, the total keyspace is 2^112, and an average of 2^111 keys would have to be guessed before the true K1 and K2 are obtained.

Meet-in-the-middle

Let’s redefine Double-DES as a two-step process:

  • M = E(P, K1)
  • C = E(M, K2)

We will refer to the result of the first encryption as M, as it is our “middle” value. Now, for this attack to work, we assume our adversary has access to both P and C, but wants to determine K1 and K2.

Consider the following statement — we can easily derive it from the second formula we wrote above:

  • M = D(C, K2)

An attacker can easily exploit this fact, and essentially make K1 independent of K2. He can guess them each separately, instead of trying to guess them at the same time. But how?

First, the attacker guesses every possible K1. Let’s call each of his guesses K1′. For each guess, he computes M’ = E(P, K1′) to obtain a potential “middle” value, and stores the result in a table along with the corresponding K1′.

After filling out the table with each of the 2^56 possible K1′ keys, the attacker moves on to guessing K2. Again, let’s refer to each guess as K2′. He computes M′ = D(C, K2′), then checks whether this M′ matches any of the M′ values stored in the table previously.

If a match is found, then both keys have been broken. That is, if E(P, K1′) = D(C, K2′), then K1 = K1′ and K2 = K2′ (modulo the occasional false positive, which can be ruled out by checking the candidate pair against a second plaintext/ciphertext pair).

How much work has the attacker done? Well, generating the table required 2^56 encryptions, and finding a match would require 2^55 decryptions on average. Overall, we will have done only about 2^56.6 cryptographic operations. This is in stark contrast to the 2^111 operations which we would expect to perform in a ciphertext-only attack with two 56-bit keys.

Pseudocode

In short, this is the attack:

function meet_in_the_middle(C, P):
      table = {}
      for i = 0 to 2^56 - 1:
            table[E(P, i)] = i
      for i = 0 to 2^56 - 1:
            if D(C, i) in table:
                  return table[D(C, i)], i

That’s it.

Demonstration

I’ve written a quick demonstration of this attack in Python using the pycrypto library. The source code is on my Github. For the demo, I have only used 18 bits of the key (just so you don’t have to wait forever to see the results). There are also some optimizations which can be made (e.g., DES takes the key as a 64-bit input even though only 56 of those bits are used… we can skip some of the inputs that we have tested to make it faster). However, it works.
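If you just want to see the table-building idea without installing pycrypto, here is a self-contained sketch that swaps DES for a toy 64-bit Feistel cipher (purely a stand-in, so that E and D are concrete and the search finishes in a few seconds); the attack logic is unchanged:

import hashlib
import random

KEY_BITS = 18                      # tiny keyspace so the demo finishes quickly

def F(half, key, rnd):
    # Round function: hash the half-block with the key and round number.
    data = half.to_bytes(4, "big") + key.to_bytes(3, "big") + bytes([rnd])
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def E(p, k):
    # Encrypt a 64-bit block with a 4-round Feistel network (stand-in for DES).
    L, R = p >> 32, p & 0xFFFFFFFF
    for rnd in range(4):
        L, R = R, L ^ F(R, k, rnd)
    return (L << 32) | R

def D(c, k):
    # Decrypt by running the rounds in reverse.
    L, R = c >> 32, c & 0xFFFFFFFF
    for rnd in reversed(range(4)):
        L, R = R ^ F(L, k, rnd), L
    return (L << 32) | R

def meet_in_the_middle(P, C):
    table = {E(P, k1): k1 for k1 in range(2 ** KEY_BITS)}    # all middle values
    for k2 in range(2 ** KEY_BITS):
        M = D(C, k2)
        if M in table:             # a real attack would confirm the candidate
            return table[M], k2    # pair against a second (P, C) before returning

k1 = random.randrange(2 ** KEY_BITS)
k2 = random.randrange(2 ** KEY_BITS)
P = random.getrandbits(64)
C = E(E(P, k1), k2)                # double encryption

print("true keys:", (k1, k2))
print("recovered:", meet_in_the_middle(P, C))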

Triple-DES

So why isn’t Triple-DES vulnerable to this attack? Well, it’s easy. We can only “skip” one encryption using this strategy. Since Triple-DES is defined as E(D(E(P,K1),K2),K1), we can’t split it up in such a way that we won’t, at some point, have to guess both K1 and K2 at the same time. A formal proof of this claim is, unfortunately, out of scope of this article.

Last word

This attack isn’t specific to DES. It works on any composition of two ciphers which is constructed as E(E(P, K1), K2). So don’t ever, ever, ever try to do that.

Digital privacy: Where do you draw the line?

I recently had a friendly debate on IRC about how much privacy you really need for it to be considered “enough.” Arguably, the answer is “there is never enough.” And still, there are plenty of people who would be perfectly content having no privacy at all.

Many argue that the key to privacy is actually transparency. If we use open source software, we can all audit it to be sure it is free of backdoors and bugs which may leak information. Perhaps this is why increasingly many users (but unfortunately still less than 2% overall) now prefer Linux to proprietary operating systems. But even among Linux users, there is not much knowledge about how deep the privacy rabbit hole goes.

Before I return to the debate about how much privacy is really sufficient, I’d like to give a quick overview of some of the tools one can use to preserve their privacy. I will discuss four levels of privacy: network, OS, firmware, and hardware.

Privacy-centric networking

Tor is now a very well-known protocol. It allows you to browse the Internet privately by encrypting your data several times and forwarding it through several proxies, each of which can only remove one layer of encryption.

One alternative to Tor is JonDonym, which is similar in that it enables privacy by packaging data under multiple layers of encryption. However, it is inherently less secure because its anonymization network is smaller and it uses less randomization in path selection. Even so, it is demonstrably faster than Tor, so it may be suitable if you aren’t quite as concerned about privacy.

Even though Tor has become the de-facto standard for Internet privacy, some privacy advocates and security researchers have recently asserted that Tor isn’t strong enough. For example, so-called “Internet drug kingpin” Dread Pirate Roberts was nabbed by the FBI despite using Tor back in 2013. More recently, there was a big controversy over the FBI allegedly paying academics large sums of money to help them catch criminals who were using Tor.

In response to concerns about Tor-like protocols, new privacy-centric protocols have been proposed. One particularly notable design is Vuvuzela, which goes out of its way to hide any possible metadata by having idle clients continuously transmit bogus data. While many complain that browsing the Internet using Tor is too slow, using Vuvuzela for this purpose is utterly impossible. It was purposely designed with message transmission latencies of up to one minute.  Even so, it is perfectly suitable for sending sensitive emails, or other types of not-quite-instant messaging.

Privacy-centric operating systems

The Linux-based operating system Tails received notoriety back in 2013, as the operating system which Edward Snowden used to preserve his privacy while in exile. Tails is 100% open source and comes with a bundle of privacy-oriented software and state-of-the-art cryptographic tools. Perhaps most notably, it runs like a Live CD and does not store any state unless you specifically ask it to. Therefore, you can be sure that no identifying or compromising files will remain on the disk.

Though Tails is certainly the most popular, there are other privacy-oriented OSes out there. Some other notable distributions include:

  • BlackArch: Based on Arch Linux, this distro is targeted primarily at pentesters and security researchers. Nonetheless, it ships with many of the same cryptographic and privacy-preserving utilities that are included with Tails.
  • Parrot Security OS: This rolling-release Debian-based distro is similar to BlackArch in that it is aimed primarily at security researchers but still gives you all the utilities you need to ensure your privacy and security.
  • JonDo: JonDo is more like Tails in that its main goal is to protect the user’s privacy. In addition to Tor, it includes clients for the JonDo anonymity network.
  • Qubes OS: Qubes is unique in that it provides security by enforcing compartmentalization. Each program you run is contained in its own sandboxed virtual machine, so there is little chance of data leakage between applications. It is also notable in that its team is quite involved in the security research community.

Open source firmware

Recently, the NSA has been accused of infecting computers with persistent spyware that survives even if the operating system is reinstalled. The software can install itself permanently by infecting the computer’s firmware, e.g. in the BIOS/UEFI or System Management Controller. Detecting these threats is very difficult, and sometimes they can be impossible to mitigate. Updating firmware is difficult, and often vendors will never release updates to their code. In response, there have been recent efforts to make computer boot firmware open source.

The most popular free/open-source firmware projects are coreboot and libreboot. The two are intimately related, in that libreboot is actually a distribution of coreboot (e.g. in the way that Debian is a distribution of Linux). These projects aim to make full software-stack transparency possible for the average user — but this is still an ongoing endeavor.

As of now, there are not many hardware platforms supported by libreboot, and the installation process can be quite arduous, depending on the platform. Fortunately, support for open-source firmware is growing, so hopefully it will be more accessible in the near future.

Open source hardware

Though it may now be possible to run 100% open source code on your computer, all from the firmware up to application software, it is still difficult to gain full control over your hardware. As a result, the Open Source Hardware Foundation (OSHWA) and the Free Software Foundation (FSF) have recently initiated efforts to develop open hardware and certify compliant devices. Only with open-source hardware is it possible to have complete trust in your computer. With access to every detail of the design, these certified devices let you be sure that you are invulnerable to privacy holes and backdoors.

While several embedded devices and hobby platforms (e.g., Arduino) embrace the open-source hardware philosophy, it is still difficult to find a general-purpose computer that is completely open-source. The FSF’s open hardware certification program, “Respects Your Freedom,” has only certified a handful of devices since its inauguration. This is somewhat understandable, since only a small niche of users care enough about their privacy to go this far, so there is not much motivation for hardware vendors to make their products open and get them certified. However, one can rejoice in the fact that it is indeed feasible to obtain a completely open-source machine.

How much is too much?

Personally, I believe that everyone should be entitled to privacy online. However, I’m not going to go out of my way to make all of my communications 100% airtight from the prying eyes of determined state actors.

There is certainly a tradeoff between privacy and convenience. Not many people would take the time to find an open-source hardware platform with open-source firmware then install a locked-down, privacy-aware OS and route all of their Internet traffic through Vuvuzela. It is certainly possible to go to such an extent, and it will enhance your privacy. But I wouldn’t do it.

I am not a fan of the “nothing to hide, nothing to fear” argument. It is completely fallacious and does not justify surveillance. Nobody should feel guilty for valuing their privacy. Even so, you would have to be really determined to go to such extreme lengths for privacy.

I’m perfectly content using TLS to transmit private data, and using a VPN if I’m connected to a network that isn’t trustworthy. But my hard disk isn’t encrypted, I don’t use Tor, and I don’t use open-source firmware, so I’m certainly not what anyone would call paranoid.

I think that being paranoid usually isn’t worth it. For example, the average hacker isn’t capable of breaking your SSL session. A government can, but (with high probability) they would have to target you specifically. And if you’ve become the specific target of an actor with nation-state scales of resources, then you can never be 100% safe. There will always be some vulnerability that can be exploited.