Unnecessary mathematics: queuing delay for two types of network traffic

This morning I asked myself what I thought was an interesting question:

  • If there are two types of traffic in a network, where one is much less common than the other, will the two types experience different queuing delays?

If you have more common sense than me, you’ve already figured out that the answer is “No.” But it took me about an hour to figure this one out.

I designed a simple model of the scenario: there is a queue of N packets, and one of them is of a different type than the rest. Let’s call the two types A and B.

We can assume that a packet's position in the queue reflects its queuing delay, so we only need to be interested in the average position in the queue for each type of traffic: E[P(A)] and E[P(B)].

We assume that there are N-1 packets of type A, and one packet of type B. The type-B packet is equally likely to be at any of the N positions in the queue:

  • E[P(A)] = \frac{1}{N} \sum_{i=0}^{N-1} \left( \frac{1}{N-1} \left( \left( \sum_{j=0}^{N-1} j\right) - i \right) \right)
  • E[P(B)] = \frac{1}{N} \sum_{i=0}^{N-1} \left( i \right)

The term summed in the formula for E[P(A)] represents the average position of all N-1 type-A packets. In E[P(B)], there is only one packet so the term doesn’t need another sum inside of it.

Well, it turns out that E[P(A)] and E[P(B)] both simplify to \frac{N-1}{2}.

This should be immediately obvious in the formula for E[P(B)], and it takes just a little more work to derive for E[P(A)]. Yep, it couldn’t have been simpler.
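
For completeness, here is that little bit of work. Substituting \sum_{j=0}^{N-1} j = \frac{N(N-1)}{2} and pulling the constants out of the sum:

  • E[P(A)] = \frac{1}{N(N-1)} \sum_{i=0}^{N-1} \left( \frac{N(N-1)}{2} - i \right) = \frac{1}{N(N-1)} \left( \frac{N^2(N-1)}{2} - \frac{N(N-1)}{2} \right) = \frac{1}{N(N-1)} \cdot \frac{N(N-1)(N-1)}{2} = \frac{N-1}{2}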

If you think about it, this result makes a lot of sense and I shouldn’t have needed to compute it — if I’m at a random position in line, then so is everyone else. So it doesn’t matter whether I’m averaging over a group of N-1 people, or only worrying about myself. We’ll all wait the same amount of time (on average, of course).
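
If you don't trust the algebra, a quick simulation backs it up. Here is a minimal sketch in Python (the helper name trial is mine, just for illustration):

import random

def trial(N):
    # place the lone type-B packet uniformly at random among N positions
    b = random.randrange(N)
    # mean position of the remaining N-1 type-A packets
    mean_a = (N * (N - 1) / 2 - b) / (N - 1)
    return mean_a, b

N, T = 10, 100_000
samples = [trial(N) for _ in range(T)]
print(sum(m for m, _ in samples) / T)  # ~4.5, the average type-A position
print(sum(b for _, b in samples) / T)  # ~4.5, the average type-B position, i.e. (N-1)/2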

What’s inside a PEM file?

As you probably know, PEM is a base64-encoded format with human-readable headers, so you can kind of figure out what you’re looking at if you open it in a text editor.

For example, let’s look at an RSA public key:

-----BEGIN PUBLIC KEY-----
MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA4YFgwNrEkMdynjtDsM0q
b+Hedk8p4pySfxakYTfSQPEyGxxnQGcVMV2ZEjPR4nZeqJrtNTlixhK2YWqunE6I
KopVDq3WvtPKweNEeZ8B2lA2I8FFrpZSjI/Tosq8/MbTd/Y/C4Q8Qcz78MF/NH17
/E82K3ca9/LM2b4KGTEIhsLUff7OGrJM7lPcQZN3EOdUeQnzT9uTh8Z9oFqChfJP
pLwwSebfrRB7VMXjeKHZmubSO5pULHLdZLbkgLSmnhbgBjO6apG0tkYyOeWd6L8F
MzA21WkXJdANrr1s/yv5zS9hx1q9jSM8Me9QA2/iaAbgem7VwQ2YlPiXEvUq48oB
VsKXMpHQ6A2cUygs+PiSFuUzNjTIebWFTWmKKuoRx0O2m63fAZJaT2aJA4G0HqdJ
ZQ2Aqr4Acs1+28IhLxUbMAlHJ4N2XPnE2WpQYbtUR4zZMXU+bVIToXuqHCLo4pf/
qEIK/xzr/S8WdvMvRVSOtVIIQwyaMDUxsnnKozYSVHvzYsxQo3b3VD5OOqmg1mx1
+Z/PLFViLkBjo+ZMkl5dFbsgYyHmkn/uvCV19IpjkdDNfFgdrOlSdNTnlGU7su5L
L31k/IwSvD0PR0egxiv8HhegaYwqgujVylB0gntyBsrVVHfE3Wr2+aJlR3YmrdCZ
lsAiSbnFxgGtfB6INHepFdkCAwEAAQ==
-----END PUBLIC KEY-----

We can see that we have a public key just from the header. But what’s all the base64 stuff? If it’s a public key, we know that there should be a modulus and an exponent hidden in there. And there’s also probably some kind of hint that this is an RSA key, as opposed to some other type of key.

That base64 payload actually has the potential to contain a lot of information. It’s an ASN.1 (Abstract Syntax Notation One) data structure, encoded in a format called DER (Distinguished Encoding Rules), and finally base64-encoded before the header and footer are attached.

ASN.1 is used for a lot of stuff besides keys and certificates; it is a generic file format that can be used to serialize any kind of hierarchical data. DER is just one of many encoding formats for an ASN.1 structure — e.g. there is an older format called BER (Basic Encoding Rules) and an XML-based format called XER (you can probably guess what that stands for).

But anyway, what is inside that public key? How can we find out?

Parsing PEM files

OpenSSL comes with a utility called asn1parse. It can understand plain DER or PEM-encoded ASN.1. Let’s feed our public key into it and take a look:

$ openssl asn1parse -in public.pem -i
    0:d=0  hl=4 l= 546 cons: SEQUENCE          
    4:d=1  hl=2 l=  13 cons:  SEQUENCE          
    6:d=2  hl=2 l=   9 prim:   OBJECT            :rsaEncryption
   17:d=2  hl=2 l=   0 prim:   NULL              
   19:d=1  hl=4 l= 527 prim:  BIT STRING

The column that contains “cons” and “prim” gives us information about the hierarchical structure of the data. “Cons” stands for a “constructed” field, i.e., a field that encapsulates other fields; on the other hand, “prim” means “primitive.”

The column after “cons” or “prim” tells us what type of data is in that field. The -i flag I’ve supplied causes that column to be indented according to how deep we are in the hierarchical structure. So overall, what are we looking at?

There is one root SEQUENCE object. That SEQUENCE contains another SEQUENCE and a BIT STRING. The inner SEQUENCE holds an OBJECT and a NULL (the algorithm's parameters field, which is empty for RSA). The OBJECT field is actually an Object Identifier — it contains a constant that tells us what kind of object we’re decoding. As we may have suspected, it tells us that we’re decoding an RSA key.

Here’s where stuff gets really interesting: That BIT STRING field actually contains more ASN.1 data. Let’s jump in and parse it:

$ openssl asn1parse -in public.pem -strparse 19 -i 
    0:d=0  hl=4 l= 522 cons: SEQUENCE          
    4:d=1  hl=4 l= 513 prim:  INTEGER           :E18160C0DAC490C7729E...
  521:d=1  hl=2 l=   3 prim:  INTEGER           :010001

(The -strparse 19 flag means “parse the data in the field located at offset 19 in the original structure.” If there was another BIT STRING inside this one, we could add another -strparse argument to recurse into it.)

What do we have here? A SEQUENCE containing two INTEGERs. It turns out that the first INTEGER is our modulus, and the second INTEGER is the exponent.

You can actually infer the size of the key from this information. The length field (l=) shows that the modulus is 513 bytes; one of those bytes is a leading zero which DER prepends to keep the INTEGER positive, so the modulus itself is 512 bytes (4096 bits). In other words, this is a 4096-bit key.
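
If you'd rather not read DER offsets by hand, the same numbers can be pulled out with a few lines of Python. Here is a minimal sketch, assuming the pyca/cryptography package is installed:

from cryptography.hazmat.primitives import serialization

# load the PEM file and unwrap it into an RSA public key object
with open("public.pem", "rb") as f:
    key = serialization.load_pem_public_key(f.read())

numbers = key.public_numbers()
print(key.key_size)          # 4096: the size of the modulus in bits
print(numbers.e)             # 65537, i.e. the 0x010001 INTEGER we saw above
print(hex(numbers.n)[2:18])  # the leading hex digits of the modulus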

Reverting to DER

As I explained earlier, PEM is just a wrapper around DER. So can we dump the raw DER and parse it that way? You sure can. You just cut out the PEM headers and base64-decode the payload, and what comes out is DER.

$ grep -v "PUBLIC KEY" public.pem | openssl base64 -d > public.der

Now we can inspect the raw DER:

$ hexdump public.der 
0000000 8230 2202 0d30 0906 862a 8648 0df7 0101
0000010 0501 0300 0282 000f 8230 0a02 8202 0102
0000020 e100 6081 dac0 90c4 72c7 3b9e b043 2acd
0000030 e16f 76de 294f 9ce2 7f92 a416 3761 40d2
0000040 32f1 1c1b 4067 1567 5d31 1299 d133 76e2
0000050 a85e ed9a 3935 c662 b612 6a61 9cae 884e
0000060 8a2a 0e55 d6ad d3be c1ca 44e3 9f79 da01
0000070 3650 c123 ae45 5296 8f8c a2d3 bcca c6fc
...

Or, we can use asn1parse if we actually want to understand it:

$ openssl asn1parse -in public.der -inform DER -i
    0:d=0  hl=4 l= 546 cons: SEQUENCE          
    4:d=1  hl=2 l=  13 cons:  SEQUENCE          
    6:d=2  hl=2 l=   9 prim:   OBJECT            :rsaEncryption
   17:d=2  hl=2 l=   0 prim:   NULL              
   19:d=1  hl=4 l= 527 prim:  BIT STRING   
$ openssl asn1parse -in public.der -inform DER -strparse 19 -i
    0:d=0  hl=4 l= 522 cons: SEQUENCE          
    4:d=1  hl=4 l= 513 prim:  INTEGER           :E18160C0DAC490C7729E...
  521:d=1  hl=2 l=   3 prim:  INTEGER           :010001

This is the exact same data we saw earlier, but there’s your proof that PEM really is just DER underneath a layer of base64.

My first adventure with Let’s Encrypt on nginx, dovecot, and postfix

Let’s Encrypt is old news by now. It launched back in December, so it has been giving away free DV certificates for nearly four months. As a TA for a Computer Security course, I figured it was about time I actually tried it out.

Prologue

Let’s Encrypt is a free certificate authority. It issues TLS certificates that you can use to secure your webserver. These are Domain Validated (DV) certificates, which means the CA verifies that you control the domain name you are trying to certify.

To test out Let’s Encrypt, I decided to deploy new certificates on some internal Zeall.us services (Gitlab and Roundcube). Previously, we were using self-signed certs, so we would have to approve a security exception on every visit.

Unfortunately, I chose the wrong day to test out Let’s Encrypt. For some reason, their DNS client decided to die at the very moment I tried to get a certificate. I wasn’t the only one having this problem, but Google hadn’t yet crawled the support site so I didn’t see these other reports until the problem had already subsided. Once it cleared up, everything else was a breeze.

Obtaining certificates

Obtaining certificates for our Gitlab and webmail instances was incredibly easy. I just cloned the Let’s Encrypt repository from Github and used their super-simple command-line utility. It doesn’t have built-in support for nginx (our webserver of choice), but you can still easily use the webroot plugin to validate your domain. You just tell it the directory from which your webserver serves, and it puts a file there which it uses for the validation. I ran it once for each of the services:

./letsencrypt-auto certonly --email example@example.com --text --agree-tos --renew-by-default --webroot -w /opt/gitlab/embedded/service/gitlab-rails/public -d <our_gitlab_domain>
./letsencrypt-auto certonly --email example@example.com --text --agree-tos --renew-by-default --webroot -w /srv/roundcube -d <our_webmail_domain>

It handles everything automatically, and sticks your new certificates and keys in their respective /etc/letsencrypt/live/<domain_name>/ directories.

Configuring nginx

Prior to setting up the new certificates in nginx, I generated strong Diffie-Hellman key exchange parameters, as suggested by this guide to nginx hardening:

sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 4096

Then I just put the following lines in each of my nginx site configs:

ssl on;
ssl_certificate /etc/letsencrypt/live/<domain>/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/<domain>/privkey.pem;
ssl_session_timeout 5m;
ssl_session_cache shared:SSL:10m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/ssl/certs/dhparam.pem;

Follow up with a sudo service nginx reload, and you’re good to go. With this configuration, my domain got an A+ from Qualys’ SSL Labs Server Test.

Configuring Dovecot

To supply the new certificate to my Dovecot IMAP server, I set the following configs in /etc/dovecot/conf.d/10-ssl.conf:

ssl_cert = </etc/letsencrypt/live/<domain>/fullchain.pem
ssl_key = </etc/letsencrypt/live/<domain>/privkey.pem

This is the same certificate I used for webmail, since the domain is the same. I have not yet investigated hardening techniques for Dovecot, but at a glance the settings seem to be quite similar to nginx.

I followed this up with a sudo service dovecot restart, and then everything worked.

Configuring Postfix

Finally, I supplied the same certificate to Postfix (SMTP) by putting the following in /etc/postfix/main.cf:

smtpd_tls_cert_file = /etc/letsencrypt/live/<domain>/fullchain.pem
smtpd_tls_key_file = /etc/letsencrypt/live/<domain>/privkey.pem

After a sudo service postfix restart, the certificates are live.

Auto-renew

Let’s Encrypt provides certificates with a term of only 90 days, so it is necessary to write a script to renew them automatically. I wrote the following script:

#!/bin/sh
# renew any certificates nearing expiry, then reload the services that use them
/opt/letsencrypt/letsencrypt-auto renew | tee -a ./lerenew.log
service nginx reload
service postfix reload
service dovecot reload

And a corresponding crontab:

0 0 * * 0 ~/lerenew.sh

Now once a week, any eligible certificates will be renewed and the services which use them will be reloaded.

Conclusion

It has never been this easy to deploy TLS certificates. I highly recommend that anyone who is currently using a self-signed certificate switch to Let’s Encrypt. If you’re currently paying for a DV certificate, consider switching to Let’s Encrypt to save some money — but keep compatibility issues in mind.

Demonstrating the double-DES meet-in-the-middle attack

Introduction

DES (Data Encryption Standard) is an old-school block cipher which has been around since the 1970s. It uses only a 56-bit key, which is undeniably too short for use in the modern day. Between the realization in the late 90s that DES is weak and the standardization of AES in the early 2000s, Triple-DES had a brief time to shine.

Triple-DES is just what it sounds like: you run the DES algorithm three times. You use two 56-bit keys, K1 and K2, and apply them in the order K1, K2, K1 (an encrypt-decrypt-encrypt construction). The result is a cipher with 112 bits of key strength.

Students often ask me: why not just encrypt twice, K1 then K2, done? The reason is that this construction is vulnerable to a particular known-plaintext attack, which we call the meet-in-the-middle attack. That is, if the attacker knows your plaintext in addition to your ciphertext, he doesn’t have to try all 2^112 keys. The work he has to perform is actually on the order of 2^57 operations — not much more than the 2^56 needed to break single DES.

Notation

Before I proceed to an explanation of the meet-in-the-middle attack, let’s get some basic cryptography notation out of the way.

Let’s call our plaintext P, and our ciphertext C. Let’s call our DES encryption function E. It takes in a plaintext and a key; in return, it produces a ciphertext. Similarly, there is a decryption function D which takes a ciphertext and a key and spits out a plaintext.

We can then say:

  • E(P, K) = C
  • D(C, K) = P
  • D(E(P, K), K) = P
  • E(D(C, K), K) = C

You get the idea. Let’s move on.

Double-DES

Double DES uses two keys, K1 and K2. We produce a ciphertext using the following formula:

  • C = E(E(P, K1), K2)

We just encrypt twice, using our two keys. Easy.

Since each key is 56 bits, an attacker who tries to guess both keys at once must search a total of 112 bits of key to break the cipher. Therefore, the total keyspace is 2^112, and an average of 2^111 key pairs would have to be tried before the true K1 and K2 are obtained.

Meet-in-the-middle

Let’s redefine Double-DES as a two-step process:

  • M = E(P, K1)
  • C = E(M, K2)

We will refer to the result of the first encryption as M, as it is our “middle” value. Now, for this attack to work, we assume our adversary has access to both P and C, but wants to determine K1 and K2.

Consider the following statement — we can easily derive it from the second formula we wrote above:

  • M = D(C, K2)

An attacker can easily exploit this fact, and essentially make K1 independent of K2. He can guess them each separately, instead of trying to guess them at the same time. But how?

First, the attacker guesses every possible K1. Let’s call each of his guesses K1′. For each guess, he computes M’ = E(P, K1′) to obtain a potential “middle” value, and stores the result in a table along with the corresponding K1′.

After filling out the table with each of the 2^56 possible K1′ keys, the attacker moves on to guessing K2. Again, let’s refer to each guess as K2′. He computes M’ = D(C, K2′), then checks whether this M’ matches any of the values he stored in the table previously.

If a match is found, then both keys have almost certainly been broken. That is, if E(P, K1′) = D(C, K2′), then in all likelihood K1 = K1′ and K2 = K2′. (Because the block is only 64 bits, a match can occasionally be a false positive; verifying the candidate pair against a second plaintext-ciphertext pair rules these out.)

How much work has the attacker done? Well, generating the table required 2^56 encryptions, and finding a match requires 2^55 decryptions on average. Overall, we have done only about 2^56.6 cryptographic operations. This is in stark contrast to the 2^111 operations we would expect to perform when naively brute-forcing both 56-bit keys at once.

Pseudocode

In short, this is the attack:

function meet_in_the_middle(C, P):
      table = {}
      for i = 0 to 2^56 - 1:
            table[E(P, i)] = i
      for i = 0 to 2^56 - 1:
            if D(C, i) in table:
                  return table[D(C, i)], i

That’s it.

Demonstration

I’ve written a quick demonstration of this attack in Python using the pycrypto library. The source code is on my Github. For the demo, I have only used 18 bits of the key (just so you don’t have to wait forever to see the results). There are also some optimizations which could be made (e.g., DES takes the key as a 64-bit input even though only 56 of those bits are used, so we could skip keys that differ only in the ignored bits). However, it works.
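
For reference, the heart of such a demo looks roughly like the sketch below, with a toy 18-bit key space. I'm assuming the pycryptodome fork of the library here, and the helper names are mine, not necessarily those in the repo:

from Crypto.Cipher import DES

KEY_BITS = 18  # toy key space so the search finishes in seconds

def make_key(i):
    # embed a small integer into an 8-byte DES key; DES ignores the parity
    # bit of each key byte, so some integers collide into the same key
    return i.to_bytes(8, "big")

def E(p, k):
    return DES.new(k, DES.MODE_ECB).encrypt(p)

def D(c, k):
    return DES.new(k, DES.MODE_ECB).decrypt(c)

def meet_in_the_middle(C, P):
    # phase 1: build the table of middle values M' = E(P, K1')
    table = {E(P, make_key(i)): i for i in range(2**KEY_BITS)}
    # phase 2: for each K2', check whether D(C, K2') appears in the table
    for j in range(2**KEY_BITS):
        m = D(C, make_key(j))
        if m in table:
            return make_key(table[m]), make_key(j)

# build a double-DES ciphertext with two "secret" keys, then recover them
P = b"8 bytes!"
C = E(E(P, make_key(12345)), make_key(54321))
k1, k2 = meet_in_the_middle(C, P)
print(k1.hex(), k2.hex())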

Triple-DES

So why isn’t Triple-DES vulnerable to this attack? Well, it’s easy. We can only “skip” one encryption using this strategy. Since Triple-DES is defined as E(D(E(P,K1),K2),K1), we can’t split it up in such a way that we won’t, at some point, have to guess both K1 and K2 at the same time. A formal proof of this claim is, unfortunately, out of scope of this article.

Last word

This attack isn’t specific to DES. It works on any composition of two ciphers which is constructed as E(E(P, K1), K2). So don’t ever, ever, ever try to do that.

Digital privacy: Where do you draw the line?

I recently had a friendly debate on IRC about how much privacy you really need for it to be considered “enough.” Arguably, the answer is “there is never enough.” And still, there are plenty of people who would be perfectly content having no privacy at all.

Many argue that the key to privacy is actually transparency. If we use open source software, we can all audit it to be sure it is free of backdoors and bugs which may leak information. Perhaps this is why increasingly many users (though unfortunately still fewer than 2% overall) now prefer Linux to proprietary operating systems. But even among Linux users, there is not much knowledge about how deep the privacy rabbit hole goes.

Before I return to the debate about how much privacy is really sufficient, I’d like to give a quick overview of some of the tools one can use to preserve their privacy. I will discuss four levels of privacy: network, OS, firmware, and hardware.

Privacy-centric networking

Tor is now a very well-known protocol. It allows you to browse the Internet privately by encrypting your data several times and forwarding it through several proxies, each of which can only remove one layer of encryption.

One alternative to Tor is JonDonym, which is similar in that it enables privacy by packaging data under multiple layers of encryption. However, it is inherently less secure because its anonymization network is smaller and it uses less randomization in path selection. Even so, it is demonstrably faster than Tor, so it may be suitable if you aren’t quite as concerned about privacy.

Even though Tor has become the de-facto standard for Internet privacy, some privacy advocates and security researchers have recently asserted that Tor isn’t strong enough. For example, so-called “Internet drug kingpin” Dread Pirate Roberts was nabbed by the FBI despite using Tor back in 2013. More recently, there was a big controversy over the FBI allegedly paying academics large sums of money to help them catch criminals who were using Tor.

In response to concerns about Tor-like protocols, new privacy-centric protocols have been proposed. One particularly notable design is Vuvuzela, which goes out of its way to hide any possible metadata by having idle clients continuously transmit bogus data. While many complain that browsing the Internet using Tor is too slow, using Vuvuzela for this purpose is utterly impossible. It was purposely designed with message transmission latencies of up to one minute. Even so, it is perfectly suitable for sending sensitive emails, or other types of not-quite-instant messaging.

Privacy-centric operating systems

The Linux-based operating system Tails rose to fame back in 2013 as the operating system Edward Snowden used to preserve his privacy while in exile. Tails is 100% open source and comes with a bundle of privacy-oriented software and state-of-the-art cryptographic tools. Perhaps most notably, it runs like a Live CD and does not store any state unless you specifically ask it to. Therefore, you can be sure that no identifying or compromising files will remain on the disk.

Though Tails is certainly the most popular, there are other privacy-oriented OSes out there. Some other notable distributions include:

  • BlackArch: Based on Arch Linux, this distro is targeted primarily at pentesters and security researchers. Nonetheless, it ships with many of the same cryptographic and privacy-preserving utilities that are included with Tails.
  • Parrot Security OS: This rolling-release Debian-based distro is similar to BlackArch in that it is aimed primarily at security researchers but still gives you all the utilities you need to ensure your privacy and security.
  • JonDo: JonDo is more like Tails in that its main goal is to protect the user’s privacy. In addition to Tor, it includes clients for the JonDo anonymity network.
  • Qubes OS: Qubes is unique in that it provides security by enforcing compartmentalization. Each program you run is contained in its own sandboxed virtual machine, so there is little chance of data leakage between applications. It is also notable in that its team is quite involved in the security research community.

Open source firmware

Recently, the NSA has been accused of infecting computers with persistent spyware that survives even if the operating system is reinstalled. The software can install itself permanently by infecting the computer’s firmware, e.g. in the BIOS/UEFI or System Management Controller. Detecting these threats is very difficult, and sometimes they can be impossible to mitigate. Updating firmware is difficult, and often vendors will never release updates to their code. In response, there have been recent efforts to make computer boot firmware open source.

The most popular free/open-source firmware projects are coreboot and libreboot. The two are intimately related, in that libreboot is actually a distribution of coreboot (e.g. in the way that Debian is a distribution of Linux). These projects aim to make full software-stack transparency possible for the average user — but this is still an ongoing endeavor.

As of now, there are not many hardware platforms supported by libreboot, and the installation process can be quite arduous, depending on the platform. Fortunately, support for open-source firmware is growing, so hopefully it will be more accessible in the near future.

Open source hardware

Though it may now be possible to run 100% open source code on your computer, all from the firmware up to application software, it is still difficult to gain full control over your hardware. As a result, the Open Source Hardware Association (OSHWA) and the Free Software Foundation (FSF) have recently initiated efforts to develop open hardware and certify compliant devices. Only with open-source hardware is it possible to have complete trust in your computer: with access to every detail of the design, you can verify for yourself that a certified device is free of privacy holes and backdoors.

While several embedded devices and hobby platforms (e.g., Arduino) embrace the open-source hardware philosophy, it is still difficult to find a general-purpose computer that is completely open-source. The FSF’s open hardware certification program, “Respects Your Freedom,” has only certified a handful of devices since its inauguration. This is somewhat understandable, since only a small niche of users care enough about their privacy to go this far, so there is not much motivation for hardware vendors to make their products open and get them certified. However, one can rejoice in the fact that it is indeed feasible to obtain a completely open-source machine.

How much is too much?

Personally, I believe that everyone should be entitled to privacy online. However, I’m not going to go out of my way to make all of my communications 100% airtight from the prying eyes of determined state actors.

There is certainly a tradeoff between privacy and convenience. Not many people would take the time to find an open-source hardware platform with open-source firmware then install a locked-down, privacy-aware OS and route all of their Internet traffic through Vuvuzela. It is certainly possible to go to such an extent, and it will enhance your privacy. But I wouldn’t do it.

I am not a fan of the “nothing to hide, nothing to fear” argument. It is completely fallacious and does not justify surveillance. Nobody should feel guilty for valuing their privacy. Even so, you would have to be really determined to go to such extreme lengths for privacy.

I’m perfectly content using TLS to transmit private data, and using a VPN if I’m connected to a network that isn’t trustworthy. But my hard disk isn’t encrypted, I don’t use Tor, and I don’t use open-source firmware, so I’m certainly not what anyone would call paranoid.

I think that being paranoid usually isn’t worth it. For example, the average hacker isn’t capable of breaking your SSL session. A government can, but (with high probability) they would have to target you specifically. And if you’ve become the specific target of an actor with nation-state scales of resources, then you can never be 100% safe. There will always be some vulnerability that can be exploited.

A fun experiment with Twilio

Aside: I apologize for the long gap in posts. I’ve been in Mexico City for the past two weeks. I’ll post some photos once I get the film developed. In the meantime, here’s a small project I’ve been working on.

I first heard about Twilio a long, long time ago. As Google Voice faded out of relevance, Twilio took the lead in the mobile-communication-as-a-service market. However, I had never had the chance (or inclination) to play around with its API until today.

About 12 hours after we landed back in the US from our holiday in Mexico, Lynsey departed once again — this time to the Plant and Animal Genome conference (PAG) in San Diego. She asked me to supply her with pictures of our cats for the duration of her trip. I told her I would send her a cat pic every hour, on the hour.

I didn’t realize what I had gotten myself into until I had already deposited $20 into a new Twilio account and spent 2 hours coding away… Though my goal was just to send some photos of cats, I had developed a pretty general application that lets you build a queue of MMSes to be disseminated at a constant rate.
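
The Twilio side of it is not much more than this. A minimal sketch, assuming the twilio Python helper library; the credentials, phone numbers, and URL are placeholders:

from twilio.rest import Client

# placeholder account SID and auth token, from the Twilio console
client = Client("ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "your_auth_token")

def send_mms(media_url):
    # send one queued picture as an MMS
    client.messages.create(
        to="+15555550100",     # recipient (placeholder)
        from_="+15555550199",  # your Twilio number (placeholder)
        body="Hourly cat pic!",
        media_url=[media_url],
    )

send_mms("https://example.com/cat.jpg")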

This was not only my first experiment with Twilio — it was also my first adventure with Flask, a framework for building simple HTTP services in Python. I must say that both Flask and Twilio were extremely easy to use, and I would not hesitate to use them again.

I will certainly be finding some more uses for Twilio in the future — it will be invaluable for the SMS-based 2FA we plan to integrate into Zeall. But for the time being, it will be serving as a medium for the delivery of photos of cats.

If anyone is interested in my Twilio-based app, here it is: https://github.com/le1ca/twiliq

What goes around comes around: Is the Juniper backdoor the feds’ fault?

By now, everyone should know that Dual EC DRBG is unsafe. Way back in 2013, it was revealed that it has many weaknesses, some of which were traced back to the NSA (with the help of Edward Snowden’s leaks). Whether or not it really was inserted by the NSA, the backdoor has been proven to exist and is easily exploitable.

For those of you who don’t know, a DRBG, or “deterministic random bit generator,” is essentially just a way for computers to generate random numbers. We need random numbers for cryptography — they form the basis of our secret keys. If a random number generator is compromised, then the keys it produces are unsafe to use.

Earlier this week it was announced that Juniper Networks, a major vendor for network hardware, had found a massive backdoor in its own VPN appliances. The US government is pretty worried about this, since apparently they use a lot of Juniper hardware. They suspect foreign governments; they think the backdoor was inserted by state-sponsored hackers from China or Russia.

However, some researchers are looking back toward Dual EC. They think foreign governments are really just a scapegoat, and that the backdoor is the federal government’s own fault. There is documentation proving that Juniper still uses Dual EC in some of its products. If the Dual EC backdoor turns out to be the source of the vulnerability in Juniper’s VPNs, then this is certainly an ironic turn of events for the NSA.

Before I conclude, let me take a moment to answer the question posed in the title of this post: Is the backdoor the feds’ fault? Eh, probably not this time. But it is certainly possible. The NSA is known to have ways to backdoor hardware. And unfortunately, no backdoor is safe. Eventually, someone other than the intended user is going to find out how to get in. And what happens then?

In defense of over-engineering

I recently stumbled across an old joke about a software engineer that proposes an overly-complicated model for a simple problem. While I agree with the general sentiment of the joke, I don’t agree with the line of discourse that it represents. Among developers, overthinking is portrayed as wholly counterproductive. I disagree — I think overthinking a system is beneficial.

You may say, “But a toaster doesn’t need inheritance models for different types of breakfast foods!” I concede, this is true. Nonetheless, you should ask yourself: what if it did?

A lot of software systems unfortunately begin as a quick hack, with no formal model having ever been designed. Sometimes these systems are intended to be only temporary, stop-gap measures. Sometimes they are never intended to be used in production at all. However, with high probability, such systems end up being deployed for the long term. Through the life of the software, new requirements are born. Then new hacks are made, and the system becomes more complex (and more unstable). Finally, we end up with a mess of spaghetti code, and nobody is happy.

How do you prevent a codebase from becoming a disaster? Having good developers with good habits is not enough. You must also have a good design. This means you must consider every possibility before it becomes a requirement. You should think about what might be asked of your system in the future. You should think about how the objects it deals with could be modeled. Should you implement it this way? No, probably not. Once you have drawn up a complicated model for your system, throw it out.

Now you can start from scratch, and make a smaller, more practical model. Having considered the system in great detail, you know more about what to expect. With all those complexities in mind, you can build a simple system that still does the same job. But you will be more prepared for future changes, and you won’t need to use any dirty hacks to make them work.

Complexity is generally a bad thing, but modularity is good. Every piece of software is destined to become more complex, and the only thing you can do about it is to be prepared for that to happen.

No, fingerprints are not secure

Authentication is the process by which a system determines whether a particular user is allowed to access it. There are three widely agreed-upon methods to authenticate a user:

  • Something you have.
  • Something you know.
  • Something you are.

When you use your key to unlock your front door, you are authenticating yourself using something you have. In information security, passwords are the most popular method of authentication; they are something you know. Authentication by something you are (i.e., biometrics) has historically been only a niche practice, but in recent years it has caught on in the realm of consumer electronics.

When Apple announced Touch ID in late 2013, security experts immediately voiced their concern. The authentication mechanism was quickly compromised, and there is still very little that Apple can do about it. Why, you ask? Because fingerprints are inherently insecure.

An authentication mechanism must meet three major requirements to be considered secure:

  • It is secret.
  • It is unique.
  • It is revocable.

While fingerprints are unique (in theory — more on that later), they are neither secret nor revocable.

You wear your fingerprints every day, and everybody can see them. But what implications could this possibly have? About a year ago, a German hacker famously announced that he had compromised the fingerprint of defense minister Ursula von der Leyen using only a photograph. After obtaining a photograph of someone’s fingerprint, it is trivial to create a fake. And that easily, your biometric security has been defeated — permanently.

Permanently? Yes — biometrics aren’t revocable. If someone steals your password, you can easily change it. If they steal the keys to your house, you can change the locks. But if someone steals your fingerprint, you can’t get new fingers. Just take a minute and let that sink in.

Now, more on the uniqueness aspect. We are all taught that our fingerprints are unique, which is essentially true. However, the way they are interpreted by computers is not exactly unique. When a fingerprint is used for authentication, an algorithm typically selects a handful of features it thinks are unique and distinct; these points are called minutiae, and they are the only portions of the fingerprint considered for identity matching.

Because the environment will not always be the same when your fingerprint is scanned, each read will probably contain a different set of minutiae. Whether it be caused by random noise, damage to the imaging lens, or something on your finger (like some dirt, or perhaps a burn or a cut), two scans will never be exactly the same. Thus, fingerprints cannot be compared for an exact match the way that passwords are.

What are the implications of this kind of comparison? Well, it means there must be some acceptance threshold which determines how “alike” two fingerprints must be to be considered a match. And thus there will always be a chance that someone else’s fingerprint will be able to authenticate your identity. We can always raise the threshold to reduce this probability, but it will in turn make it more likely that a valid fingerprint will be rejected. In the current state of the art, a false positive rate (FPR) of 0.1% can correspond to a false negative rate (FNR) anywhere between 0.02% and 15%. Standard consumer devices probably lie on the higher end of that FNR range. I doubt that users would tolerate getting rejected 15% of the time, so it would be a safe bet to assume that the threshold is set so that the false positive rate is much higher than 1 in 1000.
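
As a toy illustration of that tradeoff, the matcher's decision ultimately boils down to something like the sketch below. Real matchers compare geometric minutiae features rather than set overlap, and the threshold here is made up:

def fingerprints_match(minutiae_a, minutiae_b, threshold=0.7):
    # toy similarity score: the fraction of minutiae the two scans share
    shared = len(minutiae_a & minutiae_b)
    score = shared / max(len(minutiae_a), len(minutiae_b))
    # raising the threshold lowers false accepts but raises false rejects
    return score >= threshold

# two scans of the same finger rarely yield identical minutiae sets
scan_1 = {"ridge_end_3", "bifurcation_7", "ridge_end_9", "island_2"}
scan_2 = {"ridge_end_3", "bifurcation_7", "ridge_end_9", "delta_5"}
print(fingerprints_match(scan_1, scan_2))  # True: 3 of 4 features match (0.75 >= 0.7)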

So what have we learned here? Well, that’s up to you to decide. In my opinion, the integration of fingerprint scanners into consumer devices is nothing more than a shifty marketing scheme. The general public thinks biometrics are super high-tech, and thus they are a means to inspire consumers to invest in a new gadget.

If you have any sensitive data on your phone, I suggest you at least put it behind a PIN. I reiterate: your fingerprint can be trivially stolen. It might be a good choice for a second authentication factor (i.e., for two-factor authentication), but it should not be used by itself.

Dell joins Lenovo’s MITM bandwagon

Several months ago, news broke that Lenovo was shipping a rogue root certificate with its laptops. It was included as part of a pervasive adware called Superfish, which had already been annoying users (and support techs) for years prior to it being included by default on these machines. More recent news indicates that this is only the least of security concerns for Lenovo users — it appears that there are also backdoors in the hardware itself, and now governments around the world have blacklisted Lenovo as a vendor for this reason.

Dell and Lenovo have been battling over PC market share for years now, and this understandably gave Dell a pretty decent boost in popularity. This makes Dell’s latest action all the more surprising.

Reddit user rotorcowboy discovered this past Sunday that Dell is including its own rogue CA on new laptops, and has since demonstrated that it is feasible to launch an attack using it. Another user, iamwpg, has identified that the certificate is also being installed through an update for machines that didn’t include it from the factory.

Let’s backtrack a little bit. What exactly does this mean?

Websites are delivered securely using a protocol called Transport Layer Security (TLS). To use TLS effectively, you need to get a certificate signed by a Certificate Authority (CA). Each certificate names the website it is intended to be used with, and when a CA signs it they are essentially vouching that you are in charge of that site. There are several CAs that are trusted by default, and they all tend to play by the rules. If you have a signed certificate from a trusted CA for a particular website, then you can act as that website for all intents and purposes.

The CA certificate that Dell is now distributing ships with its private key on the machine, so it is trivial to extract the key and use it to sign new certificates. And since Dell’s laptops now trust this CA by default, they will trust your fake certificates. Then you can perform a Man-In-The-Middle (MITM) attack against these Dell users, effectively impersonating any website they might use. While doing this, you can snoop on or modify any of their traffic without their knowledge. It’s a pretty effective way to acquire personal and financial information, if the attack can be orchestrated.

One of the main purposes of TLS is to prevent MITM attacks. It is relatively easy to hijack an IP address or a domain name, but a TLS certificate and its chain of trust to a root CA helps us identify when we aren’t talking to the website we expected. With Dell machines now trusting a vulnerable CA certificate, this line of defense is gone and the security of the Internet is greatly undermined for many users.

Dell has made a serious mistake here. If you have a Dell computer, do yourself a favor and check if the eDellRoot certificate is present on your machine.