Sadly (or not), this article is not going to be on dogs or any real birthdays but I’ll consider this topic for future articles, I promise.

Instead, we are going to discuss a cryptographic attack that exploits the math behind the birthday problem, named the ‘Birthday Attack’ which sounds just as much fun.

### The Birthday Problem

The birthday problem is concerned with the likelihood that some pair of people in a group of *N* randomly picked people would share the same birthdate.

The birthday problem is often called ‘The birthday paradox’ since it produces a surprising result — A group of *23* people has a more than *50%* chance of having a common birthdate, whereas a group of *70* has a *99.9%* chance of having a shared birthday.

Let’s go over the math quickly.

Say we randomly take one person out of the group of N people. this person obviously has a birthdate which we will notate with A.

Now, let’s pick another person, with birthdate B. what is the probability the A is different from B?

To find the probability that these two people share a birthday we need to calculate 1-P, which is* 0.0027*.

Let’s take another step and try to calculate the probability of choosing randomly* 23* people so that no one of them shares a birthdate.

Similarly, to find the probability of a shared birthdate we need to calculate 1-P which is *50.7*.

We can generalize this with some combinatorics trickery, namely, the pigeonhole principle

### The Birthday Attack

A birthday attack is a type of cryptographic attack which exploits the mathematics underlying the birthday problem in probability theory.

As explained in the birthday problem, the attack is based on a fixed number of permutations (pigeonholes) and the greater chance of collisions detected between random attack attempts.

To better understand what all of this means, let’s get familiar with hash functions.

A **hash function** is any function that can be used to map data of arbitrary size to fixed-size values.

For example, given a hash function H, the mapping can look like

H(“hello”) = 7a

H(“pasta”) = b9

H(“please”) = 1d

H(“goodbye”) = 7a

As we can see, the inputs “hello” and “goodbye” got the same hash — which is how a **hash collision** is defined.

The birthday attack finds two different messages *m₁, m₂*, such that *H(m₁) = H(m₂)*, namely a hash collision between two messages.

This is where the birthday problem discussed earlier comes in handy — we can think of a hash collision as of two people that are having the same birthdate.

Applying the results we had with the birthday problem, we can get that given *23* messages, there is a *50%* that* 2* of them share a hash.

#### Why do we care if two messages share a hash?

Let’s consider passwords.

Whenever you make a password for a website, this password is encrypted into a hash that stores and identifies the combination of characters you input.

For example, given the password “12345678” — which is my bank account password (joking), MD5(message-digest algorithm 5) encryption algorithm would output a 32 hexadecimal characters hash: 25d55ad283aa400af464c76d713c07ad (you can play around with MD5 with this online free tool)

The attack’s goal was to find and force a hash collision for any two values.

If there was indeed a collision, we could change information **only in one** of the messages and affect **both of them**.

It is worth mentioning that hash algorithms nowadays are **not vulnerable** to this kind of attack since it is not feasible computationally.

### Shameless Self-Promotion

- My girlfriend and I recently opened a greeting card shop together.

In case a loved one has a birthday soon or you need some Christmas cards in advance, we got you covered. - Don’t forget to subscribe with your email to get the latest posts to your inbox