Reversing CRC

Cyclic Redundancy Code

CRC is a hash that is frequently used as a checksum, for instance for the data in archives. Who hasn’t run into a bad CRC error when opening a corrupted zip? CRC is a very old algorithm, and over time it has changed a lot from the original idea.

The original idea behind CRC was to represent the data you want to hash as one big number, divide it by a fixed number called the polynomial, and take the remainder of that division as the hash. For instance: 23 % 3 = 2 (% is the modulus operator, which gives the remainder of a division).
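As a minimal sketch of that original idea (the function name remainder_hash is made up for illustration, and reading the bytes as a big-endian number is an assumption), the hash is nothing more than a remainder:

```python
def remainder_hash(data: bytes, divisor: int) -> int:
    """Read the data as one big-endian integer and return its remainder."""
    return int.from_bytes(data, "big") % divisor

# The byte 0x17 is the number 23, so this repeats the 23 % 3 example.
print(remainder_hash(b"\x17", 3))  # prints 2
```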

The initial problem was that division is a rather expensive operation. The designers wanted to simplify CRC to make it faster and easier to implement in hardware. They did this by getting rid of the carry used in the subtraction steps of the division:

Normal subtraction (binary): 10101 - 01100 = 01001
Without carry: 10101 - 01100 = 11001

Subtraction without a carry is basically a bitwise exclusive OR (XOR), which returns 1 only when one of the operands is 1 and the other is 0.
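The two subtractions above can be checked directly, and the same XOR can replace every subtraction in a long division. The helper below is only a sketch of that carry-less division (xor_mod is a hypothetical name, not something from an existing library):

```python
def xor_mod(dividend: int, divisor: int) -> int:
    """Remainder of a long division in which every subtraction is an XOR."""
    if divisor == 0:
        raise ValueError("divisor must be non-zero")
    width = divisor.bit_length()
    while dividend.bit_length() >= width:
        # Line the divisor up with the highest set bit and "subtract" it.
        shift = dividend.bit_length() - width
        dividend ^= divisor << shift
    return dividend

print(bin(0b10101 - 0b01100))  # normal subtraction: 0b1001
print(bin(0b10101 ^ 0b01100))  # without carry:      0b11001
print(xor_mod(0b10101, 0b110))  # prints 1
```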

The resulting algorithm was faster but still worked bit by bit, which isn’t really what a computer likes. A computer works best with units of one to four bytes. To make CRC faster, 8 bit operations are handled at a time: the result is precomputed for every possible start value of a byte and stored in a table called a XOR table.
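To make the bit-by-bit version concrete, here is a sketch that assumes the common CRC-32 variant implemented by Python's zlib module (reflected polynomial 0xEDB88320, start value and final XOR of 0xFFFFFFFF). Those parameters are my assumption; the text above does not name a specific variant.

```python
import zlib

POLY = 0xEDB88320  # CRC-32 polynomial in reflected bit order (assumed variant)

def crc32_bitwise(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        # Eight single-bit division steps per byte of input.
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

print(crc32_bitwise(b"123456789") == zlib.crc32(b"123456789"))  # True
```

The inner loop of eight steps is exactly the part that a table lookup replaces.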

The required code for the CRC calculation itself now became very simple:

hash = (hash >> 8) ^ table[data ^ (0xff & hash)]
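A runnable sketch of that line in context, again assuming the zlib flavour of CRC-32 (polynomial 0xEDB88320, start value and final XOR of 0xFFFFFFFF). The names make_table and crc32_table are mine, chosen for illustration:

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed variant)

def make_table() -> list:
    """Precompute the result of 8 bit steps for every possible start byte."""
    table = []
    for start in range(256):
        entry = start
        for _ in range(8):
            entry = (entry >> 1) ^ POLY if entry & 1 else entry >> 1
        table.append(entry)
    return table

TABLE = make_table()

def crc32_table(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        # The one-line update from the text, one byte of input at a time.
        crc = (crc >> 8) ^ TABLE[byte ^ (0xFF & crc)]
    return crc ^ 0xFFFFFFFF

print(crc32_table(b"123456789") == zlib.crc32(b"123456789"))  # True
```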

The CRC algorithm was changed once more by making it reflected. This means that the input data is reversed bitwise: 011101011 <-> 110101110. This was done because most hardware chips at the time reversed data bitwise. Because it was too much work to reflect each byte of incoming data, the algorithm that generates the CRC table was changed to create a table which has the effect of reflecting the data.

This is, by the way, not totally correct: the result of a reflected algorithm still differs from that of a non-reflected one, because the data isn’t buffered as a whole and then reversed; instead it is reflected per byte during the calculation.
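Reflection itself is easy to sketch. The helper below (reflect is a name I picked) reverses the lowest width bits of a number. It reproduces the 9-bit example above, and it also shows that the polynomial constant used by reflected CRC-32 code, 0xEDB88320, is the bit-reversed form of the usual 0x04C11DB7:

```python
def reflect(value: int, width: int) -> int:
    """Reverse the order of the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

print(format(reflect(0b011101011, 9), "09b"))  # 110101110
print(hex(reflect(0x04C11DB7, 32)))            # 0xedb88320
```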

At this moment CRC barely resembles the original idea of a modulus.

Reversing CRC

First off, credits for the original idea of CRC patching go to anarchriz.

CRC is cryptographically a very weak algorithm. It can easily be reversed, because it has the property that by appending 4 bytes (for a 32-bit CRC) to the data you can turn the currently computed hash into any hash you want. You can change the whole message and then add 4 patch bytes to patch the hash to the one you like.

The ability to patch a CRC also makes it possible to generate all possible source data for a checksum very efficiently. Although it is still a brute-force method, you get 4 bytes for free, and patching is faster than calculating.

Patching basically means walking back the way CRC works. CRC basically takes the hash, shifts it 1 byte to the right (dropping one byte) and XORs it with a table entry. The nice thing about normal CRC is that the first (most significant) byte of a table entry is unique for that entry.

Because the first byte of the entry is unique for that entry, and because it lands in the hash XORed with 0 (the shift fills the vacated top byte with zeros), you can work out which whole entry was used.

For instance:

Say my hash is 0x12345678. This means that it was XORed with the entry in the CRC table that starts with 0x12. When you XOR the hash with that full entry, the next byte is left XORed only with the start of another table entry, so the same trick applies again.
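A sketch of one such backwards step, assuming the zlib flavour of CRC-32 (polynomial 0xEDB88320). REVERSE and undo_step are names I made up. The dictionary maps the top byte of every table entry back to its index, and its length confirms that these top bytes really are unique for this table:

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed variant)

def make_table() -> list:
    table = []
    for start in range(256):
        entry = start
        for _ in range(8):
            entry = (entry >> 1) ^ POLY if entry & 1 else entry >> 1
        table.append(entry)
    return table

TABLE = make_table()
REVERSE = {entry >> 24: index for index, entry in enumerate(TABLE)}
assert len(REVERSE) == 256  # every top byte occurs exactly once

def undo_step(crc: int):
    """Return the table index used in the last step and the previous hash
    with its low byte cleared (that byte was dropped by the shift)."""
    index = REVERSE[crc >> 24]
    previous_high = ((crc ^ TABLE[index]) << 8) & 0xFFFFFFFF
    return index, previous_high

before, data = 0x12345678, 0x41
after = (before >> 8) ^ TABLE[data ^ (before & 0xFF)]
index, previous_high = undo_step(after)
print(index == data ^ (before & 0xFF))       # True
print(previous_high == before & 0xFFFFFF00)  # True
```

The dropped low byte is not lost for good: it equals the recovered index XORed with the data byte, which is what the patching procedure exploits.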

When reversing the current hash you know what will be XORed onto the patch you’ll supply. XORing this with your wanted hash is enough.

The resulting algorithm is surprisingly simple:

– Put the current hash, byte-wise reversed, at the start of a buffer. Put the wanted hash, byte-wise reversed, directly after the current hash in the same buffer.
– Look up the entry in the table that starts with byte 7 in the buffer. XOR this value into the buffer at position 4, and XOR the entry number into position 3. Repeat this 4 times, decrementing each position by one every time (thus 7, 6, 5, 4 for the lookup byte, 4, 3, 2, 1 for the value and 3, 2, 1, 0 for the entry number).

When you’ve done this, the required patch bytes are the first 4 bytes in the buffer.
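The steps above can be sketched as follows. This is not the script mentioned further down, just a minimal illustration under a few assumptions: the CRC is the zlib flavour of CRC-32, "byte-wise reversed" means little-endian byte order, and the names patch_bytes and REVERSE are mine. One detail the steps leave out is that zlib reports the internal register XORed with 0xFFFFFFFF, so both hashes are XORed with that value before they go into the buffer.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed variant)

def make_table() -> list:
    table = []
    for start in range(256):
        entry = start
        for _ in range(8):
            entry = (entry >> 1) ^ POLY if entry & 1 else entry >> 1
        table.append(entry)
    return table

TABLE = make_table()
# Top byte of each entry -> entry number (unique for this table).
REVERSE = {entry >> 24: index for index, entry in enumerate(TABLE)}

def patch_bytes(current_crc: int, wanted_crc: int) -> bytes:
    """Return 4 bytes that, appended to the data, turn current_crc into wanted_crc."""
    # Undo zlib's final XOR (assumption: zlib-style CRC-32), then fill the buffer.
    buf = bytearray((current_crc ^ 0xFFFFFFFF).to_bytes(4, "little")
                    + (wanted_crc ^ 0xFFFFFFFF).to_bytes(4, "little"))
    for pos in range(7, 3, -1):            # lookup bytes 7, 6, 5, 4
        index = REVERSE[buf[pos]]
        entry = TABLE[index].to_bytes(4, "little")
        for k in range(4):                 # value goes in at position pos - 3
            buf[pos - 3 + k] ^= entry[k]
        buf[pos - 4] ^= index              # entry number goes in at pos - 4
    return bytes(buf[:4])

data = b"hello world"
patch = patch_bytes(zlib.crc32(data), 0xDEADBEEF)
print(hex(zlib.crc32(data + patch)))  # 0xdeadbeef
```

Because the reverse lookup is a plain dictionary, this sketch only works for tables in which every entry has a distinct top byte.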

Some CRC variants have duplicate first bytes in the CRC table, which means that more than one original table entry could have been used. You should just branch and try all of them.

I’ve made a simple Python script to work with CRC32 and to patch a hash.

You can download it here. There is also a C implementation here.

Update: Fixed a bug in the example.

Update: Andrea Pannitti wrote an implementation in Java.

Update: I found a great article by Stigge, Plötz, Müller and Redlich about reversing CRC. They pretty much nailed it by bringing it down to elementary algebra. They explain an algorithm that can patch data at any given point to adjust the CRC to any desired value.