Practically Reversing CRC

The algorithm described in my previous post, which computes a patch for a message so that the message yields a wanted CRC, has a useful application: listing the possible texts that could produce a CRC you provide.

A little util I wrote, reversecrc, lists the (endless) series of possible original texts for a given CRC. It has been useful to the Freelancer modding community, for in that game CRC is used a lot and was an obstacle.

Although it is practically impossible to reverse a 20-character string, because it would take insanely long to check all the texts that could result in the hash, using one of the listed texts works just as well, as it results in the same hash anyway.

Subversion repository:

Opeth’s Ghost Reveries

Sweden’s OPETH and Jens Bogren have finished the recording and mixing of the group’s latest opus, “Ghost Reveries”. The album was mastered on Thursday (June 16) at the Cutting Room facilities in Stockholm. The total running time will be around 65 minutes.

The two released tracks, “Ghost of Perdition” and “The Grand Conjuration”, sound promising.

The album will be for sale on my birthday :-), the 30th of August.

And they will perform it during their European tour, which I’ll visit on the 11th of September in the 013 in Tilburg.

I’m thrilled.

Reversing CRC

Cyclic Redundancy Code

CRC is a hash which is frequently used as a checksum for data, for instance in archives. Who hasn’t seen bad CRC errors when opening corrupted zips? CRC is a very old algorithm, and over time it has changed a lot from the original idea.

The original idea behind CRC was to represent the data you wanted the hash of as one big number and divide it by a certain number called the polynomial, taking the remainder of the division as the hash. For instance: 23 % 3 = 2 (% is the modulus operator, which gives the remainder of a division).

The initial problem was that division is a rather expensive operation. They wanted to simplify CRC to make it easier to implement in hardware and to make it faster. They did this by getting rid of the carry used in the subtraction of the division:

Normal subtraction (binary): 10101 - 01100 = 01001
Without carry: 10101 - 01100 = 11001

Subtraction without a carry is basically a bitwise eXclusive OR (XOR), which returns 1 only when one of the operands is 1 and the other is 0.
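
A quick sanity check of the two subtractions above, in Python:

```python
a, b = 0b10101, 0b01100
print(bin(a - b))   # 0b1001  : normal subtraction
print(bin(a ^ b))   # 0b11001 : "subtraction" without carry, i.e. XOR
```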

The resulting algorithm was faster, but it still worked bit by bit, which isn’t really what a computer likes; a computer works best with one to four bytes at a time. To make CRC faster they handled 8 bit-operations at a time by precomputing the result for each possible start value and putting it in a table called an XOR table.

The required code for the CRC calculation itself now became very simple:

hash = (hash >> 8) ^ table[data ^ (0xff & hash)]
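
To make this concrete, here is a small Python sketch of the full table-driven computation for standard CRC-32 (the table builder, initial value and final xor are the usual CRC-32 ones; the loop body is the line above):

```python
# Build the standard reflected CRC-32 table (polynomial 0xEDB88320).
table = []
for i in range(256):
    entry = i
    for _ in range(8):
        entry = (entry >> 1) ^ 0xEDB88320 if entry & 1 else entry >> 1
    table.append(entry)

def crc32(data: bytes) -> int:
    hash = 0xFFFFFFFF                      # standard initial value
    for byte in data:
        # The line from the text: shift one byte out, xor a table entry in.
        hash = (hash >> 8) ^ table[byte ^ (0xff & hash)]
    return hash ^ 0xFFFFFFFF               # standard final xor

print(hex(crc32(b'123456789')))            # 0xcbf43926, the well-known check value
```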

They changed the CRC algorithm once more by making it reflected. This means that the input data is reversed bitwise: 011101011 <-> 110101110. This was done because most hardware chips at the time reversed data bitwise. Because it was too much work to reflect each byte of incoming data, they instead changed the algorithm that generates the CRC table, so that the table has the effect of reflected data.

This is, by the way, not totally correct: the result still differed between a reflected and a non-reflected algorithm, for they wouldn’t buffer the whole piece of data to reverse it, but reflected it per byte during calculation.

At this moment CRC barely resembles the original idea of a modulus.

Reversing CRC

First off, credits for the original idea of CRC patching go to anarchriz.

CRC is a cryptographically very weak algorithm. It can easily be reversed, for it has the property that with 4 bytes appended to a message you can reach every possible hash value: you can change the whole message and then add 4 patch bytes to patch the hash to the one you like.

The ability to patch a CRC also makes it possible to very efficiently generate all possible source data for a checksum. It is still a brute-force method, but you get 4 bytes for free and patching is faster than calculating.

Patching is basically running CRC backwards. CRC basically takes the hash, shifts it one byte to the right (dropping one byte) and xors it with a table entry. The nice thing about normal CRC is that the first byte of a table entry is unique for that entry.

Because the first byte of an entry is unique for that entry, and because it ends up in the hash xorred with 0 (for that is what is shifted in from the right), you can work out the whole entry that was used.

For instance:

My hash is 0x12345678; this means it was xorred with the entry in the CRC table that starts with 0x12. And when you xor the hash with that full entry, the next byte too is xorred with nothing but the start of a table entry.

When working backwards from the current hash like this, you know what will be xorred onto the patch bytes you append; xorring that with your wanted hash is enough.

The resulting algorithm is surprisingly simple:

– Put the current hash, bytewise reversed, at the start of an 8-byte buffer, and put the wanted hash, bytewise reversed, right after it in the same buffer.
– Look up the entry in the table that starts with byte 7 of the buffer. Xor this entry into the buffer at position 4, and xor the entry’s index into position 3. Repeat this four times, with each position decremented by one every round (thus looking up bytes 7, 6, 5, 4; xorring the entries in at positions 4, 3, 2, 1; and the indices at 3, 2, 1, 0).

When you’ve done this, the required patch bytes are the first 4 bytes in the buffer.
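
A minimal Python sketch of these steps, assuming standard CRC-32 (the names crc_patch and FIRST are mine, not from the util):

```python
import zlib

# Standard reflected CRC-32 table (polynomial 0xEDB88320).
TABLE = []
for i in range(256):
    entry = i
    for _ in range(8):
        entry = (entry >> 1) ^ 0xEDB88320 if entry & 1 else entry >> 1
    TABLE.append(entry)

# The first (most significant) byte of a CRC-32 table entry is unique,
# so it identifies the entry: build a reverse lookup from it to the index.
FIRST = {entry >> 24: i for i, entry in enumerate(TABLE)}

def crc_patch(message: bytes, wanted: int) -> bytes:
    """Return 4 bytes which, appended to message, give CRC-32 `wanted`."""
    # Work with the internal register values: CRC-32 xors with 0xFFFFFFFF
    # before and after the table loop, so undo the final xor here.
    current = zlib.crc32(message) ^ 0xFFFFFFFF
    desired = wanted ^ 0xFFFFFFFF
    # Current hash bytewise reversed at the start, wanted hash after it.
    buf = list(current.to_bytes(4, 'little')) + list(desired.to_bytes(4, 'little'))
    for pos in range(7, 3, -1):            # look up bytes 7, 6, 5, 4
        index = FIRST[buf[pos]]            # the entry starting with this byte
        entry = TABLE[index].to_bytes(4, 'little')
        for j in range(4):                 # xor the entry in, 3 bytes lower
            buf[pos - 3 + j] ^= entry[j]
        buf[pos - 4] ^= index              # xor the entry's index one lower
    return bytes(buf[:4])                  # the patch: the first 4 bytes

patch = crc_patch(b'any message you like', 0x12345678)
assert zlib.crc32(b'any message you like' + patch) == 0x12345678
```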

Some CRC variants use tables with duplicate first bytes, which means more than one table entry could have been the one used. You should just branch and try all of them.

I’ve made a simple Python script that works with crc32 and can patch a hash.

You can download it here. And there is a C implementation here.

Update Fixed a bug in the example.

Update Andrea Pannitti wrote an implementation in java.

Update I found a great article by Stigge, Plötz, Müller and Redlich about reversing CRC. They pretty much nailed it by bringing it down to elementary algebra. They explain an algorithm which is able to patch data at any given point to adjust the CRC to any desired value.

Linux Mount Security

With the Linux setuid attribute you can have a file execute with the permissions of its owner, no matter which user runs it. This feature has traditionally been used for system tools in Linux which require root access to run, but must also be runnable by ordinary users.

It came to mind that a floppy with the ext2 filesystem could contain files owned by root with this setuid attribute set, which theoretically would allow anyone who is allowed to mount floppies, or other media with a filesystem that supports this attribute, to run a program with root access.

On my system I’ve got this entry in my /etc/fstab, which allows a user to mount the floppy:

/dev/floppy/0 /mnt/floppy auto noauto,user 0 0

I made a simple C program which would show the contents of /etc/shadow, which contains the password hashes of the users, and chowned and chmodded it accordingly (chown root showshadow; chmod go+x showshadow; chmod u+rs showshadow).

I ran my program, and it seemed to work! The contents of /etc/shadow streamed across my console.

Euphorically, I went to another Linux computer and tried the same trick.

darkshines@darkshines-one /mnt/floppy $ ./showshadow
bash: ./showshadow: Permission denied

Disappointed but relieved, it seemed that Linux already had some precaution against a root setuid-ed executable.

I copied the contents of the folder, preserving permissions, to another folder outside /mnt/floppy, and it all worked again; although I couldn’t do it from a normal user account, for I can’t preserve the owner when copying a file as a normal user.

I wondered how Linux secured this, and tried to run the program without the chmods applied.

darkshines@darkshines-one /mnt/floppy $ ./showshadow.unchmodded
bash: ./showshadow.unchmodded: Permission denied

The warning comes from bash, which can’t execute the program (note that it isn’t the program failing to access the shadow file). After recompiling it on the floppy itself, it became clear that Linux prevents any program from being executed on a user-mounted filesystem.

I reckon that this security precaution is a bit too strict. Although copying the file from the medium to a normal folder and then executing it is still possible, I find it a bit strange that not even the user’s own files can be executed.
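
A likely explanation, going by the mount documentation rather than anything I verified in the kernel: the user option implies noexec, nosuid and nodev, and those defaults can be overridden by subsequent options in fstab:

# "user" implies noexec,nosuid,nodev; adding "exec" after it re-enables
# execution, while nosuid stays in place and keeps the setuid trick blocked
/dev/floppy/0 /mnt/floppy auto noauto,user,exec 0 0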

This could cause trouble with network mounts, where a user has a share on a server to put his files in and that mount is the only storage he can access on a terminal; there a user mount would be required for security, yet it would keep him from executing anything of his own.

London Bombings

One thing that amazed me about the bombings is that the people didn’t panic, but remained calm.

Even though the people of London and the rest of England were hurt, they weren’t terrorized.

Although it might sound harsh, the terrorists have lost this fight: they gained nothing by these bombings.

Update: although it is a bit inconvenient, this post is my 100th blog post. And currently the 100th comment has been made too.

Welcome to

When you read this, this page has been served from my virtual server.

This also means that luckily the whole web content transfer has been a success :-D.

Sadly things haven’t gone as smoothly with the qmail configuration for my mail accounts.

So please mail me to instead of my account.

Despite the minor setback with the mail (which is giving me headaches), it has been rather fun to do. And I learned a lot (like that putting a restriction on the amount of memory PHP uses isn’t just there to annoy the customer, but to save the server from some terrible PHP scripts).

When everything is working fine I’ll post some more on setting up your own server.

I desperately need some sleep.

Update: it seems that there are quite some bugs in the transfer after all: ftp only recursing a certain number of times; .htaccess files not being transferred; permissions changing; php sucking too much memory (again); not enough sleep 😛
But it seems to be working right now.


Quite some time ago Google started honouring the rel="nofollow" attribute-value pair in html a tags, which should prevent spammers from gaining a high PageRank by spamming blogs with comments.

It is useless.

Spamming costs almost nothing; if there is the slightest gain in it for the spammers, they will keep doing it. Of all the hundreds of thousands of people visiting blogs and seeing comment spam, a few will still follow the link, and those few are enough for the spammers.

One could argue that the websites the comment spam points to no longer get a really high PageRank. This is only partially true, for only a select number of people install the code that adds the rel="nofollow" attribute to links that can be spammed. Even though the highest-ranking blogs have already installed it, the tons of small blogs are still enough to raise the PageRank enormously.

In my opinion search engines should just ignore links which are considered spam.

Function Recursion Overhead

In courses I followed at university and in most books about programming, recursion is praised as a good way to solve some problems.

An example of recursion (a non-existing API is used for the sake of simplicity):

int GetFolderSize(Folder folder)
 int sum = 0;
 foreach(File file in folder.files)
  sum += file.Size;
 foreach(Folder subFolder in folder.folders)
  sum += GetFolderSize(subFolder);
 return sum;

This example function calculates the combined size of all files in a folder and those inside any subfolder.

It does its job and it is very clear how it works, but it is inefficient. Take another way to write the algorithm:

int GetFolderSize(Folder folder)
 Stack<Folder> stack = new Stack<Folder>();
 stack.Push(folder);
 int sum = 0;
 while(stack.Count > 0)
  Folder current = stack.Pop();
  foreach(File file in current.files)
   sum += file.Size;
  foreach(Folder subFolder in current.folders)
   stack.Push(subFolder);
 return sum;

This version is harder to understand. Basically it maintains a stack of folders whose files have not yet been counted. While going through the folders on the stack, it pushes newly found subfolders and pops the ones it has processed.
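
For comparison, here is a runnable Python version of the same stack-based traversal against a real filesystem (the made-up Folder API replaced by os.scandir):

```python
import os

def get_folder_size(path: str) -> int:
    stack = [path]                             # folders not yet counted
    total = 0
    while stack:
        current = stack.pop()
        for entry in os.scandir(current):
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)       # found a new subfolder
    return total
```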

The latter method is way more efficient.

For each function call in the first (recursive) version of the algorithm, a new sum instance, a subFolder instance and an enumerator over folder.folders are required, and these stack up. When 5 levels deep in the first algorithm, you already use more than double the amount of memory the second algorithm ever requires.

Additionally, a function call itself requires memory of its own, which, depending on the language used, can be quite significant. Debugging also gets really hard when you have got a stack trace of hundreds of functions.

Using your own stack instead of polluting the call stack (the place where function calls are kept) sounds great. There is only one little problem: it can get pretty complex.

Take for instance a program that puts the files of folders and their subfolders in a file in that same folder, for instance a playlist generator. This requires the algorithm to detect when all child items of a certain item have been processed. The problem is that the parent item is already gone from the stack by the time the child item is processed. It takes some additional fields and a few function pointers to get it to work, and it can become a mess.

The best way to get it to work is to mimic what a normal recursive method would have done, which involves a lot of extra helper classes, all depending on how the original recursive algorithm worked. In a language with a garbage collector (which automatically frees unused objects) it is manageable, but in a language without one, like C(++), trouble doubles, for you also need to manage the lifetime of the helper classes.

I noticed that it is very tempting to use recursion and that there are almost no occasions where something like your own stack is used, especially in the complex cases. A shame, for it is challenging :-P.