SINP Certificates and redirects

Tuesday the 11th, we (Noud, Bram and I) had a meeting with some guys from the Security of Systems Group at the Radboud University. We discussed the security of the current SINP protocol. There hasn’t been a hard verdict on whether SINP is secure, because the SINP specification leaves a lot of details to implementations, and SINP doesn’t make hard claims about its security yet (which could then be either proved or disproved).

The meeting has yielded two new additions to SINP: document certificates1 and redirects.

First off, SINP document certificates. At the moment you can’t trust the information in a SINP document. I can forge my SINP document and claim that I’m Belgian, which I am not. To establish the trust that some people and services do care about, we’ll allow certificates in your identity document. Basically, you let someone sign a part of your SINP document and include that certificate.

Your government could, for instance, sign the name in your SINP document, and you’d add that certificate to your document, which could be required by some services. These certificates are a bit tricky to design, though, because they do need to be secure, and they need to be a bit more dynamic than your usual certificate because of the way SINP documents are queried.

A second problem we encountered during the meeting was how to trust your SINP server. I (and other tech-savvy people) can set up my own SINP server, which I can fully trust because I set it up myself. Less tech-savvy people can’t; they need to rely on existing SINP servers. The problem is whether we can trust those servers with our secrets.

Cees (if I recall correctly) coined the idea that some of your secrets are already on the internet. If you’ve got a VISA credit card, then VISA obviously has that credit card number, and you trust them with it. What if VISA stored the part of your SINP identity document with your credit card number on its own SINP server?

Basically, I go to a big SINP provider (which I don’t trust), create a SINP identity, and put in my SINP document that you can find my credit card number under the SINP identity bas@visa.com. This act of redirecting clients to other SINP identities is called a SINP Redirect. SINP Redirects could also prove very useful when you change your SINP server: the only thing you’d have to do is set up a SINP redirect from your old identity document to your new identity document.

Both SINP Certificates and SINP Redirects will require a lot of thought to implement cleanly and simply, which is tricky.

Any thoughts would be welcome.

1: Actually, these certificates aren’t new; Bram came up with the idea quite a while ago.

Typed Yields: Non-fatal exceptions

Wouldn’t it be nice to have:

begin {
    LoadTheDatabase("foo.bar");
} rescue (Exception e) {
    print "Fatal exception happened: ", e
} on (Warning w) {
    print "Database Warning: ", w
} on (Message m) {
    print "Database Message: ", m
}

The rescue (Exception e) should be familiar to everyone: something failed, maybe the database file was badly corrupted, an exception was raised, and the rescue block is executed.

But what if the database has a small error, or something is only a little bit out of place? You wouldn’t want to just ignore it, but warn about it. Usually one would implement a ‘Logger’ class to which a function can log certain events, but that is ugly and inconvenient.

Enter non-fatal exceptions. Basically there would be two ways to raise an exception: fatal, like we all know it, and non-fatal. When the on block for a non-fatal exception has been executed, control is returned to the function in which the raise was called.

This is done in about the same way as a lot of languages implement yield, but this time the handling code depends on the type of the yielded object.
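
The yield-based mechanics can be sketched in Python with a generator: non-fatal events are yielded (control returns to the function after handling), while fatal problems are raised as ordinary exceptions. All names here are made up for the example.

```python
# Emulating "non-fatal exceptions" with a generator: handled events return
# control to the yielding function; fatal errors are raised normally.
class Warning_:                   # trailing _ avoids Python's builtin Warning
    def __init__(self, text): self.text = text

class Message:
    def __init__(self, text): self.text = text

def load_the_database(path):
    yield Message("opening " + path)
    yield Warning_("minor inconsistency found, repaired")
    # a fatal problem would simply be raised:
    # raise IOError("database corrupt")

def run(gen, handlers):
    try:
        for event in gen:
            handlers[type(event)](event)      # dispatch on the yielded type
    except Exception as e:
        print("Fatal exception happened:", e)

log = []
run(load_the_database("foo.bar"), {
    Message:  lambda m: log.append("Database Message: " + m.text),
    Warning_: lambda w: log.append("Database Warning: " + w.text),
})
```

After each handled warning or message, the for loop resumes the generator, which is exactly the “control returns to the raiser” behaviour described above.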

As far as Kaja and I are concerned, this will be a feature of Paradox.

Thanks go to Bram for the idea.

sinp.rb

irb> require 'sinp'
irb> c = SINP::Client.new nil, nil, [:http]
irb> c.getPublicDocument('Kristy@w-nz.com').write
<requested version='2'>
<sinp-id>
<name><nick>kristy</nick></name>
<address type='email'>kbuiter@hotmail.com</address>
<uri>hotmail.com</uri>
</sinp-id>
</requested>

As you can see, I’ve almost finished the implementation of a Ruby SINP client; I only have to finish SINP Negotiation.

SINP

SINP is a protocol based on HTTP(S) and XML that provides you with an identity on the web. You register a so-called SINP Identity on a SINP Server of your choice. To address a certain identity, we use an email-like notation: bas@w-nz.com is the SINP Identity of the user bas on the SINP Server w-nz.com.

The first big feature of SINP is authentication. If someone claims to be bram@w-nz.com, I can check that by asking w-nz.com to check it. I’ll redirect that guy to w-nz.com to let him be checked by his proclaimed SINP Server. If he really is bram@w-nz.com, he’ll have a nice session cookie for w-nz.com and w-nz.com will check that. After that w-nz.com will redirect him back and I’ll ask w-nz.com whether he succeeded.

One major application of this authentication is that someone who posts a blog comment as noud@w-nz.com, really is/are the same guy(s) that posted before as noud@w-nz.com, for they are allowed by w-nz.com.

The second big feature of SINP is that each identity comes bundled with an XML document, which can store information about the owner, like his name, email address, date of birth, etc. The SINP Server stores this document. The identity owner can pick an access policy for each little bit of information in this document. You might want to share your real address only with those you’ve explicitly allowed. Everyone can see the parts of your document you’ve allowed everyone access to. This is the one for bas@w-nz.com.

To get specific parts instead of the whole thing, or to get to stuff you’ve limited access to, one needs to use SINP Negotiation. To get some specific information from noud@w-nz.com, I ask w-nz.com for this specific information, in the form of a few XPaths. Along with the XPaths I can send my own SINP address, bas@w-nz.com. The server will respond to each request according to the access policy the owner has set. There are several possibilities:

  • Ok, the requested information will be included in the response.
  • Nope, you’re denied access to that.
  • Not found, that stuff isn’t in this document.
  • You’ll have to ask Noud. Basically you’ll have to redirect Noud to the server, where he will be authenticated; after that he can decide whether to allow you access to it, and you can try again later on.
  • If you’re bas@w-nz.com, you can see it. You’ll have to authenticate, though. This is done via SINP authentication as described before.

Another big feature of SINP is versioning, which allows caching. The version of a specific bit of information is sent back with each response in negotiation. In a negotiation request, I can specify the version I already have. In case that specific part of the document hasn’t been updated, the SINP server will let me know instead of sending the whole thing.

One advantage of caching and negotiation is that information can be kept synchronized with your document when it updates. A blog on which you’ve posted a comment might periodically check whether the information it retrieved from your SINP document has changed. This can be done cheaply with negotiation and versioning.
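
A hypothetical sketch of how such versioned negotiation could look. The data, function names, and return codes here are all invented for illustration (real SINP speaks XML over HTTP(S)); the point is that the server replies “not modified” when the client’s version is current, so unchanged data is never resent.

```python
# Invented sketch of version-based negotiation: the client sends the version
# it already has; the server only resends data when it has changed.
server_doc = {"/sinp-id/name": ("Noud", 3)}       # xpath -> (value, version)

def negotiate(xpath, have_version):
    value, version = server_doc[xpath]
    if version == have_version:
        return ("not-modified", None, version)    # cheap reply, no data
    return ("ok", value, version)

cache = {}          # client side: xpath -> (value, version)
statuses = []

def fetch(xpath):
    _, have = cache.get(xpath, (None, None))
    status, value, version = negotiate(xpath, have)
    statuses.append(status)
    if status == "ok":
        cache[xpath] = (value, version)
    return cache[xpath][0]

first = fetch("/sinp-id/name")    # full response fills the cache
second = fetch("/sinp-id/name")   # answered with "not-modified"
```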

SINP is quite simple and therefore easy to implement. It is also portable, as it uses widely supported technologies like XML and HTTP(S) as its base.

SINP is under development, but you can already (and really should) take a look at:

SINP is based on things I’ve seen floating around on the web, for instance Zef’s SPTP.

At the moment of writing we’re developing a PHP client and a Python client, and continuing development of the PHP server. You’re welcome to participate!

We hope you like it, comments or any other forms of participation would be very welcome.

Bas, Bram and Noud.

On IE7

I’ve had to test a website I designed on IE (in VMware, of course) and it went really badly. IE 6 sucks, that’s for sure. It was a blow in the face for me, because everything else had gone so terribly smoothly, especially the CSS.

I was wondering whether it would be as bad in IE 7 as in 6, so I installed it (Genuine Advantage doesn’t work, by the way).

I was amazed by the amount of improvement they made.

The website was a mess in IE6: backgrounds wouldn’t show up; no attribute selectors in CSS; blocks were positioned in absolutely ridiculous locations. In IE7 it showed up pretty nicely compared to 6. It wasn’t perfect though; there were quite a few glitches in positioning (go IE box model!) and font sizes, but that’s manageable.

I wonder why IE7 took so long, and why it still isn’t as bug-free and feature-complete as almost every other big browser. If Microsoft wanted to make a perfect browser, they could do it easily! They’ve got more than enough very talented people, and even more than enough money to hire/bribe any expert they’d want.

Maybe they’re afraid of big AJAX applications. I bet Gmail took a pretty big chunk of Outlook’s market share. At least, that is what other people are thinking. They may be right, but Google wasn’t stopped by the bugs in IE. Google doesn’t use nice and semantically correct HTML (and it is the correct stuff that usually doesn’t work in IE); they just use the stuff that works anyway.

More likely, they don’t care. They already force website devs to create sites that show up properly in IE, just because of its market share. Only website creators feel the pain of the buggy IE engine, not the users of the browser. Besides that, it would be a pain for them to create a whole new IE, which would need to be backwards compatible with the old one. A lot of people would be angry if their sites didn’t show up properly anymore in the new IE, and these people can’t be persuaded with the nice-semantics ideology.

A little bit more pressure from alternative browsers could force Microsoft to release something more profound than IE7. Although IE7 is an improvement, they could have done better.

DDOS on Hash Tables (Self Balancing Hash Tables)

Hash tables are widely used in server software. A malicious user can easily forge keys in the communication with the server such that the hashes of those keys all end up in the same bucket. This will force the hash table to grow each time this bucket is filled.

In a worst-case implementation, this will result in an enormous number of empty buckets and only one full bucket. Even if the implementation is smart and will therefore only grow the targeted bucket, the hash table will fail its purpose. A hash table is meant to make lookups for a key fast by distributing the pairs over buckets; this effect is gone when all pairs are located in one bucket, which causes linear-time look-ups.

One malicious user won’t be able to crash a server, but a DDOS attack by several users would be tremendously more efficient when targeting this weakness in hash tables.

A simple solution would be to randomly pick a salt for the hash function on each new instance of a hash table. Basically, we prefix each key with a random salt before we hash it. The malicious user might know the hash function, but he has to guess the salt, which negates the attack.
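
A minimal sketch of the idea, with invented names: each table instance picks its own random salt and prefixes keys with it before hashing, so an attacker who knows the hash function still can’t precompute keys that collide in a particular table.

```python
# Per-table salting sketch: a fresh random salt per instance foils
# precomputed bucket collisions even when the hash function is public.
import random

class SaltedTable:
    def __init__(self, nbuckets=16):
        self.salt = random.getrandbits(64).to_bytes(8, "big")
        self.buckets = [[] for _ in range(nbuckets)]

    def _index(self, key):
        # any decent hash works; the salt is what foils forged keys
        return hash(self.salt + key.encode()) % len(self.buckets)

    def insert(self, key, value):
        self.buckets[self._index(key)].append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v

table = SaltedTable()
table.insert("user42", "data")
```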

Additionally, logic could be added to detect an unbalanced hash table and switch to a new salt when the table is unbalanced.

This could also be useful to balance hash tables when there is no malicious user, but the keys just happen to be unfortunate.

sudo is evil

sudo, the *nix tool with which you can easily execute commands as a superuser, is harmful.

The idea behind sudo is good. You are able to execute superuser commands — of course, only when the superuser has allowed you to — without having to log in as the superuser.

The problem, though, is that the default configuration of sudo only asks for the current user’s password, and only the first time he uses it. That means that if the user’s account is compromised, which could happen, the attacker can simply use sudo to gain root access, for it is very likely that users who have enabled sudo use it a lot.

My recommendation: don’t use sudo if you haven’t explicitly configured it to require your password every single time you use it; use su instead.

Update: if you want to force sudo to ask for your password every time you use it, add this to your /etc/sudoers:

Defaults timestamp_timeout=0

They could’ve used a more sensible name for that, in my opinion. I had to read the manual carefully before I found out it was this entry.

First SINP draft

SINP is a set of protocols to transfer a profile/identity, to authenticate owners of identities, and to negotiate for restricted information in those profiles. It’s designed to be simple, being based on HTTPS and XML.

You can find the first draft here.

Subversion repository: https://cvs.codeyard.net/svn/webid

Acknowledgements: it’s loosely based on other stuff that has been floating around the web, like Zef’s SPTP.

Comments would be appreciated.

Update: photos are in of the presentation we gave about SINP last Wednesday:

http://www.codeyard.net/fotos/capaward-1.php
Our presentation had several penguin mascots.

IThreads

While working on Paradox, Kaja and I felt that we needed a cross-platform threading library which handles all the subtle differences in capabilities and semantics of each platform’s API.

After a rewrite of Kaja’s original code, I present to you ithreads. You can find the source in this Subversion repository.

svn://w-nz.com/px

At this moment ithreads gives you implementations of:

  • Mutex
  • Event (aka. condition variable)
  • Semaphore
  • R/w lock
  • Barrier
  • Gate
  • Thread

By using (one or more of) the following APIs:

  • pthreads, the POSIX thread API (Unix, Linux, …)
  • pth, portable user-space threads
  • futex, Linux’s lightweight fast hybrid locking

ithreads will use generic implementations in case one of the primitives (e.g. the semaphore for pthreads) isn’t available on your system.

I’m especially happy with the futex-based implementations. They outperform the pthreads implementations, and are a lot smaller (4 vs. ~16 bytes for a mutex in pthreads) – although they aren’t as robust (but still correct).
Support for BeOS, OS/2, Windows and more will be added, but testing them will be tricky.

The library will still need a lot of stress testing and probably still contains some bugs – you’ve been warned. When everything works nicely I’ll add all kinds of extra flavours of the primitives, e.g. a read-favouring r/w-lock.

I hope it will prove useful.

Thread synchronization mechanisms (Locks)

To share them with you, and to make sure I won’t forget, I present you with (some of) the different thread synchronization mechanisms known to mankind. Most of these mechanisms are used to lock operations or data; some aren’t. You know, those nifty little things that ought to help you with thread hell.

Semaphore
The semaphore is the cornerstone of locking – often other locks are implemented using one or more semaphores. A semaphore is a simple integer. The initial value of this integer is the number of threads that can attain (aka. lock, down, enter, pass, …) the semaphore before it will block incoming threads – a blocked thread will have to wait until a thread that has attained the semaphore releases (aka. unlocks, ups, leaves, …) it.

On most systems a semaphore is implemented as an atomic integer. When a thread wants to attain the semaphore, it checks whether the value is greater than 0. If it is, it decrements the value and the semaphore has been attained. If the value is 0, there are already as many threads holding the semaphore as it allows – the thread will wait a while and try again. When a thread releases the semaphore, it increments the value.

You can compare a semaphore with a dining table in a mess hall at which only a certain number of people can dine. People will either have to wait for someone to finish and stand up, or they’ll have to look for some other place.
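
The attain/release logic can be sketched in Python. One liberty taken here: instead of the wait-and-retry loop described above, blocked threads sleep on a condition variable until a release wakes them. The class and method names are made up for the example.

```python
import threading

# A counting semaphore sketch: an integer guarded by a condition variable.
class CountingSemaphore:
    def __init__(self, value):
        self.value = value                 # threads still allowed in
        self.cond = threading.Condition()

    def attain(self):                      # aka lock, down, enter, pass
        with self.cond:
            while self.value == 0:         # all slots taken: wait for a release
                self.cond.wait()
            self.value -= 1

    def release(self):                     # aka unlock, up, leave
        with self.cond:
            self.value += 1
            self.cond.notify()             # wake one blocked thread, if any

sem = CountingSemaphore(2)                 # a dining table with two seats
sem.attain()
sem.attain()                               # table full: a third attain() would block
sem.release()                              # one seat free again
```

A mutex then falls out for free: it’s simply `CountingSemaphore(1)`, as the next section describes.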

Mutex
The mutual exclusion lock is the lock you’ll be most familiar with, and is often just referred to as a lock because of its popularity. A mutex is either locked or unlocked. Only one thread can hold the lock – all others will have to wait for that thread to release the mutex. The mutex is often used to ensure mutually exclusive access to a certain operation or piece of data.

A mutex is simply implemented as a semaphore with a starting value of 1. The first thread to attain the mutex decrements the value to 0 and blocks all other incoming threads – until it releases the lock.

You can compare a mutex with the door lock on your restroom, it will help you make sure that you got some privacy.

Event
An event will block all incoming threads until it is signalled to release one thread, or all of them at the same time. Events are quite useful in situations where you want to queue threads before they are allowed to pass. Events aren’t locks that you lock and afterwards have to unlock; a thread just waits for an event and doesn’t have to ‘unlock’ it afterwards.

An event is usually also implemented using a semaphore, this time with a default value of 0. If a thread wants to pass the event, it’ll try to decrement the semaphore’s value; as it is 0, it will wait, just as when normally locking a semaphore. If the event is signalled to release one thread, it increments the semaphore’s value – one of the waiting threads will notice that the value isn’t 0 anymore, decrement it to 0 again, and pass. If the event is signalled to release all waiting threads, it’ll keep incrementing the value until all threads have passed. Usually there is a mutex alongside the semaphore to make sure that a thread can’t slip in while all threads are being released and be released too soon.

You can compare an event with a (chaotic) queue in front of a fun park attraction. People have to wait until the operator of the attraction tells one of them to continue, or tells all of them to continue at once (the attraction could be nearly empty, with enough room for all of them).
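
A sketch of the semaphore-starting-at-0 idea described above, with the bookkeeping mutex alongside it. The names are made up, and the fine-grained races of a production implementation are glossed over.

```python
import threading

# Event sketch: a semaphore starting at 0 blocks waiters; signalling
# releases one of them, or all current waiters at once.
class Event_:                              # _ avoids clashing with threading.Event
    def __init__(self):
        self.sem = threading.Semaphore(0)  # starts at 0: everyone blocks
        self.lock = threading.Lock()       # the mutex alongside the semaphore
        self.waiters = 0

    def wait(self):
        with self.lock:
            self.waiters += 1
        self.sem.acquire()                 # blocks until a signal comes in
        with self.lock:
            self.waiters -= 1

    def signal_one(self):
        self.sem.release()                 # lets exactly one waiter pass

    def signal_all(self):
        with self.lock:                    # release every thread waiting right now
            for _ in range(self.waiters):
                self.sem.release()

ev = Event_()
results = []
t = threading.Thread(target=lambda: (ev.wait(), results.append("passed")))
t.start()
ev.signal_one()                            # release the single queued thread
t.join()
```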

Gate
A gate is a flavour of an event. It blocks incoming threads just like an event, and lets all of them through if signalled. The difference is that it won’t close again – once signalled (‘opened’) it’ll allow all threads through until it is explicitly closed. It also can’t let threads through one by one.

A gate is usually implemented using a semaphore with an initial value of 0, so it will block incoming threads. Each incoming thread tries to attain the semaphore, as with the event – but when a thread attains the semaphore, it directly releases it again, so it is available for the next thread waiting to pass.

You can compare a gate with a … gate. Or a drawbridge, if you like. If it’s closed, no one can pass. If it’s opened, everyone can pass. Very straightforward.

Barrier
A barrier blocks all incoming threads until there is a certain number of them. In that case the barrier releases all those threads, but it will continue to block new incoming threads (until they, too, have reached the specified count, etc.). A barrier is useful to make sure that a set of threads all start something at the same time and, maybe even more importantly, are no longer doing what they did before.

A barrier is usually implemented as a custom event that has a waiter-count field (and a mutex to synchronize access to it), and that automatically releases all threads when the waiter count has reached the specified number.

You can compare a barrier with water drops from a tap. A drop won’t fall until there is a specific amount of water that causes it to fall.
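
The count-plus-release idea can be sketched like this (names invented; a condition variable plays the role of the custom event):

```python
import threading

# Barrier sketch: a waiter count behind a lock; the last arriving thread
# re-arms the counter and releases everyone at once.
class Barrier_:
    def __init__(self, parties):
        self.parties = parties
        self.count = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            self.count += 1
            if self.count == self.parties:   # last thread to arrive
                self.count = 0               # re-arm for the next batch
                self.cond.notify_all()       # release everyone at once
            else:
                self.cond.wait()

b = Barrier_(3)
arrived = []

def worker(i):
    b.wait()                                 # blocks until all three have arrived
    arrived.append(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```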

So much for the thread synchronization mechanisms today. Tomorrow I’ll maybe talk about the hassles of implementing them yourself.

Some credit goes to Kaja, who told me about all these locks in the first place :-).

Gentoo linux 2006.0 installer

Today Gentoo released a live CD with a Gentoo installer on it. They also posted some nice screenshots.

Gentoo Linux is a Linux distribution that features its home-made package manager, Portage. This package manager is what makes Gentoo special. There are a lot of other package managers, but Portage is one of the few that compile each package on your computer, instead of installing precompiled binaries.

Compiling everything yourself has the advantage that you can tweak each package to your specific needs, which means you can have a nicely customized system. But the key advantage is performance. Most binaries that package managers install are compiled for generic x86. That means they can even run on your old 486. The issue is that a lot has changed since the 486. A few new instruction sets (MMX, SSE, etc.), specifically designed to increase performance for common tasks, are present in almost every processor, and they aren’t used by generic x86-compiled binaries. And there are even more performance advantages1.

For this reason some people already do this (see Linux From Scratch). The problem is that it takes a lot of time to configure, build, install and troubleshoot at least 200 packages. If you take into account that your average server and desktop see at least 15 updated packages per week, you are looking at a huge amount of dedication and time to keep your own Linux From Scratch up to date.

This is where Gentoo and Portage come in. If you want Apache, you simply type emerge apache. If you want to update everything, you simply type emerge -u world. Portage checks which packages are required, downloads them, patches them, compiles them and installs them for you.

But wait: maybe I don’t want SSL built into Apache. How do I do that? Fairly simply, actually. Portage has a system called USE flags. There is a USE flag called ssl. You can put -ssl in the USE entry in /etc/make.conf to tell Portage that you don’t want SSL in any package. Or you could put net-www/apache -ssl in /etc/portage/package.use, which tells Portage that you want SSL disabled for Apache only.

Of course Portage isn’t perfect and you will have some trouble once in a while, but it’s better than the headaches caused by LFS, and the reward is similar2.

— A happy Gentoo user.

1When compiling for generic x86, a lot of memory is aligned conservatively, which causes overhead on processors that don’t really care much about alignment. And besides the nice new instruction sets, each processor has specific differences in the implementation of instructions: on some processors instruction X may be faster than instruction Y. Processor-specific information can also be useful in tight loops: on one processor a loop may fit in the L1 cache, on another it won’t.

2You can of course do more with LFS. But usually that isn’t important, and if it were, it would be better to add it to Portage yourself than to switch to LFS for that reason.

Virtualization isn’t hardware efficient

Virtualization is the thing right now, especially since VMware released VMware Server for free.

I’ve read quite a lot of posts about the pros of virtualization, and I’m annoyed by some of them.

Some of them claim that virtualization is hardware-efficient. Quite frankly, it isn’t; it is quite inefficient.

The thing is, though, that you usually don’t use your server optimally, and that with virtualization you can more easily get the most out of it. You could get even more out of your server without virtualization, but that is a lot trickier; virtualization is quite flexible.

Virtualization doesn’t make hardware more efficient; it makes you more efficient at allocating your server’s resources.

This server, w-nz.com, runs on a virtual server. I couldn’t afford, nor would I fully use, a dedicated server, but I can afford and use a virtual one. Efficient. Running all those sites directly on one real server would be even more efficient, but harder to do, especially with regard to security.

Virtual locality of reference

Quite a lot of programmers consider the advantages of locality of reference: if they put all the data they work with in about the same space, the processor will be able to retrieve it more easily, for the processor caches regions of memory in the Lx caches.

Quite a lot of memory managers even move objects that seem to depend on each other into the same place for you. This doesn’t only decrease fragmentation, for they will be freed in the same timespan, but also increases performance due to locality of reference.

In some situations, though, the advantage of locality of reference isn’t as big as most people presume, due to virtual memory.

The address space a process sees is a virtual address space; it is mapped into place by the OS. This means that 0x1234 can contain an image for one process and a normal integer for another. This enables two processes to be loaded, virtually, in the same place.

One implication this has for locality of reference is that a seemingly contiguous piece of memory can physically be distributed over quite a big span of real memory. One part of your list could be at 0x1300, the other at 0x30240.

If your program wants to enumerate that list, it’ll have to query the RAM for the second part, which it wouldn’t have to do if the list were contiguous in physical memory, for then it would most likely be in the L2 cache in its entirety.

Physical memory is mapped into the virtual memory space page by page. Usually pages are 4 KB. Each page is always contiguous in physical memory, for it can’t be split up by the mapping. Therefore locality of reference always works within a page.

This doesn’t mean that if you are working with more than a page of memory, it will all be fragmented. The OS usually does a good job of keeping pages that are contiguous in virtual memory contiguous in physical memory too.

But when a computer is under load and has to use every bit of memory, swapping memory in and out, physical memory will likely be a lot more fragmented.

I don’t want to make people ignorant of the advantage of locality of reference when dealing with more than a page of memory. I want the people who trade off literally everything for their holy locality of reference to see that it’s just a nice extra, and that it doesn’t always work, due to fragmentation of physical memory.

Side notes
Virtual memory is a bit more complicated than I explained; read about it on Wikipedia.
Processors try to take the virtual address space into account to mitigate this issue, although only to a limited extent.

Rainbow Tables: Coverage

A rainbow table is generated by creating (m) chains using randomly picked starting keys. The reduction functions result (or at least ought to result) in evenly distributed new keys. There is only a probability that a certain key/ciphertext pair is in the table.

I’ll gradually explain how the coverage can be calculated.

Take m chains, each 1 in length, to cover a keyspace of N keys. The chance that you can find a given key is:

[tex]\displaystyle P = \frac{m}{N}[/tex]

When we have ½N chains, we’ll subsequently have a P of ½: 50% coverage. Obvious.

When each chain is 2 in length, it gets trickier.

[tex]P = \frac{m}{N}[/tex] is still the chance that we can find the key in the first column of the table (the first item of each chain). The second column isn’t as straightforward. We’ve blindly assumed that each key in the first column is unique, which is true for the first column, but certainly isn’t for the second: a few chains can merge. The second column is ‘random’.

The chance that a certain key is in a specific chain of the second column is [tex]\frac{1}{N}[/tex]. The chance that a certain key is in the second column is [tex]1 - \left(1 - \frac{1}{N} \right)^m[/tex] (one minus the chance that a certain key isn’t in a certain chain, multiplied by itself the-number-of-chains times).

The number of keys covered by this column is the chance that a certain key is in the column times the number of keys: [tex]N \left(1 - \left(1 - \frac{1}{N} \right)^m \right)[/tex].

The chance that a certain key is in either of the two columns would be:

[tex]\displaystyle P = 1 - \left(1 - \frac{m_1}{N}\right)\left(1 - \frac{m_2}{N}\right), \quad \text{where } m_1 = m,\; m_2 = N \left(1 - \left(1 - \frac{1}{N}\right)^{m_1} \right)[/tex]

The third column’s coverage can be calculated by taking the number of unique keys in the second column, just as m was taken as the number of unique keys in the first column. With each chain t in length, this formula applies:

[tex]\displaystyle m_1 = m \\ m_{n+1} = N \left( 1 - \left( 1 - \frac{1}{N} \right)^{m_n} \right) \\ P = 1 - \prod^t_{i=1} \left( 1 - \frac{m_i}{N} \right)[/tex]
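
The recurrence is easy to evaluate numerically. A small sketch (the function name is mine, not from the paper): m_i is the expected number of distinct keys in column i, and P the chance that a given key occurs somewhere in the t columns.

```python
# Coverage of a table of m chains, length t, over a keyspace of N keys,
# following the recurrence above.
def coverage(m, N, t):
    p_miss = 1.0
    m_i = float(m)
    for _ in range(t):
        p_miss *= 1.0 - m_i / N                    # key misses this column
        m_i = N * (1.0 - (1.0 - 1.0 / N) ** m_i)   # distinct keys, next column
    return 1.0 - p_miss

half = coverage(5000, 10000, 1)   # half of N chains, length 1: 50%, as noted above
```

Note how longer chains raise coverage, but with diminishing returns, because each column holds fewer distinct keys than the one before it.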

Rainbow Tables: Introduction

A rainbow table is a data structure which makes it possible to retrieve the key that was used to generate a ciphertext from a fixed plaintext. One common application is to reverse (password) hashes, which can be seen as cipher functions that take the message as the key and the hash as the generated ciphertext.

One important thing to note is that all key/ciphertext pairs stored in the table have been precomputed during generation of the table. Generating a table that stores 1 million MD5 password hashes requires all those passwords to be hashed during precomputation.

I’ve been doing some research on rainbow tables and I think it will be useful to share some of my experiences.

But before I can dive into the matter, I first have to explain how rainbow tables work (or you can read the ‘official’ paper about rainbow tables by Philippe Oechslin).

Precomputing a significant amount of key/ciphertext (password/password hash) pairs is fairly easy. The issue is that a significant amount is fairly large: 2^37 for alphanumerical Windows password hashes, to give a real-life example. Because it isn’t practical to store 2^37 key/ciphertext pairs (~2TB) on your computer, rainbow tables store chains of key/ciphertext pairs; more precisely, they store only the starting and ending key of each chain.

Each chain starts with a key (randomly generated; in our example a password). The next item in the chain is the ciphertext, which is calculated by the cipher (in our example the hash of the password). The next item is a key (password) again, which was generated from the ciphertext (hash) by a reduction function. This reduction function can be almost anything (an XOR hash, CRC32, …), as long as it changes the (larger) ciphertext into a new key.

k1 –cipher–> C1 –reduction function 1–> k2 –cipher–> C2 –reduction function 2–> …

Each chain has a fixed length (t). In the table only the first and the last key are stored (k1 and kt).

To look up a certain ciphertext, the reduction function is applied to it, and all the end keys of all chains are checked for a match. If one of them matches, you can regenerate the chain from the begin key stored along with the end key, and in it you’ll find the key that, with the cipher applied to it, results in your ciphertext (which is the last ciphertext in the chain). When no end key matches, the cipher and reduction functions are applied once more to generate a new key, and all the end keys are checked again. If one of the end keys now matches, you know that your ciphertext is in the second-to-last position of that chain. Repeat this t times, once for each position in the chain.
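
The generation and lookup procedures can be sketched as a toy table over a tiny keyspace. The cipher and the column-dependent reduction functions here are invented stand-ins, and the start keys aren’t really random; a real table would use far larger parameters.

```python
import hashlib

N = 1000     # keyspace: keys are the integers 0..999
t = 50       # chain length

def cipher(key):                                # stand-in for the hash/cipher
    return hashlib.md5(str(key).encode()).hexdigest()

def reduce_(ciphertext, column):                # a different reduction per column
    return (int(ciphertext, 16) + column) % N

def build_table(start_keys):
    table = {}
    for k1 in start_keys:
        k = k1
        for col in range(t):
            k = reduce_(cipher(k), col)
        table.setdefault(k, []).append(k1)      # store only end key -> start keys
    return table

def lookup(table, target):
    for pos in range(t - 1, -1, -1):            # try every possible column
        k = reduce_(target, pos)
        for col in range(pos + 1, t):           # walk to the end of the chain
            k = reduce_(cipher(k), col)
        for start in table.get(k, []):          # each match may be a false alarm
            c = start
            for col in range(t):                # replay the chain from its start
                if cipher(c) == target:
                    return c
                c = reduce_(cipher(c), col)
    return None                                 # not covered by this table

table = build_table(range(0, N, 5))             # 200 start keys
found = lookup(table, cipher(0))                # 0 is a start key, so it's covered
```

The replay loop is where the false alarms described below show up: a matching end key only proves the chains merged somewhere, not that the target is actually in that chain.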

It sounds straightforward, but there are some issues:

  • Loops: there are cases where the reduction function generates a key which, when the cipher is applied to it, becomes the first key again (K = R(S(K)), where R is the reduction function and S is the cipher). This causes the rest of the chain to be useless.
  • Merging chains/false alarms: because the ciphertext is larger than the key, there will likely be a lot of chains with the same end point. This reduces the total coverage of the table and causes a lot of additional time to be spent on ‘false alarms’: you have to check each matching endpoint, even if that chain turns out to be the wrong branch.

To avoid loops, to reduce merging further and to increase lookup speed, rainbow tables use rainbow chains: each column in the table uses a different reduction function.

Because a different reduction function is used for each ciphertext in the chain, there can't be loops, or at least no loops that produce redundant data: the same ciphertext becomes a different key in each column of the chain.

Merging is also cut down by a factor of t, because two chains only merge when they produce the same key in the same column of the chain.
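A rainbow chain is the same walk with the column index mixed into the reduction. Hashing the index in, as below, is just one arbitrary way to get a per-column family of reduction functions:

```python
import hashlib

def cipher(key):
    # toy cipher: MD5 of the key
    return hashlib.md5(key.encode()).hexdigest()

def reduce_i(ciphertext, i):
    # column-dependent reduction: mixing in the column index makes the
    # same ciphertext yield a different key in each column
    return hashlib.md5((ciphertext + str(i)).encode()).hexdigest()[:6]

def rainbow_chain(k1, t):
    # a rainbow chain of length t; only (k1, end key) would be stored
    k = k1
    for i in range(t):
        k = reduce_i(cipher(k), i)
    return k1, k
```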

This is a very superficial explanation; read more in the original paper if you're interested.

The problem with rainbow tables

Rainbow tables (Making a Faster Cryptanalytic Time-Memory Trade-Off) work fairly well, although, because they use different reduction functions in the chains, they contain a lot of redundant data.

The example table in the paper (lanmanager hash; 8 char. alpha; 4666×38223872) contains a lot of redundant data, which can be calculated:

>>> CalcRedundancy(4666, 38223872, CalcN(26, 1, 7))
21.50866258033486

The 300mb table would fit in a 20mb table if the reduction functions were perfect.
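CalcN and CalcRedundancy aren't reproduced here, but a naive stand-in, assuming the redundancy factor is roughly the number of keys the table generates (m times t) divided by the size of the key space N, lands in the same ballpark as the figure above:

```python
def calc_n(chars, min_len, max_len):
    # size of the key space: passwords over `chars` symbols,
    # lengths min_len through max_len (matching CalcN(26, 1, 7))
    return sum(chars ** l for l in range(min_len, max_len + 1))

t = 4666                 # chain length from the paper's example table
m = 38223872             # number of chains
n = calc_n(26, 1, 7)     # all alphabetic passwords of length 1..7
redundancy = m * t / n   # each key is generated roughly this many times
```

This gives a factor of roughly 21, not the exact 21.5 from the session above, so it's an approximation of the idea, not the author's exact formula.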

The problem is that for a reduction function to be perfect, it would have to run 'in parallel' with the cipher function, which isn't an option: most ciphers are engineered precisely to prevent this.

Adding variable length chains back into the rainbow chains could allow exploiting existing chains more efficiently.

This probably sounds messy and, to be honest, this idea is still a bit messy.

I'll write something decent along the way.

Code Golf 3: Googler

Code Golf. As with the real sport the goal is to reduce the number of strokes that it takes to complete a particular objective, although with Code Golf “strokes” refers to keystrokes rather than swings of a golf club.

Write code that returns the URLs of the first page of search results for the query 'perl' on google.com.

There'll be two categories: one for solutions using only Perl's Socket module, and one for those using external libraries.

Good luck!

Code Golf 2: Line Sort

Code Golf. As with the real sport the goal is to reduce the number of strokes that it takes to complete a particular objective, although with Code Golf “strokes” refers to keystrokes rather than swings of a golf club.

Create an algorithm that reads lines from stdin until it hits an EOF and prints those lines sorted to stdout.

  • Don’t use the language’s native sorting capabilities, write your own sorting algorithm. (print for sort<> would be too easy.)
  • Any language is allowed, except for those that mysteriously implement a 1 byte command that does exactly this.

Example implementation using inverted bubble sort:

@a = <>;
for ($i = 0; $i <= $#a; $i++) {
 for ($j = $i; $j <= $#a; $j++) {
  if ($a[$i] ge $a[$j]) {
   $t = $a[$j];
   $a[$j] = $a[$i];
   $a[$i] = $t;
  }
 }
}
foreach (@a) {
 print;
}

This implementation is very large and inefficient, and just an example.

Good luck!

Code Golf 1: Fibonacci

Code Golf. As with the real sport the goal is to reduce the number of strokes that it takes to complete a particular objective, although with Code Golf “strokes” refers to keystrokes rather than swings of a golf club.

You may compete using any language.1

Create an algorithm that prints the first 30 Fibonacci numbers to the screen, each followed by a newline.

An algorithm which could do this would be (148 bytes):

my $a = 1;
my $b = 0;
my $c;

my $limit = 30;
my $i = 0;

while ($i < $limit) {
        $i += 1;
        $c = $a + $b;
        $a = $b;
        $b = $c;
        print $c . "\n";
}

But that can be written way shorter.

My first:

$a=1;for((0..29)){$c=$a+$b;$a=$b;$b=$c;print"$c\n"}

My second:
$a=1;$c=$a+$b,$a=$b,$b=$c,print"$c\n"for 0..29

Noud’s response:
$i=1;print$i+=$a,"\n",$a+=$i,"\n"while$i<317811

My revenge:
$a=1;$c=$a+$b,$a=$b,$b=$c,print"$c\n"for 0..29

Noud was 30 seconds later with:
$a=1;print$a+=$b,"\n",$b+=$a,"\n"for 1..15

My latest one (partially by Bram too):
print$a+=$b,"\n",$b+=$a,"\n"for$a++..14

Update: Twan:
print$a+=$b,$/,$b+=$a,$/for$a++..14

1. A language that mysteriously implements fib returning the first 30 Fibonacci numbers isn’t allowed.