Good and bad CAPTCHA`s

CAPTCHA’s are images which content needs to be written into a textbox by a user to make sure it’s a human instead of some computer script. This is an example of a good CAPTCHA of yahoo:


This is an example of a really bad CAPTCHA:

What makes a CAPTCHA good, as in hard to solve by a computer? Lets look how a computer would solve a CAPTCHA, there basically are 3 parts:

  1. Remove rubish background.
  2. Remove rubish lines and partition the image into sections, with in each section a letter.
  3. Recognize the letter with a neural network.

Part 1 is very easy in most cases — just filter everything out that isn’t black and isn’t a glyph-ic curve. It gets a bit more difficult if the font and background colors are random, but usually it’s simple to distinguish between a glyph (small, curve-ish, solid color) and a background (solid, usually gradients). Software is way better in this step than humans.

Part 2 is the most difficult part for software. Distorting fonts isn’t that much of a problem, as long as the software can recognize seperate curve-blobs. The real problem comes in when there are red-hering-curves or when several glyphs are connected with curves like in the yahoo CAPTCHA. When the captcha uses undistorted fixed aligned fonts, it isn’t a problem even if you add glyph connecting curves like in the dotmac CAPTCHA, because you only need to add a little bit of code to recognize an authentic glyph curve (small, thin) and then you can predict the position of the other curves. Humans are better in this step than computers.

Part 3 is a bit tedious for software, but usually easier for specifically trained neural networks than for humans.

How to make a good CAPTCHA:

  • Do not add stupid background or differently coloured polygons, they won’t work at all — they will only confuse the human.
  • Do not use a fixed font, size or alignment. Rotate the font a bit, transform it a bit and, most importantly, place them unpredictably.
  • Add glyph like curves the intersect preferably only two glyphs to make them less recognizable. Take care though that you don’t make them too font like, because that’ll prevent the human from recognizing. These extra intersecting curves make CAPTCHA’s strong, because it prevents proper partitioning.
  • Don’t use strange fonts that might seem hard to see, but are easy recognizable. For instance, dotted fonts are very easy to locate when everything else are solid curves.

Update: nice blogpost on breaking captcha’s: