Syslets and other untrusted kernelspace programming

In the wake of creating true asynchronous IO in the linux kernel, Ingo worked on so-called syslets. A syslet is nothing more than a single asynchronous system call that chains together a few arbitrary system calls. An example syslet would be:

1. Read from file descriptor 123
2. If 1 returned with error, break
3. Write what was read to file descriptor 321
4. If 3 succeeded, jump to 1; otherwise break

This is represented by a few structs linked together, of which the first is passed to the kernel with the syslet system call. The system call returns almost immediately, and eventually the application is notified of (or can wait on) the syslet's completion. This could save an enormous number of system calls. That means far fewer context switches, which is very good for performance.
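
For comparison, the same chain done conventionally from userspace costs two system calls (and the accompanying context switches) per iteration. A rough sketch in Python of the loop the syslet above replaces:

import os

def copy(fd_in, fd_out, bufsize=4096):
    # Conventional userspace loop: every iteration crosses the
    # userspace/kernel boundary twice. A syslet would run the whole
    # chain inside the kernel and notify us once, on completion.
    while True:
        data = os.read(fd_in, bufsize)   # one system call
        if not data:                     # EOF; a failed read raises
            break
        os.write(fd_out, data)           # another system call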

Syslets are, in a very primitive manner, the dawn of kernelspace scripting. But hey, wasn't kernelspace scripting the thing that all Linux devs dreaded? Wasn't it Linus who joked that he would be in a mental institution whilst releasing Linux 3 with a VB message pump? Yes, they're afraid of putting untrusted programs/scripts in kernelspace, and they'll barely acknowledge that syslets are the first step.

The problem with the current full-featured scripting languages is that they are, well, full of features gone wrong: they're bloated and not really secure. In kernelspace you can't allow any memory except the script's own to be accessed, not to mention that resources must be restricted and that virtual memory won't help you there. Most scripting languages weren't developed with these restrictions in mind: most have evil functions in dark corners of the standard library that allow you to do really evil stuff with memory.

As far as I know only .net (and thus mono) has a quite rigorous trust framework built in. .Net is bloated and proprietary, and Mono still isn't (and probably never will be) feature complete, while still being very bloated.

What we need is a very simple, safe language, with a compiler daemon running as a system service, with which untrusted userspace programs can have scripts running in the kernel. I'm tempted to use one of the brainf*ck JITs; they're small enough to review thoroughly :).

A kernelspace interpreter would do too, though, as a PoC.

Linux 2.6.21-bw-r1

There isn’t a tree that contains reiser4, suspend2 and the gentoo patches — so I created one. This revision adds the -stable patches, new gentoo patches and some powertop patches.

One big patch: bw-r1-for-2.6.21.diff.bz2
Patches broken out: bw-r1-for-2.6.21.tar.bz2

To apply the one big patch, use: bzcat bw-r1-for-2.6.21.diff.bz2 | patch -p1 inside a vanilla 2.6.21 tree.

Upgrading wordpress with git

I didn't like upgrading wordpress much. Every time I did it, I needed to re-apply all my little tweaks to the new wordpress. It took too much time.

I tried running diff -uNr on the version I was running and the newer version, and then applying the resulting diff to my installation, but it seems wordpress has been backporting changes, so I got conflicts, quite a lot of them.

Because I was quite tired of porting my changes by hand, I've tried git, the Source Code Management tool used by the linux kernel, to do it for me:

I did this all in the parent directory of the root of blog.w-nz.com. This folder contains:

  • htdocs: the current installation (2.1.2)
  • 2.1.2: the unmodified wordpress
  • 2.2.0: the new wordpress I want to upgrade to

First, I created an empty git repository:

mkdir git; cd git; git init-db; cd ..

Then I copied over the unmodified version of wordpress I was running, and committed it:

cp 2.1.2/* git -R
cd git
git add *
git commit -a -s
cd ..

Then I copied over my current installation:

cp htdocs/* git -R
cd git
git status # let's see what changed

There are lots of files, like uploads, that I want git to ignore, so I edited .gitignore to make git ignore them. There weren't any files I had added, though; otherwise I'd have had to run git add to let git know about them.

And let's commit my changes:

git commit -a -s

Now, let's go back to the original commit (the clean 2.1.2 wordpress) and start a branch from there:

git checkout HEAD^ # HEAD^ means parent commit of HEAD: the previous commit
git checkout -b tmp # create a new branch tmp from here

Now I'm in a branch without my own changes, forked from the master branch. Let's apply the new wordpress on this branch:

cd ..
cp 2.2.0/* git -R
cd git
git status # see what changed

git-status showed me that there are a few new files in wordpress 2.2.0; I git-add-ed all of these new files and then committed it all:

git commit -a -s

Now I’ve got two branches:

  • master, which contains wordpress 2.1.2 with my own changes on top as a commit
  • tmp, which was forked from the clean wordpress 2.1.2 commit on the master branch, without my own changes but with the 2.2.0 changes on top

What I want to do is reapply the 2.2.0 changes on top of my current changes' commit, instead of on top of the 2.1.2 commit. To do this, git has a very powerful util called git-rebase:

git rebase master

This will search down the tree to the point where the current branch (tmp) forked from the target branch (master). Then it will re-apply all commits in between on top of the latest commit of the target branch.
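
Schematically, with A the clean 2.1.2 commit, B my tweaks and C the 2.2.0 changes (the labels are mine):

before rebase:   A---B      (master)
                  \
                   C        (tmp)

after rebase:    A---B---C' (tmp; master still points at B)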

Just as if I'd used diff/patch, I get a merge conflict. git rebase lets me know this, and git status shows me which files are affected. The difference with the diff/patch approach is that there are way fewer merge conflicts (git is smarter), and that the merge conflicts are way easier to identify: they're inline in the original files. Not to mention that if I'd fucked up, I'd always have a way back.

After I fixed the merge conflicts, I ran git update-index on each conflicted file (to tell git it's resolved) and then ran git rebase --continue.

Now I've got my updated wordpress in the git folder. I backed up the current installation, copied the files over from git, visited wp-admin/upgrade.php, and I was done :).

By the way: “I didn’t say Subversion doesn’t work. Subversion users are just ugly and stupid.” — Linus on this Google tech talk.

Sidenote: I switched from Hashcash to Akismet. Hashcash didn't work anymore, and Akismet should theoretically be the better solution because it isn't based on security by obscurity.

The Future of the Internet

One of the prominent people behind the current internet, Van Jacobson, discusses the past (telephony: wire oriented), the present (IP: endpoint oriented) and the future (?: data oriented) at google tech talks.

A short synopsis: the internet is having trouble at the moment because it was designed in a time when the problem was different. These days most of the data on it is duplicate data, which is a tremendous waste. Also, connecting to the internet (getting an address), and as a result keeping everything in sync, is hard. Van suggests and predicts a data oriented internet: a bit like a secure P2P bittorrent network, but at the IP level instead of on top of IP.

It’s a very interesting talk, worth watching.

linux 2.6.21-bw

There aren't official reiser4 patches for .20 or .21. There are quite a few trees that contain support for reiser4 (for instance the -mm tree), but these are highly unstable. Also, because I wanted to give stacked git a shot, I started my own kernel tree:

One big bzip2-ed diff: bw-for-2.6.21.diff.bz2.
bzip2-ed tar with the separate patches: bw-for-2.6.21.tar.bz2.

This release contains reiser4, suspend2, the gentoo patches and a few patches to get everything working together nicely.

Virtual package for your python application

When you've got a big python application, you'll usually split it up into modules. One big annoyance I've had is that a module inside a directory cannot (easily) import a module higher up in the tree. E.g.: drawers/gtk.py cannot import state/bla.py.

This is usually solved by making the application a package. This allows import myapp.drawers.gtk from everywhere inside your application. To make it a package, though, you need to add its parent directory to the sys.path list. But unfortunately this also exposes all the other subdirectories of that parent directory as packages.

However, when the package module (e.g. myapp) is already loaded, the path from which myapp was loaded is used to find its submodules (e.g. myapp.drawers.gtk), and sys.path isn't looked at at all. So, here is the trick:

import sys
import os.path

# The directory containing this file: the root of the application.
p = os.path.dirname(__file__)
# Temporarily add the parent directory to the module search path,
sys.path.append(os.path.abspath(os.path.join(p, "..")))
# import our own root directory as a package under its own name,
__import__(os.path.basename(p))
# and restore sys.path again.
sys.path.pop()

Note that this script doesn’t work when directly executed, because the __file__ attribute is only available when loaded as a module.

Save this script as loader.py in the root of your application. import loader from the main script in your app, and you'll be able to import modules as myapp.a.module, where myapp is the name of the root directory of your application.
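
For example, assuming the root directory of the application is called myapp and the subdirectories are proper packages (i.e. they contain __init__.py files), the main script can then do:

# main.py, in the root directory of the application
import loader              # runs the trick above, registering 'myapp'
import myapp.drawers.gtk   # now importable from anywhere in the app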

The Filesystem Failed. Part I: introduction

The Filesystem (I’ll consider the Linux VFS as an example) has failed:

  • Database storage is implemented on top of the Filesystem, because the Filesystem is incapable of serving the needs of relational storage.
  • Metadata is stored inside files in many different formats, which can only be guessed by clumsy ‘magic’ in the headers. This forces many media player and desktop search applications to duplicate tag information in their own databases, and each of them has only limited support for each of the many different formats.
  • More and more device and service abstractions are moving from the Filesystem to separate namespaces, because the Filesystem’s API is inadequate. Take for instance oss, which used /dev/dsp, whereas alsa uses its own namespace. Many new abstractions don’t even go near the filesystem anymore, for instance kevents, futexes, networking, dbus and hal.
  • Small files are stored in (compressed) packs and archives because the Filesystem can’t handle them. This happens, for instance, with your mailbox.

The problem comes down to fragmentation of data and metadata across too many namespaces, because the Filesystem doesn’t seem to be an adequate one.

In a series of posts I’ll look at the possibilities to create one unified filesystem.

ati-drivers-8.33.6 for Gentoo

This is a slightly adjusted 25.3 ebuild that will give you the 8.33.6 ati-drivers for Gentoo. Yes, it’s dirty. They aren’t in the main tree yet because they are considered broken, although they work just fine for me.

Download: ati-drivers-overlay-8.33.6.tar.bz2

Extract them to an overlay.

Update: the 8.33.6 drivers are in the mainline tree now, so you should use those instead of mine.

(auto)mounting removable media as user

I’ve always been bothered by the fact that you need to be root to mount anything (like a usb stick). It can be solved a bit by setting up udev rules and putting a specific device in /etc/fstab, but that only works for that single usb stick. Pretty annoying.

Googling only gives you stupid and silly solutions (like allowing users to mount /dev/sd[a-z], which is a security risk).

Luckily I’ve recently been pointed to ivman, which is an automounter. It automatically mounts removable media for you in /media.

I looked at the internals of ivman and noticed that it uses pmount, a wrapper around mount that allows users to mount removable media in the /media folder. Great!

Btw, you need to be in the plugdev group to use pmount.

Update: it seems that gnome-mount also works fine when you’re in the plugdev group. Gnome-mount does about the same as pmount, with the advantage that gnome-mount has nice gui integration everywhere in gnome.

Long mySQL keys

Instead of limiting your long key (for instance a path) to approximately 800 characters (on mySQL with generic UTF8), you can hash your key and store the hash as the index.

The drawback is that you need a quite good hash function to avoid collisions if you want to use the hash for a unique or primary key, and good hash functions tend to require some computing time.
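
A minimal sketch of the idea in Python; the table and column names are made up:

import hashlib

def path_key(path):
    # A fixed-length 40-character hex digest fits comfortably within
    # mySQL's key length limit, no matter how long the path gets.
    return hashlib.sha1(path.encode("utf-8")).hexdigest()

# Store the full path in an ordinary column and put the UNIQUE index
# on the hash column instead:
#   INSERT INTO files (path, path_sha1) VALUES (%s, %s)
#   SELECT * FROM files WHERE path_sha1 = %s
print(path_key("/some/very/long/path/" + "x" * 2000))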

md5(microtime())

Don’t use md5(microtime()). You might think it’s more secure than md5(rand()), but it isn’t.

With a decent number of tries and a method of syncing (like a clock on your website), one can predict the result of microtime() to the millisecond. This leaves only about 1000 possible return values for microtime() to be guessed. That isn’t safe.
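
To illustrate, a rough sketch of the guessing attack in Python; the exact microtime() output format and the millisecond-accurate server time are assumptions:

import hashlib

def crack(target_hash, second, millisecond):
    # The synced clock gives the second and the millisecond; only the
    # microsecond part is unknown, roughly 1000 candidates.
    for usec in range(1000):
        frac = millisecond * 1000 + usec
        candidate = "0.%06d00 %d" % (frac, second)  # microtime()-like
        if hashlib.md5(candidate.encode()).hexdigest() == target_hash:
            return candidate
    return None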

Just stick with md5(rand()), and if you’re lucky and rand() is backed by /dev/random you won’t even need the md5(). In both cases it will be quite a lot more secure than using microtime().

Simple Branch Prediction Analysis

This paper outlines a simple branch prediction analysis attack against the RSA decryption algorithm.

At the core of RSA decryption is a loop over all bits of the secret exponent d. When a bit is 1, different code is executed than when the bit is 0: the CPU takes a different branch depending on the bit.
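
That loop is square-and-multiply modular exponentiation; a rough sketch in Python of where the key-dependent branch sits:

def modexp(base, d, n):
    # Left-to-right square-and-multiply: every bit of d costs a
    # squaring, but only the 1-bits cost an extra multiplication.
    # The branch on the key bit is what the spy process detects.
    result = 1
    for bit in bin(d)[2:]:
        result = (result * result) % n
        if bit == "1":                   # key-dependent branch
            result = (result * base) % n
    return result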

A spy process can be run on the same CPU, probing the CPU’s branch cache by flooding it with branches and measuring the time they take. When the simultaneously running secret process doing RSA decryption takes a different branch (1 instead of 0), this shows up as a change in the execution time of the spy process’s branches.

In this way quite a lot of secret bits can be derived.

There are some clear buts:

  • You must be able to run a spy process on the computer itself, and it must know exactly when the RSA process runs.
  • To get clear readings, there shouldn’t be other processes claiming too much CPU time.
  • The spy process and the RSA process should run on the same physical processor, preferably at the same time (e.g. on an SMT/HyperThreading CPU).

An easy fix would be to allocate a whole processor to the RSA process for the duration of the decryption, so no process can spy on it. Another option would be to add noise to the branch prediction buffer, but that would result in a performance loss.

RTFM, where?

Recently a buddy on msn asked me a linux question; he had just started using linux, so he had some problems getting stuff done.

He had downloaded an installer, he said, a .run, but he didn’t know how to execute it. He tried googling for it and asking on forums, but didn’t get an answer, so he asked me.

I solved his problem, but I still wondered: if you are a total newcomer to linux, where can you find out that you need to put ‘./’ in front of a file in bash to execute it, and that you probably need to chmod +x the file too if you downloaded it from somewhere?

The bash tutorial would probably have solved it, but would a newcomer even know that that thing he is typing into actually is a separate program? Probably not.

I basically learned all this trivial stuff while following the gentoo installation manual, but I guess that’s a bit too much to ask from each new linux user. There should be a good linux introduction somewhere that explains this trivial stuff, to which I can redirect new users. Does anyone know one?

Good and bad CAPTCHA’s

CAPTCHA’s are images whose content needs to be typed into a textbox by the user, to make sure it’s a human instead of some computer script. This is an example of a good CAPTCHA, from yahoo:

yahoo53.jpeg

This is an example of a really bad CAPTCHA:
dotmac18.jpeg

What makes a CAPTCHA good, as in hard to solve by a computer? Let’s look at how a computer would solve a CAPTCHA; there basically are 3 parts:

  1. Remove rubbish background.
  2. Remove rubbish lines and partition the image into sections, with a letter in each section.
  3. Recognize the letter with a neural network.

Part 1 is very easy in most cases: just filter out everything that isn’t black and isn’t a glyph-like curve. It gets a bit more difficult if the font and background colors are random, but usually it’s simple to distinguish between a glyph (small, curvy, solid color) and a background (solid, usually gradients). Software is way better at this step than humans.
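
For instance, step 1 on a simple CAPTCHA can be as little as a grayscale threshold. A sketch using PIL, with the filename and threshold value made up:

from PIL import Image

img = Image.open("captcha.png").convert("L")   # to grayscale
# Keep only the dark, glyph-like pixels; wash everything else out.
glyphs = img.point(lambda v: 0 if v < 100 else 255)
glyphs.save("cleaned.png")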

Part 2 is the most difficult part for software. Distorting the font isn’t that much of a problem, as long as the software can still recognize separate curve-blobs. The real problem comes in when there are red-herring curves, or when several glyphs are connected with curves like in the yahoo CAPTCHA. When the captcha uses an undistorted font with fixed alignment, it isn’t a problem even if you add glyph-connecting curves like in the dotmac CAPTCHA, because you only need a little bit of code to recognize an authentic glyph curve (small, thin), and then you can predict the position of the other curves. Humans are better at this step than computers.

Part 3 is a bit tedious for software, but usually easier for specifically trained neural networks than for humans.

How to make a good CAPTCHA:

  • Do not add stupid backgrounds or differently coloured polygons; they won’t work at all and will only confuse the human.
  • Do not use a fixed font, size or alignment. Rotate the glyphs a bit, transform them a bit and, most importantly, place them unpredictably.
  • Add glyph-like curves that preferably intersect only two glyphs each, to make the glyphs less recognizable. Take care, though, not to make them too font-like, because that’ll keep the human from recognizing the real glyphs. These extra intersecting curves make CAPTCHA’s strong, because they prevent proper partitioning.
  • Don’t use strange fonts that might seem hard to read but are easily recognizable. For instance, dotted fonts are very easy to locate when everything else is a solid curve.

Update: nice blogpost on breaking captcha’s: http://www.brains-n-brawn.com/default.aspx?vDir=aicaptcha

The rm -r / typo

Today I accidentally made a (yes, very stupid) typo in a root console:

rm -r /

I noticed the typo almost immediately, but rm had already managed to wipe out my /bin and had started removing parts of /boot. This situation wasn’t very helpful for the stability of my system, as you might understand.

For the windows user: it’s a bit like deleting half of all executables in the windows folder.

One key difference: when running linux, you can fix it easily. I booted a livecd, mounted my system, copied the /bin from a stage3 tarball to my root partition and rebooted.

And it’s working again! There were some complaints about a libproc version mismatch with the binaries, but that’ll be easily solved by an emerge -e system.

You just got to love linux. (and other nixes for that matter)

SINP: Push versus Pull

SINP is pull based: I give my SINP address to someone, and he pulls the information he wants from my SINP server.

Our competitor SXIP is push based. When I use my SXIP identity, I push all the information I want to provide to the service; there doesn’t even have to be a SXIP server (‘homesite’).

Push has got certain advantages over pull:

  • Pull is more complex: you need more traffic, and more complicated traffic. Push is simpler.
  • You most likely need a separate server for pull (with SINP you need one, at least); this makes you rely on your SINP server. For push you don’t need a real one.

But pull has got advantages too:

  • You don’t need to actively give your information. When I’m offline someone can still pull information from my SINP identity.
  • Pull doesn’t require the actual information to go via your computer. If someone requests my credit card number and I allow it, it won’t be routed through the computer I’m using, which is safer.

Tilda

Tilda is a drop-down terminal for linux. Press the assigned hotkey and the terminal drops down and gains focus; press it once again and it rolls back up. Even better: the terminal isn’t closed, just hidden.

This is great for development. I write some code, press Ctrl+S to save, Alt+Q for the terminal, and make. If there is a bug, I press Alt+Q and return to my code; and if I didn’t look closely enough, I can press Alt+Q again to see the output in the terminal once more.