The Linux Paradox

Linux wants to get popular. Everyone using Linux wants Linux to grow bigger and bigger until it is the dominating operating system. Err.. correction.. until independent distributions and derivatives of Linux dominate, to avoid the nasty monopoly stuff. Even your mom should start using it.

Sounds good. Even if Linux is in some people's opinion not the best OS, it still has the best community backing and support, and is arguably the most secure OS. If something is required or has to be fixed, it'll happen in no time.

So why doesn’t everyone use it already?

The problem is that Linux demands too much from the user.

A Linux user should at least have a grip on how a console works, and preferably know how to compile an application and fix things when it won't compile. A user should know how to hunt down the dependencies it is missing, read through the vastly different documentation of each application to even begin to hope to get it running, and start over every time a new version is released.

Sure, there are RPMs and yum, which make life a lot easier. But the yum repositories don't contain everything. If you want something new, or an application built with a certain module, you have to dig into configuration files, master the console, read through documentation, compile, and of course check your version dependencies, etc.

The average Windows user, like your mom, doesn't even know how drives work and has certainly never heard of the command line. It just wouldn't work out. They have trouble enough with just one or two options; confronted with Linux, with its hundreds of possible options and even stranger errors, it just won't work out well between them.

It certainly is possible to stick to the main applications provided by your distribution and the yum repository. But Linux will never run as satisfyingly as when you have compiled and configured it all yourself. This really makes a big difference! I made Linux start up a lot faster by compiling the kernel with only the modules I require and with machine-specific optimizations. These little things give Linux a really big edge over Windows.

Linux is and remains only interesting for adventurous 'power' users with quite some free time on their hands.

Don’t rely on parsing

Most applications (especially in *nix) store their settings in text configuration files. These text files need to be parsed every time the application starts.

Parsing is the act of dividing a piece of data (usually as a stream) into understandable parts. This usually comes down to looking at the text file character by character and deciding what should be done, maintaining a state which contains the data collected so far. With more complicated files this even means keeping a stack of states and performing complicated actions when a certain state is left.
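
To make this concrete, here is a minimal sketch of such a state-driven parser in C. It assumes a dead-simple key=value format, one pair per line, with no comments or quoting — just enough to show the character-by-character state machine:

#include <stdio.h>

/* The states the parser can be in while scanning a line. */
enum State { IN_KEY, IN_VALUE };

void Parse(FILE *file) {
    enum State state = IN_KEY;     /* current state              */
    char key[64], value[256];      /* the data collected so far  */
    int keyLen = 0, valueLen = 0;
    int c;

    while ((c = fgetc(file)) != EOF) {
        if (c == '\n') {                       /* end of a pair  */
            key[keyLen] = '\0';
            value[valueLen] = '\0';
            if (keyLen > 0)
                printf("'%s' = '%s'\n", key, value);
            keyLen = valueLen = 0;
            state = IN_KEY;
        } else if (state == IN_KEY && c == '=') {
            state = IN_VALUE;                  /* switch state   */
        } else if (state == IN_KEY && keyLen < 63) {
            key[keyLen++] = (char)c;
        } else if (state == IN_VALUE && valueLen < 255) {
            value[valueLen++] = (char)c;
        }
    }
}

int main(void) {
    Parse(stdin);   /* e.g. ./parse < app.conf */
    return 0;
}

Even for this trivial format the parser has to look at every single character; add quoting, comments and nesting, and the state machine (or state stack) grows quickly.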

The problems:

  • Parsing is slow, very slow
  • Formats that need to be parsed contain overhead, a lot of overhead

But it certainly has got advantages:

  • Humans can easily edit it; you don’t need to rely on configuration tools
  • It (usually) makes a configuration format more extensible by nature (adding one new field to the average programmer’s binary format would break it)

Now, there are attempts to improve speed by standardizing the format: with fewer oddities to expect, the whole parsing gets slightly faster, at the cost of how easily the format can be edited.

A good example is XML. XML is damned ugly. XML is too strict. And XML still takes a hell of a lot of time to parse.

YAML looked like a decent alternative: easy to edit, looks nice. But then I encountered this:

%YAML 1.1
---
!!map {
  ? !!str "sequence"
  : !!seq [ !!str "one", !!str "two" ],
  ? !!str "mapping"
  : !!map {
      ? !!str "sky" : !!str "blue",
      ? !!str "sea" : !!str "green",
    }
}

Ugly…

So what to use instead?

Use binary configuration files, which are easy for the application to load and save, and create a parser that parses the text configuration file and saves it to the binary format! In other words: serialize the useful data from the parsed document, and only parse again when required.

When you only parse stuff when the user has changed it, it doesn't really matter how long parsing takes. That gets rid of the really ugly stuff and lets us have a very loose kind of format without the ugly rules and regulations.
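
A minimal sketch in C of how small the startup check can be, assuming a *nix system and made-up file names (config.conf for the text file, config.bin for the binary version): compare modification times, and only fire up the slow parser when the text file is newer.

#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 when config.conf must be (re)parsed into config.bin,
   0 when the binary version is still up to date. */
int NeedsReparse(const char *textPath, const char *binPath) {
    struct stat text, bin;
    if (stat(textPath, &text) != 0)
        return 0;                 /* no text file at all          */
    if (stat(binPath, &bin) != 0)
        return 1;                 /* no binary version yet: parse */
    return text.st_mtime > bin.st_mtime;
}

int main(void) {
    if (NeedsReparse("config.conf", "config.bin"))
        printf("parsing the text file, then saving config.bin\n");
    else
        printf("loading config.bin directly\n");
    return 0;
}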

‘Objects’ in C – Part 3

In part 3 of this mini series I'll discuss how interface-like and (multiple-)inheritance-like behaviour can be achieved in C, just as in C++.

In the previous posts (part 1, part 2) I discussed inheriting behaviour of an 'object' in C by adding a type pointer which handles initialization and destruction of the object, making it possible to extend an 'object' by adding fields at the end and updating the type pointer to a new type.

Every object starts with a pointer to its type. The type of an object is passed to the CreateInstance function when an instance of the object needs to be created. The type itself just contains a few fields:

  • Length, the required length of the object; this allows an inheriting type to extend the amount of data.
  • Constructor Function Pointer, called after the object is allocated to construct its contents.
  • Destructor Function Pointer, called before the object is freed to destruct its contents.

These fields, along with using function pointers in the instance itself instead of plain functions, allow an inheriting type to expand the existing type. The problem is that an object can't inherit from more than one object, for both objects expect to start just after the pointer to the type. To solve this, two additions are needed:

  • QueryInterface Function Pointer, which can be used to get an instance of a supported type from the instance it is called on.
  • Top Object Pointer, which points to the top object. (For an object returned from QueryInterface, its Top pointer would point to the object on which QueryInterface was called.)

nb. the latter is stored after the type pointer in the instance itself, instead of in the type.
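
A sketch of what these structures could look like in C. The field names follow the lists above; the exact signatures and the CreateInstance body are my own guess at one possible layout:

#include <stddef.h>
#include <stdlib.h>

typedef struct Object Object;
typedef struct Type Type;

struct Type {
    size_t Length;                  /* required length of an instance */
    void (*Constructor)(Object *);  /* called right after allocation  */
    void (*Destructor)(Object *);   /* called right before freeing    */
    /* Returns an instance of the requested type, or NULL when that
       type isn't supported. */
    Object *(*QueryInterface)(Type *self, Object *obj, Type *requested);
};

struct Object {
    Type *type;    /* every instance starts with its type pointer   */
    Object *top;   /* ...followed by the pointer to the top object  */
    /* inherited and own fields follow here */
};

Object *CreateInstance(Type *type) {
    Object *obj = calloc(1, type->Length);
    obj->type = type;
    obj->top = obj;    /* a freshly created instance is its own top */
    if (type->Constructor)
        type->Constructor(obj);
    return obj;
}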

A good example would be a type called ICloneable, which itself only contains one field: Clone, a function pointer to a function that clones the object. When you want to clone a given object, you would use:

ICloneable* interface = toClone->top->type->QueryInterface(toClone->top->type, toClone->top, ICloneableType);
Object* clone = interface->Clone(interface);

Let's take a look at the first line:
ICloneable* interface = toClone->top->type->QueryInterface(toClone->top->type, toClone->top, ICloneableType);
This line queries the interface that the object to be cloned wants you to use to clone it. The top object is used because that is the only way to be certain you are dealing with the most derived object and the latest code, instead of some base. Using this code, an object can be cloned just by providing its interface!

(QueryInterface takes three parameters: the first is the type the function pointer lives in, the second is the object in question, and the third is the requested type, where ICloneableType is the type instance describing ICloneable.)

Object* clone = interface->Clone(interface);
The second line simply calls the function pointer in the queried interface.

To ensure that every bit of behaviour can be overridden, the object queried via ->top->type->QueryInterface must be used instead of the provided object itself, for the provided object can be a whole different object that merely supports that interface secondarily.

The nice thing about this architecture is that a top object merely wraps the objects it implements and returns them (possibly adapted) when queried for them. This allows way more access to the inherited types than any other object-oriented model.

This model is quite easy to implement in C and (I hope) will be easy to integrate with, for instance, objects in a virtual machine.

How to GC in C(++)

GCs, Garbage Collectors, are systems that manage memory. Even the good old C runtime library has its 'GC': malloc, free and the others. Now, these aren't the best memory managers there are. They fragment memory and scatter your application's data across the whole address space, rendering the great processor L1 and L2 caches almost useless.

I won't talk about how to use the C 'GC', but about how to implement modern, generational garbage collectors in C(++).

The best GCs can be found in intermediate-language virtual machines, which keep track of every object's type in memory and can therefore freely move objects around. The great advantage of this is that objects that have survived about the same number of garbage collects tend to link to and use each other a lot. Such a group of survivors is called a generation. When you put a generation close together in memory, the CPU can usually pull the whole generation into its cache in one go and keep it there for a while, which is a lot quicker, considering that L2 cache is vastly faster than normal RAM.

The problem in C(++) is that you can't just move objects. You don't know which part of an object's memory is a pointer and which is a mere integer. Therefore it is impossible to make a generation-based garbage collector, for you just can't move stuff.

Allocating one big chunk and placing the application's objects in it with a custom allocation function will, however, buy some additional performance over traditional malloc, although it still isn't perfect.
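
A minimal sketch of such a custom allocator in C: one big chunk with a bump pointer. Freeing individual objects and growing the chunk are left out, so this shows the idea rather than a finished allocator:

#include <stdio.h>
#include <stdlib.h>

static char  *chunk;      /* one big block, allocated up front */
static size_t used;       /* bump pointer into the chunk       */
static size_t capacity;

void ArenaInit(size_t size) {
    chunk = malloc(size);
    capacity = size;
    used = 0;
}

/* Allocations land right next to each other, so objects that are
   allocated together also sit together in memory and in the cache. */
void *ArenaAlloc(size_t size) {
    size = (size + 7) & ~(size_t)7;   /* keep 8-byte alignment */
    if (used + size > capacity)
        return NULL;                  /* chunk exhausted       */
    void *p = chunk + used;
    used += size;
    return p;
}

int main(void) {
    ArenaInit(1 << 20);               /* a 1 MB chunk          */
    int *a = ArenaAlloc(sizeof *a);
    int *b = ArenaAlloc(sizeof *b);
    printf("a=%p b=%p\n", (void *)a, (void *)b);  /* adjacent  */
    return 0;
}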

One way to get it to work is to let the GC know what is a pointer and what is not. This can be done by letting the first 4 bytes (or 8 bytes on a 64-bit CPU) of each object be a pointer to a function which returns an array of offsets marking the pointers within the type.

Now the GC knows where the pointers in a structure are :-D.
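
Sketched in C, with one extra assumption of my own for the example: the offset array is terminated by -1.

#include <stddef.h>
#include <stdio.h>

/* Every collectable object starts with a function pointer that
   yields the offsets of the pointer fields within the object. */
typedef const ptrdiff_t *(*PointerMap)(void);

struct Node {
    PointerMap map;      /* must be the very first field     */
    int value;           /* not a pointer: the GC skips it   */
    struct Node *next;   /* a pointer the GC must know about */
};

static const ptrdiff_t *NodePointerMap(void) {
    static const ptrdiff_t offsets[] = { offsetof(struct Node, next), -1 };
    return offsets;
}

int main(void) {
    struct Node n = { NodePointerMap, 42, NULL };
    /* What a GC would do: walk the offsets to find each pointer. */
    for (const ptrdiff_t *off = n.map(); *off != -1; off++)
        printf("pointer field at offset %td\n", *off);
    return 0;
}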

The GC can now move an object by updating the pointers in the other objects to its new location!

The only problem with this is that there can't be any pointers to the object, or into it, coming from an object that isn't tracked or from an object that doesn't expose its pointers at all.

To overcome this I guess it would be feasible to add a flag byte to each object that exposes its pointers, allowing it to specify a function to be called when the object itself is moved or when an object it references is moved.

I've tried out some ways to get this all wrapped up in a nice framework (which is very feasible using my 'objects in C' framework, or something similar and easier in C++ using inheritance).

I'm afraid, however, that such a GC comes too late: by the time it is reliable, the intermediate languages will have gained way more ground, for they can implement way more optimizations, including in the GC, at runtime, since they tend to use ahead-of-time and just-in-time compilers.

Feedback on this would be welcome :-).

PHPavascript

Due to the recent hype around Ajax and web development, I've been thinking about a more effective way to write web applications that run both on the server and on the client, by writing code in just one environment and language instead of two: PHPavascript.

Having all kinds of fancy libraries does help a lot, but you still have to manage transferring all data between the client and the server by hand, which is quite tricky, especially with JavaScript: not only because it isn't the easiest language to debug, but also because every browser tends to do things a bit differently.

It would be nice, I thought, to have a language for developing the server side and the client side at the same time:

client TextBox mytextbox = TextBox.FromId("mytextbox");

int server function CalculateSum(first, second) {
  return first + second;
}

void client function ShowSum(first, second) {
  mytextbox.text = CalculateSum(first, second);
}

ShowSum(10,1000);

Basically, you mark a language entity as living either on the client or the server side; the compiler would take care of the rest.

This would be really cool if it were implemented, but there would be quite a few issues:

  • All client-side functions and variables can be forged by the user; a naive programmer could put too much trust in functions on the client side
  • Synchronizing variables and executing functions across server and client side could, with a sophisticated algorithm, be managed pretty decently, although it would still create a lot of overhead when the programmer doesn’t pay attention to where his functions and variables live. Hundreds of variable transfers for one page would be a lot, yet very possible when a programmer doesn’t take care or the compiler is too dumb.
  • Language features available on the server side but only achievable on the client side with dirty hacks could be a bottleneck if the programmer doesn’t take enough care. How to implement a MySQL connection, for instance (leaving aside the fact that it wouldn’t be very safe in the first place)?
  • etc

Basically, it would be way easier to develop a web application, but I don't know which is better: the hard-to-handle situation we have now, where you are forced to think efficiently, or the easy-to-use environment such a language could provide, where everything works and you are tempted not even to think about what happens beneath the bonnet.

Although there are still these advantages:

  • Vastly reduced development time
  • Highly maintainable code; a compiler could even create separate JavaScript files for each specific browser and compile to the desired server-side language

Ajax, the hype

It’s new!

It’s cool!

And now it has even got a name! Ajax.

(what’s ajax?)

The funny thing is that it is quite old and has been used for a long time already. It just wasn't hyped before.

This phenomenon also clearly illustrates the dissatisfaction with the static HTML document standard common to the world wide web. Don't deny it: HTML sucks for user interfaces! HTML is a format for documents, not user interfaces.

A descendant of HTML could make a nice user interface definition; the main thing would be getting rid of awkward JavaScript. A more dynamic Document Object Model, and especially a consistent one, combined with some kind of intermediate language providing power yet security, like Java or .NET or maybe a new one, hosted by the browser itself instead of by a nasty plugin no one has, would be perfect.

Phalanger

Phalanger is a PHP compiler for the .NET framework. It can be used to run existing PHP applications on ASP.NET web servers with far greater performance than PHP itself:

Contrary to PHP itself, the bottleneck isn't the execution of the code but the underlying functions: Phalanger still uses the PHP library for PHP's built-in functions, which creates a lot of interop overhead and makes the PHP objects less native to .NET than they could be. So unlike with plain PHP, you are best off avoiding the built-in functions and using the .NET ones, or your own, instead.

In my own little benchmark of basic differences between plain .NET and PHP, PHP came out 5000 times slower. Once Phalanger compiles into proper .NET code, avoiding the PHP library and PHP interop entirely, it would be a -lot- faster. And when people start to like it and install mod_mono on their Apache web servers to run it, they will probably find they are better off with ASP.NET and C# or VB.NET, which after all have a way cleaner syntax than PHP and a happier time working with the .NET framework. (I don't want to know what hacks the Phalanger compiler uses to get include and require working when compiling everything into one DLL.)

In the meantime, Microsoft is rubbing its hands.

Copy protecting

Everything from software to audio to video is being illegally copied, and every time the major brands try to implement some kind of protection. They always claim their protection is perfect, and yet it is always broken, for the reason is quite simple:

As long as the intended user has the platform on which he'll run it in his own possession, he can always adapt it in some way to extract the data. Even the best video protection can't beat tapping into your monitor to acquire the image on your screen.

Even protecting something like a DVD is almost impossible. The DVD player hardware and software must be able to read what's on the disc, so the protection must be readable too. There must also be DVD writers able to write the protection. Now, the major brands can all say they'll make their DVD burners refuse to write to the protection section, but then another brand creates burners that can write to it and everyone will buy those, which the big brands won't let happen. And even if they got the disc itself truly protected, someone can emulate the DVD in software or even hardware.

There are also schemes that allow copying but make the original copier trackable, by putting a unique signature in every video/song/piece of software which can be traced back to the store, which can in turn trace it back to the person who copied it. Sounds great, and it would be impossible to forge with strong RSA-like cryptography. Just one problem: when someone inserts random trash instead of the signature, you can tell the copy is illegal, but you cannot track anyone. Hopeless.

The one and only way to stop illegal copying is to make buying legally less of an effort than acquiring illegally. I hope they will realize this sooner or later, for honestly I'm becoming sick of all those 'magic' protections.

Enter The Unknown: Algorithms A Programmer Has Got To Know

I don't like linking to other blogs or articles lest I be just a lame copier, but I'd like to point my select few readers to Enter The Unknown, the weblog of Kaja Fumei, where he will point out some algorithms a programmer should know. First on the list are hash tables.

You don't need to know the internals behind these kinds of features a language library exposes, but it helps a lot if you know how they work and what their weak and strong points are.

Easy Crossbrowser JavaScript

The major problem for me when dealing with JavaScript was that JavaScript acts differently in each browser. And there are a lot of browsers supporting JavaScript.
Usually, getting it to work means having a big if block for each sensitive operation, and in the fancier scripts that becomes a lot of them. It also becomes hard to maintain.

So what to do about it?

Actually.. C(++) gave me a possible solution: use macros. It isn't feasible to use macros in JavaScript itself, so what you do is compile your JavaScript with macros into a different JavaScript file for each browser. Then, using a simple server-side script, you can serve each browser the file it wants.
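
For instance, a .js source could be run through the C preprocessor itself, once per browser. The XMLHttpRequest-versus-ActiveXObject difference is the classic example; the file name and build commands below are just one way to set it up:

/* request.js.in -- preprocessed once per target browser, e.g.:
     cpp -P -DMSIE request.js.in > request.msie.js
     cpp -P        request.js.in > request.moz.js             */
function createRequest() {
#ifdef MSIE
    return new ActiveXObject("Microsoft.XMLHTTP");
#else
    return new XMLHttpRequest();
#endif
}

The server-side script then simply hands request.msie.js to Internet Explorer and request.moz.js to everything else.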

I'm not a JavaScript guru, so I hardly know all the problems of each browser.

If a JavaScript guru does read this, please contact me – it would be great to have such a goodie.

Introducing Paradox

Kaja Fumei and I are currently developing a lightweight, rich, JIT-ed language called Paradox.

The main feature of Paradox will be that it is very lightweight in memory usage and startup time.

It will be great for scripting inside other applications that demand high performance, such as games. For normal scripting purposes or normal applications too, it will be way more suitable than the usual interpreted languages.

Basically, it will feature a JIT compiler, a generational GC, a rich (modular) base library, and great extensibility.

It will probably not perform as well as the .NET framework, but rather land in the range of Mono's and Java's performance, which is very high compared to normal scripts.

When Kaja, who is currently working on the JIT compiler and generational GC, has finished the base, I'll start working on the core library, help optimize, and post a preview of its capabilities.

(for those wondering how long it will take:

The Unknown – 森の中に木がない。 says:
reeally long week

)

Beating Spam

Most mail clients now include spam filters, which are learning and improving themselves.

The problem, though, is that as spam keeps getting smarter, your program has to get smarter too, which still means it flags only half of your spam as spam and you keep checking your spam folder for regular mail. Also, when you start again after a reinstall and the spam filter has lost its experience, you get to start all over.

Now, I guess it would be great to create a centralized, independent organization specifically to recognize spam (and, when it becomes successful, also to single out the spammers for prosecution).

The problem is how to organize such a centralized system, for every received email would have to be checked against the centralized server to see whether it is spam. The amounts of spam are huge, and doubling the enormous bandwidth and CPU that spam has already cost isn't a very pleasant prospect.

It would be feasible, however, to create a spam recognition service that runs on the client's computer and updates itself with the latest definitions once in a while. This would undoubtedly be way more efficient.

The only problem we are left with is how to get such a system integrated with existing applications; if no one uses it, it is rather useless.

If you have some ideas on this, please share them.

Trying Thunderbird

When I got Windows reinstalled, I thought it was time to try something other than Office Outlook (especially because a big mailbox makes Outlook slow), so I tried Mozilla Thunderbird.

What I noticed about Thunderbird in comparison with Outlook:

Faster, slicker UI
Although Thunderbird doesn't look as great as Outlook, it certainly has a nice interface which is customizable enough.
The interface itself is also a lot faster than Outlook's, which I find more important than graphical splendor.
The collapsible header view above emails and the information bars above them (e.g. a bar showing that Thunderbird thinks the email is spam, or that it is blocking images) are a nice addition, and are worked out better than in Outlook, which only has them in a limited form.
Thunderbird also seems to share its rendering engine with Firefox, letting it show HTML emails a lot faster than Outlook does.

Configuration
In the beginning the configuration was rather confusing, although that also applied to Outlook when I first tried it. I still think the account configuration interface could be a lot more intuitive.

Outgoing mail
Thunderbird requires, and actually uses, only one SMTP server for all your mail accounts. Although you can specify more, there doesn't seem to be a way to choose which SMTP server is used to send a given mail. This is a shame, for especially with important emails it is more trustworthy when the email can be traced back to the SMTP server of the originating domain. Also, some SMTP servers don't allow seemingly spoofed 'from' fields.

Extensions
Thunderbird seems to have an extension system very similar to Firefox's, which I can appreciate. There are some very useful extensions, like Enigmail, a front end for GPG for asymmetric email encryption. It doesn't only manage your encrypted emails; it also has a very good interface for all kinds of GPG operations, like key management.

Everybody should give Thunderbird a try; it's a great toy :).

Plugins, the web and Python

Plugins for web applications seem to be discussed a lot by bloggers recently (Zef did, for instance).

Plugins usually come in two shapes: hacks and modules.

The key difference between a hack and a module is that a hack changes existing behaviour, while a module adds behaviour or maybe extends some. In Zef's example he is talking about possibly adding a calendar to Google's Gmail – that is definitely a module.

Changing the whole interface, for instance to allow planning appointments from inside other modules (like the email view), would require changing those modules and basically make it a hack.

Modules are easy to make and maintain, for they form separate entities which don't often conflict. Most web applications, like IPB, already support module-based plugins. More functionality can be given to modules by letting them hook onto certain events in the host application, but this has its limits.

Hacks are also widely used, although they are very hard to maintain and usually conflict with each other.

So what is the best way to write a plugin-based application?

I was thinking about that while working on a computer version of the popular card game Magic. There are already a lot of computer versions which let players play the game, but the players have to apply the rules themselves.. the computer doesn't do it for them, for every card has a lot of exceptions and new rules are added every month. To make such a game, plugins would be required.

The way I solved this is by putting everything in the game in modules/plugins. The only thing the game does is say 'start' to all plugins. One plugin responds to that; that may for instance be the UI plugin, which shows the main UI form and then says 'postgui-start', to which another plugin may respond that extends the UI form with some new features in the game and then calls 'postseries4gui-start'. A card itself is represented by a big hashtable dynamically containing all the info of that card, including functions. Letting one creature card attack another basically comes down to dynamically getting properties and calling functions, all of which can be overloaded for new rules, which seems to work pretty fine.

Guess what I used to develop this highly dynamic interaction?

Python!

Python is the perfect language for plugins.
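
In C, the same cascade would take a pile of bookkeeping; here is a rough sketch of just the event part, with every name invented for the example (the dynamic card hashtables are exactly the part Python gives you for free):

#include <stdio.h>
#include <string.h>

typedef void (*Handler)(const char *event);

/* A fixed-size hook table; plugins register handlers for named events. */
static struct { const char *event; Handler handler; } hooks[64];
static int hookCount = 0;

void RegisterHook(const char *event, Handler handler) {
    hooks[hookCount].event = event;
    hooks[hookCount].handler = handler;
    hookCount++;
}

/* Firing an event walks every handler registered for that name. */
void Fire(const char *event) {
    for (int i = 0; i < hookCount; i++)
        if (strcmp(hooks[i].event, event) == 0)
            hooks[i].handler(event);
}

static void UiPlugin(const char *event) {
    printf("ui plugin: showing the main form\n");
    Fire("postgui-start");    /* let other plugins extend the form   */
}

int main(void) {
    RegisterHook("start", UiPlugin);
    Fire("start");            /* the only thing the game itself does */
    return 0;
}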

Now.. why don't we already have really nice web applications allowing this kind of highly dynamic plugins? Web applications are, despite sessions, primarily stateless.. On every page view the whole script is loaded into memory, executed, and unloaded again. Nothing stays persistent. Plugins generate a lot of overhead, especially on loading, for every plugin must be able to hook onto and interact with everything. Doing this with the current web server model just doesn't work.

I have already made some suggestions on how to solve this on the server side, but changing that would take a lot of time.

So, to get back to getting this done -now- and not later: keep working in a modular way, exposing the most used hooks and data access.

A suggestion for Gmail:

Allow SOAP-based access to Gmail's features, with exactly the same amount of access as the user has got.
Then allow adding new menu items and hooks which trigger a call to a URL you specify, which can then work with Gmail at the client-access level using SOAP calls.

Best would be for Google to just expose the API of Gmail and offer a limited-functionality dev download, so people can make modules in the source and send them back to Google. If they like one, they'll use it.. I guess Google will come up with something like that one day. Or at least I hope so.

Selfish driver control panels

A week ago my Windows was agonizingly slow. Starting up took ages, and 'Windows is out of virtual memory' messages were common, so I decided to reinstall Windows.
Freshly reinstalled, Windows was using 120 MB without anything special installed.. so I started to install my usual applications: Apache, MySQL, MSSQL, .NET Express, PHP.. etc..
By the time I got those installed, Windows was using 250 MB instead of 120 MB! MSSQL Server uses 50 MB (although that number isn't displayed properly in the task manager, I guess due to modules), MySQL and Apache both 25 MB…
So I turned the Apache/SQL server services off by default, made a little bat script to start them when I want, and put it on my desktop. I also made a backup of Windows so I won't need that horribly slow reinstall again..
I installed some basic drivers from the provided driver CDs, which loaded a lot of junk like tray icons onto my computer. They did not appear to use much memory in the task manager, but quitting them saved 100 MB :-/. 100 MB used by selfish driver control panels, each thinking it is the single most used application and the only one allowed to suck up that amount of memory…
So instead of using those horrible flashy autorun installers, I let Windows find the correct driver .inf files and install those.
This really saves a lot of performance.. (I've cut my startup time by a factor of four).

So: better to let Windows find the drivers you require on your driver CDs than to use those selfish, flashy, traybar-spamming driver control panel installers.

Xr12 concept

I've been wanting to develop a massive online browser-based game for a long time, for I have been addicted to those for quite a while and have been missing a lot of features and user input.

And here it is. I've already got myself a few other people to help, and I just finished the server architecture for when it gets big:

Xr12 concept

I just hope I'll be able to afford the system once user activity grows; from experience I've seen that these online games tend to grow really fast.