Variable sized floating points

Many self-respecting programming languages have Integer classes which can theoretically grow to an unlimited size by using a dynamic array underneath.
Python has one, and so do lots of others.
Still, they haven’t got a variable-sized floating point, which I find odd, for it shouldn’t be a big problem to create:

  • Wrap one existing variable-sized Integer
  • Add an integer which points to the place where the point will be
  • Add operator overloads to get it working
  • Add an integer which specifies how many digits there will be after the point

The latter is quite important to have. We wouldn’t want 1 / 3 causing an infinite loop.
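The list above can be sketched in a few lines of Python 3. This is only a minimal sketch under my own assumptions: the class name BigFloat and its methods are hypothetical, the digits are stored in base 10, and a result simply keeps the larger precision of its two operands.

```python
class BigFloat:
    """Hypothetical sketch: one unbounded integer plus a digit count."""

    def __init__(self, digits: int, scale: int):
        self.digits = digits  # all digits of the number as one big integer
        self.scale = scale    # how many of those digits sit after the point

    @classmethod
    def from_int(cls, value: int, scale: int) -> "BigFloat":
        return cls(value * 10 ** scale, scale)

    def _rescaled(self, scale: int) -> int:
        # Only ever called with scale >= self.scale.
        return self.digits * 10 ** (scale - self.scale)

    def __add__(self, other: "BigFloat") -> "BigFloat":
        scale = max(self.scale, other.scale)
        return BigFloat(self._rescaled(scale) + other._rescaled(scale), scale)

    def __mul__(self, other: "BigFloat") -> "BigFloat":
        return BigFloat(self.digits * other.digits, self.scale + other.scale)

    def __truediv__(self, other: "BigFloat") -> "BigFloat":
        scale = max(self.scale, other.scale)
        # Shift the numerator so the quotient keeps `scale` digits.
        # Integer division drops the rest, so 1 / 3 terminates.
        numerator = self._rescaled(scale) * 10 ** scale
        return BigFloat(numerator // other._rescaled(scale), scale)

    def __str__(self) -> str:
        sign = "-" if self.digits < 0 else ""
        text = str(abs(self.digits)).rjust(self.scale + 1, "0")
        if self.scale == 0:
            return sign + text
        return f"{sign}{text[:-self.scale]}.{text[-self.scale:]}"


third = BigFloat.from_int(1, 10) / BigFloat.from_int(3, 10)
print(third)  # 0.3333333333
```

The fixed digit count is what keeps the division finite: the quotient is cut off after 10 digits instead of being expanded forever.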

You could even do it a second way:

  • Wrap two existing variable-sized Integers
  • Add overloaded operators to get it working

These 2 integers would represent a fraction: a / b.

By storing the fraction instead of the result you stay exact for every rational number. Some numbers, such as the square root of 2, cannot be represented this way, but they can be approximated.
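The fraction variant might look like the following Python 3 sketch. The class name Ratio is hypothetical, and reducing by the greatest common divisor is my own addition to keep the two integers small. Python's standard fractions.Fraction is a fuller version of the same idea.

```python
from math import gcd


class Ratio:
    """Hypothetical sketch: two unbounded integers representing a / b."""

    def __init__(self, a: int, b: int):
        if b == 0:
            raise ZeroDivisionError("denominator must not be zero")
        if b < 0:  # keep the sign in the numerator
            a, b = -a, -b
        common = gcd(a, b)
        self.a, self.b = a // common, b // common

    def __add__(self, other: "Ratio") -> "Ratio":
        return Ratio(self.a * other.b + other.a * self.b, self.b * other.b)

    def __mul__(self, other: "Ratio") -> "Ratio":
        return Ratio(self.a * other.a, self.b * other.b)

    def __truediv__(self, other: "Ratio") -> "Ratio":
        return Ratio(self.a * other.b, self.b * other.a)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Ratio) and (self.a, self.b) == (other.a, other.b)

    def __repr__(self) -> str:
        return f"{self.a}/{self.b}"


print(Ratio(1, 3) + Ratio(1, 6))  # 1/2
```

No precision setting is needed here, because 1 / 3 is stored as the pair (1, 3) and never expanded into digits.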

Subtext

I like Subtext, but I wonder how it would be possible to create a big practical project in it:

  • How would it do async operations? GUIs, networking… If there is no difference between development time and run time, this could be very hard to visualize.
  • How would this ever perform properly? Compiling would do, but debugging a big application would get rather slow. Once it gets dynamic, meaning that the code can drag-and-drop itself, it would be virtually impossible to get a compiled version running quickly.
  • What about internal arguments? Let’s make a static field in a function; if the language wisely doesn’t support that, the behaviour can still be replicated with a file in which a number is incremented. Now the function has a field that gets incremented on every call: an internal argument not known to the engine. Every function is supposed to generate the same results for the same arguments, but this one doesn’t, and since there is no difference between run time and development time it would behave unpredictably.

Subtext is rather a way to create a function which is ‘instant’; it is a definition rather than an operation. Computers can’t do things like reading a whole file or executing operations instantly.

I like the idea, but I just see no real practical use.

Torrent sharing p2p network

In my previous post I discussed Exeem. Exeem is (or actually will be for it hasn’t been launched, just announced) a p2p network for sharing, rating and commenting torrents.

What is a torrent? A torrent is a small file which the BitTorrent p2p file distribution system uses to identify a certain file, or set of files, you can download. You first need the torrent for a file or folder before you can download it.

The major problem with this is that it is impossible to use a BitTorrent client itself to search for the downloads you want. Therefore a lot of sites have been created over time which contain huge searchable collections of torrents. One of these sites was suprnova.org, which has recently been shut down due to legal issues.

As I elaborated in my previous post, Exeem will probably suck. So someone will need to do it right by making an alternative.

What issues would need to be solved to create such a p2p torrent sharing network?

  • No centralized client list. Most p2p networks were shut down because they had a centralised tracker to which a client connected to receive the file list and all the users available for a certain file. Instead of relying on a centralised server, every single client should tell other clients who else is in the network and what files are there. If every client ships with a built-in list of IPs, it can update that list by querying those IPs for better ones. By rating each IP by uptime and connection bandwidth, a big, changing group of frequently online users could provide the other IPs and pass search queries on for the rest.
  • Searching. How do we handle a search query? At this point our client is connected to a few big clients which are frequently online in its neighbourhood; let’s call them super nodes for now. When we send them a search query they look in their cache for the result; if it isn’t there they look through their own torrents for a match, and if that fails too they forward the query to another, slightly smaller super node. The problem with this method is that one query could travel through a huge number of nodes, and a node with good bandwidth would end up doing nothing but passing queries through to other nodes. To solve this, the query feedback (sent when a result is found) should contain the source, along with the estimated number of different search queries the node which had the result can answer. This way a client can form a shortcut when it finds a node which either has a lot of searchable files or an enormous cache, and which of course is online often and has decent bandwidth.
  • Rating. Alongside every torrent you download or expose for upload there would be a metadata file containing a description, rating and comments on the torrent itself. The problem with this system is that descriptions and ratings can change, and it is very hard to keep every instance of a torrent on the whole network synchronized. You could send a message with the new comment through the network to the node from which you originally received the file, or you could search for the torrent again by unique id and message every node found to have it. All these methods still involve a lot of passing messages through.
  • Privacy
  • Client side ‘hacking’. When everyone uses the default client, which automatically selects super nodes and lets people pass queries through, everything will work fine. The big problem is that people may well start using illegal client applications which just leech from the network. Methods to get rid of leechers would work while most people are still using the default client, but when people start using illegal clients en masse the network can no longer block them and will simply collapse, for everyone is leeching. This is the major problem that could hit this p2p network, which relies heavily on everyone helping others, whether they like it or not, by proxying, caching and passing through various queries to maintain privacy and decentralization.
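The search step from the list above (cache first, then own torrents, then forward to the next super node) could be sketched roughly like this in Python 3. Everything here is an assumption for illustration: the Node class, the hop limit and the substring match are not part of any real protocol.

```python
class Node:
    """Hypothetical super node: answers from cache, own torrents, or forwards."""

    def __init__(self, name, torrents=(), next_node=None):
        self.name = name
        self.torrents = set(torrents)
        self.cache = {}             # query -> (torrent name, source node name)
        self.next_node = next_node  # a slightly smaller super node, if any

    def search(self, query, hops_left=8):
        if query in self.cache:
            return self.cache[query]
        match = next((t for t in sorted(self.torrents) if query in t), None)
        if match is not None:
            result = (match, self.name)
        elif self.next_node is not None and hops_left > 0:
            # The hop limit keeps one query from travelling the whole network.
            result = self.next_node.search(query, hops_left - 1)
        else:
            result = None
        if result is not None:
            # The feedback names its source, so later queries can be answered
            # locally or sent straight to that node as a shortcut.
            self.cache[query] = result
        return result


small = Node("small", ["linux-distro.iso"])
big = Node("big", ["python-docs.zip"], next_node=small)
print(big.search("linux"))  # ('linux-distro.iso', 'small')
```

The second identical query never leaves the first node, which is the whole point of the cache.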

I’d be rather interested in how Exeem will address these issues. I guess they will just try to rule out client-side hacking by incorporating various encryption tricks in their protocol.

Exeem the hype

Slashdot on exeem

Since suprnova.org went offline for being illegal, the main source for torrents has disappeared. This has led to a hype around exeem, the replacement made by the original maintainers of suprnova.

First, what actually was suprnova.org? It was a site which maintained torrents for legal and illegal files. And it did even more: all torrents were thoroughly checked, commented on and rated by an enormous team of editors, making sure that the torrents on Suprnova.org were the best you could possibly find.

Because Suprnova moderated, commented on, rated and checked every single torrent they offered, they were without any doubt illegal. If they had only offered user-uploaded torrents, with a nice disclaimer that the torrents are the property of their respective owners, they would probably have gotten away with it, but they also wouldn’t have grown as big as they were.

Exeem basically offers the same as Suprnova.org, except that it is a p2p application, not a centralised website.

It stores torrents and the comments on them on a peer-to-peer network similar to Kazaa, which makes every single user just as liable, instead of only the main servers as was the case with Suprnova. It is very hard for authorities to punish every single user of a p2p network. There would have to be a trial for every single user, which would never be profitable. Governments have tried to convict the big p2p users, for this actually is profitable. The main problem is that there aren’t a lot of really big p2p users, just an incredible number of small users who combined are even worse than a few big ones.

Exeem sounds great, and exeem is an enormous hype. But I think Exeem will suck:

  • Exeem is p2p. This will most likely cause the quality of rating, comments and moderation on torrents to go down a lot and make it less attractive for the user. A very secure system in which only a few people can add new torrents to the network needs some kind of centralised authority, which will be very vulnerable to legal pursuit.
  • Exeem will be adware. This will cause a lot of people to drop off. No one wants to have adware on their computer.

Although I could be mistaken, most hypes like this one tend to turn out really disappointing.

In my opinion the only way to get a neat new system like exeem which works well is to build a p2p torrent distribution network for legal purposes. BitTorrent grew big because it was used to distribute Linux distributions. Although it will probably get used for illegal purposes too, it would just be a very handy system for legal purposes.

More on this later…

The dynamic of python

Python is dynamic (duh)

Dynamic typing
You don’t need to specify the type of an object anywhere.

def PrintSomething(something):
    print something

Everything is an object
Everything, yes, everything in Python is an object. Even an integer, a method, a class, etc.
Every object has got a type (__class__), which can actually be changed at runtime, although this usually results in conflicts about the allocated size for the PyObject.

No typing needed
Python doesn’t have interfaces, for if an object wants to expose certain behaviour it just implements the required methods.
When you want an object to be comparable in Python you don’t implement something like IComparable as in .Net, you just create the __cmp__ method.
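As a small illustration of "just implement the method": in Python 3 the __cmp__ method no longer exists and rich comparison methods such as __lt__ take its place, so this sketch uses __lt__. The Version class is a made-up example.

```python
class Version:
    """Made-up example class; it inherits from nothing comparison-related."""

    def __init__(self, major: int, minor: int):
        self.major, self.minor = major, minor

    def __lt__(self, other: "Version") -> bool:
        # sorted() only needs "less than", so defining this is enough.
        return (self.major, self.minor) < (other.major, other.minor)

    def __repr__(self) -> str:
        return f"{self.major}.{self.minor}"


print(sorted([Version(2, 4), Version(1, 5), Version(2, 3)]))  # [1.5, 2.3, 2.4]
```

Nothing declares that Version is sortable; sorted simply calls the method and finds it.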

Fields and methods can be changed (and are the same)
A method and a field in Python are the same kind of thing. Both reside in the object’s dictionary (__dict__) or in the dictionary of the object’s type (__class__.__dict__).
This allows you to use an object which implements __call__ instead of a method.
Because __dict__ is writeable you can change or create attributes (members) at runtime:

>>> import sys
>>> sys.__dict__['foo'] = "bar"
>>> sys.foo
'bar'

A class member function is a normal method, wrapped

>>> class exampleclass:
	value = ""
	def examplefunction(self):
		print self.value
>>> def examplereplacement(self):
	print "replacement!"
	print self.value
>>> instance = exampleclass()
>>> instance.examplefunction()
>>> instance.value = "foobar"
>>> instance.examplefunction()
foobar
>>> instance.examplefunction = examplereplacement
>>> instance.examplefunction()
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in -toplevel-
    instance.examplefunction()
TypeError: examplereplacement() takes exactly 1 argument (0 given)
>>> instance.__class__.examplefunction = examplereplacement
>>> instance.examplefunction()
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in -toplevel-
    instance.examplefunction()
TypeError: examplereplacement() takes exactly 1 argument (0 given)
>>> instance.__class__.__dict__['examplefunction'] = examplereplacement
>>> instance.examplefunction()
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in -toplevel-
    instance.examplefunction()
TypeError: examplereplacement() takes exactly 1 argument (0 given)
>>> instance = exampleclass()
>>> instance.__class__.__dict__['examplefunction'] = examplereplacement
>>> instance.examplefunction()
replacement!

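Bound methods are a bit trickier than they appear. A plain function assigned to an instance is not bound, so it is called without self. That instance attribute also shadows the class attribute of the same name, which is why the replacements made through the class only showed up on a fresh instance. Editing member functions via the __class__’s __dict__ will usually result in a proper replacement.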
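The session above can be replayed in Python 3 to see what is going on. The names below mirror the ones in the session, and types.MethodType is the standard way to bind a function to one instance by hand.

```python
import types


class ExampleClass:
    value = "foobar"

    def example_function(self):
        return self.value


def example_replacement(self):
    return "replacement! " + self.value


instance = ExampleClass()

# 1. A plain function stored on the instance is not bound: no self is passed.
instance.example_function = example_replacement
try:
    instance.example_function()
    unbound_call_failed = False
except TypeError:
    unbound_call_failed = True

# 2. The instance attribute shadows the class attribute, so replacing the
#    method on the class changes nothing for this instance yet.
ExampleClass.example_function = example_replacement
try:
    instance.example_function()
    still_shadowed = False
except TypeError:
    still_shadowed = True

# 3. Once the instance attribute is gone, the class-level replacement is
#    found and bound as usual.
del instance.example_function
via_class = instance.example_function()

# 4. Binding explicitly makes a per-instance replacement work as well.
instance.example_function = types.MethodType(example_replacement, instance)
via_bound = instance.example_function()

print(unbound_call_failed, still_shadowed, via_class, via_bound)
```

Steps 1 and 2 reproduce the tracebacks from the session; creating a fresh instance there had the same effect as the del in step 3.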

These were just a few examples of what Python has to offer. These features aren’t limited to Python. IronPython, an implementation of Python on .Net, lets ordinary .Net objects be manipulated in similar ways. Although this doesn’t actually change the .Net objects, changing the wrapper to get the required behaviour is good enough.

The author of IronPython has been recruited by Microsoft and is now working on making the CLR more dynamic.

I’d love a static dynamic language.

Base64 encoding/decoding algorithm

I’ve made some Python functions to encode and decode base64. I’ve been trying to develop my own base64 implementation for the email protection script which can be found here.

Python has again proved itself to be a great language for quickly developing stuff.

def tobase64(s, padd = False):
    b64s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    b64p = "="
    ret = ""
    left = 0
    for i in range(0, len(s)):
        if left == 0:
            ret += b64s[ord(s[i]) >> 2]
            left = 2
        else:
            if left == 6:
                ret += b64s[ord(s[i - 1]) & 63]
                ret += b64s[ord(s[i]) >> 2]
                left = 2
            else:
                index1 = ord(s[i - 1]) & (2 ** left - 1)
                index2 = ord(s[i]) >> (left + 2)
                index = (index1 << (6 - left)) | index2
                ret += b64s[index]
                left += 2
    if left != 0:
        ret += b64s[(ord(s[len(s) - 1]) & (2 ** left - 1)) << (6 - left)]
    if(padd):
        for i in range(0, (4 - len(ret) % 4) % 4):
            ret += b64p
    return ret
def frombase64(s):
    b64s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    b64p = "="
    ret = ""
    s2 = s.replace(b64p, "")
    left = 0
    for i in range(0, len(s2)):
        if left == 0:
            left = 6
        else:
            value1 = b64s.index(s2[i - 1]) & (2 ** left - 1)
            value2 = b64s.index(s2[i]) >> (left - 2)
            value = (value1 << (8 - left)) | value2
            ret += chr(value)
            left -= 2
    return ret

The algorithm doesn’t automatically add the required =’s while encoding unless padd is set, nor does it require them while decoding.
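To relate the unpadded output to standard base64, here is a small Python 3 sketch using the standard base64 module. The helper names strip_padding and add_padding are my own; the padding count is the same (4 - len % 4) % 4 that tobase64 uses when padd is set.

```python
import base64


def strip_padding(encoded: str) -> str:
    """Drop the trailing '=' characters, like the unpadded output above."""
    return encoded.rstrip("=")


def add_padding(encoded: str) -> str:
    """Restore the '=' characters that a strict decoder expects."""
    return encoded + "=" * (-len(encoded) % 4)


reference = base64.b64encode(b"Mana").decode("ascii")
unpadded = strip_padding(reference)
print(reference, unpadded)  # TWFuYQ== TWFuYQ
print(base64.b64decode(add_padding(unpadded)))  # b'Mana'
```

The padding carries no data, which is why a decoder that tracks leftover bits itself can do without it.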

Update on the anti-email-harvester mailto links

In the previous post I described a simple though effective method to get rid of the constantly cleverer spam email harvester bots.

I’ve made a little update to the algorithm: it now uses only 1 number for each character and uses a cascading incremental xor transform.

Python code for the algorithm itself:

def alphaicx(s):
    ret = ""
    cascvalue = 0
    for i in range(0, len(s)):
        ret = ret + chr(ord(s[i]) ^ cascvalue)
        cascvalue = (ord(ret[i]) + 1) % 255 
    return ret
def betaicx(s):
    ret = ""
    cascvalue = 0
    for i in range(0, len(s)):
        ret = ret + chr(ord(s[i]) ^ cascvalue)
        cascvalue = ((ord(ret[i]) ^ cascvalue) + 1) % 255
    return ret

I designed the algorithm in Python. Python is great for that kind of stuff.

As you can see there are 2 functions: when you encode something with alphaicx you can decode it with betaicx, and vice versa. betaicx creates tougher code though. This encryption is pretty lousy, but hard enough to stop spam bots.
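The round trip can be checked with a Python 3 port of the two functions. The port works on bytes instead of strings, which is my own adaptation; the transform itself is unchanged. alphaicx derives the next xor value from its output and betaicx from its input, which is why each one undoes the other.

```python
def alphaicx(data: bytes) -> bytes:
    out = bytearray()
    cascvalue = 0
    for byte in data:
        out.append(byte ^ cascvalue)
        # Next xor value depends on the byte just written.
        cascvalue = (out[-1] + 1) % 255
    return bytes(out)


def betaicx(data: bytes) -> bytes:
    out = bytearray()
    cascvalue = 0
    for byte in data:
        out.append(byte ^ cascvalue)
        # out[-1] ^ cascvalue is the input byte again, so the next xor
        # value depends on the byte just read.
        cascvalue = ((out[-1] ^ cascvalue) + 1) % 255
    return bytes(out)


address = b"someone@example.com"  # placeholder address
encoded = betaicx(address)
print(alphaicx(encoded) == address)  # True
```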

I’ve ported betaicx to PHP, and alphaicx to Javascript. The running example (very useful though) has been updated.

The PHP/Javascript code for the function:

function JSBotProtect($text){
	$cxred = "0";
	$cascval = 0;
	for($i = 0; $i < strlen($text); $i++){
		$value = (ord($text[$i]) ^ $cascval);
		$cxred .= "," . $value;
		$cascval = (($value ^ $cascval) + 1) % 255;
	}
	return <<<EOF
<script type="text/javascript">var cxred=String.fromCharCode({$cxred});
var uncxred=""; var cascval=0;for(i=1;i<cxred.length; i++)
{uncxred+=String.fromCharCode(cxred.charCodeAt(i)^cascval);
cascval=((uncxred.charCodeAt(i-1))+1)%255;}document.write(uncxred);</script>
EOF;
}

I’ll make the uncxred storage more compact. Probably just normal hex or, when I can get it working, Base64.

Protecting your email address against spam bots

Spam bots are getting smarter at harvesting email addresses these days. They usually use a regex which searches for ‘.. dot .. ltd’, which isn’t that resource intensive. When that matches, a more advanced regex is applied to get the email address, somehow removing stuff like ‘spam’.

Using normal javascript encoding doesn’t work anymore, for it isn’t that hard for a spider to recognize encoded strings and decode them, whether they are in javascript code or in normal html escapes.

Therefore we need to get more inventive:

function JSBotProtect($text){
	$xorred = "0";
	$layer = "0";
	for($i = 0; $i < strlen($text); $i++){
		$layerbit = mt_rand(0, 255);
		$xorred .= "," . (string)(ord($text[$i]) ^ $layerbit);
		$layer .= "," . (string)$layerbit;
	}
	return <<<EOF
	<script type="text/javascript">
		var xorred = String.fromCharCode({$xorred});
		var layer = String.fromCharCode({$layer});
		var unxorred = "";
		for(i = 1; i < xorred.length; i++){
			unxorred += String.fromCharCode(
				xorred.charCodeAt(i)^layer.charCodeAt(i));
		}
		document.write(unxorred);
	</script>
EOF;
}

This PHP function returns a block of javascript which stores a sensitive string, like an email address, in 2 parts which, when xorred with each other, result in the original string.
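The splitting itself is easy to try out in Python 3. This sketch only shows the xor idea, not the javascript output; the function names are hypothetical and it uses the standard secrets module for the random layer where the PHP code uses mt_rand.

```python
import secrets


def split_xor(text: str) -> tuple[bytes, bytes]:
    """Split text into two parts that are meaningless on their own."""
    data = text.encode("utf-8")
    layer = secrets.token_bytes(len(data))  # one random byte per data byte
    xorred = bytes(d ^ k for d, k in zip(data, layer))
    return xorred, layer


def join_xor(xorred: bytes, layer: bytes) -> str:
    """Xor the two parts with each other to get the text back."""
    return bytes(x ^ k for x, k in zip(xorred, layer)).decode("utf-8")


xorred, layer = split_xor("someone@example.com")  # placeholder address
print(join_xor(xorred, layer))  # someone@example.com
```

A harvester scanning the page source sees two lists of numbers, neither of which contains the address.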

An implementation to get a mailto: link

Developing plugin-based applications

When you want to write an application based on plugin-provided functionality, whether it is a website or a database manager, you should ask yourself two things while programming:

  • Can I change this behaviour easily with another plugin? You want to be able to edit a menu item for instance.
  • Do I want to be able to change this with another plugin? You don’t want a plugin to edit a menu item if it is meant to only be a simple format interpreter.

Basically there are 2 big groups of plugin-based applications: the static and the dynamic. Static plugins are the easiest to program and the most commonly used. An example would be a plugin which loads and saves an image from and to a ‘.jpg’. Such plugin libraries usually contain one function (a gateway) which exposes a few plugins, each of which usually exposes a static interface.

On the other side there are the over-dynamic plugins, which should rather be called ‘modules’. These applications use a slow, high-level ‘message’ system where basically everything can be intercepted, hooked up to, responded to…

Deciding what base behaviour can be extended, edited or removed by plugins, and how, is the hardest part of developing a plugin-based application. If everything should be changeable you get stuck with complicated code and a lot of overhead due to extra ‘hookable’ calls. If almost nothing is overridable you probably get stuck with worthless plugins.
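To make the ‘hookable’ call concrete, here is a minimal Python 3 sketch of such a message system. The HookRegistry class and the "menu.items" message name are invented for this example.

```python
from collections import defaultdict


class HookRegistry:
    """Hypothetical message system: plugins hook in by message name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, message, handler):
        self._handlers[message].append(handler)

    def send(self, message, value):
        # Every hookable call pays for this lookup and loop, even when no
        # plugin is interested. That is the overhead mentioned above.
        for handler in self._handlers[message]:
            value = handler(value)
        return value


hooks = HookRegistry()
# A plugin that edits the menu by appending an item.
hooks.register("menu.items", lambda items: items + ["Export"])
print(hooks.send("menu.items", ["Open", "Save"]))  # ['Open', 'Save', 'Export']
```

Only behaviour that is routed through send can be changed by a plugin, so choosing which calls go through the registry is exactly the design decision described above.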

While developing a plugin-extendable application it is very helpful to put the base code in several plugins itself, which prevents a lot of mistakes.

I’ve found Python and C# to be very capable languages for writing plugins. C# is a bit more trouble than Python, but Python allows you to do too much: you can just call functions without knowing for sure what kind of object you are handling.

I’ve been working on a modular server (see the modular server article I’ve posted), programmed in C#.

The base of the server is nothing more than a framework for loading and unloading libraries and plugins, managing configuration files, and hosting ‘server’ plugin instances. Every server plugin can itself load plugins (for instance a custom authentication handler for an HTTP server). I’ve noticed that having a very dynamic configuration storage is very useful, for every instance of a plugin requires its own configuration and usually its own plugins to be loaded. I’m currently using multiple tree-based files which are merged at runtime, which I’ll replace with a concept format called XTL, about which I’ll tell more later on.

RGB to Hex (and why the python interactive mode is so damned handy)

Update 2010-02-01 Thanks to Sameer for pointing out the short way of doing this: return "#%02X%02X%02X" % (r,g,b)

def tohex(r,g,b):
	hexchars = "0123456789ABCDEF"
	# // is integer division in both Python 2 and 3; a plain / yields a float in Python 3
	return "#" + hexchars[r // 16] + hexchars[r % 16] + hexchars[g // 16] + hexchars[g % 16] + hexchars[b // 16] + hexchars[b % 16]

Python is very convenient when you need to program a simple algorithm. I programmed this RGB to Hex converter in less than 1 minute, including using the function for a few RGB values I needed to convert.

Usually I just grab pen and paper and do the calculations myself, for either getting the Windows calculator to show up and actually calculate stuff properly (looking at the screen, writing down, forgetting to press C, starting again…) or writing a program in a language like C# would just take too much time.

I’ve been using Idle for quite some time as a very good replacement for both my calculator and my pen and paper.

Markup? Nah, wysiwyg!

When you post something on a forum, in a guestbook, or on anything else on the web which supports some kind of formatting of your text, it works with some sort of BB-Code.
BB-Code is just like HTML formatting, with ‘[]’s instead of ‘<>’s, and a lot fewer features. It is hard to type and doesn’t look neat.

But who cares? Everyone uses BB-Code, almost everyone knows BB-Code, and most people don’t find it hard to type, for people are just too used to it!

There are alternatives to BB-Code, like Textile and Markdown, which use a more convenient syntax.
Personally I don’t like using them for I ain’t used to the syntax.

However… why would we want to write formatting anyway? Why not just use wysiwyg? I do not mean a java applet but rather a standard for browsers, a new tag: “<input type="formatted" name="example" />”, which would act for the server as a normal input field returning the formatted text as HTML.

The problem is that it would be very hard to get every browser to support such a new tag. Most browsers, I guess, would be very willing to comply. But browsers like Internet Explorer wouldn’t. At the moment they don’t even comply with the simplest of CSS, which gives website developers a headache.

Intrepid C – ‘Objects’ in C

I’m working on a library which can be used to achieve most OOP functionality of OO languages in C, by providing an API (not by extending the syntax).

I’ve written 2 articles (artc. 1, artc. 2) so far about getting OO like behaviour in C, and will write a few more about the more advanced stuff which I’m still experimenting with.

You can download the source of the API I have developed so far here, which by the way is far from finished. Stuff will change, and I am aware of some issues.

I’ll explain how it works in the next parts of the ‘Objects in C’ series, but to give you a kickstart in the code I’ll explain it briefly:

Every object has got a pointer to its type and to its top-level object, which usually is itself. The type instance pointed to from the instance provides the size, constructor, destructor and the interface getter.
The GetInterface function in the type (the interface getter) returns a pointer to an instance of an object of the given type, if the object supports it.
Using interfaces can serve different purposes:

Exposing certain objectwide functionality

((ICloneable*)object->top->type->GetInterface(object->top->type, object->top, GetICloneableType()))->Clone(object->top);

This code returns a clone of the object, even when the object is passed via a pointer that is not the top-level one. Using the top pointer ensures proper polymorphism, and enables the topmost class to hide or show some internal functionality of wrapped objects.

Exposing interface specific functionality

((String*)object->type->GetInterface(object->type, object, GetIToStringAble()))->ToString(object);

Which behaviour results now depends on which interface of a certain object is passed.

And more.. and more.. which I will explain in the coming articles.

Download the source so far

‘Objects’ in C – part 2, ‘type’ing stuff

In my previous article I talked about the basics of ‘object-like’ programming in C. Actually those were more like examples. To continue, I’ll discuss what is behind it in more detail, but first I’ll explain the basics of structs. If you know them already, just skip that part.

Structs in C (and in the memory)
An example structure in C:

struct example{int a; int b;};

A structure is nothing more than a way of giving some parts of a fixed-size piece of memory a specific name and a recommended type.

When one creates a new instance of the example structure it requires 8 bytes of memory (assuming 4-byte ints): 4 for a, the first integer, and another 4 for b, the second integer:

00 a
01 a
02 a
03 a
04 b
05 b
06 b
07 b

An instance of the example struct could begin at the memory address 0x0A001000. In that case the value of a would be at the memory address 0x0A001000 too, for a is the first field in the structure. The value of b however would be located at 0x0A001004, for b is the second field and is located 4 bytes after the start of the structure. I could use that knowledge to access b in several different ways:

struct example* e = malloc(sizeof(struct example)); e->a = 1; e->b = 10;
printf("%d", e->b); // The usual way
printf("%d", *((int*)((char*)e + 4))); // Treats e as a byte pointer, adds 4 and then interprets the result as a pointer to an integer
printf("%d", *((int*)((char*)(&(e->a)) + 4))); // Basically the same way, but now via 'a', which is at the start of the struct anyway
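The same layout can be inspected from Python 3 with the standard ctypes module, which is handy for checking offsets without compiling anything. This is only an illustration: the class below mirrors the example struct, and the offsets are read from ctypes instead of assuming that an int is 4 bytes.

```python
import ctypes


class Example(ctypes.Structure):
    # Mirrors: struct example { int a; int b; };
    _fields_ = [("a", ctypes.c_int), ("b", ctypes.c_int)]


e = Example(1, 10)
base = ctypes.addressof(e)  # address of the struct, and therefore of a

# Read b by hand: start of the struct plus the offset of b.
b_via_offset = ctypes.c_int.from_address(base + Example.b.offset).value

print(Example.a.offset, Example.b.offset, ctypes.sizeof(Example), b_via_offset)
```

On a platform with 4-byte ints this prints 0 4 8 10, matching the memory table above.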

Structure inheritance

typedef struct {
	int a; 
	int b; 
	int result;
} SumHandle;
void GetResult(SumHandle* handle){
	handle->result = handle->a + handle->b;
}

The example SumHandle struct is a very simple (and useless) structure to store 2 operands, and the result which is filled by calling the GetResult function providing a pointer to the handle.

This works fine, but we may want to add more functionality; for instance, why not also compute the result of multiplying those 2 operands? We could make a new struct definition and a new GetResult function which still uses the old GetResult:

typedef struct {
	int a; 
	int b; 
	int sumResult;
	int multResult;
} SumHandle2;
void GetResult2(SumHandle2* handle){
	handle->multResult = handle->a * handle->b;
	GetResult((SumHandle*)handle);
}

As you can see, renaming the original result field has no effect, for only the offset of the field from the start of the struct is used at runtime. Therefore I could add the extra field and still use the original GetResult function. This works because the original GetResult function only uses the first 12 bytes of the pointed-to piece of memory. Adding stuff after that does not affect the original function. Note that the opposite is not possible, for changing memory outside your own allocated memory could do really nasty stuff, causing the all-too-well-known GPFs.
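The prefix trick can also be tried out from Python 3 with ctypes, which uses the same memory layout as C. The two classes mirror SumHandle and SumHandle2; the function names are Python-style stand-ins for GetResult and GetResult2.

```python
import ctypes


class SumHandle(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int), ("b", ctypes.c_int),
                ("result", ctypes.c_int)]


class SumHandle2(ctypes.Structure):
    # Same first three fields (one renamed), plus one extra at the end.
    _fields_ = [("a", ctypes.c_int), ("b", ctypes.c_int),
                ("sumResult", ctypes.c_int), ("multResult", ctypes.c_int)]


def get_result(handle_ptr):
    """Only knows SumHandle; touches the first three ints and nothing else."""
    handle = handle_ptr.contents
    handle.result = handle.a + handle.b


def get_result2(handle2):
    handle2.multResult = handle2.a * handle2.b
    # The equivalent of the (SumHandle*) cast: same memory, narrower view.
    get_result(ctypes.cast(ctypes.pointer(handle2), ctypes.POINTER(SumHandle)))


handle = SumHandle2(3, 4)
get_result2(handle)
print(handle.sumResult, handle.multResult)  # 7 12
```

get_result writes to a field it calls result, and the value shows up in sumResult, because both names refer to the same offset.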

Virtual functions
In the previous example the original behaviour was only extended. To override the previous behaviour you need virtual functions; the original struct would then look like:

typedef void (*GetResultHandler)(void* result);
typedef struct {
	int a; 
	int b; 
	int result;
	GetResultHandler GetResult;
} SumHandle;
void GetResultImpl(void* handle){
	((SumHandle*)handle)->result = ((SumHandle*)handle)->a + ((SumHandle*)handle)->b;
}

The GetResult field is a function pointer which an initialization function of SumHandle would set to the GetResultImpl function.

Overriding the behaviour is as simple as writing a new initialization function which uses another GetResult implementation, for calling resultHandleInstance->GetResult(resultHandleInstance) is nothing more than calling the function at the address stored in the GetResult field.

A type
At the moment every inheritance still needs a custom constructor and destructor function. Therefore some functionality cannot be achieved, like making a copy of a handle, for that would need both the constructor function and the actual size of the struct, neither of which is available when only a pointer is passed. A custom clone function would probably need to be provided.
If every object exposed a pointer to a type struct containing information about the struct, this problem would be eliminated:

#include <stdlib.h>
#include <stdio.h>
// Default definitions
typedef struct _Type	Type;
typedef struct _Object	Object;
typedef void (*ConstructorHandler)	(Type* type, Object* instance);
typedef void (*DestructorHandler)	(Type* type, Object* instance);
struct _Type {
	int					size;
	ConstructorHandler	Construct;
	DestructorHandler	Destruct;
};
struct _Object {
	Type*				type;
};
Object* Construct(Type* type) {
	Object* object	= malloc(type->size);
	object->type	= type;
	type->Construct(type, object);
	return object;
}
void Destruct(Object* object) {
	object->type->Destruct(object->type, object);
	free(object);
}
Type* CreateType(int size, ConstructorHandler constructor, DestructorHandler destructor) {
	Type* type		= malloc(sizeof(Type));
	type->size		= size;
	type->Construct	= constructor;
	type->Destruct	= destructor;
	return type;
}
// An example implementation
typedef struct _Example Example;
typedef void (*GenerateResultHandler)(Example* example);
struct _Example {
	Type*					type;
	int						a;
	int						b;
	int						result;
	GenerateResultHandler	GenerateResult;
};
void GenerateResultImpl(Example* example) {
	example->result = example->a + example->b;
}
void ConstructExample (Type* type, Object* instance) {
	((Example*)instance)->a					= 0;
	((Example*)instance)->b					= 0;
	((Example*)instance)->result			= 0;
	((Example*)instance)->GenerateResult	= GenerateResultImpl;
}
void DestructExample (Type* type, Object* instance) {
}
Type* CreateExampleType() {
	return CreateType(sizeof(Example), ConstructExample, DestructExample); 
}
int main(void) {
	Type* exampleType = CreateExampleType();
	Example* example = (Example*)Construct(exampleType);
	example->a = 5;
	example->b = 5;
	example->GenerateResult(example);
	printf("%d", example->result);
	Destruct((void*)example);
	getchar();
}

To add an inheritance:

#include <stdlib.h>
#include <stdio.h>
// Default definitions
typedef struct _Type	Type;
typedef struct _Object	Object;
typedef void (*ConstructorHandler)	(Type* type, Object* instance);
typedef void (*DestructorHandler)	(Type* type, Object* instance);
struct _Type {
	int					size;
	ConstructorHandler	Construct;
	DestructorHandler	Destruct;
};
struct _Object {
	Type*				type;
};
Object* Construct(Type* type) {
	Object* object	= malloc(type->size);
	object->type	= type;
	type->Construct(type, object);
	return object;
}
void Destruct(Object* object) {
	object->type->Destruct(object->type, object);
	free(object);
}
Type* CreateType(int size, ConstructorHandler constructor, DestructorHandler destructor) {
	Type* type		= malloc(sizeof(Type));
	type->size		= size;
	type->Construct	= constructor;
	type->Destruct	= destructor;
	return type;
}
// An example implementation
typedef struct _Example Example;
typedef void (*GenerateResultHandler)(Example* example);
struct _Example {
	Type*					type;
	int						a;
	int						b;
	int						result;
	GenerateResultHandler	GenerateResult;
};
void GenerateResultImpl(Example* example) {
	example->result = example->a + example->b;
}
void ConstructExample (Type* type, Object* instance) {
	((Example*)instance)->a					= 0;
	((Example*)instance)->b					= 0;
	((Example*)instance)->result			= 0;
	((Example*)instance)->GenerateResult	= GenerateResultImpl;
}
void DestructExample (Type* type, Object* instance) {
}
Type* CreateExampleType() {
	return CreateType(sizeof(Example), ConstructExample, DestructExample); 
}
// An inheritance
typedef struct _Example2 Example2;
typedef void (*GenerateResult2Handler)(Example2* example);
struct _Example2 {
	Type*					type;
	int						a;
	int						b;
	int						result;
	GenerateResult2Handler	GenerateResult;
	int						result2;
};
void GenerateResult2Impl(Example2* example) {
	example->result = example->a * example->b;
	example->result2 = example->a + example->b;
}
void ConstructExample2 (Type* type, Object* instance) {
	((Example2*)instance)->a				= 0;
	((Example2*)instance)->b				= 0;
	((Example2*)instance)->result			= 0;
	((Example2*)instance)->result2			= 0;
	((Example2*)instance)->GenerateResult	= GenerateResult2Impl;
}
void DestructExample2 (Type* type, Object* instance) {
}
Type* CreateExampleType2() {
	return CreateType(sizeof(Example2), ConstructExample2, DestructExample2); 
}
int main(void) {
	Type* exampleType = CreateExampleType();
	Type* example2Type = CreateExampleType2();
	Example* example = (Example*)Construct(exampleType);
	Example2* example2 = (Example2*)Construct(example2Type);
	example->a = 5;	example->b = 5;
	example2->a = 5;	example2->b = 5;
	example->GenerateResult(example);
	example2->GenerateResult(example2);
	printf("example:  result: %d\n", example->result);
	printf("example2: result: %d result2: %d\n", example2->result, example2->result2);
	Destruct((void*)example);
	Destruct((void*)example2);
	getchar();
}

Limitations
Although using types and casts gives a lot more freedom, this approach still has its limitations:

  • No multiple inheritance
  • No ‘new’ functions (functions that are only used when a pointer is specifically cast to a struct)
  • No static support across types
  • No type behaviour inheritance (a type still has its own base type and isn’t an object itself)

In the next part I’ll explain how to wrap old implementations, or even multiple implementations, which will overcome these limitations.

‘Objects’ in C – part 1, the basics

Memory allocation in C tends to be quite a bit faster than in C++, because C++ carries a lot of overhead due to its object-oriented nature.
Although it is not as convenient, it is possible to get ‘Object’-like behaviour in C, as in C++.

I’ll explain the basics in this first part, and continue with more complicated (and neater) stuff in the later parts. I hope you’ll find them useful.

The base
For this example we’ll use a ‘class’ called ‘Example’. First the base:

#include <stdlib.h>
typedef struct{
	int dummy;
} Example;
// Allocates a new Example; the caller releases it with DestructExample
Example* ConstructExample(){
	return malloc(sizeof(Example));
}
void DestructExample(Example* example){
	free(example);
}
int main(void){
	Example* e;
	e = ConstructExample();
	DestructExample(e);
}

Simple inheritance
Let’s add some values, some simple functions and inheritance. Using an instance of the inherited class as the base class is simply a matter of casting via void*.

#include <stdio.h>
#include <stdlib.h>
typedef struct {
	int a;
} Example;
typedef struct {
	int a;
	int b;
} Example2;
Example2* ConstructExample2() {
	Example2* example = malloc(sizeof(Example2));
	example->b = 10;
	example->a = 11;
	return example;
}
Example* ConstructExample() {
	Example* example = malloc(sizeof(Example));
	example->a = 1;
	return example;
}
void DestructExample(Example* example) {
	free(example);
}
void DestructExample2(Example2* example) {
	free(example);
}
void PrintExample(Example* example) {
	printf("%d ", example->a);
}
void PrintExample2(Example2* example) {
	printf("%d ", example->b);
}
int main(void) {
	Example* e;
	Example2* e2;
	e = ConstructExample();
	e2 = ConstructExample2();
	PrintExample((void*)e);
	PrintExample((void*)e2);
	PrintExample2((void*)e2);
	DestructExample(e);
	DestructExample2(e2);
}

When inheriting, you have to copy the original definition and only append at the bottom, changing nothing of the earlier members, otherwise you’ll get nasty errors. When you want to override things you have to use neat tricks; more on that later on.
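The ‘only append at the bottom’ rule can be checked mechanically. The following is a minimal sketch, not part of the original example: Base and Derived are hypothetical stand-ins for Example and Example2, and PrefixMatches and ReadA are helper names made up for illustration. It uses offsetof to confirm that the copied member sits at the same offset in both structs, which is what makes the cast to the base type work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Example and Example2. */
typedef struct {
	int a;
} Base;

typedef struct {
	int a;	/* copied from Base, in the same order */
	int b;	/* appended at the bottom */
} Derived;

/* Returns 1 when the copied member sits at the same offset in both structs. */
int PrefixMatches(void) {
	return offsetof(Base, a) == offsetof(Derived, a)
		&& sizeof(Derived) >= sizeof(Base);
}

/* Reads 'a' through a base pointer; only valid while the prefix matches. */
int ReadA(void* object) {
	assert(PrefixMatches());
	return ((Base*)object)->a;
}
```

Calling ReadA with a pointer to a Derived returns the value of its a member, just as PrintExample does with an Example2 above. If a member were inserted before a in Derived, PrefixMatches would return 0 and the assert would fire.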

Virtual functions
Virtual functions can be implemented with function pointers:

#include <stdio.h>
#include <stdlib.h>
typedef void (*PrintExample)(void* example);
typedef struct {
	int a;
	PrintExample Print;
} Example;
typedef struct {
	int a;
	PrintExample Print;
	int b;
} Example2;
void PrintExampleImpl(void* example) {
	printf("(printexample) %d", ((Example*)example)->a);
}
void PrintExample2Impl(void* example) {
	printf("(printexample2) %d ", ((Example2*)example)->b);
	PrintExampleImpl(example);
}
Example2* ConstructExample2() {
	Example2* example = malloc(sizeof(Example2));
	example->b = 10;
	example->a = 11;
	example->Print = PrintExample2Impl;
	return example;
}
Example* ConstructExample() {
	Example* example = malloc(sizeof(Example));
	example->a = 1;
	example->Print = PrintExampleImpl;
	return example;
}
void DestructExample(Example* example) {
	free(example);
}
void DestructExample2(Example2* example) {
	free(example);
}
int main(void) {
	Example* e;
	Example2* e2;
	e = ConstructExample();
	e2 = ConstructExample2();
	e->Print(e);
	e2->Print(e2);
	DestructExample(e);
	DestructExample2(e2);
	getchar();
}

A virtual function still has to be passed the object on which it is called. Two simple macros can make such a ‘this call’ a bit easier (for some :p):

#define THISCALLPAR(x,y,z) x->y(x,z)
#define THISCALL(x,y) x->y(x)
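As a rough usage sketch of these two macros: the Counter type and the functions below are hypothetical and only serve to show what the macros expand to.

```c
#include <assert.h>

#define THISCALLPAR(x,y,z) x->y(x,z)
#define THISCALL(x,y) x->y(x)

/* Hypothetical example type with two 'virtual' functions. */
typedef struct _Counter Counter;
struct _Counter {
	int value;
	void (*Increment)(Counter* self);
	void (*Add)(Counter* self, int amount);
};

static void IncrementImpl(Counter* self) { self->value += 1; }
static void AddImpl(Counter* self, int amount) { self->value += amount; }

/* Runs both macros on a fresh counter and returns the resulting value. */
int RunCounter(void) {
	Counter c = { 0, IncrementImpl, AddImpl };
	Counter* p = &c;
	assert(p->Increment && p->Add);
	THISCALL(p, Increment);		/* expands to p->Increment(p) */
	THISCALLPAR(p, Add, 4);		/* expands to p->Add(p,4) */
	return p->value;
}
```

Note that the macro arguments are not parenthesised, so they are best used with a plain pointer variable such as p rather than with a larger expression.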

Wrapped inheritance
Overriding some functionality while retaining the rest and adding your own is virtually impossible with one simple object, and so is multiple inheritance.
Two parts from now I will discuss these limitations and how to overcome them; the next part covers inheritance in more detail and adds ‘Types’ to the mix.

Intrepid.IO.FragmentedStream

While working on the design of a .pak file format (.pak files are nothing more than archives), I noticed that most existing .pak formats are very simple: they put all files one after another, which makes it really hard to add or delete a file.

Therefore I started thinking about a way to make an archive file that is efficient for both read and write access.

There are basically two ways to accomplish this:

  • Clustering
  • Fragmenting

Fragmenting
The archive consists of several files plus an index, which itself counts as a file. The index points to the start of every file by its offset. When a file is deleted, its record in the index is nulled; the record is ignored in the meantime and later filled with a new entry (a zero offset is never valid, since the only file that can start at position 0 is the index itself).

A file itself is fragmented. Every fragment starts with 4 bytes giving the length of the fragment, followed by 4 bytes giving the offset of the next fragment. If there is no next fragment, the next-fragment offset is 0. In the first fragment of every file, these 8 bytes are followed by some information about the file, such as its name.

Because the index itself is a file, it is also fragmented and can grow and shrink.

The 4-byte offset could be set to 8 bytes by default if files are going to be bigger than 2 GB, or to 2 bytes if you are dealing with small files. Although a smaller offset reduces the overhead for files that are edited frequently, the width cannot be changed to 8 or 2 bytes at runtime.
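The fragment layout described above can be sketched in C as follows. This is only an illustration under a few assumptions that the text does not settle: the values are taken to be little-endian, the length field is taken to count the payload bytes after the 8-byte header, and the names FragmentHeader, ReadFragmentHeader and TotalLength are made up for this sketch. The archive is read from a memory buffer to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: values are stored little-endian. */
static uint32_t ReadU32(const unsigned char* p) {
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
		| ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
	uint32_t length;	/* assumed: payload bytes following the 8-byte header */
	uint32_t next;		/* offset of the next fragment, 0 when there is none */
} FragmentHeader;

/* Reads the header at 'offset'; returns 0 if it does not fit in the archive. */
int ReadFragmentHeader(const unsigned char* archive, size_t size,
		uint32_t offset, FragmentHeader* out) {
	if (offset > size || size - offset < 8) return 0;
	out->length = ReadU32(archive + offset);
	out->next = ReadU32(archive + offset + 4);
	return 1;
}

/* Follows the chain starting at 'first' and returns the total payload
   length, or -1 when the chain is broken or loops. */
long TotalLength(const unsigned char* archive, size_t size, uint32_t first) {
	long total = 0;
	uint32_t offset = first;
	size_t steps = 0;
	FragmentHeader header;
	assert(archive != NULL);
	do {
		if (!ReadFragmentHeader(archive, size, offset, &header)) return -1;
		total += (long)header.length;
		offset = header.next;
		if (++steps > size / 8) return -1;	/* guard against cycles */
	} while (offset != 0);
	return total;
}
```

Given the offset of a file’s first fragment (as stored in the index), TotalLength walks the chain until it reaches a next offset of 0.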

Clustered
The clustered way is practically the same as the fragmented way, except that every fragment is padded to a multiple of the cluster size. The ‘next fragment’ value can therefore be a cluster number instead of an offset. This speeds up writing, because there is always some buffer space, and makes defragmenting faster. Files can also keep a 2-byte next-cluster pointer for longer. The only downside is that files will be larger.
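The arithmetic behind the clustered layout is small enough to sketch. The function names and the cluster sizes used below are assumptions chosen for illustration, not values from the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds 'length' up to the next multiple of 'clusterSize'.
   Assumes length + clusterSize does not overflow 32 bits. */
uint32_t PaddedLength(uint32_t length, uint32_t clusterSize) {
	assert(clusterSize > 0);
	return (length + clusterSize - 1) / clusterSize * clusterSize;
}

/* Converts a 2-byte cluster number into a byte offset in the archive. */
uint64_t ClusterOffset(uint16_t cluster, uint32_t clusterSize) {
	return (uint64_t)cluster * clusterSize;
}
```

With a cluster size of 4096 bytes, for example, a 2-byte cluster number can address offsets up to 65535 * 4096 bytes (roughly 256 MB), whereas a 2-byte byte offset stops at 64 KB. That is why the short pointer lasts longer in the clustered layout.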

FragmentedStream
The fragmented stream wraps these methods pretty efficiently at the moment.
I am thinking of implementing an intelligent algorithm that detects which streams are used often and places them at the end of the file with a buffer. The same could be applied to files that are marked as expanding.

Rich Client Side Framework

On several blogs the idea of rich JavaScript user interfaces has come up, for example on ZefHemel.com: Rich Web UI: Search As You Type

I guess this is due to Google, which has made a neat webmail interface for Gmail, and Google Suggest with find-as-you-type.

The demands on JavaScript keep growing. People want to make better web UIs and features with JavaScript, although JavaScript is definitely not designed for this.

Using Flash or Java is overkill, but using JavaScript is arguably even more so, because JavaScript isn’t handled consistently across browsers and isn’t as quick and maintainable as it could be.

I guess it is time to extend HTML itself with a more advanced scripting language, preferably Java-like but directly supported by the browser, and aimed less at custom drawing and more at using an API provided by the browser.

I’m currently experimenting with Microsoft .NET assemblies that are downloaded in slimmed-down form as a web page and executed with very limited access. This works neatly, although it is still overkill (a .dll is about 20 KB, even if it contains only one line of code).

Just a thought.

OO Stated Stackbased Parsing

Every parsable format (like INI or XML) consists of certain areas. When parsing an INI file, you can be parsing a section name at one moment and a comment at another. Each of these areas corresponds to a parse state. In every state you expect different input and gather different kinds of information.

When you are parsing in a certain state and find that the state has changed (for example, the parser found a new XML node in an XML file), the old state is pushed onto the state stack.

In certain circumstances you need the ability to fall back to a previous state. This can happen when you are parsing what appears to be a new section name and suddenly encounter a comment character instead of a section-name end character. In that case you need to fall back to the previous section you were parsing. When you have successfully parsed the section name, however, you want the old section state removed from the stack (and its data emitted).
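A state stack with these three operations (enter a new state, fall back, and drop the old state after success) could look roughly like this in C. It is only a sketch: the state names, the fixed maximum depth and all function names are assumptions, and emitting the gathered data is left out.

```c
#include <assert.h>

/* Hypothetical states for an INI-like format. */
typedef enum { STATE_ROOT, STATE_SECTION_NAME, STATE_COMMENT } ParseState;

#define MAX_DEPTH 16

typedef struct {
	ParseState states[MAX_DEPTH];
	int depth;
} StateStack;

void InitStack(StateStack* s, ParseState initial) {
	s->states[0] = initial;
	s->depth = 1;
}

/* The state the parser is in right now (the top of the stack). */
ParseState Current(const StateStack* s) {
	assert(s->depth > 0);
	return s->states[s->depth - 1];
}

/* Enter a new state; the old state stays below it on the stack. */
void Push(StateStack* s, ParseState next) {
	assert(s->depth < MAX_DEPTH);
	s->states[s->depth++] = next;
}

/* Fall back: discard the current state and resume the previous one. */
ParseState FallBack(StateStack* s) {
	assert(s->depth > 1);
	s->depth--;
	return Current(s);
}

/* Success: remove the old state below the current one and keep the current. */
void Commit(StateStack* s) {
	assert(s->depth > 1);
	s->states[s->depth - 2] = s->states[s->depth - 1];
	s->depth--;
}
```

A parser would call Push when it sees the start of a section name, FallBack when a comment character turns up unexpectedly, and Commit once the section-name end character has been read.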

Parsing $_SERVER['PATH_INFO']

The PHP global variable $_SERVER['PATH_INFO'] contains the path appended to the name of a PHP script. If I request the URL:

http://domain.ext/path/to/script.php/foo/bar.htm?a=b&c=d

Then $_SERVER['PATH_INFO'] would contain:

/foo/bar.htm

Traditionally, $_GET variables are used for parameters such as the page to display:

http://domain.ext/page.php?page=about.htm

This method is easy to program, but it not only looks strange, it is also very search-engine unfriendly. Most search engines ignore the query string (the part of the URL after the ?) and would therefore index the first page.php?page=x they find and ignore the rest.
Some search engines, like Google, do not ignore the query string, but give a much higher ranking to pages that serve different content without using a query string.

Parsing $_SERVER['PATH_INFO'] is relatively easy; this code does most of the work just fine:

if (!isset($_SERVER['PATH_INFO'])){
	$pathbits= array('');
}else{
	$pathbits = explode("/",  $_SERVER['PATH_INFO']);
}

If a path info was provided, the $pathbits array always has an empty string as its first element, because PATH_INFO starts with a /. Otherwise it contains just a single empty string.

Here is a simple example that parses the path info to decide which file to include:

<?php
if (!isset($_SERVER['PATH_INFO'])){
	$pathbits= array('');
}else{
	$pathbits = explode("/",  $_SERVER['PATH_INFO']);
}
if (!isset($pathbits[1]) || $pathbits[1] == ""){
	$page = "default";
}else{
	$page = basename($pathbits[1]);
}
$file = "./pages/{$page}.php";
if (!is_file($file)){
	echo "File not found";
}else{
	require $file;
}
?>