The dynamic of python

Python is dynamic (duh)

Dynamic typing
You don’t need to specify the type of an object anywhere.

def PrintSomething(something):
    print something

Everything is an object
Everything, yes, everything in Python is an object. Even an integer, a method, a class, etc.
Every object has got a type (__class__), which actually can be changed at runtime, which usually results in conflicts about the allocated size for the PyObject.

No typing needed
Python doesn’t know interfaces, for if a certain object wants to expose certain behaviour it just implements the required methods.
When you want a object to be comparible in python you don’t inherit something like IComparible like in .Net, but you just create the __cmp__ method.

Fields and methods can be changed (and are the same)
A method and a field in python are 2 the same things. They both rely in the object’s dictionary (__dict__) or of the object’s type’s dictionary (__class__.__dict__).
This allows you to use an object which implements __call__ instead of a method.
For __dict__ is writeable you can change / create attributes (members) at runtime:

>>> sys.__dict__['foo'] = "bar"
>>> sys.foo
'bar'

A class member function is a normal method, wrapped

>>> class exampleclass:
	value = ""
	def examplefunction(self):
		print self.value
>>> def examplereplacement(self):
	print "replacement!"
	print self.value
>>> instance = exampleclass()
>>> instance.examplefunction()
>>> instance.value = "foobar"
>>> instance.examplefunction()
foobar
>>> instance.examplefunction = examplereplacement
>>> instance.examplefunction()
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in -toplevel-
    instance.examplefunction()
TypeError: examplereplacement() takes exactly 1 argument (0 given)
>>> instance.__class__.examplefunction = examplereplacement
>>> instance.examplefunction()
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in -toplevel-
    instance.examplefunction()
TypeError: examplereplacement() takes exactly 1 argument (0 given)
>>> instance.__class__.__dict__['examplefunction'] = examplereplacement
>>> instance.examplefunction()
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in -toplevel-
    instance.examplefunction()
TypeError: examplereplacement() takes exactly 1 argument (0 given)
>>> instance = exampleclass()
>>> instance.__class__.__dict__['examplefunction'] = examplereplacement
>>> instance.examplefunction()
replacement!

The bound methods seem to work a bit more tricky than they appear to work. Usually editing member functions via the __class__‘s __dict__ will result in a proper replacement.

These were just a few examples of what Python has to offer. These features aren’t limited to Python. Iron Python, an implementation of Python in .Net is capable of letting usual .Net objects to be manipulated in similar ways with Python via Iron Python. Although this doesn’t actually change the .Net objects, changing the wrapper resulting in the required similar behaviour is good enough.

The author of Iron Python has been recruited by Microsoft and is now working on making the CLR more dynamic.

I’d love a static dynamic language.

Base64 encoding/decoding algorithm

I’ve made some python functions to encode/decode base64. I’ve been trying to develop my own algorithm for base64 for the email protection script which can be found here.

Python again has proved itself again to be a great language for quickly developing stuff.

def tobase64(s, padd = False):
    b64s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    b64p = "="
    ret = ""
    left = 0
    for i in range(0, len(s)):
        if left == 0:
            ret += b64s[ord(s[i]) >> 2]
            left = 2
        else:
            if left == 6:
                ret += b64s[ord(s[i - 1]) & 63]
                ret += b64s[ord(s[i]) >> 2]
                left = 2
            else:
                index1 = ord(s[i - 1]) & (2 ** left - 1)
                index2 = ord(s[i]) >> (left + 2)
                index = (index1 << (6 - left)) | index2
                ret += b64s[index]
                left += 2
    if left != 0:
        ret += b64s[(ord(s[len(s) - 1]) & (2 ** left - 1)) << (6 - left)]
    if(padd):
        for i in range(0, (4 - len(ret) % 4) % 4):
            ret += b64p
    return ret
def frombase64(s):
    b64s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    b64p = "="
    ret = ""
    s2 = s.replace(b64p, "")
    left = 0
    for i in range(0, len(s2)):
        if left == 0:
            left = 6
        else:
            value1 = b64s.index(s2[i - 1]) & (2 ** left - 1)
            value2 = b64s.index(s2[i]) >> (left - 2)
            value = (value1 << (8 - left)) | value2
            ret += chr(value)
            left -= 2
    return ret

The algorithm doesn’t automaticly add the required =‘s while encoding, nor does it require while deencoding.

Inactivity

For some undisclosed reasons I won’t be able to access the internet as often as I would like to in the next few weeks. Therefore my activity on the internet will drop, so will the amount of posts on this blog propably. I will try to find some time to post at least 3 articles per week.

If you want to reach me, try e-mail instead of MSN:

Update on the anti-email-harvester mailto links

In the previous post I described a simple though effective method to get rid of the constantly cleverer spam email harvester bots.

I’ve made a little update on the algorithm, it now uses only 1 number for each character and uses a cascading incremental xor transform.

Python code for the algorithm itself:

def alphaicx(s):
    ret = ""
    cascvalue = 0
    for i in range(0, len(s)):
        ret = ret + chr(ord(s[i]) ^ cascvalue)
        cascvalue = (ord(ret[i]) + 1) % 255 
    return ret
def betaicx(s):
    ret = ""
    cascvalue = 0
    for i in range(0, len(s)):
        ret = ret + chr(ord(s[i]) ^ cascvalue)
        cascvalue = ((ord(ret[i]) ^ cascvalue) + 1) % 255
    return ret

I designed the algorithm in Python. Python is great for that kind of stuff.

As you can see there are 2 functions, when you encode something with alphaicx you can decode it with betaicx, and visa versa. betaicx creates tougher code though. This encryption is pretty lousy, but hard enough to stop spam bots.

I’ve ported betaicx to PHP, and alphaicx to Javascript. The running example (very usefull though) has been updated.

The PHP/Javascript code for the function:

function JSBotProtect($text){
	$cxred = "0";
	$cascval = 0;
	for($i = 0; $i < strlen($text); $i++){
		$value = (ord($text[$i]) ^ $cascval);
		$cxred .= "," . $value;
		$cascval = (($value ^ $cascval) + 1) % 255;
	}
	return <<<EOF
<script type="text/javascript">var cxred=String.fromCharCode({$cxred});
var uncxred=""; var cascval=0;for(i=1;i<cxred .length; i++)
{uncxred+=String.fromCharCode(cxred.charCodeAt(i)^cascval);
cascval=((uncxred.charCodeAt(i-1))+1)%255;}document.write(uncxred);</script>
EOF;
}

I’ll more compact uncxred storage. Probable just normal hex, or when I can get it working BASE64.

Protecting your email address against spam bots

Spam bots get smarter these days in harvesting email addresses. They usualy use a regex which searches for ‘.. dot .. ltd’, which isn’t that resource intensive. When that is done a more advanced regex is put in there to get the email adress somehow removing stuff like ‘spam’.

Using normal javascript encoding doesn’t work anymore, for it isn’t that hard for a spider to regognize encoded strings and decode them, whether this is in javascript code or normal html escapes.

Therefore we need to get more inventive:

function JSBotProtect($text){
	$xorred = "0";
	$layer = "0";
	for($i = 0; $i < strlen($text); $i++){
		$layerbit = mt_rand(0, 255);
		$xorred .= "," . (string)(ord($text[$i]) ^ $layerbit);
		$layer .= "," . (string)$layerbit;
	}
	return <<<EOF
	<script type="text/javascript">
		var xorred = String.fromCharCode({$xorred});
		var layer = String.fromCharCode({$layer});
		var unxorred = "";
		for(i = 1; i < xorred.length; i++){
			unxorred += String.fromCharCode(
				xorred.charCodeAt(i)^layer.charCodeAt(i));
		}
		document.write(unxorred);
	</script>
EOF;
}

This PHP function returns a javascript block of code which stores the sensitive string like an email address in 2 parts, which when xorred with eachother result in the original email address.

An implementation to get a mailto: link

Developing plugin-based applications

When you want to write an application, whether this is a website, or a database manager which is based on plugin provided functionality you should remember two things when programming:

  • Can I change this behaviour easily with another plugin? You want to be able to edit a menu item for instance.
  • Do I want to be able to change this with another plugin? You don’t want a plugin to edit a menu item if it is meant to only be a simple format interpreter.

Basicly there are 2 big groups of plugin based applications, the static and dynamic. Where the static plugins are the easiest to program and most commonly used. An example would be a plugin which would load and save an image from and to a ‘.jpg’. These plugins usualy contain one function (gateway) which exposes a few plugins which usualy expose a static interface.

On the other side there are the over-dynamic plugins, which rather should be called ‘modules’. These applications would use a high level slow ‘message’ system where basicly everything is able to be intercepted, hooked up to, responded to…

Deciding what and how you want base behaviour to be able to be extended, edited or removed by plugins is the hardest part of developing a plugin based application. If everything should be able to be changed you get stuck with complicated code, and a lot of overhead due to extra ‘hookable’ calls. If almost nothing is overridable you problably get stuck with worthless plugins.

While developing a plugin extendable application it is very helpfull to put the base code in several plugins themselves, which prevents a lot of mistakes.

I’ve found python and c# to be very capable languages to write plugins, although C# is a bit more trouble than Python, Python allows you to do too much and just call functions without knowing for sure with what object you are handling.

I’ve been working on a modular server (see the modular server article I’ve posted), programmed in C#.

The base of the server is nothing more than a framework for loading / unloading libraries & plugins, managing configuration files, and hosting ‘server’ plugin instances. Every server plugin itself can load plugins (like for instance a custom authentication handler for a HTTP server). I’ve noticed that having a very dynamic configuration storage is very usefull for every instance of a plugin requires its own configurations and usualy its own plugins to be loaded. I’m currently using multiple tree based files which are merged at runtime, which I’ll replace with a concept format called XTL about which I’ll tell more lateron.

RGB to Hex (and why the python interactive mode is so damned handy)

Update 2010-02-01 Thanks to Sameer for pointing out the short way of doing this: return “#%02X%02X%02X” % (r,g,b)

def tohex(r,g,b):
	hexchars = "0123456789ABCDEF"
	return "#" + hexchars[r / 16] + hexchars[r % 16] + hexchars[g / 16] + hexchars[g % 16] + hexchars[b / 16] + hexchars[b % 16]

Python is very convenient when you need to program a simple algorithm. I programmed this RGB to Hex converter in less than 1 minute, including using the function for a few RGB values I needed to convert.

Usualy I just grab a pen and a paper and do the calculations myself for getting either the calculator of windows itself to show up and actually calculate stuff properly (looking to screen, writing down, forgetting to press C, starting again…), or writing a program in a language like C# would just take too much time.

I’ve been using Idle for quite some time as a very good replacement for both my calculator and my pen and paper.

Markup? Nah, wysiwyg!

When you post something on a forum, on a guestbook, on anything on the web which supports some kind of formatting of your code it works with some sort of BB-Code.
BB-Code is just like HTML formatting, with ‘[]’s instead of ‘<>‘s, and a lot less features. It is hard to type and doesn’t look neat.

But who cares? Everyone uses BB-Code, almost everyone knows BB-Code, and most people don’t find it hard to type for people are just too used to it!

There are alternatives to BB-Code, like Textile and Markdown, which use a more convenient syntax.
Personally I don’t like using them for I ain’t used to the syntax.

However.. why would we want to write formating anyway? Why not just use wysiwyg, and I do not mean a java applet but rather a standard for browsers; a new tag: “<input type="formatted" name="example" />“, which would act for the server as a normal input field returning the formatted text in Html.

The problem is that it would be very hard to get every browser to support such a new tag. Most browsers I guess would be very willing to comply. But browsers like for instance Internet Explorer wouldn’t. They don’t even comply with the simplest of CSS at the moment which gives website developers a headache.

Intrepid C – ‘Objects’ in C

I’m working on a library which can be used to achieve most OOP functionality of OO languages in C, by providing an API (not by extending the syntax).

I’ve written 2 articles (artc. 1, artc. 2) so far about getting OO like behaviour in C, and will write a few more about the more advanced stuff which I’m still experimenting with.

You can download the source of the API I have developed so far here, which by the way is far from finished. Stuff will change, and I am aware of some issues.

I’ll explain how it works in the next parts in the ‘Objects in C’ serie, but to give you a kickstart in the code I’ll explain it briefly:

Every object has got a pointer to its type and its toplevel object, which usualy is itself. The type instance pointed to from the instance provides the size, constructor, destructor and the interfacegetter.
The GetInterface function in the type (the interfacegetter) returns a pointer to an instance of an object of the given type if supported by the object.
Using interfaces can serve different purposes:

Exposing certain objectwide functionality

((ICloneable*)object->top->type->GetInterface(object->top->type, object->top, GetICloneableType()))->Clone(object->top);

This code will return a Clone of the object, even if only the object is passed by a not top level pointer. Using the top pointer ensures proper polyforism, and enables the ability for the top most class to hide, or show some internal functionality of wrapped objects.

Exposing interface specific functionality

((String*)object->type->GetInterface(object->type, object, GetIToStringAble()))->ToString(object);

It now depends on which interface of a certain object is passed which behaviour results.

And more.. and more.. which I will explain in the coming articles.

Download the source so far

Some more on my blog

I’m keeping this blog mainly to channel some thoughts about computer science (but probably about physics and other subjects which I find interesting too). I hope people will find the stuff I’ll write about interesting, and hopefully usefull. Most stuff I’ll post will as far as I know be quite new, or a new approach to an existing issue.

It seems that so far 3 people have linked to an article on this blog (all of them to the Objects in C article) already. (Thanks! without readers a webblog is quite useless!)

‘Objects’ in C – part 2, ‘type’ing stuff

In my previous article I’ve talked about the basics of using ‘object like’ programming in C. Actually they rather were more like examples. To continue further I’ll discuss the stuff behind it more detailed, but first I’ll explain the basics of structs, if you know them already, just skip that part of this part.

Structs in C (and in the memory)
An example structure in C:

struct example{int a; int b;};

A structure is nothing more than a way of giving some parts in a fixed size amount of memory a specific name, and a recommended type.

When one would create a new instance of the example structure it would require 8 bytes of space in the memory, 4 for a, the first integer and again 4 bytes for b, the second integer:

00 a
01 a
02 a
03 a
04 a
05 b
06 b
07 b
08 b

An instance of the example struct could begin at the memory address0x0A001000. In that case the value of a would be at the memory adress 0x0A001000 too, for a is the first field in the structure. The value of b however would be located at 0x0A001004 for b is the second field and is located 4 bytes after the start of the structure. I could use that knowladge to access b in several different ways:

example* e = malloc(sizeof(example)); e->a = 1; e->b = 10;
printf("%d", e->b); // The usual way
printf("%d", *((int*)((int)e + 4)); // Converts e to an integer, adds 4 and then interprets the result as a pointer to an integer
printf("%d", *((int*)((int)(&(e->a)) + 4)); // Basicly the same way, but now via 'a' which is at the offset of the struct anyway

Structure inheritance

typedef struct {
	int a; 
	int b; 
	int result;
} SumHandle;
void GetResult(SumHandle* handle){
	handle->result = handle->a + handle->b;
}

The example SumHandle struct is a very simple (and useless) structure to store 2 operands, and the result which is filled by calling the GetResult function providing a pointer to the handle.

This works fine, but we could want to add more functionality, for instance why not want to know the result of multiplying those 2 operands? We could make a new struct defenition and a new GetResult function which still uses the old GetResult:

typedef struct {
	int a; 
	int b; 
	int sumResult;
	int multResult;
} SumHandle2;
void GetResult2(SumHandle2* handle){
	handle->multResult = handle->a * handle->b;
	GetResult((SumHandle*)handle);
}

As you can see changing the name of the originaly named result field hasn’t got any effect for only the offset from the offset from the struct offset is used at runtime, therefore I could add the extra field and still use the original GetResult function. This can be done for the original GetResult function only uses the first 12 bytes of the pointed to piece of memory. Adding stuff after that does not effect the original function. Note that the oposite is not possible for changing memory outside your own allocated memory could do really nasty stuff causing the all-too-known GPF‘s.

Virtual functions
In the previous example only an extension has been made to the original behaviour. To overwrite the previous behaviour you need to use virtual functions to overwrite the previous call, the original struct would look like:

typedef void (*GetResultHandler)(void* result);
typedef struct {
	int a; 
	int b; 
	int result;
	GetResultHandler GetResult;
} SumHandle;
void GetResultImpl(void* handle){
	((SumHandle*)handle)->result = ((SumHandle*)handle)->a + ((SumHandle*)handle)->b;
}

The GetResult field is a function pointer which would be set in an initialization function of SumHandle to the GetResultImpl method.

Overwriting the behaviour would be as simple as writing a new initialization function used which would use another GetResult implementation. This for calling resultHandleInstance->GetResult(resultHandleInstance) is nothing more than calling the function at the address specified in the GetResult field.

A type
Every inheritance at the moment still needs a custom constructor and destructor function. Therefore some functionality could not be ashieved like making a copy of a handle for that would need both the constructor function and the actual size of the struct, which both aren’t available when only a pointer is passed. Possibly a custom clone function would need to be provided.
If every object would expose a pointer to a type struct containing information about the struct this problem would be eliminated:

#include <malloc.h>
#include <stdio.h>
// Default defenitions
typedef struct _Type	Type;
typedef struct _Object	Object;
typedef void (*ConstructorHandler)	(Type* type, Object* instance);
typedef void (*DestructorHandler)	(Type* type, Object* instance);
struct _Type {
	int					size;
	ConstructorHandler	Construct;
	DestructorHandler	Destruct;
};
struct _Object {
	Type*				type;
};
Object* Construct(Type* type) {
	Object* object	= malloc(type->size);
	object->type	= type;
	type->Construct(type, object);
	return object;
}
void Destruct(Object* object) {
	object->type->Destruct(object->type, object);
	free(object);
}
Type* CreateType(int size, ConstructorHandler constructor, DestructorHandler destructor) {
	Type* type		= malloc(sizeof(Type));
	type->size		= size;
	type->Construct	= constructor;
	type->Destruct	= destructor;
	return type;
}
// An example implementation
typedef struct _Example Example;
typedef void (*GenerateResultHandler)(Example* example);
struct _Example {
	Type*					type;
	int						a;
	int						b;
	int						result;
	GenerateResultHandler	GenerateResult;
};
void GenerateResultImpl(Example* example) {
	example->result = example->a + example->b;
}
void ConstructExample (Type* type, Object* instance) {
	((Example*)instance)->a					= 0;
	((Example*)instance)->b					= 0;
	((Example*)instance)->result			= 0;
	((Example*)instance)->GenerateResult	= GenerateResultImpl;
}
void DestructExample (Type* type, Object* instance) {
}
Type* CreateExampleType() {
	return CreateType(sizeof(Example), ConstructExample, DestructExample); 
}
void main() {
	Type* exampleType = CreateExampleType();
	Example* example = (Example*)Construct(exampleType);
	example->a = 5;
	example->b = 5;
	example->GenerateResult(example);
	printf("%d", example->result);
	Destruct((void*)example);
	getchar();
}

To add an inheritance:

#include <malloc.h>
#include <stdio.h>
// Default defenitions
typedef struct _Type	Type;
typedef struct _Object	Object;
typedef void (*ConstructorHandler)	(Type* type, Object* instance);
typedef void (*DestructorHandler)	(Type* type, Object* instance);
struct _Type {
	int					size;
	ConstructorHandler	Construct;
	DestructorHandler	Destruct;
};
struct _Object {
	Type*				type;
};
Object* Construct(Type* type) {
	Object* object	= malloc(type->size);
	object->type	= type;
	type->Construct(type, object);
	return object;
}
void Destruct(Object* object) {
	object->type->Destruct(object->type, object);
	free(object);
}
Type* CreateType(int size, ConstructorHandler constructor, DestructorHandler destructor) {
	Type* type		= malloc(sizeof(Type));
	type->size		= size;
	type->Construct	= constructor;
	type->Destruct	= destructor;
	return type;
}
// An example implementation
typedef struct _Example Example;
typedef void (*GenerateResultHandler)(Example* example);
struct _Example {
	Type*					type;
	int						a;
	int						b;
	int						result;
	GenerateResultHandler	GenerateResult;
};
void GenerateResultImpl(Example* example) {
	example->result = example->a + example->b;
}
void ConstructExample (Type* type, Object* instance) {
	((Example*)instance)->a					= 0;
	((Example*)instance)->b					= 0;
	((Example*)instance)->result			= 0;
	((Example*)instance)->GenerateResult	= GenerateResultImpl;
}
void DestructExample (Type* type, Object* instance) {
}
Type* CreateExampleType() {
	return CreateType(sizeof(Example), ConstructExample, DestructExample); 
}
// An inheritance
typedef struct _Example2 Example2;
typedef void (*GenerateResult2Handler)(Example2* example);
struct _Example2 {
	Type*					type;
	int						a;
	int						b;
	int						result;
	GenerateResult2Handler	GenerateResult;
	int						result2;
};
void GenerateResult2Impl(Example2* example) {
	example->result = example->a * example->b;
	example->result2 = example->a + example->b;
}
void ConstructExample2 (Type* type, Object* instance) {
	((Example2*)instance)->a				= 0;
	((Example2*)instance)->b				= 0;
	((Example2*)instance)->result			= 0;
	((Example2*)instance)->result2			= 0;
	((Example2*)instance)->GenerateResult	= GenerateResult2Impl;
}
void DestructExample2 (Type* type, Object* instance) {
}
Type* CreateExampleType2() {
	return CreateType(sizeof(Example2), ConstructExample2, DestructExample2); 
}
void main() {
	Type* exampleType = CreateExampleType();
	Type* example2Type = CreateExampleType2();
	Example* example = (Example*)Construct(exampleType);
	Example2* example2 = (Example2*)Construct(example2Type);
	example->a = 5;	example->b = 5;
	example2->a = 5;	example2->b = 5;
	example->GenerateResult(example);
	example2->GenerateResult(example2);
	printf("example:  result: %dn", example->result);
	printf("example2: result: %d result2: %dn", example2->result, example2->result2);
	Destruct((void*)example);
	Destruct((void*)example2);
	getchar();
}

Limitations
Although using types, and casts creates a lot more freedom, it still has its limitations:

  • No multi inheritance possible
  • No ‘new’ functions (functions that are only used when a pointer is specificly cast to a struct)
  • No static support throughout types
  • No type behaviour inheritance (a type still has its own basetype and isn’t an object itself)

I’ll explain some stuff about wrapping old implementations, or even multiple implementations in the next part, which will overcome these limitations.

‘Objects’ in C – part 1, the basics

C is on memory allocation quite faster than C++ for C++ contains a lot of overhead due to its object orientated nature.
Although not as convenient in C it is possible to get ‘Object’ like stuff in C as in C++.

I’ll explain the basics in this first part, and continue with more complicated (and neater stuff) in the later parts. I hope you’ll find them usefull.

The base
For this example we’ll use a ‘class’ called ‘example’, first the base:

typedef struct{
	int dummy;
} Example;
Example* ConstructExample(){
	return malloc(sizeof(Example));
}
void DestructExample(Example* example){
	free(example);
}
void main(){
	Example* e;
	e = ConstructExample();
	DestructExample(e);
}

Simple inheritance
Lets add some values, and some simple functions and inheritance. Using the inherited class as the base class is just a simple matter of casting via void*.

typedef struct {
	int a;
} Example;
typedef struct {
	int a;
	int b;
} Example2;
Example2* ConstructExample2() {
	Example2* example = malloc(sizeof(Example2));
	example->b = 10;
	example->a = 11;
	return example;
}
Example* ConstructExample() {
	Example* example = malloc(sizeof(Example));
	example->a = 1;
	return example;
}
void DestructExample(Example* example) {
	free(example);
}
void DestructExample2(Example2* example) {
	free(example);
}
void PrintExample(Example* example) {
	printf("%d ", example->a);
}
void PrintExample2(Example2* example) {
	printf("%d ", example->b);
}
void main() {
	Example* e;
	Example2* e2;
	e = ConstructExample();
	e2 = ConstructExample2();
	PrintExample((void*)e);
	PrintExample((void*)e2);
	PrintExample2((void*)e2);
	DestructExample(e);
	DestructExample2(e2);
}

When inheriting you have to copy the original defenition and only append at the bottom, changing nothing of the previous stuff for otherwise you’ll get nasty errors. When you want to override stuff you got to use neat tricks, more on that lateron.

Virtual functions
Using function pointers, virtual functions can be used:

typedef void (*PrintExample)(void* example);
typedef struct {
	int a;
	PrintExample Print;
} Example;
typedef struct {
	int a;
	PrintExample Print;
	int b;
} Example2;
void PrintExampleImpl(void* example) {
	printf("(printexample) %d", ((Example*)example)->a);
}
void PrintExample2Impl(void* example) {
	printf("(printexample2) %d ", ((Example2*)example)->b);
	PrintExampleImpl(example);
}
Example2* ConstructExample2() {
	Example2* example = malloc(sizeof(Example2));
	example->b = 10;
	example->a = 11;
	example->Print = PrintExample2Impl;
	return example;
}
Example* ConstructExample() {
	Example* example = malloc(sizeof(Example));
	example->a = 1;
	example->Print = PrintExampleImpl;
	return example;
}
void DestructExample(Example* example) {
	free(example);
}
void DestructExample2(Example2* example) {
	free(example);
}
void main() {
	Example* e;
	Example2* e2;
	e = ConstructExample();
	e2 = ConstructExample2();
	e->Print(e);
	e2->Print(e2);
	DestructExample(e);
	DestructExample2(e2);
	getchar();
}

A virtual function still has to be supplied with the function in which it has been called. 2 simple macro can be made to make a this call a bit easier, (for some :p):

#define THISCALLPAR(x,y,z) x->y(x,z)
#define THISCALL(x,y) x->y(x)

Wrapped inheritance
To override some functionality and retain other functionality while adding your own functionality is virtually impossible by using one simple object. Multi-inheritance would be virtually impossible.
2 parts ahead I will talk about these limitations and how to overcome them, the next part discusses inheritance in more detail, and adding ‘Types’ in the mix.

Intrepid.IO.FragmentedStream

While working on a design of a .pak files, which are nothing more then archives, I noticed most existing .pak files are just very simple. This for they put all files after eachother which makes it really hard to even just add a file, or delete one.

Therefor I started thinking about a way to make an efficient archive file for read and write access.

There are basicly 2 ways to accomplish this:

  • Clustering
  • Fragmenting

Fragmenting.
The archive consists out of several files, and an index which is counted as a file. The start of every file is pointed to from the index by its offset. When a file is deleted, its record in the index is null-ed, which is lateron filled with a new entry and in the meantime ignored (for the only file that can start at position 0 is the index itself).

A file itself is fragmented, every fragment starts with 4 bytes marking the length of the fragment and after that again 4 bytes marking the offset of the next fragment this time. In case there is no next fragment present the next-cluster-offset is 0. The first fragment of every file contains after the 8 bytes marking the length and offset of the next fragment some information about the file like the name.

For the index itself is a file, it also is fragmented and can grow, and shrink.

The 4 byte offset value could be default set to 8 bytes in case files are going to be bigger than 2 Gig, and the oposite in case you are dealing with small files. Although this can decrease the size of files which are frequently edited it isn’t possible to increase or decrease to 8 or 2 bytes in runtime.

Clustered
The clustered way is practicly the same as the fragmented way except for that every fragment is padded to a multiple of the cluster size. Therefore the ‘next fragment’ value can now be a cluster number instead of an offset. This will both increase the speed of writing due to having always a buffer, also defragmenting will be faster. The files itself will also be able to retain a 2 byte next-cluster pointer longer. The only downside is that files will just be larger.

FragmentedStream
The fragmented stream wraps these methods pretty efficiently at the moment.
I am thinking to implement an inteligent algorithm which checks which streams are often used which are then placed at the end of the file with a buffer. This also could be applied for files which are marked to be expanding.

Rich Client Side Framework

On several blogs the idea of having a rich java script passed, for example on ZefHemel.com: Rich Web UI: Search As You Type

Guess due to google, which has made a neat Webmail interface for gmail and Google suggest with find as you type.

The demands on java script keeps growing. People want to make better webUI’s and features with Javascript although javascript is defenitely not designed for this stuff.

Using flash, and java is an overkill, but using javascript is espacially an overkill for javascript isnt handled consistantly on different browsers, and isn’t as quick and maintainable as it could be.

I guess it would be time to extend HTML itself with a more advanced script; java like preferably although then directly supported by the browser, and less aimed at custom drawing but using an API provided by the browser.

I’m currently experimenting with Microsoft .net assemblies which are downloaded in slimmed form as webpage which are executed with very limited access. Which works neat although it is still an overkill (a .dll is about 20 Kb, even if you got only one line of code..)

Just a thought.

OO Stated Stackbased Parsing

Every parsable format (like INI, XML) consists out of certain area’s. You can be parsing the section name at one moment, or be parsing a comment when parsing an INI. These certain area’s where you can parse result in a parse state. In every state you expect something else, and you gather other kinds of information.

When you are parsing in a certain state you can find that the state has changed (the parser found a new xml node in a xml file), then the old state is pushed on the state stack.

In certain circumstances you need to have the ability to fall back to a previous state, this can happen when you are parsing a apparently new section name and suddenly there is a comment character instead of a sectionname end character. In this case you need to be able to fall back on the previous section you were parsing. Although when you successfully have parsed the sectionname you want the old sectionstate removed from the stack (and the data of it emited).
Continue reading OO Stated Stackbased Parsing

Parsing $_SERVER[‘PATH_INFO’]

The PHP global variable $_SERVER['PATH_INFO'] contains the path suffixed to a PHP script, if I would call the URL:

http://domain.ext/path/to/script.php/foo/bar.htm?a=b&c=d

Then $_SERVER['PATH_INFO'] would contain:

/foo/bar.htm

Traditionaly the $_GET variables are used for certain parameters like a page to display:

http://domain.ext/page.php?page=about.htm

This method is easy to program, but not only looks strange, but also is very search engine unfriendly. Most searchengines ignore the QueryString (the part of the URL after the ?). And therefor would index the first page.php?page=x they would find and ignore the rest.
Some searchengines like Google do not ignore the query string, but would give a page without using a querystring for different content a way higher ranking.

Parsing the $_SERVER['PATH_INFO'] is relatively easy, this code would do most of the stuff just fine:

if (!isset($_SERVER['PATH_INFO'])){
	$pathbits= array('');
}else{
	$pathbits = explode("/",  $_SERVER['PATH_INFO']);
}

The $pathbits array would always contain / as first element if a path info was provided, otherwise it will be an empty array.

Here is a quite simple example which parses the path info to decide which file to include:

<?php
if (!isset($_SERVER['PATH_INFO'])){
	$pathbits= array('');
}else{
	$pathbits = explode("/",  $_SERVER['PATH_INFO']);
}
if (!isset($pathbits[1]) || $pathbits[1] == ""){
	$page = "default"
}else{
	$page = basename($pathbits[1]);
}
$file = "./pages/{$page}.php";
if (!is_file($file)){
	echo "File not found";
}else{
	require $file;
}
?>

Modular Server

(Understanding URIs)

“A common mistake, responsible for many HTTP implementations problems, is to think this is equivalent to a filename within a computer system. This is wrong. URIs have, conceptually, nothing to do with a file system. One should remember that at all times when dealing with the World Wide Web.”

So why do most HTTP servers still heavily rely on the filesystem dictating URL’s?

This not only tends to create Uncool URI’s, but also makes it seem logical for filebased dynamic content (more on that in my previous post).

A http server, actually every server should be nothing more then a wrapper for modules which handle requests of clients.

The server would only limit itself to a very selected amount of functions

  • Handling connections
  • Exposing an API for the protocol which the modules can us
  • Hosting modules

On startup the server loads modules and binds them to certain URL. Modules remain persistant in the memory and are just signalled a request is made passing the module an API to handle the request.

I’ll be busy exploring the posibilities to implement this.. there will be more about this.

Why server side scripts aren’t that scalable at all

There are a lot of different types of server scripts, a few examples:

Practicly all of these were to be used via CGI, but most of them created tighter intergrations for popular webservers like for the Apache Http Server. Usualy the tight implementations for apache are in the form of modules like mod_python to use the general purpose python scripting language as a server side scripts.

When you, as client, request a page powered by a server side script via CGI (like for instance this blog at the time of writing), the webserver starts the responsible interpreter providing it with some arguments like the POST, GET variables, which then are processed by the PHP interpreter executing the php script, which provides a stream to return to the client.

This works pretty well with little, not too demanding, scripts.
However, problems arise when a lot of people use the scripts or when the script itself is quite demanding.

An example could be this blog, this blog uses a mySQL database to store its posts and comments. Every time the index page is requested it makes a new connection to the mySQL database server and send a query for the categories, links, latest posts, latest comments, etc. Creating a mySQL connection takes time, sending queries takes time, processing queries takes time for the mySQL server, retreiving the results takes time, processing the results take time, this all just to produce the same content for the index page over and over again, for there are at least 100 times more visits than updates to this blog.

Some blogs build there content. When you are viewing posts on these webblogs these posts are not generated for your request but are cached (usualy as normal .html files on the webserver). The control panel to post new blog items is however written in a server side script, and utilizes some sort of database. When you are finished creating new posts the php script will rebuild the cached pages from the database which will significantly reduce the server stress.

The downside of these types of blogs is that they contain only a very limited amount of dynamic features, for that would require scripts. Another downside of these blogs is that it is very hard to get such a blog hosted by multiple servers at the same time which usualy happens when a site is very popular and one server can not handle the demand. This is possible for the database powered blog for it stores its data on one centralized datbase server. However, a database powered blog will most likely require more than one server quite soon for it is a far greater strain on the server then a caching weblog. However, it is possible to be done by setting up the blog in such a manner that it stores its cached files on a centralized server too (by normal filesharing). However, I would be suprised when a server using cached pages will ever reach the limit of its server capacities.

The problem gets bigger when you are dealing with more dynamic server side software like forums. A forum requires some queries. Like a query which retreives the posts, it has no use to cache them for visitors of forums tend to post (yeah.. I know, it’s strange), which would require caches to be updated (which would be an even bigger strain then just using the database anyway).

A forum however still has got a lot of stuff that is quite static and would run a lot faster if it could be cached. This would be stuff like the templates, the categories architecture, the help pages, the statistics like user count, and the sessions, which are very dynamic but are queried by every page view. These things usualy are requested in every time you download a page.

Some forums use a cache which consists out of a table in the database which contains all cached stuff serialized so it can be immediately used. But this still means about 10 Kb transfered from the databaseevery time someone views your page!

This problem even grows bigger when you are developing even more demanding server side projects like a browser based online game.

I started developing an online game as just a hobby project in PHP, but I soon switched to making my own webserver in C# which caches all the stuff in the memory of the webserver which makes the rewritten stuff 3 times faster.

I figured it would be great if http servers and server side would be less aimed to just handling one request but create more support for inter-request caching. There is limited support for caching in JSP and ASP.net but this is used quite rarely for JSP and ASP.net still focus on just requesting. A server side script should not be loaded on request but rather be loaded already in the form of being able to cache objects like a provider of mySQL connections, which just recycles a connection; a function class which contains all the common used functions already loaded; and off course stuff like templates, sessions, and other cachable things.

The problem about caching in the memory is that memory can’t be shared between multiple servers, so it isn’t realy scalable. When you would multiple servers and store sessions in the memory it is quite possible that members would just get a “session not found” error when clicking a page which is served by the other server. A possible solution to this problem could be to redirect people when they access “domain.ext”, to a mirror (“s12.domain.ext”). This would avoid session loss. When adding a ‘cacheversion’ value in a table on the shared database server which is changed everytime something is changed for which a cache is created. Requesting just this very small number would be enough to check whether the cache should be rebuild for another server has changed something.

Just a thought…