Python HTML Document Abstraction

Python is great!

>>> d = document()
>>> d.html.body.h1.value = "My Site!"
>>> d.html.body.p.value = "Welcome to this python generated site"
>>> str(d)
'<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML
 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<body><p>Welcome to this python generated site</p>
<h1>My Site!</h1></body><head><title>
</title></head></html>'

(Ignore the added slashes and the additional line breaks caused by wordpress)

By overloading the __getattr__, __setattr__ and __delattr__ methods, an HTML document can be represented as a real Python object model.

I’ve been experimenting a little with Python code, with the ultimate goal of writing a framework for nice dynamic web-based Python applications.

Although it appears that the names of the objects (html, body, p, etc.) are the tag names, they aren’t. They are the identifiers of the tags. When you haven’t set a tag yourself and it is created on access because it didn’t exist yet, its identifier is used as its tag name.
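A minimal sketch of how such attribute overloading could work is given below. It is not the code that produced the output above: the class names Tag and Document, the _children dictionary, and the absence of a doctype and of any escaping are all assumptions made for illustration.

```python
class Tag:
    """A node that creates missing children on first attribute access."""

    def __init__(self, name):
        # Bypass our own __setattr__ hook for the internal fields.
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_children", {})
        object.__setattr__(self, "value", "")

    def __getattr__(self, ident):
        # Only reached when normal lookup fails, i.e. for child identifiers.
        if ident.startswith("_"):
            raise AttributeError(ident)
        if ident not in self._children:
            # No tag was set under this identifier: use it as the tag name.
            self._children[ident] = Tag(ident)
        return self._children[ident]

    def __setattr__(self, ident, node):
        if ident == "value":
            object.__setattr__(self, ident, node)
        else:
            self._children[ident] = node

    def __delattr__(self, ident):
        del self._children[ident]

    def __str__(self):
        inner = "".join(str(child) for child in self._children.values())
        return "<%s>%s%s</%s>" % (self._name, self.value, inner, self._name)


class Document:
    """Hypothetical stand-in for document(): a wrapper around the root tag."""

    def __init__(self):
        self.html = Tag("html")

    def __str__(self):
        return str(self.html)


d = Document()
d.html.body.h1.value = "My Site!"
d.html.body.link = Tag("a")      # identifier "link", tag name "a"
d.html.body.link.value = "home"
print(d)
```

The h1 child is created on first access and takes its identifier as tag name, while the child stored under the identifier "link" keeps the tag name "a" it was given.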

The object created by default when none exists yet with that id is a tag, but this abstract document won’t be limited to tags. I’ve just made a styleTag class which allows:

d.html.head.style.body["font-family"] = 'verdana'

which is basically the same as

d.html.head.style["body"]["font-family"] = 'verdana'

In contrast to the normal tag class, where an item is an attribute of the tag, an item of the style tag is a CSS rule, for CSS uses a lot of characters which Python doesn’t allow in names (like #).
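The item access described above could be sketched roughly as follows. The name StyleTag, the _rules dictionary and the output format are assumptions for illustration, not the actual styleTag class.

```python
class StyleTag:
    """Collects CSS rules as selector -> {property: value}."""

    def __init__(self):
        self._rules = {}

    def __getitem__(self, selector):
        # Item access accepts any selector, including "#menu" or "a:hover".
        return self._rules.setdefault(selector, {})

    def __getattr__(self, selector):
        # Attribute access is a shortcut for selectors that are valid names.
        if selector.startswith("_"):
            raise AttributeError(selector)
        return self[selector]

    def __str__(self):
        blocks = []
        for selector, props in self._rules.items():
            body = " ".join("%s: %s;" % item for item in props.items())
            blocks.append("%s { %s }" % (selector, body))
        return "\n".join(blocks)


style = StyleTag()
style.body["font-family"] = "verdana"
style["#menu"]["color"] = "red"
print(style)
```

Both style.body and style["body"] return the same rule dictionary, so the two spellings can be mixed freely.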

Being able to manipulate a style sheet that easily allows every custom tag (maybe a datetimepickercontrol) to set its own style information with just some simple Python code.

Because the style tag isn’t bound to putting its CSS in the emitted HTML string itself, it can even create a separate CSS file when it is emitted in a specific context such as a web server.

Python allows far more dynamic features in a framework like this than any other language. I’m quite enthusiastic about it and am playing with new ideas like a little child :-).

All kinds of ideas would be welcome.

I’m just wondering whether such a thing has already been written for Python. Does anyone know?

Strange spam

The last few days this blog has been under heavy attack from comment spam. Although the excellent WordPress filters have put all of it in the moderation queue, it still is quite some work to sift out any comments that actually are from a real person.

The odd thing I noticed out of curiosity is that the links don’t even seem to work in more than half of the spam comments. They are basically flooding you with what they hope are tempting comments, and if someone finally has been tempted enough to click one, it doesn’t work!

Using exceptions or assertions

People tend to mix up the intended use of exceptions and assertions.

Assertions

Most assertion implementations will pop up a dialog, and most of them allow you to ignore the assertion. I once used a program which even asked me whether I wanted to ignore that assertion in the future.

Using assertions as normal errors is wrong. Assertions aren’t meant to be errors, nor exceptions.

Assertions, used as they should be, make sure that the conditions that should hold do hold. If they don’t, you will know, and you will know that you made the error and not the program or the user.

A seemingly ridiculous example is the following assertion:

ASSERT(1 + 1 == 2)

If a programmer relies on the assumption that 1 + 1 is 2 and isn’t sure about it, he would use an assertion to be notified when he was wrong or missed something.

Exceptions

Most people treat exceptions as failures of a piece of the program. And some programmers tend not to clean up before they throw an exception, for they think that no one wants to do anything anymore, because an exception, they think, is fatal.

Exceptions aren’t fatal. Exceptions aren’t errors. Exceptions occur, for they are, just as the word says, exceptions. You just have to handle any exception that might occur. An exception becomes fatal only when it isn’t handled because it is either totally unexpected or simply unhandleable. In both cases letting the exception stay unhandled is a better alternative than using an assertion.
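To make the distinction concrete, here is a small sketch in Python. The functions parse_port and clamp are made-up examples, not taken from any real program: bad user input raises an exception that the caller can handle, while a broken programmer assumption trips an assertion.

```python
def parse_port(text):
    """Turn user input into a TCP port number."""
    # Bad input is exceptional but expected: raise and let the caller decide.
    if not text.isdigit():
        raise ValueError("not a port number: %r" % text)
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


def clamp(value, low, high):
    """Limit value to the range [low, high]."""
    # Bounds in the wrong order are a programmer mistake, so assert it.
    assert low <= high, "clamp() called with low > high"
    return max(low, min(value, high))


try:
    port = parse_port("http")
except ValueError:
    port = 80          # the exception is handled; nothing fatal happened
print(port, clamp(port, 1024, 49151))
```

The ValueError is part of the normal flow and gets handled, whereas the assert only fires when the code calling clamp is itself wrong.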

Command line parser update

I made some updates to my command line parser for .NET.

It now supports aliases, and you can set whether a parameter should accept a value or not, so you can use -foo bar where foo doesn’t accept a value and bar is therefore recognized as a position-dependent parameter.

An example parameter definition:

[Parameter(Position = 0, IsMandatory = true,
AcceptsNoValue = false, Name = "folder",
Description = "The folder", 
Aliases = new string[]{"target"})]
public string FolderParameter
{
    get { return _FolderParameter; }
    set { _FolderParameter = value; }
}

(Ignore the slashes placed before the quotes, WordPress seems to add them automatically.) I have also added the Intrepid.Automation.CommandLine.OutputHelp function, which outputs help for the parameters of an object to a stream. Like this:

SUMMARY
Creates playlists in a folder and its subfolders

PARAMETERS
-target

[String excludefilter = NULL] (efilter)
Only files not matching this regex are included

[String includefilter = NULL] (ifilter)
Only files matching this regex are included

String folder (target)
The folder

[Boolean expand]
Whether to expand referenced m3u’s

REMARKS
m3u playlists are automaticly excluded; use -excludefilter to add additional excludes.

You can still download the Intrepid.Corelib assembly here. You may still only use it for non-commercial open-source purposes and may not change or reverse engineer it in any way.

Hope it will prove useful :-).

Command line parser for .net

I just wrote a command line parser for .NET which parses the command line similarly to the way Monad’s cmdlets do it, with the key difference that this one is for normal .exe executables.

It’s very simple. First get yourself a program class which contains some properties you want the user to be able to set via the command line, and apply the Parameter attribute to them:

class CopyProgram
{
    private string _Source;
    [Parameter(Position = 0, IsMandatory = true)]
    public string Source
    {
        get { return _Source;}
        set { _Source = value; }
    }
    private string _Target;
    
    [Parameter(Position = 1, IsMandatory = true)]
    public string Target
    {
        get { return _Target;}
        set { _Target= value; }
    }
    
    private bool _WalkRecursivly;
    [Parameter(Name = "recursivly")]
    public bool WalkRecursivly
    {
        get { return _WalkRecursivly;}
        set { _WalkRecursivly= value; }
    }   

    static void Main(string[] args)
    {
        new CopyProgram().Run(args);
    }

    void Run(string[] args)
    {
        CommandLine.ParseCommandLine(args, this);
        
        // Do stuff here
    }
}

The CommandLine.ParseCommandLine function takes care of parsing the command line string array and filling the required properties of the class provided.

You can call this program in a few different ways:

copy.exe c:\source.txt c:\target.txt
copy.exe -s c:\source.txt -target c:\target.txt
copy.exe -targ c:\target.txt c:\source.txt  -r

Although it only required around 250 lines of code, it certainly is an enormous time saver and makes things a lot easier for the user.
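The three calls above suggest that parameter names may be abbreviated and that values without a name are assigned by position. A rough Python sketch of that matching logic follows. It is a guess at the behavior inferred from the examples, not the code of the Intrepid parser, and parse_command_line, names and flags are hypothetical names.

```python
def parse_command_line(args, names, flags=()):
    """Map arguments to parameter names; `flags` are names that take no value."""
    result = {}
    positional = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("-"):
            # Accept any prefix that identifies exactly one parameter.
            prefix = arg[1:].lower()
            matches = [n for n in names if n.startswith(prefix)]
            if len(matches) != 1:
                raise ValueError("unknown or ambiguous parameter: " + arg)
            name = matches[0]
            if name in flags:
                result[name] = True
            else:
                i += 1
                if i == len(args):
                    raise ValueError("missing value for " + arg)
                result[name] = args[i]
        else:
            positional.append(arg)
        i += 1
    # Hand the positional arguments to the value parameters not named explicitly.
    unnamed = [n for n in names if n not in flags and n not in result]
    for name, value in zip(unnamed, positional):
        result[name] = value
    return result


names = ["source", "target", "recursivly"]
print(parse_command_line(["-targ", r"c:\target.txt", r"c:\source.txt", "-r"],
                         names, flags=("recursivly",)))
```

In the last call -targ claims the target, so the remaining unnamed value falls through to source, the first value parameter that is still unset.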

At the moment I’m programming a few small utilities and will change the code a bit and fix some bugs if any.

For those interested in testing it themselves (I encourage it), just download the Intrepid.Corelib .NET 2 assembly*. The classes required are in the Intrepid.Automation namespace. Please send any bug reports to: bas.westerbaan@gmail.com.

(* You may use everything in the Intrepid.Corelib assembly freely for non-commercial open-source usage. Do not modify or reverse engineer the assembly in any way.)

Update:
Got rid of some bugs and added the RequiresValue property to ParameterAttribute, which allows you to use -e bleh where -e takes no value and bleh is interpreted as a nameless parameter. I will replace it with Type later on. Will add aliases too.

Serialization

Yey, I am your average developer and I made yet another program with some kind of data I want to be able to dump on the hard drive and be able to grab it again. Let’s use serialization!

Serialization was meant to be a tool for developers to pick up their data from memory, squish it a bit and drop it onto the hard drive, sparing hours of work on a custom data serializer to an existing format or, even worse, a home-made format. However there are a lot of reasons why not to serialize in .NET.

OK, so I’ll spend endless hours making my own algorithms to save my data, producing thousands of almost similar lines of code, but just not similar enough to prevent copy-and-paste bugs. After admiring my very own labor I will spend twice as much time debugging the code.

… mm… isn’t there another way to serialize, one which doesn’t have too many downsides?
OK, what are the demands:

  • It shouldn’t affect the way you design classes.
    Thus:
  • It should not be based on public fields or properties.
    This is probably the foremost cause of type design restrictions due to serialization support. While dropping public fields as the basis of serialization allows certain fields to be protected, it also prevents those fields from being accessed by some kind of automated serialization. This however is only a minor setback: you’ll need to create Serialize and Deserialize methods to control serialization. A part of the current .NET serialization also relies on this principle, however there should be some modification:
  • The actual implementation of the (de)serialization algorithm should only optionally be in the type declaration.
    The original .NET serialization restricts the user to serializing solely his/her own types. If string were unserializable you wouldn’t be able to tell some kind of serialization handler: “hey, here is a type, serialize it” because this handler would find that you are a moron, trying to serialize a string which is not serializable because it has neither a Serialize nor a Deserialize method. You would be forced to create your own string serialization algorithm in every single type you want to serialize and which contains a string. While this is doable, imagine that the string has been replaced by a complex data structure containing inter-references… You would still be spending a lot of time creating a serialization algorithm for someone else’s type. The most obvious solution to this problem is allowing some kind of TypeSerializer which could be ‘registered’ with a serialization provider. The only drawback to this solution is that you can’t access any private fields, therefore you aren’t able to truly serialize every type and there are scenarios imaginable where serialization is impossible. There is no easy solution to this. Luckily this should be a rare event.
  • You should be able to handle reference types as reference types.
    The most fundamental flaw in the current .NET serialization, in my opinion, is being unable to serialize an object that is referenced from multiple locations only once, in other words to preserve ‘reference equals’. Programmers have been known to use ‘IDs’ to preserve some kind of referencing ability. This seems like a nice simple solution, but it drains processor time every time an ID needs to be resolved.
  • There should be a Serialization and Deserialization Host/Provider.
    Such a provider has three main functions, justifying its existence:

    • Storing type serializers
      All different type serializers could also be stored in a static list, but this would enforce the ‘use’ of every single one of them (during lookup). You don’t want to know how to deserialize meat when you are vegetarian!
    • Storing serialized signatures together with their (deserialized) objects
      The only way to preserve references while being able to serialize on demand is to keep track of the already serialized reference type objects or the already deserialized reference objects. This could also be managed in a static list, however the same rule applies: an ice cream shop doesn’t need to know what kinds of meat have already been serialized.
    • Providing Serialize and Deserialize methods to the Serialize and Deserialize algorithms by being an argument.
      The provider should automatically redirect the serialization request for a field in a Serialize method to the Serialize method of the type of that field, while passing itself along as serialization provider to the next ones in the chain. While this improves user comfort, it also gives the serializer the ability to store the created ‘serialized object’ or, in the case of a deserializer, the deserialized object. There is a catch to this system: when inter-referencing is possible this could cause an endless chain (until the stack runs out). When the pork meat isn’t finished serializing its fields, the olives don’t know that it is busy and will invoke a second serialization of the pork meat, causing a second serialization of the olives and so on. This is easily solved by a not so elegant use of a Register method inside the Serialize or Deserialize method. This method adds the not yet completely (de)serialized object to the pairs of referenced objects and serialized objects, allowing any object further down the graph to use the correct reference.
  • It should be able to cope with any changes.
    This is a hard one, and it is a problem bugging all areas of development. There is no easy solution to this. The only way to handle it is to make some sort of conversion for older files or to have different types of serialized objects for the same (sometimes changed) type. For example, you would have one serialized type of salad mix containing the amount of salad and another one which also contains the amount of tomatoes.
  • It should get rid of the string key value pairs used by the Microsoft serialization.
    Strings are slow and interpreting them is even slower. It’s like describing the shapes of the figures in your bank account with metaphors. There is one good thing about key value pairs, and that is that they are unordered. This barely manages to hide the fact that it’s too slow. An alternative could look like this:

    • A Guid recognized by the deserialization provider as a specific version of the serialized form of a specific type. This would remove any problems with extensibility of the serialized type, because you would simply copy and paste the algorithm, kick it a bit (to fit your demands) and supply it with a new Guid, while preserving the old algorithm and possibly adding a friendly obsolete exception. Any type which supplies serialization and deserialization to itself or to another type should include a list of accepted Guids.
    • A list of referenced serialized objects which could contain the fields of the type. How this is used is up to the creator of the serialization and deserialization algorithm.
    • A byte array containing any ‘personal’ data of a type whose data isn’t distributed among fields (like natives).
  • It should be secure.
    Memory is pretty safe due to access restrictions and the fact that only the application controlling the memory really knows what a byte means. The hard disk or, even worse, the internet is no match for the protection memory offers. There are a few ways to protect data on your hard disk. One of those is access policy. However this is somewhat impractical and can’t be applied to internet traffic. Maybe the best solution is encryption. This could be applied to the whole file, which is inefficient but effective, or sensitive data inside a serialized file could be stored in the ‘raw’ data and be encrypted by the type serializer.
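The provider idea from the list above can be sketched in a few lines of Python. This is only an illustration of registered type serializers and reference preservation, under the assumption that the provider itself registers an object before descending into its fields. SerializationProvider, Node and serialize_node are made-up names, and Guids, versioning, raw bytes and deserialization are left out.

```python
class SerializationProvider:
    """Keeps the registered type serializers and the objects already serialized."""

    def __init__(self):
        self._serializers = {}   # type -> function(obj, provider) -> payload
        self._refs = {}          # id(obj) -> reference number
        self._alive = []         # keeps objects alive so id() values stay unique
        self.records = []        # reference number -> (type name, payload)

    def register(self, type_, serializer):
        self._serializers[type_] = serializer

    def serialize(self, obj):
        """Return a reference number; every object is written only once."""
        key = id(obj)
        if key in self._refs:
            return self._refs[key]
        ref = len(self.records)
        # Register before descending, so cyclic references find this entry.
        self._refs[key] = ref
        self._alive.append(obj)
        self.records.append(None)
        payload = self._serializers[type(obj)](obj, self)
        self.records[ref] = (type(obj).__name__, payload)
        return ref


class Node:
    def __init__(self, name):
        self.name = name
        self.friend = None


def serialize_node(node, provider):
    # Lives outside the Node class, like a registered type serializer.
    friend = None if node.friend is None else provider.serialize(node.friend)
    return {"name": provider.serialize(node.name), "friend": friend}


provider = SerializationProvider()
provider.register(str, lambda text, provider: text)
provider.register(Node, serialize_node)

pork, olives = Node("pork"), Node("olives")
pork.friend, olives.friend = olives, pork    # the two reference each other
root = provider.serialize(pork)
print(root, provider.records)
```

Because pork is registered before its fields are serialized, the olives find the existing reference number instead of starting a second serialization of pork.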

I think that there could be an implementation of serialization able to meet these demands. If it does meet them, there would be little objection left to using it and it would be preferable to even the most optimized hand-crafted data ‘dumpers’. I will do some more research, and there could be a follow-up, more concrete article with some closer-to-code talk.

P.S. As you may have noticed, the posts now specify an author, and you may also have noticed that this post wasn’t written by the usual author. I’ve joined Bas Westerbaan in writing posts for Intrepid Blog.

Regular Expressions in .NET

Regular expressions in .NET (System.Text.RegularExpressions) are fast!

When you create a Regex with the RegexOptions.Compiled flag, the regex is actually compiled to IL, which the JIT then turns into optimized native code. This can make the .NET regular expression library faster than the C# code you would write by hand for that one specific case.

This was a pleasant surprise 🙂, for now I know I can use regular expressions for all those small things like parsing input.

Why PHP sucks (and I still use it)

  • PHP is slow, not just slow, but really slow. A simple benchmark runs in 1 millisecond in C. It takes 2 milliseconds in .NET. Python takes 600 milliseconds, for it is an interpreted language rather than native or JIT-compiled code. But PHP, even though it is also an interpreted language and doesn’t have the enormous object overhead Python has, still takes 12000 milliseconds. That is 12 seconds.
  • Stupid workarounds. Because PHP itself is rather slow, you have to rely as much as you can on function calls instead of doing anything yourself. It is for instance faster to load a list by deserializing a string than by reading through a file line by line, although the deserialization would be way faster if both methods were implemented natively. Another little issue here is that most of the very quick functions in PHP are only available in the newer versions (e.g. file_put_contents). This requires you to add an if statement with a home-made implementation of the function, which usually is slower by a factor of 10. Alternatively, you can choose an approach which doesn’t exactly implement the function you need but still does the job (better) in the circumstances, e.g. not rewriting file_put_contents when it isn’t available and you want to stream data to a file, but rather calling fwrite a few times, which avoids having the whole file in memory as a string at once.
  • Stateless. Although HTTP is a stateless protocol, that doesn’t have to mean that a server-side scripting framework should be stateless too. PHP attempts to become stateful with a session implementation that serializes a session array to the hard disk for every session, but this isn’t very efficient. Even one global array that persists between requests would result in a big performance boost. Not only because it doesn’t require file reads, writes, serialization and deserialization (and optionally queries when you don’t like the PHP session system), but also because it would allow you to store that little bit of important cache between sessions that otherwise would have needed to be read, deserialized, serialized and written again, for every mere page view!
    Allowing such a persistent array however poses a security risk in the way PHP works at the moment. They should add contexts to allow one instance of Apache on which mod_php runs to execute files in different contexts, each with its own settings (and persistent data).
  • There is no satisfying solution in PHP. For every single common issue in PHP there is no simple solution that works perfectly or even reasonably well.
    Take for instance templates. There are basically a few ways to handle templates in PHP. Usually it comes down to either caching .php scripts which are then executed, as Smarty does, or using a class for every major template section where every function fetches a template bit. Both methods require executing PHP code just to replace a certain tag with its replacement. PHP has been designed to do a lot more than that and carries a lot of overhead during interpretation. Using str_replace calls is a lot faster than an inline PHP block or even in-string PHP variables ("Example: {$example}"). The second way, using classes and functions, is even worse, for the whole class and all its functions first need to be loaded into memory and are basically a lot slower.
    The proper way to use templates is to stream the input, replacing tags with their values and outputting the result. This isn’t practical, for PHP is slow and loading the whole template into a string is even faster.

The only reason why I still use PHP is that it is simply the most widely supported server-side scripting language.

Why is world wide web not fair?

Microsoft Anti Spyware

Microsoft Anti Spyware beta

It seems to work well. It even detects some adware that Ad-Aware didn’t remove.

Many people say that using more than one anti-adware program is the best approach, because none of them catches everything. Joel suggests why Microsoft Anti Spyware wouldn’t catch everything: conflicts of interest.

A lot of money is made by distributing spyware/adware/malware onto people’s computers. The distributors could easily bribe some anti-adware software developers to ignore their adware.

I just have to find an anti-adware program that is professional enough to clean my computer and hasn’t been bribed to leave some of it.

The problem is that there are a lot of anti-adware programs, and most of them are adware themselves.

Blog updated

The blog has been updated to 1.5.

It adds a lot of nice new features and I would advise other users to upgrade now.

The new skin is the new default skin that comes with WP1.5.

When I find time I’ll adapt it a bit to make it a bit less default.

Edit: Just noticed that the WordPress ACP looks horribly scrambled in Firefox. It looks all right in Internet Explorer though :(.

Skins and performance in PHP

There are several ways to use skins in PHP. I’ve put some of them through a performance test.

Basically you can use either evaluated PHP or a string that will undergo str_replace’s.

Evaluating PHP in a file seems to be faster than replacing tags in a string. This is because PHP streams through the file during execution instead of handling one big string. The difference is minimal though (15% in my tests).

However, when the PHP code is placed in a string instead of in a file, which has to be done when it is cached in a database or generated by compiling from another format, it is significantly slower than using str_replace’s on a normal string (600%!). This is because the original source code, the intermediate code and the return value of the code all take a lot of memory.

Either use cached PHP files, or store a string with tags instead of PHP code in a database. Never do it the other way around (which happens very often).

Caching in PHP

It is useful to cache certain things between PHP script executions.
Some boards written in PHP cache the forum architecture so that a difficult query doesn’t have to be run every time a guest views the board.

There are a few ways to cache data:

  • Php script. Data will be stored as a normal php file which will be included during execution
  • Serialized object in file. Data will be serialized and dumped to a file which will be read every page view
  • Database storage. Data will be serialized and stored in a database and queried every page view.

There is a widespread myth that using a database is way slower than using a normal PHP file.

I’ve run a few tests caching a ~16kB php array, the results:

Serialized object stored in file: 0.0015ms
Object in PHP script: 0.0121ms
Serialized object stored in mySQL database: 0.0015ms

It seems to be quicker to use a serialized array in a file as configuration file than a config.php php script!

Although databases are just as quick as normal files, I favor them because they are much more scalable.

Gamma Wave Effect

As promised a few pictures of the electric guitar effects we’re working on,

first up, the gamma wave effect.

The gamma wave effect pulls the amplitudes either to the 0 line or to the -1/1 line, just like the gamma on your monitor does:

Max Negative Linear Gamma Applied

Max Positive Sine Gamma Effect

The effect basically makes the wave a lot louder and gets rid of the faint sounds. When fully applied, as in the images above, it also creates some distortion, due to either making the wave somewhat inharmonic or getting rid of the nuances.

Another image taken at a higher oscillation (smaller zoom):
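The exact formulas behind the effect are not given here, so the following is only a guess at the general idea: a monitor-style power curve applied to samples in the range -1 to 1. The names power_gamma and apply_gamma and the sample values are assumptions for illustration.

```python
import math


def power_gamma(sample, gamma):
    """Bend one sample in [-1, 1] with a power curve, keeping its sign."""
    return math.copysign(abs(sample) ** gamma, sample)


def apply_gamma(samples, gamma):
    # gamma < 1 pushes samples toward -1/1, gamma > 1 pushes them toward 0.
    return [power_gamma(s, gamma) for s in samples]


wave = [0.0, 0.25, 1.0, 0.25, 0.0, -0.25, -1.0, -0.25]
louder = apply_gamma(wave, 0.5)   # quiet parts are lifted toward the -1/1 lines
softer = apply_gamma(wave, 2.0)   # quiet parts sink toward the 0 line
print(louder)
print(softer)
```

Samples at exactly 0, 1 and -1 stay where they are, which is why only the shape between the extremes changes.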

Max Pos Sine Gamma 2

This clearly shows the distortion

(The inversion of peak and valley doesn’t matter, for with sound only the transition from a valley to a peak matters)

Electric Guitar Distortion

As I said before, I am working with some others on a software-based distortion. I finally got the base running pretty smoothly, but the main obstacle is how to create the distortion that electric guitars use.

We came up with about 3 different methods:

(When I get my oscilloscope control working I’ll take before and after pictures; for now I’m not sure yet how the formulas affect real-world sound)

Gamma corrections
This works a bit like the gamma of your monitor. The input graph for the gamma correction is called the epsilon, which comes in two forms: linear and sine. The first one creates a more distortion-like sound than the latter but also makes the wave no longer harmonic, which possibly means that most of the distortion-like sound is caused by the speakers not handling inharmonic waves very well. The latter only sounds distorted near the peak amplitude.

Sharpening valleys/tops
This method requires some buffering of the current top (or valley) of the wave and sharpens it by a specific amount. This works a bit like the gamma correction method, although it works at every volume, which makes it useful for low-volume sounds too. The major problem is that it requires tracking a top or a valley, which at a high bitrate requires a really big buffer to analyze, and it has a delay of one top/valley. This means it isn’t ideal for live playing, for which it was designed: a delay of a few extra hundredths of a second would be noticeable, and it is quite possible that this algorithm requires too many resources.

Adjusting speed resistance
At a certain point in a wave you can derive the speed and the angle. By registering the original speed and resistance of the wave, a derived one can be created which could lead to sharper or softer edges of a top/valley, just as with the previous method but without having to analyze the whole top/valley. We’re hoping this method will work best, but for now it is just a concept that we hope will work out as it should.

If anyone actually knows how analog distortions work exactly we would be more than happy to learn about it, just comment.

More on this to come…

Working on a software based distortion for the electric guitar

At the moment I am busy creating a software based distortion for an electric guitar.

The most challenging part is getting effects to work, to explain this in more detail you need to know how a computer handles audio.

Sound itself is nothing more than a vibration in the air. It can be represented by the amount of force the air is pushing or pulling.

A tone of a specific frequency would look like:

A wave

A computer stores sound by sampling the amplitude of the air at regular intervals.
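As a small illustration of sampling, the sketch below computes the samples of a pure sine tone. The function name sample_tone and the 440 Hz and 44100 samples per second figures are example choices, not values taken from the distortion project.

```python
import math


def sample_tone(frequency, sample_rate, count):
    """Return `count` samples of a sine tone, one every 1/sample_rate seconds."""
    step = 2 * math.pi * frequency / sample_rate
    return [math.sin(step * n) for n in range(count)]


tone = sample_tone(440.0, 44100, 100)        # the first 100 samples of an A
octave_up = sample_tone(880.0, 44100, 100)   # same tone at twice the frequency
print(tone[:3])
```

For a regular tone like this, changing the frequency is just a matter of changing one number in the formula.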

It would be easy to write a program to increase the frequency of the sound above, but a normal sound doesn’t look that regular:

A wave

Increasing the bass or treble of that wave would require some advanced algorithms, which take time to execute which creates a larger delay. One thing that a distortion shouldn’t do is lag.

More on this when some stuff is working.

PHP Security Consortium

The PHPSC is a site managing a lot of resources on PHP security.

For all those starting or sometimes using PHP this is a must read.

I’d also advise people who want to know whether their site is safe enough to play the other side and try out hacking themselves: hackthissite.org. It is easier than you might have thought.