On IE7

I’ve had to test a website I designed in IE (in VMware, of course) and it went really badly. IE 6 sucks, that’s for sure. It was a slap in the face for me, because until then everything had gone so terribly smoothly, especially the CSS.

I was wondering whether it would be as bad in IE 7 as in IE 6, so I installed it (the Genuine Advantage check doesn’t work, by the way).

I was amazed by the amount of improvement they made.

The website was a mess in IE6: backgrounds wouldn’t show up, CSS attribute selectors were ignored, and blocks were positioned in absolutely ridiculous locations. In IE7 it showed up pretty nicely by comparison. It wasn’t perfect, though: there were quite a few glitches in positioning (go IE box model!) and font sizes, but that’s manageable.

I wonder why IE7 took so long and why it still isn’t as bug-free and feature-complete as almost every other major browser. If Microsoft wanted to make a perfect browser, they could do it easily! They’ve got more than enough very talented people and more than enough money to hire or bribe any expert they’d want.

Maybe they’re afraid of big AJAX applications. I bet Gmail took a fair bit of market share from Outlook. At least, that is what some people think. They may be right, but Google wasn’t stopped by the bugs in IE: Google doesn’t use nice, semantically correct HTML (and it is the correct stuff that usually doesn’t work in IE), they just use whatever works anyway.

More likely they just don’t care. Its market share already forces website developers to create sites that show up properly in IE. Only the people building websites feel the pain of the buggy IE engine, not the users of the browser. Besides that, it would be a pain for them to create a whole new IE that still has to be backwards compatible with the old one. A lot of people would be angry if their site no longer showed up properly in the new IE, and those people can’t be persuaded with the nice-semantics ideology.

A little more pressure from alternative browsers could force Microsoft to release something more substantial than IE7. IE7 is an improvement, but they could have done better.

Assembly ‘programmers’ suck

Some programmers claim that writing assembly is the solution to every single programming problem: it gets you down to the basics, it produces small programs, it is fast. As far as I’m concerned they are just bragging, because not a lot of people know how to program in assembly and it is generally seen as difficult.

Programming in assembly is almost always a bad idea.

So first, what exactly is assembly?
Assembly is a textual language in which you write, by hand, the opcodes and operands of the machine code the processor executes. This gives you a great deal of control over what the processor does: where higher-level languages generate the machine code for you as they see fit, in assembly you choose exactly which machine code is used.

Assembly isn’t difficult to learn; it’s very straightforward. It is just hard to program in, because you don’t have functions, you don’t have high-level control structures, you don’t have any of the other aids of a high-level language.

The reason programmers claim that assembly is superior is that with assembly you can write faster and smaller code. This is only partially true.

When you know all the opcodes and tricks you can pull on every architecture, you can beat a compiler. The problem is that there are dozens of architectures with totally different opcodes, and even more sub-architectures, each with a slightly different implementation and different performance characteristics for each opcode or way of doing something.

So to truly beat a compiler, which can compile one piece of code for any architecture, you would have to write a separate piece of assembly source for each architecture.

You’d also have to learn all the opcodes of a single CPU: not just the few hundred that would suffice to get something working, but the thousands that are required to get it working as fast as possible.

Compilers know all these extra opcodes and tricks for each architecture, so in the same amount of time a higher-level programmer will do a better job of creating an executable than an experienced assembly programmer. If the assembly programmer wants to beat the higher-level programmer, he needs not twice as much time but at least ten times as much.

Assembly also isn’t very portable. If you want to change one little thing, you have to change it in every optimized version for every processor. I haven’t seen a big application written in assembly, and there’s a reason for that.

Most programmers who claim to be able to write fast assembly have no real idea how everything works and are just thrilled that they got a little program working in assembly.

Assembly can be useful, though. An example is an arbitrary-length integer library like GMP, which uses optimized assembly for certain operations. Only a handful of operations are done in assembly, but there is still a lot of assembly code (one version per architecture), and it is worth it.

Sticking with a good compiler and a relatively low-level language like C is always the best option for good applications.

Why PHP sucks (and I still use it)

  • PHP is slow; not just slow, but really slow. A simple benchmark runs in 1 millisecond in C and takes 2 milliseconds in .NET. Python takes 600 milliseconds, because it is an interpreted language instead of a natively compiled or JIT-ed one. But PHP, even though it is also interpreted and doesn’t have the enormous object overhead Python has, still takes 12,000 milliseconds; that is 12 seconds.
  • Stupid workarounds. Because PHP itself is so slow, you have to rely as much as possible on built-in function calls instead of doing anything yourself. It is, for instance, faster to load a list by deserializing a string than by reading through a file line by line, even though the deserialization would be far faster still if both methods were implemented natively. Another little issue is that most of the very fast functions in PHP are only available in newer versions (e.g. file_put_contents), which forces you to add an if statement with a home-made implementation of the function that is usually slower by a factor of 10. Alternatively you can choose an approach that doesn’t implement exactly the same behaviour as the missing function but still does the job for your circumstances, or does it better: for example, not re-implementing file_put_contents when it isn’t available, but simply calling fwrite a few times when you want to stream data to a file, which avoids keeping the whole file in memory as one string (see the first sketch after this list).
  • Stateless. Although HTTP is a stateless protocol, that doesn’t mean a server-side scripting framework has to be stateless too. PHP tries to be stateful with its session implementation, serializing a session array to the hard disk for every session, but this isn’t very efficient. Even one global array that persists between requests would give a huge performance boost: not only because it avoids file reads, writes, serialization and deserialization (and possibly queries, if you don’t like the PHP session system), but also because it would let you keep that little bit of important cache between requests that otherwise has to be read, deserialized, serialized and written again for every single page view!
    Allowing such a persistent array, however, poses a security risk with the way PHP works at the moment. They should add contexts, so that one Apache instance running mod_php can execute files in different contexts, each with its own settings (and persistent data).
  • There is no satisfying solution in PHP. For every common problem in PHP there is no simple solution that works perfectly, or even reasonably well.
    Take templates, for instance. There are basically a few ways to handle templates in PHP. Usually it comes down to either caching .php scripts which are then executed (as Smarty does), or using a class for every major template section where every method fetches a bit of template. Both methods require executing PHP code just to replace a certain tag with its replacement. PHP has been designed to do a lot more than that and carries a lot of interpretation overhead. Using str_replace is a lot faster than an inline PHP block or even in-string PHP variables ("Example: {$example}"). The second way, using classes and functions, is even worse, because the whole class and all its functions first have to be loaded into memory; it is basically a lot slower.
    The proper way to handle templates would be to stream the template in, replace tags with their values on the fly, and stream the result out. In PHP this isn’t even worth it: PHP is so slow that loading the whole template into one string and replacing the tags there is faster (see the second sketch after this list).
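To make the workaround point from the second bullet concrete, here is a minimal sketch (file names and data are made up for illustration) of both options: a home-made file_put_contents fallback for old PHP versions, versus simply calling fwrite a few times so the whole file never has to sit in memory as one string:

```php
<?php
// Option 1: emulate file_put_contents on PHP versions that don't have it.
// Roughly an order of magnitude slower than the native function.
if (!function_exists('file_put_contents')) {
    function file_put_contents($filename, $data) {
        $fp = fopen($filename, 'wb');
        if ($fp === false) {
            return false;
        }
        $bytes = fwrite($fp, $data);
        fclose($fp);
        return $bytes;
    }
}

// Option 2: skip the big string entirely and stream the chunks to the file
// with a few fwrite() calls.
function write_chunks($filename, $chunks) {
    $fp = fopen($filename, 'wb');
    foreach ($chunks as $chunk) {
        fwrite($fp, $chunk);
    }
    fclose($fp);
}
```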
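And for the template point from the last bullet, a minimal sketch (template file and tag names invented) of loading the whole template into one string and replacing all the tags in a single pass with str_replace:

```php
<?php
// Replace {tag} placeholders in a template with their values in one pass.
// str_replace accepts arrays, so every tag is handled in a single call.
function render_template($template_file, $values) {
    $template     = file_get_contents($template_file);
    $tags         = array();
    $replacements = array();
    foreach ($values as $tag => $value) {
        $tags[]         = '{' . $tag . '}';
        $replacements[] = $value;
    }
    return str_replace($tags, $replacements, $template);
}

echo render_template('post.tpl', array(
    'title' => 'Why PHP sucks (and I still use it)',
    'body'  => 'PHP is slow; not just slow, but really slow...',
));
```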

The only reason I still use PHP is that it is simply the most widely supported server-side scripting language.

Why is the World Wide Web not fair?

PHP Security Consortium

The PHPSC is a site that maintains a lot of resources on PHP security.

For everyone starting out with PHP, or even using it occasionally, it is a must-read.

I’d also advise people who want to know whether their site is safe enough to play the other side and try some hacking themselves: hackthissite.org. It is easier than you might think.

Modular Server

(Understanding URIs)

“A common mistake, responsible for many HTTP implementations problems, is to think this is equivalent to a filename within a computer system. This is wrong. URIs have, conceptually, nothing to do with a file system. One should remember that at all times when dealing with the World Wide Web.”

So why do most HTTP servers still rely so heavily on the filesystem to dictate URLs?

This not only tends to create uncool URIs, but also makes file-based dynamic content seem logical (more on that in my previous post).

An HTTP server, actually every server, should be nothing more than a wrapper around modules which handle clients’ requests.

The server itself would be limited to a very select set of functions:

  • Handling connections
  • Exposing an API for the protocol which the modules can use
  • Hosting modules

On startup the server loads modules and binds them to certain URLs. Modules remain persistent in memory and are simply signalled when a request comes in, receiving an API object with which to handle the request.
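Just to show the shape of the idea, here is a toy sketch in PHP (the class names, port, and dispatch rules are all invented; a real implementation would of course speak full HTTP and probably live in a faster language): modules are instantiated once at startup, bound to URL prefixes, and keep their state between requests.

```php
<?php
// Toy module server: modules are loaded once and stay in memory; the server
// itself only accepts connections and dispatches requests by URL prefix.

interface Module {
    public function handle($path);          // handle one request, return the body
}

class HelloModule implements Module {
    private $counter = 0;                    // state that persists between requests
    public function handle($path) {
        $this->counter++;
        return "Hello, request number {$this->counter} for {$path}\n";
    }
}

// Bind modules to URL prefixes at startup.
$modules = array('/hello' => new HelloModule());

$server = stream_socket_server('tcp://127.0.0.1:8080', $errno, $errstr);
while ($conn = stream_socket_accept($server, -1)) {   // wait for the next connection
    $request = fgets($conn);                           // e.g. "GET /hello/world HTTP/1.0"
    $parts   = explode(' ', trim($request));
    $path    = isset($parts[1]) ? $parts[1] : '/';

    $status = '404 Not Found';
    $body   = "No module bound to this URL\n";
    foreach ($modules as $prefix => $module) {
        if (strpos($path, $prefix) === 0) {            // dispatch on URL prefix
            $status = '200 OK';
            $body   = $module->handle($path);
            break;
        }
    }
    fwrite($conn, "HTTP/1.0 $status\r\nContent-Type: text/plain\r\n\r\n" . $body);
    fclose($conn);
}
```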

I’ll be busy exploring the possibilities of implementing this; there will be more about it later.

Why server side scripts aren’t that scalable at all

There are a lot of different types of server-side scripts: PHP, JSP, ASP.NET, and general-purpose languages like Python, to name a few.

Practically all of these were originally meant to be used via CGI, but most of them grew tighter integrations with popular webservers like the Apache HTTP Server. Usually these tight integrations with Apache take the form of modules, like mod_python, which lets the general-purpose Python scripting language be used for server-side scripts.

When you, as a client, request a page powered by a server-side script via CGI (like this blog, at the time of writing), the webserver starts the responsible interpreter and provides it with some arguments, such as the POST and GET variables. The interpreter, PHP in this case, executes the script and produces an output stream to return to the client.
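Roughly illustrated (the details differ per webserver and interpreter, and the query string here is made up): a raw CGI script receives the request through environment variables and stdin, and everything it prints, headers first, goes straight back to the client.

```php
<?php
// Bare CGI-style script: the webserver sets environment variables such as
// REQUEST_METHOD and QUERY_STRING (and feeds the POST body on stdin);
// whatever we write to stdout -- header block, blank line, body -- is the response.
$method = getenv('REQUEST_METHOD');
$query  = getenv('QUERY_STRING');
parse_str($query === false ? '' : $query, $get);    // roughly what ends up in $_GET

echo "Content-Type: text/plain\r\n\r\n";            // headers, then an empty line
echo "You made a " . ($method === false ? 'GET' : $method) . " request";
if (isset($get['page'])) {
    echo " for page " . $get['page'];
}
echo "\n";
```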

This works pretty well for small, not too demanding scripts.
However, problems arise when a lot of people use the script or when the script itself is quite demanding.

An example is this blog, which uses a MySQL database to store its posts and comments. Every time the index page is requested it makes a new connection to the MySQL server and sends queries for the categories, links, latest posts, latest comments, etc. Creating a MySQL connection takes time, sending queries takes time, processing the queries takes time on the MySQL server, retrieving the results takes time, and processing the results takes time; all of this just to produce the same content for the index page over and over again, because this blog gets at least a hundred times more visits than updates.

Some blogs pre-build their content. When you view posts on these weblogs, the posts are not generated for your request but served from a cache (usually as plain .html files on the webserver). The control panel for posting new items is still written in a server-side script and uses some sort of database, but when you finish writing a post the script rebuilds the cached pages from the database, which significantly reduces the load on the server.
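A minimal sketch of that scheme (table, column and file names are invented, and mysqli is used just as a stand-in for whatever database layer the blog has): visitors get a plain index.html straight from the webserver, and only the admin script that saves a post touches the database to regenerate it.

```php
<?php
// Called from the admin script after a post is saved: rebuild the static
// index page from the database, so ordinary page views never hit MySQL.
function rebuild_index_cache($db) {
    $result = mysqli_query($db,
        "SELECT title, body FROM posts ORDER BY posted_at DESC LIMIT 10");

    $html = "<html><body>\n";
    while ($row = mysqli_fetch_assoc($result)) {
        $html .= "<h2>" . htmlspecialchars($row['title']) . "</h2>\n";
        $html .= "<div>" . $row['body'] . "</div>\n";
    }
    $html .= "</body></html>\n";

    // From here on the webserver serves index.html as a plain static file.
    file_put_contents('index.html', $html);
}
```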

The downside of this type of blog is that it can offer only a very limited set of dynamic features, because those would require scripts. Another downside is that it is hard to host such a blog on multiple servers at once, which is usually what happens when a site becomes so popular that one server can’t handle the demand. That is easy for a database-powered blog, because it stores its data on one centralized database server; then again, a database-powered blog will need more than one server much sooner, since it puts a far greater strain on the server than a caching weblog does. It can still be done by setting up the blog so that it stores its cached files on a centralized server as well (via ordinary file sharing), but I would be surprised if a server that only serves cached pages ever reached the limits of its capacity.

The problem gets bigger when you are dealing with more dynamic server-side software, like forums. A forum really does need queries, such as the query that retrieves the posts. There is no point in caching those, because visitors of forums tend to post (yeah, I know, it’s strange), which would force the caches to be updated constantly, an even bigger strain than just using the database in the first place.

A forum, however, still has a lot of data that is quite static and would be a lot faster if it could be cached: the templates, the category structure, the help pages, statistics like the user count, and the sessions, which are very dynamic but are queried on every page view. These things are usually requested every single time you download a page.

Some forums use a cache consisting of a table in the database that contains all the cached data, serialized so it can be used immediately. But that still means about 10 KB transferred from the database every time someone views a page!

The problem grows even bigger when you are developing still more demanding server-side projects, like a browser-based online game.

I started developing an online game as a hobby project in PHP, but I soon switched to writing my own webserver in C#, which caches everything in the webserver’s memory and makes the rewritten parts about three times faster.

I figured it would be great if HTTP servers and server-side frameworks were less focused on handling a single request and offered more support for inter-request caching. There is limited support for caching in JSP and ASP.NET, but it is used quite rarely, because JSP and ASP.NET still revolve around the individual request. A server-side script should not be loaded on each request, but should already be loaded and able to cache objects: a provider of MySQL connections that simply recycles a connection, a class with all the commonly used functions already loaded, and of course things like templates, sessions, and other cacheable data. A small sketch of such a connection provider follows below.
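As a sketch of the connection-provider part (class name and credentials are invented; within stock PHP the closest equivalent is a persistent mysqli connection, which the "p:" host prefix gives you), the idea is that code asks a provider for a connection and gets a recycled one instead of opening a fresh one every time:

```php
<?php
// Connection provider sketch: hand out one reused connection per process
// instead of connecting to MySQL on every request.
class ConnectionProvider {
    private static $connection = null;

    public static function get() {
        if (self::$connection === null) {
            // "p:" asks mysqli for a persistent connection that is kept
            // alive and reused across requests served by this PHP process.
            self::$connection = new mysqli('p:localhost', 'blog', 'secret', 'blogdb');
        }
        return self::$connection;
    }
}

// Anywhere in the code base:
$db     = ConnectionProvider::get();
$result = $db->query("SELECT COUNT(*) FROM posts");
```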

The problem with caching in memory is that memory can’t be shared between multiple servers, so it isn’t really scalable. If you ran multiple servers and stored sessions in memory, it is quite possible that members would get a “session not found” error when clicking a link that happens to be served by another server. A possible solution would be to redirect people who access “domain.ext” to a specific mirror (“s12.domain.ext”); that would avoid session loss. You could then add a ‘cacheversion’ value in a table on the shared database server which is changed every time something changes for which a cache exists. Requesting just that one very small number would be enough to know whether a cache needs to be rebuilt because another server changed something (a sketch follows below).
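The ‘cacheversion’ check could look roughly like this (table, column and file names are invented, and build_index_from_database stands in for whatever expensive work produces the data): each server keeps its cache in a local file, but first asks the shared database for one tiny version number.

```php
<?php
// Each webserver keeps its own local cache file; one shared row in the
// central database says which version of the data is current.
function get_cached_index($db) {
    // A very small query: just the current cache version number.
    $row     = mysqli_fetch_row(mysqli_query($db, "SELECT version FROM cacheversion"));
    $current = (int) $row[0];

    $cache = @unserialize(@file_get_contents('index.cache'));
    if (is_array($cache) && $cache['version'] === $current) {
        return $cache['data'];                   // local cache is still valid
    }

    // Another server changed something: rebuild from the shared database.
    $data = build_index_from_database($db);      // hypothetical expensive step
    file_put_contents('index.cache', serialize(array(
        'version' => $current,
        'data'    => $data,
    )));
    return $data;
}
```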

Just a thought…