Don’t rely on parsing

Most applications store their settings (espacially in *nix) in text configuration files. These text files need to be parsed every time the application starts.

Parsing is the action of (usualy streaming) dividing a certain piece of data into understandable parts. This usualy comes down to looking at the text file character by character and deciding what should be done. Usualy this is done by maintaining a state which contains the data collected sofar. And with a bit more complicated files this even means having a stack of states and complicated actions when a certain state is left.

The problems:

  • Parsing is slow, very slow
  • Formats required to be parsed contain overhead, a lot of overhead

But it certainly has got advantages:

  • Humans can easily edit it, you don’t need to rely on configuration tools
  • (Usualy) makes a configuration format more extensible by nature (adding one new field in the average programmer’s binary format would break it)

Now, there are attempts to help improve speed. This by standardizing the format, which makes the amount of oddities to expect less, which ultimately makes the whole parsing slightly faster. This at cost of the easiness it can be edited.

A good example would be Xml. Xml is damned ugly. Xml is too strict. And Xml still takes a hell of a lot of time to parse.

Yaml looked like a decent alternative: easy to edit, looks nice. But then I encountered this:

%YAML 1.1
---
!!map {
? !!str "sequence"
: !!seq [
!!str "one", !!str "two"
],
? !!str "mapping"
: !!map {
? !!str "sky" : !!str "blue",
? !!str "sea" : !!str "green",
}
}

Ugly…

So what to use instead?

Use binary configuration files, which are easy to load and save for the application. And create a parser to parse the configuration file and save it to the binary format! In other words: serialize the usefull data from the parsed document and only parse again when it is required.

When you only parse stuff when it has changed by the user than it doesn’t really matter how long it takes to parse. Which can get rid of the really ugly stuff and let us just have a very loose kind of format without the ugly rules and regulations.

Leave a Reply

Your email address will not be published. Required fields are marked *