<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Intrepid Blog &#187; parsing</title>
	<atom:link href="http://blog.affien.com/archives/tag/parsing/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.affien.com</link>
	<description>A few thoughts</description>
	<lastBuildDate>Mon, 23 Jan 2012 08:47:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Don&#8217;t rely on parsing</title>
		<link>http://blog.affien.com/archives/2005/04/13/dont-rely-on-parsing/</link>
		<comments>http://blog.affien.com/archives/2005/04/13/dont-rely-on-parsing/#comments</comments>
		<pubDate>Wed, 13 Apr 2005 21:09:47 +0000</pubDate>
		<dc:creator>Bas Westerbaan</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[parsing]]></category>

		<guid isPermaLink="false">http://blog.w-nz.com/archives/2005/04/13/dont-rely-on-parsing/</guid>
		<description><![CDATA[Most applications store their settings (especially in * [...]]]></description>
			<content:encoded><![CDATA[<p>Most applications (especially on *nix) store their settings in text configuration files. These files need to be parsed every time the application starts.</p>
<p>Parsing means dividing a piece of data, usually while streaming through it, into parts the program understands. In practice this comes down to reading the text file character by character and deciding what to do with each one. This is usually done by maintaining a state that holds the data collected so far. For slightly more complicated formats it even means keeping a stack of states, with complicated actions to run whenever a state is left.</p>
<p>The problems:</p>
<ul>
<li>Parsing is slow, <em>very slow</em></li>
<li>Formats required to be parsed contain overhead, <em>a lot of overhead</em></li>
</ul>
<p>But text formats certainly have advantages:</p>
<ul>
<li>Humans can easily edit them; you don&#8217;t need to rely on configuration tools</li>
<li>A text format is (usually) more extensible by nature (adding one new field to the average programmer&#8217;s binary format would break it)</li>
</ul>
<p>There are attempts to improve speed by standardizing the format. A standard format leaves fewer oddities to expect, which ultimately makes parsing slightly faster. This comes at the cost of how easily the file can be edited.</p>
<p>A good example is XML. XML is damned ugly. XML is too strict. And <em>XML still takes a hell of a lot of time to parse</em>.</p>
<p><a href="http://www.yaml.org/">YAML</a> looked like a decent alternative: easy to edit and nice to look at. But then I encountered this:</p>
<blockquote><p><code>%YAML 1.1<br />
---<br />
!!map {<br />
  ? !!str "sequence"<br />
  : !!seq [<br />
    !!str "one", !!str "two"<br />
  ],<br />
  ? !!str "mapping"<br />
  : !!map {<br />
    ? !!str "sky" : !!str "blue",<br />
    ? !!str "sea" : !!str "green",<br />
  }<br />
}</code></p></blockquote>
<p>Ugly&#8230;</p>
<p>So what to use instead?</p>
<p>Use binary configuration files, which are easy for the application to load and save. And create a parser that parses the text configuration file and saves it to the binary format! In other words: serialize the useful data from the parsed document and only parse again when required.</p>
<p>When you only parse a file after the user has changed it, it doesn&#8217;t really matter how long parsing takes. That lets us get rid of the really ugly stuff and just have a very loose kind of format without the ugly rules and regulations.</p>
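<p>A minimal C# sketch of this idea, under some loud assumptions: the class <code>CachedConfig</code>, its method names, the loose "key = value" text format and the binary layout are all made up for illustration, and "changed by the user" is approximated by comparing file timestamps, which can miss an edit made within the file system's timestamp resolution.</p>

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: none of these names come from the post.
public static class CachedConfig
{
    // Loads settings from textPath, re-parsing only when the binary cache
    // at cachePath is missing or older than the text file.
    public static Dictionary<string, string> Load(string textPath, string cachePath)
    {
        if (File.Exists(cachePath) &&
            File.GetLastWriteTimeUtc(cachePath) >= File.GetLastWriteTimeUtc(textPath))
        {
            return ReadCache(cachePath);
        }
        Dictionary<string, string> settings = ParseText(textPath);
        WriteCache(cachePath, settings);
        return settings;
    }

    // Assumed loose text format: "key = value" lines; a line starting with ';' is a comment.
    public static Dictionary<string, string> ParseText(string path)
    {
        var settings = new Dictionary<string, string>();
        foreach (string line in File.ReadAllLines(path))
        {
            string trimmed = line.Trim();
            int eq = trimmed.IndexOf('=');
            if (trimmed.StartsWith(";", StringComparison.Ordinal) || eq <= 0)
            {
                continue; // Comment, blank line, or line without a key.
            }
            settings[trimmed.Substring(0, eq).Trim()] = trimmed.Substring(eq + 1).Trim();
        }
        return settings;
    }

    // Assumed binary layout: entry count, then length-prefixed key and value strings.
    public static void WriteCache(string path, Dictionary<string, string> settings)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(settings.Count);
            foreach (KeyValuePair<string, string> pair in settings)
            {
                writer.Write(pair.Key);
                writer.Write(pair.Value);
            }
        }
    }

    public static Dictionary<string, string> ReadCache(string path)
    {
        var settings = new Dictionary<string, string>();
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            int count = reader.ReadInt32();
            for (int i = 0; i < count; i++)
            {
                string key = reader.ReadString();
                settings[key] = reader.ReadString();
            }
        }
        return settings;
    }
}
```

<p>An application would call <code>CachedConfig.Load</code> at startup with the path of the text file and the path of its cache. Only the first start after an edit pays for parsing; every other start just reads the length-prefixed strings back.</p>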
]]></content:encoded>
			<wfw:commentRss>http://blog.affien.com/archives/2005/04/13/dont-rely-on-parsing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OO Stated Stackbased Parsing</title>
		<link>http://blog.affien.com/archives/2004/12/12/oo-stated-stackbased-parsing/</link>
		<comments>http://blog.affien.com/archives/2004/12/12/oo-stated-stackbased-parsing/#comments</comments>
		<pubDate>Sun, 12 Dec 2004 12:25:29 +0000</pubDate>
		<dc:creator>Bas Westerbaan</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[parsing]]></category>
		<category><![CDATA[stack]]></category>

		<guid isPermaLink="false">http://blog.w-nz.com/archives/2004/12/12/oo-stated-stackbased-parsing/</guid>
		<description><![CDATA[Every parsable format (like INI, XML) consists of c [...]]]></description>
			<content:encoded><![CDATA[<p>Every parsable format (like INI or XML) consists of certain areas. When parsing an INI file, you can be parsing a section name at one moment and a comment at the next. Each of these areas corresponds to a <strong>parse state</strong>. In every state you expect different input, and you gather a different kind of information.</p>
<p>While parsing in a certain state you may find that the state has changed (for example, the parser found a new XML node in an XML file). The old state is then pushed onto the <strong>state stack</strong>.</p>
<p>In certain circumstances you need to be able to fall back to a previous state. This can happen when you are parsing what appears to be a new section name and suddenly hit a comment character instead of the character that ends a section name. In that case you need to fall back on the previous section you were parsing. Once you have successfully parsed the section name, however, you want the old section state removed from the stack (and its data emitted).<br />
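<p>To make the push and pop concrete, here is a small sketch for a heavily simplified XML-like input (no attributes, no self-closing tags). The class <code>NestingSketch</code> and its method are hypothetical, and the stack holds node names rather than full parse states, so treat it as an illustration of the stack discipline only.</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: the names here are illustrative assumptions.
public static class NestingSketch
{
    // Returns one "path: text" line for every piece of text in the input,
    // where path lists the nodes that were open when the text was read.
    public static List<string> TextWithPaths(string xml)
    {
        var open = new Stack<string>(); // One entry per node we are inside.
        var result = new List<string>();
        int i = 0;
        while (i < xml.Length)
        {
            if (xml[i] == '<')
            {
                int close = xml.IndexOf('>', i);
                if (close < 0) throw new FormatException("Unterminated tag.");
                string name = xml.Substring(i + 1, close - i - 1);
                if (name.StartsWith("/", StringComparison.Ordinal))
                {
                    // Leaving a node: fall back to the state below it.
                    if (open.Count == 0 || open.Pop() != name.Substring(1))
                        throw new FormatException("Mismatched closing tag.");
                }
                else
                {
                    // Entering a node: remember it on top of the old state.
                    open.Push(name);
                }
                i = close + 1;
            }
            else
            {
                int next = xml.IndexOf('<', i);
                if (next < 0) next = xml.Length;
                string text = xml.Substring(i, next - i).Trim();
                if (text != "")
                {
                    string[] path = open.ToArray(); // Top of the stack comes first.
                    Array.Reverse(path);
                    result.Add(string.Join("/", path) + ": " + text);
                }
                i = next;
            }
        }
        return result;
    }
}
```

<p>For the input <code>&lt;a&gt;x&lt;b&gt;y&lt;/b&gt;z&lt;/a&gt;</code> this yields <code>a: x</code>, <code>a/b: y</code> and <code>a: z</code>: after the inner node closes, the parser is back in the state it was in before the node opened.</p>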
<span id="more-8"></span><br />
An example of a <strong>state based</strong> parser is this INI parser I wrote in C#:</p>
<blockquote><pre>
        public override void Load(Stream stream)
        {
            StreamReader sr = new StreamReader(stream);
            Stack&lt;ParseState&gt; stack = new Stack&lt;ParseState&gt;();
            stack.Push(ParseState.NoSection);
            StringBuilder sb = null;
            IniSection section = null;
            IniEntry entry = null;
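            bool atEnd = false;
            while (!atEnd)
            {
                int next = sr.Read();
                // Read() returns -1 at end of stream; treat that as a final
                // newline so a pending entry or value is still flushed.
                atEnd = next == -1;
                char c = atEnd ? '\n' : (char)next;
                ParseState s = stack.Peek();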
                if (s == ParseState.NoSection)
                {
                    if (c == 32 || c == 9 || c == 10 || c == 13)
                    {
                        // Do nothing.
                    }
                    else if (c == '[')
                    {
                        stack.Push(ParseState.SectionName);
                        sb = new StringBuilder();
                    }
                    else
                    {
                        stack.Push(ParseState.TillNewLine);
                    }
                }
                else if (s == ParseState.TillNewLine)
                {
                    if (c == 13 || c == 10)
                    {
                        stack.Pop();
                    }
                }
                else if (s == ParseState.SectionName)
                {
                    if (c == 13 || c == 10)
                    {
                        stack.Pop();
                    }
                    else if (c == ']')
                    {
                        stack.Clear();
                        stack.Push(ParseState.EntryName);
                        stack.Push(ParseState.TillNewLine);
                        _Sections.Add(section = new IniSection(sb.ToString()));
                        sb = new StringBuilder();
                    }
                    else if (c == ';')
                    {
                        stack.Pop();
                        stack.Push(ParseState.TillNewLine);
                    }
                    else
                    {
                        sb.Append( c );
                    }
                }
                else if (s == ParseState.EntryName)
                {
                    if (c == 13 || c == 10)
                    {
                        string result = sb.ToString().Trim();
                        if (result != "")
                        {
                            section.Entries.Add(new IniEntry(result));
                        }
                        sb = new StringBuilder();
                    }
                    else if (c == ';')
                    {
                        string result = sb.ToString().Trim();
                        if (result != "")
                        {
                            section.Entries.Add(new IniEntry(result));
                        }
                        stack.Push(ParseState.TillNewLine);
                        sb = new StringBuilder();
                    }
                    else if (c == '=')
                    {
                        section.Entries.Add(entry = new IniEntry(sb.ToString().Trim()));
                        stack.Push(ParseState.EntryValue);
                        sb = new StringBuilder();
                    }
                    else if (c == '[')
                    {
                        if (sb.ToString().Trim() == "")
                        {
                            stack.Push(ParseState.SectionName);
                            sb = new StringBuilder();
                        }
                        else
                        {
                            sb.Append( c );
                        }
                    }
                    else
                    {
                        sb.Append( c );
                    }
                }
                else if (s == ParseState.EntryValue)
                {
                    if (c == ';')
                    {
                        entry.Values.Add(new TextIniValue(sb.ToString().Trim()));
                        sb = new StringBuilder();
                        stack.Pop(); // Back to EntryName, which is still below on the stack.
                        stack.Push(ParseState.TillNewLine);
                    }
                    else if (c == 10 || c == 13)
                    {
                        entry.Values.Add(new TextIniValue(sb.ToString().Trim()));

                        sb = new StringBuilder();
                        stack.Pop(); // Back to EntryName, which is still below on the stack.
                    }
                    else if (c == ',')
                    {
                        entry.Values.Add(new TextIniValue(sb.ToString().Trim()));
                        sb = new StringBuilder();
                    }
                    else
                    {
                        sb.Append( c );
                    }
                }
            }
        }
</pre>
</blockquote>
<p>I am now investigating whether a more object-oriented implementation would be feasible. Even more importantly, I want to know whether this technique makes it possible to write a state description file that the parser reads, so that the parser produces just the data the description file says should be captured.</p>
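<p>One possible shape for the object-oriented variant, as a rough sketch and not a finished design: every parse state becomes a class with a <code>Handle</code> method, and the loop only dispatches to whatever is on top of the stack. All type names (<code>IniContext</code>, <code>State</code>, <code>OoIniParser</code> and so on) are assumptions of mine. It is also simpler than the <code>Load</code> method above: one value per entry, plain dictionaries in place of <code>IniSection</code> and <code>IniEntry</code>, and entries without a value are dropped.</p>

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: all type names below are illustrative assumptions.
public sealed class IniContext
{
    public readonly Stack<State> States = new Stack<State>();
    public readonly StringBuilder Buffer = new StringBuilder();
    public readonly Dictionary<string, Dictionary<string, string>> Sections =
        new Dictionary<string, Dictionary<string, string>>();
    public string Section = "";
    public string Key = "";

    // Returns the trimmed buffer contents and empties the buffer.
    public string TakeBuffer()
    {
        string text = Buffer.ToString().Trim();
        Buffer.Clear();
        return text;
    }

    // Makes name the current section, creating it when needed.
    public void OpenSection(string name)
    {
        Section = name;
        if (!Sections.ContainsKey(name)) { Sections[name] = new Dictionary<string, string>(); }
    }
}

// Each parse state is an object that handles one character at a time.
public abstract class State
{
    public abstract void Handle(char c, IniContext ctx);
    protected static bool IsNewLine(char c) { return c == '\n' || c == '\r'; }
}

public sealed class EntryName : State
{
    public override void Handle(char c, IniContext ctx)
    {
        bool blank = ctx.Buffer.ToString().Trim() == "";
        if (c == '[' && blank) { ctx.Buffer.Clear(); ctx.States.Push(new SectionName()); }
        else if (c == ';') { ctx.Buffer.Clear(); ctx.States.Push(new Comment()); }
        else if (c == '=') { ctx.Key = ctx.TakeBuffer(); ctx.States.Push(new EntryValue()); }
        else if (IsNewLine(c)) { ctx.Buffer.Clear(); } // A line without '=' is dropped.
        else { ctx.Buffer.Append(c); }
    }
}

public sealed class SectionName : State
{
    public override void Handle(char c, IniContext ctx)
    {
        if (c == ']')
        {
            ctx.OpenSection(ctx.TakeBuffer());
            ctx.States.Pop();
            ctx.States.Push(new Comment()); // Skip the rest of the line.
        }
        else if (IsNewLine(c)) { ctx.Buffer.Clear(); ctx.States.Pop(); } // Fall back.
        else { ctx.Buffer.Append(c); }
    }
}

public sealed class Comment : State
{
    public override void Handle(char c, IniContext ctx) { if (IsNewLine(c)) { ctx.States.Pop(); } }
}

public sealed class EntryValue : State
{
    public override void Handle(char c, IniContext ctx)
    {
        if (IsNewLine(c) || c == ';')
        {
            ctx.OpenSection(ctx.Section); // Ensures the section exists.
            ctx.Sections[ctx.Section][ctx.Key] = ctx.TakeBuffer();
            ctx.States.Pop();
            if (c == ';') { ctx.States.Push(new Comment()); }
        }
        else { ctx.Buffer.Append(c); }
    }
}

public static class OoIniParser
{
    public static Dictionary<string, Dictionary<string, string>> Parse(string text)
    {
        var ctx = new IniContext();
        ctx.States.Push(new EntryName());
        // The extra newline flushes the last line.
        foreach (char c in text + "\n") { ctx.States.Peek().Handle(c, ctx); }
        return ctx.Sections;
    }
}
```

<p>Because every transition now lives in its own small class, the same transitions could in principle be read from a description file and turned into generic state objects, which is the direction I want to explore.</p>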
]]></content:encoded>
			<wfw:commentRss>http://blog.affien.com/archives/2004/12/12/oo-stated-stackbased-parsing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

