OO Stated Stackbased Parsing

Every parsable format (like INI, XML) consists out of certain area’s. You can be parsing the section name at one moment, or be parsing a comment when parsing an INI. These certain area’s where you can parse result in a parse state. In every state you expect something else, and you gather other kinds of information.

When you are parsing in a certain state you can find that the state has changed (the parser found a new xml node in a xml file), then the old state is pushed on the state stack.

In certain circumstances you need to have the ability to fall back to a previous state, this can happen when you are parsing a apparently new section name and suddenly there is a comment character instead of a sectionname end character. In this case you need to be able to fall back on the previous section you were parsing. Although when you successfully have parsed the sectionname you want the old sectionstate removed from the stack (and the data of it emited).

An example of a state based parser is this INI parser I wrote in C#:

        public override void Load(Stream stream)
        {
            StreamReader sr = new StreamReader(stream);
            Stack<parsestate> stack = new Stack<parsestate>();
            stack.Push(ParseState.NoSection);
            StringBuilder sb = null;
            IniSection section = null;
            IniEntry entry = null;
            while (true)
            {
                char c = (char)sr.Read();
                ParseState s = stack.Peek();
                if (s == ParseState.NoSection)
                {
                    if (c == 32 || c == 9 || c == 10 || c == 13)
                    {
                        // To nothing.
                    }
                    else if (c == '[')
                    {
                        stack.Push(ParseState.SectionName);
                        sb = new StringBuilder();
                    }
                    else
                    {
                        stack.Push(ParseState.TillNewLine);
                    }
                }
                else if (s == ParseState.TillNewLine)
                {
                    if (c == 13 || c == 10)
                    {
                        stack.Pop();
                    }
                }
                else if (s == ParseState.SectionName)
                {
                    if (c == 13 || c == 10)
                    {
                        stack.Pop();
                    }
                    else if (c == ']')
                    {
                        stack.Clear();
                        stack.Push(ParseState.EntryName);
                        stack.Push(ParseState.TillNewLine);
                        _Sections.Add(section = new IniSection(sb.ToString()));
                        sb = new StringBuilder();
                    }
                    else if (c == ';')
                    {
                        stack.Pop();
                        stack.Push(ParseState.TillNewLine);
                    }
                    else
                    {
                        sb.Append( c );
                    }
                }
                else if (s == ParseState.EntryName)
                {
                    if (c == 13 || c == 10)
                    {
                        string result = sb.ToString().Trim();
                        if (result != "")
                        {
                            section.Entries.Add(new IniEntry(result));
                        }
                        sb = new StringBuilder();
                    }
                    else if (c == ';')
                    {
                        string result = sb.ToString().Trim();
                        if (result != "")
                        {
                            section.Entries.Add(new IniEntry(result));
                        }
                        stack.Push(ParseState.TillNewLine);
                        sb = new StringBuilder();
                    }
                    else if (c == '=')
                    {
                        section.Entries.Add(entry = new IniEntry(sb.ToString().Trim()));
                        stack.Push(ParseState.EntryValue);
                        sb = new StringBuilder();
                    }
                    else if (c == '[')
                    {
                        if (sb.ToString().Trim() == "")
                        {
                            stack.Push(ParseState.SectionName);
                            sb = new StringBuilder();
                        }
                        else
                        {
                            sb.Append( c );
                        }
                    }
                    else
                    {
                        sb.Append( c );
                    }
                }
                else if (s == ParseState.EntryValue)
                {
                    if (c == ';')
                    {
                        entry.Values.Add(new TextIniValue(sb.ToString().Trim()));
                        sb = new StringBuilder();
                        stack.Pop();
                        stack.Push(ParseState.EntryName);
                        stack.Push(ParseState.TillNewLine);
                    }
                    else if (c == 10 || c == 13)
                    {
                        entry.Values.Add(new TextIniValue(sb.ToString().Trim()));

                        sb = new StringBuilder();
                        stack.Pop();
                        stack.Push(ParseState.EntryName);
                    }
                    else if (c == ',')
                    {
                        entry.Values.Add(new TextIniValue(sb.ToString().Trim()));
                        sb = new StringBuilder();
                    }
                    else
                    {
                        sb.Append( c );
                    }
                }
            }
        }

I am now investigating whether creating a more Object Orientated implementation would be feasible, and even more importantly whether with this technique it would be possible to write a state description file which the parser reads, and just produces the data which is meant to be captured as described in the state description file.

Leave a Reply

Your email address will not be published. Required fields are marked *