NOTE: When I started this, my intent was to use "deserializing" as an example of parsing, building on FParsec. So far the "language" is so trivial that it doesn't need a full-blown parser, so I don't use FParsec yet.

The lexer simply trims & counts the indentation at start of line, then splits tokens at blank spaces.
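To make the lexing step concrete, here is a minimal sketch in Python (not the actual F# code). The names `lex_line` and `lex` are made up for illustration, and it assumes indentation is made of spaces only and that blank lines are skipped.

```python
from typing import List, Tuple

def lex_line(line: str) -> Tuple[int, List[str]]:
    """Return (indentation depth, tokens) for a single line."""
    stripped = line.lstrip(" ")
    indent = len(line) - len(stripped)   # number of leading spaces
    return indent, stripped.split()      # split on runs of whitespace

def lex(text: str) -> List[Tuple[int, List[str]]]:
    """Lex every non-blank line (skipping blank lines is an assumption)."""
    return [lex_line(line) for line in text.splitlines() if line.strip()]
```

For example, `lex_line("    point 3 4")` gives an indentation of 4 and the tokens `point`, `3`, `4`.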

The parser simply constructs a tree of lines based on the amount of indentation, with a list of strings for the tokens on a line.
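One way to sketch that tree construction, in Python rather than F#, and assuming each line has already been lexed into an (indent, tokens) pair. `Node` and `build_tree` are hypothetical names; the stack of open ancestors is my choice of technique here, not necessarily how the real parser does it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    tokens: List[str]                               # tokens on this line
    children: List["Node"] = field(default_factory=list)

def build_tree(lines: List[Tuple[int, List[str]]]) -> List[Node]:
    """Nest lines under the nearest preceding line with smaller indentation."""
    roots: List[Node] = []
    stack: List[Tuple[int, Node]] = []              # (indent, node) of open ancestors
    for indent, tokens in lines:
        node = Node(tokens)
        # Close every ancestor that is indented at least as much as this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            roots.append(node)
        stack.append((indent, node))
    return roots
```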

The type of the expected object is used as a schema to consume tokens. Each child starts on a new line. Currently there is slightly messy logic to handle the first child being on the same line as the parent/head token. I expect to replace that with a tree-transformer that takes the original lexed tree and puts it into canonical form (any tokens after the head token on the head line will be moved to a separate child line). The schema-driven code can then be simplified, as it will only see this canonical representation. [The XML version will be refactored to do XML -> canonical tree, and then share the schema-driven phase.]
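Since the tree-transformer doesn't exist yet, the following is only a guess at its shape, sketched in Python. `Node` and `canonicalize` are hypothetical names, and the sketch assumes that a "head line" is any line that has indented children; leaf lines are left untouched.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    tokens: Tuple[str, ...]                 # tokens on this line
    children: Tuple["Node", ...] = ()       # indented child lines

def canonicalize(node: Node) -> Node:
    """Move tokens after the head token onto a separate first child line."""
    kids = tuple(canonicalize(c) for c in node.children)
    if kids and len(node.tokens) > 1:
        inline_child = Node(node.tokens[1:])            # the inline first child
        return Node(node.tokens[:1], (inline_child,) + kids)
    return Node(node.tokens, kids)                      # already canonical
```

A tree that is already canonical comes back unchanged, so running the transform twice is harmless.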

Conversely, "serializing" will be changed to generate a canonical tree (each child being a child node); this tree will then be used to drive XML output, or will be transformed to the more compact first-child-inline form to drive Terse output.

Compared with doing this in C#, the most interesting change (besides using pattern matching) is keeping "SerialState" in an immutable object. The serialization state has two parts: (1) a boolean flag: at the first token of the head line versus in the middle of that line [the "messiness" due to working with inline children]; and (2) the current indentation depth. Whenever the state changes, a new immutable SerialState object is returned. While this could be done in C#, having to write "mutable" on any declaration that is allowed to change value helped me shift my thinking to a non-mutable solution. This in turn let me keep several related states separate, so that I passed the correct state along to children or to helper functions -- and didn't inadvertently contaminate the parent's state. This is the clarity I was hoping for from a functional language. And it didn't require warping the program flow into some unfamiliar form -- it still reads very much like it would if written in C#.
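To illustrate the idea (in Python, not the original F#), here is a rough sketch of such a state object. The two fields follow the description above; the field names, the helper functions `indented`, `after_token` and `emit`, and the two-space indent are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class SerialState:
    at_line_start: bool = True   # at first token of the line vs. mid-line
    depth: int = 0               # current indentation depth

def indented(state: SerialState) -> SerialState:
    """State for a child line: one level deeper, at the start of a line."""
    return replace(state, depth=state.depth + 1, at_line_start=True)

def after_token(state: SerialState) -> SerialState:
    """State once a token has been written on the current line."""
    return replace(state, at_line_start=False)

def emit(token: str, state: SerialState) -> Tuple[str, SerialState]:
    """Return the text for one token plus the NEW state; the input is untouched."""
    prefix = "  " * state.depth if state.at_line_start else " "
    return prefix + token, after_token(state)

parent = SerialState()
text, mid_line = emit("shape", parent)   # parent itself is not modified
child = indented(parent)                 # derived from the clean parent state
```

Because every helper returns a fresh object, `parent` still describes "start of line, depth 0" after the calls, which is the non-contamination property described above.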

I use local mutables in some functions as a convenience, but these don't escape scope; they are logically equivalent to a non-mutable inductive sequence (x[i], x[i+1] = f(x[i]), ...), so they are functionally sound.
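A small Python sketch of that equivalence, with made-up names `iterate_loop` and `iterate_fold`: the first rebinds a local variable in a loop, the second computes the same sequence x[i+1] = f(x[i]) as a fold with no rebinding at all.

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")

def iterate_loop(f: Callable[[T], T], x0: T, n: int) -> T:
    """Apply f n times using a local variable that never escapes the function."""
    x = x0
    for _ in range(n):
        x = f(x)          # x[i+1] = f(x[i])
    return x

def iterate_fold(f: Callable[[T], T], x0: T, n: int) -> T:
    """Same computation written as a fold, with no variable rebinding."""
    return reduce(lambda x, _: f(x), range(n), x0)
```

From the caller's side the two are indistinguishable: both are pure functions of `f`, `x0` and `n`.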

By on 4/3/2008 1:55 PM ()