You could for example use `opt puuid`, or if no UUID is the same as an empty one, you could use `puuid <|>$ ""`.
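To make the difference between the two options concrete, here is a minimal sketch. It assumes a hypothetical `puuid` that parses a `UUID: ...` line (the real parser from the question isn't shown), FParsec loaded from NuGet in an F# script, and a recent FParsec release, where as far as I know the `<|>$` operator is spelled `<|>%`.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical stand-in for the poster's puuid: "UUID: <text>" up to the end of the line.
let puuid : Parser<string, unit> = pstring "UUID: " >>. restOfLine true

// opt wraps the result in an option: Some uuid on success,
// None if puuid fails without consuming input.
let optionalUuid : Parser<string option, unit> = opt puuid

// <|>% returns a default value instead of wrapping the result.
let uuidOrEmpty : Parser<string, unit> = puuid <|>% ""

// Small helper for the examples: run a parser and return its result or throw.
let parseWith p input =
    match run p input with
    | Success(result, _, _) -> result
    | Failure(message, _, _) -> failwith message

printfn "%A" (parseWith optionalUuid "UUID: 1234-abcd\n")
printfn "%A" (parseWith optionalUuid "Node-path: trunk\n")
printfn "%A" (parseWith uuidOrEmpty "Node-path: trunk\n")
```

With `opt` the caller has to handle the `None` case explicitly; with the default-value form a missing UUID and an empty UUID become indistinguishable, which is only fine if that is what you want.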

I don't know the SVN dump format, but if it's easy to skip over the binary blobs, you could extract the blobs as strings and then pipe them through a decoder.

FParsec is optimized for text-based input, so it won't be an optimal fit for binary data. If you decide to use something else I'd suggest writing a simple recursive descent parser by hand. Most binary formats are easy enough for that and in particular don't need any sophisticated error recovery.
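As a rough illustration of the hand-written approach, the sketch below reads `Key: value` header lines straight from a byte stream and then pulls out a binary blob of a declared size. The record layout (headers, a blank line, then `Content-length` bytes of payload) and all function names here are assumptions made for the example, not a description of the full svn-dump format.

```fsharp
open System
open System.IO
open System.Text

/// Reads one '\n'-terminated line byte by byte, so nothing is read ahead.
let readLine (s: Stream) : string option =
    let buf = ResizeArray<byte>()
    let rec loop () =
        match s.ReadByte() with
        | -1 -> if buf.Count = 0 then None else Some(Encoding.UTF8.GetString(buf.ToArray()))
        | 10 -> Some(Encoding.UTF8.GetString(buf.ToArray()))
        | b -> buf.Add(byte b); loop ()
    loop ()

/// Reads "Key: value" lines until a blank line or the end of the stream.
let rec readHeaders (s: Stream) acc =
    match readLine s with
    | None | Some "" -> List.rev acc
    | Some line ->
        match line.IndexOf(": ", StringComparison.Ordinal) with
        | -1 -> failwithf "Malformed header line: %s" line
        | i -> readHeaders s ((line.Substring(0, i), line.Substring(i + 2)) :: acc)

/// Reads exactly n bytes, looping because Stream.Read may return fewer.
let readBlob (s: Stream) (n: int) : byte[] =
    let data = Array.zeroCreate<byte> n
    let rec fill offset =
        if offset < n then
            let got = s.Read(data, offset, n - offset)
            if got = 0 then failwith "Unexpected end of stream"
            fill (offset + got)
    fill 0
    data

/// One record: headers, then a payload whose size is given by Content-length (assumed).
let readRecord (s: Stream) =
    let headers = readHeaders s []
    let length =
        headers
        |> List.tryFind (fun (k, _) -> k = "Content-length")
        |> Option.map (snd >> int)
        |> Option.defaultValue 0
    headers, readBlob s length

// Usage: text headers followed by three raw bytes that are not valid text.
let sample =
    Array.append
        (Encoding.ASCII.GetBytes "Node-path: a.bin\nContent-length: 3\n\n")
        [| 0uy; 255uy; 7uy |]
let headers, blob = readRecord (new MemoryStream(sample))
printfn "%A %A" headers blob
```

Reading a byte at a time is slow on an unbuffered file, so in practice you would probably wrap the file in a `BufferedStream`; the point is only that the parser and the blob reader share one stream position.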

By on 4/5/2009 12:53 AM ()

You could for example use `opt puuid`, or if no UUID is the same as an empty one, you could use `puuid <|>$ ""`.

Nice! Didn't think to look for 'opt' when 'optional' and 'option' turned out to be the wrong thing.

let pformat_header : Parser<_> =
    parse {
        let! v = pversion_string
        let! u = opt puuid
        return { version=v; uuid=u }
    }

WRT binary data in revision records, it should be possible to pluck it out of the underlying byte stream directly, when necessary. Can you recommend a call on CharStream that would allow access to this without messing up the internal buffers or pointers?

Thanks much,

James

By on 4/5/2009 11:23 AM ()

Switching between reading from the CharStream and the underlying byte stream at arbitrary points is not supported because the CharStream reads the byte stream blockwise (for performance reasons) and switching would require a lot of seeking and updating bookkeeping information.
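The same effect is easy to see with the standard `StreamReader`, which also reads its underlying stream in blocks. The sketch below is only an analogy (it does not use CharStream at all): after reading a single line, the byte stream's position has already moved past the part that was logically consumed.

```fsharp
open System.IO
open System.Text

// A text header followed by a small payload, all in one byte stream.
let bytes = Encoding.ASCII.GetBytes("Content-length: 4\n\nDATA")
let stream = new MemoryStream(bytes)
let reader = new StreamReader(stream, Encoding.ASCII)

// Logically this consumes only the first line and its newline...
let firstLine = reader.ReadLine()
let logicalOffset = int64 (firstLine.Length + 1)

// ...but the reader filled its buffer with a whole block, so the
// underlying stream position is already further along.
printfn "line = %s, logical offset = %d, stream.Position = %d"
    firstLine logicalOffset stream.Position
```

Reading the payload from `stream` at this point would start at the wrong place, which is the bookkeeping problem described above.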

Honestly, if I were you I'd probably just hand-write a small special purpose parser. The svn-dump format seems easy enough.

By on 4/5/2009 4:06 PM ()

Honestly, if I were you I'd probably just hand-write a small special purpose parser. The svn-dump format seems easy enough.

True, but then I don't get to learn FParsec :-) I'll give a recursive-descent parser a go, thanks.

-- James

By on 4/5/2009 6:03 PM ()

Thanks again, the recursive descent parser worked out quite nicely.

I've got to comment that F# is just a whole lot of fun to work with! Perhaps the best part was writing the transform function for revision records. The cool thing is that revision records can be represented as a sequence, and that sequence can be replaced in the record with another sequence that lazily maps the original via a transform:

type SvnDumpFile = {
[...]
    member x.Transform( f:(Revision->Revision) ) =
        { x with
            revisions = x.revisions |> Seq.map f
            }
[...]

let main() =
[...]
    let transformed = (parse_dumpfile rdr).Transform( filter )
    transformed.Serialize( writer )
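For anyone who wants to see the laziness in action, here is a self-contained sketch of the same idea. The `Revision` fields, the counter, and the sample data are made up for the illustration; only the shape of `Transform` follows the snippet above.

```fsharp
// Hypothetical, simplified record types for the illustration.
type Revision = { number: int; log: string }

type SvnDumpFile =
    { version: int
      revisions: seq<Revision> }
    member x.Transform(f: Revision -> Revision) =
        // Seq.map is lazy: f runs only when the sequence is enumerated.
        { x with revisions = x.revisions |> Seq.map f }

// Counts how many revisions the source sequence has actually produced.
let parsedCount = ref 0

let dump =
    { version = 2
      revisions =
        seq {
            for n in 1 .. 3 do
                parsedCount.Value <- parsedCount.Value + 1
                yield { number = n; log = sprintf "r%d" n }
        } }

let redacted = dump.Transform(fun r -> { r with log = "(redacted)" })
printfn "after Transform: %d revisions produced" parsedCount.Value

let result = redacted.revisions |> Seq.toList
printfn "after Seq.toList: %d revisions produced" parsedCount.Value
```

Nothing is produced until the list is built, so a large dump can be streamed from reader to writer one revision at a time. The flip side is that each enumeration re-runs the source, so a sequence backed by a reader should only be walked once.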

I also love banana clips (active patterns), which made parsing regexes and tokens a breeze. Type extensions are pretty cool, too:

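open System.IO
open System.Text.RegularExpressions

// Type extension: ReadLine() returns null at end of input; expose that as an option.
type System.IO.TextReader with
    member x.ReadLineOption() =
        let line = x.ReadLine()
        if line = null then None else Some(line)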


let (|Regex|_|) s v =
    let opt = RegexOptions.None
    let re = new Regex(s,opt)
    let m = re.Match(v)
    if m.Success then
        Some(m.Groups)
    else
        None

let (|AcceptHeader|_|) hdr line =
    match line with
    | Some(line) ->
        let re = new Regex( sprintf "%s: (.*)" hdr )
        let m = re.Match(line)
        if m.Success
        then Some(m.Groups.[1].Value)
        else None
    | None -> None

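let (|ExpectHeader|_|) hdr line =
    match line with
    | AcceptHeader hdr value -> Some(value)
    // `Line` and `failwithunexpected` are helpers not shown in this excerpt.
    | Line line ->
        failwithf "Expected '%s: <value>', found %s" hdr line
    // Catch-all for every input not matched above.
    | _ -> failwithunexpected()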


let rversion_string (rdr:TextReader) =
    match rdr.ReadLineOption() with
    | ExpectHeader "SVN-fs-dump-format-version" value ->
        rnewline rdr |> ignore
        Some(System.Int32.Parse( value ))
    | _ -> None

let ruuid (rdr:TextReader) =
    match rdr.ReadLineOption() with
    | AcceptHeader "UUID" value ->
        rnewline rdr |> ignore
        Some(value)
    | _ -> None
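For readers who haven't used parameterized partial active patterns before, here is a small standalone sketch of how a regex pattern like the one above reads at the call site. This variant returns the captured groups as a list of strings so they can be destructured directly in the match; the `describe` function and the sample lines are invented for the example.

```fsharp
open System.Text.RegularExpressions

// Partial active pattern: matches when the regex does, yielding the captured
// groups (without group 0, the whole match) as a list of strings.
let (|Regex|_|) (pattern: string) (input: string) =
    let m = Regex.Match(input, pattern)
    if m.Success then Some(List.tail [ for g in m.Groups -> g.Value ]) else None

// The pattern string is an argument; the list pattern after it binds the captures.
let describe (line: string) =
    match line with
    | Regex @"^SVN-fs-dump-format-version: (\d+)$" [ v ] -> sprintf "version %d" (int v)
    | Regex @"^UUID: (.+)$" [ u ] -> sprintf "uuid %s" u
    | _ -> "unrecognized"

printfn "%s" (describe "SVN-fs-dump-format-version: 2")
printfn "%s" (describe "UUID: 1234-abcd")
printfn "%s" (describe "something else")
```

Each case is tried in order, so the first regex that matches wins and the wildcard handles everything else.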
By on 4/9/2009 9:48 PM ()