The LexBuffer<'pos,'char> interface does not expose buffer_scan_length, so "putting back" the current regexp match is not an option. This is in contrast to OCaml, where a lexer action can manipulate lexbuf.lex_curr_pos for exactly this purpose.
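For comparison, the OCaml-side technique being referred to looks roughly like this (a minimal sketch using OCaml's standard Lexing module; `put_back` is an illustrative name, not a library function):

```ocaml
(* Rewind the last n characters of the current match by moving
   lex_curr_pos backwards; the next rule invocation re-scans them. *)
let put_back (lexbuf : Lexing.lexbuf) (n : int) =
  lexbuf.Lexing.lex_curr_pos <- lexbuf.Lexing.lex_curr_pos - n
```

This works in ocamllex because lex_curr_pos is a public mutable field of Lexing.lexbuf; F#'s LexBuffer keeps the corresponding state private.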

Is there any chance that a future version of the F# lexer exposes an interface for "putting back" (part of) the current regexp match in a lexer action?

Is there maybe any UGLY HACK (tm), other than changing the fslib, which would allow me to access a hidden field of LexBuffer?

Stephan

By on 9/5/2007 2:04 PM ()

The following code shows how to use reflection to manipulate the protected fields in lexbuf in order to put back chars of the current match:

{
// (...)
open Lexing
open System.Reflection

// Reflect over the non-public fields of the lexbuf. This is an UGLY HACK:
// the field names are implementation details and may change between releases.
let scanLengthField =
    typeof<Lexing.lexbuf>.GetField("_buffer_scan_length", BindingFlags.Instance ||| BindingFlags.NonPublic)

let lexemeLengthField =
    typeof<Lexing.lexbuf>.GetField("_lexemeLength", BindingFlags.Instance ||| BindingFlags.NonPublic)

// Rewind the last n characters of the current match.
let putBack lb n =
    scanLengthField.SetValue(lb, (scanLengthField.GetValue(lb) :?> int) - n)
    lexemeLengthField.SetValue(lb, (lexemeLengthField.GetValue(lb) :?> int) - n)
// (...)
}

let text = // (...)
let markup = ['{' '}']

rule token = parse
| text markup { putBack lexbuf 1; TEXT(lexeme lexbuf) } // put back the markup char and return text 
// (...)
By on 9/20/2007 2:12 PM ()

Hi Stephan,

Would it be better to have two different lexer rules for this? You can create separate rules using the "and" keyword and then call them from your rule's action code. The idea is that as soon as you find the first character of your in-between bit, you hop into a new rule that gathers up the characters and then returns them (the lexer supplied with F# uses this technique for parsing comments and strings). I haven't got time to put together a working sample, but a lexer like this would, in pseudocode, look something like:

rule token = parse
 | "<starttag>" { STARTTAG }
 | _            { middleBit (lexeme lexbuf) lexbuf }
and middleBit x = parse
 | _            { middleBit (x + lexeme lexbuf) lexbuf }
 | "<endtag>"   { TEXT x } // accumulated middle bit; the end tag itself still needs handling

Hope that helps,
Rob

By on 9/6/2007 5:50 AM ()

Hi Robert,

thanks for your reply.

Actually I'm already using separate rules and I could need the "put back match" feature for the following kind of setup:

let regex1 = // a more or less complicated regex identifying a markup token
(...)
let non_markup = // any character that cannot be the
                 // first char of regex1,...,regexn

rule token = parse
| regex1 { MARKUP1() }
| regex2 { MARKUP2() }
(...)
| _      { text (lexeme lexbuf) lexbuf }

and text str = parse
| non_markup* { TEXT(str + lexeme lexbuf) }
| _           { (* put the matched character back on lexbuf *) token lexbuf }

My problem is that my "endtag" is not context-independent and may, in a particular context, just be normal text. The above lexer would allow me to parse the text relatively efficiently.

Stephan

By on 9/6/2007 7:47 AM ()

I don't know how to do this in fslex / fsyacc, but in the past I've done something like the following:

1) on the 'entry' token, switch the lexer to a 'text' state.
2) in the text state, recognize any run of characters up to (but not including) F(end), and return it as a token (call it PARTIALTEXT or something like that, for instance), where F(end) is the first character of the 'exit' token.
3) in the text state, recognize strings that match F(end)...L(end) (for instance '<' ... '>') where ... is anything valid for an 'exit' token, not necessarily the correct exit token. If the text matches the exit token for the current 'entry' token, then exit the 'text' state when returning this token. If it doesn't match, return the text as a PARTIALTEXT token.
4) in the text state, return invalid end tokens as PARTIALTEXT tokens (i.e. '<' followed by something not valid for an end token).
5) in the parser, make a rule that combines a sequence of PARTIALTEXT tokens into a single TEXT (or something like that).
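Step 5 might look roughly like this as an fsyacc rule (a sketch only; the names are illustrative, and it assumes PARTIALTEXT carries a string payload):

```
text:
  | PARTIALTEXT        { $1 }        /* a single fragment */
  | text PARTIALTEXT   { $1 + $2 }   /* left-recursive accumulation */
```

Left recursion keeps the parser's stack shallow while concatenating the fragments in order.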

Hope this helps,
Kelly Leahy
Milliman, Inc.

By on 9/17/2007 12:04 PM ()