Hi Ales,

As far as I know, what you're asking is not possible.

However, I'd like you to reconsider your options.

- Either, as you suggest, you make a separate token for each keyword. I think this should be your default choice. It ensures that the lexer and parser detect possible errors, and it makes for the most efficient parser. (I assume you'll have to deal with the separate keywords eventually anyway, so you might as well do it in the parser.)

- Or you let the lexer produce a single KEYWORD token that carries the keyword with it as a string. If you make that choice, though, you more or less have to accept that the string is carried over to your actual abstract syntax tree unchanged (i.e. as a string), and you'll have to deal with it manually. I'd only take this option if there are a great many keywords that change often.
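In fsyacc terms, the two options might be declared roughly as below. This is only a sketch: the token names (PLUS, MINUS, TIMES, DIVIDE, NUM) are placeholders I made up for illustration, not anything taken from your grammar.

```fsharp
/* Option 1: one token per keyword/operator.
   The parser can tell them apart in its rules. */
%token PLUS MINUS TIMES DIVIDE
%token <int> NUM

/* Option 2: a single token that carries the text.
   The parser only ever sees "a KEYWORD"; the string
   has to be inspected later, in your own code. */
%token <string> KEYWORD
```

With option 1 a grammar rule can mention PLUS or TIMES directly; with option 2 every rule can only mention KEYWORD, whatever string it holds.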

If you're interested, I've just completed a blog series about fslex and fsyacc: [link:fortysix-and-two.blogspot.com]

In that series, I took the second option above for the ID token and the first for all the other tokens.

Other resources on fslex/fsyacc with beginners' examples (so it looks less like a shameless plug :) ):

[link:blogs.msdn.com]

[link:www.strangelights.com]

[link:www.strangelights.com]

By on 5/11/2008 10:41 AM ()

The problem with the second choice is that there is some precedence involved, so I can't treat all the keywords in one rule. Let's look at this short sample:

// Syntax for arithmetic (e.g. "5*(3+4)")

S = A + S
  | A - S
  | A

A = T * A
  | T / A
  | T

T = ( S )
  | num

Now let's say there is a token constructor OPERATOR with one string parameter that holds the actual operator. Only two operators (+, -) can appear in the first rule and the other two (*, /) in the second, which is why I was looking for some kind of pattern matching on constructor values. In this simple example I can of course make one token constructor for each operator, but in a simple language with 30 keywords and 15 operators, that would be really a lot of constructors.

By on 5/11/2008 11:08 AM ()

It is exactly (one of) the point(s) of a parser to solve such problems of precedence, problems you'll need to solve anyway... It looks like you're just trying to shift the problem to a later time. I wouldn't let the number of different tokens worry you.
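As one illustration of the parser taking care of precedence, here is a sketch of an fsyacc fragment that uses precedence declarations instead of layered rules. It assumes one token per operator, and the token names and the AST constructors (Add, Sub, Mul, Div, Num) are hypothetical. The header, %start and %type lines are left out.

```fsharp
%token PLUS MINUS TIMES DIVIDE LPAREN RPAREN
%token <int> NUM

/* Lowest precedence first; both levels are left-associative. */
%left PLUS MINUS
%left TIMES DIVIDE

%%

expr: expr PLUS expr     { Add($1, $3) }
    | expr MINUS expr    { Sub($1, $3) }
    | expr TIMES expr    { Mul($1, $3) }
    | expr DIVIDE expr   { Div($1, $3) }
    | LPAREN expr RPAREN { $2 }
    | NUM                { Num($1) }
```

The precedence lines only work because each operator is its own token; a single OPERATOR token could not be split across two precedence levels.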

Kurt

By on 5/11/2008 2:31 PM ()

Actually, matching on constructor parameters as well would solve the situation very clearly, like this:

tmS: tmA OPERATOR("+") tmS { Plus($1, $3) }
   | tmA OPERATOR("-") tmS { Minus($1, $3) }
   | tmA { $1 }

tmA: tmT OPERATOR("*") tmA { Mul($1, $3) }
   | tmT OPERATOR("/") tmA { Div($1, $3) }
   | tmT { $1 }

tmT: SPECIAL( "(" ) tmS SPECIAL( ")" ) { $2 }
   | NUM { Num($1) }

... where OPERATOR, SPECIAL and NUM are token constructors, and Plus, Minus, Mul, Div and Num are AST constructors.

I am asking because I know that in Parsec it is possible to write rules that match on constructor parameters. Anyway, you are right, there is no problem with using one constructor for each token; it just won't be so sexy :]
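For comparison, the same grammar with one constructor per token might look like the fragment below. It is a sketch only: the token names (PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN) are assumed, and the header, %start and %type lines are omitted.

```fsharp
%token PLUS MINUS TIMES DIVIDE LPAREN RPAREN
%token <int> NUM

%%

/* Lowest precedence: + and - */
tmS: tmA PLUS tmS      { Plus($1, $3) }
   | tmA MINUS tmS     { Minus($1, $3) }
   | tmA               { $1 }

/* Higher precedence: * and / */
tmA: tmT TIMES tmA     { Mul($1, $3) }
   | tmT DIVIDE tmA    { Div($1, $3) }
   | tmT               { $1 }

/* Parenthesised expressions and numbers */
tmT: LPAREN tmS RPAREN { $2 }
   | NUM               { Num($1) }
```

As in the sample above, the rules are right-recursive, so "-" and "/" group to the right here.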

By on 5/12/2008 12:46 PM ()

Hmm, you may save a few token definitions, but with your idea you would have to define the operator symbols twice, once in your lexer and once in your parser. Your token OPERATOR (and similarly SPECIAL and NUM) would be defined in the lexer file as:

| ("+" | "-" | "*" | "/") { OPERATOR (lexeme lexbuf) }

would it not?

Now what if you want to change this so that the operator * is written as "." ? You would have to change both files and keep them consistent. This re-tangles the carefully separated concerns of lexing and parsing...
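With one token per operator, by contrast, the lexer could look something like this sketch (the rule name and the token names PLUS, MINUS, TIMES and DIVIDE are assumed here, not taken from your files):

```fsharp
rule token = parse
| "+"  { PLUS }
| "-"  { MINUS }
(* To write multiplication as a dot, only the string on the next line changes. *)
| "*"  { TIMES }
| "/"  { DIVIDE }
```

The parser refers only to TIMES, so it never needs to know which character the lexer accepts for it.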

Comparing with Parsec is a bit unfair imo; it's a different approach to parsing altogether.

Kurt

By on 5/12/2008 2:14 PM ()