
Hi, I am trying to write a parser in fsyacc.
These are my tokens:
%token <System.Int32> NUM
%token <System.String> STRING
%token <System.String> KEYWORD

And I would like to know whether it is possible to write rules that pattern match not only on the constructor name but also on its value, like this:

tm:
tmAtom EOF { $1 }

tmAtom: KEYWORD("let") { Let }

It seems it only works on constructor names, like this:

tmAtom: KEYWORD { Let }

.. but that would mean I would have to define one constructor for each keyword, which is what I do not want. So is there a way to do pattern matching in yacc on the constructor value as well?

Thanks in advance
Ales Sturala

Hi Ales,

As far as I know, what you're asking is not possible.

However, I'd like you to reconsider your options.

- Either, as you suggest, you make a separate token for each keyword. Actually I think this should be your default choice. It ensures that the lexer and parser detect possible errors, and it makes for the most efficient parser (I assume you're going to have to deal with the separate keywords eventually anyway, so you might as well do it in the parser).

- Or you let the lexer make one KEYWORD token that carries the keyword with it as a string. However, if you make that choice, you more or less have to accept that the string is carried over to your actual abstract syntax tree unchanged (i.e. as a string), and you're going to have to deal with it manually. I'd only take this option if there are a great many keywords that change often.
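
As a rough sketch of how the two options differ in an fsyacc file (the token names LET and IN and the AST constructors Let and Keyword are hypothetical, chosen only for illustration; the two variants are alternatives, not parts of one grammar):

```fsyacc
/* Option 1: one token per keyword, so a rule can name the keyword directly. */
%token LET IN

tmAtom: LET { Let }

/* Option 2: a single token carrying the keyword text.
   The rule cannot inspect the string; it can only pass $1 along. */
%token <System.String> KEYWORD

tmAtom: KEYWORD { Keyword $1 }
```

With the second variant, telling "let" apart from any other keyword has to happen in F# code after parsing, for example by matching on the string stored in the hypothetical Keyword node.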

If you're interested, I've just completed a blog series about fslex and fsyacc: [link:fortysix-and-two.blogspot.com]

In that series I took the second option above for the ID token, and the first for all the other tokens.

Other resources on fslex/fsyacc with beginners' examples (so it looks less like a shameless plug :) )

[link:blogs.msdn.com]

[link:www.strangelights.com]

By Kurt on 5/11/2008 10:41 AM (permalink)

The problem with the second choice is that there is some precedence involved, so I can't treat all the keywords in one rule. Let's look at this short sample:

// Syntax for arithmetics (e.g "5*(3+4)" )

S = A + S
  | A - S
  | A

A = T * A
  | T / A
  | T

T = ( S )
  | num

Now let's say there is a token constructor OPERATOR with one string parameter that holds the actual operator. Only two operators (+, -) can appear in the first rule and the other two (*, /) in the second, and this is why I was looking for some kind of pattern matching on constructor values. In this simple example I can of course make one token constructor for each operator, but even in a simple language with 30 keywords and 15 operators, that would be really a lot of constructors.

By Andy on 5/11/2008 11:08 AM (permalink)

It is exactly (one of) the point(s) of a parser to solve such problems of precedence - problems you'll need to solve anyway... It looks like you're just trying to shift the problem to a later time. I wouldn't let the number of different tokens worry you.
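
One common way to keep a large keyword set manageable is a lookup table in the lexer, so that each extra keyword costs one table entry rather than one lexer rule. The following is only a sketch: the module name Parser, the token names LET, IN, IF, THEN, ELSE and ID, and a `lexeme` helper returning the matched text are all assumptions, and `Map.ofList` is the name used in current F# versions.

```fslex
{
// Assumed: the fsyacc-generated module is called Parser and defines
// the tokens LET, IN, IF, THEN, ELSE and ID of string.
open Parser

// One entry per keyword; the grammar still sees distinct tokens.
let keywords =
    Map.ofList
        [ "let",  LET
          "in",   IN
          "if",   IF
          "then", THEN
          "else", ELSE ]
}

let ident = ['a'-'z' 'A'-'Z']+

rule token = parse
| ident { // Assumed: lexeme returns the text matched by the rule.
          let text = lexeme lexbuf
          match Map.tryFind text keywords with
          | Some tok -> tok        // a known keyword
          | None     -> ID text }  // anything else is an identifier
| eof   { EOF }
```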

Kurt

By Kurt on 5/11/2008 2:31 PM (permalink)

Actually, matching on constructor parameters as well would solve the situation very cleanly, like this:

tmS: tmA OPERATOR("+") tmS { Plus($1, $3) }
   | tmA OPERATOR("-") tmS { Minus($1, $3) }
   | tmA

tmA: tmT OPERATOR("*") tmA { Mul($1, $3) }
   | tmT OPERATOR("/") tmA { Div($1, $3) }
   | tmT

tmT: SPECIAL( "(" ) tm SPECIAL( ")" ) { $2 }
   | NUM { Num($1) }

.. where OPERATOR, SPECIAL and NUM are token constructors and Plus, Minus, Mul, Div and Num are already AST constructors.

I am asking because I know that in Parsec it is possible to write rules on constructor parameters. Anyway, you are right, there is no problem with using one constructor for each token, it just won't be so sexy :]
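
For comparison, a version of the grammar above with one token per operator might look roughly like this. It is a sketch under assumptions: the token names PLUS, MINUS, TIMES, DIV, LPAREN and RPAREN, the module name Ast and the type name Ast.Term are hypothetical; only the AST constructors and NUM come from the thread.

```fsyacc
%{
// Assumed: a module Ast defining a type Term with the cases
// Plus, Minus, Mul, Div and Num.
open Ast
%}

%token <System.Int32> NUM
%token PLUS MINUS TIMES DIV LPAREN RPAREN EOF

%start tm
%type <Ast.Term> tm

%%

tm:  tmS EOF            { $1 }

tmS: tmA PLUS tmS       { Plus($1, $3) }
   | tmA MINUS tmS      { Minus($1, $3) }
   | tmA                { $1 }

tmA: tmT TIMES tmA      { Mul($1, $3) }
   | tmT DIV tmA        { Div($1, $3) }
   | tmT                { $1 }

tmT: LPAREN tmS RPAREN  { $2 }
   | NUM                { Num($1) }
```

The rule shapes are the same as in the OPERATOR("+") version; only the token declarations grow. Like the grammar it mirrors, this one is right-recursive, so an input such as 8 - 3 - 2 is grouped as 8 - (3 - 2).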

By Andy on 5/12/2008 12:46 PM (permalink)

Hmm, you may save a few token definitions, but with your idea you would have to define the operator symbols twice, once in your lexer and once in your parser. Your OPERATOR token (and similarly SPECIAL and NUM) would be defined in the lexer file as:

| ("+" | "-" | "*" | "/") { OPERATOR (lexeme lexbuf) }

would it not?

Now what if you want to change this so that the operator * is written as "."? You would have to change both files and keep them consistent. This re-tangles the carefully separated concerns of lexing and parsing...
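
With one token per operator, by contrast, the concrete symbols appear only in the lexer. A minimal sketch, assuming a generated module named Parser and hypothetical token names PLUS, MINUS, TIMES, DIV, LPAREN and RPAREN:

```fslex
{
// Assumed: the fsyacc-generated module is called Parser.
open Parser
}

rule token = parse
| [' ' '\t']+  { token lexbuf }   // skip whitespace
| ['0'-'9']+   { NUM (System.Int32.Parse (lexeme lexbuf)) }
| "+"          { PLUS }
| "-"          { MINUS }
// Writing multiplication as "." means changing only the string on the
// next line; the grammar keeps referring to TIMES.
| "*"          { TIMES }
| "/"          { DIV }
| "("          { LPAREN }
| ")"          { RPAREN }
| eof          { EOF }
```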

Comparing with Parsec is a bit unfair imo; it's a different approach to parsing altogether.

Kurt

By Kurt on 5/12/2008 2:14 PM (permalink)
