This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Re: semantic error: cannot expand unknown type


On Wed, 2006-05-10 at 21:04 -0400, Frank Ch. Eigler wrote:
> Hi, David -

... stuff deleted ...

> > 2) Should this be scrapped and the parser somehow upgraded to notice
> > I was using a "return" instead of "next"?
> 
> There are a couple of different shortcomings there:

> - The parser does not recognize keywords as such.  Keywords are just
>   tok_identifier instances with magic names.  It would be nice if a
>   list of keywords was given to the scanner, and a new tok_keyword tag
>   was used to represent them.  Then this "if (cond) next" would have
>   been a parse error.  This is probably a worthwhile, somewhat
>   mechanical change throughout parser.h/cxx.  At least our test suite
>   should be enough for the "ok" case.  Want to give it a try?
> 
> - FChE

I've been trying to wrap my head around the parser so I could give this
a try.  I added the "tok_keyword" tok_type enumeration and added code to
lexer::scan() to recognize keywords.
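
A minimal sketch of that scanner-side check, assuming simplified stand-ins for the token types in parse.h. The helper names is_keyword and classify are hypothetical and do not appear in the patch, which uses a chain of string compares inside lexer::scan() instead of a set:

```cpp
#include <set>
#include <string>

// Hypothetical, simplified stand-ins for the real definitions in parse.h.
enum token_type
{
  tok_junk, tok_identifier, tok_operator, tok_string, tok_number,
  tok_embedded, tok_keyword
};

struct token
{
  token_type type;
  std::string content;
};

// Returns true if 's' is one of the sixteen names that the
// lexer::scan() hunk of the patch treats as keywords.
bool
is_keyword (const std::string& s)
{
  static const std::set<std::string> keywords = {
    "probe", "global", "function", "if", "else", "for", "foreach", "in",
    "return", "delete", "while", "break", "continue", "next", "string",
    "long"
  };
  return keywords.count (s) != 0;
}

// Retags an identifier token as a keyword when its text matches.
// Tokens of any other type (strings, numbers, ...) are left alone.
void
classify (token& t)
{
  if (t.type == tok_identifier && is_keyword (t.content))
    t.type = tok_keyword;
}
```

Keeping the list in one place would make it easy to hand the scanner a keyword table, as suggested in the quoted message, rather than repeating the names at each call site.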

There are several cases where a single token serves more than one purpose:

The token "function" has two different uses:

- as a keyword, as in 'function foo()'
- as an identifier, as in 'probe kernel.function("sys_read")'

The token "return" has two different uses:

- as a keyword, as in 'function foo() { return; }'
- as an identifier, as in 'probe kernel.function("sys_read").return'

The above two seem reasonable and I've worked around them (not very
elegantly).
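
One way to picture the workaround for these two cases is a small predicate that the probe point parser consults. This is only a sketch under simplified assumptions: the types are stand-ins for those in parse.h, and is_probe_component is a hypothetical name. The patch itself inlines an equivalent condition in parser::parse_probe_point() and, for now, also lets "string" through.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the real definitions in parse.h.
enum token_type
{
  tok_junk, tok_identifier, tok_operator, tok_string, tok_number,
  tok_embedded, tok_keyword
};

struct token
{
  token_type type;
  std::string content;
};

// Decides whether a token may appear as one component of a probe point,
// e.g. the "function" and "return" in kernel.function("sys_read").return.
bool
is_probe_component (const token& t)
{
  if (t.type == tok_identifier)
    return true;
  if (t.type == tok_operator && t.content == "*")
    return true;
  // Only these two keywords double as probe point component names here.
  return t.type == tok_keyword
         && (t.content == "function" || t.content == "return");
}
```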


Then there are the odd cases, like:

The token "string" has (at least) three different uses:

- as a keyword, as in 'function foo(a:string)'
- as an identifier naming a function, as in
  'function string:string(num:long)' (as is done in conversions.stp)
- as an identifier naming a variable, as in 'string = "abc"'

Note that there are other keywords that could be used similarly.  The
keyword "long" could be used as a function name or variable name.  The
keywords "if", "while", "foreach", etc. can be used as function names
(of course they will never get called, but still).  The keyword "global"
can be used as a variable name (and is in the testsuite that way).  The
keyword "probe" could be used as a variable name.

Also note that the parser currently lets you do silly things like:

    function foo(while:long)
    {
        printf("foo: %d\n", while);
    }

which fails only later, during compilation, even though the parser could
catch it.

My suggestion would be to "reserve" keywords, so that using keywords as
function names, parameter names, or variable names isn't allowed.  Note
that this would require changing conversions.stp (and any script that
calls the "string" function).

So, do we want to "reserve" keywords?
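
To make the proposal concrete, here is a rough sketch of what a "reserved keywords" check could look like. It assumes simplified stand-ins for the token types, uses std::runtime_error in place of the parser's parse_error, and the function name expect_identifier is hypothetical:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-ins for the real definitions in parse.h.
enum token_type
{
  tok_junk, tok_identifier, tok_operator, tok_string, tok_number,
  tok_embedded, tok_keyword
};

struct token
{
  token_type type;
  std::string content;
};

// With keywords reserved, a function, parameter, or variable name must be
// a plain identifier.  A keyword in that position is rejected right away,
// with a message that names the offending word.
const std::string&
expect_identifier (const token& t)
{
  if (t.type == tok_keyword)
    throw std::runtime_error ("keyword '" + t.content
                              + "' cannot be used as an identifier");
  if (t.type != tok_identifier)
    throw std::runtime_error ("expected identifier");
  return t.content;
}
```

With a check like this, 'function foo(while:long)' would be rejected at parse time, and none of the per-keyword exceptions for "string" and "global" would be needed.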


Here is my current code in patch form with enough workarounds for the
issues above to make it through the testsuite with the same results as
without the patches.  If we don't reserve keywords, more workarounds
will need to be added.

Also note that we could speed things up a bit by adding a new
enumeration field to "struct token", something like 'keyword_type', so
that the string compares happen only once per token (there was no point
in doing that until the main approach is accepted).
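
Roughly, the idea would look like the sketch below. Everything here is an assumption for illustration: the enumerator names, the extra "keyword" field, and the helpers lookup_keyword and make_word_token are hypothetical, and only a few of the keywords are listed.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-in; the real enum lives in parse.h.
enum token_type { tok_identifier, tok_keyword };

// Assumed enumeration; only a handful of keywords are shown.
enum keyword_type { kw_none, kw_probe, kw_function, kw_return, kw_next };

struct token
{
  token_type type;
  std::string content;
  keyword_type keyword; // resolved once, when the token is scanned
};

// The single place where keyword text is compared against strings.
keyword_type
lookup_keyword (const std::string& s)
{
  static const std::map<std::string, keyword_type> table = {
    { "probe", kw_probe }, { "function", kw_function },
    { "return", kw_return }, { "next", kw_next }
  };
  std::map<std::string, keyword_type>::const_iterator it = table.find (s);
  return it == table.end () ? kw_none : it->second;
}

// Builds a token for a scanned word and caches its keyword kind.
token
make_word_token (const std::string& text)
{
  token t;
  t.content = text;
  t.keyword = lookup_keyword (text);
  t.type = (t.keyword == kw_none) ? tok_identifier : tok_keyword;
  return t;
}
```

The parser could then test 't->keyword == kw_next' rather than comparing 't->content' against "next" at every decision point.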

Finally, note that this doesn't actually solve my original problem of
using "return" instead of "next" in a probe, but it is a step in that
direction.

-- 
David Smith
dsmith@redhat.com
Red Hat, Inc.
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

Index: parse.cxx
===================================================================
RCS file: /cvs/systemtap/src/parse.cxx,v
retrieving revision 1.45
diff -u -p -r1.45 parse.cxx
--- parse.cxx	9 May 2006 12:55:57 -0000	1.45
+++ parse.cxx	12 May 2006 16:59:23 -0000
@@ -72,6 +72,7 @@ tt2str(token_type tt)
     case tok_string: return "string";
     case tok_number: return "number";
     case tok_embedded: return "embedded-code";
+    case tok_keyword: return "keyword";
     }
   return "unknown token";
 }
@@ -91,7 +92,7 @@ operator << (ostream& o, const token& t)
 {
   o << tt2str(t.type);
 
-  if (t.type != tok_embedded) // XXX: other types?
+  if (t.type != tok_embedded && t.type != tok_keyword) // XXX: other types?
     {
       o << " '";
       for (unsigned i=0; i<t.content.length(); i++)
@@ -353,7 +354,11 @@ const token* 
 parser::expect_unknown (token_type tt, string & target)
 {
   const token *t = next();
-  if (!(t && t->type == tt))
+  if (!(t && t->type == tt)
+      && !(t && tt == tok_identifier && t->type == tok_keyword
+	   && (t->content == "function" || t->content == "return"
+	       // "string" and "global" probably shouldn't be allowed here
+	       || t->content == "string" || t->content == "global")))
     throw parse_error ("expected " + tt2str(tt));
   target = t->content;
   return t;
@@ -505,6 +510,26 @@ lexer::scan ()
               n->content = arg;
             }
         }
+      else
+        {
+	  if (n->content    == "probe"
+	      || n->content == "global"
+	      || n->content == "function"
+	      || n->content == "if"
+	      || n->content == "else"
+	      || n->content == "for"
+	      || n->content == "foreach"
+	      || n->content == "in"
+	      || n->content == "return"
+	      || n->content == "delete"
+	      || n->content == "while"
+	      || n->content == "break"
+	      || n->content == "continue"
+	      || n->content == "next"
+	      || n->content == "string"
+	      || n->content == "long")
+	    n->type = tok_keyword;
+        }
 
       return n;
     }
@@ -725,11 +750,11 @@ parser::parse ()
 	    break;
 
           empty = false;
-	  if (t->type == tok_identifier && t->content == "probe")
+	  if (t->type == tok_keyword && t->content == "probe")
             parse_probe (f->probes, f->aliases);
-	  else if (t->type == tok_identifier && t->content == "global")
+	  else if (t->type == tok_keyword && t->content == "global")
 	    parse_global (f->globals);
-	  else if (t->type == tok_identifier && t->content == "function")
+	  else if (t->type == tok_keyword && t->content == "function")
             parse_functiondecl (f->functions);
           else if (t->type == tok_embedded)
             f->embeds.push_back (parse_embeddedcode ());
@@ -782,7 +807,7 @@ parser::parse_probe (std::vector<probe *
 		     std::vector<probe_alias *> & alias_ret)
 {
   const token* t0 = next ();
-  if (! (t0->type == tok_identifier && t0->content == "probe"))
+  if (! (t0->type == tok_keyword && t0->content == "probe"))
     throw parse_error ("expected 'probe'");
 
   vector<probe_point *> aliases;
@@ -926,29 +951,32 @@ parser::parse_statement ()
     }
   else if (t && t->type == tok_operator && t->content == "{")  
     return parse_stmt_block ();
-  else if (t && t->type == tok_identifier && t->content == "if")
+  else if (t && t->type == tok_keyword && t->content == "if")
     return parse_if_statement ();
-  else if (t && t->type == tok_identifier && t->content == "for")
+  else if (t && t->type == tok_keyword && t->content == "for")
     return parse_for_loop ();
-  else if (t && t->type == tok_identifier && t->content == "foreach")
+  else if (t && t->type == tok_keyword && t->content == "foreach")
     return parse_foreach_loop ();
-  else if (t && t->type == tok_identifier && t->content == "return")
+  else if (t && t->type == tok_keyword && t->content == "return")
     return parse_return_statement ();
-  else if (t && t->type == tok_identifier && t->content == "delete")
+  else if (t && t->type == tok_keyword && t->content == "delete")
     return parse_delete_statement ();
-  else if (t && t->type == tok_identifier && t->content == "while")
+  else if (t && t->type == tok_keyword && t->content == "while")
     return parse_while_loop ();
-  else if (t && t->type == tok_identifier && t->content == "break")
+  else if (t && t->type == tok_keyword && t->content == "break")
     return parse_break_statement ();
-  else if (t && t->type == tok_identifier && t->content == "continue")
+  else if (t && t->type == tok_keyword && t->content == "continue")
     return parse_continue_statement ();
-  else if (t && t->type == tok_identifier && t->content == "next")
+  else if (t && t->type == tok_keyword && t->content == "next")
     return parse_next_statement ();
   // XXX: "do/while" statement?
   else if (t && (t->type == tok_operator || // expressions are flexible
                  t->type == tok_identifier ||
                  t->type == tok_number ||
-                 t->type == tok_string))
+                 t->type == tok_string ||
+		 // "string" and "global" probably shouldn't be allowed here
+		 (t->type == tok_keyword
+		  && (t->content == "string" || t->content == "global"))))
     return parse_expr_statement ();
   // XXX: consider generally accepting tok_embedded here too
   else
@@ -960,7 +988,7 @@ void
 parser::parse_global (vector <vardecl*>& globals)
 {
   const token* t0 = next ();
-  if (! (t0->type == tok_identifier && t0->content == "global"))
+  if (! (t0->type == tok_keyword && t0->content == "global"))
     throw parse_error ("expected 'global'");
 
   while (1)
@@ -994,12 +1022,14 @@ void
 parser::parse_functiondecl (std::vector<functiondecl*>& functions)
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "function"))
+  if (! (t->type == tok_keyword && t->content == "function"))
     throw parse_error ("expected 'function'");
 
 
   t = next ();
-  if (! (t->type == tok_identifier))
+  if (! (t->type == tok_identifier)
+      && ! (t->type == tok_keyword
+	    && (t->content == "string" || t->content == "long")))
     throw parse_error ("expected identifier");
 
   for (unsigned i=0; i<functions.size(); i++)
@@ -1014,9 +1044,9 @@ parser::parse_functiondecl (std::vector<
   if (t->type == tok_operator && t->content == ":")
     {
       t = next ();
-      if (t->type == tok_identifier && t->content == "string")
+      if (t->type == tok_keyword && t->content == "string")
 	fd->type = pe_string;
-      else if (t->type == tok_identifier && t->content == "long")
+      else if (t->type == tok_keyword && t->content == "long")
 	fd->type = pe_long;
       else throw parse_error ("expected 'string' or 'long'");
 
@@ -1044,9 +1074,9 @@ parser::parse_functiondecl (std::vector<
       if (t->type == tok_operator && t->content == ":")
 	{
 	  t = next ();
-	  if (t->type == tok_identifier && t->content == "string")
+	  if (t->type == tok_keyword && t->content == "string")
 	    vd->type = pe_string;
-	  else if (t->type == tok_identifier && t->content == "long")
+	  else if (t->type == tok_keyword && t->content == "long")
 	    vd->type = pe_long;
 	  else throw parse_error ("expected 'string' or 'long'");
 	  
@@ -1078,8 +1108,12 @@ parser::parse_probe_point ()
   while (1)
     {
       const token* t = next ();
-      if (! (t->type == tok_identifier ||
-             (t->type == tok_operator && t->content == "*")))
+      if (! (t->type == tok_identifier
+	     || (t->type == tok_keyword
+		 && (t->content == "function" || t->content == "return"
+		     // "string" probably shouldn't be allowed here
+		     || t->content == "string"))
+	     || (t->type == tok_operator && t->content == "*")))
         throw parse_error ("expected identifier or '*'");
 
       if (pl->tok == 0) pl->tok = t;
@@ -1160,7 +1194,7 @@ if_statement*
 parser::parse_if_statement ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "if"))
+  if (! (t->type == tok_keyword && t->content == "if"))
     throw parse_error ("expected 'if'");
   if_statement* s = new if_statement;
   s->tok = t;
@@ -1178,7 +1212,7 @@ parser::parse_if_statement ()
   s->thenblock = parse_statement ();
 
   t = peek ();
-  if (t && t->type == tok_identifier && t->content == "else")
+  if (t && t->type == tok_keyword && t->content == "else")
     {
       next ();
       s->elseblock = parse_statement ();
@@ -1205,7 +1239,7 @@ return_statement*
 parser::parse_return_statement ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "return"))
+  if (! (t->type == tok_keyword && t->content == "return"))
     throw parse_error ("expected 'return'");
   return_statement* s = new return_statement;
   s->tok = t;
@@ -1218,7 +1252,7 @@ delete_statement*
 parser::parse_delete_statement ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "delete"))
+  if (! (t->type == tok_keyword && t->content == "delete"))
     throw parse_error ("expected 'delete'");
   delete_statement* s = new delete_statement;
   s->tok = t;
@@ -1231,7 +1265,7 @@ next_statement*
 parser::parse_next_statement ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "next"))
+  if (! (t->type == tok_keyword && t->content == "next"))
     throw parse_error ("expected 'next'");
   next_statement* s = new next_statement;
   s->tok = t;
@@ -1243,7 +1277,7 @@ break_statement*
 parser::parse_break_statement ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "break"))
+  if (! (t->type == tok_keyword && t->content == "break"))
     throw parse_error ("expected 'break'");
   break_statement* s = new break_statement;
   s->tok = t;
@@ -1255,7 +1289,7 @@ continue_statement*
 parser::parse_continue_statement ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "continue"))
+  if (! (t->type == tok_keyword && t->content == "continue"))
     throw parse_error ("expected 'continue'");
   continue_statement* s = new continue_statement;
   s->tok = t;
@@ -1267,7 +1301,7 @@ for_loop*
 parser::parse_for_loop ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "for"))
+  if (! (t->type == tok_keyword && t->content == "for"))
     throw parse_error ("expected 'for'");
   for_loop* s = new for_loop;
   s->tok = t;
@@ -1333,7 +1367,7 @@ for_loop*
 parser::parse_while_loop ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "while"))
+  if (! (t->type == tok_keyword && t->content == "while"))
     throw parse_error ("expected 'while'");
   for_loop* s = new for_loop;
   s->tok = t;
@@ -1364,7 +1398,7 @@ foreach_loop*
 parser::parse_foreach_loop ()
 {
   const token* t = next ();
-  if (! (t->type == tok_identifier && t->content == "foreach"))
+  if (! (t->type == tok_keyword && t->content == "foreach"))
     throw parse_error ("expected 'foreach'");
   foreach_loop* s = new foreach_loop;
   s->tok = t;
@@ -1426,7 +1460,7 @@ parser::parse_foreach_loop ()
     }
 
   t = next ();
-  if (! (t->type == tok_identifier && t->content == "in"))
+  if (! (t->type == tok_keyword && t->content == "in"))
     throw parse_error ("expected 'in'");
  
   s->base = parse_indexable();
@@ -1672,7 +1706,7 @@ parser::parse_array_in ()
     }
 
   t = peek ();
-  if (t && t->type == tok_identifier && t->content == "in")
+  if (t && t->type == tok_keyword && t->content == "in")
     {
       array_in *e = new array_in;
       e->tok = t;
@@ -1892,7 +1926,7 @@ parser::parse_value ()
         throw parse_error ("expected ')'");
       return e;
     }
-  else if (t->type == tok_identifier)
+  else if (t->type == tok_identifier || t->type == tok_keyword)
     return parse_symbol ();
   else
     return parse_literal ();
Index: parse.h
===================================================================
RCS file: /cvs/systemtap/src/parse.h,v
retrieving revision 1.20
diff -u -p -r1.20 parse.h
--- parse.h	9 May 2006 12:55:57 -0000	1.20
+++ parse.h	12 May 2006 16:59:33 -0000
@@ -29,8 +29,7 @@ std::ostream& operator << (std::ostream&
 enum token_type 
   {
     tok_junk, tok_identifier, tok_operator, tok_string, tok_number,
-    tok_embedded
-    // XXX: add tok_keyword throughout
+    tok_embedded, tok_keyword
   };
 
 
