This is the mail archive of the
gdb@sources.redhat.com
mailing list for the GDB project.
Re: GDB/MI Output Syntax
Bob Rossi <bob@brasko.net> wrote:
> so far, it seems to parse everything I throw at it. However, I haven't
> tested it too much because I am building an intermediate representation.
> This is what I'll use from the front end.
How can we hook this up with the gdb test suite?
I've got a corpus of gdb.log files. Someone could write a Perl
script to pick out pieces and invoke your parser as an external program.
It might help to add a few more rules at the top:
session -> input_output_pair_list
input_output_pair_list -> epsilon | input_output_pair_list input output
input -> ...
The sticky part is that dejagnu mixes its own output into this.
Ick.
Getting into the grammar itself:
Comma separators and lists are kludgy. In these rules:
result_record -> opt_token "^" result_class result_list_prime
result_list_prime -> result_list | epsilon
result_list -> result_list "," result | "," result
The actual gdb output for a result_record could be either:
105^done
103^done,BreakPointTable={...}
It looks a little weird to me to parse the first comma as part
of result_list_prime. How about:
result_record -> opt_token "^" result_class
result_record -> opt_token "^" result_class "," result_list
result_list -> result | result_list "," result
That simplifies tuple and list as well:
tuple -> "{}" | "{" result_list "}"
list -> "[]" | "[" value_list "]" | "[" result_list "]"
That also simplifies the rules, because they won't need any special code
to construct a list for: "[" result result_list "]".
This also gets rid of the foo_prime constructions, which can cause
trouble. The oob_record_list_prime rule caused the original
shift/reduce conflict, because the parser had to decide whether to
reduce an epsilon to oob_record_list_prime or keep shifting and reduce
later to the non-epsilon form of oob_record_list.
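To make the result_record / result_list rewrite concrete, here is a rough
hand-written sketch in Python. It is not the bison grammar under
discussion, only an illustration that the comma belongs to result_record
and that tuple and list can both reuse result_list. The function names,
the regular expressions for tokens, variables and c-strings, and the
value_list rule (value | value_list "," value) are my assumptions.

```python
import re

# Assumed token shapes; these are illustrative, not from the posted grammar.
RECORD_HEAD = re.compile(r'(\d*)\^([a-z]+)')       # opt_token "^" result_class
VARIABLE = re.compile(r'[A-Za-z_][A-Za-z0-9_-]*')
CSTRING = re.compile(r'"((?:[^"\\]|\\.)*)"')       # escapes are kept raw


def expect(s, pos, ch):
    """Require the single character ch at pos and step over it."""
    if s[pos:pos + 1] != ch:
        raise ValueError("expected %r at offset %d" % (ch, pos))
    return pos + 1


def parse_result_record(line):
    # result_record -> opt_token "^" result_class
    # result_record -> opt_token "^" result_class "," result_list
    m = RECORD_HEAD.match(line)
    if m is None:
        raise ValueError("not a result record: %r" % line)
    token, result_class, pos = m.group(1) or None, m.group(2), m.end()
    results = []
    if pos < len(line):
        pos = expect(line, pos, ",")   # the comma is consumed here
        results, pos = parse_result_list(line, pos)
    if pos != len(line):
        raise ValueError("trailing text at offset %d" % pos)
    return token, result_class, results


def parse_result_list(s, pos):
    # result_list -> result | result_list "," result
    result, pos = parse_result(s, pos)
    results = [result]
    while s[pos:pos + 1] == ",":
        result, pos = parse_result(s, pos + 1)
        results.append(result)
    return results, pos


def parse_result(s, pos):
    # result -> variable "=" value
    m = VARIABLE.match(s, pos)
    if m is None:
        raise ValueError("expected a variable at offset %d" % pos)
    pos = expect(s, m.end(), "=")
    value, pos = parse_value(s, pos)
    return (m.group(0), value), pos


def parse_value_list(s, pos):
    # value_list -> value | value_list "," value   (assumed rule)
    value, pos = parse_value(s, pos)
    values = [value]
    while s[pos:pos + 1] == ",":
        value, pos = parse_value(s, pos + 1)
        values.append(value)
    return values, pos


def parse_value(s, pos):
    ch = s[pos:pos + 1]
    if ch == '"':
        m = CSTRING.match(s, pos)
        if m is None:
            raise ValueError("unterminated string at offset %d" % pos)
        return m.group(1), m.end()
    if ch == "{":
        # tuple -> "{}" | "{" result_list "}"
        if s[pos + 1:pos + 2] == "}":
            return {}, pos + 2
        results, pos = parse_result_list(s, pos + 1)
        return dict(results), expect(s, pos, "}")
    if ch == "[":
        # list -> "[]" | "[" value_list "]" | "[" result_list "]"
        if s[pos + 1:pos + 2] == "]":
            return [], pos + 2
        if VARIABLE.match(s, pos + 1):   # a result starts with a variable
            items, pos = parse_result_list(s, pos + 1)
        else:
            items, pos = parse_value_list(s, pos + 1)
        return items, expect(s, pos, "]")
    raise ValueError("expected a value at offset %d" % pos)
```

Note that no function needs a special case for a leading comma, and a
dangling comma such as 105^done, is rejected rather than silently
accepted.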
Style point: there are a lot of rules like:
foo_list -> foo_list foo | epsilon
bar_list -> bar_list bar | bar
I think this is more readable:
foo_list -> epsilon | foo_list foo
bar_list -> bar | bar_list bar
Another nit: how is the grammar even working with:
nl -> CR | CR_LF
Doesn't this have to be:
nl -> LF | CR | CR LF
Or is the lexer quietly defining CR_LF to include "\n"?
For coding purposes it would be more efficient to make NL
a single token and have the lexer recognize all three forms.
For doco purposes it might be better to explicitly make nl
a non-terminal and show the LF, CR, CR LF terminals.
Either way is okay, but I'd like to have one or the other:
either have the lexer do all the work, or have the lexer be
stupid simple and have the grammar do the work.
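For the "lexer does all the work" option, a minimal sketch in Python
might look like the following. The token names NL and TEXT and the
tokenize function are made up for illustration; the real lexer will
differ. The only point is that one pattern covers LF, CR and CR LF, with
the two-character form tried first so it is not split into two newlines.

```python
import re

# Try "\r\n" first so CR LF becomes one NL token, not CR followed by LF.
NL = re.compile(r'\r\n|\r|\n')


def tokenize(text):
    """Split text into ("TEXT", s) and ("NL", s) tokens."""
    tokens = []
    pos = 0
    for m in NL.finditer(text):
        if m.start() > pos:
            tokens.append(("TEXT", text[pos:m.start()]))
        tokens.append(("NL", m.group(0)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("TEXT", text[pos:]))
    return tokens
```

With this in place the grammar only ever sees NL and never has to know
which line ending the target produced.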
Michael