This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: (PR11207) Macroprocessor discussion
- From: Serguei Makarov <smakarov at redhat dot com>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: systemtap at sourceware dot org
- Date: Thu, 14 Jun 2012 16:49:44 -0400 (EDT)
- Subject: Re: (PR11207) Macroprocessor discussion
Okay, another stab at reasoning this through. jistone raised the very good point on IRC that we may be considering a macroprocessor which works on already-tokenized data. This would be very different in some respects from the text-based proposal I'd been considering.
Anyhow, the following goals/issues would be necessary to consider for either approach:
- Source Coordinates - correctly preserved for the sake of error reporting.
- Documentation Generation - macros can be used to generate custom docstrings.
- Correct Handling of Brackets - if the preprocessor syntax uses brackets {} or parens (), these interact correctly with any brackets or parens inside the macro parameters:
  - by default, the preprocessor respects bracket nesting in the obvious way (brackets are expected to match)
  - the preprocessor knows about the possibility of brackets inside string literals and doesn't attempt to match them
  - there is some facility (e.g. quoting) for emitting non-matching parens from a macro
- Explicit Macro Invocation - so far we seem to be leaning away from the implicit macro invocation style that m4 (and cpp) use, where any identifier is a possible macro invocation. Instead, almost all preprocessor constructs, including macro invocations, would be prefixed with a special character such as '%'.
These are just some haphazard notes; I'll come back to them and organize more coherently very soon :)
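
To make the bracket-handling goal above concrete, here is a minimal Python sketch of splitting a macro invocation's arguments while respecting nesting and string literals. The function name `split_macro_args` and the choice of `()`, `{}`, `[]` as the tracked bracket pairs are assumptions for illustration, not part of any proposal in this thread; the quoting facility for unmatched brackets is not modelled.

```python
def split_macro_args(text, start):
    """Split the argument list whose opening '(' is at text[start].

    Returns (args, end), where end is the index just past the matching ')'.
    Commas inside nested brackets or string literals do not split arguments.
    """
    openers = {"(": ")", "{": "}", "[": "]"}
    closers = set(openers.values())
    if text[start] != "(":
        raise ValueError("expected '(' at start of argument list")
    stack = [")"]              # closers we are still waiting for
    args, current = [], []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Copy a string literal verbatim, honouring backslash escapes,
            # so brackets and commas inside it are never examined.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            if j >= len(text):
                raise ValueError("unterminated string literal")
            current.append(text[i:j + 1])
            i = j + 1
            continue
        if ch in openers:
            stack.append(openers[ch])
        elif ch in closers:
            if ch != stack[-1]:
                raise ValueError(f"mismatched {ch!r} at offset {i}")
            stack.pop()
            if not stack:      # this ')' closes the invocation itself
                args.append("".join(current).strip())
                return args, i + 1
        elif ch == "," and len(stack) == 1:
            # A top-level comma separates two arguments.
            args.append("".join(current).strip())
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    raise ValueError("unbalanced brackets in macro invocation")
```

Called on `%foo(a, g(b, c), ")", {x, y})` with `start` pointing at the first `(`, this yields four arguments; the `)` inside the string literal and the commas inside `g(...)` and `{...}` are left alone.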
# Token-Based Approach
Design Challenges
- Source Coordinates - almost trivial to solve due to the tokens being tagged appropriately.
- Documentation Generation - EITHER rig the lexer to retain comments and emit lexed output back as text, OR subsume kernel-doc into the systemtap lexer.
- Correct Handling of Brackets - mostly done for us by the lexer. We still have to handle bracket balancing, EITHER counting bracket depth (and introducing a special mechanism to emit unmatched brackets) OR using some distinct bracketing syntax such as %begin ... %end.
- Explicit Macro Invocation - consists mostly of the lexer recognizing an additional macro invocation token of the form %ident.
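
As a rough illustration of the two easiest points above (coordinates carried on tokens, and a dedicated `%ident` token), here is a toy Python lexer. It is not the systemtap lexer: the `Token` class, the token kinds, and the regular expression are all assumptions made up for this sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str   # "macro", "ident", "string", "number", or "op"
    text: str
    line: int   # 1-based source line
    col: int    # 1-based source column


# Order matters: %ident must be tried before the single-character fallback.
TOKEN_RE = re.compile(r"""
    (?P<macro>%[A-Za-z_]\w*)
  | (?P<ident>[A-Za-z_$]\w*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+)
  | (?P<op>\S)
""", re.VERBOSE)


def lex(source):
    """Return a flat token list; every token remembers where it came from."""
    tokens = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in TOKEN_RE.finditer(line):
            tokens.append(Token(m.lastgroup, m.group(), lineno, m.start() + 1))
    return tokens
```

Because a string literal arrives as a single token, a bracket inside it can never be mistaken for a real bracket, and any token copied out of a macro body still carries the line and column where it was written.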
Proposed Syntax
- %define foo(param1, param2, ...) ... %end
- %undef foo
- %foo, %foo(param1, param2, ...)
- /** docstring */
- /*** docstring to attach to previous one */
- %\( , %\) or something for emitting unmatched brackets if necessary
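
A minimal Python sketch of how `%define ... %end`, `%undef`, and `%foo(...)` might behave on an already-tokenized stream. Tokens are plain strings here to keep it short, and the helper names `read_args` and `expand` are hypothetical. Several simplifications are assumed: an unknown `%ident` is passed through unchanged, a parameterless macro whose body begins with `(` is not supported, nested `%define` is not handled, and there is no guard against a macro that invokes itself.

```python
def read_args(tokens, i):
    """Read a comma-separated argument list starting at tokens[i] == "(".

    Returns (list of token lists, index just past the closing ")").
    """
    depth, args, current = 0, [], []
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == "(":
            depth += 1
            if depth == 1:
                continue          # the outermost "(" belongs to no argument
        elif tok == ")":
            depth -= 1
            if depth == 0:
                args.append(current)
                return args, i
        elif tok == "," and depth == 1:
            args.append(current)
            current = []
            continue
        current.append(tok)
    raise ValueError("unterminated macro argument list")


def expand(tokens, macros=None):
    """Expand macro definitions and invocations in a list of string tokens."""
    macros = {} if macros is None else macros
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "%define":
            name, i = tokens[i + 1], i + 2
            params = []
            if tokens[i] == "(":
                args, i = read_args(tokens, i)
                params = [a[0] for a in args if a]
            end = tokens.index("%end", i)
            macros[name] = (params, tokens[i:end])
            i = end + 1
        elif tok == "%undef":
            macros.pop(tokens[i + 1], None)
            i += 2
        elif tok.startswith("%") and tok[1:] in macros:
            params, body = macros[tok[1:]]
            i += 1
            bound = {}
            if params:
                args, i = read_args(tokens, i)
                if len(args) != len(params):
                    raise ValueError(f"wrong argument count for {tok}")
                bound = dict(zip(params, args))
            # Replace each parameter token in the body by its argument tokens.
            substituted = []
            for t in body:
                substituted.extend(bound.get(t, [t]))
            out.extend(expand(substituted, macros))   # rescan the result
        else:
            out.append(tok)
            i += 1
    return out
```

Since arguments are delimited by counting paren tokens, an argument such as `f ( 1 , 2 )` survives intact without any text-level bracket matching.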
# Text-Based Approach
This would be a macroprocessor with a standalone mode for documentation generation, and an embedded mode to be used as a preprocessing stage before the lexer.
Design Challenges
- Source Coordinates - the macro processor needs to be hooked up directly to the lexer, feeding it a suitable sequence of characters and source coordinate directives.
- Documentation Generation - the macro processor emits text that is consumed by kernel-doc. EITHER the built-in macros need to be defined to magically handle docstrings (as described in a previous email) OR we again use the /*** continuation-docstring notation fche suggested.
- Correct Handling of Brackets - in addition to balancing brackets within a macro invocation, the macroprocessor needs to recognize string constants in order to ignore the brackets within them.
- Explicit Macro Invocation - not too hard or too different from implicit invocation, really.
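
The source-coordinate point can be sketched in Python as follows: a text-based expander that emits a line directive whenever the output would drift from the original line numbering. The cpp-style `#line N "file"` directive format, the function name `preprocess`, and the default file name are assumptions for illustration only; the real directive format would be whatever the lexer is taught to accept. This toy version handles only parameterless macros, tracks lines but not columns, and would also substitute inside string literals, which is exactly the kind of problem the bracket/string item above is about.

```python
import re

MACRO_RE = re.compile(r"%([A-Za-z_]\w*)")


def preprocess(source, macros, filename="input.stp"):
    """Expand parameterless %name macros, keeping line numbers recoverable.

    `macros` maps a macro name to its (possibly multi-line) replacement text.
    Returns the output as a list of lines, including any line directives.
    """
    out = []
    expected = 1   # line number the lexer would assume for the next output line
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Unknown %names are left untouched.
        expanded = MACRO_RE.sub(
            lambda m: macros.get(m.group(1), m.group(0)), line)
        for piece in expanded.split("\n"):
            if expected != lineno:
                # Output has drifted: pin the next line to the invocation line.
                out.append(f'#line {lineno} "{filename}"')
            out.append(piece)
            expected = lineno + 1
    return out
```

Every line generated by a multi-line expansion is attributed to the line of the invocation, so an error inside the expansion points at the place the user actually wrote.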
Proposed Syntax
- %define(foo,param1,param2,...)
- %macro foo(param1, param2, ...) { ... }
- %foo, %foo(...)
- /** docstring */
- /*** this continuation-docstring as well if necessary */
- %\( , %\) or such if necessary
# Misc
Still thinking over where these fit in:
- %( ... %? ... %: ... %) conditionals (these need access to systemtap-internal logic to be really convenient -- perhaps the standalone macroprocessor mode ignores them, while the embedded mode has callbacks into systemtap code?)
- command line arguments $1, $2, ... (on IRC it was brought to my attention that these are effectively macro-substituted in the current systemtap) -- again, handle these by giving the macroprocessor some callbacks when in embedded mode?
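
One way to picture the callback idea, as a Python sketch: the macroprocessor takes an optional resolver for `$N`, and simply passes `$N` through when none is supplied (standalone mode). The names `substitute_args` and `resolve_arg` are hypothetical, and the same shape could presumably be used for evaluating conditionals.

```python
import re

ARG_RE = re.compile(r"\$(\d+)")


def substitute_args(text, resolve_arg=None):
    """Replace $1, $2, ... using a callback supplied by the embedding program.

    resolve_arg(n) must return the replacement text for argument n (1-based).
    With no callback (standalone mode) the text is returned unchanged.
    """
    if resolve_arg is None:
        return text
    return ARG_RE.sub(lambda m: resolve_arg(int(m.group(1))), text)


def make_argv_resolver(argv):
    """Embedded-mode callback backed by a plain list of argument strings."""
    def resolve(n):
        if not 1 <= n <= len(argv):
            raise ValueError(f"missing command line argument ${n}")
        return argv[n - 1]
    return resolve
```

In embedded mode the caller would pass something like `make_argv_resolver(["5"])`; in standalone documentation mode it passes nothing and the `$1` survives into the output.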