When properly understood, M4 seems like child’s play. However, it is common to learn M4 in a piecemeal fashion and to have an incomplete or inaccurate understanding of certain concepts. Ultimately, this leads to hours of furious debugging. It is important to understand the fundamentals well before progressing to the details.
m4 scans its input stream, generating (often, just copying) text to the output stream. The first step that m4 performs in processing is to recognize tokens. There are three kinds of tokens:
A name is a sequence of characters that starts with a letter or an underscore and may be followed by additional letters, digits and underscores. The end of a name is recognized by the occurrence of a character which is not any of the permitted characters, for example a period. A name is always a candidate for macro expansion (Macros and macro expansion), whereby the name is replaced by the definition of the macro of the same name.
A sequence of characters may be quoted (Quoting) with a starting quote at the beginning of the string and a terminating quote at the end. The default M4 quote characters are ‘`’ and ‘'’; however, Autoconf reassigns them to ‘[’ and ‘]’, respectively. Suffice it to say, M4 will remove the quote characters and pass the inner string to the output.
All other tokens are those single characters which are not recognized as belonging to any of the other token types. They are passed through to the output unaltered.
Like most programming languages, M4 allows you to write comments in the input which will be ignored. Comments are delimited by the ‘#’ character and by the end of a line. Comments in M4 differ from most languages, though, in that the text within the comment, including delimiters, is passed through to the output unaltered. Although the comment delimiting characters can be reassigned by the user, this is highly discouraged, as it may break GNU Autotools macros which rely on this fact to pass Bourne shell comment lines (which share the same comment delimiters) through to the output unaffected.
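The token rules above can be tried with a tiny input that defines no macros at all. This is only a sketch, assuming GNU m4 with its default quote and comment characters; the lines starting with dnl are discarded by m4 and merely annotate the output one would expect.

```m4
dnl Lines starting with dnl are discarded by m4; they are annotations only.
`quoted string' followed by other tokens: 1 + 2.
dnl => quoted string followed by other tokens: 1 + 2.
# a comment: `quotes' and names are left exactly as written
dnl => # a comment: `quotes' and names are left exactly as written
```

The quoted string loses its quote characters, the remaining names and single characters are copied through, and the comment line reaches the output untouched, quotes included.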
Macros are definitions of replacement text and are identified by a name, as defined by the syntax rules given in Token scanning. M4 maintains an internal table of macros, some of which are built-ins defined when m4 starts. When a name is found in the input that matches a name registered in M4’s macro table, the macro invocation in the input is replaced by the macro’s definition in the output. This process is known as expansion, even if the new text may be shorter! Many beginners to M4 confuse themselves the moment they start to use phrases like ‘I am going to call this particular macro, which returns this value’. As you will see, macros differ significantly from functions in other programming languages, regardless of how similar their syntax may seem. You should instead use phrases like ‘If I invoke this macro, it will expand to this text’.
Suppose M4 knows about a simple macro called ‘foo’ that is defined to be ‘bar’. Given the following input, m4 would produce the corresponding output:

    That is one big foo.
    ⇒That is one big bar.
The period character at the end of this sentence is not permitted in macro names, thus m4 knows when to stop scanning the ‘foo’ token and consult the table of macro definitions for a macro named ‘foo’.
Curiously, macros are defined to m4 using the built-in macro define. The example shown above would be defined to m4 with the following input:

    define(`foo', `bar')
Since define is itself a macro, it too must have an expansion; by definition, it is the empty string, or void. Thus, m4 will appear to consume macro invocations like these from the input. The ` and ' characters are M4’s default quote characters and play an important role (Quoting). Additional built-in macros exist for managing macro definitions (Macro management).
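To see define being consumed, the sketch below (assuming GNU m4 with default quotes) places the definition on the same line as the text that uses it. The dnl line is discarded by m4 and only annotates the expected output.

```m4
define(`foo', `bar')That is one big foo.
dnl => That is one big bar.
```

Nothing of the define invocation survives in the output; only the text that follows it does, with ‘foo’ already expanded.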
We’ve explored the simplest kind of macros that exist in M4. To make macros substantially more useful, M4 extends the concept to macros which accept a number of arguments. If a macro is given arguments, the macro may address its arguments using the special macro names ‘$1’ through to ‘$n’, where ‘n’ is the maximum number of arguments that the macro cares to reference. When such a macro is invoked, the argument list must be delimited by commas and enclosed in parentheses. Any whitespace that precedes an argument is discarded, but trailing whitespace (for example, before the next comma) is preserved. Here is an example of a macro which expands to its third argument:

    define(`foo', `$3')
    That is one big foo(3, `0x', `beef').
    ⇒That is one big beef.
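The whitespace rule for arguments is easier to see with visible markers around each argument. In this sketch, ‘show’ is a hypothetical macro name, the square brackets are ordinary characters (default M4 quotes are assumed), and the dnl line only annotates the output expected from GNU m4.

```m4
define(`show', `[$1][$2]')dnl
show(   a , b   )
dnl => [a ][b   ]
```

The spaces before ‘a’ and ‘b’ are dropped, while the space after ‘a’ and the three spaces after ‘b’ become part of the arguments.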
Arguments in M4 are simply text, so they have no type. If a macro which accepts arguments is invoked, m4 will expand the macro regardless of how many arguments are provided. M4 will not produce errors due to conditions such as a mismatched number of arguments, or arguments with malformed values/types. It is the responsibility of the macro to validate the argument list and this is an important practice when writing GNU Autotools macros. Some common M4 idioms have developed for this purpose and are covered in Conditionals. A macro that expects arguments can still be invoked without arguments, in which case the number of arguments seen by the macro will be zero:

    This is still one big foo.
    ⇒This is still one big .
A macro invoked with an empty argument list does not receive zero arguments; rather, it receives a single argument that is the empty string:

    This is one big empty foo().
    ⇒This is one big empty .
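The difference between no argument list and an empty one can be made visible with ‘$#’, which M4 replaces with the number of arguments actually supplied. ‘$#’ is not discussed in the text above, and ‘count’ is a hypothetical macro name, so treat this as a sketch for GNU m4 with default quotes. The dnl lines only annotate the expected output.

```m4
define(`count', `$#')dnl
count
dnl => 0
count()
dnl => 1
count(a, b, c)
dnl => 3
```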
It is also important to understand how macros are expanded. It is here that you will see why an M4 macro is not the same as a function in any other programming language. The explanation you’ve been reading about macro expansion thus far is a little bit simplistic: macros are not exactly matched in the input and expanded in the output. In actual fact, the macro’s expansion replaces the invocation in the input stream and it is rescanned for further expansions until there are none remaining. Here is an illustrative example:

    define(`foobar', `FUBAR')
    define(`f', `foo')
    f()bar
    ⇒FUBAR
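Here, ‘f()’ expands to ‘foo’, which is placed back on the input stream directly in front of ‘bar’; on rescanning, m4 reads the name ‘foobar’ and expands it to ‘FUBAR’. Rescanning can also chain. Suppose ‘a1’ were defined as ‘a2’, ‘a2’ as ‘a3’ and ‘a3’ as ‘a4’. If the token ‘a1’ were to be found in the input, m4 would replace it with ‘a2’ in the input stream and rescan. This continues until no definition can be found for a4, at which point the literal text ‘a4’ will be sent to the output. This is by far the biggest point of misunderstanding for new M4 users.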
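A chain of definitions of this kind might look as follows. The definitions are illustrative assumptions rather than part of the original examples, and the dnl line only annotates the output expected from GNU m4.

```m4
define(`a1', `a2')dnl
define(`a2', `a3')dnl
define(`a3', `a4')dnl
a1
dnl => a4
```

Each expansion is pushed back onto the input and rescanned, so a single ‘a1’ passes through three expansions before any text reaches the output.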
The same principles apply for the collection of arguments to macros which accept arguments. Before a macro’s actual arguments are handed to the macro, they are expanded until there are no more expansions left. Here is an example, using the built-in define macro (where the problems are no different), which highlights the consequences of this. Normally, define will redefine any existing macro:

    define(foo, bar)
    define(foo, baz)
In this example, we expect ‘foo’ to be defined to ‘bar’ and then redefined to ‘baz’. Instead, we’ve defined a new macro ‘bar’ that is defined to be ‘baz’! Why? The second define invocation has its arguments expanded prior to expanding the define macro. At this stage, the name ‘foo’ is expanded to its original definition, bar. In effect, we’ve stated:

    define(foo, bar)
    define(bar, baz)
Sometimes this can be a very useful property, but mostly it serves to thoroughly confuse the GNU Autotools macro writer. The key is to know that m4 will expand as much text as it can as early as possible in its processing. Expansion can be prevented by quoting, which is discussed in detail in the following section.
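For contrast, here is a sketch of the same redefinition with both arguments quoted, assuming GNU m4 with default quotes. The dnl lines only annotate the expected output.

```m4
define(`foo', `bar')dnl
define(`foo', `baz')dnl
foo
dnl => baz
bar
dnl => bar
```

Because ‘foo’ is quoted in the second define, it is not expanded while the arguments are collected, so ‘foo’ itself is redefined and no macro named ‘bar’ comes into existence.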
It has been shown how m4 expands macros when it encounters a name that matches a defined macro in the input. There are times, however, when you wish to defer expansion. Principally, there are three situations when this is so:
There may be free-form text that you wish to appear at the output and, as such, be unaltered by any macros that may be inadvertently invoked in the input. It is not always possible to know if some particular name is defined as a macro, so it should be quoted.
Sometimes you may wish to form strings which would violate M4’s syntax rules; for example, you might wish to use leading whitespace or a comma in a macro argument. The solution is to quote the entire string.
This is the most common situation for quoting: when arguments to macros are to be taken literally and not expanded as the arguments are collected. In the previous section, an example was given that demonstrates the effects of not quoting the first argument to define. Quoting macro arguments is considered a good practice that you should emulate.
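The first two situations can be sketched in a few lines. The macro names ‘big’ and ‘show’ are hypothetical, the square brackets are ordinary characters here (default M4 quotes are assumed), and the dnl lines only annotate the output expected from GNU m4.

```m4
define(`big', `enormous')dnl
define(`show', `[$1]')dnl
That is one `big' word.
dnl => That is one big word.
show(  one, two)
dnl => [one]
show(`  one, two')
dnl => [  one, two]
```

Quoting ‘big’ keeps free-form text from being expanded by accident, and quoting the whole argument to ‘show’ keeps both the leading whitespace and the comma inside a single argument.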
Strings are quoted by surrounding the quoted text with the ‘`’ and ‘'’ characters. When m4 encounters a quoted string, as a type of token (Token scanning), the quoted string is expanded to the string itself, with the outermost quote characters removed.
Here is an example of a string that is triple quoted:

    ```foo'''
    ⇒``foo''
A more concrete example uses quoting to demonstrate how to prevent unwanted expansion within macro definitions:

    define(`foo', ``bar'')dnl
    define(`bar', `zog')dnl
    foo
    ⇒bar
When the macro ‘foo’ is defined, m4 strips off the outermost quotes and registers the definition `bar'. The dnl text has a special purpose, too, which will be covered in Discarding input.
As the macro ‘foo’ is expanded, the next pair of quote characters are stripped off and the string is expanded to ‘bar’. Since the expansion of the quoted string is the string itself (minus the quote characters), we have prevented unwanted expansion from the string ‘bar’ to ‘zog’.
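For comparison, a sketch of the same definitions with only a single level of quoting around ‘bar’ (GNU m4 and default quotes assumed; the dnl line only annotates the expected output):

```m4
define(`foo', `bar')dnl
define(`bar', `zog')dnl
foo
dnl => zog
```

With one level of quotes, the definition registered for ‘foo’ is the bare name bar, so rescanning the expansion of ‘foo’ finds the macro ‘bar’ and expands it in turn.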
As mentioned in Token scanning, the default M4 quote characters are ‘`’ and ‘'’. Since these are two commonly used characters in Bourne shell programming, Autoconf reassigns these to the ‘[’ and ‘]’ characters, a symmetric looking pair of characters least likely to cause problems when writing GNU Autotools macros. From this point forward, we shall use ‘[’ and ‘]’ as the quote characters and you can forget about the default M4 quotes.
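In plain m4, the effect of using ‘[’ and ‘]’ as quote characters can be tried out with the built-in changequote. This is only a sketch for experimenting with GNU m4 outside Autoconf, since Autoconf performs the reassignment for you; the dnl lines are discarded by m4, and one of them annotates the expected output.

```m4
changequote([, ])dnl
define([foo], [bar])dnl
foo and [foo]
dnl => bar and foo
changequote(`, ')dnl
```

The final line restores the default quote characters so that any input following the experiment is scanned normally.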
Autoconf uses M4’s built-in changequote macro to perform this reassignment and, in fact, this built-in is still available to you. In recent years, the common practice when needing to use the quote characters ‘[’ or ‘]’ or to quote a string with a legitimately imbalanced number of the quote characters has been to invoke changequote and temporarily reassign them around the affected area:

    dnl Uh-oh, we need to use the apostrophe! And even worse, we have two
    dnl opening quote marks and no closing quote marks.
    changequote(<<, >>)dnl
    perl -e 'print "$]\n";'
    changequote([, ])dnl
This leads to a few potential problems, the least of which is that it’s easy to reassign the quote characters and then forget to reset them, leading to total chaos! Moreover, it is possible to entirely disable M4’s quoting mechanism by blindly changing the quote characters to a pair of empty strings.
In hindsight, the overwhelming conclusion is that using changequote within the GNU Autotools framework is a bad idea. Instead, leave the quote characters assigned as ‘[’ and ‘]’ and use the special strings @<:@ and @:>@ anywhere you want real square brackets to appear in your output. This is an easy practice to adopt, because it’s faster and less error prone than using changequote:

    perl -e 'print "$@:>@\n";'
This, and other guidelines for using M4 in the GNU Autotools framework are covered in detail in Writing macros within the GNU Autotools framework.
This document was generated by Ben Elliston on July 10, 2015 using texi2html 1.82.