
21.3 Fundamentals of M4 processing

When properly understood, M4 seems like child’s play. However, it is common to learn M4 in a piecemeal fashion and to have an incomplete or inaccurate understanding of certain concepts. Ultimately, this leads to hours of furious debugging. It is important to understand the fundamentals well before progressing to the details.



21.3.1 Token scanning

m4 scans its input stream, generating (often, just copying) text to the output stream. The first step that m4 performs in processing is to recognize tokens. There are three kinds of tokens:

Names

A name is a sequence of characters that starts with a letter or an underscore and may be followed by additional letters, digits and underscores. The end of a name is recognized by the occurrence of a character which is not any of the permitted characters, for example, a period. A name is always a candidate for macro expansion (Macros and macro expansion), whereby the name will be replaced in the output by the definition of the macro with that name, if one exists.

Quoted strings

A sequence of characters may be quoted (Quoting) with a starting quote at the beginning of the string and a terminating quote at the end. The default M4 quote characters are ‘`’ and ‘'’; however, Autoconf reassigns them to ‘[’ and ‘]’, respectively. Suffice it to say, M4 will remove the outermost pair of quote characters and pass the inner string to the output (Quoting).

Other tokens

All other tokens are those single characters which are not recognized as belonging to any of the other token types. They are passed through to the output unaltered.

Like most programming languages, M4 allows you to write comments in the input, and macro names within a comment are not expanded. Comments begin with the ‘#’ character and extend to the end of the line. Comments in M4 differ from most languages, though, in that the text within the comment, including delimiters, is passed through to the output unaltered. Although the comment delimiting characters can be reassigned by the user, this is highly discouraged, as it may break GNU Autotools macros which rely on this behavior to pass Bourne shell comment lines (which share the same comment delimiters) through to the output unaffected.
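The three token kinds, plus comments, can be made concrete with a small scanner. This is a minimal sketch in Python, not part of m4: the function name ‘tokenize’ is hypothetical, and it assumes single-character quote delimiters (the default ‘`’ and ‘'’ unless others are passed), ‘#’ comments that run through the end of the line, and ASCII-only names.

```python
import re

# Names: a letter or underscore, then any letters, digits or underscores.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text, lquote="`", rquote="'", comment="#"):
    """Toy m4-style scanner returning (kind, value) pairs.

    kind is 'name', 'string', 'comment' or 'other'.  For 'string' tokens
    the value is the text with only the outermost pair of quotes removed.
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        name = NAME.match(text, i)
        if name:
            tokens.append(("name", name.group()))
            i = name.end()
        elif ch == lquote:
            # Quotes nest, so track depth until the matching close quote.
            depth, j = 1, i + 1
            while depth:
                if j >= len(text):
                    raise ValueError("end of input inside a quoted string")
                if text[j] == lquote:
                    depth += 1
                elif text[j] == rquote:
                    depth -= 1
                j += 1
            tokens.append(("string", text[i + 1:j - 1]))
            i = j
        elif ch == comment:
            # The comment, delimiters included, is kept verbatim.
            end = text.find("\n", i)
            end = len(text) if end == -1 else end + 1
            tokens.append(("comment", text[i:end]))
            i = end
        else:
            # Any other single character is its own token.
            tokens.append(("other", ch))
            i += 1
    return tokens


print(tokenize("That is one big foo."))
```

In the printed list, the period ends the name ‘foo’ and becomes an ‘other’ token of its own. Real m4 differs in details this sketch ignores, such as multi-character delimiters and the fact that macro expansion is interleaved with scanning.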



21.3.2 Macros and macro expansion

Macros are definitions of replacement text and are identified by a name—as defined by the syntax rules given in Token scanning. M4 maintains an internal table of macros, some of which are built-ins defined when m4 starts. When a name is found in the input that matches a name registered in M4’s macro table, the macro invocation in the input is replaced by the macro’s definition in the output. This process is known as expansion—even if the new text may be shorter! Many beginners to M4 confuse themselves the moment they start to use phrases like ‘I am going to call this particular macro, which returns this value’. As you will see, macros differ significantly from functions in other programming languages, regardless of how similar their syntax may seem. You should instead use phrases like ‘If I invoke this macro, it will expand to this text’.

Suppose M4 knows about a simple macro called ‘foo’ that is defined to be ‘bar’. Given the following input, m4 would produce the corresponding output:

 
That is one big foo.
⇒That is one big bar.

The period character at the end of this sentence is not permitted in macro names, so m4 knows when to stop scanning the ‘foo’ token and consult the table of macro definitions for a macro named ‘foo’.

Curiously, macros are defined to m4 using the built-in macro define. The example shown above would be defined to m4 with the following input:

 
define(`foo', `bar')

Since define is itself a macro, it too must have an expansion—by definition, it is the empty string, or void. Thus, m4 will appear to consume macro invocations like these from the input. The ` and ' characters are M4’s default quote characters and play an important role (Quoting). Additional built-in macros exist for managing macro definitions (Macro management).

We’ve explored the simplest kind of macros that exist in M4. To make macros substantially more useful, M4 extends the concept to macros which accept a number of arguments (49). If a macro is given arguments, the macro may address its arguments using the special names ‘$1’ through to ‘$n’, where ‘n’ is the highest-numbered argument that the macro cares to reference. When such a macro is invoked, the argument list must be enclosed in parentheses, with the opening parenthesis immediately following the macro name, and the arguments must be separated by commas. Any unquoted whitespace that precedes an argument is discarded, but trailing whitespace (for example, before the next comma) is preserved. Here is an example of a macro which expands to its third argument:

 
define(`foo', `$3')
That is one big foo(3, `0x', `beef').
⇒That is one big beef.

Arguments in M4 are simply text, so they have no type. If a macro which accepts arguments is invoked, m4 will expand the macro regardless of how many arguments are provided. M4 will not produce errors due to conditions such as a mismatched number of arguments, or arguments with malformed values/types. It is the responsibility of the macro to validate the argument list and this is an important practice when writing GNU Autotools macros. Some common M4 idioms have developed for this purpose and are covered in Conditionals. A macro that expects arguments can still be invoked without arguments—the number of arguments seen by the macro will be zero:

 
This is still one big foo.
⇒This is still one big .

A macro invoked with an empty argument list, such as ‘foo()’, does not see zero arguments; rather, it sees a single argument, the empty string:

 
This is one big empty foo().
⇒This is one big empty .
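The argument-counting rules above can be modelled in a few lines. The Python sketch below is illustrative only: ‘collect_args’ and ‘substitute’ are hypothetical helpers, and the sketch assumes the argument text contains no macro invocations (so collecting an argument only removes one level of quotes and any unquoted leading whitespace). It models only the single-digit references ‘$1’ to ‘$9’.

```python
import re


def collect_args(text, lquote="`", rquote="'"):
    """Return (args, rest) for the text immediately following a macro name."""
    if not text.startswith("("):
        return [], text  # no '(' right after the name: zero arguments
    args, current = [], []
    depth = 0        # unquoted parentheses nested inside the argument list
    quote = 0        # current quote nesting depth
    at_start = True  # still skipping this argument's leading whitespace?
    for i in range(1, len(text)):
        ch = text[i]
        if quote:
            if ch == lquote:
                quote += 1
            elif ch == rquote:
                quote -= 1
                if quote == 0:
                    continue  # drop the outermost closing quote
            current.append(ch)
            continue
        if at_start and ch in " \t\n":
            continue  # unquoted leading whitespace is discarded
        at_start = False
        if ch == lquote:
            quote = 1  # drop the outermost opening quote
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")" and depth:
            depth -= 1
            current.append(ch)
        elif ch == ")":
            args.append("".join(current))
            return args, text[i + 1:]
        elif ch == "," and not depth:
            args.append("".join(current))
            current, at_start = [], True
        else:
            current.append(ch)  # includes trailing whitespace, which is kept
    raise ValueError("end of input inside an argument list")


def substitute(body, args):
    """Replace $1..$9 in a macro body; missing arguments become ''."""
    def arg(match):
        n = int(match.group(1))
        return args[n - 1] if n <= len(args) else ""
    return re.sub(r"\$([1-9])", arg, body)


# Text following 'foo' in the three examples above; foo is defined as $3.
for after_name in ["(3, `0x', `beef').", ".", "()."]:
    args, rest = collect_args(after_name)
    print(len(args), repr(substitute("$3", args) + rest))
```

The three cases see three, zero and one argument respectively, and only the first yields ‘beef’. A space between the name and ‘(’ would also yield zero arguments, since the parenthesis must follow the name immediately.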

It is also important to understand how macros are expanded. It is here that you will see why an M4 macro is not the same as a function in any other programming language. The explanation you’ve been reading about macro expansion thus far is a little bit simplistic: macros are not exactly matched in the input and expanded in the output. In actual fact, the macro’s expansion replaces the invocation in the input stream and it is rescanned for further expansions until there are none remaining. Here is an illustrative example:

 
define(`foobar', `FUBAR')
define(`f', `foo')
f()bar
⇒FUBAR

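Rescanning also explains chains of definitions. Suppose ‘a1’ is defined as ‘a2’, ‘a2’ as ‘a3’, and ‘a3’ as ‘a4’. If the token ‘a1’ were to be found in the input, m4 would replace it with ‘a2’ in the input stream and rescan; ‘a2’ would in turn be replaced by ‘a3’, and so on. This continues until no definition can be found for ‘a4’, at which point the literal text ‘a4’ will be sent to the output. This is by far the biggest point of misunderstanding for new M4 users.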
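Rescanning is what makes both examples above work, and it is easy to see in a toy expander. The sketch below is Python, not m4: ‘expand’ is a hypothetical function that handles only macros without arguments (an immediately following ‘()’ is simply consumed) and ignores quoting. The key step pushes each expansion back onto the front of the pending input instead of writing it to the output.

```python
import re

# Names: a letter or underscore, then any letters, digits or underscores.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand(text, macros, limit=1000):
    """Toy rescanning expander for argument-less macros (no quoting)."""
    out = []
    pending = text  # input that has not been scanned yet
    expansions = 0
    while pending:
        m = NAME.match(pending)
        if not m:
            out.append(pending[0])  # 'other' token: copy it to the output
            pending = pending[1:]
            continue
        name, rest = m.group(), pending[m.end():]
        if name not in macros:
            out.append(name)        # not a macro: the name is output as is
            pending = rest
            continue
        expansions += 1
        if expansions > limit:      # safeguard for this sketch only
            raise RuntimeError("too many expansions; recursive definition?")
        if rest.startswith("()"):
            rest = rest[2:]         # consume an empty argument list
        # Push the expansion back onto the input so that it is rescanned.
        pending = macros[name] + rest
    return "".join(out)


print(expand("f()bar", {"foobar": "FUBAR", "f": "foo"}))
print(expand("a1", {"a1": "a2", "a2": "a3", "a3": "a4"}))
```

Calling it on ‘f bar’ with the same table returns ‘foo bar’: the space separates the tokens, so the rescanned text never forms the name ‘foobar’. Appending each expansion to the output instead of to the pending input would print ‘foobar’ for ‘f()bar’, which is exactly the function-call mental model warned against above.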

The same principles apply for the collection of arguments to macros which accept arguments. Before a macro’s actual arguments are handed to the macro, they are expanded until there are no more expansions left. Here is an example, using the built-in define macro (which is affected in exactly the same way as any other macro), that highlights the consequences of this. Normally, define will redefine any existing macro:

 
define(foo, bar)
define(foo, baz)

In this example, we expect ‘foo’ to be defined to ‘bar’ and then redefined to ‘baz’. Instead, we’ve defined a new macro ‘bar’ that is defined to be ‘baz’! Why? The second define invocation has its arguments expanded prior to expanding the define macro. At this stage, the name ‘foo’ is expanded to its current definition, ‘bar’. In effect, we’ve stated:

 
define(foo, bar)
define(bar, baz)

Sometimes this can be a very useful property, but mostly it serves to thoroughly confuse the GNU Autotools macro writer. The key is to know that m4 will expand as much text as it can as early as possible in its processing. Expansion can be prevented by quoting (50) and is discussed in detail in the following section.
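The surprising define behaviour follows from expanding arguments before the macro sees them. The Python sketch below models only that step. ‘expand_arg’ and ‘toy_define’ are hypothetical names, and the sketch assumes each argument is either a single bare name or a single quoted string using the default ‘`’ and ‘'’ quotes.

```python
def expand_arg(raw, macros, limit=100):
    """Expand one argument before the macro sees it (toy rules)."""
    for _ in range(limit):
        if len(raw) >= 2 and raw[0] == "`" and raw[-1] == "'":
            return raw[1:-1]   # a quoted string expands to its contents, once
        if raw not in macros:
            return raw         # undefined name or plain text: unchanged
        raw = macros[raw]      # a macro name is replaced and rescanned
    raise RuntimeError("expansion limit reached; recursive definition?")


def toy_define(macros, name_arg, value_arg):
    """Model define: expand both arguments first, then record the macro."""
    name = expand_arg(name_arg, macros)
    value = expand_arg(value_arg, macros)
    macros[name] = value
    return ""                  # define itself expands to the empty string


unquoted = {}
toy_define(unquoted, "foo", "bar")
toy_define(unquoted, "foo", "baz")   # 'foo' has already become 'bar'
print(unquoted)

quoted = {}
toy_define(quoted, "`foo'", "`bar'")
toy_define(quoted, "`foo'", "`baz'")
print(quoted)
```

With unquoted arguments the table ends up with a new macro ‘bar’, just as described above. With quoted arguments, the second call redefines ‘foo’ as intended, which is why quoting the arguments to define is the recommended habit.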



21.3.3 Quoting

It has been shown how m4 expands macros when it encounters a name in the input that matches a defined macro. There are times, however, when you wish to defer expansion. Principally, there are three situations when this is so:

Free-form text

There may be free-form text that you wish to appear in the output unaltered by any macros that may be inadvertently invoked in the input. It is not always possible to know whether some particular name is defined as a macro, so such text should be quoted.

Overcoming syntax rules

Sometimes you may wish to form strings which would violate M4’s syntax rules: for example, you might wish to use leading whitespace or a comma in a macro argument. The solution is to quote the entire string.

Macro arguments

This is the most common situation for quoting: when arguments to macros are to be taken literally and not expanded as the arguments are collected. In the previous section, an example was given that demonstrates the effects of not quoting the first argument to define. Quoting macro arguments is considered a good practice that you should emulate.

Strings are quoted by surrounding the quoted text with the ‘`’ and ‘'’ characters. When m4 encounters a quoted string, which is a type of token (Token scanning), the quoted string is expanded to the string itself, with the outermost quote characters removed.

Here is an example of a string that is triple quoted:

 
```foo'''
⇒``foo''
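Each scan removes exactly one level of quoting. The Python sketch below uses a hypothetical helper, ‘strip_one_level’, that is not part of m4. It models a single pass over the text: it removes the outermost pair of each top-level quoted string and keeps nested quotes. As a simplifying assumption, it treats a closing quote with no matching opening quote as ordinary text.

```python
def strip_one_level(text, lquote="`", rquote="'"):
    """Model one scan: remove the outermost quotes of each quoted string."""
    out, depth = [], 0
    for ch in text:
        if ch == lquote:
            if depth:
                out.append(ch)    # nested opening quote is kept
            depth += 1
        elif ch == rquote and depth:
            depth -= 1
            if depth:
                out.append(ch)    # nested closing quote is kept
        else:
            out.append(ch)        # ordinary text, including a stray rquote
    if depth:
        raise ValueError("end of input inside a quoted string")
    return "".join(out)


text = "```foo'''"
for _ in range(3):
    text = strip_one_level(text)
    print(text)
```

Three passes are needed before the bare ‘foo’ is exposed to macro expansion; until then, the remaining quotes protect it.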

A more concrete example uses quoting to demonstrate how to prevent unwanted expansion within macro definitions:

 
define(`foo', ``bar'')dnl
define(`bar', `zog')dnl
foo
⇒bar

When the macro ‘foo’ is defined, m4 strips off the outermost quotes and registers the definition `bar'. The dnl text has a special purpose, too, which will be covered in Discarding input.

As the macro ‘foo’ is expanded, the next pair of quote characters is stripped off and the string is expanded to ‘bar’. Since the expansion of the quoted string is the string itself (minus the quote characters), we have prevented unwanted expansion from the string ‘bar’ to ‘zog’.

As mentioned in Token scanning, the default M4 quote characters are ‘`’ and ‘'’. Since these are two commonly used characters in Bourne shell programming (51), Autoconf reassigns these to the ‘[’ and ‘]’ characters, a symmetric-looking pair of characters least likely to cause problems when writing GNU Autotools macros. From this point forward, we shall use ‘[’ and ‘]’ as the quote characters and you can forget about the default M4 quotes.

Autoconf uses M4’s built-in changequote macro to perform this reassignment and, in fact, this built-in is still available to you. In recent years, the common practice when needing to use the quote characters ‘[’ or ‘]’, or to quote a string with a legitimately imbalanced number of the quote characters, has been to invoke changequote and temporarily reassign them around the affected area:

 
dnl Uh-oh, the Perl code needs a literal closing square bracket, which is
dnl also Autoconf's closing quote, and there is no matching opening quote.
changequote(<<, >>)dnl
perl -e 'print "$]\n";'
changequote([, ])dnl

This leads to a few potential problems, the least of which is that it’s easy to reassign the quote characters and then forget to reset them, leading to total chaos! Moreover, it is possible to entirely disable M4’s quoting mechanism by blindly changing the quote characters to a pair of empty strings.

In hindsight, the overwhelming conclusion is that using changequote within the GNU Autotools framework is a bad idea. Instead, leave the quote characters assigned as ‘[’ and ‘]’ and use the special strings @<:@ and @:>@ (which Autoconf calls quadrigraphs) anywhere you want real square brackets to appear in your output. M4 does not recognize these strings as quote characters, so they pass through M4 untouched and are converted to ‘[’ and ‘]’ only in the final output. This is an easy practice to adopt, because it’s faster and less error-prone than using changequote:

 
perl -e 'print "$@:>@\n";'
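Because M4 treats @<:@ and @:>@ as ordinary text, they survive every round of quote removal and become brackets only after M4 has finished. The Python sketch below models that final substitution for the two quadrigraphs discussed here. ‘replace_quadrigraphs’ is a hypothetical name; in a real Autoconf run this step is performed by Autoconf's own tooling (autom4te), not by code you write.

```python
# Only the two quadrigraphs discussed in the text are modelled here.
QUADRIGRAPHS = {"@<:@": "[", "@:>@": "]"}


def replace_quadrigraphs(text):
    """Turn quadrigraphs into literal brackets after M4 processing."""
    for quad, char in QUADRIGRAPHS.items():
        text = text.replace(quad, char)
    return text


m4_output = "perl -e 'print \"$@:>@\\n\";'"
print(replace_quadrigraphs(m4_output))
```

Doing the substitution last is the whole point: had the bracket been written literally in the input, M4 would have treated it as a closing quote.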

This and other guidelines for using M4 in the GNU Autotools framework are covered in detail in Writing macros within the GNU Autotools framework.



This document was generated by Ben Elliston on July 10, 2015 using texi2html 1.82.