16.3 Compiler Quirks

C++ compilers are complex pieces of software. Sadly, the details of a compiler’s implementation sometimes leak out and bother the application programmer. The two aspects of C++ compiler implementation that have caused grief in the past are efficient template instantiation and name mangling. Both are explained below.


16.3.1 Template Instantiation

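The problem with template instantiation exists because of a number of complex constraints on the compiler and the linker. Ideally, the compiler generates each instantiation only once, and the linker is able to locate the object code for every instantiation the program needs.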

This problem is exacerbated by separate compilation—that is, the method bodies for List<T> may be located in a header file or in a separate compilation unit. These files may even be in a different directory than the current directory!

Life is easy for the compiler when the template definition appears in the same compilation unit as the site of the instantiation—everything that is needed is known:

 
template <class T> class List
{
private:
  T* head;
  T* current;
};

List<int> li;

This becomes significantly more difficult when the site of a template instantiation and the template definition are split between two different compilation units. In Linkers and Loaders, Levine describes in detail how the compiler driver deals with this by iteratively attempting to link a final executable and noting, from ‘undefined symbol’ errors produced by the linker, which template instantiations must be performed to successfully link the program.

In large projects where templates may be instantiated in multiple locations, the compiler may generate the same instantiation several times. Not only does this slow down compilation, but it can also cause difficult problems with linkers that refuse to link object files containing duplicate symbols. Suppose there is the following directory layout:

 
src
|
`--- core
|    `--- core.cxx
`--- modules
|    `--- http.cxx
`--- lib
     `--- stack.h

If the compiler generates ‘core.o’ in the ‘core’ directory and ‘libhttp.a’ in the ‘modules’ directory, the final link may fail because ‘core.o’ and ‘libhttp.a’ may both define the same symbols: those generated as a result of both ‘http.cxx’ and ‘core.cxx’ instantiating, say, a Stack<int>. Some linkers, such as the one provided with AIX, will allow duplicate symbols during a link, but many will not.
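One way to avoid duplicate instantiations like these is explicit instantiation, which pins the generated code to a single compilation unit. The following is only a sketch under stated assumptions: it needs a compiler that supports C++11 explicit instantiation declarations (extern template), the contents of ‘stack.h’ are hypothetical, and the helper sum_of_pushed is invented for illustration. The three files are collapsed into one listing, with comments marking where each part would live.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- lib/stack.h (hypothetical contents) ---
template <class T> class Stack
{
public:
  void push (const T& value) { items.push_back (value); }

  T pop ()
  {
    assert (!items.empty ());   // popping an empty stack is a caller error
    T top = items.back ();
    items.pop_back ();
    return top;
  }

  std::size_t size () const { return items.size (); }

private:
  std::vector<T> items;
};

// Explicit instantiation declaration (C++11): asks every file that
// includes the header not to instantiate Stack<int> implicitly.
extern template class Stack<int>;

// --- core/core.cxx: the single place that emits Stack<int> ---
template class Stack<int>;

// --- modules/http.cxx: uses Stack<int>, relying on the copy above ---
// sum_of_pushed is a hypothetical helper, not part of any real package.
int
sum_of_pushed (int a, int b)
{
  Stack<int> s;
  s.push (a);
  s.push (b);
  return s.pop () + s.pop ();
}
```

With this arrangement, only the object file built from ‘core.cxx’ should carry the out-of-line code for Stack<int>. The cost is that every type used with Stack must be listed by hand, so it suits projects where the set of instantiations is small and known in advance.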

Some compilers have solved this problem by maintaining a template repository of template instantiations. Usually, the entire template definition is expanded with the specified type parameters and compiled into the repository, leaving the linker to collect the required object files at link time.

The main portability concern with repositories is getting your compiler to maintain a single repository across your entire project. This often requires a vendor-specific command line option to the compiler, which can detract from portability. It is conceivable that Libtool could come to the rescue here in the future.


16.3.2 Name Mangling

Early C++ compilers mangled the names of C++ symbols so that existing linkers could be used without modification. The cfront C++ translator also mangled names so that information from the original C++ program would not be lost in the translation to C. Today, name mangling remains important for enabling overloaded function names and link-time type checking. Here is an example C++ source file which illustrates name mangling in action:

 
class Foo
{
public:
  Foo ();

  void go ();
  void go (int where);

private:
  int pos;
};

Foo::Foo ()
{
  pos = 0;
}

void
Foo::go ()
{
  go (0);
}

void
Foo::go (int where)
{
  pos = where;
}

int
main ()
{
  Foo f;
  f.go (10);
}

$ g++ -Wall -c example.cxx -o example.o
$ nm --defined-only example.o
00000000 T __3Foo
00000000 ? __FRAME_BEGIN__
00000000 t gcc2_compiled.
0000000c T go__3Foo
0000002c T go__3Fooi
00000038 T main

Even though Foo contains two methods with the same name, their argument lists (one taking an int, one taking no arguments) differentiate them once their names are mangled. The ‘go__3Fooi’ symbol is the version which takes an int argument. The ‘__3Foo’ symbol is the constructor for Foo. The GNU binutils package includes a utility called c++filt that can demangle names. Proprietary tool chains sometimes include a similar utility, although with a bit of imagination, you can often demangle names in your head.

 
$ nm --defined-only example.o | c++filt
00000000 T Foo::Foo(void)
00000000 ? __FRAME_BEGIN__
00000000 t gcc2_compiled.
0000000c T Foo::go(void)
0000002c T Foo::go(int)
00000038 T main
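Demangling is also available from within a program on tool chains that follow the Itanium C++ ABI, such as current releases of GCC and Clang. The sketch below assumes such a tool chain and its <cxxabi.h> header. The demangle helper is a hypothetical wrapper, and the mangled names it is meant for use the newer Itanium scheme rather than the older g++ scheme shown in the nm output above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium ABI
// symbol name, or the input unchanged if it cannot be demangled.
std::string
demangle (const char* mangled)
{
  assert (mangled != nullptr);

  int status = 0;
  // With null buffer arguments, __cxa_demangle allocates the result
  // with malloc, so the caller must release it with free.
  char* text = abi::__cxa_demangle (mangled, nullptr, nullptr, &status);
  if (status != 0 || text == nullptr)
    return mangled;

  std::string result (text);
  std::free (text);
  return result;
}
```

For example, demangle ("_ZN3Foo2goEi") yields Foo::go(int), which is how a compiler using this scheme would name the second go method from the example class.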

Name mangling algorithms differ between C++ implementations, so object files produced by one tool chain generally cannot be linked by another. This is a deliberate move: other aspects of the object files, such as the calling convention used for making function calls, may make them incompatible, and mismatched symbol names cause the link to fail rather than produce a broken program.

This implies that C++ libraries and packages cannot be practically distributed in binary form. Of course, you were intending to distribute the source code to your package anyway, weren’t you?


This document was generated by Ben Elliston on July 10, 2015 using texi2html 1.82.