C++ compilers are complex pieces of software. Sadly, sometimes the details of a compiler’s implementation leak out and bother the application programmer. The two aspects of C++ compiler implementation that have caused grief in the past are efficient template instantiation and name mangling. Both of these aspects will be explained.
The problem with template instantiation exists because of a number of complex constraints:
This problem is exacerbated by separate compilation: the method bodies for List<T> may be located in a header file or in a separate compilation unit. These files may even be in a different directory than the current directory!
Life is easy for the compiler when the template definition appears in the same compilation unit as the site of the instantiation—everything that is needed is known:
    template <class T> class List
    {
    private:
      T* head;
      T* current;
    };

    List<int> li;
This becomes significantly more difficult when the site of a template instantiation and the template definition are split between two different compilation units. In Linkers and Loaders, Levine describes in detail how the compiler driver deals with this by iteratively attempting to link a final executable and noting, from ‘undefined symbol’ errors produced by the linker, which template instantiations must be performed to successfully link the program.
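The split case can also be handled by hand with explicit instantiation, a standard C++ feature that asks the compiler to emit the code for one particular set of type parameters in a chosen compilation unit. The following is a minimal sketch, not a description of what any compiler driver does. Everything sits in one listing so that it compiles on its own, and the comments mark where each part would live. The names Stack, kStackCapacity and push_then_pop are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>

// --- Part 1: would live in a header. Declarations only; the bodies of
// --- push() and pop() are deliberately not visible to includers.
const std::size_t kStackCapacity = 16;  // hypothetical fixed capacity

template <class T>
class Stack {
public:
    Stack() : count_(0) {}
    void push(const T& value);
    T pop();
    std::size_t size() const { return count_; }
private:
    T items_[kStackCapacity];
    std::size_t count_;
};

// --- Part 2: would live in exactly one .cxx file, with the definitions.
template <class T>
void Stack<T>::push(const T& value) {
    // Silently ignore pushes beyond the capacity to keep the sketch short.
    if (count_ < kStackCapacity) items_[count_++] = value;
}

template <class T>
T Stack<T>::pop() {
    // Return a default-constructed T when the stack is empty.
    return count_ > 0 ? items_[--count_] : T();
}

// Explicit instantiation: emit all members of Stack<int> in this unit,
// so that other units can link against them without seeing the bodies.
template class Stack<int>;

// --- Part 3: would live in any other .cxx file that includes the header.
int push_then_pop(int value) {
    Stack<int> s;
    s.push(value);
    return s.pop();
}
```

The cost of this approach is that every type the program needs must be listed by hand in the defining unit. A use of Stack<double> elsewhere would fail at link time with an undefined symbol.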
In large projects where templates may be instantiated in multiple locations, the compiler may generate instantiations multiple times for the same type. Not only does this slow down compilation, but it can result in some difficult problems for linkers which refuse to link object files containing duplicate symbols. Suppose there is the following directory layout:
    src
    |
    `--- core
    |    `--- core.cxx
    `--- modules
    |    `--- http.cxx
    `--- lib
         `--- stack.h
If the compiler generates ‘core.o’ in the ‘core’ directory and ‘libhttp.a’ in the ‘modules’ directory, the final link may fail because ‘libhttp.a’ and the final executable may contain duplicate symbols: those generated as a result of both ‘http.cxx’ and ‘core.cxx’ instantiating, say, a Stack<int>. Some linkers, such as the one provided with AIX, will allow duplicate symbols during a link, but many will not.
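Where a C++11 compiler is available, duplicate instantiations can be avoided in the source itself with an explicit instantiation declaration (‘extern template’). This is a sketch under that assumption and is not a feature of the compilers discussed here. The Stack class below is a hypothetical stand-in for the contents of ‘stack.h’, and count_after_pushes is an invented helper. Both parts appear in one listing only so that it compiles by itself.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical contents of stack.h: the full template definition.
template <class T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }
    T pop() {
        T value = items_.back();
        items_.pop_back();
        return value;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};

// Would appear in every unit that merely uses Stack<int> (for example at
// the end of stack.h): do not instantiate Stack<int> implicitly here,
// another compilation unit provides it.
extern template class Stack<int>;

// Would appear in exactly one unit: the single explicit instantiation
// that the declaration above refers to. Within one translation unit the
// definition must follow the declaration, as it does here.
template class Stack<int>;

// Code in any unit can now use Stack<int> as usual.
std::size_t count_after_pushes(int n) {
    Stack<int> s;
    for (int i = 0; i < n; ++i) s.push(i);
    return s.size();
}
```

With this arrangement only one object file is meant to carry the Stack<int> symbols, whichever linker is used.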
Some compilers have solved this problem by maintaining a template repository of template instantiations. Usually, the entire template definition is expanded with the specified type parameters and compiled into the repository, leaving the linker to collect the required object files at link time.
The main portability concern with repositories is getting your compiler to maintain a single repository across your entire project. This often requires a vendor-specific command line option to the compiler, which itself detracts from portability. It is conceivable that Libtool could come to the rescue here in the future.
Early C++ compilers mangled the names of C++ symbols so that existing linkers could be used without modification. The cfront C++ translator also mangled names so that information from the original C++ program would not be lost in the translation to C. Today, name mangling remains important for enabling overloaded function names and link-time type checking. Here is an example C++ source file which illustrates name mangling in action:
    class Foo
    {
    public:
      Foo ();
      void go ();
      void go (int where);
    private:
      int pos;
    };

    Foo::Foo ()
    {
      pos = 0;
    }

    void
    Foo::go ()
    {
      go (0);
    }

    void
    Foo::go (int where)
    {
      pos = where;
    }

    int
    main ()
    {
      Foo f;
      f.go (10);
    }

    $ g++ -Wall -c example.cxx -o example.o
    $ nm --defined-only example.o
    00000000 T __3Foo
    00000000 ? __FRAME_BEGIN__
    00000000 t gcc2_compiled.
    0000000c T go__3Foo
    0000002c T go__3Fooi
    00000038 T main
Even though Foo contains two methods with the same name, their argument lists (one taking an int, one taking no arguments) help to differentiate them once their names are mangled. The ‘go__3Fooi’ symbol is the version which takes an int argument. The ‘__3Foo’ symbol is the constructor for Foo. The GNU binutils package includes a utility called c++filt that can demangle names. Proprietary tool chains sometimes include a similar utility, although with a bit of imagination, you can often demangle names in your head.
    $ nm --defined-only example.o | c++filt
    00000000 T Foo::Foo(void)
    00000000 ? __FRAME_BEGIN__
    00000000 t gcc2_compiled.
    0000000c T Foo::go(void)
    0000002c T Foo::go(int)
    00000038 T main
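Demangling can also be done from within a program. The sketch below assumes a GCC or Clang tool chain that provides the non-standard header <cxxabi.h> and its abi::__cxa_demangle function. It understands the Itanium C++ ABI mangling used by current releases of those compilers (names beginning with ‘_Z’), not the older scheme shown in the listings above. The demangle wrapper is a hypothetical helper written for this example.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang extension)
#include <cstdlib>    // std::free
#include <string>

// Return the human-readable form of an Itanium-ABI mangled name, or the
// input unchanged if the name cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // Passing null for the buffer and length asks the function to
    // allocate the result with malloc; the caller must free it.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

Under that ABI the two overloads of Foo::go are mangled as ‘_ZN3Foo2goEv’ and ‘_ZN3Foo2goEi’, so demangle ("_ZN3Foo2goEi") yields ‘Foo::go(int)’.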
Name mangling algorithms differ between C++ implementations, so object files produced by one tool chain generally cannot be linked by another. This is a deliberate move: other aspects of the object files, such as the calling convention used for making function calls, may make them incompatible, and differing mangled names cause such mismatches to fail at link time rather than at run time.
This implies that C++ libraries and packages cannot be practically distributed in binary form. Of course, you were intending to distribute the source code to your package anyway, weren’t you?
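For a narrow interface, mangling differences can be sidestepped by exporting functions with C linkage, whose names are not mangled. The following is only a sketch: the foo_* wrapper functions and the position accessor are hypothetical additions to the Foo class from the earlier example. It addresses symbol names only. Calling conventions, object layout and run-time library differences can still make object files from different tool chains incompatible.

```cpp
// The Foo class from the example above, with a hypothetical accessor
// added so that the effect of go() can be observed.
class Foo {
public:
    Foo() : pos(0) {}
    void go(int where) { pos = where; }
    int position() const { return pos; }
private:
    int pos;
};

// Functions declared extern "C" keep their plain names in the object
// file (foo_create, foo_go, ...), so overloading is not possible here
// and each operation needs a distinct name.
extern "C" {

Foo* foo_create() { return new Foo; }

void foo_go(Foo* f, int where) { f->go(where); }

int foo_position(const Foo* f) { return f->position(); }

void foo_destroy(Foo* f) { delete f; }

}  // extern "C"
```

Callers see Foo only as an opaque pointer, so the mangled names of its methods never cross the library boundary.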
This document was generated by Ben Elliston on July 10, 2015 using texi2html 1.82.