Stability of the C++ ABI: Evolution of a Programming Language
Published March 2011 (updated June 2016)
As C++ evolved over the years, the Application Binary Interface (ABI) used by a compiler often needed changes to support new or evolving language features. Consequently, programmers were expected to recompile all their binaries with every new compiler release. However, an unstable ABI is incompatible with the Oracle Solaris philosophy of shared libraries, and it is a nightmare for library and middleware vendors. With the advent of the C++ Standard in 1998, there was new hope for a stable C++ ABI on supported platforms. This paper addresses these issues, and what you can expect when you develop programs using Oracle Developer Studio C++.
Contents
- Introduction
- The C ABI
- The C++ ABI
- Name Mangling
- Name Mangling and ABI
- Hierarchical Layout
- The C++ Standard Library
- The Sources of ABI Instability
- The Consequences of ABI Instability
- A History of Oracle Developer Studio C++ ABIs
- The Runtime Libraries
- Here an ABI, There an ABI...
- ABI Status, March 2011
- ABI Status, June 2016
- The Future
Introduction
The ABI of a programming-language implementation is a specification of all the low-level details that allow separately compiled modules to work together. Without a stable ABI, all parts of a program must be compiled with the same version of the same compiler. That situation creates a maintenance nightmare for distributed projects, particularly for suppliers of binary libraries. The early rapid evolution of the C++ programming language precluded a stable ABI. The advent of the C++ international standard in 1998 (ISO/IEC 14882:1998 Programming Languages - C++) provided a base for a stable C++ ABI, at least for a given C++ implementation. Later issues of the standard (2003, 2011, 2014) introduced the possibility that an existing ABI might not be able to support the new language version. This article explores the stability question for Oracle Developer Studio C++ compilers, past and present.
The C ABI
The Oracle Solaris ABI is also the C ABI, because C is the standard UNIX implementation language. Among other things, the C ABI specifies:
- Size and layout of predefined types (char, int, float, and so on)
- Layout of compound types (arrays and structs)
- External (linker-visible) spelling of programmer-defined names
- Machine-code function-calling sequence
- Stack layout
- Register usage
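As a minimal sketch of the first two items, the following code reports where a compiler places the members of a simple struct. The `Record` and `RecordLayout` types and both functions are hypothetical names introduced only for illustration. The exact offsets and total size depend on the platform ABI, so the checks assert only properties that hold on any conforming implementation.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical record type, used only for illustration.
struct Record {
    char   tag;    // usually followed by padding so that count is aligned
    int    count;
    double value;
};

// The numbers a separately compiled module silently relies on.
struct RecordLayout {
    std::size_t tag_offset;
    std::size_t count_offset;
    std::size_t value_offset;
    std::size_t total_size;
};

inline RecordLayout describe_record() {
    RecordLayout layout = {
        offsetof(Record, tag),
        offsetof(Record, count),
        offsetof(Record, value),
        sizeof(Record)
    };
    return layout;
}

// Checks that hold everywhere; the concrete values are ABI-specific.
inline void check_record_layout() {
    const RecordLayout layout = describe_record();
    assert(layout.tag_offset == 0);
    assert(layout.count_offset % alignof(int) == 0);
    assert(layout.value_offset >= layout.count_offset + sizeof(int));
    assert(layout.total_size >= layout.value_offset + sizeof(double));
}
```

Two object files agree on `Record` only if both compilers produce the same values from `describe_record()`, which is exactly what a shared ABI guarantees.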
The C++ ABI
The C++ ABI includes the C ABI. In addition, it covers the following features:
- Layout of hierarchical class objects, that is, base classes and virtual base classes
- Layout of pointer-to-member
- Passing of hidden function parameters (for example, this)
- How to call a virtual function:
- Vtable contents and layout
- Location in objects of pointers to vtables
- Finding the adjustment for the this pointer
- Finding base-class offsets
- Calling a function via pointer-to-member
- Managing template instances
- External spelling of names ("name mangling")
- Construction and destruction of static objects
- Throwing and catching exceptions
- Some details of the standard library:
- Implementation-defined details
- typeinfo and run-time type information
- Inline function access to members
Name Mangling
C++ allows different functions to have the same name, and it allows an unbounded number of scopes where different global entities with the same name can be declared. Example:
int f(int);
float f(float);
class T {
    int f(int);
    int f(char*);
    class U {
        int f(int);
    };
};
namespace N {
    class T {
        int f(int);
    };
}
This example has two classes named T and six functions named f, some of which are in the same scope. All the functions have external linkage. To differentiate entities with the same name, the C++ implementation must make references to these functions unique. To ensure that references to the same entity from different modules can be resolved correctly, the method of making references unique must be predictable.
The usual scheme involves decorating the name of the entity with encodings of the scope names, along with the parameter types and the return type if it is a function. The resulting names appear to be scrambled, or "mangled." For example, the names of the six functions above would be encoded by the Oracle Developer Studio C++ compiler as follows.
Note: All examples use the mangling associated with the -compat=5 option, the ABI introduced in C++ 5.0.
Examples of Mangled Function Names:
Function | Mangled Name |
---|---|
float f(float) | __1cBf6Ff_f_ |
int f(int) | __1cBf6Fi_i_ |
int T::f(int) | __1cBTBf6Mi_i_ |
int T::f(char*) | __1cBTBf6Mpc_i_ |
int T::U::f(int) | __1cBTBUBf6Mi_i_ |
int N::T::f(int) | __1cBNBTBf6Mi_i_ |
C++ also provides a way to specify that a name is accessible from C code and, therefore, should not be mangled.
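The mechanism is the `extern "C"` linkage specification. The sketch below contrasts it with ordinary C++ linkage; the function names `scale` and `scale_for_c` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// C++ linkage: the external symbol encodes the name and the parameter
// types, so both overloads can coexist in one program.
int    scale(int x)    { return x * 2; }
double scale(double x) { return x * 2.0; }

// C linkage: the external symbol follows the C ABI spelling (typically
// the plain name), so C code can call it. It cannot be overloaded,
// because nothing in the symbol would tell two versions apart.
extern "C" int scale_for_c(int x) { return scale(x); }

// Small self-check of the three entry points.
inline void check_linkage_example() {
    assert(scale(3) == 6);
    assert(scale(1.5) == 3.0);
    assert(scale_for_c(4) == 8);
}
```

A C translation unit would declare the last function as `int scale_for_c(int);` and link against it without knowing anything about the C++ mangling scheme.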
Name Mangling and ABI
The name-mangling algorithm is part of the ABI, because it defines how a compiler must generate external references and definitions for program entities. If two compilers or compiler versions do not mangle equivalent declarations the same way, a program composed of parts compiled from the two compilers will not link correctly.
Hierarchical Layout
C++ allows user definition of hierarchies of class types, wherein a "derived" class implicitly includes all the data and functions of the classes from which it inherits. An ordinary base class is laid out in an object similarly to a member of class type, at a fixed offset from the start of the complete object. Example:
class Base {
    ...
};
class Derived : public Base {
    int i, j;
};
class Composed {
    Base b;
    int i, j;
};
In many C++ implementations, the layout of classes Derived and Composed will be the same.
A pointer to a complete object can be converted to a pointer to one of its base classes, but the address the pointer represents must be adjusted by the offset of the base class within the complete object.
A C++ class can have more than one immediate base class, a feature called multiple inheritance.
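The pointer adjustment is easiest to observe with multiple inheritance, where at most one base class can sit at the start of the object. The following is a minimal sketch; `Left`, `Right`, and `Both` are hypothetical types, and the actual offsets are chosen by the ABI, so the code checks only that the two base sub-objects have different addresses and that converting back restores the original pointer.

```cpp
#include <cassert>

struct Left  { int l = 1; };
struct Right { int r = 2; };
struct Both : Left, Right { int own = 3; };

// The two non-empty base sub-objects occupy different storage, so at
// least one conversion from Both* must change the address.
inline bool bases_at_distinct_addresses(Both& b) {
    void* left  = static_cast<Left*>(&b);
    void* right = static_cast<Right*>(&b);
    return left != right;
}

// Converting back to the derived type undoes the adjustment.
inline bool round_trip_restores_address(Both& b) {
    Right* r = &b;                        // implicit derived-to-base conversion
    return static_cast<Both*>(r) == &b;   // compiler subtracts the same offset
}

inline void check_base_adjustment() {
    Both b;
    assert(bases_at_distinct_addresses(b));
    assert(round_trip_restores_address(b));
    assert(static_cast<Right*>(&b)->r == 2);   // adjusted pointer reaches Right's data
}
```

The compiler emits the adjustment as a constant known at compile time, which is why the offset of each ordinary base class is part of the ABI.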
If classes A and B each have a base class Z, a class C derived from both A and B could have two copies of Z. Sometimes, it is appropriate for C to have two independent copies of Z. Other times, Z represents a resource of which there must be only one copy.
To specify that there is to be only one copy of a base class in a hierarchical object, that base class can be declared "virtual."
The offset of a virtual base class relative to an intermediate class depends on the entire hierarchy. Example:
class Z {
    ...
};
class A : virtual public Z { // has one instance of Z
    ...
};
class B : virtual public Z { // has one instance of Z
    ...
};
class C : public A, public B { // has only one instance of Z
    ...
};
Suppose in an A object, the Z portion is at offset OA, and in a B object, it is at offset OB. There is only one copy of Z in a C object. In general, the Z portion cannot simultaneously be at offset OA from the A portion and at offset OB from the B portion. At least one of these offsets must be different when the entire object is of type C.
Given a pointer to A, the location of the Z sub-object, therefore, cannot be determined at compile time, because the A object might be in turn a sub-object of some more complex type, such as C. The run-time system must allow for the dynamic determination of the type of the complete object so that the offsets of other objects can be found.
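The difference between ordinary and virtual base classes can be checked directly by comparing the addresses of the Z sub-object reached along each inheritance path. This is a minimal sketch: the `Plain*` and `Shared*` class names are hypothetical, and Z is given a single data member so that the example is complete.

```cpp
#include <cassert>

struct Z { int value = 0; };

// Ordinary inheritance: each path carries its own copy of Z.
struct PlainA : Z {};
struct PlainB : Z {};
struct PlainC : PlainA, PlainB {};

// Virtual inheritance: both paths share a single Z.
struct SharedA : virtual Z {};
struct SharedB : virtual Z {};
struct SharedC : SharedA, SharedB {};

inline bool has_two_copies(PlainC& c) {
    Z* via_a = static_cast<PlainA*>(&c);
    Z* via_b = static_cast<PlainB*>(&c);
    return via_a != via_b;
}

inline bool has_one_copy(SharedC& c) {
    // Each conversion to Z* looks up the virtual base offset at run time.
    Z* via_a = static_cast<SharedA*>(&c);
    Z* via_b = static_cast<SharedB*>(&c);
    return via_a == via_b;
}

inline void check_virtual_base() {
    PlainC pc;
    SharedC sc;
    assert(has_two_copies(pc));
    assert(has_one_copy(sc));
    static_cast<SharedA&>(sc).value = 5;            // write through the A path
    assert(static_cast<SharedB&>(sc).value == 5);   // visible through the B path
}
```

How the run-time lookup in `has_one_copy` is encoded (for example, as an offset stored in a table) is an implementation decision, and therefore part of the ABI.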
C++ implementations typically store the offset information for each object type in an auxiliary table, often called a vtable. There is usually one vtable for each type that needs one, shared by all objects of that type. An object needing a vtable then contains a pointer to the vtable. The vtable also contains addresses of virtual functions to allow dynamic function dispatch based on the actual object type referred to by a pointer or reference.
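To make the idea concrete, here is a hand-written analogue of a vtable built from plain structs and function pointers. It is only a sketch of the general technique: every name in it is hypothetical, and real compilers use their own ABI-defined table layouts, which also hold the offset information described above.

```cpp
#include <cassert>

struct Shape;   // forward declaration for the function-pointer types

// One table per "type", shared by all objects of that type.
struct ShapeVtable {
    int (*area)(const Shape*);    // slot 0
    int (*sides)(const Shape*);   // slot 1
};

// Each object carries a pointer to the table for its actual type.
struct Shape {
    const ShapeVtable* vptr;
    int w, h;
};

inline int rect_area(const Shape* s)  { return s->w * s->h; }
inline int rect_sides(const Shape*)   { return 4; }
inline int tri_area(const Shape* s)   { return s->w * s->h / 2; }
inline int tri_sides(const Shape*)    { return 3; }

const ShapeVtable rect_vtable = { rect_area, rect_sides };
const ShapeVtable tri_vtable  = { tri_area,  tri_sides  };

// Dynamic dispatch: load the table pointer, index the slot, call.
inline int area(const Shape& s)  { return s.vptr->area(&s); }
inline int sides(const Shape& s) { return s.vptr->sides(&s); }

inline void check_vtable_sketch() {
    Shape r = { &rect_vtable, 4, 3 };
    Shape t = { &tri_vtable,  4, 3 };
    assert(area(r) == 12 && sides(r) == 4);
    assert(area(t) == 6  && sides(t) == 3);
}
```

The caller of `area` depends on the position of `vptr` in the object and on the slot order in the table. Code compiled against a different slot order would call the wrong function, which is why vtable contents and layout belong to the ABI.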
The C++ Standard Library
The C++ standard defines the names and properties of types and functions in the library, as well as the programming interface to the library. Source code written to the specification is, therefore, portable among conforming implementations. The binary interface is a different story, however.
The C++ standard allows considerable variation in implementation details, as long as the programming interface is not affected. Many of those implementation details therefore become part of the ABI -- particularly the size and layout of class objects.
Many parts of the standard library are best implemented with inline functions to enhance performance. Somewhat like a macro in C, a call to an inline function is replaced by the body of the function. If the function accesses members of a class defined in the standard library, the locations of the class members become built into the code of application programs that use the inline function. Any library member referred to by an inline function is, therefore, part of the C++ ABI.
Even if an enhancement or bug fix to the standard library does not affect the programming interface, the change would affect the ABI if it altered the size or layout of classes defined in the library.
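The effect can be simulated without actually mixing library versions. In the sketch below, `lib_v1::Widget` and `lib_v2::Widget` are hypothetical stand-ins for two releases of a library class, where the second release inserted a member. The function `stale_read_size` imitates application code that inlined a version 1 accessor and therefore still uses the version 1 offset. The sketch assumes there is no padding between the `int` members, which holds on common platforms but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

namespace lib_v1 { struct Widget { int id; int size; }; }
namespace lib_v2 { struct Widget { int id; int flags; int size; }; }

// The offset that an inlined lib_v1 accessor for `size` would have
// compiled into the application.
const std::size_t baked_size_offset = offsetof(lib_v1::Widget, size);

// Imitates old application code handed an object created by the new library.
inline int stale_read_size(const lib_v2::Widget& w) {
    int out;
    std::memcpy(&out,
                reinterpret_cast<const unsigned char*>(&w) + baked_size_offset,
                sizeof out);
    return out;
}

inline void check_stale_offset() {
    lib_v2::Widget w = { 1, 7, 42 };
    // With no padding between ints, the old offset now lands on `flags`.
    assert(stale_read_size(w) == w.flags);
    assert(stale_read_size(w) != w.size);
}
```

The source-level interface of `Widget` did not change for a caller that only wants `size`, yet the old binary silently reads the wrong member.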
The Sources of ABI Instability
A new or changed language feature can require a change, not just an extension, to an ABI. Here are two examples:
- The C++ standard allows an overriding virtual function to have a return type different from the function it overrides. The return type must be a pointer or reference type, and the return type of the function in the derived class must refer to a type derived from the type referred to by the function it overrides. Example:
  class Base {
  public:
      virtual Base* clone();
  };
  class Derived : public Base {
  public:
      virtual Derived* clone();
  };
  void f(Base* p) {
      Base* copy = p->clone();
  }
  The compiler cannot know whether the call to clone will return a Base* or a pointer to a derived type. The ABI must provide a way to accomplish the correct pointer adjustment no matter what type is returned. The required mechanism would not have been available in a pre-standard ABI.
- Consider a template function specialization and a non-template function with the same name and type:
  template<class T> T min(T, T) { ... }
  int min<int>(int, int); // old specialization syntax
  int min(int, int); // non-template
  Under pre-standard language rules, a non-template function with the same name and type as a template function was considered to be a specialization of the template. Such a function and the corresponding specialization must, therefore, have the same mangled name.
  Under the rules in the C++ standard, they are distinct functions and must have different mangled names. The external name of at least one of the functions must change compared to a pre-standard ABI.
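Both language rules can be exercised with standard syntax. The following sketch completes the clone example with function bodies and adds a template named `which` (a hypothetical name, as are `kind` and `kind_of_clone`) whose overloads return different values so that the selected function can be identified.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}
    virtual Base* clone() const { return new Base(*this); }
    virtual int kind() const { return 0; }
};

struct Derived : Base {
    // Covariant return type: Derived* instead of Base*.
    Derived* clone() const override { return new Derived(*this); }
    int kind() const override { return 1; }
};

// The caller sees only Base*; the overrider that actually runs may
// return a Derived*, which must arrive correctly converted.
inline int kind_of_clone(const Base* p) {
    Base* copy = p->clone();
    int k = copy->kind();
    delete copy;
    return k;
}

template <class T> int which(T, T) { return 0; }             // primary template
template <> inline int which<int>(int, int) { return 1; }    // explicit specialization
inline int which(int, int) { return 2; }                     // non-template, a distinct function

inline void check_standard_rules() {
    Base b;
    Derived d;
    assert(kind_of_clone(&b) == 0);
    assert(kind_of_clone(&d) == 1);
    assert(which(1, 2) == 2);       // overload resolution prefers the non-template
    assert(which<>(1, 2) == 1);     // empty argument list forces the template
    assert(which(1.0, 2.0) == 0);   // no specialization for double
}
```

Because `which<int>` and the non-template `which` can both exist and be called in one program, the linker must see two different symbols for them, which is the mangling change described in the second item.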
Fixing some bugs requires an ABI change. Here are two examples:
- Early C++ compilers typically could not support calls to functions in a virtual base class from the constructor or destructor of a derived class under some circumstances. Eventually a reasonable-cost solution to this problem was invented, but it required a different vtable organization, and a different way of calling constructors and destructors.
- The Oracle Developer Studio C++ compiler in -compat=5 mode generates different mangled names for some function declarations that are supposed to be equivalent. Fixing the bug would mean that some existing functions get a different mangled name, an ABI change.
The Consequences of ABI Instability
Any difference in the ABI can mean that object files from different compilers will not link, or, if they do link, they will not run correctly. (To help prevent code generated for different ABIs from accidentally linking, different compiler implementations typically use different name mangling schemes.)
In the early days of C++, when the language was evolving rapidly, ABIs changed frequently. C++ programmers were accustomed to recompiling everything whenever they updated a compiler.
Suppose an application uses a window library from vendor A and a database library from vendor B. The vendors do not wish to distribute source code, and so they provide binary libraries. The application code and the libraries from both vendors must all use the same ABI.
If every compiler release has a different ABI, application programmers will not want to upgrade compilers frequently. It would mean coordinating the upgrade among all developers on the project, and recompiling everything on the official upgrade installation date.
If vendor A and vendor B must support many clients, each of whom is using a different compiler release, they must release and support a library for each compiler version. This situation is very expensive in resources, and typically is not feasible. Under this scenario, vendors might release source code, and clients would build the libraries themselves. That in turn creates new support problems, since different clients will use different tool sets, and the build scripts must be configured to conform to local practices.
The Oracle Solaris vision of shared libraries is not well-supported by the scenario above. A different version of a C++ shared library must be generated for every supported ABI variation. Even when a compiler is no longer supported, programs may exist in the field that depend on using an old shared library. The obsolete library versions must continue to be shipped for a long time.
Successful distribution of libraries as products--particularly shared libraries--depends on having a stable ABI.
A History of Oracle Developer Studio C++ ABIs
Major releases of Oracle Developer Studio C++ compilers have always used incompatible ABIs, in accordance with the engineering taxonomy of release numbers: A new major version number signifies an incompatible release.
Beginning with C++ 3.0, Sun attempted to inject some stability into the C++ ABI. The C++ runtime support library became a shared library shipped along with Oracle Solaris: libC.so.3.
But it was also recognized that C++ was still evolving, and work was in progress on a C++ standard that would doubtless change some important details.
C++ 4.0, released in 1993, introduced a new, incompatible ABI. This ABI was intended to be stable. The ABI was published as a public document. The new C++ support library, libC.so.5, was added to Oracle Solaris shipments.
Over time, some bugs were found that required small changes in name mangling. These bugs were corrected, and users were provided with ways to restore the previous behavior when it was necessary to link with older code. C++ 4.2, released in 1996, represented the final version of this ABI.
This ABI contained known bugs, such as the virtual base-class problem described in The Sources of ABI Instability section. In addition, work on the C++ standard was nearing completion, and the standard was known to contain features that would require a different ABI. The change in template semantics described in that section is one of several changes that affected the ABI.
To avoid the scenario of a constantly changing ABI that would result from trying to track the evolving C++ standard, the C++ development team adopted the strategy of implementing in C++ 4.2 only those features that were assumed to remain stable and that did not require an ABI change. At the cost of lagging behind in C++ features, Sun provided a stable ABI for its customers.
The Runtime Libraries
The earlier C++ runtime library consisted of an I/O library known as "iostreams" and the run-time "helper" functions for the compiler, including support for heap memory allocation, exception handling, and dynamic type information.
The library specified in the C++ standard includes an extensive set of template classes and functions, including strings, iostreams, numerics, and the "STL."
Due to time pressure, version 5.0 (1998) of the C++ compiler did not implement all the features of the C++ standard. For example, parts of the standard library definition involve templates as members of classes, a feature not supported by C++ 5.0. In the library to be delivered with the compiler, those parts were missing, or they were implemented slightly differently.
For those reasons, the library implementation was split into two parts:
- libCrun, consisting of the compiler helper functions, including support for heap memory allocation, exception handling, and dynamic type information
- libCstd, consisting of the remainder of the C++ standard library
Here an ABI, There an ABI...
Sun (and now, Oracle) policy deems continued compatibility more important than conformance to the C++ standard, and more important even than correctness. Even though libCstd was sub-standard and some bugs were found in name mangling, those deficiencies would remain in the following releases.
Because libCstd would remain binary compatible, it could be shipped as a shared library. C++ 5.2 (2000) shipped with libCstd.so.1 as an optional library, and C++ 5.3 (2001) provided libCstd.so.1 as part of an Oracle Solaris package. To support customers who need more of the features of the C++ standard library but do not need binary compatibility, C++ 5.4 (2002) also shipped with the open source STLport implementation of the C++ standard library.
ABI Status, March 2011
The compiler current as of 2011 was C++ 5.11, a component of Oracle Solaris Studio 12.2.
The C++ compilers from 5.0 through 5.11 provide a mode compatible with C++ 4.2, and a default mode that supports the C++ standard. The two modes represent two different ABIs. For a given mode, all the 5.x compilers generate binary-compatible code.
C++ 5.11 is also supported on selected Linux platforms. Since developers on Linux are likely to want to link with code compiled by GNU C++ (g++), the Linux compilers also provide a mode where they generate the same ABI as g++ (which is very different from the default Oracle Developer Studio C++ ABI).
Despite the compatibility policy, some customers have demanded fixes for ABI bugs that prevent their programs from working. To satisfy these customers, the compiler has an undocumented option that generates a correct, but incompatible, ABI. Customers who do not depend on third-party binary libraries can use the corrected ABI, provided they are careful to compile all their code with the option. The libraries that ship with the C++ compiler do not trigger any of the ABI bugs, so no separate version of those libraries is required.
To support customers who need a more standard-conforming library but who do not need compatibility with libCstd, the compiler ships with the open source library from STLport. In addition, C++ 5.10 and 5.11 directly support the use of Apache stdcxx on Oracle Solaris when installed in a designated location. Finally, the compiler also provides a way to substitute a third-party standard library for the ones shipped with the compiler.
ABI Status, June 2016
The current shipping compiler is C++ 5.14, a component of Oracle Developer Studio 12.5. The compiler no longer supports the 20-year-old C++ 4.2 ABI. The compiler continues to support the -compat=5 ABI, compatible with releases going back to C++ 5.0. In this mode, the compiler continues to support libCstd, STLport, and Apache stdcxx.
To address the problems associated with name mangling in -compat=5 mode, the compiler now has a public, documented option to select the incorrect but backward-compatible mangling, or correct name mangling.
C++ 5.14 also provides the g++ ABI on Oracle Solaris and supported versions of Linux, linking with the actual g++ runtime libraries. The C++ 2003 and 2011 standards are supported in this mode, as well as some features of C++ 2014.
The g++ runtime library encountered an ABI problem with the version 5 release. C++ 2011 invalidated some formerly allowed implementations of C++ standard library classes. Changing the class implementation to conform to the new standard meant an ABI change. Both g++ and Oracle Developer Studio 12.5 provide support for both versions of the library ABI, version 4 and version 5.
The Future
There is no question that the -compat=5 C++ ABI must continue to be supported for some years. That means shipping compilers that generate this ABI, along with compatible versions of libraries.
As has happened before, new versions of the C++ standard might require ABI changes. Oracle Developer Studio C++ expects to provide support as needed for language changes.
About the Author
Steve Clamage has been at Sun (now Oracle) since 1994, and is currently technical lead for the Oracle Developer Studio C++ compiler. He has been a member of the ANSI C++ Committee since its founding in 1990, and served as chair from 1995 through 2014.