GCC 4.8 Release Series Changes, New Features, and Fixes
Caveats
GCC now uses C++ as its implementation language. This means that
to build GCC from sources, you will need a C++ compiler that
understands C++ 2003. For more details on the rationale and specific
changes, please refer to the C++ conversion
page.
To enable the Graphite framework for loop optimizations you now
need CLooG version 0.18.0 and ISL version 0.11.1. Both can be obtained
from the
GCC infrastructure
directory. The installation manual contains
more information about requirements to build GCC.
GCC now uses a more aggressive analysis to derive an upper bound for
the number of iterations of loops using constraints imposed by language
standards. This may cause non-conforming programs to no longer work as
expected, such as SPEC CPU 2006 464.h264ref and 416.gamess. A new
option, -fno-aggressive-loop-optimizations, was added
to disable this aggressive analysis. In some loops that have known
constant number of iterations, but undefined behavior is known to occur
in the loop before reaching or during the last iteration, GCC will warn
about the undefined behavior in the loop instead of deriving lower upper
bound of the number of iterations for the loop. The warning
can be disabled with -Wno-aggressive-loop-optimizations.
On ARM, a bug has been fixed in GCC's implementation of the AAPCS
rules for the layout of vectors that could lead to wrong code being
generated. Vectors larger than 8 bytes in size are now by default
aligned to an 8-byte boundary. This is an ABI change: code that makes
explicit use of vector types may be incompatible with binary objects
built with older versions of GCC. Auto-vectorized code is not affected
by this change.
On AVR, support has been removed for the command-line
option -mshort-calls deprecated in GCC 4.7.
On AVR, the configure option --with-avrlibc supported since
GCC 4.7.2 is turned on per default for all non-RTEMS configurations.
This option arranges for a better integration of
AVR Libc with avr-gcc.
For technical details, see PR54461.
To turn off the option in non-RTEMS configurations, use
--with-avrlibc=no. If the compiler is configured for
RTEMS, the option is always turned off.
More information on porting to GCC 4.8 from previous versions
of GCC can be found in
the porting
guide for this release.
General Optimizer Improvements (and Changes)
DWARF4 is now the default when generating DWARF debug information.
When -g is used on a platform that uses DWARF debugging
information, GCC will now default to
-gdwarf-4 -fno-debug-types-section.
GDB 7.5, Valgrind 3.8.0 and elfutils 0.154 debug information consumers
support DWARF4 by default. Before GCC 4.8 the default version used
was DWARF2. To make GCC 4.8 generate an older DWARF version use
-g together with -gdwarf-2 or
-gdwarf-3.
The default for Darwin and VxWorks is still
-gdwarf-2 -gstrict-dwarf.
A new general optimization level, -Og, has been
introduced. It addresses the need for fast compilation and a
superior debugging experience while providing a reasonable level
of runtime performance. Overall experience for development should
be better than the default optimization level -O0.
A new option -ftree-partial-pre was added to control
the partial redundancy elimination (PRE) optimization.
This option is enabled by default at the -O3 optimization
level, and it makes PRE more aggressive.
The option -fconserve-space has been removed; it
was no longer useful on most targets since GCC supports putting
variables into BSS without making them common.
The struct reorg and matrix reorg optimizations (command-line
options -fipa-struct-reorg and
-fipa-matrix-reorg) have been removed. They did not
always work correctly, nor did they work with link-time optimization
(LTO), hence were only applicable to programs consisting of a single
translation unit.
Several scalability bottle-necks have been removed from GCC's
optimization passes. Compilation of extremely large functions,
e.g. due to the use of the flatten attribute in the
"Eigen" C++ linear algebra templates library, is significantly
faster than previous releases of GCC.
Link-time optimization (LTO) improvements:
LTO partitioning has been rewritten for better reliability
and maintanibility. Several important bugs leading to link
failures have been fixed.
Interprocedural optimization improvements:
A new symbol table has been implemented. It
builds on existing callgraph and varpool modules and
provide a new API. Unusual symbol visibilities and aliases
are handled more consistently leading to, for example,
more aggressive unreachable code removal with LTO.
The inline heuristic can now bypass limits on the size of
of inlined functions when the inlining is particularly profitable.
This happens, for example, when loop bounds or array strides
get propagated.
Values passed through aggregates (either by value or reference)
are now propagated at the inter-procedural level leading to better
inlining decisions (for example in the case of Fortran
array descriptors) and devirtualization.
AddressSanitizer
, a fast memory error detector, has been added and can be
enabled via -fsanitize=address. Memory access
instructions will be instrumented to detect heap-, stack-, and
global-buffer overflow as well as use-after-free bugs. To get
nicer stacktraces, use -fno-omit-frame-pointer. The
AddressSanitizer is available on IA-32/x86-64/x32/PowerPC/PowerPC64
GNU/Linux and on x86-64 Darwin.
ThreadSanitizer has been added and can be enabled via
-fsanitize=thread. Instructions will be instrumented to
detect data races. The ThreadSanitizer is available on x86-64
GNU/Linux.
New Languages and Language specific improvements
C family
Each diagnostic emitted now includes the original
source line and a caret '^' indicating the column.
The option -fno-diagnostics-show-caret
suppresses this information.
The option -ftrack-macro-expansion=2 is now
enabled by default. This allows the compiler to display the
macro expansion stack in diagnostics. Combined with the caret
information, an example diagnostic showing these two features
is:
t.c:1:94: error: invalid operands to binary < (have ‘struct mystruct’ and ‘float’)
#define MYMAX(A,B) __extension__ ({ __typeof__(A) __a = (A); __typeof__(B) __b = (B); __a < __b ? __b : __a; })
^
t.c:7:7: note: in expansion of macro 'MYMAX'
X = MYMAX(P, F);
^
A new -Wsizeof-pointer-memaccess warning has been added
(also enabled by -Wall) to warn about suspicious length parameters
to certain string and memory built-in functions if the argument uses
sizeof. This warning warns e.g.
about memset (ptr, 0, sizeof (ptr)); if ptr is not an array,
but a pointer, and suggests a possible fix, or about
memcpy (&foo, ptr, sizeof (&foo));.
The new option -Wpedantic is an alias for
-pedantic, which is now deprecated. The forms
-Wno-pedantic, -Werror=pedantic, and
-Wno-error=pedantic work in the same way as for any other
-W option. One caveat is that
-Werror=pedantic is not equivalent to
-pedantic-errors, since the latter makes into errors
some warnings that are not controlled by -Wpedantic,
and the former only affects diagnostics that are disabled when using
-Wno-pedantic.
The option -Wshadow no longer warns if a
declaration shadows a function declaration, unless the former
declares a function or pointer to function, because this is a common and valid case
in real-world code.
C++
G++ now implements the C++11thread_local keyword; this differs from the
GNU __thread keyword primarily in that it allows dynamic
initialization and destruction semantics. Unfortunately, this support
requires a run-time penalty for references to non-function-local
thread_local variables defined in a different
translation unit even if they don't need dynamic
initialization, so users may want to continue to
use __thread for TLS variables with static initialization
semantics.
If the programmer can be sure that no use of the variable in a
non-defining TU needs to trigger dynamic initialization (either because
the variable is statically initialized, or a use of the variable in the
defining TU will be executed before any uses in another TU), they can
avoid this overhead with the -fno-extern-tls-init option.
OpenMP threadprivate variables now also support dynamic
initialization and destruction by the same mechanism.
G++ now implements the C++11
attribute syntax, e.g.
[[noreturn]] void f();
and also the alignment specifier, e.g.
alignas(double) int i;
G++ now implements C++11 inheriting constructors, e.g.
struct A { A(int); };
struct B: A { using A::A; }; // defines B::B(int)
B b(42); // OK
As of GCC 4.8.1, G++ implements the change to decltype semantics from N3276.
struct A f();
decltype(f()) g(); // OK, return type of f() is not required to be complete.
G++ now supports a -std=c++1y option for experimentation
with features proposed for the next revision of the standard, expected
around 2017. Currently the only difference from -std=c++11
is support for return type deduction in normal functions, as proposed in
N3386.
The G++ namespace association extension,
__attribute ((strong)),
has been deprecated. Inline namespaces should be used instead.
G++ now supports a -fext-numeric-literal option to control
whether GNU numeric literal suffixes are accepted as extensions or processed
as C++11 user-defined numeric literal suffixes. The flag is on
(use suffixes for GNU literals) by default for -std=gnu++*,
and -std=c++98. The flag is off (use suffixes for user-defined
literals) by default for -std=c++11 and later.
forward_list meets the allocator-aware container requirements;
this_thread::sleep_for(), this_thread::sleep_until() and
this_thread::yield() are defined without requiring the configure
option --enable-libstdcxx-time;
Improvements to <random>:
SSE optimized normal_distribution.
Use of hardware RNG instruction for random_device
on new x86 processors (requires the assembler to support the
instruction.)
and <ext/random>:
New random number engine simd_fast_mersenne_twister_engine
with an optimized SSE implementation.
New random number distributions beta_distribution,
normal_mv_distribution, rice_distribution,
nakagami_distribution, pareto_distribution,
k_distribution, arcsine_distribution,
hoyt_distribution.
Added --disable-libstdcxx-verbose configure option
to disable diagnostic messages issued when a process terminates
abnormally. This may be useful for embedded systems to reduce the
size of executables that link statically to the library.
Fortran
Compatibility notice:
Module files: The version of module files (.mod)
has been incremented. Fortran MODULEs compiled by earlier
GCC versions have to be recompiled, when they are USEd by
files compiled with GCC 4.8. GCC 4.8 is not able to read
.mod files created by earlier versions; attempting to do so
gives an error message.
Note: The ABI of the produced assembler data itself has not changed;
object files and libraries are fully compatible
with older versions except as noted below.
ABI: Some internal names (used in the assembler/object file) have
changed for symbols declared in the specification part of a module.
If an affected module – or a file using it via use
association – is recompiled, the module and all files which
directly use such symbols have to be recompiled as well. This change
only affects the following kind of module symbols:
Procedure pointers. Note: C-interoperable function pointers
(type(c_funptr)) are not affected nor are
procedure-pointer components.
Deferred-length character strings.
The BACKTRACE intrinsic subroutine has been added. It shows
a backtrace at an arbitrary place in user code; program execution
continues normally afterwards.
The
-Wc-binding-type warning option has been added (disabled
by default). It warns if the a variable might not be C interoperable; in
particular, if the variable has been declared using an intrinsic type with
default kind instead of using a kind parameter defined for C
interoperability in the intrinsic ISO_C_Binding module.
Before, this warning was always printed. The -Wc-binding-type
option is enabled by -Wall.
The -Wrealloc-lhs and -Wrealloc-lhs-all warning
command-line options have been added, which diagnose when code to is
inserted for automatic (re)allocation of a variable during assignment.
This option can be used to decide whether it is safe to use
-fno-realloc-lhs. Additionally, it can be used to find automatic
(re)allocation in hot loops. (For arrays, replacing var=
by var(:)= disables the automatic reallocation.)
The -Wcompare-reals command-line option has been added. When
this is set, warnings are issued when comparing REAL or
COMPLEX types for equality and inequality; consider replacing
a == b by abs(a−b) < eps with a suitable
eps. -Wcompare-reals is enabled by
-Wextra.
The -Wtarget-lifetime command-line option has been added
(enabled with -Wall), which warns if the pointer in a
pointer assignment might outlive its target.
Reading floating point numbers which use q for
the exponential (such as 4.0q0) is now supported as vendor
extension for better compatibility with old data files. It is strongly
recommended to use for I/O the equivalent but standard conforming
e (such as 4.0e0).
(For Fortran
source code, consider replacing the q in
floating-point literals by a kind parameter (e.g. 4.0e0_qp
with a suitable qp). Note that – in Fortran
source code – replacing q by a simple
e is not equivalent.)
The GFORTRAN_TMPDIR environment variable for specifying
a non-default directory for files opened with STATUS="SCRATCH",
is not used anymore. Instead gfortran checks the POSIX/GNU standard
TMPDIR environment variable. If TMPDIR is not
defined, gfortran falls back to other methods to determine the directory
for temporary files as documented in the
user
manual.
Experimental support for assumed-rank arrays
(dimension(..)) has been added. Note that currently
gfortran's own array descriptor is used, which is different from the
one defined in TS29113, see
gfortran's header file or use the Chasm Language
Interoperability Tools.
Go
GCC 4.8.0 implements a preliminary version of the upcoming Go
1.1 release. The library support is not quite complete, due to
release timing.
Go has been tested on GNU/Linux and Solaris platforms for
various processors including x86, x86_64, PowerPC, SPARC, and
Alpha. It may work on other platforms as well.
New Targets and Target Specific Improvements
AArch64
A new port has been added to support AArch64, the new 64-bit
architecture from ARM. Note that this is a separate port from the
existing 32-bit ARM port.
The port provides initial support for the Cortex-A53 and the
Cortex-A57 processors with the command line options
-mcpu=cortex-a53 and -mcpu=cortex-a57.
ARM
Initial support has been added for the AArch32 extensions defined
in the ARMv8 architecture.
Code generation improvements for the Cortex-A7 and Cortex-A15 CPUs.
A new option, -mcpu=marvell-pj4, has been added to
generate code for the Marvell PJ4 processor.
The compiler can now automatically generate the VFMA,
VFMS, REVSH and REV16
instructions.
A new vectorizer cost model for Advanced SIMD configurations to
improve the auto-vectorization strategies used.
The scheduler now takes into account the number of live registers to
reduce the amount of spilling that can occur. This should improve code
performance in large functions. The limit can be removed by using the
option -fno-sched-pressure.
Improvements have been made to the Marvell iWMMX code generation and
support for the iWMMX2 SIMD unit has been added. The option
-mcpu=iwmmxt2 can be used to enable code generation for
the latter.
A number of code generation improvements for Thumb2 to reduce code
size when compiling for the M-profile processors.
The RTEMS (arm-rtems) port has been updated to use the
EABI.
Code generation support for the old FPA and Maverick floating-point
architectures has been removed. Ports that previously relied on these
features have also been removed. This includes the targets:
arm*-*-linux-gnu (use
arm*-*-linux-gnueabi)
arm*-*-elf (use arm*-*-eabi)
arm*-*-uclinux* (use
arm*-*-uclinux*eabi)
arm*-*-ecos-elf (no alternative)
arm*-*-freebsd (no alternative)
arm*-wince-pe* (no alternative).
AVR
Support for the "Embedded C" fixed-point has been
added. For details, see the
GCC wiki and the
user manual. The support is not complete.
A new print modifier %r for register operands in inline
assembler is supported. It will print the raw register number without the
register prefix 'r':
/* Return the most significant byte of 'val', a 64-bit value. */
unsigned char msb (long long val)
{
unsigned char c;
__asm__ ("mov %0, %r1+7" : "=r" (c) : "r" (val));
return c;
}
The inline assembler in this example will generate code like
mov r24, 8+7
provided c is allocated to R24 and
val is allocated to
R8…R15. This works because
the GNU assembler accepts plain register numbers without register prefix.
Static initializers with 3-byte symbols are supported now:
Allow -mpreferred-stack-boundary=3 for the x86-64
architecture with SSE extensions disabled. Since the x86-64 ABI
requires 16 byte stack alignment, this is ABI incompatible and
intended to be used in controlled environments where stack space
is an important limitation.
This option will lead to wrong code when functions compiled with 16 byte
stack alignment (such as functions from a standard library) are called
with misaligned stack. In this case, SSE instructions may lead to
misaligned memory access traps. In addition, variable arguments will
be handled incorrectly for 16 byte aligned objects (including x87
long double and __int128), leading to
wrong results. You must build all
modules with -mpreferred-stack-boundary=3, including any
libraries. This includes the system libraries and startup modules.
Support for the new Intel processor codename Broadwell with
RDSEED, ADCX, ADOX,
PREFETCHW is available through -madx,
-mprfchw, -mrdseed command-line options.
Support for the Intel RTM and HLE intrinsics, built-in
functions and code generation is available via -mrtm and
-mhle.
Support for the Intel FXSR, XSAVE and XSAVEOPT instruction
sets. Intrinsics and built-in functions are available
via -mfxsr, -mxsave and
-mxsaveopt respectively.
New -maddress-mode=[short|long] options for x32.
-maddress-mode=short overrides default 64-bit addresses to
32-bit by emitting the 0x67 address-size override prefix.
This is the default address mode for x32.
New built-in functions to detect run-time CPU type and ISA:
A built-in function __builtin_cpu_is has been added to
detect if the run-time CPU is of a particular type. It returns a
positive integer on a match and zero otherwise. It accepts one string
literal argument, the CPU name. For example,
__builtin_cpu_is("westmere") returns a positive integer if
the run-time CPU is an Intel Core i7 Westmere processor. Please refer
to the
user manual for the list of valid CPU names recognized.
A built-in function __builtin_cpu_supports has been
added to detect if the run-time CPU supports a particular ISA feature.
It returns a positive integer on a match and zero otherwise. It accepts
one string literal argument, the ISA feature. For example,
__builtin_cpu_supports("ssse3") returns a positive integer
if the run-time CPU supports SSSE3 instructions. Please refer to the
user manual for the list of valid ISA names recognized.
Caveat: If these built-in functions are called before any static
constructors are invoked, like during IFUNC initialization, then the CPU
detection initialization must be explicitly run using this newly provided
built-in function, __builtin_cpu_init. The initialization
needs to be done only once. For example, this is how the invocation would
look like inside an IFUNC initializer:
static void (*some_ifunc_resolver(void))(void)
{
__builtin_cpu_init();
if (__builtin_cpu_is("amdfam10h") ...
if (__builtin_cpu_supports("popcnt") ...
}
Function Multiversioning Support with G++:
It is now possible to create multiple function versions each targeting a
specific processor and/or ISA. Function versions have the same signature
but different target attributes. For example, here is a program with
function versions:
__attribute__ ((target ("default")))
int foo(void)
{
return 1;
}
__attribute__ ((target ("sse4.2")))
int foo(void)
{
return 2;
}
int main (void)
{
int (*p) = &foo;
assert ((*p)() == foo());
return 0;
}
The x86 backend has been improved to allow option
-fschedule-insns to work reliably.
This option can be used to schedule instructions better and leads to
improved performace in certain cases.
Windows MinGW-w64 targets (*-w64-mingw*) require at least r5437 from the Mingw-w64 trunk.
Support for new AMD family 15h processors (Steamroller core)
is now available through the -march=bdver3 and
-mtune=bdver3 options.
Support for new AMD family 16h processors (Jaguar core) is
now available through the -march=btver2 and
-mtune=btver2 options.
FRV
This target now supports the -fstack-usage
command-line option.
MIPS
GCC can now generate code specifically for the R4700, Broadcom XLP
and MIPS 34kn processors. The associated -march options
are -march=r4700, -march=xlp
and -march=34kn respectively.
GCC now generates better DSP code for MIPS 74k cores thanks
to further scheduling optimizations.
The MIPS port now supports the -fstack-check
option.
GCC now passes the -mmcu and -mno-mcu
options to the assembler.
Previous versions of GCC would silently accept -fpic
and -fPIC for -mno-abicalls targets
like mips*-elf. This combination was not intended
or supported, and did not generate position-independent code.
GCC 4.8 now reports an error when this combination is used.
PowerPC / PowerPC64 / RS6000
SVR4 configurations (GNU/Linux, FreeBSD, NetBSD) no longer save,
restore or update the VRSAVE register by default. The respective
operating systems manage the VRSAVE register directly.
Large TOC support has been added for AIX through the command line
option -mcmodel=large.
Native Thread-Local Storage support has been added for AIX.
VMX (Altivec) and VSX instruction sets now are enabled implicitly when targetting processors that support those hardware features on AIX 6.1 and above.
RX
This target will now issue a warning message whenever multiple fast
interrupt handlers are found in the same cpmpilation unit. This feature can
be turned off by the new -mno-warn-multiple-fast-interrupts
command-line option.
S/390, System z
Support for the IBM zEnterprise zEC12 processor has been
added. When using the -march=zEC12 option, the
compiler will generate code making use of the following new
instructions:
load and trap instructions
2 new compare and trap instructions
rotate and insert selected bits - without CC clobber
The -mtune=zEC12 option enables zEC12 specific
instruction scheduling without making use of new
instructions.
Register pressure sensitive instruction scheduling is enabled
by default.
The ifunc function attribute is enabled by default.
memcpy and memcmp invokations on big
memory chunks or with run time lengths are not generated inline
anymore when tuning for z10 or higher. The purpose is to make
use of the IFUNC optimized versions in Glibc.
SH
The default alignment settings have been reduced to be less aggressive.
This results in more compact code for optimization levels other than
-Os.
Improved support for the __atomic built-in functions:
A new option -matomic-model=model selects the
model for the generated atomic sequences. The following models are
supported:
soft-gusa
Software gUSA sequences (SH3* and SH4* only). On SH4A targets this
will now also partially utilize the movco.l and
movli.l instructions. This is the default when the target
is sh3*-*-linux* or sh4*-*-linux*.
hard-llcs
Hardware movco.l / movli.l sequences
(SH4A only).
soft-tcb
Software thread control block sequences.
soft-imask
Software interrupt flipping sequences (privileged mode only). This
is the default when the target is sh1*-*-linux* or
sh2*-*-linux*.
none
Generates function calls to the respective __atomic
built-in functions. This is the default for SH64 targets or when
the target is not sh*-*-linux*.
The option -msoft-atomic has been deprecated. It is
now an alias for -matomic-model=soft-gusa.
A new option -mtas makes the compiler generate
the tas.b instruction for the
__atomic_test_and_set built-in function regardless of the
selected atomic model.
The __sync functions in libgcc now reflect
the selected atomic model when building the toolchain.
Added support for the mov.b and mov.w
instructions with displacement addressing.
Added support for the SH2A instructions movu.b and
movu.w.
Various improvements to code generated for integer arithmetic.
Improvements to conditional branches and code that involves the T bit.
A new option -mzdcbranch tells the compiler to favor
zero-displacement branches. This is enabled by default for SH4* targets.
The pref instruction will now be emitted by the
__builtin_prefetch built-in function for SH3* targets.
The fmac instruction will now be emitted by the
fmaf standard function and the __builtin_fmaf
built-in function.
The -mfused-madd option has been deprecated in favor of
the machine-independent -ffp-contract option. Notice that the
fmac instruction will now be generated by default for
expressions like a * b + c. This is due to the compiler
default setting -ffp-contract=fast.
Added new options -mfsrra and -mfsca to allow
the compiler using the fsrra and fsca
instructions on targets other than SH4A (where they are already enabled by
default).
Added support for the __builtin_bswap32 built-in function.
It is now expanded as a sequence of swap.b and
swap.w instructions instead of a library function call.
The behavior of the -mieee option has been fixed and the
negative form -mno-ieee has been added to control the IEEE
conformance of floating point comparisons. By default -mieee
is now enabled and the option -ffinite-math-only implicitly
sets -mno-ieee.
Added support for the built-in functions
__builtin_thread_pointer and
__builtin_set_thread_pointer. This assumes that
GBR is used to hold the thread pointer of the current thread.
Memory loads and stores relative to the address returned by
__builtin_thread_pointer will now also utilize GBR
based displacement address modes.
SPARC
Added optimized instruction scheduling for Niagara4.
TILE-Gx
Added support for the -mcmodel=MODEL
command-line option. The models supported are small
and large.
V850
This target now supports the E3V5 architecture via the use
of the new -mv850e3v5 command-line option. It also has
experimental support for the e3v5 LOOP instruction which
can be enabled via the new -mloop command-line option.
XStormy16
This target now supports the -fstack-usage
command-line option.
Operating Systems
Windows (Cygwin)
Executables are now linked against shared libgcc by default.
The previous default was to link statically, which can still be
done by explicitly specifying -static or -static-libgcc on the
command line. However it is strongly advised against, as it
will cause problems for any application that makes use of DLLs
compiled by GCC. It should be alright for a monolithic stand-alone
application that only links against the Windows OS DLLs, but
offers little or no benefit.
For questions related to the use of GCC,
please consult these web pages and the
GCC manuals. If that fails,
the gcc-help@gcc.gnu.org
mailing list might help.
Comments on these web pages and the development of GCC are welcome on our
developer list at gcc@gcc.gnu.org.
All of our lists
have public archives.
Copyright (C)
Free Software Foundation, Inc.
Verbatim copying and distribution of this entire article is
permitted in any medium, provided this notice is preserved.