Since the switch to -std=gnu23 by default, float.h (included from
tsystem.h) defines INFINITY macro (to __builtin_inff ()), which now
results in a warning when compiling libgcc2.c which defines it
to something else (and, worse aarch64 compiles it with -Werror and
build fails).
libgcc2.c asserts INFINITY has the expected type which depends on
the macros with which libgcc2.c is being compiled, so guarding
the define with #ifndef INFINITY wouldn't work.
So this patch instead #undefs the macro before defining it.
2024-11-16 Jakub Jelinek <jakub@redhat.com>
PR libgcc/117624
* libgcc2.c (INFINITY): Add #undef before #define.
Follows the current linux ABI that uses single signal entry token
and shared shadow stack between thread and alt stack.
Could be behind __ARM_FEATURE_GCS_DEFAULT ifdef (only do anything
special with gcs compat codegen) but there is a runtime check anyway.
Change affected tests to be compatible with -mbranch-protection=standard
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h (_Unwind_Frames_Extra): Update.
(_Unwind_Frames_Increment): Define.
We recently forced -Werror when building libgcc for aarch64, to make
sure we'd catch and fix the kind of problem described in the PR.
In this case, when building for aarch64_be (so, big endian), gcc emits
this warning/error:
libgcc/config/libbid/bid_conf.h:847:25: error: missing braces around initializer [-Werror=missing-braces]
847 | UINT128 arg_name={ bid_##arg_name.w[1], bid_##arg_name.w[0]};
libgcc/config/libbid/bid_conf.h:871:8: note: in expansion of macro 'COPY_ARG_VAL'
871 | COPY_ARG_VAL(arg_name)
This patch fixes the problem by adding curly braces around the
initializer for COPY_ARG_VAL in the big endian case.
It seems that COPY_ARG_REF (just above COPY_ARG_VAL) has a similar
issue, but DECIMAL_CALL_BY_REFERENCE seems always defined to 0, so
COPY_ARG_REF is never used. The patch fixes it too, though.
libgcc/config/libbid/ChangeLog:
PR libgcc/117537
* bid_conf.h (COPY_ARG_REF): Fix initializer.
(COPY_ARG_VAL): Likewise.
This patch adds -Werror to LIBGCC2_CFLAGS so that aarch64 can catch
warnings during bootstrap, while not impacting other targets.
The patch also adds -Wno-prio-ctor-dtor to avoid a warning when
compiling lse_init.c
libgcc/
* config/aarch64/t-aarch64: Always use -Werror
-Wno-prio-ctor-dtor.
Add prototypes for __init_cpu_features_resolver and
__init_cpu_features to avoid warnings due to -Wmissing-prototypes.
libgcc/
* config/aarch64/cpuinfo.c (__init_cpu_features_resolver): Add
prototype.
(__init_cpu_features): Likewise.
Since
Commit c608ada288
Author: Zac Walker <zacwalker@microsoft.com>
CommitDate: 2024-01-23 15:32:30 +0000
Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target
lse.S includes aarch64-asm.h, leading to a conflicting definition of macro 'L':
- in lse.S it expands to either '' or 'L'
- in aarch64-asm.h it is used to generate .L ## label
lse.S does not use the second, so this patch just undefines L after
the inclusion of aarch64-asm.h.
libgcc/
* config/aarch64/lse.S: Undefine L() macro.
In some cases, we don't need to handle implied extensions. Add detailed
comments to help developers understand what implied ISAs should be
considered.
libgcc/ChangeLog:
* config/riscv/feature_bits.c (__init_riscv_features_bits_linux):
Add detailed comments on processing implied extensions.
Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.
The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
- For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...
This API is intended to provide the capability to query minimal common available extensions on the system.
The API is defined in the riscv-c-api-doc:
https://github.com/riscv-non-isa/riscv-c-api-doc/blob/main/src/c-api.adoc
Proposal to use unsigned long long for marchid and mimpid:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/91
Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.
Changes since v7:
- Remove vendorID field in __riscv_vendor_feature_bits.
- Fix C implies Zcf only for RV32.
- Add more comments to kernel versions.
Changes since v6:
- Implement __riscv_cpu_model.
- Set new sub extension bits which implied from previous extensions.
Changes since v5:
- Minor fixes on indentation.
Changes since v4:
- Bump to newest riscv-c-api-doc with some new extensions like Zve*, Zc*
Zimop, Zcmop, Zawrs.
- Rename the return variable name of hwprobe syscall.
- Minor fixes on indentation.
Changes since v3:
- Fix non-linux build.
- Let __init_riscv_feature_bits become constructor
Changes since v2:
- Prevent it initialize more than once.
Changes since v1:
- Fix the format.
- Prevented race conditions by introducing a local variable to avoid load/store
operations during the computation of the feature bit.
Co-Developed-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
libgcc/ChangeLog:
* config/riscv/feature_bits.c: New.
* config/riscv/t-elf (LIB2ADD): Add feature_bits.c.
From 8b3c5ebe8aacbcc4ddf1be8dea9a555e7e1bcc39 Mon Sep 17 00:00:00 2001
From: Jim Lin <jim@andestech.com>
Date: Fri, 4 Oct 2024 14:48:12 +0800
Subject: [PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in
__riscv_save_[0-3] on ilp32e.
libgcc/ChangeLog:
* config/riscv/save-restore.S: Fix .cfi_offset for saving ra in
__riscv_save_[0-3] on ilp32e.
0001-RISC-V-libgcc-Fix-incorrect-and-missing-.cfi_offset-.patch
From 06a370a0a2329dd4da0ffcab7c35ea7df2353baf Mon Sep 17 00:00:00 2001
From: Jim Lin <jim@andestech.com>
Date: Tue, 1 Oct 2024 14:42:56 +0800
Subject: [PATCH] RISC-V/libgcc: Fix incorrect and missing .cfi_offset for
__riscv_save_[0-3] on RV32.
libgcc/ChangeLog:
* config/riscv/save-restore.S: Fix .cfi_offset for
__riscv_save_[0-3] on RV32.
A previous patch ([1]) introduced a build regression on aarch64-none-elf
target. The changes were primarilly tested on aarch64-unknown-linux-gnu,
so the issue was missed during development.
The includes are slighly different between the two targets, and due to some
include rules ([2]), "aarch64-unwind-def.h" was not found.
[1]: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bdf41d627c13bc5f0dc676991f4513daa9d9ae36
[2]: https://gcc.gnu.org/onlinedocs/cpp/Include-Syntax.html
> include "file"
> ... It searches for a file named file first in the directory
> containing the current file, ...
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h: Fix header path.
d9cafa0c4f stopped building libgcc_s.1 on macOS >= 15, in part because
that is required to bootstrap the compiler using the macOS 15 SDK. The
macOS 15 SDK ships in Xcode 16, which also runs on macOS 14. libgcc_s.1
can no longer be built on macOS 14 using Xcode 16 by the same logic that
the previous change disabled it for macOS 15.
PR target/116809
libgcc/ChangeLog:
* config.host: Don't build legacy libgcc_s.1 on macOS 14.
Signed-off-by: Mark Mentovai <mark@mentovai.com>
That Save/Restore routines for E can be used for RVI with ILP32E ABI.
libgcc/ChangeLog:
* config/riscv/save-restore.S: Check with __riscv_abi_rve rather than
__riscv_32e.
In C++20, modules streaming check for exposures of TU-local entities.
In general exposing internal linkage functions in a header is liable to
cause ODR violations in C++, and this is now detected in a module
context.
This patch goes through and removes 'static' from many declarations
exposed through libstdc++ to prevent code like the following from
failing:
export module M;
extern "C++" {
#include <bits/stdc++.h>
}
Since gthreads is used from C as well, we need to choose whether to use
'inline' or 'static inline' depending on whether we're compiling for C
or C++ (since the semantics of 'inline' are different between the
languages). Additionally we need to remove static global variables, so
we migrate these to function-local statics to avoid the ODR issues.
There doesn't seem to be a good workaround for weakrefs, so I've left
them as-is and will work around it in the modules streaming code to
consider them as not TU-local.
The same issue occurs in the objective-C specific parts of gthreads, but
I'm not familiar with the surrounding context and we don't currently
test modules with Objective C++ anyway so I've left it as-is.
PR libstdc++/115126
libgcc/ChangeLog:
* gthr-posix.h (__GTHREAD_ALWAYS_INLINE): New macro.
(__GTHREAD_INLINE): New macro.
(__gthread_active): Convert from variable to (hidden) function.
(__gthread_active_p): Mark as __GTHREAD_INLINE instead of
static; make visibility("hidden") when it has a static local
variable.
(__gthread_trigger): Mark as __GTHREAD_INLINE instead of static.
(__gthread_create): Likewise.
(__gthread_join): Likewise.
(__gthread_detach): Likewise.
(__gthread_equal): Likewise.
(__gthread_self): Likewise.
(__gthread_yield): Likewise.
(__gthread_once): Likewise.
(__gthread_key_create): Likewise.
(__gthread_key_delete): Likewise.
(__gthread_getspecific): Likewise.
(__gthread_setspecific): Likewise.
(__gthread_mutex_init_function): Likewise.
(__gthread_mutex_destroy): Likewise.
(__gthread_mutex_lock): Likewise.
(__gthread_mutex_trylock): Likewise.
(__gthread_mutex_timedlock): Likewise.
(__gthread_mutex_unlock): Likewise.
(__gthread_recursive_mutex_init_function): Likewise.
(__gthread_recursive_mutex_lock): Likewise.
(__gthread_recursive_mutex_trylock): Likewise.
(__gthread_recursive_mutex_timedlock): Likewise.
(__gthread_recursive_mutex_unlock): Likewise.
(__gthread_recursive_mutex_destroy): Likewise.
(__gthread_cond_init_function): Likewise.
(__gthread_cond_broadcast): Likewise.
(__gthread_cond_signal): Likewise.
(__gthread_cond_wait): Likewise.
(__gthread_cond_timedwait): Likewise.
(__gthread_cond_wait_recursive): Likewise.
(__gthread_cond_destroy): Likewise.
(__gthread_rwlock_rdlock): Likewise.
(__gthread_rwlock_tryrdlock): Likewise.
(__gthread_rwlock_wrlock): Likewise.
(__gthread_rwlock_trywrlock): Likewise.
(__gthread_rwlock_unlock): Likewise.
* gthr-single.h: (__GTHREAD_ALWAYS_INLINE): New macro.
(__GTHREAD_INLINE): New macro.
(__gthread_active_p): Mark as __GTHREAD_INLINE instead of static.
(__gthread_once): Likewise.
(__gthread_key_create): Likewise.
(__gthread_key_delete): Likewise.
(__gthread_getspecific): Likewise.
(__gthread_setspecific): Likewise.
(__gthread_mutex_destroy): Likewise.
(__gthread_mutex_lock): Likewise.
(__gthread_mutex_trylock): Likewise.
(__gthread_mutex_unlock): Likewise.
(__gthread_recursive_mutex_lock): Likewise.
(__gthread_recursive_mutex_trylock): Likewise.
(__gthread_recursive_mutex_unlock): Likewise.
(__gthread_recursive_mutex_destroy): Likewise.
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr.h (std::__is_shared_ptr): Remove
unnecessary 'static'.
* include/bits/unique_ptr.h (std::__is_unique_ptr): Likewise.
* include/std/future (std::__create_task_state): Likewise.
* include/std/shared_mutex (_GLIBCXX_GTRHW): Likewise.
(__glibcxx_rwlock_init): Likewise.
(__glibcxx_rwlock_timedrdlock): Likewise.
(__glibcxx_rwlock_timedwrlock): Likewise.
(__glibcxx_rwlock_rdlock): Likewise.
(__glibcxx_rwlock_tryrdlock): Likewise.
(__glibcxx_rwlock_wrlock): Likewise.
(__glibcxx_rwlock_trywrlock): Likewise.
(__glibcxx_rwlock_unlock): Likewise.
(__glibcxx_rwlock_destroy): Likewise.
(__glibcxx_rwlock_init): Likewise.
* include/pstl/algorithm_impl.h
(__pstl::__internal::__set_algo_cut_off): Mark inline.
* include/pstl/unseq_backend_simd.h
(__pstl::__unseq_backend::__lane_size): Mark inline.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Jakub Jelinek <jakub@redhat.com>
We have been building a legacy libgcc_s.1 DSO to support code that
was built with older compilers.
From macOS 15, the unwinder no longer exports some of the symbols used
in that library which (a) cuases bootstrap fail and (b) means that the
legacy library is no longer useful.
No open branch of GCC emits references to this library - and any already
-built code that depends on the symbols would need rework anyway.
PR target/116809
libgcc/ChangeLog:
* config.host: Build legacy libgcc_s.1 on hosts before macOS 15.
* config/i386/t-darwin: Remove reference to legacy libgcc_s.1
* config/rs6000/t-darwin: Likewise.
* config/t-darwin-libgccs1: New file.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Architecture-specific CFI directives are currently declared an processed
among others architecture-independent CFI directives in gcc/dwarf2* files.
This approach creates confusion, specifically in the case of DWARF
instructions in the vendor space and using the same instruction code.
Such a clash currently happen between DW_CFA_GNU_window_save (used on
SPARC) and DW_CFA_AARCH64_negate_ra_state (used on AArch64), and both
having the same instruction code 0x2d.
Then AArch64 compilers generates a SPARC CFI directive (.cfi_window_save)
instead of .cfi_negate_ra_state, contrarilly to what is expected in
[DWARF for the Arm 64-bit Architecture (AArch64)](https://github.com/
ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst).
This refactoring does not solve completely the problem, but improve the
situation by moving some of the processing of those directives (more
specifically their output in the assembly) to the backend via 2 target
hooks:
- DW_CFI_OPRND1_DESC: parse the first operand of the directive (if any).
- OUTPUT_CFI_DIRECTIVE: output the CFI directive as a string.
Additionally, this patch also contains a renaming of an enum used for
return address mangling on AArch64.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_output_cfi_directive): New hook for CFI directives.
(aarch64_dw_cfi_oprnd1_desc): Same.
(TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
(TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
* config/sparc/sparc.cc
(sparc_output_cfi_directive): New hook for CFI directives.
(sparc_dw_cfi_oprnd1_desc): Same.
(TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
(TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
* coretypes.h
(struct dw_cfi_node): Forward declaration of CFI type from
gcc/dwarf2out.h.
(enum dw_cfi_oprnd_type): Same.
(enum dwarf_call_frame_info): Same.
* doc/tm.texi: Regenerated from doc/tm.texi.in.
* doc/tm.texi.in: Add doc for new target hooks.
type of enum to allow forward declaration.
* dwarf2cfi.cc
(struct dw_cfi_row): Update the description for window_save
and ra_mangled.
(dwarf2out_frame_debug_cfa_negate_ra_state): Use AArch64 CFI
directive instead of the SPARC one.
(change_cfi_row): Use the right CFI directive's name for RA
mangling.
(output_cfi): Remove explicit architecture-specific CFI
directive DW_CFA_GNU_window_save that falls into default case.
(output_cfi_directive): Use target hook as default.
* dwarf2out.cc (dw_cfi_oprnd1_desc): Use target hook as default.
* dwarf2out.h (enum dw_cfi_oprnd_type): specify underlying type
of enum to allow forward declaration.
(dw_cfi_oprnd1_desc): Call target hook.
(output_cfi_directive): Use dw_cfi_ref instead of struct
dw_cfi_node *.
* hooks.cc
(hook_bool_dwcfi_dwcfioprndtyperef_false): New.
(hook_bool_FILEptr_dwcfiptr_false): New.
* hooks.h
(hook_bool_dwcfi_dwcfioprndtyperef_false): New.
(hook_bool_FILEptr_dwcfiptr_false): New.
* target.def: Documentation for new hooks.
include/ChangeLog:
* dwarf2.h (enum dwarf_call_frame_info): specify underlying
libffi/ChangeLog:
* include/ffi_cfi.h (cfi_negate_ra_state): Declare AArch64 cfi
directive.
libgcc/ChangeLog:
* config/aarch64/aarch64-asm.h (PACIASP): Replace SPARC CFI
directive by AArch64 one.
(AUTIASP): Same.
libitm/ChangeLog:
* config/aarch64/sjlj.S: Replace SPARC CFI directive by
AArch64 one.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/pr94515-1.C: Replace SPARC CFI directive by
AArch64 one.
* g++.target/aarch64/pr94515-2.C: Same.
This patch provides a new handler MD_ARCH_FRAME_STATE_T to hide an
architecture-specific structure containing CIE and FDE data related
to DWARF architecture extensions.
Hiding the architecture-specific attributes behind a handler has the
following benefits:
1. isolating those data from the generic ones in _Unwind_FrameState
2. avoiding casts to custom types.
3. preserving typing information when debugging with GDB, and so
facilitating their printing.
This approach required to add a new header md-unwind-def.h included at
the top of libgcc/unwind-dw2.h, and redirecting to the corresponding
architecture header via a symbolic link.
An obvious drawback is the increase in complexity with macros, and
headers. It also caused a split of architecture definitions between
md-unwind-def.h (types definitions used in unwind-dw2.h) and
md-unwind.h (local types definitions and handlers implementations).
The naming of md-unwind.h with .h extension is a bit misleading as
the file is only included in the middle of unwind-dw2.c. Changing
this naming would require modification of others backends, which I
prefered to abstain from. Overall the benefits are worth the added
complexity from my perspective.
libgcc/ChangeLog:
* Makefile.in: New target for symbolic link to md-unwind-def.h
* config.host: New parameter md_unwind_def_header. Set it to
aarch64/aarch64-unwind-def.h for AArch64 targets, or no-unwind.h
by default.
* config/aarch64/aarch64-unwind.h
(aarch64_pointer_auth_key): Move to aarch64-unwind-def.h
(aarch64_cie_aug_handler): Update.
(aarch64_arch_extension_frame_init): Update.
(aarch64_demangle_return_addr): Update.
* configure.ac: New substitute variable md_unwind_def_header.
* unwind-dw2.h (defined): MD_ARCH_FRAME_STATE_T.
* config/aarch64/aarch64-unwind-def.h: New file.
* configure: Regenerate.
* config/no-unwind.h: Updated comment
The RA state register is local to a frame, so it should not be copied to
the target frame during the context installation.
This patch adds a new backend handler that check whether a register
needs to be skipped or not before its installation.
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h
(MD_FRAME_LOCAL_REGISTER_P): new handler checking whether a register
from the current context needs to be skipped before installation into
the target context.
(aarch64_frame_local_register): Likewise.
* unwind-dw2.c (uw_install_context_1): use MD_FRAME_LOCAL_REGISTER_P.
This patch is only a refactoring of the existing implementation
of PAuth and returned-address signing. The existing behavior is
preserved.
_Unwind_FrameState already contains several CIE and FDE information
(see the attributes below the comment "The information we care
about from the CIE/FDE" in libgcc/unwind-dw2.h).
The patch aims at moving the information from DWARF CIE (signing
key stored in the augmentation string) and FDE (the used signing
method) into _Unwind_FrameState along the already-stored CIE and
FDE information.
Note: those information have to be saved in frame_state_reg_info
instead of _Unwind_FrameState as they need to be savable by
DW_CFA_remember_state and restorable by DW_CFA_restore_state, that
both rely on the attribute "prev".
Those new information in _Unwind_FrameState simplifies the look-up
of the signing key when the return address is demangled. It also
allows future signing methods to be easily added.
_Unwind_FrameState is not a part of the public API of libunwind,
so the change is backward compatible.
A new architecture-specific handler MD_ARCH_EXTENSION_FRAME_INIT
allows to reset values (if needed) in the frame state and unwind
context before changing the frame state to the caller context.
A new architecture-specific handler MD_ARCH_EXTENSION_CIE_AUG_HANDLER
isolates the architecture-specific augmentation strings in AArch64
backend, and allows others architectures to reuse augmentation
strings that would have clashed with AArch64 DWARF extensions.
aarch64_demangle_return_addr, DW_CFA_AARCH64_negate_ra_state and
DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h
were documented to clarify where the value of the RA state register
is stored (FS and CONTEXT respectively).
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h
(AARCH64_DWARF_RA_STATE_MASK): The mask for RA state register.
(aarch64_ra_signing_method_t): The diversifiers used to sign a
function's return address.
(aarch64_pointer_auth_key): The key used to sign a function's
return address.
(aarch64_cie_signed_with_b_key): Deleted as the signing key is
available now in _Unwind_FrameState.
(MD_ARCH_EXTENSION_CIE_AUG_HANDLER): New CIE augmentation string
handler for architecture extensions.
(MD_ARCH_EXTENSION_FRAME_INIT): New architecture-extension
initialization routine for DWARF frame state and context before
execution of DWARF instructions.
(aarch64_context_ra_state_get): Read RA state register from CONTEXT.
(aarch64_ra_state_get): Read RA state register from FS.
(aarch64_ra_state_set): Write RA state register into FS.
(aarch64_ra_state_toggle): Toggle RA state register in FS.
(aarch64_cie_aug_handler): Handler AArch64 augmentation strings.
(aarch64_arch_extension_frame_init): Initialize defaults for the
signing key (PAUTH_KEY_A), and RA state register (RA_no_signing).
(aarch64_demangle_return_addr): Rely on the frame registers and
the signing_key attribute in _Unwind_FrameState.
* unwind-dw2-execute_cfa.h:
Use the right alias DW_CFA_AARCH64_negate_ra_state for __aarch64__
instead of DW_CFA_GNU_window_save.
(DW_CFA_AARCH64_negate_ra_state): Save the signing method in RA
state register. Toggle RA state register without resetting 'how'
to REG_UNSAVED.
* unwind-dw2.c:
(extract_cie_info): Save the signing key in the current
_Unwind_FrameState while parsing the augmentation data.
(uw_frame_state_for): Reset some attributes related to architecture
extensions in _Unwind_FrameState.
(uw_update_context): Move authentication code to AArch64 unwinding.
* unwind-dw2.h (enum register_rule): Give a name to the existing
enum for the register rules, and replace 'unsigned char' by 'enum
register_rule' to facilitate debugging in GDB.
(_Unwind_FrameState): Add a new architecture-extension attribute
to store the signing key.
For libgcc, we have (so far) supported building a DSO that supports
earlier versions of the OS than the target. From macOS 11, there are
APIs that do not exist on earlier OS versions, so limit the libgcc
range to macOS11..current.
libgcc/ChangeLog:
* config.host: From macOS 11, limit earliest macOS support
to macOS 11.
* config/t-darwin-min-11: New file.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
mips16.S was missing since
commit 29b7454553
Date: Thu Jun 1 10:14:24 2023 +0800
MIPS: Add speculation_barrier support
Without mips16.S included, some symbols will miss for mips16, and
so some software will fail to build.
libgcc/ChangeLog:
* config/mips/lib1funcs.S: Includes mips16.S.
This patch adds missing assembly directives to the CMSE library wrapper to call
functions with attribute cmse_nonsecure_call. Without the .type directive the
linker will fail to produce the correct veneer if a call to this wrapper
function is to far from the wrapper itself. The .size was added for
completeness, though we don't necessarily have a usecase for it.
libgcc/ChangeLog:
PR target/115360
* config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.
The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.
libgcc/ChangeLog:
* config/nvptx/t-nvptx: Add unwind-nvptx.c.
* config/nvptx/unwind-nvptx.c: New file.
Co-authored-by: Andrew Stubbs <ams@gcc.gnu.org>
The CPU features initialization code uses CPUID registers (rather than
HWCAP). The equality comparisons it uses are incorrect: for example FEAT_SVE
is not set if SVE2 is available. Using HWCAPs for these is both simpler and
correct. The initialization must also be done atomically to avoid multiple
threads causing corruption due to non-atomic RMW accesses to the global.
libgcc:
PR target/115342
* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
Use HWCAP where possible. Use atomic write for initialization.
Fix FEAT_PREDRES comparison.
(__init_cpu_features_resolver): Use atomic load for correct
initialization.
(__init_cpu_features): Likewise.
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more
"sorry, unimplemented: global constructors not supported on this target".
For proper execution test results, this depends on
<96f8fc59a7>
"ld: Global constructor/destructor support".
gcc/
* config/nvptx/nvptx.h: Configure global constructor, destructor
support.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config/nvptx/crt0.c (__gbl_ctors): New weak function.
(__main): Invoke it.
* config/nvptx/gbl-ctors.c: New.
* config/nvptx/t-nvptx: Configure global constructor, destructor
support.
The libgcc implementation of __clzhi2 can be tweaked by
one cycle in some situations by re-arranging the instructions.
It also reduces the WCET by 1 cycle.
libgcc/
PR target/115065
* config/avr/lib1funcs.S (__clzhi2): Tweak.
This supports __powidf2 by means of a double wrapper for already
existing f7_powi (renamed to __f7_powi by f7-renames.h).
It tweaks the implementation so that it does not perform trivial
multiplications with 1.0 any more, but instead uses a move.
It also fixes the last statement of f7_powi, which was wrong.
Notice that f7_powi was unused until now.
PR target/114981
libgcc/config/avr/libf7/
* libf7-common.mk (F7_ASM_PARTS): Add D_powi
* libf7-asm.sx (F7MOD_D_powi_, __powidf2): New module and function.
* libf7.c (f7_powi): Fix last (wrong) statement.
Tweak trivial multiplications with 1.0.
gcc/testsuite/
* gcc.target/avr/pr114981-powil.c: New test.
Reuse MinGW definitions from i386 for libgcc. Move reused files to
libgcc/config/mingw folder.
libgcc/ChangeLog:
* config.host: Add aarch64-w64-mingw32 target. Adjust targets
after moving MinGW files.
* config/i386/t-gthr-win32: Move to...
* config/mingw/t-gthr-win32: ...here.
* config/i386/t-mingw-pthread: Move to...
* config/mingw/t-mingw-pthread: ...here.
* config/aarch64/t-no-eh: New file. EH is not yet implemented for
the target, and the default definition should be disabled.
Support for Solaris 11.3 had already been obsoleted in GCC 13. However,
since the only Solaris system in the cfarm was running 11.3, I've kept
it in tree until now when both Solaris 11.4/SPARC and x86 systems have
been added.
This patch actually removes the Solaris 11.3 support. Apart from
several minor simplifications, there are two more widespread changes:
* In Solaris 11.4, libsocket and libnsl were folded into libc, so
there's no longer a need to link them explictly.
* Since Solaris 11.4, Solaris includes all crts needed by gcc (like
crt1.o and gcrt1.o) with the base system. All workarounds to provide
fallbacks can thus go.
Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11 (as/ld, gas/ld, and gas/gld) as well as Solaris
11.3/x86 to ascertain that version is actually rejected.
2024-04-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
c++tools:
* configure.ac (ax_lib_socket_nsl.m4): Don't sinclude.
(AX_LIB_SOCKET_NSL): Don't call.
(NETLIBS): Remove.
* configure: Regenerate.
* Makefile.in (NETLIBS): Remove.
(g++-mapper-server$(exeext)): Remove $(NETLIBS).
gcc:
* config.gcc: Move *-*-solaris2.11.[0-3]* to unsupported list.
<*-*-solaris2*> (default_use_cxa_atexit): Set unconditionally.
* configure.ac (AX_LIB_SOCKET_NSL): Don't call.
(NETLIBS): Remove.
(gcc_cv_ld_aligned_shf_merge): Remove.
(hidden_linkonce) <i?86-*-solaris2* | x86_64-*-solaris2*>: Remove.
(gcc_cv_target_dl_iterate_phdr) <*-*-solaris2*>: Always set to yes.
* Makefile.in (NETLIBS): Remove.
* configure, config.in, aclocal.m4: Regenerate.
* config/sol2.h: Don't check HAVE_SOLARIS_CRTS.
(STARTFILE_SPEC): Remove !HAVE_SOLARIS_CRTS case.
[USE_GLD] (LINK_EH_SPEC): Remove TARGET_DL_ITERATE_PHDR guard.
* config/i386/i386.cc (USE_HIDDEN_LINKONCE): Remove guard.
* varasm.cc (mergeable_string_section): Remove
HAVE_LD_ALIGNED_SHF_MERGE handling.
(mergeable_constant_section): Likewise.
* doc/install.texi (Specific,i?86-*-solaris2*): Reference Solaris
11.4 only.
(Specific, *-*-solaris2*): Document Solaris 11.3 removal. Remove
11.3 references and caveats. Update for 11.4.
gcc/cp:
* Make-lang.in (cc1plus$(exeext)): Remove $(NETLIBS).
gcc/objcp:
* Make-lang.in (cc1objplus$(exeext)): Remove $(NETLIBS).
gcc/testsuite:
* lib/target-supports.exp (check_effective_target_pie): Always
enable on *-*-solaris2*.
libgcc:
* configure.ac <*-*-solaris2*> (libgcc_cv_solaris_crts): Remove.
* config.host <*-*-solaris2*>: Remove !libgcc_cv_solaris_crts
support.
* configure, config.in: Regenerate.
* config/sol2/gmon.c (internal_mcount) [!HAVE_SOLARIS_CRTS]: Remove.
* config/i386/sol2-c1.S, config/sparc/sol2-c1.S: Remove.
* config/sol2/t-sol2 (crt1.o, gcrt1.o): Remove.
libstdc++-v3:
* testsuite/lib/dg-options.exp (add_options_for_net_ts)
<*-*-solaris2*>: Don't link with -lsocket -lnsl.
1 At point <https://github.com/riscv/riscv-bfloat16>,
BF16 has already been completed "post public review".
2 LLVM has also added support for RISCV BF16 in
<https://reviews.llvm.org/D151313> and
<https://reviews.llvm.org/D150929>.
3 According to the discussion <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/367>,
this use __bf16 and use DF16b in riscv_mangle_type like x86.
Below test are passed for this patch
* The riscv fully regression test.
gcc/ChangeLog:
* config/riscv/iterators.md: New mode iterator HFBF.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type _Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_init_libfuncs): Set the conversion method for BFmode and
HFmode.
(riscv_block_arith_comp_libfuncs_for_mode): Set the arithmetic
and comparison libfuncs for the mode.
* config/riscv/riscv.md (mode" ): Add BF.
(movhf): Support for BFmode.
(mov<mode>): Ditto.
(*movhf_softfloat): Ditto.
(*mov<mode>_softfloat): Ditto.
libgcc/ChangeLog:
* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): Ditto.
* config/riscv/t-softfp32: Add support for BF16 libfuncs.
* config/riscv/t-softfp64: Ditto.
* soft-fp/floatsibf.c: For si -> bf16.
* soft-fp/floatunsibf.c: For unsi -> bf16.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparison.c: New test.
* gcc.target/riscv/bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/bf16_integer_libcall_convert.c: New test.
Co-authored-by: Jin Ma <jinma@linux.alibaba.com>
The Intel Decimal Floating-Point Math Library is available as open-source on Netlib[1].
[1] https://www.netlib.org/misc/intel/.
libgcc/config/libbid/ChangeLog:
* bid128_fma.c (add_and_round): Fix bug: the result
of (+5E+368)*(+10E-34)+(-10E+369) was returning
-9999999999999999999999999999999999E+336 instead of expected
result -1000000000000000000000000000000000E+337.
(bid128_ext_fma): Ditto.
(bid64qqq_fma): Ditto.
* bid128_noncomp.c: Change return type of bid128_class from
int to class_t.
* bid128_round_integral.c: Add default case to avoid compiler
warning.
* bid128_string.c (bid128_to_string): Replace 0x30 with '0'
for zero digit.
(bid128_from_string): Ditto.
* bid32_to_bid128.c (bid128_to_bid32): Fix Bug. In addition
to the INEXACT flag, the UNDERFLOW flag needs to be set (and
was not) when converting an input such as
+6931674235302037148946035460357709E+1857 to +1000000E-101
* bid32_to_bid64.c (bid64_to_bid32): fix Bug, In addition to
the INEXACT flag, the UNDERFLOW flag needs to be set (and was
not) when converting an input such as +9999999000000001E-111
to +1000000E-101. Furthermore, significant bits of NaNs are
set correctly now. For example, 0x7c00003b9aca0000 was
returning 0x7c000002 instead of 0x 7c000100.
* bid64_noncomp.c: Change return type of bid64_class from int
to class_t.
* bid64_round_integral.c (bid64_round_integral_exact): Add
default case to avoid compiler warning.
* bid64_string.c (bid64_from_string): Fix bug for rounding
up. The input string "10000000000000000" was returning
+1000000000000001E+1 instead of +1000000000000000E+1.
* bid64_to_bid128.c (bid128_to_bid64): Fix bug, in addition to
the INEXACT flag, the UNDERFLOW flag needs to be set (and was
not) when converting an input such as
+9999999999999999999999999999999999E-417 to
+1000000000000000E-398.
* bid_binarydecimal.c (bid32_to_binary64): Fix bug for
conversion between binary and bid types. For example,
0x7c0F4240 was returning 0x7FFFA12000000000 instead of
expected double precision 0x7FF8000000000000.
(binary64_to_bid32): Ditto.
(binary80_to_bid32): Ditto.
(binary128_to_bid32): Ditto.
(binary80_to_bid64): Ditto.
(binary128_to_bid64): Ditto.
* bid_conf.h (BID_HIGH_128W): New macro.
(BID_LOW_128W): Ditto.
* bid_functions.h (__ENABLE_BINARY80__): Ditto.
(ALIGN): Ditto.
* bid_inline_add.h (get_add128): Add default case to avoid compiler
warning.
* bid_internal.h (get_BID64): Ditto.
(fast_get_BID64_check_OF): Ditto.
(ALIGN): New macro.
Co-authored-by: Anderson, Cristina S <cristina.s.anderson@intel.com>
Co-authored-by: Akkas, Ahmet <ahmet.akkas@intel.com>
Co-authored-by: Cornea, Marius <marius.cornea@intel.com>
libgcc/
* libgcov-util.c (tag_counters): Swap order of arguments to xcalloc.
(topen_to_memory_representation): Likewise.
Signed-off-by: Peter Damianov <peter0x44@disroot.org>
On Mon, Apr 29, 2024 at 01:44:24PM +0000, Joseph Myers wrote:
> > glibc 2.34 and later doesn't have separate libpthread (libpthread.so.0 is a
> > dummy shared library with just some symbol versions for compatibility, but
> > all the pthread_* APIs are in libc.so.6).
>
> I suspect this has caused link failures in the glibc testsuite for Hurd,
> which still has separate libpthread.
>
> https://sourceware.org/pipermail/libc-testresults/2024q2/012556.html
So like this then?
2024-04-30 Jakub Jelinek <jakub@redhat.com>
* gthr.h (GTHREAD_USE_WEAK): Don't redefine to 0 for glibc 2.34+
on GNU Hurd.
glibc 2.34 and later doesn't have separate libpthread (libpthread.so.0 is a
dummy shared library with just some symbol versions for compatibility, but
all the pthread_* APIs are in libc.so.6).
So, we don't need to do the .weakref dances to check whether a program
has been linked with -lpthread or not, in dynamically linked apps those
will be always true anyway.
In -static linking, this fixes various issues people had when only linking
some parts of libpthread.a and getting weird crashes. A hack for that was
what e.g. some Fedora glibcs used, where libpthread.a was a library
containing just one giant *.o file which had all the normal libpthread.a
*.o files linked with -r together.
libstdc++-v3 actually does something like this already since r10-10928,
the following patch is meant to fix it even for libgfortran, libobjc and
whatever else uses gthr.h.
2024-04-25 Jakub Jelinek <jakub@redhat.com>
* gthr.h (GTHREAD_USE_WEAK): Redefine to 0 for GLIBC 2.34 or later.
The following testcase is miscompiled because the code to decrement
vn on negative value with all ones in most significant limb (even partial)
and 0 in most significant bit of the second most significant limb doesn't
take into account the case where all bits below the most significant limb
are zero. This has been a problem both in the version before yesterday's
commit where it has been done only if un was one shorter than vn before this
decrement, and is now problem even more often when it is done earlier.
When we decrement vn in such case and negate it, we end up with all 0s in
the v2 value, so have both the problems with UB on __builtin_clz* and the
expectations of the algorithm that the divisor has most significant bit set
after shifting, plus when the decremented vn is 1 it can SIGFPE on division
by zero even when it is not division by zero etc. Other values shouldn't
get 0 in the new most significant limb after negation, because the
bitint_reduce_prec canonicalization should reduce prec if the second most
significant limb is all ones and if that limb is all zeros, if at least
one limb below it is non-zero, carry in will make it non-zero.
The following patch fixes it by checking if at least one bit below the
most significant limb is non-zero, in that case it decrements, otherwise
it will do nothing (but e.g. for the un < vn case that also means the
divisor is large enough that the result should be q 0 r u).
2024-04-18 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114762
* libgcc2.c (__divmodbitint4): Perform the decrement on negative
v with most significant limb all ones and the second least
significant limb with most significant bit clear always, regardless of
un < vn.
* gcc.dg/torture/bitint-70.c: New test.
The following testcase aborts on aarch64-linux but does not on x86_64-linux.
In both cases there is UB in the __divmodbitint4 implemenetation.
When the divisor is negative with most significant limb (even when partial)
all ones, has at least 2 limbs and the second most significant limb has the
most significant bit clear, when this number is negated, it will have 0
in the most significant limb.
Already in the PR114397 r14-9592 fix I was dealing with such divisors, but
thought the problem is only if because of that un < vn doesn't imply the
quotient is 0 and remainder u.
But as this testcase shows, the problem is with such divisors always.
What happens is that we use __builtin_clz* on the most significant limb,
and assume it will not be 0 because that is UB for the builtins.
Normally the most significant limb of the divisor shouldn't be 0, as
guaranteed by the bitint_reduce_prec e.g. for the positive numbers, unless
the divisor is just 0 (but for vn == 1 we have special cases).
The following patch moves the handling of this corner case a few lines
earlier before the un < vn check, because adjusting the vn later is harder.
2024-04-18 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114755
* libgcc2.c (__divmodbitint4): Perform the decrement on negative
v with most significant limb all ones and the second least
significant limb with most significant bit clear always, regardless of
un < vn.
* gcc.dg/torture/bitint-69.c: New test.
cppcheck apparently warns on the | !!sticky part of the expression and
using | (!!sticky) quiets it up (it is correct as is).
The following patch adds the ()s, and also adds them around mant >> 1 just
in case it makes it clearer to all readers that the expression is parsed
that way already.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114689
* config/m68k/fpgnulib.c (__truncdfsf2): Add parentheses around
!!sticky bitwise or operand to quiet up cppcheck. Add parentheses
around mant >> 1 bitwise or operand.
This patch adds support for C23's _BitInt for the AArch64 port when compiling
for little endianness. Big Endianness requires further target-agnostic
support and we therefor disable it for now.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (TARGET_C_BITINT_TYPE_INFO): Declare MACRO.
(aarch64_bitint_type_info): New function.
(aarch64_return_in_memory_1): Return large _BitInt's in memory.
(aarch64_function_arg_alignment): Adapt to correctly return the ABI
mandated alignment of _BitInt(N) where N > 128 as the alignment of
TImode.
(aarch64_composite_type_p): Return true for _BitInt(N), where N > 128.
libgcc/ChangeLog:
* config/aarch64/t-softfp (softfp_extras): Add floatbitinthf,
floatbitintbf, floatbitinttf and fixtfbitint.
* config/aarch64/libgcc-softfp.ver (GCC_14.0.0): Add __floatbitinthf,
__floatbitintbf, __floatbitinttf and __fixtfbitint.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/bitint-alignments.c: New test.
* gcc.target/aarch64/bitint-args.c: New test.
* gcc.target/aarch64/bitint-sizes.c: New test.
* gcc.target/aarch64/bitfield-bitint-abi.h: New header.
* gcc.target/aarch64/bitfield-bitint-abi-align16.c: New test.
* gcc.target/aarch64/bitfield-bitint-abi-align8.c: New test.
There is currently no unwinding implementation.
libgcc/ChangeLog:
* config.host: Recognize aarch64*-*-gnu* hosts.
* config/aarch64/gnu-unwind.h: New file.
* config/aarch64/heap-trampoline.c
(allocate_trampoline_page): Support GNU/Hurd.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
This patch adds support in gcc+gcov for modified condition/decision
coverage (MC/DC) with the -fcondition-coverage flag. MC/DC is a type of
test/code coverage and it is particularly important for safety-critical
applicaitons in industries like aviation and automotive. Notably, MC/DC
is required or recommended by:
* DO-178C for the most critical software (Level A) in avionics.
* IEC 61508 for SIL 4.
* ISO 26262-6 for ASIL D.
From the SQLite webpage:
Two methods of measuring test coverage were described above:
"statement" and "branch" coverage. There are many other test
coverage metrics besides these two. Another popular metric is
"Modified Condition/Decision Coverage" or MC/DC. Wikipedia defines
MC/DC as follows:
* Each decision tries every possible outcome.
* Each condition in a decision takes on every possible outcome.
* Each entry and exit point is invoked.
* Each condition in a decision is shown to independently affect
the outcome of the decision.
In the C programming language where && and || are "short-circuit"
operators, MC/DC and branch coverage are very nearly the same thing.
The primary difference is in boolean vector tests. One can test for
any of several bits in bit-vector and still obtain 100% branch test
coverage even though the second element of MC/DC - the requirement
that each condition in a decision take on every possible outcome -
might not be satisfied.
https://sqlite.org/testing.html#mcdc
MC/DC comes in different flavors, the most important being unique cause
MC/DC and masking MC/DC. This patch implements masking MC/DC, which is
works well with short circuiting semantics, and according to John
Chilenski's "An Investigation of Three Forms of the Modified Condition
Decision Coverage (MCDC) Criterion" (2001) is as good as unique cause at
catching bugs.
Whalen, Heimdahl, and De Silva "Efficient Test Coverage Measurement for
MC/DC" describes an algorithm for finding the masking table from an AST
walk, but my algorithm figures this out by analyzing the control flow
graph. The CFG is considered a reduced ordered binary decision diagram
and an input vector a path through the BDD, which is recorded. Specific
edges will mask ("null out") the contribution from earlier path
segments, which can be determined by finding short circuit endpoints.
Masking is most easily understood as circuiting of terms in the
reverse-ordered Boolean function, and the masked conditions do not
affect the decision like short-circuited conditions do not affect the
decision.
A tag/discriminator mapping from gcond->uid is created during
gimplification and made available through the function struct. The
values are unimportant as long as basic conditions constructed from a
single Boolean expression are given the same identifier. This happens in
the breaking down of ANDIF/ORIF trees, so the coverage generally works
well for frontends that create such trees.
Like Whalen et al this implementation records coverage in fixed-size
bitsets which gcov knows how to interpret. Recording conditions only
requires a few bitwise operations per condition and is very fast, but
comes with a limit on the number of terms in a single boolean
expression; the number of bits in a gcov_unsigned_type (which is usually
typedef'd to uint64_t). For most practical purposes this is acceptable,
and by default a warning will be issued if gcc cannot instrument the
expression. This is a practical limitation in the implementation, and
not a limitation of the algorithm, so support for more conditions can be
supported by introducing arbitrary-sized bitsets.
In action it looks pretty similar to the branch coverage. The -g short
opt carries no significance, but was chosen because it was an available
option with the upper-case free too.
gcov --conditions:
3: 17:void fn (int a, int b, int c, int d) {
3: 18: if ((a && (b || c)) && d)
conditions covered 3/8
condition 0 not covered (true false)
condition 1 not covered (true)
condition 2 not covered (true)
condition 3 not covered (true)
1: 19: x = 1;
-: 20: else
2: 21: x = 2;
3: 22:}
gcov --conditions --json-format:
"conditions": [
{
"not_covered_false": [
0
],
"count": 8,
"covered": 3,
"not_covered_true": [
0,
1,
2,
3
]
}
],
Expressions with constants may be heavily rewritten before it reaches
the gimplification, so constructs like int x = a ? 0 : 1 becomes
_x = (_a == 0). From source you would expect coverage, but it gets
neither branch nor condition coverage. The same applies to expressions
like int x = 1 || a which are simply replaced by a constant.
The test suite contains a lot of small programs and functions. Some of
these were designed by hand to test for specific behaviours and graph
shapes, and some are previously-failed test cases in other programs
adapted into the test suite.
gcc/ChangeLog:
* builtins.cc (expand_builtin_fork_or_exec): Check
condition_coverage_flag.
* collect2.cc (main): Add -fno-condition-coverage to OBSTACK.
* common.opt: Add new options -fcondition-coverage and
-Wcoverage-too-many-conditions.
* doc/gcov.texi: Add --conditions documentation.
* doc/invoke.texi: Add -fcondition-coverage documentation.
* function.cc (free_after_compilation): Free cond_uids.
* function.h (struct function): Add cond_uids.
* gcc.cc: Link gcov on -fcondition-coverage.
* gcov-counter.def (GCOV_COUNTER_CONDS): New.
* gcov-dump.cc (tag_conditions): New.
* gcov-io.h (GCOV_TAG_CONDS): New.
(GCOV_TAG_CONDS_LENGTH): New.
(GCOV_TAG_CONDS_NUM): New.
* gcov.cc (class condition_info): New.
(condition_info::condition_info): New.
(condition_info::popcount): New.
(struct coverage_info): New.
(add_condition_counts): New.
(output_conditions): New.
(print_usage): Add -g, --conditions.
(process_args): Likewise.
(output_intermediate_json_line): Output conditions.
(read_graph_file): Read condition counters.
(read_count_file): Likewise.
(file_summary): Print conditions.
(accumulate_line_info): Accumulate conditions.
(output_line_details): Print conditions.
* gimplify.cc (next_cond_uid): New.
(reset_cond_uid): New.
(shortcut_cond_r): Set condition discriminator.
(tag_shortcut_cond): New.
(gimple_associate_condition_with_expr): New.
(shortcut_cond_expr): Set condition discriminator.
(gimplify_cond_expr): Likewise.
(gimplify_function_tree): Call reset_cond_uid.
* ipa-inline.cc (can_early_inline_edge_p): Check
condition_coverage_flag.
* ipa-split.cc (pass_split_functions::gate): Likewise.
* passes.cc (finish_optimization_passes): Likewise.
* profile.cc (struct condcov): New declaration.
(cov_length): Likewise.
(cov_blocks): Likewise.
(cov_masks): Likewise.
(cov_maps): Likewise.
(cov_free): Likewise.
(instrument_decisions): New.
(read_thunk_profile): Control output to file.
(branch_prob): Call find_conditions, instrument_decisions.
(init_branch_prob): Add total_num_conds.
(end_branch_prob): Likewise.
* tree-core.h (struct tree_exp): Add condition_uid.
* tree-profile.cc (struct conds_ctx): New.
(CONDITIONS_MAX_TERMS): New.
(EDGE_CONDITION): New.
(topological_cmp): New.
(index_of): New.
(single_p): New.
(single_edge): New.
(contract_edge_up): New.
(struct outcomes): New.
(conditional_succs): New.
(condition_index): New.
(condition_uid): New.
(masking_vectors): New.
(emit_assign): New.
(emit_bitwise_op): New.
(make_top_index_visit): New.
(make_top_index): New.
(paths_between): New.
(struct condcov): New.
(cov_length): New.
(cov_blocks): New.
(cov_masks): New.
(cov_maps): New.
(cov_free): New.
(find_conditions): New.
(struct counters): New.
(find_counters): New.
(resolve_counter): New.
(resolve_counters): New.
(instrument_decisions): New.
(tree_profiling): Check condition_coverage_flag.
(pass_ipa_tree_profile::gate): Likewise.
* tree.h (SET_EXPR_UID): New.
(EXPR_COND_UID): New.
libgcc/ChangeLog:
* libgcov-merge.c (__gcov_merge_ior): New.
gcc/testsuite/ChangeLog:
* lib/gcov.exp: Add condition coverage test function.
* g++.dg/gcov/gcov-18.C: New test.
* gcc.misc-tests/gcov-19.c: New test.
* gcc.misc-tests/gcov-20.c: New test.
* gcc.misc-tests/gcov-21.c: New test.
* gcc.misc-tests/gcov-22.c: New test.
* gcc.misc-tests/gcov-23.c: New test.
A few HWCAP entries are missing from aarch64/cpuinfo.c. This results in build
errors on older machines.
libgcc/
* config/aarch64/cpuinfo.c: Add HWCAP_EVTSTRM, HWCAP_CRC32, HWCAP_CPUID,
HWCAP_PACA and HWCAP_PACG.
Like in r12-7519-g027e30414492d50feb2854aff38227b14300dc4b, I've done
git grep -v 'long long\|optab optab\|template template\|double double' | grep ' \([a-zA-Z]\+\) \1 '
This is just part of the changes, mostly for non-gcc directories.
I'll try to get to the rest soon. Obviously, the above command also
finds cases which are correct as is and shouldn't be changed, so one
needs to manually inspect everything.
I'd hope most of it is pretty obvious, but the config/ and libstdc++-v3/
hunks include a tweak in a license wording, though other copies of the
similar license have the wording right.
2024-04-02 Jakub Jelinek <jakub@redhat.com>
* Makefile.tpl: Fix duplicated words; returns returns ->
returns.
config/
* lcmessage.m4: Fix duplicated words; can can -> can,
package package -> package.
libdecnumber/
* decCommon.c (decFinalize): Fix duplicated words in
comment; the the -> the.
libgcc/
* unwind-dw2-fde.c (struct fde_accumulator): Fix duplicated
words in comment; is is -> is.
libgfortran/
* configure.host: Fix duplicated words; the the -> the.
libgm2/
* configure.host: Fix duplicated words; the the -> the.
libgomp/
* libgomp.texi (OpenMP 5.2): Fix duplicated words; with with ->
with.
(omp_target_associate_ptr): Fix duplicated words; either either ->
either.
(omp_init_allocator): Fix duplicated words; be be -> be.
(omp_realloc): Fix duplicated words; is is -> is.
(OMP_ALLOCATOR): Fix duplicated words; other other -> other.
* priority_queue.h (priority_queue_multi_p): Fix duplicated words;
to to -> to.
libiberty/
* regex.c (byte_re_match_2_internal): Fix duplicated words in comment;
next next -> next.
* dyn-string.c (dyn_string_init): Fix duplicated words in comment;
of of -> of.
libitm/
* beginend.cc (GTM::gtm_thread::begin_transaction): Fix duplicated
words in comment; not not -> not to.
libobjc/
* init.c (duplicate_classes): Fix duplicated words in comment; in in
-> in.
* sendmsg.c (__objc_prepare_dtable_for_class): Fix duplicated words
in comment; the the -> the.
* encoding.c (objc_layout_structure): Likewise.
libstdc++-v3/
* acinclude.m4: Fix duplicated words; file file -> file can.
* configure.host: Fix duplicated words; the the -> the.
libvtv/
* vtv_rts.cc (vtv_fail): Fix duplicated words; to to -> to.
* vtv_fail.cc (vtv_fail): Likewise.
Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731
The unwinding mechanism registers both the code range and the unwind
table itself within a b-tree lookup structure. That data structure
assumes that is consists of non-overlappping intervals. This
becomes a problem if the unwinding table is embedded within the
code itself, as now the intervals do overlap.
To fix this problem we now keep the unwind tables in a separate
b-tree, which prevents the overlap.
libgcc/ChangeLog:
PR libgcc/111731
* unwind-dw2-fde.c: Split unwind ranges if they contain the
unwind table.
The Knuth's division algorithm relies on the number of dividend limbs
to be greater ore equal to number of divisor limbs, which is why
I've added a special case for un < vn at the start of __divmodbitint4.
Unfortunately, my assumption that it then implies abs(v) > abs(u) and
so quotient must be 0 and remainder same as dividend is incorrect.
This is because this check is done before negation of the operands.
While bitint_reduce_prec reduces precision from clearly useless limbs,
the problematic case is when the dividend is unsigned or non-negative
and divisor is negative. We can have limbs (from MS to LS):
dividend: 0 M ?...
divisor: -1 -N ?...
where M has most significant bit set and M >= N (if M == N then it
also the following limbs matter) and the most significant limbs can
be even partial. In this case, the quotient should be -1 rather than
0. bitint_reduce_prec will reduce the precision of the dividend so
that M is the most significant limb, but can't reduce precision of the
divisor to more than having the -1 as most significant limb, because
-N doesn't have the most significant bit set.
The following patch fixes it by detecting this problematic case in the
un < vn handling, and instead of assuming q is 0 and r is u will
decrease vn by 1 because it knows the later code will negate the divisor
and it can be then expressed after negation in one fewer limbs.
2024-03-21 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114397
* libgcc2.c (__divmodbitint4): Don't assume un < vn always means
abs(v) > abs(u), check for a special case of un + 1 == vn where
u is non-negative and v negative and after v's negation vn could
be reduced by 1.
* gcc.dg/torture/bitint-65.c: New test.
Tested with some simple toy examples where an exception is thrown in the
signal handler.
libgcc/ChangeLog:
* config/i386/gnu-unwind.h: Support unwinding x86_64 signal frames.
Signed-off-by: Flavio Cruz <flaviocruz@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
While for __mulbitint3 we actually don't negate anything and perform the
multiplication in unsigned style always, for __divmodbitint4 if the operands
aren't unsigned and are negative, we negate them first and then try to
negate them as needed at the end.
quotient is negated if just one of the operands was negated and the other
wasn't or vice versa, and remainder is negated if the first operand was
negated.
The case which doesn't work correctly is if due to limited range of the
operands we perform the division/modulo in some smaller number of limbs
and then extend it to the desired precision of the quotient and/or
remainder results. If they aren't negated, the extension is done with
memset to 0, if they are negated, the extension was done with memset
to -1. The problem is that if the quotient or remainder is zero,
then bitint_negate negates it again to zero (that is ok), but we should
then extend with memset to 0, not memset to -1.
The following patch achieves that by letting bitint_negate also check if
the negated operand is zero and changes the memset argument based on that.
2024-03-15 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114327
* libgcc2.c (bitint_negate): Return UWtype bitwise or of all the limbs
before negation rather than void.
(__divmodbitint4): Determine whether to fill in the upper limbs after
negation based on whether bitint_negate returned 0 or non-zero, rather
then always filling with -1.
* gcc.dg/torture/bitint-63.c: New test.
This arranges that the byte order of the instruction sequences is
independent of the byte order of memory.
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c
(aarch64_trampoline_insns): Arrange to encode instructions as a
byte array so that the order is independent of memory byte order.
(struct aarch64_trampoline): Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
This allows the same trampoline pattern to be used on all linux variants
rather than restricting it to linux gnu.
PR target/113971
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c: Allow all linux variants.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Fix a typo in __gthr_win32_abs_to_rel_time that caused it to return a
relative time in seconds instead of milliseconds. As a consequence,
__gthr_win32_cond_timedwait called SleepConditionVariableCS with a
1000x shorter timeout; this caused ~1000x more spurious wakeups in
CV timed waits such as std::condition_variable::wait_for or wait_until,
resulting generally in much higher CPU usage.
This can be demonstrated by this sample program:
```
int main() {
std::condition_variable cv;
std::mutex mx;
bool pass = false;
auto thread_fn = [&](bool timed) {
int wakeups = 0;
using sc = std::chrono::system_clock;
auto before = sc::now();
std::unique_lock<std::mutex> ml(mx);
if (timed) {
cv.wait_for(ml, std::chrono::seconds(2), [&]{
++wakeups;
return pass;
});
} else {
cv.wait(ml, [&]{
++wakeups;
return pass;
});
}
printf("pass: %d; wakeups: %d; elapsed: %d ms\n", pass, wakeups,
int((sc::now() - before) / std::chrono::milliseconds(1)));
pass = false;
};
{
// timed wait, let expire
std::thread t(thread_fn, true);
t.join();
}
{
// timed wait, wake up explicitly after 1 second
std::thread t(thread_fn, true);
std::this_thread::sleep_for(std::chrono::seconds(1));
{
std::unique_lock<std::mutex> ml(mx);
pass = true;
}
cv.notify_all();
t.join();
}
{
// non-timed wait, wake up explicitly after 1 second
std::thread t(thread_fn, false);
std::this_thread::sleep_for(std::chrono::seconds(1));
{
std::unique_lock<std::mutex> ml(mx);
pass = true;
}
cv.notify_all();
t.join();
}
return 0;
}
```
On builds based on non-affected threading models (e.g. POSIX on Linux,
or winpthreads or MCF on Win32) the output is something like
```
pass: 0; wakeups: 2; elapsed: 2000 ms
pass: 1; wakeups: 2; elapsed: 991 ms
pass: 1; wakeups: 2; elapsed: 996 ms
```
while with the Win32 threading model we get
```
pass: 0; wakeups: 1418; elapsed: 2000 ms
pass: 1; wakeups: 479; elapsed: 988 ms
pass: 1; wakeups: 2; elapsed: 992 ms
```
(notice the huge number of wakeups in the timed wait cases only).
This commit fixes the conversion, adjusting the final division by
NSEC100_PER_SEC to use NSEC100_PER_MSEC instead (already defined in the
file and not used in any other place, so probably just a typo).
libgcc/ChangeLog:
PR libgcc/113850
* config/i386/gthr-win32-cond.c (__gthr_win32_abs_to_rel_time):
fix absolute timespec to relative milliseconds count
conversion (it incorrectly returned seconds instead of
milliseconds); this avoids spurious wakeups in
__gthr_win32_cond_timedwait
Add x32 and IBT support to x86 heap trampoline implementation with a
testcase.
2024-02-13 Jakub Jelinek <jakub@redhat.com>
H.J. Lu <hjl.tools@gmail.com>
libgcc/
PR target/113855
* config/i386/heap-trampoline.c (trampoline_insns): Add IBT
support and pad to the multiple of 4 bytes. Use movabsq
instead of movabs in comments. Add -mx32 variant.
gcc/testsuite/
PR target/113855
* gcc.dg/heap-trampoline-1.c: New test.
* lib/target-supports.exp (check_effective_target_heap_trampoline):
New.
As I wrote earlier, I was seeing
FAIL: gcc.dg/torture/bitint-24.c -O0 execution test
FAIL: gcc.dg/torture/bitint-24.c -O2 execution test
with the ia32 _BitInt enablement patch on i686-linux. I thought
floatbitintxf.c was miscompiled with -O2 -march=i686 -mtune=generic, but it
turned out to be UB in it.
If a signed _BitInt to be converted to binary floating point has
(after sign extension from possible partial limb to full limb) one or
more most significant limbs equal to all ones and then in the limb below
(the most significant non-~(UBILtype)0 limb) has the most significant limb
cleared, like for 32-bit limbs
0x81582c05U, 0x0a8b01e4U, 0xc1b8b18fU, 0x2aac2a08U, -1U, -1U
then bitint_reduce_prec can't reduce it to that 0x2aac2a08U limb, so
msb is all ones and precision is negative (so it reduced precision from
161 to 192 bits down to 160 bits, in theory could go as low as 129 bits
but that wouldn't change anything on the following behavior).
But still iprec is negative, -160 here.
For that case (i.e. where we are dealing with an negative input), the
code was using 65 - __builtin_clzll (~msb) to compute how many relevant
bits we have from the msb. Unfortunately that invokes UB for msb all ones.
The right number of relevant bits in that case is 1 though (like for
-2 it is 2 and -4 or -3 3 as already computed) - all we care about from that
is that the most significant bit is set (i.e. the number is negative) and
the bits below that should be supplied from the limbs below.
So, the following patch fixes it by special casing it not to invoke UB.
For msb 0 we already have a special case from before (but that is also
different because msb 0 implies the whole number is 0 given the way
bitint_reduce_prec works - even if we have limbs like ..., 0x80000000U, 0U
the reduction can skip the most significant limb and msb then would be
the one below it), so if iprec > 0, we already don't call __builtin_clzll
on 0.
2024-02-13 Jakub Jelinek <jakub@redhat.com>
* soft-fp/bitint.h (FP_FROM_BITINT): If iprec < 0 and msb is all ones,
just set n to 1 instead of using __builtin_clzll (~msb).
The initial heap trampoline implementation was targeting 64b
platforms. As the PR demonstrates this creates an issue where it
is expected that the same symbols are exported for 32 and 64b.
Rather than conditionalize the exports and code-gen on x86_64,
this patch provides a basic implementation of the IA32 trampoline.
This also avoids potential user confusion, when a 32b target has
64b multilibs, and vice versa; which is the case for Darwin.
PR target/113855
gcc/ChangeLog:
* config/i386/darwin.h (DARWIN_HEAP_T_LIB): Moved to be
available to all sub-targets.
* config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): Delete.
* config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): Delete.
libgcc/ChangeLog:
* config.host: Add trampoline support to x?86-linux.
* config/i386/heap-trampoline.c (trampoline_insns): Provide
a variant for IA32.
(union ix86_trampoline): Likewise.
(__gcc_nested_func_ptr_created): Implement a basic trampoline
for IA32.
The ia32 _BitInt support revealed a bug in floatbitint?d.c.
As can be even guessed from how the code is written in the loop,
the intention was to set inexact to non-zero whenever the remainder
after division wasn't zero, but I've ended up just checking whether
the 2 least significant limbs of the remainder were non-zero.
Now, in the dfp/bitint-4.c test in one case the remainder happens
to have least significant 64 bits zero and then the higher limbs are
non-zero; with 32-bit limbs that means 2 least significant limbs are zero
and so the code acted as if it was exactly divisible.
Fixed thusly.
2024-02-10 Jakub Jelinek <jakub@redhat.com>
* soft-fp/floatbitintdd.c (__bid_floatbitintdd): Or in all remainder
limbs into inexact rather than just first two.
* soft-fp/floatbitintsd.c (__bid_floatbitintsd): Likewise.
* soft-fp/floatbitinttd.c (__bid_floatbitinttd): Likewise.
I've tried last night to enable _BitInt support for i?86-linux, and
a few spots in libgcc emitted -Wshift-count-overflow warnings and clearly
didn't do what it was supposed to do.
Fixed thusly.
2024-02-10 Jakub Jelinek <jakub@redhat.com>
* soft-fp/fixddbitint.c (__bid_fixddbitint): Fix up
BIL_TYPE_SIZE == 32 shifts.
* soft-fp/fixsdbitint.c (__bid_fixsdbitint): Likewise.
* soft-fp/fixtdbitint.c (__bid_fixtdbitint): Likewise.
* soft-fp/floatbitintdd.c (__bid_floatbitintdd): Likewise.
* soft-fp/floatbitinttd.c (__bid_floatbitinttd): Likewise.
Some exports were missed from the GCC-13 cycle, these are added here
along with the bitint-related ones added in GCC-14.
libgcc/ChangeLog:
* config/i386/libgcc-darwin.ver: Export bf and bitint-related
synbols.
As reported in the PR, all libgcc x86 symbol versions added after
GCC_7.0.0 were only added to i386/libgcc-glibc.ver, missing all of
libgcc-sol2.ver, libgcc-bsd.ver, and libgcc-darwin.ver.
This patch fixes this for Solaris/x86, adding all of them
(GCC_1[234].0.0) as GCC_14.0.0 to not retroactively change history.
Since this isn't the first time this happens, I've added a note to the
end of libgcc-glibc.ver to request notifying other maintainers in case
of additions.
Tested on i386-pc-solaris2.11.
2024-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libgcc:
PR target/113700
* config/i386/libgcc-sol2.ver (GCC_14.0.0): Added all symbols from
i386/libgcc-glibc.ver (GCC_12.0.0, GCC_13.0.0, GCC_14.0.0).
* config/i386/libgcc-glibc.ver: Request notifications on updates.
SEH _Unwind_Resume_or_Rethrow invokes abort directly if
_Unwind_RaiseException doesn't manage to find a handler for the rethrown
exception; this is incorrect, as in this case std::terminate should be
invoked, allowing an application-provided terminate handler to handle
the situation instead of straight crashing the application through
abort.
The bug can be demonstrated with this simple test case:
===
static void custom_terminate_handler() {
fprintf(stderr, "custom_terminate_handler invoked\n");
std::exit(1);
}
int main(int argc, char *argv[]) {
std::set_terminate(&custom_terminate_handler);
if (argc < 2) return 1;
const char *mode = argv[1];
fprintf(stderr, "%s\n", mode);
if (strcmp(mode, "throw") == 0) {
throw std::exception();
} else if (strcmp(mode, "rethrow") == 0) {
try {
throw std::exception();
} catch (...) {
throw;
}
} else {
return 1;
}
return 0;
}
===
On all gcc builds with non-SEH exceptions, this will print
"custom_terminate_handler invoked" both if launched as ./a.out throw or
as ./a.out rethrow, on SEH builds instead if will work as expected only
with ./a.exe throw, but will crash with the "built-in" abort message
with ./a.exe rethrow.
This patch fixes the problem, forwarding back the error code to the
caller (__cxa_rethrow), that calls std::terminate if
_Unwind_Resume_or_Rethrow returns.
The change makes the code path coherent with SEH _Unwind_RaiseException,
and with the generic _Unwind_Resume_or_Rethrow from libgcc/unwind.inc
(used for SjLj and Dw2 exception backend).
libgcc/ChangeLog:
PR libgcc/113337
* unwind-seh.c (_Unwind_Resume_or_Rethrow): forward
_Unwind_RaiseException return code back to caller instead of
calling abort, allowing __cxa_rethrow to invoke std::terminate
in case of uncaught rethrown exception
The following testcase ends up with SIGFPE in __divmodbitint4.
The problem is a thinko in my attempt to implement Knuth's algorithm.
The algorithm does (where b is 65536, i.e. one larger than what
fits in their unsigned short word):
// Compute estimate qhat of q[j].
qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
again:
if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
{ qhat = qhat - 1;
rhat = rhat + vn[n-1];
if (rhat < b) goto again;
}
The problem is that it uses a double-word / word -> double-word
division (and modulo), while all we have is udiv_qrnnd unless
we'd want to do further library calls, and udiv_qrnnd is a
double-word / word -> word division and modulo.
Now, as the algorithm description says, it can produce at most
word bits + 1 bit quotient. And I believe that actually the
highest qhat the original algorithm can produce is
(1 << word_bits) + 1. The algorithm performs earlier canonicalization
where both the divisor and dividend are shifted left such that divisor
has msb set. If it has msb set already before, no shifting occurs but
we start with added 0 limb, so in the first uv1:uv0 double-word uv1
is 0 and so we can't get too high qhat, if shifting occurs, the first
limb of dividend is shifted right by UWtype bits - shift count into
a new limb, so again in the first iteration in the uv1:uv0 double-word
uv1 doesn't have msb set while vv1 does and qhat has to fit into word.
In the following iterations, previous iteration should guarantee that
the previous quotient digit is correct. Even if the divisor was the
maximal possible vv1:all_ones_in_all_lower_limbs, if the old uv0:lower_limbs
would be larger or equal to the divisor, the previous quotient digit
would increase and another divisor would be subtracted, which I think
implies that in the next iteration in uv1:uv0 double-word uv1 <= vv1,
but uv0 could be up to all ones, e.g. in case of all lower limbs
of divisor being all ones and at least one dividend limb below uv0
being not all ones. So, we can e.g. for 64-bit UWtype see
uv1:uv0 / vv1 0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL
or 0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL
In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is
0x10000000000000001UL, i.e. 2 more than fits into UWtype result,
if uv1 == vv1 && uv0 < uv1 it would be 0x10000000000000000UL, i.e.
1 more than fits into UWtype result.
Because we only have udiv_qrnnd which can't deal with those too large
cases (SIGFPEs or otherwise invokes undefined behavior on those), I've
tried to handle the uv1 >= vv1 case separately, but for one thing
I thought it would be at most 1 larger than what fits, and for two
have actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting
0:vv1 from uv1:uv0.
For the uv1 < vv1 case, the implementation already performs roughly
what the algorithm does.
Now, let's see what happens with the two possible extra cases in
the original algorithm.
If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take
if (qhat >= b, decrement qhat by 1 (it becomes b - 1), add
vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because
qhat already fits we can goto to the again label in the uv1 < vv1
code). rhat in this case is uv0 and rhat + vv1 can but doesn't
have to overflow, say for uv0 42UL and vv1 0x8000000000000000UL
it will not (and so we should goto again), while for uv0
0x8000000000000000UL and vv1 0x8000000000000001UL it will (and
we shouldn't goto again).
If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we
take if (qhat >= b, decrement qhat by 1 (it becomes b), add
vn[n-1] aka vv1 to rhat. But because vv1 has msb set and
rhat in this case is uv0 - vv1, the rhat + vv1 addition
certainly doesn't overflow, because (uv0 - vv1) + vv1 is uv0,
so in the algorithm we goto again, again take if (qhat >= b and
decrement qhat so it finally becomes b - 1, and add vn[n-1]
aka vv1 to rhat again. But this time I believe it must always
overflow, simply because we added (uv0 - vv1) + vv1 + vv1 and
vv1 has msb set, so already vv1 + vv1 must overflow. And
because it overflowed, it will not goto again.
So, I believe the following patch implements this correctly, by
subtracting vv1 from uv1:uv0 double-word once, then comparing
again if uv1 >= vv1. If that is true, subtract vv1 from uv1:uv0
again and add 2 * vv1 to rhat, no __builtin_add_overflow is needed
as we know it always overflowed and so won't goto again.
If after the first subtraction uv1 < vv1, use __builtin_add_overflow
when adding vv1 to rhat, because it can but doesn't have to overflow.
I've added an extra testcase which tests the behavior of all the changed
cases, so it has a case where uv1:uv0 / vv1 is 1:1, where it is
1:0 and rhat + vv1 overflows and where it is 1:0 and rhat + vv1 does not
overflow, and includes tests also from Zdenek's other failing tests.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113604
* libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract
vv1 from uv1:uv0 once or twice as needed, rather than
subtracting vv1:vv1.
* gcc.dg/torture/bitint-53.c: New test.
* gcc.dg/torture/bitint-55.c: New test.
Rainer pointed out that __PFX__ and __FIXPTPFX__ prefix replacement is done
solely for libgcc-std.ver.in and not for the *.ver files in config.
I've used the __PFX__ prefix even in config/i386/libgcc-glibc.ver because it
was used for similar symbols in libgcc-std.ver.in, and that results in those
symbols being STB_LOCAL in libgcc_s.so.1. Tests still work because gcc by
default uses -static-libgcc when linking (unlike g++ etc.), but would
have failed when using -shared-libgcc (but I see nothing in the testsuite
actually testing with -shared-libgcc, so am not adding tests).
With the patch, libgcc_s.so.1 now exports
__fixtfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__fixxfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintbf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinthf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinttf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintxf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
on x86_64-linux which it wasn't before.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR target/113700
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Remove __PFX prefixes
from symbol names.
I'm seeing hundreds of
In file included from ../../../libgcc/libgcc2.c:56:
../../../libgcc/libgcc2.h:32:13: warning: conflicting types for built-in function ‘__gcc_nested_func_ptr_created’; expected ‘void(void *, void *, void *)’
+[-Wbuiltin-declaration-mismatch]
32 | extern void __gcc_nested_func_ptr_created (void *, void *, void **);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warnings.
Either we need to add like in r14-6218
#pragma GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
(but in that case because of the libgcc2.h prototype (why is it there?)
it would need to be also with #pragma GCC diagnostic push/pop around),
or we could go with just following how the builtins are prototyped on the
compiler side and only cast to void ** when dereferencing (which is in
a single spot in each TU).
2024-02-01 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113402
* libgcc2.h (__gcc_nested_func_ptr_created): Change type of last
argument from void ** to void *.
* config/i386/heap-trampoline.c (__gcc_nested_func_ptr_created):
Change type of dst from void ** to void * and cast dst to void **
before dereferencing it.
* config/aarch64/heap-trampoline.c (__gcc_nested_func_ptr_created):
Likewise.
I'm seeing
../../../libgcc/shared-object.mk:14: warning: overriding recipe for target 'heap-trampoline.o'
../../../libgcc/shared-object.mk:14: warning: ignoring old recipe for target 'heap-trampoline.o'
../../../libgcc/shared-object.mk:17: warning: overriding recipe for target 'heap-trampoline_s.o'
../../../libgcc/shared-object.mk:17: warning: ignoring old recipe for target 'heap-trampoline_s.o'
This patch fixes that.
2024-02-01 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113403
* config/i386/t-heap-trampoline: Add to LIB2ADDEHSHARED
i386/heap-trampoline.c rather than aarch64/heap-trampoline.c.
Use aarch64-asm.h in asm code consistently, this was started in
commit c608ada288
Author: Zac Walker <zacwalker@microsoft.com>
CommitDate: 2024-01-23 15:32:30 +0000
Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target
But that commit failed to remove some existing markings from asm files,
which means some objects got double marked with gnu property notes.
libgcc/ChangeLog:
* config/aarch64/crti.S: Remove stack marking.
* config/aarch64/crtn.S: Remove stack marking, include aarch64-asm.h
* config/aarch64/lse.S: Remove stack and GNU property markings.
In order to handle system security constraints during GCC build
and test and that most platform versions cannot link to libgcc_eh
since the unwinder there is incompatible with the system one.
1. We make the support functions weak definitions.
2. We include them as a CRT for platform conditions that do not
allow libgcc_eh.
3. We ensure that the weak symbols are exported from DSOs (which
includes exes on Darwin) so that the dynamic linker will
pick one instance (which avoids duplication of trampoline
caches).
PR libgcc/113403
gcc/ChangeLog:
* config/darwin.h (DARWIN_SHARED_WEAK_ADDS, DARWIN_WEAK_CRTS): New.
(REAL_LIBGCC_SPEC): Move weak CRT handling to separate spec.
* config/i386/darwin.h (DARWIN_HEAP_T_LIB): New.
* config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): New.
* config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): New.
* config/rs6000/darwin.h (DARWIN_HEAP_T_LIB): New.
libgcc/ChangeLog:
* config.host: Build libheap_t.a for i686/x86_64 Darwin.
* config/aarch64/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/i386/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/t-darwin: Build libheap_t.a (a CRT with heap trampoline
support).
This removes the heap trampoline support functions from libgcc.a and
adds them to libgcc_eh.a. They are also present in libgcc_s.
PR libgcc/113403
libgcc/ChangeLog:
* config/aarch64/t-heap-trampoline: Move the heap trampoline
support functions from libgcc.a to libgcc_eh.a.
* config/i386/t-heap-trampoline: Likewise.
The symbols for the functions supporting heap-based trampolines were
exported at an incorrect symbol version, the following patch fixes that.
As requested in the PR, this also renames __builtin_nested_func_ptr* to
__gcc_nested_func_ptr*. In carrying our the rename, we move the builtins
to use DEF_EXT_LIB_BUILTIN.
PR libgcc/113402
gcc/ChangeLog:
* builtins.cc (expand_builtin): Handle BUILT_IN_GCC_NESTED_PTR_CREATED
and BUILT_IN_GCC_NESTED_PTR_DELETED.
* builtins.def (BUILT_IN_GCC_NESTED_PTR_CREATED,
BUILT_IN_GCC_NESTED_PTR_DELETED): Make these builtins LIB-EXT and
rename the library fallbacks to __gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted.
* doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created
and __gcc_nested_func_ptr_deleted.
* tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED.
* tree.cc (build_common_builtin_nodes): Build the
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED local
builtins only for non-explicit.
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c: Rename
__builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
* config/i386/heap-trampoline.c: Likewise.
* libgcc2.h: Likewise.
* libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
__gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted from this symbol version to ...
(GCC_14.0.0): ... this one.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
This is enough to get gfx1030 and gfx1100 working; there are still some test
failures to investigate, and probably some tuning to do.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
* config/gcn/gcn-valu.md (all_convert): New iterator.
(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
define_expand, and rename the old one to ...
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this.
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
(<u>mulqihi3_scalar): Likewise.
libgcc/ChangeLog:
* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
libgomp/ChangeLog:
* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
(omp_get_wtime): Add RDNA3-compatible variant.
* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
Signed-off-by: Andrew Stubbs <ams@baylibre.com>
Recent
change (https://gcc.gnu.org/pipermail/gcc-cvs/2023-December/394915.html)
added a generic SME support using `.hidden`, `.type`, and ``.size`
pseudo-ops in the assembly sources, `aarch64-w64-mingw32` does not
support the pseudo-ops though. This patch wraps usage of those
pseudo-ops using macros and ifdefs them for `__ELF__` define.
libgcc/
* config/aarch64/aarch64-asm.h (HIDDEN, SYMBOL_SIZE, SYMBOL_TYPE)
(ENTRY_ALIGN, GNU_PROPERTY): New macros.
* config/aarch64/__arm_sme_state.S: Use them.
* config/aarch64/__arm_tpidr2_save.S: Likewise.
* config/aarch64/__arm_za_disable.S: Likewise.
* config/aarch64/crti.S: Likewise.
* config/aarch64/lse.S: Likewise.
As discussed on IRC, the following patch uses may_alias attribute, so that
on targets like aarch64 where abi_limb_mode != limb_mode the library
accesses the limbs (half limbs of the ABI) in the arrays with conservative
alias set.
2024-01-12 Jakub Jelinek <jakub@redhat.com>
* libgcc2.h (UBILtype): New typedef with may_alias attribute.
(__mulbitint3, __divmodbitint4): Use UBILtype * instead of
UWtype * and const UBILtype * instead of const UWtype *.
* libgcc2.c (bitint_reduce_prec, bitint_mul_1, bitint_addmul_1,
__mulbitint3, bitint_negate, bitint_submul_1, __divmodbitint4):
Likewise.
* soft-fp/bitint.h (UBILtype): Change define into a typedef with
may_alias attribute.
Exception handling on nios2-linux-gnu with -fpic has been broken since
revision 790854ea76, "Use _dl_find_object
in _Unwind_Find_FDE". For whatever reason, this doesn't work on nios2.
Nios2 uses the GOT address as the base for DW_EH_PE_datarel
relocations in PIC; see my previous fix to make this work, revision
2d33dcfe9f, "Support for GOT-relative
DW_EH_PE_datarel encoding". So this may be a horrible bug in the ABI
or in my interpretation of it or just glibc's implementation of
_dl_find_object for this target, but there's existing code out there
that does things this way; and realistically, nobody is going to
re-engineer this now that the vendor has EOL'ed the nios2
architecture. So, just skip over the code trying to use
_dl_find_object on this target and fall back to the way that works.
I plan to backport this patch to the GCC 12 and GCC 13 branches as well.
libgcc/ChangeLog
* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Do not try to use
_dl_find_object on nios2; it doesn't work.
For now, for single-threaded GCN, nvptx target use only; extension for
multi-threaded offloading use is to follow later. Eventually switch to
libstdc++-v3/libsupc++ proper.
libgcc/
* c++-minimal/README: New.
* c++-minimal/guard.c: New.
* config/gcn/t-amdgcn (LIB2ADD): Add it.
* config/nvptx/t-nvptx (LIB2ADD): Likewise.
If we allow __strub_leave to allocate a frame on sparc, it will
overlap with a lot of the stack range we're supposed to scrub, because
of the large fixed-size outgoing args and register save area.
Unfortunately, setting up the PIC register seems to prevent the frame
pointer from being omitted.
Since the strub runtime doesn't issue calls or use global variables,
at least on sparc, disabling PIC to compile strub.c seems to do the
right thing.
for libgcc/ChangeLog
PR middle-end/112917
* config.host (sparc, sparc64): Enable...
* config/sparc/t-sparc: ... this new fragment.
The strub builtins are not suited for cross-unit inlining, they should
only be inlined by the builtin expanders, if at all. While testing on
sparc64, it occurred to me that, if libgcc was built with LTO enabled,
lto1 might inline them, and that would likely break things. So, make
sure they're clearly marked as not inlinable.
for libgcc/ChangeLog
* strub.c (ATTRIBUTE_NOINLINE): New.
(ATTRIBUTE_STRUB_CALLABLE): Add it.
(__strub_dummy_force_no_leaf): Drop it.
This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes. This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).
Existing function multiversioning implementations are broken in various
ways when used across translation units. This includes placing
resolvers in the wrong translation units, and using symbol mangling that
callers to unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification. It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.
The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions. I intend to resolve some or all of these inconsistencies at
a later stage.
The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute. On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.
This patch also does not support the following aspects of the Beta
specification:
- The target_clones attribute should allow an implicit unlisted
"default" version.
- There should be an option to disable function multiversioning at
compile time.
- Unrecognised target names in a target_clones attribute should be
ignored (with an optional warning). This current patch raises an
error instead.
[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
gcc/ChangeLog:
* config/aarch64/aarch64-feature-deps.h (fmv_deps_<FEAT_NAME>):
Define aarch64_feature_flags mask foreach FMV feature.
* config/aarch64/aarch64-option-extensions.def: Use new macros
to define FMV feature extensions.
* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
Check for target_version attribute after processing target
attribute.
(aarch64_fmv_feature_data): New.
(aarch64_parse_fmv_features): New.
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_version_attribute_p): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(build_ifunc_arg_type): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
* config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
Set target macro.
* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
new value to report duplicate FMV feature.
* common/config/aarch64/cpuinfo.h: New file.
libgcc/ChangeLog:
* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
copy in gcc/common
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
This is added to enable function multiversioning, but can also be used
directly. The interface is chosen to match that used in LLVM's
compiler-rt, to facilitate cross-compiler compatibility.
The content of the patch is derived almost entirely from Pavel's prior
contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
changes to align more closely with GCC coding style, and to exclude any code
from other LLVM contributors, and am adding this to GCC with Pavel's approval.
libgcc/ChangeLog:
* config/aarch64/t-aarch64: Include cpuinfo.c
* config/aarch64/cpuinfo.c: New file
(__init_cpu_features_constructor) New.
(__init_cpu_features_resolver) New.
(__init_cpu_features) New.
Co-authored-by: Pavel Iliin <Pavel.Iliin@arm.com>
This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.
BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT
libgcc/ChangeLog:
* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.
libgfortran/ChangeLog:
* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.
Some targets do not provide a prototype for fork, and compilation now
fails with an implicit-function-declaration error.
libgcc/
* libgcov-interface.c (__gcov_fork): Use __builtin_fork instead
of fork.
It was updated incorrectly in
commit dbbfb52b0e
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
CommitDate: 2023-12-08 11:29:06 +0000
libgcc: aarch64: Configure check for __getauxval
so regenerate it.
libgcc/ChangeLog:
* config.in: Regenerate.
To support the ZA lazy save scheme, the PCS requires the unwinder to
reset the SME state to PSTATE.SM=0, PSTATE.ZA=0, TPIDR2_EL0=0 on entry
to an exception handler. We use the __arm_za_disable SME runtime call
unconditionally to achieve this.
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#exceptions
The hidden alias is used to avoid a PLT and avoid inconsistent VPCS
marking (we don't rely on special PCS at the call site). In case of
static linking the SME runtime init code is linked in code that raises
exceptions.
libgcc/ChangeLog:
* config/aarch64/__arm_za_disable.S: Add hidden alias.
* config/aarch64/aarch64-unwind.h: Reset the SME state before
EH return via the _Unwind_Frames_Extra hook.
The call ABI for SME (Scalable Matrix Extension) requires a number of
helper routines which are added to libgcc so they are tied to the
compiler version instead of the libc version. See
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines
The routines are in shared libgcc and static libgcc eh, even though
they are not related to exception handling. This is to avoid linking
a copy of the routines into dynamic linked binaries, because TPIDR2_EL0
block can be extended in the future which is better to handle in a
single place per process.
The support routines have to decide if SME is accessible or not. Linux
tells userspace if SME is accessible via AT_HWCAP2, otherwise a new
__aarch64_sme_accessible symbol was introduced that a libc can define.
Due to libgcc and libc build order, the symbol availability cannot be
checked so for __aarch64_sme_accessible an unistd.h feature test macro
is used while such detection mechanism is not available for __getauxval
so we rely on configure checks based on the target triplet.
Asm helper code is added to make writing the routines easier.
libgcc/ChangeLog:
* config/aarch64/t-aarch64: Add sources to the build.
* config/aarch64/__aarch64_have_sme.c: New file.
* config/aarch64/__arm_sme_state.S: New file.
* config/aarch64/__arm_tpidr2_restore.S: New file.
* config/aarch64/__arm_tpidr2_save.S: New file.
* config/aarch64/__arm_za_disable.S: New file.
* config/aarch64/aarch64-asm.h: New file.
* config/aarch64/libgcc-sme.ver: New file.
Add configure check for the __getauxval ABI symbol, which is always
available on aarch64 glibc, and may be available on other linux C
runtimes. For now only enabled on glibc, others have to override it
target_configargs=libgcc_cv_have___getauxval=yes
This is deliberately obscure as it should be auto detected, ideally
via a feature test macro in unistd.h (link time detection is not
possible since the libc may not be installed at libgcc build time),
but currently there is no such feature test mechanism.
Without __getauxval, libgcc cannot do runtime CPU feature detection
and has to assume only the build time known features are available.
libgcc/ChangeLog:
* config.in: Undef HAVE___GETAUXVAL.
* configure: Regenerate.
* configure.ac: Check for __getauxval.
Ideally SME support routines in libgcc are marked as variant PCS symbols
so check if as supports the directive.
libgcc/ChangeLog:
* config.in: Undef HAVE_AS_VARIANT_PCS.
* configure: Regenerate.
* configure.ac: Check for .variant_pcs.
When libgcc is being built in --disable-tls configuration or on
a target without native TLS support, one gets annoying warnings:
../../../../libgcc/emutls.c:61:7: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
61 | void *__emutls_get_address (struct __emutls_object *);
| ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:63:6: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’
+[-Wbuiltin-declaration-mismatch]
63 | void __emutls_register_common (struct __emutls_object *, word, word, void *);
| ^~~~~~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:140:1: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
140 | __emutls_get_address (struct __emutls_object *obj)
| ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:204:1: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’
+[-Wbuiltin-declaration-mismatch]
204 | __emutls_register_common (struct __emutls_object *obj,
| ^~~~~~~~~~~~~~~~~~~~~~~~
The thing is that in that case __emutls_get_address and
__emutls_register_common are builtins, and are declared with void *
arguments rather than struct __emutls_object *.
Now, struct __emutls_object is a type private to libgcc/emutls.c and the
middle-end creates on demand when calling the builtins a similar structure
(with small differences, like not having the union in there).
We have a precedent for this e.g. for fprintf or strftime builtins where
the builtins are created with magic fileptr_type_node or const_tm_ptr_type_node
types and then match it with user definition of pointers to some structure,
but I think for this case users should never define these functions
themselves nor call them and having special types for them in the compiler
would mean extra compile time spent during compiler initialization and more
GC data, so I think it is better to keep the compiler as is.
On the library side, there is an option to just follow what the
compiler is doing and do
EMUTLS_ATTR void
-__emutls_register_common (struct __emutls_object *obj,
+__emutls_register_common (void *xobj,
word size, word align, void *templ)
{
+ struct __emutls_object *obj = (struct __emutls_object *) xobj;
but that will make e.g. libabigail complain about ABI change in libgcc.
So, the patch just turns the warning off.
2023-12-06 Thomas Schwinge <thomas@codesourcery.com>
Jakub Jelinek <jakub@redhat.com>
PR libgcc/109289
* emutls.c: Add GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
pragma.
This patch adds the strub attribute for function and variable types,
command-line options, passes and adjustments to implement it,
documentation, and tests.
Stack scrubbing is implemented in a machine-independent way: functions
with strub enabled are modified so that they take an extra stack
watermark argument, that they update with their stack use, and the
caller can then zero it out once it regains control, whether by return
or exception. There are two ways to go about it: at-calls, that
modifies the visible interface (signature) of the function, and
internal, in which the body is moved to a clone, the clone undergoes
the interface change, and the function becomes a wrapper, preserving
its original interface, that calls the clone and then clears the stack
used by it.
Variables can also be annotated with the strub attribute, so that
functions that read from them get stack scrubbing enabled implicitly,
whether at-calls, for functions only usable within a translation unit,
or internal, for functions whose interfaces must not be modified.
There is a strict mode, in which functions that have their stack
scrubbed can only call other functions with stack-scrubbing
interfaces, or those explicitly marked as callable from strub
contexts, so that an entire call chain gets scrubbing, at once or
piecemeal depending on optimization levels. In the default mode,
relaxed, this requirement is not enforced by the compiler.
The implementation adds two IPA passes, one that assigns strub modes
early on, another that modifies interfaces and adds calls to the
builtins that jointly implement stack scrubbing. Another builtin,
that obtains the stack pointer, is added for use in the implementation
of the builtins, whether expanded inline or called in libgcc.
There are new command-line options to change operation modes and to
force the feature disabled; it is enabled by default, but it has no
effect and is implicitly disabled if the strub attribute is never
used. There are also options meant to use for testing the feature,
enabling different strubbing modes for all (viable) functions.
for gcc/ChangeLog
* Makefile.in (OBJS): Add ipa-strub.o.
(GTFILES): Add ipa-strub.cc.
* builtins.def (BUILT_IN_STACK_ADDRESS): New.
(BUILT_IN___STRUB_ENTER): New.
(BUILT_IN___STRUB_UPDATE): New.
(BUILT_IN___STRUB_LEAVE): New.
* builtins.cc: Include ipa-strub.h.
(STACK_STOPS, STACK_UNSIGNED): Define.
(expand_builtin_stack_address): New.
(expand_builtin_strub_enter): New.
(expand_builtin_strub_update): New.
(expand_builtin_strub_leave): New.
(expand_builtin): Call them.
* common.opt (fstrub=*): New options.
* doc/extend.texi (strub): New type attribute.
(__builtin_stack_address): New function.
(Stack Scrubbing): New section.
* doc/invoke.texi (-fstrub=*): New options.
(-fdump-ipa-*): New passes.
* gengtype-lex.l: Ignore multi-line pp-directives.
* ipa-inline.cc: Include ipa-strub.h.
(can_inline_edge_p): Test strub_inlinable_to_p.
* ipa-split.cc: Include ipa-strub.h.
(execute_split_functions): Test strub_splittable_p.
* ipa-strub.cc, ipa-strub.h: New.
* passes.def: Add strub_mode and strub passes.
* tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts.
* tree-pass.h (make_pass_ipa_strub_mode): Declare.
(make_pass_ipa_strub): Declare.
(make_pass_ipa_function_and_variable_visibility): Fix
formatting.
* tree-ssa-ccp.cc (optimize_stack_restore): Keep restores
before strub leave.
* attribs.cc: Include ipa-strub.h.
(decl_attributes): Support applying attributes to function
type, rather than pointer type, at handler's request.
(comp_type_attributes): Combine strub_comptypes and target
comp_type results.
* doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
(TARGET_STRUB_MAY_USE_MEMSET): New.
* doc/tm.texi: Rebuilt.
* cgraph.h (symtab_node::reset): Add preserve_comdat_group
param, with a default.
* cgraphunit.cc (symtab_node::reset): Use it.
for gcc/c-family/ChangeLog
* c-attribs.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(c_common_attribute_table): Add strub.
for gcc/ada/ChangeLog
* gcc-interface/trans.cc: Include ipa-strub.h.
(gigi): Make internal decls for targets of compiler-generated
calls strub-callable too.
(build_raise_check): Likewise.
* gcc-interface/utils.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(gnat_internal_attribute_table): Add strub.
for gcc/testsuite/ChangeLog
* c-c++-common/strub-O0.c: New.
* c-c++-common/strub-O1.c: New.
* c-c++-common/strub-O2.c: New.
* c-c++-common/strub-O2fni.c: New.
* c-c++-common/strub-O3.c: New.
* c-c++-common/strub-O3fni.c: New.
* c-c++-common/strub-Og.c: New.
* c-c++-common/strub-Os.c: New.
* c-c++-common/strub-all1.c: New.
* c-c++-common/strub-all2.c: New.
* c-c++-common/strub-apply1.c: New.
* c-c++-common/strub-apply2.c: New.
* c-c++-common/strub-apply3.c: New.
* c-c++-common/strub-apply4.c: New.
* c-c++-common/strub-at-calls1.c: New.
* c-c++-common/strub-at-calls2.c: New.
* c-c++-common/strub-defer-O1.c: New.
* c-c++-common/strub-defer-O2.c: New.
* c-c++-common/strub-defer-O3.c: New.
* c-c++-common/strub-defer-Os.c: New.
* c-c++-common/strub-internal1.c: New.
* c-c++-common/strub-internal2.c: New.
* c-c++-common/strub-parms1.c: New.
* c-c++-common/strub-parms2.c: New.
* c-c++-common/strub-parms3.c: New.
* c-c++-common/strub-relaxed1.c: New.
* c-c++-common/strub-relaxed2.c: New.
* c-c++-common/strub-short-O0-exc.c: New.
* c-c++-common/strub-short-O0.c: New.
* c-c++-common/strub-short-O1.c: New.
* c-c++-common/strub-short-O2.c: New.
* c-c++-common/strub-short-O3.c: New.
* c-c++-common/strub-short-Os.c: New.
* c-c++-common/strub-strict1.c: New.
* c-c++-common/strub-strict2.c: New.
* c-c++-common/strub-tail-O1.c: New.
* c-c++-common/strub-tail-O2.c: New.
* c-c++-common/torture/strub-callable1.c: New.
* c-c++-common/torture/strub-callable2.c: New.
* c-c++-common/torture/strub-const1.c: New.
* c-c++-common/torture/strub-const2.c: New.
* c-c++-common/torture/strub-const3.c: New.
* c-c++-common/torture/strub-const4.c: New.
* c-c++-common/torture/strub-data1.c: New.
* c-c++-common/torture/strub-data2.c: New.
* c-c++-common/torture/strub-data3.c: New.
* c-c++-common/torture/strub-data4.c: New.
* c-c++-common/torture/strub-data5.c: New.
* c-c++-common/torture/strub-indcall1.c: New.
* c-c++-common/torture/strub-indcall2.c: New.
* c-c++-common/torture/strub-indcall3.c: New.
* c-c++-common/torture/strub-inlinable1.c: New.
* c-c++-common/torture/strub-inlinable2.c: New.
* c-c++-common/torture/strub-ptrfn1.c: New.
* c-c++-common/torture/strub-ptrfn2.c: New.
* c-c++-common/torture/strub-ptrfn3.c: New.
* c-c++-common/torture/strub-ptrfn4.c: New.
* c-c++-common/torture/strub-pure1.c: New.
* c-c++-common/torture/strub-pure2.c: New.
* c-c++-common/torture/strub-pure3.c: New.
* c-c++-common/torture/strub-pure4.c: New.
* c-c++-common/torture/strub-run1.c: New.
* c-c++-common/torture/strub-run2.c: New.
* c-c++-common/torture/strub-run3.c: New.
* c-c++-common/torture/strub-run4.c: New.
* c-c++-common/torture/strub-run4c.c: New.
* c-c++-common/torture/strub-run4d.c: New.
* c-c++-common/torture/strub-run4i.c: New.
* g++.dg/strub-run1.C: New.
* g++.dg/torture/strub-init1.C: New.
* g++.dg/torture/strub-init2.C: New.
* g++.dg/torture/strub-init3.C: New.
* gnat.dg/strub_attr.adb, gnat.dg/strub_attr.ads: New.
* gnat.dg/strub_ind.adb, gnat.dg/strub_ind.ads: New.
for libgcc/ChangeLog
* Makefile.in (LIB2ADD): Add strub.c.
* libgcc2.h (__strub_enter, __strub_update, __strub_leave):
Declare.
* strub.c: New.
* libgcc-std.ver.in (__strub_enter): Add to GCC_14.0.0.
(__strub_update, __strub_leave): Likewise.
read_encoded_value_with_base has an ifdef'd code path conditional on __FDPIC__
which was calling _Unwind_gnu_Find_got without a prototype. This naturally
caused various build failures.
This adds a suitable prototype.
Pushed to the trunk.
libgcc
* unwind-pe.h (_Unwind_gnu_Find_got): Add prototype.
The rx port has a bunch of what I presume are ABI compatibility functions in
libgcc. Those compatibility functions routines such as __eqdf2 from libgcc,
but without a prototype. This patch adds the missing prototypes.
libgcc/
* config/rx/rx-abi-functions.c (__ltdf2, __gtdf2): Add prototype.
(__ledf2, __gedf2, __eqdf2, __nedf2): Likewise.
(__ltsf2, __gtsf2, __lesf2, __gesf2, __eqsf2, __nesf2): Likewise.
Two issues prevent the frv-elf port from building after the C99 changes. First
the trampoline code emitted into libgcc has calls to exit, but no prototype.
Adding a trivial prototype for exit() into the macro fixes that little goof.
Second, frvbegin.c has a call to atexit, so a quick prototype is added into
frvbegin.c to fix that problem.
That's enough to get the compiler building again.
gcc/
* config/frv/frv.h (TRANSFER_FROM_TRAMPOLINE): Add prototype for exit.
libgcc/
* config/frv/frvbegin.c (atexit): Add prototype.
The libgcc-exported runtime component of control flow redundancy
hardening was missing symbol versioning information. Add it.
for libgcc/ChangeLog
* libgcc-std.ver.in (__hardcfr_check): Add to GCC_14.0.0.
__sync_val_compare_and_swap may be used on 128-bit types and either calls the
outline atomic code or uses an inline loop. On AArch64 LDXP is only atomic if
the value is stored successfully using STXP, but the current implementations
do not perform the store if the comparison fails. In this case the value
returned is not read atomically.
gcc/ChangeLog:
PR target/111404
* config/aarch64/aarch64.cc (aarch64_split_compare_and_swap):
For 128-bit store the loaded value and loop if needed.
libgcc/ChangeLog:
PR target/111404
* config/aarch64/lse.S (__aarch64_cas16_acq_rel): Execute STLXP using
either new value or loaded value.
My previous patch to add an implementation of __sync_syncrhonize with
a warning trips a testsuite failure in fortran (and possibly other
languages as well) as the framework expects no blank lines in the
output, but this warning was generating one. So remove the newline
from the end of the message and rely on the one added by the linker
instead.
Since we're there, remove the trailing period from the message as
well, since the convention seems to be not to have one.
libgcc/
* config/arm/lib1funcs.S (__sync_synchronize): Adjust warning message.
Prior to Armv6 there was no architected method to synchronize data
across processors. Armv6 saw the first introduction of
multi-processor support, using a CP15 operation; but this was
deprecated in Armv7 and is not supported on m-profile devices of any
form. Armv7 (and armv6-m) and later support data synchronization via
the DMB instruction.
This all leads to difficulties when linking programs as the user
generally needs to know which synchronization method is needed, but
there seems no easy way around this, when there are no OS-related
primitives available.
I've addressed this by adding multiple variants of __sync_synchronize
to libgcc, one for each of the above use cases. I've named these
__sync_synchronize_none, __sync_synchronize_cp15dmb and
__sync_synchronize_dmb. I've also added three specs files that can be
used to direct the linker to pick the appropriate implementation.
Using specs fragments for this is preferable to directing the user to
directly use --defsym as the latter has to be placed at the correct
position on the command line to be effective and the spec rule ensures
this automatically.
I've also added a default implementation of __sync_synchronize. The
default implementation will use DMB if that is available in the target
ISA, or fall back to a nul-implementation if it isn't. In the latter
case it will cause the linker (GNU LD) to emit a warning that
specifies how to pick a specific implementation. I've chosen not to
permit this default to use the CP15 solution as that has been
deprecated.
libgcc:
* config.host (arm*-*-eabi* | arm*-*-rtems*):
Add arm/t-sync to the makefile rules.
* config/arm/lib1funcs.S (__sync_synchronize_none)
(__sync_synchronize_cp15dmb, __sync_synchronize_dmb)
(__sync_synchronize): New functions.
* config/arm/t-sync: New file.
* config/arm/sync-none.specs: Likewise.
* config/arm/sync-dmb.specs: Likewise.
* config/arm/sync-cp15dmb.specs: Likewise.
The function __hardcfr_check_fail in hardcfr.c is internal and static
inline. It receives many arguments, which require more than five
registers to be passed in bpf-none-unknown targets. BPF is limited to
that number of registers to pass arguments, and therefore libgcc fails
to build in that target. This patch marks the function with the
always_inline attribute, fixing the bpf build.
Tested in bpf-unknown-none target and x86_64-linux-gnu host.
libgcc/ChangeLog:
* hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
The code coverage support uses counters to determine which edges in the control
flow graph were executed. If a counter overflows, then the code coverage
information is invalid. Therefore the counter type should be a 64-bit integer.
In multi-threaded applications, it is important that the counter increments are
atomic. This is not the case by default. The user can enable atomic counter
increments through the -fprofile-update=atomic and
-fprofile-update=prefer-atomic options.
If the target supports 64-bit atomic operations, then everything is fine. If
not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
counter increments will be used. However, if the target does not support the
required atomic operations and -fprofile-atomic=update was chosen by the user,
then a warning was issued and as a forced fallback to non-atomic operations was
done. This is probably not what a user wants. There is still hardware on the
market which does not have atomic operations and is used for multi-threaded
applications. A user which selects -fprofile-update=atomic wants consistent
code coverage data and not random data.
This patch removes the fallback to non-atomic operations for
-fprofile-update=atomic the target platform supports libatomic. To
mitigate potential performance issues an optimization for systems which
only support 32-bit atomic operations is provided. Here, the edge
counter increments are done like this:
low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
high_inc = low == 0 ? 1 : 0;
__atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);
In gimple_gen_time_profiler() this split operation cannot be used, since the
updated counter value is also required. Here, a library call is emitted. This
is not a performance issue since the update is only done if counters[0] == 0.
gcc/c-family/ChangeLog:
* c-cppbuiltin.cc (c_cpp_builtins): Define
__LIBGCC_HAVE_LIBATOMIC for libgcov.
gcc/ChangeLog:
* doc/invoke.texi (-fprofile-update): Clarify default method. Document
the atomic method behaviour.
* tree-profile.cc (enum counter_update_method): New.
(counter_update): Likewise.
(gen_counter_update): Use counter_update_method. Split the
atomic counter update in two 32-bit atomic operations if
necessary.
(tree_profiling): Select counter_update_method.
libgcc/ChangeLog:
* libgcov.h (GCOV_SUPPORTS_ATOMIC): Always define it.
Set it also to 1, if __LIBGCC_HAVE_LIBATOMIC is defined.
libgcc/config/avr/libf7/
* libf7-const.def [F7MOD_sinh_]: Add MiniMax polynomial.
* libf7.c (f7_sinh): Use it instead of (exp(x) - exp(-x)) / 2
when |x| < 0.5 to avoid loss of precision due to cancellation.
Check for non-zero denorm in __adddf3. Need to check both the upper and
lower 32-bit chunks of a 64-bit float for a non-zero value when
checking to see if the value is -0.
Fix __addsf3 when the sum exponent is exactly 0xff to ensure that
produces infinity and not nan.
Handle converting NaN/inf values between formats.
Handle underflow and overflow when truncating.
Write a replacement for __fixxfsi so that it does not raise extra
exceptions during an extra conversion from long double to double.
libgcc/
* config/m68k/lb1sf68.S (__adddf3): Properly check for non-zero denorm.
(__divdf3): Restore sign bit properly.
(__addsf3): Correct exponent check.
* config/m68k/fpgnulib.c (EXPMASK): Define.
(__extendsfdf2): Handle Inf and NaN properly.
(__truncdfsf2): Handle underflow and overflow correctly.
(__extenddfxf2): Handle underflow, denorms, Inf and NaN correctly.
(__truncxfdf2): Handle underflow and denorms correctly.
(__fixxfsi): Reimplement.
The following patch adds the missing
{unsigned ,}__int128 <-> _Decimal{32,64,128}
conversion support into libgcc.a on top of the _BitInt support
(doing it without that would be larger amount of code and I hope all
the targets which support __int128 will eventually support _BitInt,
after all it is a required part of C23) and because it is in libgcc.a
only, it doesn't hurt that much if it is added for some architectures
only in GCC 15.
Initially I thought about doing this on the compiler side, but doing
it on the library side seems to be easier and more -Os friendly.
The tests currently require bitint effective target, that can be
removed when all the int128 targets support bitint.
2023-11-09 Jakub Jelinek <jakub@redhat.com>
PR libgcc/65833
libgcc/
* config/t-softfp (softfp_bid_list): Add
{U,}TItype <-> _Decimal{32,64,128} conversions.
* soft-fp/floattisd.c: New file.
* soft-fp/floattidd.c: New file.
* soft-fp/floattitd.c: New file.
* soft-fp/floatuntisd.c: New file.
* soft-fp/floatuntidd.c: New file.
* soft-fp/floatuntitd.c: New file.
* soft-fp/fixsdti.c: New file.
* soft-fp/fixddti.c: New file.
* soft-fp/fixtdti.c: New file.
* soft-fp/fixunssdti.c: New file.
* soft-fp/fixunsddti.c: New file.
* soft-fp/fixunstdti.c: New file.
gcc/testsuite/
* gcc.dg/dfp/int128-1.c: New test.
* gcc.dg/dfp/int128-2.c: New test.
* gcc.dg/dfp/int128-3.c: New test.
* gcc.dg/dfp/int128-4.c: New test.
This adds support for the 'indirect' clause in the 'declare target'
directive. Functions declared as indirect may be called via function
pointers passed from the host in offloaded code.
Virtual calls to member functions via the object pointer in C++ are
currently not supported in target regions.
2023-11-07 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/c-family/
* c-attribs.cc (c_common_attribute_table): Add attribute for
indirect functions.
* c-pragma.h (enum parma_omp_clause): Add entry for indirect clause.
gcc/c/
* c-decl.cc (c_decl_attributes): Add attribute for indirect
functions.
* c-lang.h (c_omp_declare_target_attr): Add indirect field.
* c-parser.cc (c_parser_omp_clause_name): Handle indirect clause.
(c_parser_omp_clause_indirect): New.
(c_parser_omp_all_clauses): Handle indirect clause.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_declare_target): Handle indirect clause. Emit error
message if device_type or indirect clauses used alone. Emit error
if indirect clause used with device_type that is not 'any'.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_begin): Handle indirect clause.
* c-typeck.cc (c_finish_omp_clauses): Handle indirect clause.
gcc/cp/
* cp-tree.h (cp_omp_declare_target_attr): Add indirect field.
* decl2.cc (cplus_decl_attributes): Add attribute for indirect
functions.
* parser.cc (cp_parser_omp_clause_name): Handle indirect clause.
(cp_parser_omp_clause_indirect): New.
(cp_parser_omp_all_clauses): Handle indirect clause.
(handle_omp_declare_target_clause): Add extra parameter. Add
indirect attribute for indirect functions.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_declare_target): Handle indirect clause. Emit error
message if device_type or indirect clauses used alone. Emit error
if indirect clause used with device_type that is not 'any'.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_begin): Handle indirect clause.
* semantics.cc (finish_omp_clauses): Handle indirect clause.
gcc/
* lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect
functions.
(output_offload_tables): Write indirect functions.
(input_offload_tables): read indirect functions.
* lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
* omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New.
* omp-offload.cc (offload_ind_funcs): New.
(omp_discover_implicit_declare_target): Add functions marked with
'omp declare target indirect' to indirect functions list.
(omp_finish_file): Add indirect functions to section for offload
indirect functions.
(execute_omp_device_lower): Redirect indirect calls on target by
passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR.
(pass_omp_device_lower::gate): Run pass_omp_device_lower if
indirect functions are present on an accelerator device.
* omp-offload.h (offload_ind_funcs): New.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT.
* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_INDIRECT_EXPR): New.
* config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs
section. Count number of indirect functions.
(process_obj): Emit number of indirect functions.
* config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New.
(process): Emit offload_ind_func_table in PTX code. Emit indirect
function names and count in image.
* config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark
indirect functions in PTX code with IND_FUNC_MAP.
gcc/testsuite/
* c-c++-common/gomp/declare-target-7.c: Update expected error message.
* c-c++-common/gomp/declare-target-indirect-1.c: New.
* c-c++-common/gomp/declare-target-indirect-2.c: New.
* g++.dg/gomp/attrs-21.C (v12): Update expected error message.
* g++.dg/gomp/declare-target-indirect-1.C: New.
* gcc.dg/gomp/attrs-21.c (v12): Update expected error message.
include/
* gomp-constants.h (GOMP_VERSION): Increment to 3.
(GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New.
libgcc/
* offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
(__offload_ind_func_table): New.
(__offload_ind_funcs_end): New.
(__OFFLOAD_TABLE__): Add entries for indirect functions.
libgomp/
* Makefile.am (libgomp_la_SOURCES): Add target-indirect.c.
* Makefile.in: Regenerate.
* libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define.
(GOMP_OFFLOAD_load_image): Add extra argument.
* libgomp.h (struct indirect_splay_tree_key_s): New.
(indirect_splay_tree_node, indirect_splay_tree,
indirect_splay_tree_key): New.
(indirect_splay_compare): New.
* libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr.
* libgomp.texi (OpenMP 5.1): Update documentation on indirect
calls in target region and on indirect clause.
(Other new OpenMP 5.2 features): Add entry for virtual function calls.
* libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype.
* oacc-host.c (host_load_image): Add extra argument.
* target.c (gomp_load_image_to_device): If the GOMP_VERSION is high
enough, read host indirect functions table and pass to
load_image_func.
* config/accel/target-indirect.c: New.
* config/linux/target-indirect.c: New.
* config/gcn/team.c (build_indirect_map): Add prototype.
(gomp_gcn_enter_kernel): Initialize support for indirect
function calls on GCN target.
* config/nvptx/team.c (build_indirect_map): Add prototype.
(gomp_nvptx_main): Initialize support for indirect function
calls on NVPTX target.
* plugin/plugin-gcn.c (struct gcn_image_desc): Add field for
indirect functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION
is high enough, build address translation table and copy it to target
memory.
* plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect
functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION
is high enough, Build address translation table and copy it to target
memory.
* testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New.
* testsuite/libgomp.c++/declare-target-indirect-1.C: New.