Restore the behaviour in GCC 8 and earlier where _GLIBCXX_USE_FLOAT128
is not defined when configure detects that support is missing. This avoids
a three-state situation where the macro can be 1, 0, or undefined.
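As a hedged illustration (not from the patch itself), this restores the
usual idiom of testing for support with #ifdef:

    // With the macro undefined rather than defined to 0 when support is
    // missing, a plain #ifdef test gives the right answer again.
    #ifdef _GLIBCXX_USE_FLOAT128
    typedef __float128 widest_float;   // float128 paths enabled
    #else
    typedef long double widest_float;  // fallback when undefined
    #endif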
PR libstdc++/85672
* include/Makefile.am [!ENABLE_FLOAT128]: Change c++config.h entry
to #undef _GLIBCXX_USE_FLOAT128 instead of defining it to zero.
* include/Makefile.in: Regenerate.
* include/bits/c++config (_GLIBCXX_USE_FLOAT128): Move definition
within conditional block.
From-SVN: r260043
PR target/85572
* config/i386/i386.c (ix86_expand_sse2_abs): Handle E_V2DImode and
E_V4DImode.
* config/i386/sse.md (abs<mode>2): Use VI_AVX2 iterator instead of
VI1248_AVX512VL_AVX512BW. Handle V2DImode and V4DImode if not
TARGET_AVX512VL using ix86_expand_sse2_abs. Formatting fixes.
* g++.dg/other/sse2-pr85572-1.C: New test.
* g++.dg/other/sse2-pr85572-2.C: New test.
* g++.dg/other/sse4-pr85572-1.C: New test.
* g++.dg/other/avx2-pr85572-1.C: New test.
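As a hedged sketch (not GCC's actual expansion code), the scalar identity
an SSE2 abs expansion for 64-bit elements can build on is:

    #include <cstdint>

    // abs(x) = (x ^ m) - m with m = x >> 63 (arithmetic shift): m is 0
    // for non-negative x and -1 for negative x, so the expression
    // conditionally negates x (wrapping at INT64_MIN, as the vector
    // form does).
    inline std::int64_t abs64(std::int64_t x)
    {
      std::int64_t m = x >> 63;
      return (x ^ m) - m;
    }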
From-SVN: r260041
There are a number of places in parsecpu.awk where I've managed to get
the operator precedence between ! and 'in' incorrect (! binds more
tightly). In most cases this just makes a consistency test
ineffective, but in a few cases it means we fail to correctly diagnose
errors by the user (for example, when passing an invalid cpu or
architecture name to configure). This patch fixes all the cases I
could find, based on searching for all uses of the two operators in
the same expression. The tweak to the API of check_fpu is to bring it
into line with the other check functions - it now returns the result
rather than printing it directly. The caller now does the printing,
in the same way that the chkarch and chkcpu commands do.
PR target/85658
* config/arm/parsecpu.awk (check_cpu): Fix operator precedence.
(check_arch): Likewise.
(check_fpu): Return the result rather than printing it.
(end arch): Fix operator precedence.
(end cpu): Likewise.
(END): Print the result from check_fpu.
From-SVN: r260032
This patch adds SVE patterns that combine a PTRUE-predicated
comparison with a separate AND. The main benefit is for
optimising ANDs with the loop predicate, as in the testcase.
However, one of the potential drawbacks is that it triggers
even for cases in which two naturally-parallel comparisons
are ANDed together. Whether that's a win or a loss will
depend on the schedule, but it has the potential to be a win
more often than a loss.
The combine patterns are undeniably ugly. One way of getting
around them would be to allow 1->1 "splits" when combining
2 instructions, as well as 1->2 splits when combining more
than 2 instructions (although that wouldn't really be a split).
Another would be to have a way of defining target-specific
rtx simplifications. branches/ARM/sve-branch has a prototype
implementation of that, but it would need some clean-up before being
ready to submit. It would also be good to make it closer to the
match.pd style.
Until then, I think what the combine patterns are doing is the
"correct" implementation given the current infrastructure.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-sve.md (*pred_cmp<cmp_op><mode>_combine)
(*pred_cmp<cmp_op><mode>, *fcm<cmp_op><mode>_and_combine)
(*fcmuo<mode>_and_combine, *fcm<cmp_op><mode>_and)
(*fcmuo<mode>_and): New patterns.
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_6.c: Do not expect any ANDs.
XFAIL the BIC test.
* gcc.target/aarch64/sve/vcond_7.c: New test.
* gcc.target/aarch64/sve/vcond_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r260031
This patch rewrites the SVE comparison handling so that it uses
UNSPEC_MERGE_PTRUE for comparisons that are known to be predicated
on a PTRUE, for consistency with other patterns. Specific unspecs
are then only needed for truly predicated floating-point comparisons,
such as those used in the expansion of UNEQ for flag_trapping_math.
The patch also makes sure that the comparison expanders attach
a REG_EQUAL note to instructions that use UNSPEC_MERGE_PTRUE,
so passes can use that as an alternative to the unspec pattern.
(This happens automatically for optabs. The problem was that
this code emits instruction patterns directly.)
No specific benefit on its own, but it lays the groundwork for
the next patch.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/aarch64/iterators.md (UNSPEC_COND_LO, UNSPEC_COND_LS)
(UNSPEC_COND_HI, UNSPEC_COND_HS, UNSPEC_COND_UO): Delete.
(SVE_INT_CMP, SVE_FP_CMP): New code iterators.
(cmp_op, sve_imm_con): New code attributes.
(SVE_COND_INT_CMP, imm_con): Delete.
(cmp_op): Remove above unspecs from int attribute.
* config/aarch64/aarch64-sve.md (*vec_cmp<cmp_op>_<mode>): Rename
to...
(*cmp<cmp_op><mode>): ...this. Use UNSPEC_MERGE_PTRUE instead of
comparison-specific unspecs.
(*vec_cmp<cmp_op>_<mode>_ptest): Rename to...
(*cmp<cmp_op><mode>_ptest): ...this and adjust likewise.
(*vec_cmp<cmp_op>_<mode>_cc): Rename to...
(*cmp<cmp_op><mode>_cc): ...this and adjust likewise.
(*vec_fcm<cmp_op><mode>): Rename to...
(*fcm<cmp_op><mode>): ...this and adjust likewise.
(*vec_fcmuo<mode>): Rename to...
(*fcmuo<mode>): ...this and adjust likewise.
(*pred_fcm<cmp_op><mode>): New pattern.
* config/aarch64/aarch64.c (aarch64_emit_unop, aarch64_emit_binop)
(aarch64_emit_sve_ptrue_op, aarch64_emit_sve_ptrue_op_cc): New
functions.
(aarch64_unspec_cond_code): Remove handling of LTU, GTU, LEU, GEU
and UNORDERED.
(aarch64_gen_unspec_cond, aarch64_emit_unspec_cond): Delete.
(aarch64_emit_sve_predicated_cond): New function.
(aarch64_expand_sve_vec_cmp_int): Use aarch64_emit_sve_ptrue_op_cc.
(aarch64_emit_unspec_cond_or): Replace with...
(aarch64_emit_sve_or_conds): ...this new function. Use
aarch64_emit_sve_ptrue_op for the individual comparisons and
aarch64_emit_binop to OR them together.
(aarch64_emit_inverted_unspec_cond): Replace with...
(aarch64_emit_sve_inverted_cond): ...this new function. Use
aarch64_emit_sve_ptrue_op for the comparison and
aarch64_emit_unop to invert the result.
(aarch64_expand_sve_vec_cmp_float): Update after the above
changes. Use aarch64_emit_sve_ptrue_op for native comparisons.
From-SVN: r260029
sve/vcond_6.c was effectively testing a three-input logical operation,
since the result of BINOP needed to be ANDed with the loop predicate
before loading src[i]. This patch makes it really test a binary
operation instead. A later patch will add (and optimise) the
three-operand case.
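A hedged sketch (hypothetical, not the literal test source) of the shape
being tested after the change:

    // src[i] is loaded unconditionally, so the select condition is a
    // plain binary combination of two comparisons, with no extra AND
    // against the loop predicate.
    void loop(double *dst, const double *src,
              const int *a, const int *b, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          double s = src[i];               // unconditional load
          dst[i] = (a[i] < 0 && b[i] > 0) ? s : 0.0;
        }
    }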
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_6.c (LOOP): Unconditionally
load from src[i].
From-SVN: r260028
* decl2.c (determine_visibility): Don't mess with template arguments
from the containing scope.
(vague_linkage_p): Check DECL_ABSTRACT_P before looking at a 'tor
thunk.
From-SVN: r260017
* scanner.c (preprocessor_line): Call linemap_add after a line
directive that changes the current filename.
* gfortran.dg/linefile.f90: New test.
From-SVN: r260010
By performing the /= operation on a named local variable instead of a
temporary, the copy made for the return value can be elided (NRVO).
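A minimal sketch of the pattern (simplified signature, mirroring the idea
rather than the exact libstdc++ code):

    // Returning a named local makes the return value eligible for NRVO,
    // so no copy is made on return.
    path operator/(const path& lhs, const path& rhs)
    {
      path result(lhs);   // named local, not a temporary
      result /= rhs;
      return result;      // copy elided via NRVO
    }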
PR libstdc++/85671
* include/bits/fs_path.h (operator/): Permit copy elision.
* include/experimental/bits/fs_path.h (operator/): Likewise.
From-SVN: r260009
2018-05-07 Edward Smith-Rowland <3dw4rd@verizon.net>
Moar PR libstdc++/80506
* include/bits/random.tcc (gamma_distribution::__generate_impl()):
Fix magic number used in loop condition.
Actually put the file in.
Don't know what my problem is today...
From-SVN: r260008
2018-05-07 Edward Smith-Rowland <3dw4rd@verizon.net>
Moar PR libstdc++/80506
* include/bits/random.tcc (gamma_distribution::__generate_impl()):
Fix magic number used in loop condition.
From-SVN: r260004
2018-05-07 Edward Smith-Rowland <3dw4rd@verizon.net>
Moar PR libstdc++/80506
* include/bits/random.tcc (gamma_distribution::__generate_impl()):
Fix magic number used in loop condition.
From-SVN: r260001
The following patch adds an option to control software prefetching of memory
references with non-constant/unknown strides.
Currently we prefetch these references if the pass thinks there is a benefit
to doing so, but since this is all based on heuristics, we do not always end
up with better performance.
For Falkor there is also the problem of conflicts with the hardware prefetcher,
so we need to be more conservative in terms of what we issue software prefetch
hints for.
This also aligns GCC with what LLVM does for Falkor.
Similarly to the previous patch, the defaults guarantee no change in behavior
for other targets and architectures.
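As a hedged illustration (hypothetical code, and assuming a parameter value
of 0 disables these prefetches), a reference whose stride is unknown at
compile time:

    // With --param prefetch-dynamic-strides=0, the loop prefetch pass
    // would not issue hints for this reference, since the stride 'step'
    // (assumed positive) is not a compile-time constant.
    void scale(double *a, long n, long step)
    {
      for (long i = 0; i < n; i += step)
        a[i] *= 2.0;
    }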
2018-05-07 Luis Machado <luis.machado@linaro.org>
gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<prefetch_dynamic_strides>: New const bool field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
prefetch_dynamic_strides.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to false.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_DYNAMIC_STRIDES.
* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
prefetch-dynamic-strides setting.
From-SVN: r259996
This patch adds a new option to control the minimum stride of a memory
reference above which the loop prefetch pass may issue software prefetch
hints. There are two motivations:
* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.
* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.
I've noticed a considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.
The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.
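A hedged example (hypothetical code) of a constant stride that falls below
such a threshold:

    // With --param prefetch-minimum-stride=2048 (the value set for
    // qdf24xx below), this unit-stride reference advances 8 bytes per
    // iteration and is left to the hardware prefetcher.
    double sum(const double *a, long n)
    {
      double s = 0.0;
      for (long i = 0; i < n; ++i)
        s += a[i];
      return s;
    }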
2018-05-07 Luis Machado <luis.machado@linaro.org>
Introduce option to limit software prefetching to known constant
strides above a specific threshold with the goal of preventing
conflicts with a hardware prefetcher.
gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<minimum_stride>: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
minimum_stride field.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
stride is constant and is below the minimum stride threshold.
From-SVN: r259995
2018-05-07 Tom de Vries <tom@codesourcery.com>
PR testsuite/85677
* testsuite/lib/libgomp.exp (libgomp_init): Move inclusion of top-level
include directory in ALWAYS_CFLAGS out of $blddir != "" condition.
From-SVN: r259992
PR c++/85659
* cfgexpand.c (expand_asm_stmt): Don't create a temporary if
the type is addressable. Don't force op into register if it has
BLKmode.
* g++.dg/ext/asm14.C: New test.
* g++.dg/ext/asm15.C: New test.
* g++.dg/ext/asm16.C: New test.
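A hedged sketch (assumed, not the actual testcases) of the kind of code
affected by the fix:

    // A type with a user-provided copy constructor must not be copied
    // into a temporary when used as a memory operand; the asm has to
    // see the object itself.
    struct S
    {
      S() {}
      S(const S&) {}   // user-provided: not trivially copyable
      int data[4];
    };

    void f()
    {
      S s;
      __asm__ volatile ("" : : "m" (s));  // 's' must stay addressable
    }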
From-SVN: r259981
2018-05-06 Andrew Sadek <andrew.sadek.se@gmail.com>
* gcc.target/microblaze/others/picdtr.c: Add test for
-fPIE -mpic-data-is-text-relative.
From-SVN: r259975
gcc/
PR other/77609
* varasm.c (default_section_type_flags): Set SECTION_NOTYPE for
any section for which we don't know a specific type it should have,
regardless of name. Previously this was done only for the exact
names ".init_array", ".fini_array", and ".preinit_array".
(default_elf_asm_named_section): Add comment about
relationship with default_section_type_flags and SECTION_NOTYPE.
(get_section): Don't consider it a type conflict if one side has
SECTION_NOTYPE and the other doesn't, as long as neither has the
SECTION_BSS et al used in the default_section_type_flags logic.
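As a hedged illustration (hypothetical), a section name for which GCC knows
no specific type:

    // "my_notes" matches none of the names with known types, so its
    // flags now include SECTION_NOTYPE regardless of the name.
    int tag __attribute__((section("my_notes"))) = 42;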
From-SVN: r259969
2018-05-05 Paolo Carlini <paolo.carlini@oracle.com>
* cvt.c (ocp_convert): Early handle the special case of a
null_ptr_cst_p expr converted to a NULLPTR_TYPE_P type.
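A hedged illustration (assumed, not the actual testcase) of the conversion
now handled early:

    #include <cstddef>

    // A null pointer constant converted to a NULLPTR_TYPE_P type
    // (std::nullptr_t).
    std::nullptr_t np = 0;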
From-SVN: r259966
Add flag -fassume-phsa that is on by default. If -fno-assume-phsa
is given, these optimizations are disabled.
With this flag, gccbrig can generate GENERIC that assumes we are
targeting a phsa-runtime based implementation, which allows us
to expose the work-item context accesses to retrieve WI IDs etc.
which helps optimizers.
The first optimization that takes advantage of this is to get rid of
the setworkitemid calls whenever we have non-inlined calls that
use IDs internally.
Other optimizations added in this commit:
- expand absoluteid to similar level of simplicity as workitemid.
At the moment absoluteid is the best indexing ID to end up with
WG vectorization.
- propagate ID variables closer to their uses. This is mainly
to avoid known useless casts, which confuse at least scalar
evolution analysis.
- use signed long long for storing IDs. Unsigned integers have
defined wraparound semantics, which confuse at least scalar
evolution analysis, leading to unvectorizable WI loops.
- also refactor some BRIG function generation helpers to brig_function.
- no point in having the wi-loop as a for-loop. It's really
a do...while and SCEV can still analyze it just fine.
- add consts to ptrs etc. in BRIG builtin defs.
Improves optimization opportunities.
- add qualifiers to generated function parameters.
Const and restrict on the hidden local/private pointers,
the arg buffer and the context pointer help some optimizations.
From-SVN: r259957