The vectorizable_* routines had many instances of:
slp_node || PURE_SLP_STMT (stmt_info)
which gives the misleading impression that we can have
!slp_node && PURE_SLP_STMT (stmt_info). In this context
it's really enough to test slp_node on its own.
There are three cases:
loop vectorisation only:
vectorizable_foo called only with !slp_node
pure SLP:
vectorizable_foo called only with slp_node
hybrid SLP:
(e.g. a vector that's used in SLP statements and also in a reduction)
- vectorizable_foo called once with slp_node for the SLP uses.
- vectorizable_foo called once with !slp_node for the non-SLP uses.
Hybrid SLP isn't possible for stores, so I added an explicit assert
for that.
I also made vectorizable_comparison static, to make it obvious that
no other callers outside tree-vect-stmts.c could use it with the
!slp && PURE_SLP_STMT combination.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (vectorizable_comparison): Delete.
* tree-vect-loop.c (vectorizable_reduction): Remove redundant
PURE_SLP_STMT check.
* tree-vect-stmts.c (vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_store): Likewise. Assert that we don't have
hybrid SLP.
(vectorizable_comparison): Make static. Remove redundant
PURE_SLP_STMT check.
(vect_transform_stmt): Assert that we always have an slp_node
if PURE_SLP_STMT.
From-SVN: r236642
* config/arm/neon.md (ashldi3_neon): Replace comparison of INTVAL of
operands[2] against 1 with comparison against CONST1_RTX.
(<shift>di3_neon): Likewise.
* config/arm/predicates.md (const0_operand): Replace with comparison
against CONST0_RTX.
From-SVN: r236641
* config/arm/arm.md (andsi3): Replace cast of 1 to HOST_WIDE_INT
with HOST_WIDE_INT_1.
(insv): Likewise.
* config/arm/arm.c (optimal_immediate_sequence): Replace cast of
1 to unsigned HOST_WIDE_INT with HOST_WIDE_INT_1U.
(arm_canonicalize_comparison): Likewise.
(thumb1_rtx_costs): Replace cast of 1 to HOST_WIDE_INT with
HOST_WIDE_INT_1.
(thumb1_size_rtx_costs): Likewise.
(vfp_const_double_index): Replace cast of 1 to unsigned
HOST_WIDE_INT with HOST_WIDE_INT_1U.
(get_jump_table_size): Replace cast of 1 to HOST_WIDE_INT with
HOST_WIDE_INT_1.
(arm_asan_shadow_offset): Replace cast of 1 to unsigned
HOST_WIDE_INT with HOST_WIDE_INT_1U.
* config/arm/neon.md (vec_set<mode>): Replace cast of 1 to
HOST_WIDE_INT with HOST_WIDE_INT_1.
From-SVN: r236638
PR target/69857
* config/arm/arm.c (gen_operands_ldrd_strd): Remove bogus early
return. Reindent transformation comment and mention the ARM state
behavior.
From-SVN: r236635
vectorizable_load forces peeling for gaps if the vectorisation factor
is not a multiple of the group size, since in that case we'd normally load
beyond the original scalar accesses but drop the excess elements as part
of a following permute:
if (loop_vinfo
&& ! STMT_VINFO_STRIDED_P (stmt_info)
&& (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
|| (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0)))
This isn't necessary for LOAD_LANES though, since it loads only the
data needed and does the permute itself.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vect-stmts.c (vectorizable_load): Reorder checks so that
load_lanes/grouped_load classification comes first. Don't check
whether the vectorization factor is a multiple of the group size
for load_lanes.
gcc/testsuite/
* gcc.dg/vect/vect-load-lanes-peeling-1.c: New test.
From-SVN: r236632
vectorizable_load had a curious "force_peeling" variable, with no
comment explaining why we need it for single-element interleaving
but not for other cases. I think it's simply because we weren't
initialising the GROUP_GAP correctly for single loads.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vect-data-refs.c (vect_analyze_group_access_1): Set
GROUP_GAP for single-element interleaving.
* tree-vect-stmts.c (vectorizable_load): Remove force_peeling
variable.
From-SVN: r236631
2016-05-24 Richard Biener <rguenther@suse.de>
PR middle-end/70434
PR c/69504
c-family/
* c-common.h (convert_vector_to_pointer_for_subscript): Rename to ...
(convert_vector_to_array_for_subscript): ... this.
* c-common.c (convert_vector_to_pointer_for_subscript): Use a
VIEW_CONVERT_EXPR to an array type. Rename to ...
(convert_vector_to_array_for_subscript): ... this.
cp/
* expr.c (mark_exp_read): Handle VIEW_CONVERT_EXPR.
* constexpr.c (cxx_eval_array_reference): Handle indexed
vectors.
* typeck.c (cp_build_array_ref): Adjust.
c/
* c-typeck.c (build_array_ref): Do not complain about indexing
non-lvalue vectors. Adjust for function name change.
* tree-ssa.c (non_rewritable_mem_ref_base): Make sure to mark
bases which are accessed with non-invariant indices.
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Re-write
constant index ARRAY_REFs of vectors into BIT_FIELD_REFs.
* c-c++-common/vector-subscript-4.c: New testcase.
* c-c++-common/vector-subscript-5.c: Likewise.
From-SVN: r236630
2016-05-23 Jerry DeLisle <jvdelisle@gcc.gnu.org>
PR libgfortran/71123
* io/list_read (eat_spaces): Eat '\r' as part of spaces.
fix change log
From-SVN: r236629
2016-05-23 Jerry DeLisle <jvdelisle@gcc.gnu.org>
PR libgfortran/70684
* io/list_read (eat_spaces): Eat '\r' as part of spaces.
* gfortran.dg/namelist_90.f: New test
From-SVN: r236628
2016-05-23 Jerry DeLisle <jvdelisle@gcc.gnu.org>
PR fortran/66461
* scanner.c (gfc_next_char_literal): Clear end_flag when adjusting
current locus back to old_locus.
* gfortran.dg/unexpected_eof.f: New test
From-SVN: r236627
[gcc]
2016-05-23 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/71201
* config/rs6000/altivec.md (altivec_vperm_<mode>_internal): Drop
ISA 3.0 xxperm fusion alternative.
(altivec_vperm_v8hiv16qi): Likewise.
(altivec_vperm_<mode>_uns_internal): Likewise.
(vperm_v8hiv4si): Likewise.
(vperm_v16qiv8hi): Likewise.
[gcc/testsuite]
2016-05-23 Michael Meissner <meissner@linux.vnet.ibm.com>
Kelvin Nilsen <kelvin@gcc.gnu.org>
* gcc.target/powerpc/p9-permute.c: Run test on big endian as well
as little endian.
[gcc]
2016-05-23 Michael Meissner <meissner@linux.vnet.ibm.com>
Kelvin Nilsen <kelvin@gcc.gnu.org>
* config/rs6000/rs6000.c (rs6000_expand_vector_set): Generate
vpermr/xxpermr on ISA 3.0.
(altivec_expand_vec_perm_le): Likewise.
* config/rs6000/altivec.md (UNSPEC_VPERMR): New unspec.
(altivec_vpermr_<mode>_internal): Add VPERMR/XXPERMR support for
ISA 3.0.
Co-Authored-By: Kelvin Nilsen <kelvin@gcc.gnu.org>
From-SVN: r236617
* pt.c (tsubst_copy): Just return a local variable from
non-template context. Don't call rest_of_decl_compilation for
duplicated static locals.
(tsubst_decl): Set DECL_CONTEXT of local static from another
function.
From-SVN: r236615
* tree-ssa-threadbackward.c (profitable_jump_thread_path): New function
extracted from ...
(fsm_find_control_statement_thread_paths): Call it.
From-SVN: r236599
2016-05-23 Martin Jambor <mjambor@suse.cz>
PR ipa/71234
* ipa-cp.c (ipa_get_indirect_edge_target_1): Only check value of
from_global_constant if t is not NULL.
From-SVN: r236598
2016-05-23 Martin Jambor <mjambor@suse.cz>
* hsa-gen.c (gen_hsa_insns_for_switch_stmt): Create an empty
default block if a PHI node in the original one would be resized.
libgomp/
* testsuite/libgomp.hsa.c/switch-sbr-2.c: New test.
From-SVN: r236585
Fix PR58135.
2016-05-23 Venkataramanan Kumar <venkataramanan.kumar@amd.com>
PR tree-optimization/58135
* tree-vect-slp.c: When group size is not multiple
of vector size, allow splitting of store group at
vector boundary.
2016-05-23 Venkataramanan Kumar <venkataramanan.kumar@amd.com>
* gcc.dg/vect/bb-slp-19.c: Remove XFAIL.
* gcc.dg/vect/pr58135.c: Add new.
* gfortran.dg/pr46519-1.f: Adjust test case.
From-SVN: r236582
* config/i386/sse.md (vec_set_lo_v16hi, vec_set_hi_v16hi,
vec_set_lo_v32qi, vec_set_hi_v32qi): Add alternative with
v constraint instead of x and vinserti32x4 insn.
* gcc.target/i386/avx512vl-vinserti32x4-3.c: New test.
From-SVN: r236568
* config/i386/sse.md (avx2_vec_dupv4df): Use v instead of x
constraint, use maybe_evex prefix instead of vex.
(vec_dupv4sf): Use v constraint instead of x for output
operand except for noavx alternative, use Yv constraint
instead of x for input. Use maybe_evex prefix instead of vex.
(*vec_dupv4si): Likewise.
(*vec_dupv2di): Likewise.
* gcc.target/i386/avx512vl-vbroadcast-1.c: New test.
From-SVN: r236566