git/gcc - gcc - Lanzhou University OSS Mirrors Git Backend

Commit Graph

Author	SHA1	Message	Date
Iain Sandoe	1e84849cb2	libstdc++: Implement P1494 and P3641 Partial program correctness [PR119060] This implements the library parts of P1494 as amended by P3641. For GCC the compiler itself treats stdio operations as equivalent to the observable checkpoint and thus it does not appear to be necessary to add calls to those functions (it will not alter the outcome). This adds the facility for C++26, although there is no reason, in principle, that it would not work back to C++11 at least. PR c++/119060 libstdc++-v3/ChangeLog: * include/bits/version.def: Add observable_checkpoint, at present allowed from C++26. * include/bits/version.h: Regenerate. * include/std/utility: Add std::observable_checkpoint(). * src/c++23/std.cc.in: Add observable_checkpoint () to utility. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>	2025-10-18 23:18:02 +01:00
Iain Sandoe	9056b5faa8	c++: Implement P1494 and P3641 Partial program correctness [PR119060]. P1494 provides a mechanism that serves to demarcate epochs within the code, preventing UB-based optimisations from 'time traveling' across such boundaries. The additional paper, P3641, alters the name of the function to 'observable_checkpoint', which is the name used here. This implementation maintains the observable function call through to expand, where it produces no code. PR c++/119060 gcc/ChangeLog: * builtins.cc (expand_builtin): Handle BUILT_IN_OBSERVABLE_CHKPT. * builtins.def (BUILT_IN_OBSERVABLE_CHKPT): New. * tree.cc (build_common_builtin_nodes): Build observable checkpoint builtin. gcc/cp/ChangeLog: * cxxapi-data.csv: Add observable_checkpoint to <utility>. * std-name-hint.gperf: Add observable_checkpoint to <utility>. * std-name-hint.h: Regenerate. gcc/testsuite/ChangeLog: * g++.dg/cpp26/observable-checkpoint.C: New test. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>	2025-10-18 23:17:42 +01:00
Nathaniel Shead	515045254f	c++/modules: Import purview using-directives in the same module [PR122279] [namespace.qual] p1 says that a namespace nominated by a using-directive is searched if the using-directive precedes that point. [basic.lookup.general] p2 says that a declaration in a different TU within a module purview is visible if either the declaration is exported, or the other TU is part of the same module as the point of lookup. This patch implements the second half of that. PR c++/122279 gcc/cp/ChangeLog: * module.cc (depset::hash::add_namespace_entities): Seed any purview using-decls. (module_state::write_using_directives): Stream if the udir was exported or not. (module_state::read_using_directives): Add the using-directive if it's either exported or part of this module. gcc/testsuite/ChangeLog: * g++.dg/modules/namespace-13_b.C: Adjust expected results. * g++.dg/modules/namespace-13_c.C: Test non-exported using-directive is not used. * g++.dg/modules/namespace-14_a.C: New test. * g++.dg/modules/namespace-14_b.C: New test. * g++.dg/modules/namespace-14_c.C: New test. * g++.dg/modules/namespace-14_d.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jason Merrill <jason@redhat.com>	2025-10-19 00:53:31 +11:00
Tamar Christina	75fb400d29	AArch64: Implement widen_[us]sum using 2-way [US]UDOT for SVE2p1 [PR122069] SVE2p1 adds 2-way dotproduct which we can use when we have to do a single step widening addition. This is useful for instance when the value to be widened does not come from a load. For example for int foo2_int(unsigned short *x, unsigned short *restrict y) { int sum = 0; for (int i = 0; i < 8000; i++) { x[i] = x[i] + y[i]; sum += x[i]; } return sum; } we used to generate .L12: ld1h z30.h, p7/z, [x0, x2, lsl 1] ld1h z29.h, p7/z, [x1, x2, lsl 1] add z30.h, z30.h, z29.h uaddwb z31.s, z31.s, z30.h uaddwt z31.s, z31.s, z30.h st1h z30.h, p7, [x0, x2, lsl 1] mov x3, x2 inch x2 cmp w2, w4 bls .L12 inch x3 uaddv d31, p7, z31.s but with +sve2p1 .L12: ld1h z31.h, p7/z, [x0, x2, lsl 1] ld1h z29.h, p7/z, [x1, x2, lsl 1] add z31.h, z31.h, z29.h udot z30.s, z31.h, z28.h st1h z31.h, p7, [x0, x2, lsl 1] mov x3, x2 inch x2 cmp w2, w4 bls .L12 inch x3 uaddv d30, p7, z30.s gcc/ChangeLog: PR middle-end/122069 * config/aarch64/aarch64-sve2.md (widen_ssum<mode><Vnarrow>3): Update. (widen_usum<mode><Vnarrow>3): Update. gcc/testsuite/ChangeLog: PR middle-end/122069 * gcc.target/aarch64/sve2/pr122069_3.c: New test. * gcc.target/aarch64/sve2/pr122069_4.c: New test.	2025-10-18 08:24:18 +01:00
Tamar Christina	25c8a8d431	AArch64: Implement widen_[us]sum using [US]ADDW[TB] for SVE2 [PR122069] SVE2 adds [US]ADDW[TB] which we can use when we have to do a single step widening addition. This is useful for instance when the value to be widened does not come from a load. For example for int foo2_int(unsigned short *x, unsigned short *restrict y) { int sum = 0; for (int i = 0; i < 8000; i++) { x[i] = x[i] + y[i]; sum += x[i]; } return sum; } we used to generate .L6: ld1h z1.h, p7/z, [x0, x2, lsl 1] ld1h z29.h, p7/z, [x1, x2, lsl 1] add z29.h, z29.h, z1.h punpklo p6.h, p7.b uunpklo z0.s, z29.h add z31.s, p6/m, z31.s, z0.s punpkhi p6.h, p7.b uunpkhi z30.s, z29.h add z31.s, p6/m, z31.s, z30.s st1h z29.h, p7, [x0, x2, lsl 1] add x2, x2, x4 whilelo p7.h, w2, w3 b.any .L6 ptrue p7.b, all uaddv d31, p7, z31.s but with +sve2 .L12: ld1h z30.h, p7/z, [x0, x2, lsl 1] ld1h z29.h, p7/z, [x1, x2, lsl 1] add z30.h, z30.h, z29.h uaddwb z31.s, z31.s, z30.h uaddwt z31.s, z31.s, z30.h st1h z30.h, p7, [x0, x2, lsl 1] mov x3, x2 inch x2 cmp w2, w4 bls .L12 inch x3 uaddv d31, p7, z31.s gcc/ChangeLog: PR middle-end/122069 * config/aarch64/aarch64-sve2.md: (widen_ssum<mode><Vnarrow>3): New. (widen_usum<mode><Vnarrow>3): New. * config/aarch64/iterators.md (Vnarrow): New, to match VNARROW. gcc/testsuite/ChangeLog: PR middle-end/122069 * gcc.target/aarch64/sve2/pr122069_1.c: New test. * gcc.target/aarch64/sve2/pr122069_2.c: New test.	2025-10-18 08:24:18 +01:00
Tamar Christina	2f719014bf	AArch64: Implement widen_[us]sum using dotproduct for SVE [PR122069] This patch implements support for using dotproduct to do sum reductions by changing += a into += (a * 1). i.e. we seed the multiplication with 1. Given the example int foo_int(unsigned char *x, unsigned char *restrict y) { int sum = 0; for (int i = 0; i < 8000; i++) sum += char_abs(x[i] - y[i]); return sum; } we used to generate .L2: ld1b z1.b, p7/z, [x0, x2] ld1b z29.b, p7/z, [x1, x2] sub z29.b, z1.b, z29.b uunpklo z0.h, z29.b uunpkhi z29.h, z29.b uunpklo z30.s, z0.h add z31.s, p6/m, z31.s, z30.s uunpkhi z0.s, z0.h add z31.s, p5/m, z31.s, z0.s uunpklo z28.s, z29.h add z31.s, p4/m, z31.s, z28.s uunpkhi z29.s, z29.h add z31.s, p3/m, z31.s, z29.s add x2, x2, x7 whilelo p7.b, w2, w3 whilelo p3.s, w2, w6 whilelo p4.s, w2, w5 whilelo p5.s, w2, w4 whilelo p6.s, w2, w3 b.any .L2 ptrue p7.b, all uaddv d31, p7, z31.s but now generates with +dotprod .L3: ld1b z30.b, p7/z, [x5, x2] ld1b z29.b, p7/z, [x1, x2] sub z30.b, z30.b, z29.b udot z31.s, z30.b, z28.b mov x3, x2 add x2, x2, x6 cmp w2, w0 bls .L3 incb x3 uaddv d31, p7, z31.s gcc/ChangeLog: PR middle-end/122069 * config/aarch64/aarch64-sve.md (widen_<sur>sum<mode><vsi2qi>3): New. gcc/testsuite/ChangeLog: PR middle-end/122069 * gcc.target/aarch64/sve/pr122069_1.c: New test. * gcc.target/aarch64/sve/pr122069_2.c: New test.	2025-10-18 08:24:17 +01:00
Tamar Christina	bb80fb6e9b	rs6000: convert widen_[us]sum into convert optab [PR122069] This patch is a mechanical rewrite of the widen_[us]sum optabs from a direct to a conversion optab. The result of which requires the output mode to be added to the existing patterns. No change in functionality is expected. gcc/ChangeLog: PR middle-end/122069 * config/rs6000/altivec.md (widen_usum<mode>3): Rename ... (widen_usumv4si<mode>3): ... to this. (widen_ssumv16qi3): Rename ... (widen_ssumv4siv16qi3): ... to this. (widen_ssumv8hi3): Rename ... (widen_ssumv4siv8hi3): ... to this.	2025-10-18 08:24:17 +01:00
Tamar Christina	8f60eb8097	ia64: convert widen_[us]sum into convert optab [PR122069] The target does not seem to have a maintainer listed, I've CC'ed a group of global maintainers instead hoping one of you could approve it. This patch is a mechanical rewrite of the widen_[us]sum optabs from a direct to a conversion optab. The result of which requires the output mode to be added to the existing patterns. No change in functionality is expected. gcc/ChangeLog: PR middle-end/122069 * config/ia64/vect.md (widen_usumv8qi3): Renamed ... (widen_usumv4hiv8qi3): ... into this. (widen_usumv4hi3): Renamed ... (widen_usumv2siv4hi3): ... into this. (widen_ssumv8qi3): Renamed ... (widen_ssumv4hiv8qi3): ... into this. (widen_ssumv4hi3): Renamed ... (widen_ssumv2siv4hi3): ... into this.	2025-10-18 08:24:17 +01:00
Tamar Christina	7793947247	arm: convert widen_[us]sum into convert optab [PR122069] This patch is a mechanical rewrite of the widen_[us]sum optabs from a direct to a conversion optab. The result of which requires the output mode to be added to the existing patterns. No change in functionality is expected. gcc/ChangeLog: PR middle-end/122069 * config/arm/iterators.md (v_double_width): New, matching V_double_width. * config/arm/neon.md (widen_ssum<mode>3): Renamed ... (widen_ssum<v_double_width><mode>3, widen_ssum<V_widen_l><mode>3): ... into these. (widen_usum<mode>3): Renamed ... (widen_usum<v_double_width><mode>3, widen_usum<V_widen_l><mode>3): ... into these.	2025-10-18 08:24:17 +01:00
Tamar Christina	c8dc5d5070	AArch64: add double widen_sum optab using dotprod for Adv.SIMD [PR122069] This patch implements support for using dotproduct to do sum reductions by changing += a into += (a * 1). i.e. we seed the multiplication with 1. Given the example int foo_int(unsigned char *x, unsigned char *restrict y) { int sum = 0; for (int i = 0; i < 8000; i++) sum += char_abs(x[i] - y[i]); return sum; } we used to generate .L2: ldr q0, [x0, x2] ldr q28, [x1, x2] sub v28.16b, v0.16b, v28.16b zip1 v29.16b, v28.16b, v31.16b zip2 v28.16b, v28.16b, v31.16b uaddw v30.4s, v30.4s, v29.4h uaddw2 v30.4s, v30.4s, v29.8h uaddw v30.4s, v30.4s, v28.4h uaddw2 v30.4s, v30.4s, v28.8h add x2, x2, 16 cmp x2, x3 bne .L2 addv s31, v30.4s but now generates with +dotprod .L2: ldr q29, [x0, x2] ldr q28, [x1, x2] sub v28.16b, v29.16b, v28.16b udot v31.4s, v28.16b, v30.16b add x2, x2, 16 cmp x2, x3 bne .L2 addv s31, v31.4s gcc/ChangeLog: PR middle-end/122069 * config/aarch64/aarch64-simd.md (widen_ssum<mode><vsi2qi>3): New. (widen_usum<mode><vsi2qi>3): New. gcc/testsuite/ChangeLog: PR middle-end/122069 * gcc.target/aarch64/pr122069_3.c: New test. * gcc.target/aarch64/pr122069_4.c: New test.	2025-10-18 08:24:17 +01:00
Tamar Christina	b394181afd	AArch64: convert widen_sum optabs to convert [PR122069] This patch is a mechanical rewrite of the widen_[us]sum optabs from a direct to a conversion optab. The result of which requires the output mode to be added to the existing patterns. No change in functionality is expected. gcc/ChangeLog: PR middle-end/122069 * config/aarch64/aarch64-simd.md (widen_ssum<mode>3): Change into.. (widen_ssum<Vdblw><mode>3, widen_ssum<Vwide><mode>3): ... these. (widen_usum<mode>3): Change into ... (widen_usum<Vdblw><mode>3, widen_usum<Vwide><mode>3): ... these. * config/aarch64/iterators.md (Vdblw): New. (Vwide): Extend to match VWIDE. gcc/testsuite/ChangeLog: PR middle-end/122069 * gcc.target/aarch64/pr122069_1.c: New test. * gcc.target/aarch64/pr122069_2.c: New test.	2025-10-18 08:24:16 +01:00
Tamar Christina	2bb6a8c4f9	middle-end: refactor WIDEN_SUM_EXPR into convert optab [PR122069] This patch changes the widen_[us]sum optabs into convert optabs such that targets can specify more than one conversion. Following this patch are patches rewriting all targets using this change. While working on this I noticed that the pattern does miss some cases it could handle if it tried multiple attempts. e.g. if the promotion is from qi to si, and the target doesn't have this, it should try hi -> si. But I'm leaving that for now. gcc/ChangeLog: PR middle-end/122069 * doc/md.texi (widen_ssum@var{n}@var{m}3, widen_usum@var{n}@var{m}3): Update docs. * optabs.cc (expand_widen_pattern_expr): Add WIDEN_SUM_EXPR as widening. * optabs.def (ssum_widen_optab, usum_widen_optab): Convert from direct to a conversion optab. * tree-vect-patterns.cc (vect_recog_widen_sum_pattern): Change vect_supportable_direct_optab_p into vect_supportable_conv_optab_p.	2025-10-18 08:24:16 +01:00
Yuao Ma	2c1949bf15	fortran: allow character in conditional expression This patch allows the use of character types in conditional expressions. gcc/fortran/ChangeLog: * resolve.cc (resolve_conditional): Allow character in cond-expr. * trans-const.cc (gfc_conv_constant): Handle want_pointer. * trans-expr.cc (gfc_conv_conditional_expr): Fill se->string_length. (gfc_conv_string_parameter): Handle COND_EXPR tree code. gcc/testsuite/ChangeLog: * gfortran.dg/conditional_1.f90: Test character type. * gfortran.dg/conditional_2.f90: Test print constants. * gfortran.dg/conditional_4.f90: Test diagnostic message. * gfortran.dg/conditional_6.f90: Test character cond-arg.	2025-10-18 15:22:09 +08:00
Linsen Zhou	82cefc4898	tree-object-size.cc: Fix assert constant offset in check_for_plus_in_loops [PR122012] After commit `51b85dfeb1`, when the pointer offset is a variable in the loop, the object size of the pointer may also need to be reexamined. This makes the gcc_assert in check_for_plus_in_loops fail. gcc/ChangeLog: PR tree-optimization/122012 * tree-object-size.cc (check_for_plus_in_loops): Skip check for the variable offset. gcc/testsuite/ChangeLog: PR tree-optimization/122012 * gcc.dg/torture/pr122012.c: New test. Signed-off-by: Linsen Zhou <i@lin.moe>	2025-10-17 20:58:33 -05:00
GCC Administrator	fa8ca9554d	Daily bump.	2025-10-18 00:18:06 +00:00
David Faust	239535e9b0	bpf: fix memset miscompilation with larger stores [PR122139] The BPF backend expansion of setmem was broken, because it could elect to use stores of HI, SI or DI modes based on the destination alignment when the value was QI, but fail to duplicate the byte value across to those larger sizes. This resulted in not all bytes of the destination actually being set to the desired value. Fix bpf_expand_setmem to ensure the desired byte value is really duplicated as necessary, whether it is constant or a (sub)reg:QI. PR target/122139 gcc/ * config/bpf/bpf.cc (bpf_expand_setmem): Duplicate byte value across to new mode when using larger modes for store. gcc/testsuite/ * gcc.target/bpf/memset-3.c: New. * gcc.target/bpf/memset-4.c: New.	2025-10-17 08:40:46 -07:00
Tamar Christina	d1965b1fd8	AArch64: Extend intrinsics framework to account for merging predications without gp [PR121604] PR121604 noted that the SVE intrinsics infrastructure currently assumes that, for any predicated operation, the GP is at the first argument position that has an svbool_t, or, for a unary merging operation, in the second position. However, intrinsics like pmov_lane have an svbool_t that is not a GP but instead the inactive lanes. Instructions like BRKB work only on predicates, so the infrastructure incorrectly determines the first operand to be the GP, while that is also the inactive lanes. However during apply_predication we do have the information about where the GP is. This patch re-organizes the code to record this information into the function_instance such that folders have access to this information. For functions that are outliers like pmov_lane we can now override the availability of the intrinsics having a GP. gcc/ChangeLog: PR target/121604 * config/aarch64/aarch64-sve-builtins-shapes.cc (apply_predication): Store gp_index. (struct pmov_to_vector_lane_def): Mark instruction as has no GP. * config/aarch64/aarch64-sve-builtins.h (function_instance::gp_value, function_instance::inactive_values, function_instance::gp_index, function_shape::has_gp_argument_p): New. * config/aarch64/aarch64-sve-builtins.cc (gimple_folder::fold_pfalse): Simplify code and use GP helpers. gcc/testsuite/ChangeLog: PR target/121604 * gcc.target/aarch64/sve/pr121604_brk.c: New test. * gcc.target/aarch64/sve2/pr121604_pmov.c: New test. Co-authored-by: Jennifer Schmitz <jschmitz@nvidia.com>	2025-10-17 15:43:10 +01:00
Richard Biener	d6986e06db	tree-optimization/122308 - apply LIM after unroll-and-jam Just like with loop interchange, unroll-and-jam can leave invariant stmts in the inner loop from outer loop stmts inbetween the two inner loop copies. Do a per-function invariant motion when we applied unroll-and-jam. This avoids failed dataref analysis and fallback to gather/scatter during vectorization. PR tree-optimization/122308 * gimple-loop-jam.cc (tree_loop_unroll_and_jam): Do LIM after applying unroll-and-jam. * gcc.dg/vect/vect-pr122308.c: New testcase.	2025-10-17 16:03:28 +02:00
Josef Melcr	7cd91c7c42	ipa, cgraph: Enable constant propagation to OpenMP kernels. This patch enables constant propagation to outlined OpenMP kernels. It does so using a new function attribute called ' callback' (note the space). The attribute ' callback' captures the notion of a function calling one of its arguments with some of its parameters as arguments. An OpenMP example of such a function is GOMP_parallel. We implement the attribute with new callgraph edges called callback edges. They are imaginary edges pointing from the caller of the function with the attribute (e.g. caller of GOMP_parallel) to the body function itself (e.g. the outlined OpenMP body). They share their call statement with the edge from which they are derived (direct edge caller -> GOMP_parallel in this case). These edges allow passes such as ipa-cp to see the hidden call site to the body function and optimize the function accordingly. To illustrate on an example, the body of GOMP_parallel looks something like this: void GOMP_parallel (void (*fn) (void *), void *data, /* ... */) { /* ... */ fn (data); /* ... */ } If we extend it with the attribute ' callback(1, 2)', we express that the function calls its first argument and passes it its second argument. This is represented in the call graph in this manner: direct indirect caller -----------------> GOMP_parallel ---------------> fn | ----------------------> fn callback The direct edge is then the callback-carrying edge, all new edges are the derived callback edges. While constant propagation is the main focus of this patch, callback edges can be useful for different passes (for example, they improve icf for OpenMP kernels), as they allow for address redirection. If the outlined body function gets optimized and cloned, from body_fn to body_fn.optimized, the callback edge allows us to replace the address in the arguments list: GOMP_parallel (body_fn, &data_struct, /* ... */); becomes GOMP_parallel (body_fn.optimized, &data_struct, /* ... */);
This redirection is possible for any function with the attribute. This callback attribute implementation is partially compatible with clang's implementation. Its semantics, arguments and argument indexing style are the same, but we represent an unknown argument position with 0 (precedent set by attributes such as 'format'), while clang uses -1 or '?'. We use the index 1 for the 'this' pointer in member functions, clang uses 0. We also allow for multiple callback attributes on the same function, while clang only allows one. The attribute is currently for GCC internal use only, thanks to the space in its name. Originally, it was supposed to be called 'callback' like its clang counterpart, but we cannot use this name, as clang uses non-standard indexing style, leading to inconsistencies. The attribute will be introduced into the public API as 'gnu::callback_only' in a future patch. The attribute allows us to propagate constants into body functions of OpenMP constructs. Currently, GCC won't propagate the value 'c' into the OpenMP body in the following example: int a[100]; void test(int c) { #pragma omp parallel for for (int i = 0; i < c; i++) { if (!__builtin_constant_p(c)) { __builtin_abort(); } a[i] = i; } } int main() { test(100); return a[5] - 5; } With this patch, the body function will get cloned and the constant 'c' will get propagated. Some functions may utilize the attribute's infrastructure without being declared with it, for example GOMP_task. These functions are special cases and use the special case functions found in attr-callback.h. Special cases use the attribute under certain circumstances, for example GOMP_task uses it when the copy function is not being used. gcc/ChangeLog: * Makefile.in: Add attr-callback.o to OBJS. * builtin-attrs.def (ATTR_CALLBACK): Callback attr identifier. (DEF_CALLBACK_ATTRIBUTE): Macro for callback attr creation. (GOMP): Attr for libgomp functions. 
(ATTR_CALLBACK_GOMP_LIST): ATTR_NOTHROW_LIST with GOMP callback attr added. * cgraph.cc (cgraph_add_edge_to_call_site_hash): Always hash the callback-carrying edge. (cgraph_node::get_edge): Always return the callback-carrying edge. (cgraph_edge::set_call_stmt): Add cascade for callback edges. (symbol_table::create_edge): Allow callback edges to share call stmts, initialize new flags. (cgraph_edge::make_callback): New method, derives a new callback edge. (cgraph_edge::get_callback_carrying_edge): New method. (cgraph_edge::first_callback_edge): Likewise. (cgraph_edge::next_callback_edge): Likewise. (cgraph_edge::purge_callback_edges): Likewise. (cgraph_edge::redirect_callee): When redirecting a callback edge, redirect its ref as well. (cgraph_edge::redirect_call_stmt_to_callee): Add callback edge redirection logic, set update_derived_edges to true when redirecting the carrying edge. (cgraph_node::remove_callers): Add cascade for callback edges. (cgraph_edge::dump_edge_flags): Print callback flags. (cgraph_node::verify_node): Add sanity checks for callback edges. * cgraph.h: Add new 1 bit flags and 16 bit callback_id to cgraph_edge class. * cgraphclones.cc (cgraph_edge::clone): Copy over callback data. * cif-code.def (CALLBACK_EDGE): Add CIF_CALLBACK_EDGE code. * ipa-cp.cc (purge_useless_callback_edges): New function, deletes callback edges when necessary. (ipcp_decision_stage): Call purge_useless_callback_edges. * ipa-fnsummary.cc (ipa_call_summary_t::duplicate): Add an exception for callback edges. (analyze_function_body): Copy over summary from carrying to callback edge. * ipa-inline-analysis.cc (do_estimate_growth_1): Skip callback edges when estimating growth. * ipa-inline-transform.cc (inline_transform): Add redirection cascade for callback edges. * ipa-param-manipulation.cc (drop_decl_attribute_if_params_changed_p): New function. (ipa_param_adjustments::build_new_function_type): Add args_modified out param. 
(ipa_param_adjustments::adjust_decl): Drop callback attrs when modifying args. * ipa-param-manipulation.h: Adjust decl of build_new_function_type. * ipa-prop.cc (ipa_duplicate_jump_function): Add decl. (init_callback_edge_summary): New function. (ipa_compute_jump_functions_for_edge): Add callback edge creation logic. * lto-cgraph.cc (lto_output_edge): Stream out callback data. (input_edge): Input callback data. * omp-builtins.def (BUILT_IN_GOMP_PARALLEL_LOOP_STATIC): Use new attr list. (BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED): Likewise. (BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_DYNAMIC): Likewise. (BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_RUNTIME): Likewise. (BUILT_IN_GOMP_PARALLEL): Likewise. (BUILT_IN_GOMP_PARALLEL_SECTIONS): Likewise. (BUILT_IN_GOMP_TEAMS_REG): Likewise. * tree-core.h (ECF_CB_1_2): New constant for callback(1,2). * tree-inline.cc (copy_bb): Copy callback edges when copying the carrying edge. (redirect_all_calls): Redirect callback edges. * tree.cc (set_call_expr_flags): Create callback attr according to the ECF_CB flag. * attr-callback.cc: New file. * attr-callback.h: New file. gcc/c-family/ChangeLog: * c-attribs.cc: Define callback attr. gcc/fortran/ChangeLog: * f95-lang.cc (ATTR_CALLBACK_GOMP_LIST): New attr list corresponding to the list in builtin-attrs.def. gcc/testsuite/ChangeLog: * gcc.dg/ipa/ipcp-cb-spec1.c: New test. * gcc.dg/ipa/ipcp-cb-spec2.c: New test. * gcc.dg/ipa/ipcp-cb1.c: New test. Signed-off-by: Josef Melcr <jmelcr02@gmail.com>	2025-10-17 11:31:38 +02:00
Eric Botcazou	cdb08b4bd2	Fix missing style violation report for package instantiation Unlike for subprogram instantiation, -gnatyr does not report style violation for package instantiation, more precisely for the generic package's name. Fixing it uncovered style violations in the sources of the compiler itself! gcc/ada/ PR ada/122295 * sem_ch12.adb (Analyze_Package_Instantiation): Force Style_Check to False only after possibly installing the parent. * aspects.adb (UAD_Pragma_Map): Fix style violation. * inline.adb (To_Pending_Instantiations): Likewise. * lib.ads (Unit_Names): Likewise. * repinfo.adb (Relevant_Entities): Likewise. * sem_ch7.adb (Subprogram_Table): Likewise. (Traversed_Table): Likewise. * sem_util.adb (Interval_Sorting): Likewise. gcc/testsuite/ * gnat.dg/specs/style1.ads: New test.	2025-10-17 11:05:08 +02:00
Tomasz Kamiński	c591c2aff5	libstdc++: Fix typo in __atomic_ref_base::_S_required_alignment. libstdc++-v3/ChangeLog: * include/bits/atomic_base.h (__atomic_ref_base::_S_required_alignment): Renamed from... (__atomic_ref_base::_S_required_aligment): Renamed.	2025-10-17 10:26:26 +02:00
Richard Biener	2cb9925f40	tree-optimization/122301 - fix ICE and improve vectorization of min/max reduction The following fixes another issue with updating of reduc_idx in pattern sequences. But the testcase also shows the pattern in question is harmful for vectorization since a reduction path may not contain promotions/demotions. So the already existing but ineffective check to guard the pattern is fixed. PR tree-optimization/122301 * tree-vect-patterns.cc (vect_recog_over_widening_pattern): Fix reduction guard. (vect_mark_pattern_stmts): Fix reduction def check. * gcc.dg/vect/vect-pr122301.c: New testcase.	2025-10-17 10:24:49 +02:00
Avinash Jayakar	6883d51304	vect: Add pattern recognition for vectorizing {FLOOR,CEIL,ROUND}_{MOD, DIV}_EXPR Added a new helper function "add_code_for_floorceilround_divmod" in tree-vect-patterns.cc for adding compensating code for each of the ops {FLOOR,ROUND,CEIL}_{DIV,MOD}_EXPR. This function checks whether the target supports all operations required to implement these operations and generates vectorized code for the respective operations, based on the following logic: FLOOR_{DIV,MOD} r = x %[fl] y; r = x % y; if (r && (x ^ y) < 0) r += y; r = x/[fl] y; r = x % y; d = x/y; if (r && (x ^ y) < 0) d--; CEIL_{DIV,MOD} (unsigned) r = x %[cl] y; r = x % y; if (r) r -= y; r = x/[cl] y; r = x % y; d = x/y; if (r) d++; CEIL_{DIV,MOD} (signed) r = x %[cl] y; r = x % y; if (r && (x ^ y) >= 0) r -= y; r = x/[cl] y; r = x % y; d = x/y; if (r && (x ^ y) >= 0) d++; ROUND_{DIV,MOD} (unsigned) r = x %[rd] y; r = x % y; if (r > ((y-1)/2)) r -= y; r = x/[rd] y; r = x % y; d = x/y; if (r > ((y-1)/2)) d++; ROUND_{DIV,MOD} (signed) r = x %[rd] y; r = x % y; if (r > ((y-1)/2)) {if ((x ^ y) >= 0) r -= y; else r += y;} r = x/[rd] y; r = x % y; d = x/y; if ((r > ((y-1)/2)) && (x ^ y) >= 0) {if ((x ^ y) >= 0) d++; else d--;} Each case is implemented in a vectorized form. This function is then called in each of the three paths in vect_recog_divmod_pattern, which are selected by the value of the constant operand1: 1. == 2 2. == power of 2 3. otherwise 2025-10-17 Avinash Jayakar <avinashd@linux.ibm.com> gcc/ChangeLog: PR tree-optimization/104116 * tree-vect-patterns.cc (add_code_for_floorceilround_divmod): Pattern recognition for {FLOOR,ROUND,CEIL}_{DIV,MOD}_EXPR. (vect_recog_divmod_pattern): Call add_code_for_floorceilround_divmod after computing div/mod for each control path. gcc/testsuite/ChangeLog: PR tree-optimization/104116 * gcc.dg/vect/pr104116-ceil-div-2.c: New test. * gcc.dg/vect/pr104116-ceil-div-pow2.c: New test. * gcc.dg/vect/pr104116-ceil-div.c: New test. 
* gcc.dg/vect/pr104116-ceil-mod-2.c: New test. * gcc.dg/vect/pr104116-ceil-mod-pow2.c: New test. * gcc.dg/vect/pr104116-ceil-mod.c: New test. * gcc.dg/vect/pr104116-ceil-udiv-2.c: New test. * gcc.dg/vect/pr104116-ceil-udiv-pow2.c: New test. * gcc.dg/vect/pr104116-ceil-udiv.c: New test. * gcc.dg/vect/pr104116-ceil-umod-2.c: New test. * gcc.dg/vect/pr104116-ceil-umod-pow2.c: New test. * gcc.dg/vect/pr104116-ceil-umod.c: New test. * gcc.dg/vect/pr104116-floor-div-2.c: New test. * gcc.dg/vect/pr104116-floor-div-pow2.c: New test. * gcc.dg/vect/pr104116-floor-div.c: New test. * gcc.dg/vect/pr104116-floor-mod-2.c: New test. * gcc.dg/vect/pr104116-floor-mod-pow2.c: New test. * gcc.dg/vect/pr104116-floor-mod.c: New test. * gcc.dg/vect/pr104116-round-div-2.c: New test. * gcc.dg/vect/pr104116-round-div-pow2.c: New test. * gcc.dg/vect/pr104116-round-div.c: New test. * gcc.dg/vect/pr104116-round-mod-2.c: New test. * gcc.dg/vect/pr104116-round-mod-pow2.c: New test. * gcc.dg/vect/pr104116-round-mod.c: New test. * gcc.dg/vect/pr104116-round-udiv-2.c: New test. * gcc.dg/vect/pr104116-round-udiv-pow2.c: New test. * gcc.dg/vect/pr104116-round-udiv.c: New test. * gcc.dg/vect/pr104116-round-umod-2.c: New test. * gcc.dg/vect/pr104116-round-umod-pow2.c: New test. * gcc.dg/vect/pr104116-round-umod.c: New test. * gcc.dg/vect/pr104116.h: New test.	2025-10-17 13:11:44 +05:30
Andrew Pinski	eb717a8f4e	match: Fix (a != b) | ((a|b) != 0) and (a == b) & ((a|b) == 0) match pattern [PR122296] There are 2 fixes for these 2 patterns. 1) Reuse the (a|b) expression instead of recreating it. Fixed by capturing the bit_ior expression and using that instead of a new expression. 2) Use the correct 0. Fixed by capturing the integer_zerop and using that instead of integer_zero_node. 2) could be fixed by using `build_cst_zero (TREE_TYPE (@0))`, but since we already have the correct 0, capturing it would be faster. Pushed as obvious after a bootstrap/test on x86_64-linux-gnu. PR tree-optimization/122296 gcc/ChangeLog: * match.pd (`(a != b) | ((a|b) != 0)`): Reuse both the ior and zero instead of recreating them. (`(a == b) & ((a|b) == 0)`): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/torture/int-bwise-opt-1.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>	2025-10-16 22:00:45 -07:00
Andrew Pinski	128933c9cf	match: Fix `(a == b) | ((a|b) != 0)` pattern for vectors [PR122296] The pattern `(a == b) | ((a|b) != 0)` uses build_one_cst to build boolean true but boolean can be a signed multi-bit type. So this changes the result to use constant_boolean_node instead. `(a != b) & ((a|b) == 0)` has a similar issue but in that case it is less likely to be an issue as false is almost always just 0 but this changes it to be consistent. Pushed as obvious after a bootstrap/test on x86_64-linux-gnu. PR tree-optimization/122296 gcc/ChangeLog: * match.pd (`(a == b) | ((a|b) != 0)`): Fix true value. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/int-bwise-opt-vect01.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>	2025-10-16 22:00:45 -07:00
Hu, Lin1	175bacbb25	x86: Cast stride to __PTRDIFF_TYPE__ for AMX-MOVRS intrinsics. [PR122119] On 64-bit windows, long can't be used, because it is 32 bits. Use __PTRDIFF_TYPE__ instead of long. gcc/ChangeLog: PR target/122119 * config/i386/amxmovrsintrin.h (_tile_loaddrs_internal): Use __PTRDIFF_TYPE__ instead of long. (_tile_loaddrst1_internal): Ditto.	2025-10-17 11:20:54 +08:00
GCC Administrator	03fed2a80b	Daily bump.	2025-10-17 00:18:48 +00:00
David Malcolm	c89bd48e7e	diagnostics: generalize state graph code to use json::property instances (v2) In r16-1631-g2334d30cd8feac I added support for capturing state information from -fanalyzer in the form of embedded XML strings in SARIF output. In r16-2211-ga5d9debedd2f46 I rewrote this so the state was captured in the form of a SARIF directed graph, using various custom types. I want to add the ability to capture other kinds of graph in our SARIF output (e.g. inheritance hierarchies, CFGs, etc), so the following patch reworks the state graph handling code to minimize the use of custom types. Instead, the patch introduces various json::property types, and describes the state graph serialization in terms of instances of these properties, rather than hardcoding string attribute names in readers and writers. The custom SARIF properties live in a new "gcc/custom-sarif-properties/" directory. The "experimental-html" scheme keys "show-state-diagrams-dot-src" and "show-state-diagrams-sarif" become "show-graph-dot-src" and "show-graph-sarif" in preparation for new kinds of graph in the output. This is an updated version of the patch, tested to build with GCC 5 (which the previous version didn't, leading to PR bootstrap/122151). contrib/ChangeLog: * gcc.doxy (INPUT): Add gcc/custom-sarif-properties gcc/ChangeLog: * Makefile.in (OBJS-libcommon): Add custom-sarif-properties/digraphs.o and custom-sarif-properties/state-graphs.o. Remove diagnostics/state-graphs.o. * configure: Regenerate. * configure.ac: Add custom-sarif-properties to subdir iteration. * custom-sarif-properties/digraphs.cc: New file. * custom-sarif-properties/digraphs.h: New file. * custom-sarif-properties/state-graphs.cc: New file. * custom-sarif-properties/state-graphs.h: New file. * diagnostics/diagnostics-selftests.cc (run_diagnostics_selftests): Drop call of state_graphs_cc_tests. * diagnostics/diagnostics-selftests.h (state_graphs_cc_tests): Delete decl. 
* diagnostics/digraphs.cc: Include "custom-sarif-properties/digraphs.h". Move include of "selftest.h" to within CHECKING_P section. (using digraph_object): New. (namespace properties): New. (diagnostics::digraphs::object::get_attr): Delete. (diagnostics::digraphs::object::set_attr): Delete. (diagnostics::digraphs::object::set_json_attr): Delete. (digraph_object::get_property): New definitions, for various property types. (digraph_object::set_property): Likewise. (digraph_object::maybe_get_property): New. (digraph_object::get_property_as_tristate): New. (digraph_object::ensure_property_bag): New. (digraph::get_graph_kind): New. (digraph::set_graph_kind): New. Add include of "custom-sarif-properties/state-graphs.h". (selftest::test_simple_graph): Rewrite to use json::property instances rather than string attribute names. (selftest::test_property_objects): New test. (selftest::digraphs_cc_tests): Call it. * diagnostics/digraphs.h: Include "tristate.h". (object::get_attr): Delete. (object::set_attr): Delete. (object::get_property): New decls. (object::set_property): New decls. (object::maybe_get_property): New. (object::get_property_as_tristate): New. (object::set_json_attr): Delete. (object::ensure_property_bag): New. (graph::get_graph_kind): New. (graph::set_graph_kind): New. * diagnostics/html-sink.cc (html_generation_options::html_generation_options): Update for field renamings. (html_generation_options::dump): Likewise. (html_builder::maybe_make_state_diagram): Likewise. (html_builder::add_graph): Show SARIF and .dot src inline, if requested. * diagnostics/html-sink.h (html_generation_options::m_show_state_diagrams_sarif): Rename to... (html_generation_options::m_show_graph_sarif): ...this. (html_generation_options::m_show_state_diagrams_dot_src): Rename to... (html_generation_options::m_show_graph_dot_src0): ...this. * diagnostics/output-spec.cc (html_scheme_handler::maybe_handle_kv): Rename keys. (html_scheme_handler::get_keys): Likewise. 
* diagnostics/state-graphs-to-dot.cc: : Reimplement throughout to use json::property instances found within custom_sarif_properties throughout, rather than types in diagnostics::state_graphs. * diagnostics/state-graphs.cc: Deleted file. * diagnostics/state-graphs.h: Delete almost all, except decl of diagnostics::state_graphs::make_dot_graph. * doc/invoke.texi: Update for changes to "experimental-html" sink keys. * json.cc (json::object::set_string): New. (json::object::set_integer): New. (json::object::set_bool): New. (json::object::set_array_of_string): New. * json.h: Include "label-text.h". (struct json::property): New template. (json::string_property): New. (json::integer_property): New. (json::bool_property): New. (json::json_property): New. (using json::array_of_string_property): New. (struct json::enum_traits): New. (enum_json::property): New. (json::value::dyn_cast_array): New vfunc. (json::value::dyn_cast_integer_number): New vfunc. (json::value::set_string): New. (json::value::set_integer): New. (json::value::set_bool): New. (json::value::set_array_of_string): New. (json::value::maybe_get_enum): New. (json::value::set_enum): New. (json::array::dyn_cast_array): New. (json::integer_number::dyn_cast_integer_number): New. (object::maybe_get_enum): New. (object::set_enum): New. gcc/analyzer/ChangeLog: * ana-state-to-diagnostic-state.cc: Reimplement throughout to use json::property instances found within custom_sarif_properties throughout, rather than types in diagnostics::state_graphs. * ana-state-to-diagnostic-state.h: Likewise. * checker-event.cc: Likewise. * sm-malloc.cc: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/plugin/diagnostic_plugin_test_graphs.cc (report_diag_with_graphs): Port from set_attr to set_property. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2025-10-16 17:39:03 -04:00
David Faust	822a139e7d	dwarf: add wiki link for DWARF GNU_annotation extensions include/ * dwarf2.def (DW_TAG_GNU_annotation): Add link to wiki page documenting the extension. (DW_AT_GNU_annotation): Likewise.	2025-10-16 11:08:30 -07:00
Jonathan Wakely	08b2c542e4	libstdc++: Improve ostream output for std::stacktrace With this change stacktrace entries always output the frame address, and source file information no longer results in " at :0", e.g. 16# myfunc(int) at /tmp/bt.cc:48 [0x4008b7] 17# main at /tmp/bt.cc:61 [0x40091a] 18# __libc_start_call_main [0x7efc3d6d3574] 19# __libc_start_main@GLIBC_2.2.5 [0x7efc3d6d3627] 20# _start [0x400684] This replaces the previous output: 16# myfunc(int) at /tmp/bt.cc:48 17# main at /tmp/bt.cc:61 18# __libc_start_call_main at :0 19# __libc_start_main@GLIBC_2.2.5 at :0 20# _start at :0 A change that is not visible in the examples above is that for a non-empty stacktrace_entry, we now print "<unknown>" for the function name if description() returns an empty string. For an empty (e.g. default constructed) stacktrace_entry the entire string representation is now "<unknown>" instead of an empty string. Instead of printing "<unknown>" for the function name, we could set that string in the stacktrace_entry::_Info object, so that description() returns "<unknown>" and then operator<< wouldn't need to handle an empty description() string. However, returning an empty string from that function seems simpler for users to detect, rather than having to parse "<unknown>". We could also choose a different string for an empty stacktrace_entry, maybe "<none>" or "<invalid>", but "<unknown>" seems good. libstdc++-v3/ChangeLog: * include/std/stacktrace (operator<<(ostream&, const stacktrace_entry&)): Improve output when description() or source_file() returns an empty string, or the stacktrace_entry is invalid. Append frame address to output. (operator<<(ostream&, const basic_stacktrace<A>&)): Use the size_type of the correct specialization. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Reviewed-by: Nathan Myers <nmyers@redhat.com>	2025-10-16 14:59:42 +01:00
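The output rules described in the stacktrace commit above can be sketched as a small standalone model. This is not the libstdc++ implementation: `EntryModel`, `format_entry`, and `demo_format_entry` are hypothetical names, and treating an address of 0 as an "empty" entry is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the data a stacktrace_entry exposes;
// this is an illustration, not libstdc++ code.
struct EntryModel {
    std::string description;   // function name; may be empty
    std::string source_file;   // may be empty
    unsigned source_line;
    std::uintptr_t address;    // assumption: 0 models an empty entry
};

// Formats one entry following the rules stated in the commit message.
std::string format_entry(const EntryModel& e) {
    if (e.address == 0)
        return "<unknown>";  // empty entry: the whole string is "<unknown>"
    std::ostringstream out;
    // Non-empty entry with no description: print "<unknown>" as the name.
    out << (e.description.empty() ? std::string("<unknown>") : e.description);
    // Only mention the source location when a file name is known,
    // so the old " at :0" suffix never appears.
    if (!e.source_file.empty())
        out << " at " << e.source_file << ':' << e.source_line;
    // The frame address is always appended.
    out << " [0x" << std::hex << e.address << ']';
    return out.str();
}

// Self-check mirroring two lines of the sample output quoted in the commit.
void demo_format_entry() {
    assert(format_entry({"main", "/tmp/bt.cc", 61, 0x40091a})
           == "main at /tmp/bt.cc:61 [0x40091a]");
    assert(format_entry({"_start", "", 0, 0x400684}) == "_start [0x400684]");
}
```

Keeping `description` empty in the model, rather than storing "<unknown>" in it, matches the commit's stated preference: callers can detect a missing name with `empty()` instead of parsing a placeholder string.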
Ayappan Perumal	dfb7e97dd2	Error out stack-protector unavailability on AIX stack-protector is not supported in GCC on AIX. This patch makes compilation fail if the -fstack-protector option is passed. gcc/ChangeLog: * config/rs6000/aix.h (SUBTARGET_DRIVER_SELF_SPECS): Error out when the stack-protector option is used on AIX, where it is not supported. Approved By: Segher Boessenkool <segher@kernel.crashing.org>	2025-10-16 04:22:51 -05:00
Tobias Burnus	e1e5444ff2	libgomp.c/declare-variant-4-gfx: Add missing archs + dg-excess-errors Add missing tests for gfx context selectors; mark all but the default-arch declare-variant-4.c with 'dg-excess-errors' to silence libgomp not-found errors (still passing the scan-offload-tree-dump check) - or at least causing just UNRESOLVED errors if the error is "built without library support ... consider compiling for the associated generic architecture". In case the multilib is configured, the result will be an XPASS. libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-4-gfx10-3-generic.c: Add dg-excess-errors to handle possible missing libgomp multi lib. * testsuite/libgomp.c/declare-variant-4-gfx1030.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx1036.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx11-generic.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx1100.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx1103.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx9-4-generic.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx9-generic.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx900.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx906.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx908.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx90a.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx90c.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx942.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx1031.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1032.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1033.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1034.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1035.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1101.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1102.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1150.c: New test. 
* testsuite/libgomp.c/declare-variant-4-gfx1151.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1152.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx1153.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx902.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx904.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx909.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx950.c: New test.	2025-10-16 11:11:39 +02:00
Richard Biener	6d9d969ab4	tree-optimization/122292 - fix reduction code gen issue The following fixes a mixup of vector types checked when looking at a conditional reduction operation. We want the actual data vector input type, so look at the SLP trees type instead and special-case lane-reducing ops like the original code did. PR tree-optimization/122292 * tree-vect-loop.cc (vect_transform_reduction): Compute the input vector type the same way the analysis phase does.	2025-10-16 10:54:30 +02:00
GCC Administrator	b9c253795e	Daily bump.	2025-10-16 00:21:56 +00:00
Andrew MacLeod	9e04a43012	Range snap bitmasks as they are set. Range bounds adjustments based on a bitmask were lazily set. This led to some inconsistencies which were causing problems. Improve the bounds, and do it every time the bitmask is adjusted. PR tree-optimization/121468 PR tree-optimization/121206 PR tree-optimization/122200 gcc/ * value-range.cc (irange_bitmask::range_from_mask): New. (irange::snap): Add explicit overflow flag. (irange::snap_subranges): Use overflow flag. (irange::set_range_from_bitmask): Use range_from_mask. (test_irange_snap_bounds): Adjust for improved ranges. * value-range.h (irange::range_from_mask): Add prototype. (irange::snap): Adjust prototype. gcc/testsuite/ * gcc.dg/pr121468.c: New. * gcc.dg/pr122200.c: New.	2025-10-15 18:22:34 -04:00
Jonathan Wakely	fa9008b8a7	libstdc++: Add pretty printers for std::stacktrace libstdc++-v3/ChangeLog: * python/libstdcxx/v6/printers.py (StdStacktraceEntryPrinter): New printer for std::stacktrace_entry. (StdStacktracePrinter): New printer for std::basic_stacktrace.	2025-10-15 21:57:59 +01:00
Jonathan Wakely	6c272ca18b	libstdc++: Remove invalid entry from the end of std::stacktrace The backtrace_simple function seems to consistently invoke the callback with an invalid -1UL value as the last entry, which seems to come from _Unwind_Backtrace. The glibc backtrace(3) function has a special case to not include that final invalid address, but libbacktrace doesn't seem to handle it. Do so in std::stacktrace::current() instead. libstdc++-v3/ChangeLog: * include/std/stacktrace (basic_stacktrace::current): Call _M_trim before returning. (basic_stacktrace::_M_trim): New member function.	2025-10-15 21:54:56 +01:00
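The trimming step described in the commit above can be illustrated with a minimal model. The real change adds a `_M_trim` member to `basic_stacktrace`, whose exact logic is not shown in the log; the free function below, `trim_invalid_tail`, and the plain vector of addresses are hypothetical and only show the idea of dropping a trailing `-1UL` sentinel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The invalid value the commit says is passed as the last callback entry
// (written -1UL in the commit message).
constexpr std::uintptr_t invalid_frame = static_cast<std::uintptr_t>(-1);

// Hypothetical model of the trim step: drop one trailing invalid address,
// leaving every other frame untouched.
void trim_invalid_tail(std::vector<std::uintptr_t>& frames) {
    if (!frames.empty() && frames.back() == invalid_frame)
        frames.pop_back();
}

// Self-check: the sentinel is removed, valid frames are kept.
void demo_trim_invalid_tail() {
    std::vector<std::uintptr_t> frames = {0x4008b7, 0x40091a, invalid_frame};
    trim_invalid_tail(frames);
    assert(frames.size() == 2);
    assert(frames.back() == 0x40091a);
}
```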
Jonathan Wakely	524bca2e33	libstdc++: Fix missing __to_timeout_timespec for targets using POSIX sleep [PR122293] The preprocessor condition for defining the new __to_timeout_timespec function templates did not match all the conditions under which it's needed. std::this_thread::sleep_for is defined #if ! defined _GLIBCXX_NO_SLEEP but it relies on __to_timeout_timespec which was only being defined for targets that use nanosleep, or clock_gettime, or use gthreads. For a non-gthreads target that uses POSIX sleep to implement std::this_thread::sleep_for, the build fails with: include/bits/this_thread_sleep.h:71:40: error: '__to_timeout_timespec' is not a member of 'std::chrono' [-Wtemplate-body] 71 \| struct timespec __ts = chrono::__to_timeout_timespec(__rtime); \| ^~~~~~~~~~~~~~~~~~~~~ Presumably the same would happen for mingw-w64 if configured with --disable-threads (as that would be a non-gthreads target that doesn't use nanosleep or clock_gettime). libstdc++-v3/ChangeLog: PR libstdc++/122293 * include/bits/chrono.h (__to_timeout_timespec): Fix preprocessor condition to match the conditions under which callers of this function are defined. * include/bits/this_thread_sleep.h: Remove unused include.	2025-10-15 21:54:56 +01:00
Basil Milanich	f81e712120	[PATCH] Makefile.tpl: remove an extra \; from find command The extra \; parameter in the find command causes it to fail immediately and not clean any config.cache: $ find . -name config.cache -exec rm -f {} \; \; find: paths must precede expression: `;' This is benign in most cases, but binutils also uses this Makefile.tpl, and as a result its 'make distclean' can leave config.cache files around, which breaks subsequent attempts to configure and build it. I have modified the Makefile.tpl and regenerated Makefile.in from it. For testing I ran a config/make/make distclean loop. * Makefile.tpl (distclean): Remove extraneous semicolon. * Makefile.in: Rebuilt.	2025-10-15 11:32:21 -06:00
Tobias Burnus	b3c0e9aadb	gcn: Add missing GFX9_4_GENERIC, OpenMP context-selector update The definition for gfx942 and gfx950 missed the GFX9_4_GENERIC family flag. For OpenMP context selectors: The t-omp-device file missed the generic selectors. Additionally, there is now a note in the OpenMP documentation that there is a one-to-one match for ISA names, ignoring any compatibility. For instance, for Nvidia GPUs 'isa("sm_70")' is only true when compiling for 'sm_70', even though sm < 7.0 code also runs on sm_70 hardware. And, for AMD GPUs, gfx9-4-generic neither matches 'gfx942' (even though such generic code runs on gfx942) - nor the reverse (although all gfx9-4-generic code runs on gfx942). gcc/ChangeLog: * config/gcn/gcn-devices.def (gfx942, gfx950): Set generic name to GFX9_4_GENERIC. * config/gcn/t-omp-device: Include generic names for OpenMP's ISA trait. libgomp/ChangeLog: * libgomp.texi (OpenMP Context Selectors): Add note that there is currently an exact match between ISA and compilation, ignoring compatibilities in both ways. * testsuite/libgomp.c/declare-variant-4.h: Add missing variant functions for specific and generic AMD GPUs. * testsuite/libgomp.c/declare-variant-4-gfx10-3-generic.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx11-generic.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx9-4-generic.c: New test. * testsuite/libgomp.c/declare-variant-4-gfx9-generic.c: New test.	2025-10-15 19:15:15 +02:00
Andrew Pinski	0a999da1c8	debug_tree: print out clique/base for MEM_REF/TARGET_MEM_REF While debugging PR 122273, I noticed that print_node was not printing out the clique/base for MEM_REF/TARGET_MEM_REF. This made it harder to understand (without looking into the code) why operand_equal_p would reject two MEM_REFs that look the same. Changes since v1: * v2: Don't print out clique/base if clique is 0. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * print-tree.cc (print_node): Print out clique/base for MEM_REF and TARGET_MEM_REF. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>	2025-10-15 09:18:03 -07:00
Richard Earnshaw	99af0f9078	arm: avoid unmatched insn in movhfcc [PR118460] When compiling for m-profile with the floating-point extension we have a vsel instruction that takes a limited set of comparisons. In most cases we can use this with careful selection of the operand order, but we need to expand things in the right way. This patch is in two parts: 1) We validate that the expansion will produce correct RTL; 2) We canonicalize the comparison to increase the chances that the above check will pass. gcc: PR target/118460 * config/arm/arm.cc (arm_canonicalize_comparison): For floating- point comparisons, swap the operand order if that will be more likely to produce a comparison that can be used with VSEL. (arm_validize_comparison): Make sure that HFmode comparisons are compatible with VSEL. gcc/testsuite: PR target/118460 * gcc.target/arm/armv8_2-fp16-move-1.c: Adjust expected output. * gcc.target/arm/armv8_2-fp16-move-2.c: Likewise.	2025-10-15 16:55:36 +01:00
Andrew Pinski	94ce59ad33	dce: Remove __builtin_stack_save during dce [PR122037] __builtin_stack_save can be removed when the lhs becomes unused as it is just recording the current StackPointer into another register. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/122037 gcc/ChangeLog: * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Remove __builtin_stack_save when the lhs is unused. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vla-1.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>	2025-10-15 08:51:01 -07:00
Chris Johns	651bf5126d	libstdc++: Enable features for RTEMS (based on GCC 15) libstdc++-v3/ChangeLog: * configure: Regenerate. * configure.ac (newlib, -rtems): Add HAVE_SYS_IOCTL_H, HAVE_SYS_STAT_H, HAVE_SYS_TYPES_H, HAVE_S_ISREG, HAVE_UNISTD_H, HAVE_UNLINKAT, _GLIBCXX_USE_CHMOD, _GLIBCXX_USE_MKDIR, _GLIBCXX_USE_CHDIR, _GLIBCXX_USE_GETCWD, _GLIBCXX_USE_UTIME, _GLIBCXX_USE_LINK, _GLIBCXX_USE_READLINK, _GLIBCXX_USE_SYMLINK, _GLIBCXX_USE_TRUNCATE and _GLIBCXX_USE_FDOPENDIR.	2025-10-15 09:55:02 -05:00
Alice Carlotti	c62f3e81a0	aarch64: Sync aarch64-sys-regs.def with Binutils This patch incorporates changes to this file in Binutils since March 2024 (excluding one patch that was already cherry-picked by Ezra in July 2025). It includes: - New system registers in the 2024 and 2025 architecture extensions. - Updated feature requirements for most system register accessors. - Removal of registers that were dropped from the architecture. - Removal of the unnecessary F_ARCHEXT flag. - Fixed encoding for pmsdsfr_el1. The updated architecture feature requirements are only relevant when the new `-menable-sysreg-checking' option is enabled. gcc/ChangeLog: * config/aarch64/aarch64-sys-regs.def: Copy from Binutils. * config/aarch64/aarch64.cc (F_ARCHEXT): Delete flag. * config/aarch64/aarch64.h (AARCH64_FL_AMU): Delete unused macro. (AARCH64_FL_SCXTNUM): Ditto. (AARCH64_FL_ID_PFR2): Ditto. (AARCH64_FL_AIE): Ditto. (AARCH64_FL_DEBUGv8p9): Ditto. (AARCH64_FL_FGT2): Ditto. (AARCH64_FL_PFAR): Ditto. (AARCH64_FL_PMUv3_ICNTR): Ditto. (AARCH64_FL_PMUv3_SS): Ditto. (AARCH64_FL_PMUv3p9): Ditto. (AARCH64_FL_S1PIE): Ditto. (AARCH64_FL_S1POE): Ditto. (AARCH64_FL_S2PIE): Ditto. (AARCH64_FL_S2POE): Ditto. (AARCH64_FL_SCTLR2): Ditto. (AARCH64_FL_SEBEP): Ditto. (AARCH64_FL_SPE_FDS): Ditto. (AARCH64_FL_TCR2): Ditto. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rwsr-armv8p9.c: Fix incorrect encoding.	2025-10-15 14:04:28 +01:00
Sebastian Pop	f708b83d19	tree-parloops: Enable runtime thread detection with -ftree-parallelize-loops This patch adds runtime thread count detection to auto-parallelization. -ftree-parallelize-loops option generates parallelized loops without specifying a fixed thread count, deferring this decision to program execution time where it is controlled by the OMP_NUM_THREADS environment variable. Bootstrap and regression tested on aarch64-linux. Compiled SPEC HPC pot3d https://www.spec.org/hpc2021/docs/benchmarks/628.pot3d_s.html with -ftree-parallelize-loops and tested without having OMP_NUM_THREADS set in the environment and with OMP_NUM_THREADS set to different values. gcc/ChangeLog: * doc/invoke.texi (ftree-parallelize-loops): Update. * common.opt (ftree-parallelize-loops): Add alias that maps to special value INT_MAX for runtime thread detection. * tree-parloops.cc (create_parallel_loop): Use INT_MAX for runtime detection. Call gimple_build_omp_parallel without building a OMP_CLAUSE_NUM_THREADS clause. (gen_parallel_loop): For auto-detection, use a conservative estimate of 2 threads. (parallelize_loops): Same. gcc/testsuite/ChangeLog: * gcc.dg/autopar/runtime-auto.c: New test. Signed-off-by: Sebastian Pop <spop@nvidia.com>	2025-10-15 14:57:45 +02:00
Christophe Lyon	0272058797	arm: [MVE] Fix carry-in support for vadcq / vsbcq [PR122189] The vadcq and vsbcq patterns had two problems: - the adc / sbc part of the pattern did not mention the use of vfpcc - the carry calcultation part should use a different unspec code In addtion, the get_fpscr_nzcvqc and set_fpscr_nzcvqc were over-cautious by using unspec_volatile when unspec is really what they need. Making them unspec enables to remove redundant accesses to FPSCR_nzcvqc. With unspec_volatile, we used to generate: test_2: @ args = 0, pretend = 0, frame = 8 @ frame_needed = 0, uses_anonymous_args = 0 vmov.i32 q0, #0x1 @ v4si push {lr} sub sp, sp, #12 vmrs r3, FPSCR_nzcvqc ;; [1] bic r3, r3, #536870912 vmsr FPSCR_nzcvqc, r3 vadc.i32 q3, q0, q0 vmrs r3, FPSCR_nzcvqc ;; [2] vmrs r3, FPSCR_nzcvqc orr r3, r3, #536870912 vmsr FPSCR_nzcvqc, r3 vadc.i32 q0, q0, q0 vmrs r3, FPSCR_nzcvqc ldr r0, .L8 ubfx r3, r3, #29, #1 str r3, [sp, #4] bl print_uint32x4_t add sp, sp, #12 @ sp needed pop {pc} .L9: .align 2 .L8: .word .LC1 with unspec, we generate: test_2: @ args = 0, pretend = 0, frame = 8 @ frame_needed = 0, uses_anonymous_args = 0 vmrs r3, FPSCR_nzcvqc ;; [1] bic r3, r3, #536870912 ;; [3] vmov.i32 q0, #0x1 @ v4si vmsr FPSCR_nzcvqc, r3 vadc.i32 q3, q0, q0 vmrs r3, FPSCR_nzcvqc orr r3, r3, #536870912 vmsr FPSCR_nzcvqc, r3 vadc.i32 q0, q0, q0 vmrs r3, FPSCR_nzcvqc push {lr} ubfx r3, r3, #29, #1 sub sp, sp, #12 ldr r0, .L8 str r3, [sp, #4] bl print_uint32x4_t add sp, sp, #12 @ sp needed pop {pc} .L9: .align 2 .L8: .word .LC1 That is, unspec in get_fpscr_nzcvqc enables to: - move [1] earlier - delete redundant [2] and unspec in set_fpscr_nzcvqc enables to move push {lr} and stack manipulation later. gcc/ChangeLog: PR target/122189 * config/arm/iterators.md (VxCIQ_carry, VxCIQ_M_carry, VxCQ_carry) (VxCQ_M_carry): New iterators. * config/arm/mve.md (get_fpscr_nzcvqc, set_fpscr_nzcvqc): Use unspec instead of unspec_volatile. 
(vadciq, vadciq_m, vadcq, vadcq_m): Use vfpcc in operation. Use a different unspec code for carry calculation. * config/arm/unspecs.md (VADCQ_U_carry, VADCQ_M_U_carry) (VADCQ_S_carry, VADCQ_M_S_carry, VSBCIQ_U_carry, VSBCIQ_S_carry, VSBCIQ_M_U_carry, VSBCIQ_M_S_carry, VSBCQ_U_carry, VSBCQ_S_carry, VSBCQ_M_U_carry, VSBCQ_M_S_carry, VADCIQ_U_carry, VADCIQ_M_U_carry, VADCIQ_S_carry, VADCIQ_M_S_carry): New unspec codes. gcc/testsuite/ChangeLog: PR target/122189 * gcc.target/arm/mve/intrinsics/vadcq-check-carry.c: New test. * gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Adjust instructions order. * gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Likewise.	2025-10-15 12:34:33 +00:00
Roger Sayle	da293ec6b6	PR rtl-optimization/122266: Handle TImode in reg_num_sign_bit_copies_for_combine This patch resolves PR rtl-optimization/122266 by changing the types of the last_set_sign_bit_copies and sign_bit_copies fields in combine.cc's reg_stat_type struct to be "unsigned short". This makes both types consistent, and fixes the issue that on platforms where char is by default signed, combine.cc can overflow when handling TImode values, where sign_bit_copies can be 128 bits. Conveniently, there are holes (caused by field alignment/padding) in the reg_stat_type struct that allows us to upgrade to "unsigned short" without increasing the total size of the struct. This should help reduce problems in future handling OImode or XImode values, or possible issues with 256-bit and 512-bit vector modes. Note that it's important to take care when reordering the fields of this struct, as the (partial) ordering of fields is significant: See the use of offsetof in combine.cc's init_reg_last. 
Before: (gdb) ptype /o reg_stat_type /* offset \| size / type = struct reg_stat_type { / 0 \| 8 / rtx_insn last_death; /* 8 \| 8 / rtx_insn last_set; /* 16 \| 8 / rtx last_set_value; / 24 \| 4 / int last_set_table_tick; / 28 \| 4 / int last_set_label; / 32 \| 8 / unsigned long last_set_nonzero_bits; / 40 \| 1 / char last_set_sign_bit_copies; / 41: 0 \| 4 / machine_mode last_set_mode : 16; / 43 \| 1 / bool last_set_invalid; / 44 \| 1 / unsigned char sign_bit_copies; / XXX 3-byte hole / / 48 \| 8 / unsigned long nonzero_bits; / 56 \| 4 / int truncation_label; / 60: 0 \| 4 / machine_mode truncated_to_mode : 16; / XXX 2-byte padding / / total size (bytes): 64 / } After: / offset \| size / type = struct reg_stat_type { / 0 \| 8 / rtx_insn last_death; /* 8 \| 8 / rtx_insn last_set; /* 16 \| 8 / rtx last_set_value; / 24 \| 4 / int last_set_table_tick; / 28 \| 4 / int last_set_label; / 32 \| 8 / unsigned long last_set_nonzero_bits; / 40 \| 2 / unsigned short last_set_sign_bit_copies; / 42: 0 \| 4 / machine_mode last_set_mode : 16; / 44 \| 1 / bool last_set_invalid; / XXX 1-byte hole / / 46 \| 2 / unsigned short sign_bit_copies; / 48 \| 8 / unsigned long nonzero_bits; / 56 \| 4 / int truncation_label; / 60: 0 \| 4 / machine_mode truncated_to_mode : 16; / XXX 2-byte padding / / total size (bytes): 64 / } 2025-10-15 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR rtl-optimization/122266 combine.cc (struct reg_stat_type): Change types of sign_bit_copies and last_set_sign_bit_copies to unsigned short, to avoid overflows on TImode (and wider) values. gcc/testsuite/ChangeLog PR rtl-optimization/122266 * gcc.target/i386/pr122266.c: New test case.	2025-10-15 11:21:18 +01:00
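The overflow that motivates the type change in the commit above can be reproduced in isolation. The sketch below is not code from combine.cc: `store_and_reload` and `demo_sign_bit_copies` are hypothetical helpers that store a sign-bit-copy count in a field of a given type and read it back, assuming 8-bit chars.

```cpp
#include <cassert>
#include <climits>

// Assumption: 8-bit char, so a signed char cannot hold 128.
static_assert(SCHAR_MAX < 128, "a signed char cannot hold a count of 128");
static_assert(USHRT_MAX >= 512, "unsigned short can hold 512-bit mode counts");

// Hypothetical helper: store a count in a field of type Field, then reload
// it as an int, the way a struct member would be written and read back.
template <typename Field>
int store_and_reload(unsigned count) {
    Field field = static_cast<Field>(count);
    return static_cast<int>(field);
}

// Self-check for the cases named in the commit message.
void demo_sign_bit_copies() {
    // TImode: up to 128 sign-bit copies. A signed char loses the value.
    assert(store_and_reload<signed char>(128) != 128);
    // 256-bit modes would wrap even an unsigned char back to 0.
    assert(store_and_reload<unsigned char>(256) == 0);
    // unsigned short holds all of these counts unchanged.
    assert(store_and_reload<unsigned short>(128) == 128);
    assert(store_and_reload<unsigned short>(512) == 512);
}
```

Because the `ptype /o` listings show existing holes in the struct, widening both fields to `unsigned short` costs no extra space there; the sketch only demonstrates the value range, not the layout.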
Jan Hubicka	a93f80feee	Cleanup max of profile_count profile_count::max is not implemented the same way as other arithmetic on profile counts, which generally requires counts to be compatible and returns the minimum of the qualities of the input counts. The reason is that it was originally used to compute statistics of the whole callgraph profile so inliner weights could be scaled to reasonable integers interprocedurally. It also combines qualities in an unusual way so that the same counter could be used to determine what quality of profile is available. That code had roundoff error issues and was replaced by sreals. Now max is mostly used to determine cfg->max_count, which is used to scale counts to reasonable integers intraprocedurally and is still being used, e.g. by IRA. There are also a few places where max is used for normal arithmetic when updating the profile. For computing max_count we need max to still be a bit special, so max (uninitialized, initialized) returns initialized rather than uninitialized. Partial profiles are later handled specially. This patch renames max to max_prefer_initialized to make this clear and updates the implementation to require compatible profiles. I checked that this behaviour is good for the other places using it as well. I also turned the function into a static one, since a = a->max (b) looks odd. gcc/ChangeLog: * auto-profile.cc (scale_bb_profile): Use profile_count::max_prefer_initialized. (afdo_adjust_guessed_profile): Likewise. * bb-reorder.cc (edge_order): Do not use max. * cfghooks.cc (merge_blocks): Likewise. * ipa-fnsummary.cc (param_change_prob): Likewise. * ipa-inline-transform.cc (inline_transform): Likewise. * predict.cc (update_max_bb_count): Likewise. (estimate_bb_frequencies): Likewise. (rebuild_frequencies): Likewise. * tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Likewise. * profile-count.h (profile_count::max): Rename to (profile_count::max_prefer_initialized): this; update handling of qualities.	2025-10-15 09:55:17 +02:00
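The special case named in the commit above, where the maximum of an uninitialized and an initialized count is the initialized one, can be sketched with a simplified model. `CountModel` is hypothetical and deliberately omits the quality and compatibility handling of GCC's real `profile_count`; only the function name is taken from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a profile count: a value plus a flag.
// GCC's profile_count also tracks a quality; that is left out here.
struct CountModel {
    bool initialized;
    std::uint64_t value;
};

// Free (static-style) function, so call sites read
// a = max_prefer_initialized(a, b) rather than a = a->max(b).
CountModel max_prefer_initialized(CountModel a, CountModel b) {
    if (!a.initialized) return b;  // prefer whichever side is initialized
    if (!b.initialized) return a;
    return a.value >= b.value ? a : b;  // both initialized: ordinary max
}

// Self-check for the uninitialized/initialized case from the commit.
void demo_max_prefer_initialized() {
    const CountModel uninit = {false, 0};
    const CountModel ten = {true, 10};
    assert(max_prefer_initialized(uninit, ten).initialized);
    assert(max_prefer_initialized(uninit, ten).value == 10);
    assert(max_prefer_initialized(ten, uninit).value == 10);
}
```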
Haochen Jiang	24cc91f5ca	Initial Wildcat Lake Support Add Wildcat Lake support according to ISE. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Handle Wildcat Lake. * common/config/i386/i386-common.cc (processor_name): Add Wildcat Lake. * doc/invoke.texi: Ditto.	2025-10-15 14:01:50 +08:00


224107 Commits All Branches Search
