git/gcc - gcc - Lanzhou University OSS Mirrors Git Backend

Commit Graph

Author	SHA1	Message	Date
Jakub Jelinek	254a858ae7	Update copyright years.	2026-01-02 09:56:11 +01:00
Jakub Jelinek	f4b80b0338	libgomp: Avoid -Waddress warning The function has assert (htab_find) with a comment that that is to avoid -Wunused-function warning. The problem is that it triggers a different warning, ../../../libgomp/plugin/build-target-indirect-htab.h:68:3: warning: the address of ‘htab_find’ will always evaluate as ‘true’ (or error depending on exact flags). This uses (void) htab_find instead to avoid any diagnostics. 2025-12-15 Jakub Jelinek <jakub@redhat.com> * plugin/build-target-indirect-htab.h (create_target_indirect_map): Use (void) htab_find instead of assert (htab_find) to silence -Werror=unused-function because the latter triggers -Werror=address.	2025-12-15 19:08:06 +01:00
Andrew Stubbs	1d1d12da6d	amdgcn, libgomp: improve generic device errors Switching to use "generic" ISA variants has changed the error modes a bit. This patch changes the runtime so that it doesn't say to use the device-specific -march option when the real problem is not the ISA (it'll be a mismatched xnack setting, probably). Additionally, the testsuite effective target check needs to see if the xnack mode is accepted by the runtime, as well as the compiler. libgomp/ChangeLog: * plugin/plugin-gcn.c (generic_isa_code): New function. (isa_matches_agent): Use generic ISA details to help select an error message on ISA mismatch. * testsuite/lib/libgomp.exp (check_effective_target_offload_target_amdgcn_with_xnack): Use a runtime check.	2025-12-04 15:35:05 +00:00
Andrew Stubbs	723b18ce3d	libgomp, amdgcn: Implement Managed Memory This patch implements "managed" memory for AMD GCN GPUs in OpenMP. It builds on the support added to the NVPTX libgomp for CUDA Managed Memory, a week or two ago. These features were first posted here a few years ago, as part of a larger Unified Shared Memory patch series, and then in a slightly changed version just over a year ago. Hopefully this time the controversial bits have been removed. Since we do not use HIP we cannot use hipMallocManaged, so this patch attempts to replicate the same effect by setting the appropriate attributes. This works on more devices than support proper USM, but still I cannot be sure that the settings are correct for every device out there (I have tested on gfx900, gfx906, gfx908, gfx90a, and gfx1100). The HSA header file update uses the most recent files relicensed for us by AMD, at the time of the first patch posting. Those files have certainly moved on in the upstream sources, but I did not ask to get those relicensed. include/ChangeLog: * hsa.h: Import newer version. * hsa_ext_amd.h: Likewise. * hsa_ext_image.h: Likewise. libgomp/ChangeLog: * Makefile.in: Regenerate. * libgomp-plugin.h (gomp_simple_alloc_init_context): New prototype. (gomp_simple_alloc_register_memory): New prototype. (gomp_simple_alloc): New prototype. (gomp_simple_free): New prototype. (gomp_simple_realloc): New prototype. * libgomp.h (gomp_simple_alloc_init_context): Move to libgomp-plugin.h. (gomp_simple_alloc_register_memory): Likewise. (gomp_simple_alloc): Likewise. (gomp_simple_free): Likewise. (gomp_simple_realloc): Likewise. * libgomp.texi: Update AMD managed memory description. * plugin/Makefrag.am (libgomp_plugin_gcn_la_SOURCES): Add simple-allocator.c and plugin/mutex.c. * plugin/plugin-gcn.c: Include sys/mman.h and unistd.h. (struct hsa_runtime_fn_info): Add hsa_amd_svm_attributes_set_fn. 
(dump_hsa_system_info): Add HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED and HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT to the GCN_DEBUG output. (init_hsa_runtime_functions): Add hsa_amd_svm_attributes_set. (isa_matches_agent): Add a new error message for the case where the ISA doesn't match but the name does. (managed_ctx): New variable. (managed_heap_create): New function. (GOMP_OFFLOAD_managed_alloc): Likewise. (GOMP_OFFLOAD_managed_free): Likewise. * simple-allocator.c (gomp_fatal): New macro. * testsuite/lib/libgomp.exp (check_effective_target_omp_managedmem): Add amdgcn support checker. (check_effective_target_offload_target_amdgcn_with_xnack): New. * testsuite/libgomp.c-c++-common/requires-4.c: Ignore xnack warning. * testsuite/libgomp.c-c++-common/requires-4a.c: Ignore xnack warning. * testsuite/libgomp.c-c++-common/requires-5.c: Ignore xnack warning. * testsuite/libgomp.c++/alloc-managed-1.C: Add -mxnack=on, if needed. * testsuite/libgomp.c/alloc-managed-1.c: Likewise. * testsuite/libgomp.c/alloc-managed-2.c: Likewise. * testsuite/libgomp.c/alloc-managed-3.c: Likewise. * testsuite/libgomp.c/alloc-managed-4.c: Likewise. * testsuite/libgomp.fortran/alloc-managed-1.f90: Likewise. * plugin/mutex.c: New file.	2025-12-01 12:03:35 +00:00
Andrew Stubbs	62174ec27b	openmp, nvptx: ompx_gnu_managed_mem_alloc This adds support for using Cuda Managed Memory with omp_alloc. AMD support will be added in a future patch. There is one new predefined allocator, "ompx_gnu_managed_mem_alloc", plus a corresponding memory space, which can be used to allocate memory in the "managed" space. The nvptx plugin is modified to make the necessary Cuda calls, via two new (optional) plugin interfaces. gcc/fortran/ChangeLog: * openmp.cc (is_predefined_allocator): Use GOMP_OMP_PREDEF_ALLOC_MAX and GOMP_OMPX_PREDEF_ALLOC_MIN/MAX instead of hardcoded values in the comment. include/ChangeLog: * cuda/cuda.h (cuMemAllocManaged): Add declaration and related CU_MEM_ATTACH_GLOBAL flag. * gomp-constants.h (GOMP_OMPX_PREDEF_ALLOC_MAX): Update to 201. (GOMP_OMP_PREDEF_MEMSPACE_MAX): New constant. (GOMP_OMPX_PREDEF_MEMSPACE_MIN): New constant. (GOMP_OMPX_PREDEF_MEMSPACE_MAX): New constant. libgomp/ChangeLog: * allocator.c (ompx_gnu_max_predefined_alloc): Update to ompx_gnu_managed_mem_alloc. (_Static_assert): Fix assertion messages for allocators and add new assertions for memspace constants. (omp_max_predefined_mem_space): New define. (ompx_gnu_min_predefined_mem_space): New define. (ompx_gnu_max_predefined_mem_space): New define. (MEMSPACE_ALLOC): Add check for non-standard memspaces. (MEMSPACE_CALLOC): Likewise. (MEMSPACE_REALLOC): Likewise. (MEMSPACE_VALIDATE): Likewise. (predefined_ompx_gnu_alloc_mapping): Add ompx_gnu_managed_mem_space. (omp_init_allocator): Add ompx_gnu_managed_mem_space validation. * config/gcn/allocator.c (gcn_memspace_alloc): Add check for non-standard memspaces. (gcn_memspace_calloc): Likewise. (gcn_memspace_realloc): Likewise. (gcn_memspace_validate): Update to validate standard vs non-standard memspaces. * config/linux/allocator.c (linux_memspace_alloc): Add managed memory space handling. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. 
(linux_memspace_realloc): Likewise (returns NULL for fallback). * config/nvptx/allocator.c (nvptx_memspace_alloc): Add check for non-standard memspaces. (nvptx_memspace_calloc): Likewise. (nvptx_memspace_realloc): Likewise. (nvptx_memspace_validate): Update to validate standard vs non-standard memspaces. * env.c (parse_allocator): Add ompx_gnu_managed_mem_alloc, ompx_gnu_managed_mem_space, and some static asserts so I don't forget them again. * libgomp-plugin.h (GOMP_OFFLOAD_managed_alloc): New declaration. (GOMP_OFFLOAD_managed_free): New declaration. * libgomp.h (gomp_managed_alloc): New declaration. (gomp_managed_free): New declaration. (struct gomp_device_descr): Add managed_alloc_func and managed_free_func fields. * libgomp.texi: Document ompx_gnu_managed_mem_alloc and ompx_gnu_managed_mem_space, add C++ template documentation, and describe NVPTX and AMD support. * omp.h.in: Add ompx_gnu_managed_mem_space and ompx_gnu_managed_mem_alloc enumerators, and gnu_managed_mem C++ allocator template. * omp_lib.f90.in: Add Fortran bindings for new allocator and memory space. * omp_lib.h.in: Likewise. * plugin/cuda-lib.def: Add cuMemAllocManaged. * plugin/plugin-nvptx.c (nvptx_alloc): Add managed parameter to support cuMemAllocManaged. (GOMP_OFFLOAD_alloc): Move contents to ... (cleanup_and_alloc): ... this new function, and add managed support. (GOMP_OFFLOAD_managed_alloc): New function. (GOMP_OFFLOAD_managed_free): New function. * target.c (gomp_managed_alloc): New function. (gomp_managed_free): New function. (gomp_load_plugin_for_device): Load optional managed_alloc and managed_free plugin APIs. * testsuite/lib/libgomp.exp: Add check_effective_target_omp_managedmem. * testsuite/libgomp.c++/alloc-managed-1.C: New test. * testsuite/libgomp.c/alloc-managed-1.c: New test. * testsuite/libgomp.c/alloc-managed-2.c: New test. * testsuite/libgomp.c/alloc-managed-3.c: New test. * testsuite/libgomp.c/alloc-managed-4.c: New test. 
* testsuite/libgomp.fortran/alloc-managed-1.f90: New test. Co-authored-by: Kwok Cheung Yeung <kcyeung@baylibre.com> Co-authored-by: Thomas Schwinge <tschwinge@baylibre.com>	2025-11-13 14:16:09 +00:00
Andrew Stubbs	3b8d9d579c	libgomp, nvptx: Cuda pinned memory Use Cuda to pin memory, instead of Linux mlock, when available. There are two advantages: firstly, this gives a significant speed boost for NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit setting. The design adds a device independent plugin API for allocating pinned memory, and then implements it for NVPTX. At present, the other supported devices do not have equivalent capabilities (or requirements). libgomp/ChangeLog: * config/linux/allocator.c: Include assert.h. (using_device_for_page_locked): New variable. (linux_memspace_alloc): Add init0 parameter. Support device pinning. (linux_memspace_calloc): Set init0 to true. (linux_memspace_free): Support device pinning. (linux_memspace_realloc): Support device pinning. (MEMSPACE_ALLOC): Set init0 to false. * libgomp-plugin.h (GOMP_OFFLOAD_page_locked_host_alloc): New prototype. (GOMP_OFFLOAD_page_locked_host_free): Likewise. * libgomp.h (gomp_page_locked_host_alloc): Likewise. (gomp_page_locked_host_free): Likewise. (struct gomp_device_descr): Add page_locked_host_alloc_func and page_locked_host_free_func. * libgomp.texi: Adjust the docs for the pinned trait. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_page_locked_host_alloc): New function. (GOMP_OFFLOAD_page_locked_host_free): Likewise. * target.c (device_for_page_locked): New variable. (get_device_for_page_locked): New function. (gomp_page_locked_host_alloc): Likewise. (gomp_page_locked_host_free): Likewise. (gomp_load_plugin_for_device): Add page_locked_host_alloc and page_locked_host_free. * testsuite/libgomp.c/alloc-pinned-1.c: Change expectations for NVPTX devices. * testsuite/libgomp.c/alloc-pinned-2.c: Likewise. * testsuite/libgomp.c/alloc-pinned-3.c: Likewise. * testsuite/libgomp.c/alloc-pinned-4.c: Likewise. * testsuite/libgomp.c/alloc-pinned-5.c: Likewise. * testsuite/libgomp.c/alloc-pinned-6.c: Likewise. 
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>	2025-10-23 11:08:06 +00:00
Tobias Burnus	d2ad7e9083	libgomp: Add is_integrated_apu function to plugin/plugin-{gcn,nvptx}.c The added function is currently '#if 0' but is planned to be used to enable self mapping automatically. Prerequisite for auto self maps is still mapping 'declare target' variables (if any, in libgomp) or converting all 'declare target' variables to 'declare target link' in the compiler (as required for 'omp requires self_maps'). include/ChangeLog: * hsa_ext_amd.h (enum hsa_amd_agent_info_s): Add HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES. (enum): Add HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU. libgomp/ChangeLog: * plugin/plugin-gcn.c (is_integrated_apu): New; currently '#if 0'. * plugin/plugin-nvptx.c (is_integrated_apu): Likewise.	2025-10-10 09:48:37 +02:00
Tobias Burnus	da5803c794	libgomp: Init hash table for 'indirect'-clause of 'declare target' on the host [PR114445, PR119857] Especially with unified-shared memory and especially with C++'s virtual functions, it is not uncommon to have on the device a function pointer that points to the host function - but has an associated device. If the pointed-to function is (explicitly or implicitly) 'declare target' with the 'indirect' clause, it is added to the lookup table. Before this commit, the conversion of the lookup table into a lookup hash table happened every time a device kernel was launched on the first team - albeit if already converted, the function immediately returned. Ignoring the overhead, there was also a race: If multiple teams were launched, it could happen that another team of the same target region already tried to use the lookup table while it was still being created. Likewise, when launching a kernel with 'nowait' and directly afterward another kernel, there could be a race in creating the table. With this commit, the creation of the table has been moved to the host-plugin's GOMP_OFFLOAD_load_image. The previous code stored a pointer to the host/device pointer array, which makes it hard when creating the hash table on the host (data is needed for finding the slot) - but accessing it on the device (where the lookup has to work as well). As the hash-table implementation (only) supports integral values as payload (0 and 1 having special meaning), the solution was to move to a uint128_t variable to store both the host and device address. As the host-side library is typically dynamically linked and the device-side one statically, there is the problem of backward compatibility. The current implementation permits both older binaries and newer libgomp and newer binaries with older libgomp. I could imagine us breaking the latter eventually, but for now there is up and downward compatibility. (Obviously, the race is only fixed if new + new is combined.) 
Code-wise, on the device there is GOMP_INDIRECT_ADDR_MAP, which was updated to point to the host/device-address array. Now additionally GOMP_INDIRECT_ADDR_HMAP exists, which contains the hash-table map. If the latter exists, libgomp only updates it and the former remains a NULL pointer; it is also untouched if there are no indirect functions. Being NULL therefore avoids the call to the device-side build_indirect_map. The code also currently supports having no hash and a linear walk. I think that remained from testing; due to the backward-compat feature, it can actually be turned off on either side. libgomp/ChangeLog: PR libgomp/119857 PR libgomp/114445 * config/accel/target-indirect.c: Change to use uint128_t instead of a struct as data structure and add GOMP_INDIRECT_ADDR_HMAP as host-accessible variable. (struct indirect_map_t): Remove. (USE_HASHTAB_LOOKUP, INDIRECT_DEV_ADDR, INDIRECT_HOST_ADDR, SET_INDIRECT_HOST_ADDR, SET_INDIRECT_ADDRS): Define. (htab_free): Use __builtin_unreachable. (htab_hash, htab_eq, GOMP_target_map_indirect_ptr, build_indirect_map): Update for new representation and new pointer-to-hash variable. * config/gcn/team.c (gomp_gcn_enter_kernel): Only call build_indirect_map when GOMP_INDIRECT_ADDR_MAP. * config/nvptx/team.c (gomp_nvptx_main): Likewise. * libgomp-plugin.h (GOMP_INDIRECT_ADDR_HMAP): Define. * plugin/plugin-gcn.c: Conditionally include build-target-indirect-htab.h. (USE_HASHTAB_LOOKUP_FOR_INDIRECT): Define. (create_target_indirect_map): New prototype. (GOMP_OFFLOAD_load_image): Update to create the device's indirect-function hash table on the host. * plugin/plugin-nvptx.c: Conditionally include build-target-indirect-htab.h. (USE_HASHTAB_LOOKUP_FOR_INDIRECT): Define. (create_target_indirect_map): New prototype. (GOMP_OFFLOAD_load_image): Update to create the device's indirect-function hash table on the host. * plugin/build-target-indirect-htab.h: New file.	2025-09-17 08:47:36 +02:00
Tobias Burnus	4e47e2f833	libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async PR libgomp/120444 include/ChangeLog: * cuda/cuda.h (cuMemsetD8, cuMemsetD8Async): Declare. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_memset): Declare. * libgomp.h (struct gomp_device_descr): Add memset_func. * libgomp.map (GOMP_6.0.1): Add omp_target_memset{,_async}. * libgomp.texi (Device Memory Routines): Document them. * omp.h.in (omp_target_memset, omp_target_memset_async): Declare. * omp_lib.f90.in (omp_target_memset, omp_target_memset_async): Add interfaces. * omp_lib.h.in (omp_target_memset, omp_target_memset_async): Likewise. * plugin/cuda-lib.def: Add cuMemsetD8. * plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add hsa_amd_memory_fill_fn. (init_hsa_runtime_functions): DLSYM_OPT_FN load it. (GOMP_OFFLOAD_memset): New. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memset): New. * target.c (omp_target_memset_int, omp_target_memset, omp_target_memset_async_helper, omp_target_memset_async): New. (gomp_load_plugin_for_device): Add DLSYM (memset). * testsuite/libgomp.c-c++-common/omp_target_memset.c: New test. * testsuite/libgomp.c-c++-common/omp_target_memset-2.c: New test. * testsuite/libgomp.c-c++-common/omp_target_memset-3.c: New test. * testsuite/libgomp.fortran/omp_target_memset.f90: New test. * testsuite/libgomp.fortran/omp_target_memset-2.f90: New test.	2025-06-02 17:43:57 +02:00
Tobias Burnus	f4aa6b5a8d	libgomp: Add OpenACC's acc_memcpy_device{,_async} routines [PR93226] libgomp/ChangeLog: PR libgomp/93226 * libgomp-plugin.h (GOMP_OFFLOAD_openacc_async_dev2dev): New prototype. * libgomp.h (struct acc_dispatch_t): Add dev2dev_func. (gomp_copy_dev2dev): New prototype. * libgomp.map (OACC_2.6.1): New; add acc_memcpy_device{,_async}. * libgomp.texi (acc_memcpy_device): New. * oacc-mem.c (memcpy_tofrom_device): Change to take from/to device boolean; use memcpy not memmove; add early return if size == 0 or same device + same ptr. (acc_memcpy_to_device, acc_memcpy_to_device_async, acc_memcpy_from_device, acc_memcpy_from_device_async): Update. (acc_memcpy_device, acc_memcpy_device_async): New. * openacc.f90 (acc_memcpy_device, acc_memcpy_device_async): Add interface. * openacc_lib.h (acc_memcpy_device, acc_memcpy_device_async): Likewise. * openacc.h (acc_memcpy_device, acc_memcpy_device_async): Add prototype. * plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_async_host2dev): Update comment. (GOMP_OFFLOAD_openacc_async_dev2host): Update call. (GOMP_OFFLOAD_openacc_async_dev2dev): New. * plugin/plugin-nvptx.c (cuda_memcpy_dev_sanity_check): New. (GOMP_OFFLOAD_dev2dev): Call it. (GOMP_OFFLOAD_openacc_async_dev2dev): New. * target.c (gomp_copy_dev2dev): New. (gomp_load_plugin_for_device): Load dev2dev and async_dev2dev. * testsuite/libgomp.oacc-c-c++-common/acc_memcpy_device-1.c: New test. * testsuite/libgomp.oacc-fortran/acc_memcpy_device-1.f90: New test.	2025-05-29 22:47:06 +02:00
Tobias Burnus	1c5a375c21	libgomp/plugin/plugin-nvptx.c: Fix device used for stream creation libgomp/ChangeLog: * plugin/plugin-nvptx.c (GOMP_OFFLOAD_interop): Set context for stream creation to use the specified device.	2025-03-24 16:08:20 +01:00
Tobias Burnus	41b9c3b848	libgomp/plugin: Add initial interop support to nvptx + gcn The interop directive operates on an opaque object that represents a foreign runtime. This commit adds support for this to the two offloading plugins. For nvptx, it supports cuda, cuda_driver and hip; the latter is AMD's version of CUDA which for Nvidia devices boils down to normal CUDA. Thus, in the end, for this limited use, cuda/cuda_driver/hip are all the same - and for plugin-nvptx.c, they differ only in terms of what gets fr_id, fr_name and get_interop_type_desc return. For gcn, it supports hip and hsa. Regarding get-mapped-ptr-1.c: That's actually a fix for the GOMP_interop commit r15-8654-g99e2906ae255fc that added GOMP_DEVICE_DEFAULT_OMP_61 alias omp_default_device, which is a conforming device number. But that test used -5 as check for a non-conforming device number. libgomp/ChangeLog: * plugin/plugin-gcn.c (_LIBGOMP_PLUGIN_INCLUDE): Define. (struct hsa_runtime_fn_info): Add two queue functions. (hipError_t, hipCtx_t, hipStream_s, hipStream_t): New types. (struct hip_runtime_fn_info): New. (hip_runtime_lib, hip_fns): New global vars. (init_environment_variables): Handle hip_runtime_lib. (init_hsa_runtime_functions): Load the two queue functions. (init_hip_runtime_functions, GOMP_OFFLOAD_interop, GOMP_OFFLOAD_get_interop_int, GOMP_OFFLOAD_get_interop_ptr, GOMP_OFFLOAD_get_interop_str, GOMP_OFFLOAD_get_interop_type_desc): New. * plugin/plugin-nvptx.c (_LIBGOMP_PLUGIN_INCLUDE): Define. (GOMP_OFFLOAD_interop, GOMP_OFFLOAD_get_interop_int, GOMP_OFFLOAD_get_interop_ptr, GOMP_OFFLOAD_get_interop_str, GOMP_OFFLOAD_get_interop_type_desc): New. * testsuite/libgomp.c/interop-fr-1.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: Use -6 not -5 as non-conforming device number.	2025-03-21 21:39:42 +01:00
Tobias Burnus	8561e4e290	[GCN] Handle generic ISA names in libgomp's plugin-gcn.c libgomp/ChangeLog: * plugin/plugin-gcn.c (ELFABIVERSION_AMDGPU_HSA_V6, EF_AMDGPU_GENERIC_VERSION_V, EF_AMDGPU_GENERIC_VERSION_OFFSET, GET_GENERIC_VERSION): New #define. (elf_gcn_isa_is_generic): New. (isa_matches_agent): Accept all generic code objects on the first go; extend the diagnostic and handle runtime-failed case. (create_and_finalize_hsa_program): Call it also after loading the code failed, pass the status.	2025-02-07 13:20:25 +01:00
Jakub Jelinek	6441eb6dc0	Update copyright years.	2025-01-02 11:59:57 +01:00
Tobias Burnus	7a12dc695b	plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct Follow up to r15-5392-g884637b6362391. As the name implies, GOMP_OFFLOAD_openacc_async_construct is also externally called. Hence, partially revert previous commit to permit unlocking handling in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally. Hence, also the other (indirect) callers had to be updated: GOMP_OFFLOAD_dev2dev fails now with 'false' and GOMP_OFFLOAD_async_run fatally. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run): Handle omp_async_queue == NULL after call to maybe_init_omp_async. (GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error, partially reverting r15-5392.	2024-12-10 16:16:04 +01:00
Tobias Burnus	884637b636	libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_async_construct): In case of an error, call GOMP_PLUGIN_fatal not ..._error; use NULL not false in return.	2024-11-18 14:58:21 +01:00
Tobias Burnus	e7e3d1838f	libgomp/plugin/plugin-nvptx.c: Change false to NULL to fix C23 wrong-return-type error [PR117626] libgomp/ChangeLog: PR libgomp/117626 * plugin/plugin-nvptx.c (nvptx_open_device): Use 'CUDA_CALL_ERET' with 'NULL' as error return instead of 'CUDA_CALL' that returns false.	2024-11-18 11:06:58 +01:00
Tobias Burnus	8473010807	libgomp/plugin/plugin-gcn.c: Show device number in ISA error message libgomp/ChangeLog: * plugin/plugin-gcn.c (isa_matches_agent): Mention the device number and ROCR_VISIBLE_DEVICES when reporting an ISA mismatch error.	2024-11-11 12:17:42 +01:00
Andrew Stubbs	a6b26e5ea0	amdgcn: Refactor device settings into a def file Almost all device-specific settings are now centralised into gcn-devices.def for the compiler, mkoffload, and libgomp. No longer will we have to touch 10 files in multiple places just to add another device without any exotic features. (New ISAs and devices with incompatible metadata will continue to need a bit more.) In order to remove the device-specific conditionals in the code a new value HSACO_ATTR_UNSUPPORTED has been added, indicating that the assembler will reject any setting of that option. This incorporates some of Tobias's patch from March 2024. Co-Authored-By: Tobias Burnus <tburnus@baylibre.com> gcc/ChangeLog: * config.gcc (amdgcn): Add gcn-device-macros.h to tm_file. Add gcn-tables.opt to extra_options. * config/gcn/gcn-hsa.h (NO_XNACK): Delete. (NO_SRAM_ECC): Delete. (SRAMOPT): Move definition to generated file gcn-device-macros.h. (XNACKOPT): Likewise. (ASM_SPEC): Redefine using generated values from gcn-device-macros.h. * config/gcn/gcn-opts.h (enum processor_type): Generate from gcn-devices.def. (TARGET_VEGA10): Delete. (TARGET_VEGA20): Delete. (TARGET_GFX908): Delete. (TARGET_GFX90a): Delete. (TARGET_GFX90c): Delete. (TARGET_GFX1030): Delete. (TARGET_GFX1036): Delete. (TARGET_GFX1100): Delete. (TARGET_GFX1103): Delete. (TARGET_XNACK): Redefine to allow for HSACO_ATTR_UNSUPPORTED. (enum hsaco_attr_type): Add HSACO_ATTR_UNSUPPORTED. (TARGET_TGSPLIT): New define. * config/gcn/gcn.cc (gcn_devices): New constant table. (gcn_option_override): Rework to use gcn_devices table. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. (gcn_hsa_declare_function_name): Rework using TARGET_* macros. * config/gcn/gcn.h (gcn_devices): Declare struct and table. (TARGET_CPU_CPP_BUILTINS): Rework using gcn_devices. * config/gcn/gcn.opt: Move enum data to generated file gcn-tables.opt. Use new names for the default values. 
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX900): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX906): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX908): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX90a): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX90c): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX1030): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX1036): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX1100): Delete. (EF_AMDGPU_MACH_AMDGCN_GFX1103): Delete. (enum elf_arch_code): Define using gcn-devices.def. (get_arch): Rework using gcn-devices.def. (main): Rework using gcn-devices.def * config/gcn/t-gcn-hsa (gcn-tables.opt): Generate file. (gcn-device-macros.h): Generate file. * config/gcn/t-omp-device: Generate isa list from gcn-devices.def. * config/gcn/gcn-devices.def: New file. * config/gcn/gcn-tables.opt: New file. * config/gcn/gcn-tables.opt.urls: New file. * config/gcn/gen-gcn-device-macros.awk: New file. * config/gcn/gen-opt-tables.awk: New file. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH): Generate from gcn-devices.def. (gcn_gfx803_s): Delete. (gcn_gfx900_s): Delete. (gcn_gfx906_s): Delete. (gcn_gfx908_s): Delete. (gcn_gfx90a_s): Delete. (gcn_gfx90c_s): Delete. (gcn_gfx1030_s): Delete. (gcn_gfx1036_s): Delete. (gcn_gfx1100_s): Delete. (gcn_gfx1103_s): Delete. (gcn_isa_name_len): Delete. (isa_hsa_name): Rename ... (isa_name): ... to this, and rework using gcn-devices.def. (isa_gcc_name): Delete. (isa_code): Rework using gcn-devices.def. (max_isa_vgprs): Rework using gcn-devices.def. (isa_matches_agent): Update isa_name usage. (GOMP_OFFLOAD_init_device): Improve diagnostic using the name.	2024-10-22 11:07:05 +00:00
Tobias Burnus	b752eed3e3	OpenMP: Add support for 'self_maps' to the 'require' directive 'self_maps' implies 'unified_shared_memory', except that the latter also permits that explicit maps copy data to device memory while self_maps does not. In GCC, currently, both are handled identically. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_requires): Handle self_maps clause. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_requires): Handle self_maps clause. gcc/fortran/ChangeLog: * gfortran.h (enum gfc_omp_requires_kind): Add OMP_REQ_SELF_MAPS. (gfc_namespace): Enlarge omp_requires bitfield. * module.cc (enum ab_attribute, attr_bits): Add AB_OMP_REQ_SELF_MAPS. (mio_symbol_attribute): Handle it. * openmp.cc (gfc_check_omp_requires, gfc_match_omp_requires): Handle self_maps clause. * parse.cc (gfc_parse_file): Handle self_maps clause. gcc/ChangeLog: * lto-cgraph.cc (output_offload_tables, omp_requires_to_name): Handle self_maps clause. * omp-general.cc (struct omp_ts_info, omp_context_selector_matches): Likewise for the associated trait. * omp-general.h (enum omp_requires): Add OMP_REQUIRES_SELF_MAPS. * omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_IMPLEMENTATION_SELF_MAPS. include/ChangeLog: * gomp-constants.h (GOMP_REQUIRES_SELF_MAPS): #define. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept self_maps clause. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise. * libgomp.texi (TR13 Impl. Status): Set to 'Y'. * target.c (gomp_requires_to_name, GOMP_offload_register_ver, gomp_target_init): Handle self_maps clause. * testsuite/libgomp.fortran/self_maps.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/declare-variant-1.c: Add self_maps test. * c-c++-common/gomp/requires-4.c: Likewise. * gfortran.dg/gomp/declare-variant-3.f90: Likewise. * c-c++-common/gomp/requires-2.c: Update dg-error msg. * gfortran.dg/gomp/requires-2.f90: Likewise. * gfortran.dg/gomp/requires-self-maps-aux.f90: New. 
* gfortran.dg/gomp/requires-self-maps.f90: New.	2024-09-24 10:53:59 +02:00
Tobias Burnus	cdb9aa0f62	OpenMP: Fix omp_get_device_from_uid, minor cleanup In Fortran, omp_get_device_from_uid can also accept substrings, which are then not NUL terminated. Fixed by introducing a fortran.c wrapper function. Additionally, in case of a fail the plugin functions now return NULL instead of failing fatally such that a fall-back UID is generated. gcc/ChangeLog: * omp-general.cc (omp_runtime_api_procname): Strip "omp_" from string; move get_device_from_uid as now a '_' suffix exists. libgomp/ChangeLog: * fortran.c (omp_get_device_from_uid_): New function. * libgomp.map (GOMP_6.0): Add it. * oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'. * omp_lib.f90.in: Make it used by removing bind(C). * omp_lib.h.in: Likewise. * target.c (omp_get_device_from_uid): Ensure the device is initialized. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment; return NULL in case of an error. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise. * testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.	2024-09-23 15:58:39 +02:00
Tobias Burnus	bf4a5efa80	OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines Those TR13/OpenMP 6.0 routines permit a reproducible offloading to a specific device by mapping an OpenMP device number to a unique ID (UID). The GPU device UIDs should be universally unique; the one for the host is not. gcc/ChangeLog: * omp-general.cc (omp_runtime_api_procname): Add get_device_from_uid and omp_get_uid_from_device routines. include/ChangeLog: * cuda/cuda.h (cuDeviceGetUuid): Declare. (cuDeviceGetUuid_v2): Add prototype. libgomp/ChangeLog: * config/gcn/target.c (omp_get_uid_from_device, omp_get_device_from_uid): Add stub implementation. * config/nvptx/target.c (omp_get_uid_from_device, omp_get_device_from_uid): Likewise. * fortran.c (omp_get_uid_from_device_, omp_get_uid_from_device_8_): New functions. * libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype. * libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'. * libgomp.map (GOMP_6.0): New, including the new UID routines. * libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'. (Device Information Routines): Document new UID routines. (Offload-Target Specifics): Document UID format. * omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device): New prototype. * omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device): New interface. * omp_lib.h.in: Likewise. * plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New. * target.c (str_omp_initial_device): New static var. (STR_OMP_DEV_PREFIX): Define. (gomp_get_uid_for_device, omp_get_uid_from_device, omp_get_device_from_uid): New. (gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'. (gomp_target_init): Set the device's 'uid' field to NULL. * testsuite/libgomp.c/device_uid.c: New test. * testsuite/libgomp.fortran/device_uid.f90: New test.	2024-09-20 09:25:33 +02:00
Thomas Schwinge	0d25989d60	nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable [PR97384, PR105274] ... as a means to manually set the "native" GPU thread stack size. PR libgomp/97384 PR libgomp/105274 libgomp/ * plugin/cuda-lib.def (cuCtxSetLimit): Add. * plugin/plugin-nvptx.c (nvptx_open_device): Handle 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable.	2024-06-06 13:41:47 +02:00
Thomas Schwinge	5bbe5350a0	nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' This extends commit `d9c90c82d9` "nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'" for offloading. libgcc/ * config/nvptx/gbl-ctors.c ["mgomp"] (__do_global_ctors__entry__mgomp) (__do_global_dtors__entry__mgomp): New. [!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry): New. libgomp/ * plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New. (nvptx_close_device, GOMP_OFFLOAD_load_image) (GOMP_OFFLOAD_unload_image): Call it.	2024-06-06 13:41:47 +02:00
Tobias Burnus	18f477980c	libgomp: Enable USM for AMD APUs and MI200 devices If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true, all GPUs on the system support unified shared memory. That's the case for APUs and MI200 devices when XNACK is enabled. XNACK can be enabled by setting HSA_XNACK=1 as env var for supported devices; otherwise, if disabled, USM code will use host fallback. gcc/ChangeLog: * config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo. include/ChangeLog: * hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add enum value. libgomp/ChangeLog: * libgomp.texi (gcn): Update USM handling * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true.	2024-05-29 15:29:06 +02:00
Tobias Burnus	4ccb3366ad	libgomp: Enable USM for some nvptx devices A few high-end nvptx devices support the attribute CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared memory is supported in hardware. This patch enables support for those - if all installed nvptx devices have this feature (as the capabilities are per device type). This exposes a bug in gomp_copy_back_icvs as it previously used omp_get_mapped_ptr to find mapped variables, but that returns the unchanged pointer in case of shared memory. But in this case, we have a few actually mapped pointers - like the ICV variables. Additionally, there was a mismatch with regards to '-1' for the device number as gomp_copy_back_icvs and omp_get_mapped_ptr count differently. Hence, do the lookup manually. include/ChangeLog: * cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add. libgomp/ChangeLog: * libgomp.texi (nvptx): Update USM description. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim support when requesting USM and all devices support CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS. * target.c (gomp_copy_back_icvs): Fix device ptr lookup. (gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM if the devices support USM.	2024-05-29 15:14:38 +02:00
Frederik Harwath	b8e9fd535d	amdgcn: Add gfx90c target Add support for gfx90c GCN5 APU integrated graphics devices. The LLVM AMDGPU documentation does not list those devices as supported by rocm-amdhsa, but it passes most libgomp offloading tests. Although they are constrained compared to dGPUs, they might be interesting for learning, experimentation, and testing. gcc/ChangeLog: * config.gcc: Add gfx90c. * config/gcn/gcn-hsa.h (NO_SRAM_ECC): Likewise. * config/gcn/gcn-opts.h (enum processor_type): Likewise. (TARGET_GFX90c): New macro. * config/gcn/gcn.cc (gcn_option_override): Handle gfx90c. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. * config/gcn/gcn.h: Add gfx90c. * config/gcn/gcn.opt: Likewise. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90c): New macro. (get_arch): Handle gfx90c. (main): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c * config/gcn/t-omp-device: Add gfx90c. * doc/install.texi: Likewise. * doc/invoke.texi: Likewise. libgomp/ChangeLog: * plugin/plugin-gcn.c (isa_hsa_name): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c. (isa_code): Handle gfx90c. (max_isa_vgprs): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c. Signed-off-by: Frederik Harwath <frederik@harwath.name>	2024-04-26 11:23:43 +02:00
Thomas Schwinge	a02d7f0edc	GCN, nvptx: Errors during device probing are fatal Currently, we silently disable libgomp GCN and nvptx plugins/devices in presence of certain error conditions during device probing, thus typically silently resorting to host-fallback execution. Make such errors fatal, similar as for any other device access later on, so that we early and reliably notice when things go wrong. (Keep just two cases non-fatal: (a) libgomp GCN or nvptx plugins are available but 'libhsa-runtime64.so.1' or 'libcuda.so.1' are not, and (b) those are available, but the corresponding devices are not.) This resolves the issue that we've got execution test cases unexpectedly PASSing, despite: libgomp: GCN fatal error: Run-time could not be initialized Runtime message: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. ..., and therefore they were not offloaded to the GCN device, but ran in host-fallback execution mode. What happened in that scenario is that in 'init_hsa_context' during the initial 'GOMP_OFFLOAD_get_num_devices' we ran into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', but it wasn't fatal, but just silently disabled the libgomp plugin/device. Especially "entertaining" were cases where such unintended host-fallback execution happened during effective-target checks like 'offload_device_available' (host-fallback execution there meaning: no offload device available), but actual test cases then were running with an offload device available, and therefore mis-configured. include/ * cuda/cuda.h (CUresult): Add 'CUDA_ERROR_NO_DEVICE'. libgomp/ * plugin/plugin-gcn.c (init_hsa_context): Add and handle 'bool probe' parameter. Adjust all users; errors during device probing are fatal. * plugin/plugin-nvptx.c (nvptx_get_num_devices): Aside from 'CUDA_ERROR_NO_DEVICE', errors during device probing are fatal.	2024-04-08 22:08:00 +02:00
Richard Biener	78b56a12dd	amdgcn: Add gfx1036 target Add support for the gfx1036 RDNA2 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. gcc/ChangeLog: * config.gcc (amdgcn): Add gfx1036 entries. * config/gcn/gcn-hsa.h (NO_XNACK): Likewise. (gcn_local_sym_hash): Likewise. * config/gcn/gcn-opts.h (enum processor_type): Likewise. (TARGET_GFX1036): New macro. * config/gcn/gcn.cc (gcn_option_override): Handle gfx1036. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1036__. (TARGET_CPU_CPP_BUILTINS): Rename __gfx1030 to __gfx1030__. * config/gcn/gcn.opt: Add gfx1036. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1036): New. (main): Handle gfx1036. * config/gcn/t-omp-device: Add gfx1036 isa. * doc/install.texi (amdgcn): Add gfx1036. * doc/invoke.texi (-march): Likewise. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1036. (gcn_gfx1103_s): New. (isa_hsa_name): Handle gfx1036. (isa_code): Likewise. (max_isa_vgprs): Likewise.	2024-03-25 15:54:37 +01:00
Andrew Stubbs	1bf18629c5	amdgcn: Add gfx1103 target Add support for the gfx1103 RDNA3 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. gcc/ChangeLog: * config.gcc (amdgcn): Add gfx1103 entries. * config/gcn/gcn-hsa.h (NO_XNACK): Likewise. (gcn_local_sym_hash): Likewise. * config/gcn/gcn-opts.h (enum processor_type): Likewise. (TARGET_GFX1103): New macro. * config/gcn/gcn.cc (gcn_option_override): Handle gfx1103. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. (gcn_hsa_declare_function_name): Use TARGET_RDNA3, not just gfx1100. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1103__. * config/gcn/gcn.opt: Add gfx1103. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1103): New. (main): Handle gfx1103. * config/gcn/t-omp-device: Add gfx1103 isa. * doc/install.texi (amdgcn): Add gfx1103. * doc/invoke.texi (-march): Likewise. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1103. (gcn_gfx1103_s): New. (isa_hsa_name): Handle gfx1103. (isa_code): Likewise. (max_isa_vgprs): Likewise.	2024-03-22 14:45:15 +00:00
Thomas Schwinge	84fc8f4f32	GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system) 'GCN_SUPPRESS_HOST_FALLBACK' originated as 'HSA_SUPPRESS_HOST_FALLBACK' in the libgomp HSA plugin, where the idea was -- in my understanding -- that you wouldn't have device code available for all functions that may be called, and in that case transparently (shared memory system!) do host-fallback execution. Or, with 'HSA_SUPPRESS_HOST_FALLBACK' set, you'd get those diagnosed. This has then been copied into the libgomp GCN plugin as 'GCN_SUPPRESS_HOST_FALLBACK'. However, the original meaning isn't applicable for the libgomp GCN plugin anymore: we assume that we're generating device code for all relevant functions, and we're implementing a non-shared memory system, where we cannot transparently do host-fallback execution for individual functions. However, 'GCN_SUPPRESS_HOST_FALLBACK' has gained an additional meaning, to enforce a fatal error in case that 'libhsa-runtime64.so.1' can't be dynamically loaded; keep that meaning. libgomp/ * plugin/plugin-gcn.c (GOMP_OFFLOAD_can_run): Don't consider 'GCN_SUPPRESS_HOST_FALLBACK' anymore (assume always-'true'). (init_hsa_context): Adjust 'GCN_SUPPRESS_HOST_FALLBACK' error message.	2024-03-08 16:35:28 +01:00
Thomas Schwinge	37078f241a	nvptx: 'cuDeviceGetCount' failure is fatal Per commit `683f118439` "OpenMP: Move omp requires checks to libgomp", we're now using 'return -1' from 'GOMP_OFFLOAD_get_num_devices' for 'omp_requires_mask' purposes. This missed that via 'nvptx_get_num_devices', we could also 'return -1' for 'cuDeviceGetCount' failure. Before, this meant (in 'gomp_target_init') to silently ignore the plugin/device -- which also has been doubtful behavior. Let's instead turn 'cuDeviceGetCount' failure into a fatal error, similar to other errors during device initialization. libgomp/ * plugin/plugin-nvptx.c (nvptx_get_num_devices): 'cuDeviceGetCount' failure is fatal.	2024-03-08 16:35:28 +01:00
Thomas Schwinge	ab70addf56	GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding libgomp plugin/device gets disabled, as before. But if they are available, report any inconsistencies such as missing symbols, similar to how we fail in presence of other issues during device initialization. libgomp/ * plugin/plugin-gcn.c (init_hsa_runtime_functions): Fatal error for missing symbols. * plugin/plugin-nvptx.c (init_cuda_lib): Likewise.	2024-03-08 16:35:28 +01:00
Richard Biener	c34ab549d8	Avoid registering unsupported OMP offload devices The following avoids registering unsupported GCN offload devices when iterating over available ones. With a Zen4 desktop CPU you will have an IGPU (unsupported) which will otherwise be made available. This causes testcases like libgomp.c-c++-common/non-rect-loop-1.c which iterate over all devices to FAIL. libgomp/ * plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out agents with unsupported ISA.	2024-01-26 15:36:35 +01:00
Richard Biener	209ed06c3a	Fix architecture support in OMP_OFFLOAD_init_device for gcn The following makes the existing architecture support check work instead of being optimized away (enum vs. -1). This avoids later asserts when we assume such devices are never actually used. libgomp/ * plugin/plugin-gcn.c (EF_AMDGPU_MACH::EF_AMDGPU_MACH_UNSUPPORTED): Add. (isa_code): Return that instead of -1. (GOMP_OFFLOAD_init_device): Adjust.	2024-01-26 15:36:35 +01:00
Andrew Stubbs	99890e1552	amdgcn: additional gfx1030/gfx1100 support This is enough to get gfx1030 and gfx1100 working; there are still some test failures to investigate, and probably some tuning to do. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. * config/gcn/gcn-valu.md (all_convert): New iterator. (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New define_expand, and rename the old one to ... (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this. (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ... (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this. (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New. * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. * config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3. (<u>mulqihi3_scalar): Likewise. libgcc/ChangeLog: * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. libgomp/ChangeLog: * config/gcn/time.c (RTC_TICKS): Configure RDNA3. (omp_get_wtime): Add RDNA3-compatible variant. * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. Signed-off-by: Andrew Stubbs <ams@baylibre.com>	2024-01-26 11:38:47 +00:00
Thomas Schwinge	f9290cdf46	GCN: Add pre-initial support for gfx1100: 'EF_AMDGPU_MACH_AMDGCN_GFX1100' ../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_hsa_name’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 1666 \| case EF_AMDGPU_MACH_AMDGCN_GFX1100: \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \| EF_AMDGPU_MACH_AMDGCN_GFX1030 ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: note: each undeclared identifier is reported only once for each function it appears in ../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_code’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1711:12: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 1711 \| return EF_AMDGPU_MACH_AMDGCN_GFX1100; \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \| EF_AMDGPU_MACH_AMDGCN_GFX1030 ../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘max_isa_vgprs’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1728:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 1728 \| case EF_AMDGPU_MACH_AMDGCN_GFX1100: \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \| EF_AMDGPU_MACH_AMDGCN_GFX1030 make[4]: *** [Makefile:813: libgomp_plugin_gcn_la-plugin-gcn.lo] Error 1 Fix-up for commit `52a2c659ae` "GCN: Add pre-initial support for gfx1100". libgomp/ * plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add 'EF_AMDGPU_MACH_AMDGCN_GFX1100'.	2024-01-08 20:46:37 +01:00
Julian Brown	a17299c17a	OpenMP: Support accelerated 2D/3D memory copies for AMD GCN This patch adds support for 2D/3D memory copies for omp_target_memcpy_rect using AMD extensions to the HSA API. This is just the AMD GCN-specific part of the following patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631001.html 2024-01-04 Julian Brown <julian@codesourcery.com> libgomp/ * plugin/plugin-gcn.c (hsa_runtime_fn_info): Add hsa_amd_memory_lock_fn, hsa_amd_memory_unlock_fn, hsa_amd_memory_async_copy_rect_fn function pointers. (init_hsa_runtime_functions): Add above functions, with DLSYM_OPT_FN. (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New functions.	2024-01-08 17:56:17 +00:00
Tobias Burnus	52a2c659ae	GCN: Add pre-initial support for gfx1100 ROCm since 5.7.1 supports gfx1100 (RDNA3) cards. This commit adds support for it, mostly by assuming gfx1100 behaves identical to gfx1030. Like gfx1030, gfx1100 support is neither documented nor the build of the multilib enabled by default. But contrary to gfx1030, gfx1100 has a known issue causing some libraries not to build, including newlib: The sdwa variant of v_mov_b32_sdwa is not supported by the hardware but GCC currently generates this instruction. This will be addressed in a later commit. gcc/ChangeLog: * config.gcc (amdgcn--amdhsa): Accept --with-arch=gfx1100. config/gcn/gcn-hsa.h (NO_XNACK): Add gfx1100: (ASM_SPEC): Handle gfx1100. * config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1100. (enum gcn_isa): Add ISA_RDNA3. (TARGET_GFX1100, TARGET_RDNA2_PLUS, TARGET_RDNA3): Define. * config/gcn/gcn-valu.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS. * config/gcn/gcn.cc (gcn_option_override, gcn_omp_device_kind_arch_isa, output_file_start): Handle gfx1100. (gcn_global_address_p, gcn_addr_space_legitimate_address_p): Change TARGET_RDNA2 to TARGET_RDNA2_PLUS. (gcn_hsa_declare_function_name): Don't use '.amdhsa_reserve_flat_scratch' with gfx1100. * config/gcn/gcn.h (ASSEMBLER_DIALECT): Likewise. (TARGET_CPU_CPP_BUILTINS): Define __RDNA3__, __gfx1030__ and __gfx1100__. * config/gcn/gcn.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS. * config/gcn/gcn.opt (Enum gpu_type): Add gfx1100. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1100): Define. (isa_has_combined_avgprs, main): Handle gfx1100. * config/gcn/t-omp-device (isa): Add gfx1100. libgomp/ChangeLog: * plugin/plugin-gcn.c (gcn_gfx1100_s): New const string. (gcn_isa_name_len): Fix length. (isa_hsa_name, isa_code, max_isa_vgprs): Handle gfx1100.	2024-01-08 15:12:44 +01:00
Jakub Jelinek	a945c346f5	Update copyright years.	2024-01-03 12:19:35 +01:00
Julian Brown	d7e9ae4fa9	OpenMP, NVPTX: memcpy[23]D bias correction This patch works around behaviour of the 2D and 3D memcpy operations in the CUDA driver runtime. Particularly in Fortran, the "base pointer" of an array (used for either source or destination of a host/device copy) may lie outside of data that is actually stored on the device. The fix is to make sure that we use the first element of data to be transferred instead, and adjust parameters accordingly. 2023-10-02 Julian Brown <julian@codesourcery.com> libgomp/ * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to avoid out-of-bounds array checks in CUDA runtime. (GOMP_OFFLOAD_memcpy3d): Likewise. * testsuite/libgomp.c-c++-common/memcpyxd-bias-1.c: New test.	2023-12-20 21:35:36 +00:00
Andrew Stubbs	e7d6c277fa	amdgcn, libgomp: low-latency allocator This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no-longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing (which supports both memories). A future patch will re-enable "global" instructions for cases where it is known to be safe to do so. gcc/ChangeLog: * config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in. * config/gcn/gcn.cc (gcn_init_machine_status): Disable global addressing. (gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR. libgomp/ChangeLog: * config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here. (TEAM_ARENA_FREE): Likewise. (TEAM_ARENA_END): Likewise. (GCN_LOWLAT_HEAP): New. * config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h. (__gcn_lowlat_init): New prototype. (gomp_gcn_enter_kernel): Initialize the low-latency heap. * libgomp.h (TEAM_ARENA_START): Move to libgomp.h. (TEAM_ARENA_FREE): Likewise. (TEAM_ARENA_END): Likewise. * plugin/plugin-gcn.c (lowlat_size): New variable. (print_kernel_dispatch): Label the group_segment_size purpose. (init_environment_variables): Read GOMP_GCN_LOWLAT_POOL. (create_kernel_dispatch): Pass low-latency head allocation to kernel. (run_kernel): Use shadow; don't assume values. * testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn. * config/gcn/allocator.c: New file. * libgomp.texi: Document low-latency implementation details.	2023-12-06 16:48:57 +00:00
Andrew Stubbs	30486fab71	libgomp, nvptx: low-latency memory allocator This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using the GOMP_NVPTX_LOWLAT_POOL environment variable. The use of the PTX dynamic_smem_size feature means that low-latency allocator will not work with the PTX 3.1 multilib. For now, the omp_low_lat_mem_alloc allocator also works, but that will change when I implement the access traits. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): New macro. (MEMSPACE_CALLOC): New macro. (MEMSPACE_REALLOC): New macro. (MEMSPACE_FREE): New macro. (predefined_alloc_mapping): New array. Add _Static_assert to match. (ARRAY_SIZE): New macro. (omp_aligned_alloc): Use MEMSPACE_ALLOC. Implement fall-backs for predefined allocators. Simplify existing fall-backs. (omp_free): Use MEMSPACE_FREE. (omp_calloc): Use MEMSPACE_CALLOC. Implement fall-backs for predefined allocators. Simplify existing fall-backs. (omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE. Implement fall-backs for predefined allocators. Simplify existing fall-backs. * config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable. (__nvptx_lowlat_init): New prototype. (gomp_nvptx_main): Call __nvptx_lowlat_init. * libgomp.texi: Update memory space table. * plugin/plugin-nvptx.c (lowlat_pool_size): New variable. (GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar. (GOMP_OFFLOAD_run): Apply lowlat_pool_size. * basic-allocator.c: New file. * config/nvptx/allocator.c: New file. * testsuite/libgomp.c/omp_alloc-1.c: New test. * testsuite/libgomp.c/omp_alloc-2.c: New test. * testsuite/libgomp.c/omp_alloc-3.c: New test. * testsuite/libgomp.c/omp_alloc-4.c: New test. * testsuite/libgomp.c/omp_alloc-5.c: New test. 
* testsuite/libgomp.c/omp_alloc-6.c: New test. Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>	2023-12-06 16:48:57 +00:00
Andrew Stubbs	ae0d2c2402	amdgcn: Add Accelerator VGPR registers Add the new CDNA register file. We don't support any of the specialized instructions that use these registers, but they're useful to relieve register pressure without spilling to stack. Co-authored-by: Andrew Jenner <andrew@codesourcery.com> gcc/ChangeLog: * config/gcn/constraints.md: Add "a" AVGPR constraint. * config/gcn/gcn-valu.md (mov<mode>): Add AVGPR alternatives. (mov<mode>_4reg): Likewise. (@mov<mode>_sgprbase): Likewise. (gather<mode>_insn_1offset<exec>): Likewise. (gather<mode>_insn_1offset_ds<exec>): Likewise. (gather<mode>_insn_2offsets<exec>): Likewise. (scatter<mode>_expr<exec_scatter>): Likewise. (scatter<mode>_insn_1offset_ds<exec_scatter>): Likewise. (scatter<mode>_insn_2offsets<exec_scatter>): Likewise. * config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define. (gcn_class_max_nregs): Handle AVGPR_REGS and ALL_VGPR_REGS. (gcn_hard_regno_mode_ok): Likewise. (gcn_regno_reg_class): Likewise. (gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA1_PLUS. (gcn_sgpr_move_p): Handle AVGPRs. (gcn_secondary_reload): Reload AVGPRs via VGPRs. (gcn_conditional_register_usage): Handle AVGPRs. (gcn_vgpr_equivalent_register_operand): New function. (gcn_valid_move_p): Check for validity of AVGPR moves. (gcn_compute_frame_offsets): Handle AVGPRs. (gcn_memory_move_cost): Likewise. (gcn_register_move_cost): Likewise. (gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI. (gcn_md_reorg): Handle AVGPRs. (gcn_hsa_declare_function_name): Likewise. (print_reg): Likewise. (gcn_dwarf_register_number): Likewise. * config/gcn/gcn.h (FIRST_AVGPR_REG): Define. (AVGPR_REGNO): Define. (LAST_AVGPR_REG): Define. (SOFT_ARG_REG): Update. (FRAME_POINTER_REGNUM): Update. (DWARF_LINK_REGISTER): Update. (FIRST_PSEUDO_REGISTER): Update. (AVGPR_REGNO_P): Define. (enum reg_class): Add AVGPR_REGS and ALL_VGPR_REGS. (REG_CLASS_CONTENTS): Add new register classes and add entries for AVGPRs to all classes. (REGISTER_NAMES): Add AVGPRs. 
* config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define. (AP_REGNUM, FP_REGNUM): Update. (define_attr "type"): Add vop3p_mai. (define_attr "unit"): Handle vop3p_mai. (define_attr "gcn_version"): Add "cdna2". (define_attr "enabled"): Handle cdna2. (mov<mode>_insn): Add AVGPR alternatives. (movti_insn): Likewise. * config/gcn/mkoffload.cc (isa_has_combined_avgprs): New. (process_asm): Process avgpr_count. * config/gcn/predicates.md (gcn_avgpr_register_operand): New. (gcn_avgpr_hard_register_operand): New. * doc/md.texi: Document the "a" constraint. gcc/testsuite/ChangeLog: * gcc.target/gcn/avgpr-mem-double.c: New test. * gcc.target/gcn/avgpr-mem-int.c: New test. * gcc.target/gcn/avgpr-mem-long.c: New test. * gcc.target/gcn/avgpr-mem-short.c: New test. * gcc.target/gcn/avgpr-spill-double.c: New test. * gcc.target/gcn/avgpr-spill-int.c: New test. * gcc.target/gcn/avgpr-spill-long.c: New test. * gcc.target/gcn/avgpr-spill-short.c: New test. libgomp/ChangeLog: * plugin/plugin-gcn.c (max_isa_vgprs): New. (run_kernel): CDNA2 devices have more VGPRs.	2023-11-15 14:02:00 +00:00
Kwok Cheung Yeung	a49c7d3193	openmp: Add support for the 'indirect' clause in C/C++ This adds support for the 'indirect' clause in the 'declare target' directive. Functions declared as indirect may be called via function pointers passed from the host in offloaded code. Virtual calls to member functions via the object pointer in C++ are currently not supported in target regions. 2023-11-07 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/c-family/ * c-attribs.cc (c_common_attribute_table): Add attribute for indirect functions. * c-pragma.h (enum parma_omp_clause): Add entry for indirect clause. gcc/c/ * c-decl.cc (c_decl_attributes): Add attribute for indirect functions. * c-lang.h (c_omp_declare_target_attr): Add indirect field. * c-parser.cc (c_parser_omp_clause_name): Handle indirect clause. (c_parser_omp_clause_indirect): New. (c_parser_omp_all_clauses): Handle indirect clause. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_begin): Handle indirect clause. * c-typeck.cc (c_finish_omp_clauses): Handle indirect clause. gcc/cp/ * cp-tree.h (cp_omp_declare_target_attr): Add indirect field. * decl2.cc (cplus_decl_attributes): Add attribute for indirect functions. * parser.cc (cp_parser_omp_clause_name): Handle indirect clause. (cp_parser_omp_clause_indirect): New. (cp_parser_omp_all_clauses): Handle indirect clause. (handle_omp_declare_target_clause): Add extra parameter. Add indirect attribute for indirect functions. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. 
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_begin): Handle indirect clause. * semantics.cc (finish_omp_clauses): Handle indirect clause. gcc/ * lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect functions. (output_offload_tables): Write indirect functions. (input_offload_tables): read indirect functions. * lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. * omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New. * omp-offload.cc (offload_ind_funcs): New. (omp_discover_implicit_declare_target): Add functions marked with 'omp declare target indirect' to indirect functions list. (omp_finish_file): Add indirect functions to section for offload indirect functions. (execute_omp_device_lower): Redirect indirect calls on target by passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR. (pass_omp_device_lower::gate): Run pass_omp_device_lower if indirect functions are present on an accelerator device. * omp-offload.h (offload_ind_funcs): New. * tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT. * tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT. (omp_clause_code_name): Likewise. * tree.h (OMP_CLAUSE_INDIRECT_EXPR): New. * config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs section. Count number of indirect functions. (process_obj): Emit number of indirect functions. * config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New. (process): Emit offload_ind_func_table in PTX code. Emit indirect function names and count in image. * config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark indirect functions in PTX code with IND_FUNC_MAP. gcc/testsuite/ * c-c++-common/gomp/declare-target-7.c: Update expected error message. * c-c++-common/gomp/declare-target-indirect-1.c: New. * c-c++-common/gomp/declare-target-indirect-2.c: New. * g++.dg/gomp/attrs-21.C (v12): Update expected error message. * g++.dg/gomp/declare-target-indirect-1.C: New. 
* gcc.dg/gomp/attrs-21.c (v12): Update expected error message. include/ * gomp-constants.h (GOMP_VERSION): Increment to 3. (GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New. libgcc/ * offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. (__offload_ind_func_table): New. (__offload_ind_funcs_end): New. (__OFFLOAD_TABLE__): Add entries for indirect functions. libgomp/ * Makefile.am (libgomp_la_SOURCES): Add target-indirect.c. * Makefile.in: Regenerate. * libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define. (GOMP_OFFLOAD_load_image): Add extra argument. * libgomp.h (struct indirect_splay_tree_key_s): New. (indirect_splay_tree_node, indirect_splay_tree, indirect_splay_tree_key): New. (indirect_splay_compare): New. * libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr. * libgomp.texi (OpenMP 5.1): Update documentation on indirect calls in target region and on indirect clause. (Other new OpenMP 5.2 features): Add entry for virtual function calls. * libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype. * oacc-host.c (host_load_image): Add extra argument. * target.c (gomp_load_image_to_device): If the GOMP_VERSION is high enough, read host indirect functions table and pass to load_image_func. * config/accel/target-indirect.c: New. * config/linux/target-indirect.c: New. * config/gcn/team.c (build_indirect_map): Add prototype. (gomp_gcn_enter_kernel): Initialize support for indirect function calls on GCN target. * config/nvptx/team.c (build_indirect_map): Add prototype. (gomp_nvptx_main): Initialize support for indirect function calls on NVPTX target. * plugin/plugin-gcn.c (struct gcn_image_desc): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, build address translation table and copy it to target memory. * plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. 
If the GOMP_VERSION is high enough, Build address translation table and copy it to target memory. * testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New. * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New. * testsuite/libgomp.c++/declare-target-indirect-1.C: New.	2023-11-07 15:44:50 +00:00
Andrew Stubbs	c7ec7bd1c6	amdgcn: add -march=gfx1030 EXPERIMENTAL Accept the architecture configure option and resolve build failures. This is enough to build binaries, but I've not got a device to test it on, so there are probably runtime issues to fix. The cache control instructions might be unsafe (or too conservative), and the kernel metadata might be off. Vector reductions will need to be reworked for RDNA2. In principle, it would be better to use wavefrontsize32 for this architecture, but that would mean switching everything to allow SImode masks, so wavefrontsize64 it is. The multilib is not included in the default configuration so either configure --with-arch=gfx1030 or include it in --with-multilib-list=gfx1030,.... The majority of this patch has no effect on other devices, but changing from using scalar writes for the exit value to vector writes means we don't need the scalar cache write-back instruction anywhere (which doesn't exist in RDNA2). gcc/ChangeLog: * config.gcc: Allow --with-arch=gfx1030. * config/gcn/gcn-hsa.h (NO_XNACK): gfx1030 does not support xnack. (ASM_SPEC): gfx1030 needs -mattr=+wavefrontsize64 set. * config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1030. (TARGET_GFX1030): New. (TARGET_RDNA2): New. * config/gcn/gcn-valu.md (@dpp_move<mode>): Disable for RDNA2. (addc<mode>3<exec_vcc>): Add RDNA2 syntax variant. (subc<mode>3<exec_vcc>): Likewise. (<convop><mode><vndi>2_exec): Add RDNA2 alternatives. (vec_cmp<mode>di): Likewise. (vec_cmp<u><mode>di): Likewise. (vec_cmp<mode>di_exec): Likewise. (vec_cmp<u><mode>di_exec): Likewise. (vec_cmp<mode>di_dup): Likewise. (vec_cmp<mode>di_dup_exec): Likewise. (reduc_<reduc_op>_scal_<mode>): Disable for RDNA2. (<reduc_op>_dpp_shr_<mode>): Likewise. (plus_carry_dpp_shr_<mode>): Likewise. (plus_carry_in_dpp_shr_<mode>): Likewise. config/gcn/gcn.cc (gcn_option_override): Recognise gfx1030. (gcn_global_address_p): RDNA2 only allows smaller offsets. 
(gcn_addr_space_legitimate_address_p): Likewise. (gcn_omp_device_kind_arch_isa): Recognise gfx1030. (gcn_expand_epilogue): Use VGPRs instead of SGPRs. (output_file_start): Configure gfx1030. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __RDNA2__; (ASSEMBLER_DIALECT): New. * config/gcn/gcn.md (rdna): New define_attr. (enabled): Use "rdna" attribute. (gcn_return): Remove s_dcache_wb. (addcsi3_scalar): Add RDNA2 syntax variant. (addcsi3_scalar_zero): Likewise. (addptrdi3): Likewise. (mulsi3): v_mul_lo_i32 should be v_mul_lo_u32 on all ISA. (memory_barrier): Add RDNA2 syntax variant. (atomic_load<mode>): Add RDNA2 cache control variants, and disable scalar atomics for RDNA2. (atomic_store<mode>): Likewise. (atomic_exchange<mode>): Likewise. config/gcn/gcn.opt (gpu_type): Add gfx1030. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1030): New. (main): Recognise -march=gfx1030. * config/gcn/t-omp-device: Add gfx1030 isa. libgcc/ChangeLog: * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Set false for __RDNA2__. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH_AMDGCN_GFX1030): New. (isa_hsa_name): Recognise gfx1030. (isa_code): Likewise. * team.c (defined): Remove s_endpgm.	2023-10-20 12:40:25 +01:00
Tobias Burnus	8b9e559fe7	libgomp: cuda.h and omp_target_memcpy_rect cleanup Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894 "OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect", namely: In that commit, the code was changed to handle shared-memory devices; however, as pointed out, omp_target_memcpy_check already set the pointer to NULL in that case. Hence, this commit reverts to the prior version. In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry with cuMemcpy3DPeer (all currently unused) and, in three structs, fixes reserved-member names and removes a bogus 'const'. And it changes a DLSYM to DLSYM_OPT as not all plugins support the new functions, yet. include/ChangeLog: * cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): Remove bogus 'const' from 'const void dst' and fix reserved-name name in those structs. (cuMemcpyPeer, cuMemcpyPeerAsync): Add. libgomp/ChangeLog: * target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for GOMP_OFFLOAD_CAP_SHARED_MEM. (omp_target_memcpy_rect_copy): Likewise for lock condition. (gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for memcpy3d/memcpy2d. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.	2023-07-29 13:25:03 +02:00
Tobias Burnus	25072a477a	OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect When copying a 2D or 3D rectangular memory block, the performance is better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the data one by one. That's what this commit does. Additionally, it permits device-to-device copies, if necessary using a temporary variable on the host. include/ChangeLog: * cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE. (CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): New typedefs. (cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer, cuMemcpy3DPeerAsync): New prototypes. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New prototypes. * libgomp.h (struct gomp_device_descr): Add memcpy2d_func and memcpy3d_func. * libgomp.texi (nvptx): Document when cuMemcpy2D/cuMemcpy3D is used. * oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL. * plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned, cuMemcpy3D): Invoke via CUDA_ONE_CALL. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New. * target.c (omp_target_memcpy_rect_worker): (omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy): Permit all device-to-device copies; invoke new plugins for 2D and 3D copying when available. (gomp_load_plugin_for_device): DLSYM the new plugin functions. * testsuite/libgomp.c/target-12.c: Fix dimension bug. * testsuite/libgomp.fortran/target-12.f90: Likewise. * testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.	2023-07-26 16:22:35 +02:00
Tobias Burnus	f1af7d65ff	libgomp: plugin-gcn - support 'unified_address' Effectively, for GCN (as for nvptx) there is a common address space between host and device, whether or not the memory is accessible. Thus, this commit permits the use of 'omp requires unified_address' with GCN devices. (nvptx accepts this requirement since r13-3460-g131d18e928a3ea.) libgomp/ * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Regard unified_address requirement as supported. * libgomp.texi (OpenMP 5.0, AMD Radeon, nvptx): Remove 'unified_address' from the not-supported requirements.	2023-06-06 18:06:14 +02:00
Thomas Schwinge	130c2f3c3a	libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation ... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it. Follow-up to commit `131d18e928` "libgomp/nvptx: Prepare for reverse-offload callback handling", and commit `ea4b23d9c8` "libgomp: Handle OpenMP's reverse offloads". libgomp/ * target.c (gomp_target_rev): Instead of 'dev_to_host_cpy', 'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'. * libgomp.h (gomp_target_rev): Adjust. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust. * plugin/plugin-gcn.c (process_reverse_offload): Adjust. * plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy) (rev_off_host_to_dev_cpy): Remove. (GOMP_OFFLOAD_run): Adjust.	2023-05-08 15:58:05 +02:00


205 Commits