mirror of git://gcc.gnu.org/git/gcc.git
This is a version of Tobias's mainline patch of the same name, merged to og13 and with the follow-up patch "libgomp: cuda.h and omp_target_memcpy_rect cleanup" folded in. A couple of merge conflicts have also been resolved, mostly regarding "gomp_update". Tobias's original log message follows.

When copying a 2D or 3D rectangular memory block, the performance is better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the data one by one. That's what this commit does.

Additionally, it permits device-to-device copies, if necessary using a temporary variable on the host.

2023-09-19  Tobias Burnus  <tobias@codesourcery.com>
	    Julian Brown  <julian@codesourcery.com>

include/
	* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
	CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
	(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
	CUDA_MEMCPY3D_PEER): New typedefs.
	(cuMemcpyPeer, cuMemcpyPeerAsync, cuMemcpy2D, cuMemcpy2DAsync,
	cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
	cuMemcpy3DPeerAsync): New prototypes.

libgomp/
	* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d):
	New prototypes.
	* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
	and memcpy3d_func.
	* libgomp.texi (nvptx): Document when cuMemcpy2D/cuMemcpy3D is used.
	* oacc-host.c (.memcpy2d_func, .memcpy3d_func): Init with NULL.
	* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
	cuMemcpy3D): Invoke via CUDA_ONE_CALL.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New.
	* target.c (omp_target_memcpy_rect_worker): Update prototype.
	(omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
	Permit all device-to-device copies; invoke new plugins for 2D
	and 3D copying when available.
	(gomp_update): Update calls to omp_target_memcpy_rect_worker.
	Ensure that tmp space is not allocated here.
	(gomp_load_plugin_for_device): DLSYM the new plugin functions.
	* testsuite/libgomp.c/target-12.c: Fix dimension bug.
	* testsuite/libgomp.fortran/target-12.f90: Likewise.
	* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
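The sketch below is host-only and is not libgomp's code. It illustrates how an `omp_target_memcpy_rect` request for a 3-D sub-box might map onto `cuMemcpy3D`-style parameters: the byte width, height and depth of the box, the per-side offsets, the row pitch and the slice height. The struct `rect3d_copy` and both functions are hypothetical. The sketch assumes that index 0 is the slowest-varying dimension, as for C arrays. The row-address arithmetic follows the formula that the CUDA driver API documentation gives for `CUDA_MEMCPY3D` on plain memory. The host loop only reproduces the *result* of such a call, using one `memcpy` per contiguous row. It says nothing about how the driver actually performs the transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor mirroring the CUDA_MEMCPY3D fields that matter
   for plain (non-array) memory: WidthInBytes, Height, Depth,
   {src,dst}XInBytes, {src,dst}Y, {src,dst}Z, {src,dst}Pitch and
   {src,dst}Height.  It is NOT the CUDA type.  */
struct rect3d_copy
{
  size_t width_bytes, height, depth;  /* extent of the copied box */
  size_t src_x_bytes, src_y, src_z;   /* start of the box in src */
  size_t src_pitch, src_height;       /* bytes per row, rows per slice */
  size_t dst_x_bytes, dst_y, dst_z;   /* start of the box in dst */
  size_t dst_pitch, dst_height;
};

/* Map omp_target_memcpy_rect-style arguments (num_dims == 3) onto the
   descriptor.  Index 0 is assumed to be the slowest-varying dimension;
   only the innermost dimension is scaled from elements to bytes.  */
static struct rect3d_copy
rect3d_from_omp_args (size_t element_size, const size_t volume[3],
                      const size_t dst_offsets[3], const size_t src_offsets[3],
                      const size_t dst_dims[3], const size_t src_dims[3])
{
  struct rect3d_copy c;
  c.width_bytes = volume[2] * element_size;
  c.height = volume[1];
  c.depth = volume[0];
  c.src_x_bytes = src_offsets[2] * element_size;
  c.src_y = src_offsets[1];
  c.src_z = src_offsets[0];
  c.src_pitch = src_dims[2] * element_size;
  c.src_height = src_dims[1];
  c.dst_x_bytes = dst_offsets[2] * element_size;
  c.dst_y = dst_offsets[1];
  c.dst_z = dst_offsets[0];
  c.dst_pitch = dst_dims[2] * element_size;
  c.dst_height = dst_dims[1];
  return c;
}

/* Host-only reference for the result of one 3-D pitched copy: the row at
   (z, y) of the box starts at ((Z + z) * Height + Y + y) * Pitch + XInBytes
   on each side, and every row is copied with a single memcpy.  */
static void
rect3d_copy_host (void *dst, const void *src, const struct rect3d_copy *c)
{
  /* Sanity checks for this sketch: the box must fit in a row and a slice.  */
  assert (c->src_x_bytes + c->width_bytes <= c->src_pitch);
  assert (c->dst_x_bytes + c->width_bytes <= c->dst_pitch);
  assert (c->src_y + c->height <= c->src_height);
  assert (c->dst_y + c->height <= c->dst_height);

  for (size_t z = 0; z < c->depth; z++)
    for (size_t y = 0; y < c->height; y++)
      {
        const char *s = (const char *) src
          + ((c->src_z + z) * c->src_height + c->src_y + y) * c->src_pitch
          + c->src_x_bytes;
        char *d = (char *) dst
          + ((c->dst_z + z) * c->dst_height + c->dst_y + y) * c->dst_pitch
          + c->dst_x_bytes;
        memcpy (d, s, c->width_bytes);
      }
}
```

For example, consider copying a 2×2×3 box of `int` from offset {0, 1, 1} in an `int[2][3][4]` array to offset {1, 0, 2} in an `int[3][4][5]` array. This yields `width_bytes == 3 * sizeof (int)`, `src_pitch == 4 * sizeof (int)` and four row copies instead of twelve element copies. With `depth == 1` the same mapping reduces to the 2-D case that `cuMemcpy2D` covers.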
| Name |
|---|
| cuda/ |
| gdb/ |
| COPYING |
| COPYING3 |
| ChangeLog |
| ChangeLog-9103 |
| ChangeLog.jit |
| ChangeLog.omp |
| ansidecl.h |
| btf.h |
| ctf.h |
| demangle.h |
| dwarf2.def |
| dwarf2.h |
| dyn-string.h |
| environ.h |
| fibheap.h |
| filenames.h |
| floatformat.h |
| fnmatch.h |
| gcc-c-fe.def |
| gcc-c-interface.h |
| gcc-cp-fe.def |
| gcc-cp-interface.h |
| gcc-interface.h |
| getopt.h |
| gomp-constants.h |
| hashtab.h |
| hsa.h |
| hsa_ext_amd.h |
| hsa_ext_image.h |
| leb128.h |
| libiberty.h |
| longlong.h |
| lto-symtab.h |
| md5.h |
| objalloc.h |
| obstack.h |
| partition.h |
| plugin-api.h |
| safe-ctype.h |
| sha1.h |
| simple-object.h |
| sort.h |
| splay-tree.h |
| symcat.h |
| timeval-utils.h |
| vtv-change-permission.h |
| xregex.h |
| xregex2.h |
| xtensa-config.h |
| xtensa-dynconfig.h |