Commit Graph

52 Commits

Author SHA1 Message Date
Tom de Vries 2c372e81a9 [nvptx, libgomp] Don't launch with num_workers == 0
When using a compiler build with:
...
+#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE
+#define PTX_MAX_VECTOR_LENGTH PTX_CTA_SIZE
...
and running the libgomp testsuite, we run into an execution failure in
parallel-loop-1.c, due to a cuda launch failure:
...
  nvptx_exec: kernel f6_none_none$_omp_fn$0: launch gangs=480, workers=0, \
    vectors=1024

libgomp: cuLaunchKernel error: invalid argument
...
because workers == 0.

The workers variable is set to 0 here in nvptx_exec:
...
                workers = blocks / actual_vectors;
...
because actual_vectors is 1024, and blocks is 768:
...
cuOccupancyMaxPotentialBlockSize: grid = 10, block = 768
...

Fix this by ensuring that workers is at least one.

2019-01-09  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (nvptx_exec): Make sure to launch with at least
	one worker.

From-SVN: r267746
2019-01-09 00:07:45 +00:00
Jakub Jelinek a554497024 Update copyright years.
From-SVN: r267494
2019-01-01 13:31:55 +01:00
Thomas Schwinge f847198ec3 [PR88495] An OpenACC async queue is always synchronized with itself
An OpenACC async queue is always synchronized with itself, so invocations like
"#pragma acc wait(0) async(0)", or "acc_wait_async (0, 0)" don't make a lot of
sense, but are still valid.

	libgomp/
	PR libgomp/88495
	* plugin/plugin-nvptx.c (nvptx_wait_async): Don't refuse
	"identical parameters".
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Update.
	* testsuite/libgomp.oacc-c-c++-common/lib-80.c: Remove.

From-SVN: r267152
2018-12-14 21:43:02 +01:00
Thomas Schwinge 1404af62dc [PR88407] [OpenACC] Correctly handle unseen async-arguments
... which turn the operation into a no-op.

	libgomp/
	PR libgomp/88407
	* plugin/plugin-nvptx.c (nvptx_async_test, nvptx_wait)
	(nvptx_wait_async): Unseen async-argument is a no-op.
	* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Update.
	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-71.c: Merge into...
	* testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this.  Update.
	* testsuite/libgomp.oacc-c-c++-common/lib-77.c: Merge into...
	* testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this.  Update

From-SVN: r267150
2018-12-14 21:42:40 +01:00
Thomas Schwinge 18c247cc0b [PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval
Per my reading of the OpenACC specification (and as supported by secondary
documentation, such as code examples, or presentations), it's valid to call
"acc_get_cuda_stream"/"acc_set_cuda_stream" also with "acc_async_sync",
"acc_async_noval" arguments, not just with the nonnegative values as currently
implemented.

	libgomp/
	PR libgomp/88370
	* libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream)
	(acc_set_cuda_stream): Clarify.
	* oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use
	"async_valid_p".
	* plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async ==
	acc_async_sync".
	* testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file.
	* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update.
	* testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.

From-SVN: r267147
2018-12-14 21:42:08 +01:00
Cesar Philippidis 2049befdd0 [nvptx] Remove use of CUDA unified memory in libgomp
libgomp/
	* plugin/plugin-nvptx.c (struct cuda_map): New.
	(struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev,
	h_tail with (cuda_map *) map.
	(cuda_map_create): New function.
	(cuda_map_destroy): New function.
	(map_init): Update to use a linked list of cuda_map objects.
	(map_fini): Likewise.
	(map_pop): Likewise.
	(map_push): Likewise.  Return CUdeviceptr instead of void.
	(init_streams_for_device): Remove stales references to ptx_stream
	members.
	(select_stream_for_async): Likewise.
	(nvptx_exec): Update call to map_init.

From-SVN: r264397
2018-09-18 08:41:54 -07:00
Cesar Philippidis bd9b3d3d1a [nvptx] Use CUDA driver API to select default runtime launch geometry
The CUDA driver API starting version 6.5 offers a set of runtime functions to
calculate several occupancy-related measures, as a replacement for the occupancy
calculator spreadsheet.

This patch adds a heuristic for default runtime launch geometry, based on the
new runtime function cuOccupancyMaxPotentialBlockSize.

Build on x86_64 with nvptx accelerator and ran libgomp testsuite.

2018-08-13  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	PR target/85590
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
	(cuOccupancyMaxPotentialBlockSize): Declare.
	* plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define
	CUoccupancyB2DSize and declare
	cuOccupancyMaxPotentialBlockSize.
	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
	default num_gangs and num_workers when the driver supports it.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r263505
2018-08-13 12:04:24 +00:00
Tom de Vries 8e09a12f01 [libgomp, nvptx] Fall back to cuLinkAddData/cuLinkCreate if _v2 not found
Cuda driver api functions cuLinkAddData and cuLinkCreate are available starting
version 5.5.  In version 6.5, they are remapped onto _v2 versions.

The dlopen interface of the libgomp nvptx plugin uses the _v2 versions, so it
won't work with a cuda driver with driver api version lower than 6.5.

This patch fixes the problem by testing for the presence of the _v2 versions,
and falling back to the original versions in case of absence of the _v2
versions.

Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and
without --without-cuda-driver.

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/cuda-lib.def (cuLinkAddData_v2, cuLinkCreate_v2): Declare using
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (cuLinkAddData, cuLinkCreate): Undef and declare.
	(cuLinkAddData_v2, cuLinkCreate_v2): Declare.
	(link_ptx): Fall back to cuLinkAddData/cuLinkCreate if the _v2 versions
	are not found.

From-SVN: r263408
2018-08-08 14:26:37 +00:00
Tom de Vries cedd9bd016 [libgomp, nvptx] Allow cuGetErrorString to be NULL
Cuda driver api function cuGetErrorString is available in version 6.0 and
higher.

Currently, when the driver that is used does not contain this function, the
libgomp nvptx plugin will not build (PLUGIN_NVPTX_DYNAMIC == 0) or run
(PLUGIN_NVPTX_DYNAMIC == 1).

This patch fixes this problem by testing for the presence of the function, and
handling absence.

Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and
without --without-cuda-driver.

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/cuda-lib.def (cuGetErrorString): Use CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (cuda_error): Handle if cuGetErrorString is not
	present.

From-SVN: r263407
2018-08-08 14:26:28 +00:00
Tom de Vries b113af959c [libgomp, nvptx] Remove hard-coded const in nvptx_open_device
CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR is defined in cuda driver
api version 6.0 and higher.

Currently nvptx_open_device uses a hard-coded constant instead.

This patch fixes that by:
- defining CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR to the hardcoded
  constant at toplevel, if not present in cuda.h, and
- using CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR in nvptx_open_device

Build on x86_64 with nvptx accelerator and reg-tested libgomp.

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c
	(CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR): Define.
	(nvptx_open_device): Use
	CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR.

From-SVN: r263406
2018-08-08 14:26:19 +00:00
Tom de Vries 94767dacea [libgomp, nvptx] Note that cuGetErrorString is in CUDA_VERSION >= 6000
Cuda driver api function cuGetErrorString is available in version 6.0 and
higher.

This patch:
- removes a comment saying the declaration is not available in cuda.h 6.0
- fixes the presence test to use CUDA_VERSION < 6000
- moves the declaration to toplevel

Build on x86_64 with nvptx accelerator and reg-tested libgomp.

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (cuda_error): Move declaration of cuGetErrorString ...
	(cuGetErrorString): ... here.  Guard with CUDA_VERSION < 6000.

From-SVN: r263405
2018-08-08 14:26:10 +00:00
Tom de Vries 02150de863 [libgomp, nvptx] Handle CUDA_ONE_CALL_MAYBE_NULL
This patch adds handling of functions that may not be present in the cuda
driver.

Such a function can be declared using CUDA_ONE_CALL_MAYBE_NULL in cuda-lib.def,
it can be called with the usual convenience macros, but before calling its
presence needs to be tested using new macro CUDA_CALL_EXISTS.

When using the dlopen interface (PLUGIN_NVPTX_DYNAMIC == 1), we allow
non-present functions by allowing dlsym to return NULL.  Otherwise
(PLUGIN_NVPTX_DYNAMIC == 0) we declare the non-present function to be weak.

Build and reg-tested libgomp on x86_64 with nvidia accelerator, with and without
--disable-cuda-driver, in combination with a trigger patch that adds a
non-existing function foo to cuda-lib.def:
...
CUDA_ONE_CALL_MAYBE_NULL (foo)
...
and declares it in plugin-nvptx.c:
...
CUresult foo (void);
...
and then uses it in nvptx_init after the init_cuda_lib call:
...
  if (CUDA_CALL_EXISTS (foo))
    CUDA_CALL (foo);
...

Also build and reg-tested on x86_64 with nvidia accelerator, with and without
--disable-cuda-driver, in combination with a trigger patch that replaces all
CUDA_ONE_CALLs in cuda-lib.def with CUDA_ONE_CALL_MAYBE_NULL, and guards two
CUDA_CALLs with CUDA_CALL_EXISTS, one for a regular fn, and one for a fn that is
a define in cuda/cuda.h.

2018-08-07  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (DO_PRAGMA): Define.
	(struct cuda_lib_s): Add def/undef of CUDA_ONE_CALL_MAYBE_NULL.
	(init_cuda_lib): Add new param to CUDA_ONE_CALL_1.  Add arg to
	corresponding call in CUDA_ONE_CALL.  Add def/undef of
	CUDA_ONE_CALL_MAYBE_NULL.
	(CUDA_CALL_EXISTS): Define.

From-SVN: r263346
2018-08-06 22:13:56 +00:00
Tom de Vries 9e28b10779 [libgomp, nvptx] Minimize lifetime of CUDA_ONE_CALL defines
This patch makes sure that the lifetimes of the CUDA_ONE_CALL macro (which is
defined twice in plugin-nvptx.c) are minimized, to make it obvious that the
definitions are used only in the lib-cuda.def include.

Build on x86_64 with nvptx accelerator and reg-tested libgomp.

2018-08-07  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (struct cuda_lib_s, init_cuda_lib): Put
	CUDA_ONE_CALL defines right before the cuda-lib.def include, and the
	corresponding undefs right after.

From-SVN: r263345
2018-08-06 22:13:46 +00:00
Cesar Philippidis 094db6beb9 [PATCH] Remove use of 'struct map' from plugin (nvptx)
libgomp/
	* plugin/plugin-nvptx.c (struct map): Removed.
	(map_init, map_pop): Remove use of struct map. (map_push):
	Likewise and change argument list.
	* testsuite/libgomp.oacc-c-c++-common/mapping-1.c: New

Co-Authored-By: James Norris <jnorris@codesourcery.com>

From-SVN: r263212
2018-08-01 07:09:56 -07:00
Tom de Vries 8c6310a2c2 [libgomp, nvptx] Add cuda-lib.def
2018-08-01  Tom de Vries  <tdevries@suse.de>

	* plugin/cuda-lib.def: New file.  Factor out of ...
	* plugin/plugin-nvptx.c (CUDA_CALLS): ... here.
	(struct cuda_lib_s, init_cuda_lib): Include cuda-lib.def instead of
	using CUDA_CALLS.

From-SVN: r263208
2018-08-01 13:20:22 +00:00
Tom de Vries 4cdfee3f20 [libgomp, nvptx] Handle per-function max-threads-per-block in default dims
Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the
kernel launch configurations exceeds the resources available in the device, due
to the default dimensions chosen by the runtime.

This patch fixes that by taking the per-function max_threads_per_block into
account when using the default dimensions.

2018-07-30  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (MIN, MAX): Redefine.
	(nvptx_exec): Ensure worker and vector default dims don't exceed
	targ_fn->max_threads_per_block.

From-SVN: r263062
2018-07-30 08:17:26 +00:00
Tom de Vries 0b210c43bb [libgomp, nvptx] Calculate default dims per device
The default dimensions are calculated using per-device properties, but
initialized once and used on all devices.

This patch fixes this problem by introducing per-device default dimensions.

2018-07-30  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (struct ptx_device): Add default_dims field.
	(nvptx_open_device): Init default_dims for device.
	(nvptx_exec): Use default_dims from device.

From-SVN: r263061
2018-07-30 08:17:16 +00:00
Cesar Philippidis 88a4654d03 [libgomp, nvptx] Add error with recompilation hint for launch failure
Currently, when a kernel is lauched with too many workers, it results in a cuda
launch failure.  This is triggered f.i. for parallel-loop-1.c at -O0 on a Quadro
M1200.

This patch detects this situation, and errors out with a hint on how to fix it.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-07-26  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
	sufficient resources to launch a kernel, and give a hint on how to fix
	it.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r262997
2018-07-26 11:42:29 +00:00
Cesar Philippidis 0c6c2f5fc2 [libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open
Move sampling of device properties from nvptx_exec to nvptx_open, and assume
the sampling always succeeds.  This simplifies the default dimension
initialization code in nvptx_open.

2018-07-26  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (struct ptx_device): Add warp_size,
	max_threads_per_block and max_threads_per_multiprocessor fields.
	(nvptx_open_device): Initialize new fields.
	(nvptx_exec): Use num_sms, and new fields.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r262996
2018-07-26 11:42:19 +00:00
Tom de Vries ec00d3faf4 [openacc] Move GOMP_OPENACC_DIM parsing out of nvptx plugin
2018-05-02  Tom de Vries  <tom@codesourcery.com>

	PR libgomp/85411
	* plugin/plugin-nvptx.c (nvptx_exec): Move parsing of
	GOMP_OPENACC_DIM ...
	* env.c (parse_gomp_openacc_dim): ... here.  New function.
	(initialize_env): Call parse_gomp_openacc_dim.
	(goacc_default_dims): Define.
	* libgomp.h (goacc_default_dims): Declare.
	* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): New function.
	* oacc-plugin.h (GOMP_PLUGIN_acc_default_dim): Declare.
	* libgomp.map: New version "GOMP_PLUGIN_1.2". Add
	GOMP_PLUGIN_acc_default_dim.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-runtime.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.

From-SVN: r259852
2018-05-02 17:53:56 +00:00
Tom de Vries df36a3d3be [nvptx, libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin
2018-04-26  Tom de Vries  <tom@codesourcery.com>

	PR libgomp/84020
	* plugin/cuda/cuda.h (CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL.
	* plugin/plugin-nvptx.c (_GNU_SOURCE): Define.
	(process_GOMP_NVPTX_JIT): New function.
	(link_ptx): Use process_GOMP_NVPTX_JIT.

From-SVN: r259678
2018-04-26 13:27:04 +00:00
Jakub Jelinek 85ec4feb11 Update copyright years.
From-SVN: r256169
2018-01-03 11:03:58 +01:00
Tom de Vries dfb15f6bbb Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin
2017-06-27  Tom de Vries  <tom@codesourcery.com>

	* plugin/plugin-nvptx.c (notify_var): New function.
	(nvptx_exec): Use notify_var for GOMP_OPENACC_DIM.

From-SVN: r249695
2017-06-27 15:51:48 +00:00
Thomas Schwinge 78672bd8fd libgomp nvptx plugin: Debugging output when disabling nvptx offloading
libgomp/
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Debugging output
	when disabling nvptx offloading.

From-SVN: r248400
2017-05-24 08:59:05 +02:00
Jakub Jelinek 19929ba9c9 plugin-nvptx.c (cuda_lib_inited): Use signed char type instead of char.
* plugin/plugin-nvptx.c (cuda_lib_inited): Use signed char type
	instead of char.

From-SVN: r246918
2017-04-13 21:59:04 +02:00
Thomas Schwinge e70ab10d5c libgomp, nvptx plugin: Make "nvptx_exec" static
libgomp/
	* plugin/plugin-nvptx.c (nvptx_exec): Make it static.

From-SVN: r245127
2017-02-02 15:35:30 +01:00
Thomas Schwinge 345a8c1712 libgomp: Normalize the names of a few functions of the libgomp plugin API
libgomp/
	* libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to
	GOMP_OFFLOAD_openacc_exec.  Adjust all users.
	(GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to
	GOMP_OFFLOAD_openacc_cuda_get_current_device.  Adjust all users.
	(GOMP_OFFLOAD_openacc_get_current_cuda_context): Rename to
	GOMP_OFFLOAD_openacc_cuda_get_current_context.  Adjust all users.
	(GOMP_OFFLOAD_openacc_get_cuda_stream): Rename to
	GOMP_OFFLOAD_openacc_cuda_get_stream.  Adjust all users.
	(GOMP_OFFLOAD_openacc_set_cuda_stream): Rename to
	GOMP_OFFLOAD_openacc_cuda_set_stream.  Adjust all users.

From-SVN: r245125
2017-02-02 15:13:57 +01:00
Jakub Jelinek 2393d337e7 configfrag.ac: For --without-cuda-driver don't initialize CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB.
* plugin/configfrag.ac: For --without-cuda-driver don't initialize
	CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB.  If both
	CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB are empty and linking small
	cuda program fails, define PLUGIN_NVPTX_DYNAMIC to 1 and use
	plugin/include/cuda as include dir and -ldl instead of -lcuda as
	library to link ptx plugin against.
	* plugin/plugin-nvptx.c: Include dlfcn.h if PLUGIN_NVPTX_DYNAMIC.
	(CUDA_CALLS): Define.
	(cuda_lib, cuda_lib_inited): New variables.
	(init_cuda_lib): New function.
	(CUDA_CALL_PREFIX): Define.
	(CUDA_CALL_ERET, CUDA_CALL_ASSERT): Use CUDA_CALL_PREFIX.
	(CUDA_CALL): Use FN instead of (FN).
	(CUDA_CALL_NOCHECK): Define.
	(cuda_error, fini_streams_for_device, select_stream_for_async,
	nvptx_attach_host_thread_to_device, nvptx_open_device, link_ptx,
	event_gc, nvptx_exec, nvptx_async_test, nvptx_async_test_all,
	nvptx_wait_all, nvptx_set_clocktick, GOMP_OFFLOAD_unload_image,
	nvptx_stacks_alloc, nvptx_stacks_free, GOMP_OFFLOAD_run): Use
	CUDA_CALL_NOCHECK.
	(nvptx_init): Call init_cuda_lib, if it fails, return false.  Use
	CUDA_CALL_NOCHECK.
	(nvptx_get_num_devices): Call init_cuda_lib, if it fails, return 0.
	Use CUDA_CALL_NOCHECK.
	* plugin/cuda/cuda.h: New file.
	* config.h.in: Regenerated.
	* configure: Regenerated.

From-SVN: r244522
2017-01-17 10:44:17 +01:00
Jakub Jelinek cbe34bb5ed Update copyright years.
From-SVN: r243994
2017-01-01 13:07:43 +01:00
Alexander Monakov 6103184e81 OpenMP offloading to NVPTX: libgomp changes
* Makefile.am (libgomp_la_SOURCES): Add atomic.c, icv.c, icv-device.c.
	* Makefile.in. Regenerate.
	* configure.ac [nvptx*-*-*] (libgomp_use_pthreads): Set and use it...
	(LIBGOMP_USE_PTHREADS): ...here; new define.
	* configure: Regenerate.
	* config.h.in: Likewise.
	* config/posix/affinity.c: Move to...
	* affinity.c: ...here (new file).  Guard use of Pthreads-specific
	interface by LIBGOMP_USE_PTHREADS. 
	* critical.c: Split out GOMP_atomic_{start,end} into...
	* atomic.c: ...here (new file).
	* env.c: Split out ICV definitions into...
	* icv.c: ...here (new file) and...
	* icv-device.c: ...here. New file.
	* config/linux/lock.c (gomp_init_lock_30): Move to generic lock.c.
	(gomp_destroy_lock_30): Ditto.
	(gomp_set_lock_30): Ditto.
	(gomp_unset_lock_30): Ditto.
	(gomp_test_lock_30): Ditto.
	(gomp_init_nest_lock_30): Ditto.
	(gomp_destroy_nest_lock_30): Ditto.
	(gomp_set_nest_lock_30): Ditto.
	(gomp_unset_nest_lock_30): Ditto.
	(gomp_test_nest_lock_30): Ditto.
	* lock.c: New.
	* config/nvptx/lock.c: New.
	* config/nvptx/bar.c: New.
	* config/nvptx/bar.h: New.
	* config/nvptx/doacross.h: New.
	* config/nvptx/error.c: New.
	* config/nvptx/icv-device.c: New.
	* config/nvptx/mutex.h: New.
	* config/nvptx/pool.h: New.
	* config/nvptx/proc.c: New.
	* config/nvptx/ptrlock.h: New.
	* config/nvptx/sem.h: New.
	* config/nvptx/simple-bar.h: New.
	* config/nvptx/target.c: New.
	* config/nvptx/task.c: New.
	* config/nvptx/team.c: New.
	* config/nvptx/time.c: New.
	* config/posix/simple-bar.h: New.
	* libgomp.h: Guard pthread.h inclusion.  Include simple-bar.h.
	(gomp_num_teams_var): Declare.
	(struct gomp_thread_pool): Change threads_dock member to
	gomp_simple_barrier_t.
	[__nvptx__] (gomp_thread): New implementation.
	(gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS.
	(gomp_thread_destructor): Ditto.
	(gomp_init_thread_affinity): Ditto.
	* team.c: Guard uses of Pthreads-specific interfaces by
	LIBGOMP_USE_PTHREADS.  Adjust all uses of threads_dock.
	(gomp_free_thread) [__nvptx__]: Do not call 'free'.

	* config/nvptx/alloc.c: Delete.
	* config/nvptx/barrier.c: Ditto.
	* config/nvptx/fortran.c: Ditto.
	* config/nvptx/iter.c: Ditto.
	* config/nvptx/iter_ull.c: Ditto.
	* config/nvptx/loop.c: Ditto.
	* config/nvptx/loop_ull.c: Ditto.
	* config/nvptx/ordered.c: Ditto.
	* config/nvptx/parallel.c: Ditto.
	* config/nvptx/priority_queue.c: Ditto.
	* config/nvptx/sections.c: Ditto.
	* config/nvptx/single.c: Ditto.
	* config/nvptx/splay-tree.c: Ditto.
	* config/nvptx/work.c: Ditto.

	* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass
	-foffload=-lgfortran in addition to -lgfortran.
	* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto.

	* plugin/plugin-nvptx.c: Include <limits.h>.
	(struct targ_fn_descriptor): Add new fields.
	(struct ptx_device): Ditto.  Set them...
	(nvptx_open_device): ...here.
	(nvptx_adjust_launch_bounds): New.
	(nvptx_host2dev): Allow NULL 'nvthd'.
	(nvptx_dev2host): Ditto.
	(GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400.
	(link_ptx): Adjust log sizes.
	(nvptx_host2dev): Allow NULL 'nvthd'.
	(nvptx_dev2host): Ditto.
	(nvptx_set_clocktick): New.  Use it...
	(GOMP_OFFLOAD_load_image): ...here.  Set new targ_fn_descriptor
	fields.
	(GOMP_OFFLOAD_dev2dev): New.
	(nvptx_adjust_launch_bounds): New.
	(nvptx_stacks_size): New.
	(nvptx_stacks_alloc): New.
	(nvptx_stacks_free): New.
	(GOMP_OFFLOAD_run): New.
	(GOMP_OFFLOAD_async_run): New (stub).

Co-Authored-By: Dmitry Melnik <dm@ispras.ru>
Co-Authored-By: Jakub Jelinek <jakub@redhat.com>

From-SVN: r242789
2016-11-23 21:36:41 +03:00
Cesar Philippidis 6668eb4593 nvptx.c (PTX_GANG_DEFAULT): Set to zero.
gcc/
	* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Set to zero.

	libgomp/
	* plugin/plugin-nvptx.c (nvptx_exec): Interrogate board attributes
	to determine default geometry.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Set gang
	dimension.


Co-Authored-By: Nathan Sidwell <nathan@acm.org>

From-SVN: r241803
2016-11-02 15:10:02 -07:00
Chung-Lin Tang b4557008c4 oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
2016-05-26  Chung-Lin Tang  <cltang@codesourcery.com>

	libgomp/
	* oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
	* oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
	parameter, use to set async stream around call to gomp_unmap_vars,
	call gomp_unmap_vars() with 'do_copyfrom' set to true.
	* plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
	(event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
	events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
	(event_add): Add int parameter, initialize 'val' field when
	adding new ptx_event struct.
	(nvptx_evec): Adjust event_add() call arguments.
	(nvptx_host2dev): Likewise.
	(nvptx_dev2host): Likewise.
	(nvptx_wait_async): Likewise.
	(nvptx_wait_all_async): Likewise.
	(GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
	pass to event_add() call.
	* oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
	parameter.
	* oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
	call openacc.register_async_cleanup_func() hook.
	* oacc-parallel.c (GOACC_parallel_keyed): Likewise.
	* target.c (gomp_copy_from_async): Delete function.
	(gomp_map_vars): Remove async_refcount.
	(gomp_unmap_vars): Likewise.
	(gomp_load_image_to_device): Likewise.
	(omp_target_associate_ptr): Likewise.
	* libgomp.h (struct splay_tree_key_s): Remove async_refcount.
	(acc_dispatch_t.register_async_cleanup_func): Add int parameter.
	(gomp_copy_from_async): Remove.

From-SVN: r236772
2016-05-26 13:28:25 +00:00
Chung-Lin Tang 6ce1307231 target.c (gomp_device_copy): New function.
libgomp/
2016-05-26  Chung-Lin Tang  <cltang@codesourcery.com>

	* target.c (gomp_device_copy): New function.
	(gomp_copy_host2dev): Likewise.
	(gomp_copy_dev2host): Likewise.
	(gomp_free_device_memory): Likewise.
	(gomp_map_vars_existing): Adjust to call gomp_copy_host2dev.
	(gomp_map_pointer): Likewise.
	(gomp_map_vars): Adjust to call gomp_copy_host2dev, handle
	NULL value from alloc_func plugin hook.
	(gomp_unmap_tgt): Adjust to call gomp_free_device_memory.
	(gomp_copy_from_async): Adjust to call gomp_copy_dev2host.
	(gomp_unmap_vars): Likewise.
	(gomp_update): Adjust to call gomp_copy_dev2host and
	gomp_copy_host2dev functions.
	(gomp_unload_image_from_device): Handle false value from
	unload_image_func plugin hook.
	(gomp_init_device): Handle false value from init_device_func
	plugin hook.
	(gomp_exit_data): Adjust to call gomp_copy_dev2host.
	(omp_target_free): Adjust to call gomp_free_device_memory.
	(omp_target_memcpy): Handle return values from host2dev_func,
	dev2host_func, and dev2dev_func plugin hooks.
	(omp_target_memcpy_rect_worker): Likewise.
	(gomp_target_fini): Handle false value from fini_device_func
	plugin hook.
	* libgomp.h (struct gomp_device_descr): Adjust return type of
	init_device_func, fini_device_func, unload_image_func, free_func,
	dev2host_func,host2dev_func, and dev2dev_func plugin hooks to 'bool'.
	* oacc-init.c (acc_shutdown_1): Handle false value from
	fini_device_func plugin hook.
	* oacc-host.c (host_init_device): Change return type to bool.
	(host_fini_device): Likewise.
	(host_unload_image): Likewise.
	(host_free): Likewise.
	(host_dev2host): Likewise.
	(host_host2dev): Likewise.
	* oacc-mem.c (acc_free): Handle plugin hook fatal error case.
	(acc_memcpy_to_device): Likewise.
	(acc_memcpy_from_device): Likewise.
	(delete_copyout): Add libfnname parameter, handle free_func
	hook fatal error case.
	(acc_delete): Adjust delete_copyout call.
	(acc_copyout): Likewise.
	(update_dev_host): Move gomp_mutex_unlock to after
	host2dev/dev2host hook calls.

	* plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable
	to 'hsa_error_msg', for clarity.
	(hsa_fatal): Likewise.
	(hsa_error): New function.
	(init_hsa_context): Change return type to bool, adjust to return
	false on error.
	(GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context
	return value.
	(GOMP_OFFLOAD_init_device): Change return type to bool, adjust to
	return false on error.
	(get_agent_info): Adjust to return NULL on error.
	(destroy_hsa_program): Change return type to bool, adjust to
	return false on error.
	(GOMP_OFFLOAD_load_image): Adjust to return -1 on error.
	(destroy_module): Change return type to bool, adjust to
	return false on error.
	(GOMP_OFFLOAD_unload_image): Likewise.
	(GOMP_OFFLOAD_fini_device): Likewise.
	(GOMP_OFFLOAD_alloc): Change to return NULL when called.
	(GOMP_OFFLOAD_free): Change to return false when called.
	(GOMP_OFFLOAD_dev2host): Likewise.
	(GOMP_OFFLOAD_host2dev): Likewise.
	(GOMP_OFFLOAD_dev2dev): Likewise.

	* plugin/plugin-nvptx.c (CUDA_CALL_ERET): New convenience macro.
	(CUDA_CALL): Likewise.
	(CUDA_CALL_ASSERT): Likewise.
	(map_init): Change return type to bool, use CUDA_CALL* macros.
	(map_fini): Likewise.
	(init_streams_for_device): Change return type to bool, adjust
	call to map_init.
	(fini_streams_for_device): Change return type to bool, adjust
	call to map_fini.
	(select_stream_for_async): Release stream_lock before calls to
	GOMP_PLUGIN_fatal, adjust call to map_init.
	(nvptx_init): Use CUDA_CALL* macros.
	(nvptx_attach_host_thread_to_device): Change return type to bool,
	use CUDA_CALL* macros.
	(nvptx_open_device): Use CUDA_CALL* macros.
	(nvptx_close_device): Change return type to bool, use CUDA_CALL*
	macros.
	(nvptx_get_num_devices): Use CUDA_CALL* macros.
	(link_ptx): Change return type to bool, use CUDA_CALL* macros.
	(nvptx_exec): Use CUDA_CALL* macros.
	(nvptx_alloc): Use CUDA_CALL* macros.
	(nvptx_free): Change return type to bool, use CUDA_CALL* macros.
	(nvptx_host2dev): Likewise.
	(nvptx_dev2host): Likewise.
	(nvptx_wait): Use CUDA_CALL* macros.
	(nvptx_wait_async): Likewise.
	(nvptx_wait_all): Likewise.
	(nvptx_wait_all_async): Likewise.
	(nvptx_set_cuda_stream): Adjust order of stream_lock acquire,
	use CUDA_CALL* macros, adjust call to map_fini.
	(GOMP_OFFLOAD_init_device): Change return type to bool,
	adjust code accordingly.
	(GOMP_OFFLOAD_fini_device): Likewise.
	(GOMP_OFFLOAD_load_image): Adjust calls to
	nvptx_attach_host_thread_to_device/link_ptx to handle errors,
	use CUDA_CALL* macros.
	(GOMP_OFFLOAD_unload_image): Change return type to bool, adjust
	return code.
	(GOMP_OFFLOAD_alloc): Adjust calls to code to handle error return.
	(GOMP_OFFLOAD_free): Change return type to bool, adjust calls to
	handle error return.
	(GOMP_OFFLOAD_dev2host): Likewise.
	(GOMP_OFFLOAD_host2dev): Likewise.
	(GOMP_OFFLOAD_openacc_register_async_cleanup): Use CUDA_CALL* macros.
	(GOMP_OFFLOAD_openacc_create_thread_data): Likewise.

liboffloadmic/
2016-05-26  Chung-Lin Tang  <cltang@codesourcery.com>

	* plugin/libgomp-plugin-intelmic.cpp (offload): Change return type
	to bool, adjust return code.
	(GOMP_OFFLOAD_init_device): Likewise.
	(GOMP_OFFLOAD_fini_device): Likewise.
	(get_target_table): Likewise.
	(offload_image): Likwise.
	(GOMP_OFFLOAD_load_image): Adjust call to offload_image(), change
	to return -1 on error.
	(GOMP_OFFLOAD_unload_image): Change return type to bool, adjust return
	code.
	(GOMP_OFFLOAD_alloc): Likewise.
	(GOMP_OFFLOAD_free): Likewise.
	(GOMP_OFFLOAD_host2dev): Likewise.
	(GOMP_OFFLOAD_dev2host): Likewise.
	(GOMP_OFFLOAD_dev2dev): Likewise.

From-SVN: r236768
2016-05-26 09:58:56 +00:00
Alexander Monakov c2bd3b6911 libgomp nvptx plugin: make cuMemFreeHost error non-fatal
From-SVN: r235339
2016-04-21 16:11:47 +03:00
Thomas Schwinge f99c355797 Use plain -fopenacc to enable OpenACC kernels processing
gcc/
	* tree-parloops.c (create_parallel_loop, gen_parallel_loop)
	(parallelize_loops): In OpenACC kernels mode, set n_threads to
	zero.
	(pass_parallelize_loops::gate): In OpenACC kernels mode, gate on
	flag_openacc.
	* tree-ssa-loop.c (gate_oacc_kernels): Likewise.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Adjust
	to -ftree-parallelize-loops/-fopenacc changes.
	* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
	* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
	* c-c++-common/goacc/kernels-loop.c: Likewise.
	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
	* c-c++-common/goacc/kernels-reduction.c: Likewise.
	* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
	* gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise.
	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Initialize dims.
	* plugin/plugin-nvptx.c (nvptx_exec): Provide default values for
	dims.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Adjust to
	-ftree-parallelize-loops/-fopenacc changes.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c:
	Likewise.

From-SVN: r233634
2016-02-23 16:07:54 +01:00
Alexander Monakov 0d58938ed7 nvptx plugin: do not force JIT target SM version
When link_ptx runs, a CUDA device is already bound to current thread, so the
driver library knows the target architecture.  There isn't any benefit from
forcing a specific target here; on the contrary, hardcoding sm_30 breaks
offloading on later (Maxwell, sm_5x) devices.
    
	* plugin/plugin-nvptx.c (link_ptx): Do not set CU_JIT_TARGET.

From-SVN: r232227
2016-01-11 15:55:31 +03:00
Jakub Jelinek 818ab71a41 Update copyright years.
From-SVN: r232055
2016-01-04 15:30:50 +01:00
Nathan Sidwell 5c06742f6f libgomp.h (struct acc_dispatch_t): Remove args from exec_func.
* libgomp.h (struct acc_dispatch_t): Remove args from exec_func.
	* plugin/plugin-nvptx.c (nvptx_exec): Remove sizes & kinds arg.
	(GOMP_OFFLOAD_openacc_parallel): Likewise.
	* oacc-host.c (host_openacc_exec): Likewise.
	* oacc-parallel.c (GOACC_parallel_keyed): Adjust exec_func call.

From-SVN: r229721
2015-11-03 20:18:33 +00:00
Nathan Sidwell a1c1908bbd plugin-nvptx.c (nvptx_exec): Remove check on compute dimensions.
* plugin/plugin-nvptx.c (nvptx_exec): Remove check on compute
	dimensions.

From-SVN: r229471
2015-10-28 03:00:10 +00:00
Thomas Schwinge 113020dc59 nvptx offloading linking
gcc/
	* config/nvptx/mkoffload.c (Kind, Vis): Remove enums.
	(Token, Stmt): Remove structs.
	(decls, vars, fns): Remove variables.
	(alloc_comment, append_stmt, is_keyword): Remove macros.
	(tokenize, write_token, write_tokens, alloc_stmt, rev_stmts)
	(write_stmt, write_stmts, parse_insn, parse_list_nosemi)
	(parse_init, parse_file): Remove functions.
	(read_file): Accept a pointer to a length and store into it.
	(process): Don't try to parse the input file, just write it out as
	a string, but looking for maps.  Also write out the length.
	(main): Don't use "-S" to compile PTX code.

	libgomp/
	* oacc-ptx.h: Remove file, moving its content into...
	* config/nvptx/fortran.c: ... here...
	* config/nvptx/oacc-init.c: ..., here...
	* config/nvptx/oacc-parallel.c: ..., and here.
	* config/nvptx/openacc.f90: New file.
	* plugin/plugin-nvptx.c: Don't include "oacc-ptx.h".
	(link_ptx): Don't link in predefined bits of PTX code.

Co-Authored-By: Bernd Schmidt <bernds@codesourcery.com>

From-SVN: r228418
2015-10-02 21:43:41 +02:00
Nathan Sidwell cc3cd79bc2 mkoffload.c (process): Change offload data format.
gcc/
	* config/nvptx/mkoffload.c (process): Change offload data format.

	libgomp/
	* plugin/plugin-nvptx.c (targ_fn_launch): Use GOMP_DIM_MAX.
	(struct targ_ptx_obj): New.
	(nvptx_tdata): Move earlier, change data format.
	(link_ptx): Take targ_ptx_obj ptr and count.  Allow multiple
	objects.
	(GOMP_OFFLOAD_load_image): Adjust.

Co-Authored-By: Bernd Schmidt <bernds@codesourcery.com>

From-SVN: r228308
2015-09-30 21:35:47 +00:00
Nathan Sidwell a12a043782 plugin-nvptx.c (ARRAYSIZE): Delete.
* plugin/plugin-nvptx.c (ARRAYSIZE): Delete.
	(cuda_errlist): Delete.
	(cuda_error): Reimplement.

From-SVN: r228265
2015-09-29 19:45:47 +00:00
Nathan Sidwell 3e32ee19a5 gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment.
inlude/
	* gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment.
	(GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR, GOMP_DIM_MAX,
	GOMP_DIM_MASK): New.
	(GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC, GOMP_LAUNCH_WAIT): New.
	(GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT,
	GOMP_LAUNCH_OP_SHIFT): New.
	(GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE,
	GOMP_LAUNCH_OP): New.
	(GOMP_LAUNCH_OP_MAX): New.

	libgomp/
	* libgomp.h (acc_dispatch_t): Replace separate geometry args with
	array.
	* libgomp.map (GOACC_parallel_keyed): New.
	* oacc-parallel.c (goacc_wait): Take pointer to va_list.  Adjust
	all callers.
	(GOACC_parallel_keyed): New interface.  Lose geometry arguments
	and take keyed varargs list.  Adjust call to exec_func.
	(GOACC_parallel): Force host fallback.
	* libgomp_g.h (GOACC_parallel): Remove.
	(GOACC_parallel_keyed): Declare.
	* plugin/plugin-nvptx.c (struct targ_fn_launch): New struct.
	(stuct targ_gn_descriptor): Replace name field with launch field.
	(nvptx_exec): Lose separate geometry args, take array.  Process
	dynamic dimensions and adjust.
	(struct nvptx_tdata): Replace fn_names field with fn_descs.
	(GOMP_OFFLOAD_load_image): Adjust for change in function table
	data.
	(GOMP_OFFLOAD_openacc_parallel): Adjust for change in dimension
	passing.
	* oacc-host.c (host_openacc_exec): Adjust for change in dimension
	passing.

	gcc/
	* config/nvptx/nvptx.c: Include omp-low.h and gomp-constants.h.
	(nvptx_record_offload_symbol): Record function execution geometry.
	* config/nvptx/mkoffload.c (process): Include launch geometry in
	function data.
	* omp-low.c (oacc_launch_pack): New.
	(replace_oacc_fn_attrib): New.
	(set_oacc_fn_attrib): New.
	(get_oacc_fn_attrib): New.
	(expand_omp_target): Create keyed varargs for GOACC_parallel call
	generation.
	* omp-low.h (get_oacc_fn_attrib): Declare.
	* builtin-types.def (DEF_FUNCTION_TyPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* tree.h (OMP_CLAUSE_EXPR): New.
	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Change target fn name.

	gcc/lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/c-family/
	* c-common.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* types.def (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/ada/
	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_6): Define

From-SVN: r228220
2015-09-28 19:37:33 +00:00
Nathan Sidwell 2a21ff193a libgomp.map: Add 4.0.2 version.
libgomp/
	* libgomp.map: Add 4.0.2 version.
	* target.c (offload_image_descr): Add version field.
	(gomp_load_image_to_device): Add version argument.  Adjust plugin
	call.  Improve load mismatch diagnostic.
	(gomp_unload_image_from_device): Add version argument.  Adjust plugin
	call.
	(GOMP_offload_regster): Make stub function, move bulk to ...
	(GOMP_offload_register_ver): ... here.  Process version argument.
	(GOMP_offload_unregister): Make stub function, move bulk to ...
	(GOMP_offload_unregister_ver): ... here.  Process version argument.
	(gomp_init_device): Process version field.
	(gomp_unload_device): Process version field.
	(gomp_load_plugin_for_device): Reimplement DLSYM & DLSYM_OPT
	macros.  Check plugin version.
	* libgomp.h (gomp_device_descr): Add version function field.  Adjust
	loader and unloader types.
	* oacc-host.c: Include gomp-constants.h.
	(host_version): New.
	(host_load_image, host_unload_image): Adjust.
	(host_dispatch): Add host_version.
	* plugin/plugin-nvptx.c: Include gomp-constants.h.
	(GOMP_OFFLOAD_version): New.
	(GOMP_OFFLOAD_load_image): Add version arg and check it.
	(GOMP_OFFLOAD_unload_image): Likewise.
	* plugin/plugin-host.c: Include gomp-constants.h.
	(GOMP_OFFLOAD_version): New.
	(GOMP_OFFLOAD_load_image): Add version arg.
	(GOMP_OFFLOAD_unload_image): Likewise.

	liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_version): New.
	(GOMP_OFFLOAD_load_image): Add version arg and check it.
	(GOMP_OFFLOAD_unload_image): Likewise.

	include/
	* gomp-constants.h (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX,
	GOMP_VERSION_INTEL_MIC): New.
	(GOMP_VERSION_PACK, GOMP_VERSION_LIB, GOMP_VERSION_DEV): New.

	gcc/
	* config/nvptx/mkoffload.c (process): Replace
	GOMP_offload_{,un}register with GOMP_offload_{,un}register_ver.

From-SVN: r227137
2015-08-24 17:10:06 +00:00
Nathan Sidwell 9ebddeb045 plugin-nvptx.c: Don't include dlfcn.h.
* plugin/plugin-nvptx.c: Don't include dlfcn.h.
	(cuda_errlist): Constify.
	(errmsg):  Move into ...
	(cuda_error): ... here.  Make smaller.
	(_XSTR, _STR): Delete.
	(cuda_synames): Delete.
	(verify_device_library): Delete.
	(nvptx_init): Don't call it.

From-SVN: r226539
2015-08-04 00:40:18 +00:00
Nathan Sidwell f3e9a059a7 plugin-nvptx.c (struct targ_fn_descriptor): Move later.
* plugin/plugin-nvptx.c (struct targ_fn_descriptor): Move later.
	(struct ptx_image_data): Move earlier, add fns field.
	(struct ptx_device): Add images and image_lock fields.
	(ptx_images, ptx_image_lock): Delete.
	(nvptx_open_device): Initialize images and image_lock fields.
	(nvptx_close_device): Destroy image_lock.
	(GOMP_OFFLOAD_load_image): Register image to device-specific fields.
	(GOMP_OFFLOAD_unload_image): Unregister image from device-specific
	fields.

From-SVN: r226004
2015-07-20 16:17:57 +00:00
Nathan Sidwell afb2d80bc5 mkoffload.c (process): Constify target data.
gcc/
	* config/nvptx/mkoffload.c (process): Constify target data.
	* config/i386/intelmic-mkoffload.c (generate_target_descr_file):
	Constify target data.
	(generate_target_offloadend_file): Likewise.

	libgomp/
	* libgomp.h (gomp_device_descr): Constify target data arguments.
	* target.c (struct offload_image_descr): Constify target_data.
	(gomp_offload_image_to_device): Likewise.
	(GOMP_offload_register): Likewise.
	(GOMP_offload_unrefister): Likewise.
	* plugin/plugin-host.c (GOMP_OFFLOAD_load_image,
	GOMP_OFFLOAD_unload_image): Constify target data.
	* plugin/plugin-nvptx.c (struct ptx_image_data): Constify target data.
	(GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): Likewise.

	liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (ImgDevAddrMap): Constify.
	(offload_image, GOMP_OFFLOAD_load_image,
	OMP_OFFLOAD_unload_image): Constify target data.

From-SVN: r225936
2015-07-17 14:07:53 +00:00
Nathan Sidwell a4cb876dc9 plugin-nvptx.c (link_ptx): Constify string argument.
libgomp/
	* plugin/plugin-nvptx.c (link_ptx): Constify string argument.
	Workaround driver library const error.
	(struct nvptx_tdata, nvptx_tdata_t): New.
	(GOMP_OFFLOAD_load_image): Use struct for target_data's real
	type.

	gcc/
	* config/nvptx/mkoffload.c (process): Constify mapping variables.
	Define target data struct and initialize it.

From-SVN: r225897
2015-07-16 17:17:31 +00:00
Thomas Schwinge a92defdab7 [nvptx offloading] Only 64-bit configurations are currently supported
PR libgomp/65099
	gcc/
	* config/nvptx/mkoffload.c (main): Create an offload image only in
	64-bit configurations.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return 0 if not
	in a 64-bit configuration.
	* testsuite/libgomp.oacc-c++/c++.exp: Don't attempt nvidia
	offloading testing if no such device is available.
	* testsuite/libgomp.oacc-c/c.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.

From-SVN: r225560
2015-07-08 16:59:59 +02:00
Julian Brown c831982647 plugin-nvptx.c (nvptx_get_num_devices): Return zero on cuInit failure.
* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return zero
	on cuInit failure.

From-SVN: r223352
2015-05-19 11:06:31 +00:00