<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta name="AUTHOR" content="bkoz@gcc.gnu.org (Benjamin Kosnik)" />
   <meta name="KEYWORDS" content="c++, libstdc++, gdb, g++, debug" />
   <meta name="DESCRIPTION" content="The libstdc++ parallel mode." />
   <meta name="GENERATOR" content="emacs and ten fingers" />
   <title>The libstdc++ parallel mode</title>
<link rel="StyleSheet" href="lib3styles.css" type="text/css" />
<link rel="Copyright" href="17_intro/license.html" type="text/html" />
</head>
<body>

<h1 class="centered"><a name="top">The libstdc++ parallel mode</a></h1>

<p class="fineprint"><em>
   The latest version of this document is always available at
   <a href="http://gcc.gnu.org/onlinedocs/libstdc++/parallel_mode.html">
   http://gcc.gnu.org/onlinedocs/libstdc++/parallel_mode.html</a>.
</em></p>

<p><em>
   To the <a href="http://gcc.gnu.org/libstdc++/">libstdc++-v3 homepage</a>.
</em></p>

<!-- ####################################################### -->
<hr />
<p> The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
</p>

<p>
Several of the standard algorithms, for instance
<code>std::sort</code>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked by
explicit source declaration or by compiling existing sources with a
specific compiler flag.
</p>

<h3 class="left"><a name="parallel">The libstdc++ parallel mode</a></h3>

<p>The libstdc++ parallel mode performs parallelization of algorithms,
function objects, classes, and functions in the C++ Standard.</p>

<h4 class="left">Using the libstdc++ parallel mode</h4>

<p>To use the libstdc++ parallel mode, compile your application with
the compiler flags <code>-D_GLIBCXX_PARALLEL -fopenmp</code>. This
will link in <code>libgomp</code>, the GNU OpenMP <a
href="http://gcc.gnu.org/onlinedocs/libgomp">implementation</a>,
whose presence is mandatory. In addition, hardware capable of atomic
operations is mandatory. Activating these atomic
operations may require explicit compiler flags on some targets
(like sparc and x86), such as <code>-march=i686</code>,
<code>-march=native</code> or <code>-mcpu=v9</code>.
</p>
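
<p>As a minimal sketch, the following code uses only ISO C++ calls. The
helper names <code>make_data</code> and <code>sort_and_sum</code> are
illustrative assumptions, not part of the library. Compiled normally it
runs sequentially; compiled with <code>-D_GLIBCXX_PARALLEL -fopenmp</code>,
the calls to <code>std::sort</code> and <code>std::accumulate</code> should
be routed to the parallel mode versions without any source change.</p>

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Illustrative helper: build n values in the range [0, 1000) in a
// scrambled but deterministic order.
std::vector<int> make_data(int n)
{
  std::vector<int> v(n);
  for (int i = 0; i < n; ++i)
    v[i] = (i * 7919) % 1000;
  return v;
}

// Sort the vector in place and return the sum of its elements.
// Both calls are plain ISO C++; whether they run in parallel depends
// only on how the translation unit is compiled.
long sort_and_sum(std::vector<int>& v)
{
  std::sort(v.begin(), v.end());
  return std::accumulate(v.begin(), v.end(), 0L);
}
```

<p>For small inputs such as a handful of elements the library is likely to
fall back to the sequential code path, so a benefit should only be expected
for large ranges.</p>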

<p>Note that the <code>_GLIBCXX_PARALLEL</code> define may change the
sizes and behavior of standard templates such as
<code>std::search</code>, and therefore one can only link code
compiled with parallel mode and code compiled without parallel mode
if no instantiation of a container is passed between the two
translation units. Parallel mode functionality has distinct linkage,
and cannot be confused with normal mode symbols.</p>


<p>The following library components in the include
<code>&lt;numeric&gt;</code> are included in the parallel mode:</p>
<ul>
  <li><code>std::accumulate</code></li>
  <li><code>std::adjacent_difference</code></li>
  <li><code>std::inner_product</code></li>
  <li><code>std::partial_sum</code></li>
</ul>

<p>The following library components in the include
<code>&lt;algorithm&gt;</code> are included in the parallel mode:</p>
<ul>
  <li><code>std::adjacent_find</code></li>
  <li><code>std::count</code></li>
  <li><code>std::count_if</code></li>
  <li><code>std::equal</code></li>
  <li><code>std::find</code></li>
  <li><code>std::find_if</code></li>
  <li><code>std::find_first_of</code></li>
  <li><code>std::for_each</code></li>
  <li><code>std::generate</code></li>
  <li><code>std::generate_n</code></li>
  <li><code>std::lexicographical_compare</code></li>
  <li><code>std::mismatch</code></li>
  <li><code>std::search</code></li>
  <li><code>std::search_n</code></li>
  <li><code>std::transform</code></li>
  <li><code>std::replace</code></li>
  <li><code>std::replace_if</code></li>
  <li><code>std::max_element</code></li>
  <li><code>std::merge</code></li>
  <li><code>std::min_element</code></li>
  <li><code>std::nth_element</code></li>
  <li><code>std::partial_sort</code></li>
  <li><code>std::partition</code></li>
  <li><code>std::random_shuffle</code></li>
  <li><code>std::set_union</code></li>
  <li><code>std::set_intersection</code></li>
  <li><code>std::set_symmetric_difference</code></li>
  <li><code>std::set_difference</code></li>
  <li><code>std::sort</code></li>
  <li><code>std::stable_sort</code></li>
  <li><code>std::unique_copy</code></li>
</ul>

<p>The following library components in the includes
<code>&lt;set&gt;</code> and <code>&lt;map&gt;</code> are included in the parallel mode:</p>
<ul>
  <li><code>std::(multi)map/set&lt;T&gt;::(multi)map/set(Iterator begin, Iterator end)</code> (bulk construction)</li>
  <li><code>std::(multi)map/set&lt;T&gt;::insert(Iterator begin, Iterator end)</code> (bulk insertion)</li>
</ul>


<h4 class="left">Using the parallel algorithms without parallel mode</h4>

<p>When it is not feasible to recompile your entire application, or
only specific algorithms need to be parallel-aware, individual
parallel algorithms can be made available explicitly. These
parallel algorithms are functionally equivalent to the standard
drop-in algorithms used in parallel mode, but they are available in
a separate namespace as GNU extensions and may be used in programs
compiled in either release mode or parallel mode. The
following table provides the names and headers of the parallel
algorithms:
</p>


<table title="Parallel algorithms" border="1">
<tr>
  <th>Algorithm</th>
  <th>Header</th>
  <th>Parallel algorithm</th>
  <th>Parallel header</th>
</tr>
<tr>
  <td>std::accumulate</td>
  <td>&lt;numeric&gt;</td>
  <td>__gnu_parallel::accumulate</td>
  <td>&lt;parallel/numeric&gt;</td>
</tr>
<tr>
  <td>std::adjacent_difference</td>
  <td>&lt;numeric&gt;</td>
  <td>__gnu_parallel::adjacent_difference</td>
  <td>&lt;parallel/numeric&gt;</td>
</tr>
<tr>
  <td>std::inner_product</td>
  <td>&lt;numeric&gt;</td>
  <td>__gnu_parallel::inner_product</td>
  <td>&lt;parallel/numeric&gt;</td>
</tr>
<tr>
  <td>std::partial_sum</td>
  <td>&lt;numeric&gt;</td>
  <td>__gnu_parallel::partial_sum</td>
  <td>&lt;parallel/numeric&gt;</td>
</tr>
<tr>
  <td>std::adjacent_find</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::adjacent_find</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::count</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::count</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::count_if</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::count_if</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::equal</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::equal</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::find</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::find</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::find_if</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::find_if</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::find_first_of</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::find_first_of</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::for_each</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::for_each</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::generate</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::generate</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::generate_n</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::generate_n</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::lexicographical_compare</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::lexicographical_compare</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::mismatch</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::mismatch</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::search</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::search</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::search_n</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::search_n</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::transform</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::transform</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::replace</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::replace</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::replace_if</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::replace_if</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::max_element</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::max_element</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::merge</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::merge</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::min_element</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::min_element</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::nth_element</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::nth_element</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::partial_sort</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::partial_sort</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::partition</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::partition</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::random_shuffle</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::random_shuffle</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::set_union</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::set_union</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::set_intersection</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::set_intersection</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::set_symmetric_difference</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::set_symmetric_difference</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::set_difference</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::set_difference</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::sort</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::sort</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::stable_sort</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::stable_sort</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

<tr>
  <td>std::unique_copy</td>
  <td>&lt;algorithm&gt;</td>
  <td>__gnu_parallel::unique_copy</td>
  <td>&lt;parallel/algorithm&gt;</td>
</tr>

</table>
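
<p>As a hedged sketch of the explicit interface, the code below calls
<code>__gnu_parallel::sort</code> directly, without defining
<code>_GLIBCXX_PARALLEL</code>. The preprocessor guard, the macro
<code>EXAMPLE_HAVE_PARALLEL</code> and the function name
<code>sort_explicitly_parallel</code> are assumptions of this example, added
so that the file still builds when OpenMP is not enabled; the parallel
branch is only taken when compiling with <code>-fopenmp</code> against
libstdc++.</p>

```cpp
#include <algorithm>
#include <vector>

// <parallel/algorithm> is a GNU extension that relies on OpenMP. This
// guard is an assumption of the sketch, not something the library requires.
#if defined(_OPENMP) && defined(__GLIBCXX__)
# include <parallel/algorithm>
# define EXAMPLE_HAVE_PARALLEL 1
#else
# define EXAMPLE_HAVE_PARALLEL 0
#endif

// Sort v, explicitly requesting the parallel algorithm when it is
// available and falling back to std::sort otherwise.
void sort_explicitly_parallel(std::vector<int>& v)
{
#if EXAMPLE_HAVE_PARALLEL
  __gnu_parallel::sort(v.begin(), v.end());
#else
  std::sort(v.begin(), v.end());
#endif
}
```

<p>The rest of the program can keep calling the ordinary
<code>std::</code> algorithms, which remain sequential in this
configuration.</p>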


<h4 class="left">Parallel mode semantics</h4>

<p> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
</p>

<p> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</p>
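
<p>One way to respect the no-throw requirement is to catch exceptions
inside the functor and report failures through shared state instead. The
sketch below assumes a C++11 compiler (for <code>std::atomic</code> and
<code>noexcept</code>); all names in it, such as
<code>checked_reciprocal_percent</code> and <code>safe_reciprocal</code>,
are hypothetical and only stand in for user code.</p>

```cpp
#include <algorithm>
#include <atomic>
#include <stdexcept>
#include <vector>

// Hypothetical user operation that may throw.
int checked_reciprocal_percent(int x)
{
  if (x == 0)
    throw std::domain_error("division by zero");
  return 100 / x;
}

// Functor that never lets an exception escape. Failures are counted in a
// shared atomic, which stays correct if several threads call the functor,
// and 0 is stored as a sentinel result.
struct safe_reciprocal
{
  std::atomic<int>* failures;

  int operator()(int x) const noexcept
  {
    try
      { return checked_reciprocal_percent(x); }
    catch (...)
      {
        failures->fetch_add(1);
        return 0;
      }
  }
};

// Apply the functor to every element of in, writing to out.
// Returns the number of elements that could not be processed.
int transform_without_throwing(const std::vector<int>& in,
                               std::vector<int>& out)
{
  std::atomic<int> failures(0);
  out.resize(in.size());
  safe_reciprocal op = { &failures };
  std::transform(in.begin(), in.end(), out.begin(), op);
  return failures.load();
}
```

<p>The caller can inspect the returned count after the algorithm has
finished and decide how to handle the failed elements.</p>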


<h4 class="left">Configuration and Tuning</h4>

<p> Some algorithm variants can be enabled, disabled, or selected at compile-time.
See <a href="latest-doxygen/compiletime__settings_8h.html">
<code>&lt;compiletime_settings.h&gt;</code></a> and
<a href="latest-doxygen/compiletime__settings_8h.html">
<code>&lt;features.h&gt;</code></a> for details.
</p>

<p>
To specify the number of threads to be used for an algorithm,
use <code>omp_set_num_threads</code>.
To force a function to execute sequentially,
even though parallelism is switched on in general,
add <code>__gnu_parallel::sequential_tag()</code>
to the end of the argument list.
</p>

<p>
Parallelism always incurs some overhead. Thus, it is not
helpful to parallelize operations on very small sets of data.
To avoid parallelizing work that is too small to benefit,
a minimum problem size can be stated for each algorithm,
usually using the variable
<code>__gnu_parallel::Settings::[algorithm]_minimal_n</code>.
Please see <a href="latest-doxygen/settings_8h.html">
<code>&lt;settings.h&gt;</code></a> for details.</p>
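
<p>The two run-time controls described above can be sketched as follows.
This is an illustration under stated assumptions: the guard macro
<code>EXAMPLE_HAVE_PARALLEL</code> and the function names are invented for
the example, and the parallel branch is only compiled with
<code>-fopenmp</code> against libstdc++. The minimal-size settings are left
out because their exact spelling should be taken from the header
itself.</p>

```cpp
#include <algorithm>
#include <vector>

// Guard invented for this sketch so that it also builds without OpenMP.
#if defined(_OPENMP) && defined(__GLIBCXX__)
# include <omp.h>
# include <parallel/algorithm>
# define EXAMPLE_HAVE_PARALLEL 1
#else
# define EXAMPLE_HAVE_PARALLEL 0
#endif

// Sort v after asking OpenMP to use num_threads threads for
// subsequent parallel regions.
void sort_with_threads(std::vector<int>& v, int num_threads)
{
#if EXAMPLE_HAVE_PARALLEL
  omp_set_num_threads(num_threads);
  __gnu_parallel::sort(v.begin(), v.end());
#else
  (void) num_threads;  // no OpenMP: the thread count has no effect
  std::sort(v.begin(), v.end());
#endif
}

// Force sequential execution by appending sequential_tag() to the
// argument list, even though the parallel algorithm is available.
void sort_forced_sequential(std::vector<int>& v)
{
#if EXAMPLE_HAVE_PARALLEL
  __gnu_parallel::sort(v.begin(), v.end(),
                       __gnu_parallel::sequential_tag());
#else
  std::sort(v.begin(), v.end());
#endif
}
```

<p>Note that <code>omp_set_num_threads</code> affects later OpenMP parallel
regions in general, not just the one algorithm call that follows it.</p>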



<h4 class="left">Interface basics and general design</h4>

<p>All parallel algorithms are intended to have signatures that are
equivalent to the ISO C++ algorithms replaced. For instance, the
<code>std::adjacent_find</code> function is declared as:
</p>

<pre>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</pre>

<p>
Which means that there should be something equivalent for the parallel
version. Indeed, this is the case:
</p>

<pre>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</pre>

<p>But why the ellipsis?
</p>

<p> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.
</p>

<p> Compile-time conditions are referred to as "embarrassingly
parallel," and are denoted with the appropriate dispatch object, i.e.
one of <code>__gnu_parallel::sequential_tag</code>,
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>, or
<code>__gnu_parallel::omp_loop_static_tag</code>.
</p>

<p> Run-time conditions depend on the hardware being used, the number
of threads available, etc., and are denoted by the use of the enum
<code>__gnu_parallel::parallelism</code>. Values of this enum include
<code>__gnu_parallel::sequential</code>,
<code>__gnu_parallel::parallel_unbalanced</code>,
<code>__gnu_parallel::parallel_balanced</code>,
<code>__gnu_parallel::parallel_omp_loop</code>,
<code>__gnu_parallel::parallel_omp_loop_static</code>, and
<code>__gnu_parallel::parallel_taskqueue</code>.
</p>

<p> Putting all this together, the general view of overloads for the
parallel algorithms looks like this:
</p>
<ul>
   <li>ISO C++ signature</li>
   <li>ISO C++ signature + sequential_tag argument</li>
   <li>ISO C++ signature + parallelism argument</li>
</ul>

<p> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not have support for run-time conditions, so the last
overload is therefore missing.
</p>
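
<p>The three-overload pattern can be illustrated with a small,
self-contained sketch. Everything here lives in a hypothetical namespace
<code>demo</code> and is not the libstdc++ implementation: the tag, the
enum, the threshold <code>minimal_n</code> and the
<code>last_path</code> helper are assumptions made purely to show how
dispatch on a compile-time tag and on a run-time value can be layered
over the ISO C++ signature. The "parallel" branch simply reuses the
sequential algorithm.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

namespace demo  // hypothetical namespace, not part of libstdc++
{
  struct sequential_tag { };                           // compile-time choice
  enum parallelism { sequential, parallel_balanced };  // run-time choice

  // Records which path the most recent call took (illustration only).
  enum path { took_sequential, took_parallel };
  inline path& last_path()
  {
    static path p = took_sequential;
    return p;
  }

  // Illustrative minimum problem size for the parallel path.
  const std::ptrdiff_t minimal_n = 1000;

  // Overload 2: ISO C++ signature + sequential_tag argument.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last, sequential_tag)
    {
      last_path() = took_sequential;
      return std::adjacent_find(first, last);
    }

  // Overload 3: ISO C++ signature + parallelism argument.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last, parallelism p)
    {
      if (p == sequential || std::distance(first, last) < minimal_n)
        return demo::adjacent_find(first, last, sequential_tag());
      last_path() = took_parallel;
      // A real implementation would split the range across threads here;
      // this sketch reuses the sequential algorithm.
      return std::adjacent_find(first, last);
    }

  // Overload 1: ISO C++ signature, forwarding with a default setting.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last)
    { return demo::adjacent_find(first, last, parallel_balanced); }
}
```

<p>A caller that writes <code>demo::adjacent_find(first, last)</code> gets
the run-time decision, while appending <code>demo::sequential_tag()</code>
selects the sequential overload at compile time.</p>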


<h4 class="left">Relevant namespaces</h4>

<p> One namespace contains versions of code that are explicitly sequential:
<code>__gnu_serial</code>.
</p>

<p> Two namespaces contain the parallel mode:
<code>std::__parallel</code> and <code>__gnu_parallel</code>.
</p>

<p> Parallel implementations of standard components, including
template helpers to select parallelism, are defined in <code>namespace
std::__parallel</code>. For instance, <code>std::transform</code> from
&lt;algorithm&gt; has a parallel counterpart in
<code>std::__parallel::transform</code> from
&lt;parallel/algorithm&gt;. In addition, these parallel
implementations are injected into <code>namespace
__gnu_parallel</code> with using declarations.
</p>

<p> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
</p>

<p> More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.
</p>

<h4 class="left">Testing</h4>

<p> Both the normal conformance and regression tests and the
supplemental performance tests work.</p>

<p> To run the conformance and regression tests with the parallel mode
active,</p>
<code>make check-parallel</code>

<p>The log and summary files for conformance testing are in the
<code>testsuite/parallel</code> directory.</p>

<p> To run the performance tests with the parallel mode active, </p>
<code>make check-performance-parallel</code>

<p>The result file for performance testing is in the
<code>testsuite</code> directory, in the file
<code>libstdc++_performance.sum</code>. In addition, the policy-based
containers have their own visualizations, which have more
software dependencies than the usual bare-bones text file, and can be
generated by using the <code>make doc-performance</code> rule in the
testsuite's Makefile.</p>

<p>Return <a href="#top">to the top of the page</a> or
<a href="http://gcc.gnu.org/libstdc++/">to the libstdc++ homepage</a>.
</p>


<h4 class="left">References / Further Reading</h4>

<p>
Johannes Singler, Peter Sanders, Felix Putze. The Multi-Core Standard Template Library. Euro-Par 2007: Parallel Processing. (LNCS 4641)
</p>

<p>
Leonor Frias, Johannes Singler. Parallelization of Bulk Operations for STL Dictionaries. Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
</p>

<!-- ####################################################### -->

<hr />
<p class="fineprint"><em>
See <a href="17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>


</body>
</html>