mirror of git://gcc.gnu.org/git/gcc.git
				
				
				
			
		
			
				
	
	
		
| <chapter xmlns="http://docbook.org/ns/docbook" version="5.0" 
 | |
| 	 xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
 | |
| <?dbhtml filename="parallel_mode.html"?>
 | |
| 
 | |
| <info><title>Parallel Mode</title>
 | |
|   <keywordset>
 | |
|     <keyword>C++</keyword>
 | |
|     <keyword>library</keyword>
 | |
|     <keyword>parallel</keyword>
 | |
|   </keywordset>
 | |
| </info>
 | |
| 
 | |
| 
 | |
| 
 | |
| <para> The libstdc++ parallel mode is an experimental parallel
 | |
| implementation of many algorithms of the C++ Standard Library.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| Several of the standard algorithms, for instance
 | |
| <function>std::sort</function>, are made parallel using OpenMP
 | |
| annotations. These parallel mode constructs can be invoked by
 | |
| explicit source declaration or by compiling existing sources with a
 | |
| specific compiler flag.
 | |
| </para>
 | |
| 
 | |
| 
 | |
| <section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>
 | |
|   
 | |
| 
 | |
| <para>The following library components in the header
 | |
| <filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
 | |
| <itemizedlist>
 | |
|   <listitem><para><function>std::accumulate</function></para></listitem>
 | |
|   <listitem><para><function>std::adjacent_difference</function></para></listitem>
 | |
|   <listitem><para><function>std::inner_product</function></para></listitem>
 | |
|   <listitem><para><function>std::partial_sum</function></para></listitem>
 | |
| </itemizedlist>
 | |
| 
 | |
| <para>The following library components in the header
 | |
| <filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
 | |
| <itemizedlist>
 | |
|   <listitem><para><function>std::adjacent_find</function></para></listitem>
 | |
|   <listitem><para><function>std::count</function></para></listitem>
 | |
|   <listitem><para><function>std::count_if</function></para></listitem>
 | |
|   <listitem><para><function>std::equal</function></para></listitem>
 | |
|   <listitem><para><function>std::find</function></para></listitem>
 | |
|   <listitem><para><function>std::find_if</function></para></listitem>
 | |
|   <listitem><para><function>std::find_first_of</function></para></listitem>
 | |
|   <listitem><para><function>std::for_each</function></para></listitem>
 | |
|   <listitem><para><function>std::generate</function></para></listitem>
 | |
|   <listitem><para><function>std::generate_n</function></para></listitem>
 | |
|   <listitem><para><function>std::lexicographical_compare</function></para></listitem>
 | |
|   <listitem><para><function>std::mismatch</function></para></listitem>
 | |
|   <listitem><para><function>std::search</function></para></listitem>
 | |
|   <listitem><para><function>std::search_n</function></para></listitem>
 | |
|   <listitem><para><function>std::transform</function></para></listitem>
 | |
|   <listitem><para><function>std::replace</function></para></listitem>
 | |
|   <listitem><para><function>std::replace_if</function></para></listitem>
 | |
|   <listitem><para><function>std::max_element</function></para></listitem>
 | |
|   <listitem><para><function>std::merge</function></para></listitem>
 | |
|   <listitem><para><function>std::min_element</function></para></listitem>
 | |
|   <listitem><para><function>std::nth_element</function></para></listitem>
 | |
|   <listitem><para><function>std::partial_sort</function></para></listitem>
 | |
|   <listitem><para><function>std::partition</function></para></listitem>
 | |
|   <listitem><para><function>std::random_shuffle</function></para></listitem>
 | |
|   <listitem><para><function>std::set_union</function></para></listitem>
 | |
|   <listitem><para><function>std::set_intersection</function></para></listitem>
 | |
|   <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
 | |
|   <listitem><para><function>std::set_difference</function></para></listitem>
 | |
|   <listitem><para><function>std::sort</function></para></listitem>
 | |
|   <listitem><para><function>std::stable_sort</function></para></listitem>
 | |
|   <listitem><para><function>std::unique_copy</function></para></listitem>
 | |
| </itemizedlist>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info>
 | |
| <?dbhtml filename="parallel_mode_semantics.html"?>
 | |
|   
 | |
| 
 | |
| <para> The parallel mode STL algorithms are currently not exception-safe,
 | |
| i.e. user-defined functors must not throw exceptions.
 | |
| Also, the order of execution is not guaranteed for some functions.
 | |
| Therefore, user-defined functors should not have any concurrent side effects.
 | |
| </para>
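<para>
A minimal sketch of a functor that respects these constraints follows.
The names <code>is_negative</code> and <code>count_negative</code> are
hypothetical and exist only for this illustration. The predicate reads its
argument, touches no shared state, and does not throw, so it behaves the
same whether the call runs sequentially or is dispatched to a parallel
variant.
</para>

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical predicate: no shared state, no exceptions.
struct is_negative
{
  bool operator()(int x) const { return x < 0; }
};

// Hypothetical helper. The result comes back as the return value of
// std::count_if, so no functor has to update a shared counter.
inline std::ptrdiff_t
count_negative(const std::vector<int>& v)
{ return std::count_if(v.begin(), v.end(), is_negative()); }
```

<para>
By contrast, a functor that increments a shared counter from inside
<function>std::for_each</function> has a concurrent side effect and is
best avoided in code that may be compiled in parallel mode.
</para>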
 | |
| 
 | |
| <para> Since the current GCC OpenMP implementation does not support
 | |
| OpenMP parallel regions in concurrent threads,
 | |
| it is not possible to call parallel STL algorithms in
 | |
| concurrent threads, either.
 | |
| It might work with other compilers, though.</para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
 | |
| <?dbhtml filename="parallel_mode_using.html"?>
 | |
|   
 | |
| 
 | |
| <section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>
 | |
|   
 | |
| 
 | |
| <para>
 | |
|   Any use of parallel functionality requires additional compiler
 | |
|   and runtime support, in particular support for OpenMP. Adding this support is
 | |
|   not difficult: just compile your application with the compiler
 | |
|   flag <literal>-fopenmp</literal>. This will link
 | |
|   in <code>libgomp</code>, the
 | |
|   <link xmlns:xlink="http://www.w3.org/1999/xlink"
 | |
|     xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
 | |
|     Multi Processing Runtime Library</link>,
 | |
|   whose presence is mandatory.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| In addition, hardware that supports atomic operations and a compiler
 | |
|   capable of producing atomic operations is mandatory: GCC defaults to no
 | |
|   support for atomic operations on some common hardware
 | |
|   architectures. Activating atomic operations may require explicit
 | |
|   compiler flags on some targets (like sparc and x86), such
 | |
|   as <literal>-march=i686</literal>,
 | |
|   <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
 | |
|   the GCC manual for more information.
 | |
| </para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>
 | |
|   
 | |
| 
 | |
| <para>
 | |
|   To use the libstdc++ parallel mode, compile your application with
 | |
|   the prerequisite flags as detailed above, and in addition
 | |
|   add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
 | |
|   use of the standard (sequential) algorithms to the appropriate parallel
 | |
|   equivalents. Please note that this doesn't necessarily mean that
 | |
|   everything will end up being executed in a parallel manner, but
 | |
|   rather that the heuristics and settings coded into the parallel
 | |
|   versions will be used to determine if all, some, or no algorithms
 | |
|   will be executed using parallel variants.
 | |
| </para>
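<para>
The effect of the flags can be checked from inside a translation unit.
The sketch below is an illustration only: <code>build_mode</code> and the
file name <code>example.cc</code> are hypothetical. It relies on the
compiler defining <code>_OPENMP</code> when <literal>-fopenmp</literal>
is given, and on <constant>_GLIBCXX_PARALLEL</constant> being supplied on
the command line.
</para>

```cpp
#include <string>

// Hypothetical helper: describes the flags this translation unit saw.
//   g++ example.cc                               -> "sequential"
//   g++ -fopenmp example.cc                      -> "openmp only"
//   g++ -fopenmp -D_GLIBCXX_PARALLEL example.cc  -> "parallel mode"
// _GLIBCXX_PARALLEL without OpenMP support is not a usable combination,
// so this sketch does not report it as parallel mode.
inline std::string
build_mode()
{
#if defined(_GLIBCXX_PARALLEL) && defined(_OPENMP)
  return "parallel mode";
#elif defined(_OPENMP)
  return "openmp only";
#else
  return "sequential";
#endif
}
```

<para>
Even when the function reports parallel mode, individual calls may still
run sequentially, because the heuristics described above decide per call.
</para>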
 | |
| 
 | |
| <para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
 | |
|   sizes and behavior of standard templates such as
 | |
|   <function>std::search</function>, and therefore one can only link code
 | |
|   compiled with parallel mode and code compiled without parallel mode
 | |
|   if no instantiation of a container is passed between the two
 | |
|   translation units. Parallel mode functionality has distinct linkage,
 | |
|   and cannot be confused with normal mode symbols.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| <section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>
 | |
|   
 | |
| 
 | |
| <para>When it is not feasible to recompile your entire application, or
 | |
|   only specific algorithms need to be parallel-aware, individual
 | |
|   parallel algorithms can be made available explicitly. These
 | |
|   parallel algorithms are functionally equivalent to the standard
 | |
|   drop-in algorithms used in parallel mode, but they are available in
 | |
|   a separate namespace as GNU extensions and may be used in programs
 | |
|   compiled in either release mode or parallel mode.
 | |
| </para>
 | |
| 
 | |
| 
 | |
| <para>An example of using a parallel version
 | |
| of <function>std::sort</function>, but no other parallel algorithms, is:
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
| #include <vector>
 | |
| #include <parallel/algorithm>
 | |
| 
 | |
| int main()
 | |
| {
 | |
|   std::vector<int> v(100);
 | |
| 
 | |
|   // ...
 | |
| 
 | |
|   // Explicitly force a call to parallel sort.
 | |
|   __gnu_parallel::sort(v.begin(), v.end());
 | |
|   return 0;
 | |
| }
 | |
| </programlisting>
 | |
| 
 | |
| <para>
 | |
| Then compile this code with the prerequisite compiler flags
 | |
| (<literal>-fopenmp</literal> and any necessary architecture-specific
 | |
| flags for atomic operations).
 | |
| </para>
 | |
| 
 | |
| <para> The following table provides the names and headers of all the
 | |
|   parallel algorithms that can be used in a similar manner:
 | |
| </para>
 | |
| 
 | |
| <table frame="all" xml:id="table.parallel_algos">
 | |
| <title>Parallel Algorithms</title>
 | |
| 
 | |
| <tgroup cols="4" align="left" colsep="1" rowsep="1">
 | |
| <colspec colname="c1"/>
 | |
| <colspec colname="c2"/>
 | |
| <colspec colname="c3"/>
 | |
| <colspec colname="c4"/>
 | |
| 
 | |
| <thead>
 | |
|   <row>
 | |
|     <entry>Algorithm</entry>
 | |
|     <entry>Header</entry>
 | |
|     <entry>Parallel algorithm</entry>
 | |
|     <entry>Parallel header</entry>
 | |
|   </row>
 | |
| </thead>
 | |
| 
 | |
| <tbody>
 | |
|   <row>
 | |
|     <entry><function>std::accumulate</function></entry>
 | |
|     <entry><filename class="headerfile">numeric</filename></entry>
 | |
|     <entry><function>__gnu_parallel::accumulate</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/numeric</filename></entry>
 | |
|   </row>
 | |
|   <row>
 | |
|     <entry><function>std::adjacent_difference</function></entry>
 | |
|     <entry><filename class="headerfile">numeric</filename></entry>
 | |
|     <entry><function>__gnu_parallel::adjacent_difference</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/numeric</filename></entry>
 | |
|   </row>
 | |
|   <row>
 | |
|     <entry><function>std::inner_product</function></entry>
 | |
|     <entry><filename class="headerfile">numeric</filename></entry>
 | |
|     <entry><function>__gnu_parallel::inner_product</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/numeric</filename></entry>
 | |
|   </row>
 | |
|   <row>
 | |
|     <entry><function>std::partial_sum</function></entry>
 | |
|     <entry><filename class="headerfile">numeric</filename></entry>
 | |
|     <entry><function>__gnu_parallel::partial_sum</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/numeric</filename></entry>
 | |
|   </row>
 | |
|   <row>
 | |
|     <entry><function>std::adjacent_find</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::adjacent_find</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::count</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::count</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::count_if</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::count_if</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::equal</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::equal</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::find</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::find</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::find_if</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::find_if</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::find_first_of</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::find_first_of</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::for_each</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::for_each</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::generate</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::generate</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::generate_n</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::generate_n</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::lexicographical_compare</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::mismatch</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::mismatch</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::search</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::search</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::search_n</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::search_n</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::transform</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::transform</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::replace</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::replace</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::replace_if</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::replace_if</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::max_element</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::max_element</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::merge</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::merge</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::min_element</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::min_element</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::nth_element</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::nth_element</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::partial_sort</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::partial_sort</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::partition</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::partition</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::random_shuffle</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::random_shuffle</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::set_union</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::set_union</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::set_intersection</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::set_intersection</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::set_symmetric_difference</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::set_difference</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::set_difference</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::sort</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::sort</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::stable_sort</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::stable_sort</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| 
 | |
|   <row>
 | |
|     <entry><function>std::unique_copy</function></entry>
 | |
|     <entry><filename class="headerfile">algorithm</filename></entry>
 | |
|     <entry><function>__gnu_parallel::unique_copy</function></entry>
 | |
|     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
 | |
|   </row>
 | |
| </tbody>
 | |
| </tgroup>
 | |
| </table>
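<para>
As a sketch of how an entry from the numeric rows of the table might be
used, the following calls <function>__gnu_parallel::accumulate</function>
from <filename class="headerfile">parallel/numeric</filename>. The helper
<code>sum_values</code> and the macro <code>EXAMPLE_USE_PARALLEL</code>
are hypothetical. The guard is an assumption made so that the same source
also builds without <literal>-fopenmp</literal>, in which case it falls
back to <function>std::accumulate</function>.
</para>

```cpp
#include <numeric>
#include <vector>

// <numeric> and <vector> define __GLIBCXX__ when libstdc++ is in use;
// _OPENMP is defined when compiling with -fopenmp.
#if defined(__GLIBCXX__) && defined(_OPENMP)
# include <parallel/numeric>
# define EXAMPLE_USE_PARALLEL 1
#else
# define EXAMPLE_USE_PARALLEL 0
#endif

// Hypothetical helper: sums a vector with the explicit parallel
// algorithm when it is available, and with std::accumulate otherwise.
inline long
sum_values(const std::vector<long>& v)
{
#if EXAMPLE_USE_PARALLEL
  return __gnu_parallel::accumulate(v.begin(), v.end(), 0L);
#else
  return std::accumulate(v.begin(), v.end(), 0L);
#endif
}
```

<para>
Integer addition is associative, so the result does not depend on how the
range is split. For floating-point values the summation order may differ
between the sequential and parallel variants.
</para>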
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
 | |
| <?dbhtml filename="parallel_mode_design.html"?>
 | |
|   
 | |
|   <para>
 | |
|   </para>
 | |
| <section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info>
 | |
|   
 | |
| 
 | |
| <para>
 | |
| All parallel algorithms are intended to have signatures that are
 | |
| equivalent to the ISO C++ algorithms replaced. For instance, the
 | |
| <function>std::adjacent_find</function> function is declared as:
 | |
| </para>
 | |
| <programlisting>
 | |
| namespace std
 | |
| {
 | |
|   template<typename _FIter>
 | |
|     _FIter
 | |
|     adjacent_find(_FIter, _FIter);
 | |
| }
 | |
| </programlisting>
 | |
| 
 | |
| <para>
 | |
| This means that there should be something equivalent for the parallel
 | |
| version. Indeed, this is the case:
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
| namespace std
 | |
| {
 | |
|   namespace __parallel
 | |
|   {
 | |
|     template<typename _FIter>
 | |
|       _FIter
 | |
|       adjacent_find(_FIter, _FIter);
 | |
| 
 | |
|     ...
 | |
|   }
 | |
| }
 | |
| </programlisting>
 | |
| 
 | |
| <para>But why the ellipses?
 | |
| </para>
 | |
| 
 | |
| <para> The ellipses in the example above represent additional overloads
 | |
| required for the parallel version of the function. These additional
 | |
| overloads are used to dispatch calls from the ISO C++ function
 | |
| signature to the appropriate parallel function (or sequential
 | |
| function, if no parallel functions are deemed worthy), based on either
 | |
| compile-time or run-time conditions.
 | |
| </para>
 | |
| 
 | |
| <para> The available signature options are specific for the different
 | |
| algorithms/algorithm classes.</para>
 | |
| 
 | |
| <para> The general view of overloads for the parallel algorithms looks like this:
 | |
| </para>
 | |
| <itemizedlist>
 | |
|    <listitem><para>ISO C++ signature</para></listitem>
 | |
|    <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
 | |
|    <listitem><para>ISO C++ signature + algorithm-specific tag type
 | |
|     (several signatures)</para></listitem>
 | |
| </itemizedlist>
 | |
| 
 | |
| <para> Please note that the implementation may use additional functions
 | |
| (designated with the <code>_switch</code> suffix) to dispatch from the
 | |
| ISO C++ signature to the correct parallel version. Also, some of the
 | |
| algorithms do not have support for run-time conditions, so the last
 | |
| overload is therefore missing.
 | |
| </para>
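<para>
The overload scheme described above can be sketched with ordinary tag
dispatch. Everything in the <code>demo</code> namespace below is
hypothetical: the threshold value, the stub that stands in for a parallel
implementation, and the tag type are assumptions for illustration and do
not reproduce the library's actual code.
</para>

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

namespace demo  // hypothetical namespace, not part of libstdc++
{
  struct sequential_tag { };

  // ISO C++ signature + sequential_tag: always runs the sequential code.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last, sequential_tag)
    { return std::adjacent_find(first, last); }

  // Stand-in for a parallel implementation. In this sketch it simply
  // forwards to the sequential algorithm.
  template<typename FIter>
    FIter
    adjacent_find_parallel_stub(FIter first, FIter last)
    { return std::adjacent_find(first, last); }

  // ISO C++ signature: decides at run time which version to call.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last)
    {
      const std::ptrdiff_t minimal_n = 1000;  // assumed threshold
      if (std::distance(first, last) < minimal_n)
        return demo::adjacent_find(first, last, sequential_tag());
      return adjacent_find_parallel_stub(first, last);
    }
}
```

<para>
A caller that wants to bypass the run-time decision passes the tag
explicitly, for example
<code>demo::adjacent_find(first, last, demo::sequential_tag())</code>.
</para>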
 | |
| 
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>
 | |
|   
 | |
| 
 | |
| 
 | |
| <section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>
 | |
|   
 | |
| 
 | |
| <para>
 | |
| Several aspects of the overall runtime environment can be manipulated
 | |
| by standard OpenMP function calls.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| To specify the number of threads to be used for the algorithms globally,
 | |
| use the function <function>omp_set_num_threads</function>. An example:
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
| #include <stdlib.h>
 | |
| #include <omp.h>
 | |
| 
 | |
| int main()
 | |
| {
 | |
|   // Explicitly set number of threads.
 | |
|   const int threads_wanted = 20;
 | |
|   omp_set_dynamic(false);
 | |
|   omp_set_num_threads(threads_wanted);
 | |
| 
 | |
|   // Call parallel mode algorithms.
 | |
| 
 | |
|   return 0;
 | |
| }
 | |
| </programlisting>
 | |
| 
 | |
| <para>
 | |
|  Some algorithms allow the number of threads to be set for a particular call,
 | |
|  by augmenting the algorithm variant.
 | |
|  See the next section for further information.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| Other parts of the runtime environment that can be manipulated include
 | |
| nested parallelism (<function>omp_set_nested</function>), schedule kind
 | |
| (<function>omp_set_schedule</function>), and others. See the OpenMP
 | |
| documentation for more information.
 | |
| </para>
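<para>
A minimal sketch that combines several of these calls is given below. The
helper <code>configure_openmp</code> is hypothetical, and the
<code>_OPENMP</code> guard is an assumption that lets the code build
without <literal>-fopenmp</literal>. Whether a particular parallel mode
algorithm honors the run-time schedule is not stated here, so treat that
part as a demonstration of the OpenMP calls only.
</para>

```cpp
#ifdef _OPENMP
# include <omp.h>
#endif

// Hypothetical helper: applies a thread count and a run-time schedule,
// then returns the thread limit that OpenMP reports. Without -fopenmp
// there is no OpenMP runtime, so the function reports a single thread.
inline int
configure_openmp(int threads_wanted, int chunk_size)
{
#ifdef _OPENMP
  omp_set_dynamic(0);                               // no dynamic adjustment
  omp_set_num_threads(threads_wanted);              // global thread count
  omp_set_schedule(omp_sched_dynamic, chunk_size);  // schedule kind and chunk
  return omp_get_max_threads();
#else
  (void) threads_wanted;
  (void) chunk_size;
  return 1;
#endif
}
```

<para>
Calling such a helper once, before the first parallel mode algorithm
runs, keeps the configuration in one place.
</para>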
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>
 | |
|   
 | |
| 
 | |
| <para>
 | |
| To force an algorithm to execute sequentially, even though parallelism
 | |
| is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
 | |
| add <classname>__gnu_parallel::sequential_tag()</classname> to the end
 | |
| of the algorithm's argument list.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| Like so:
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
| std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
 | |
| </programlisting>
 | |
| 
 | |
| <para>
 | |
| Some parallel algorithm variants can be excluded from compilation by
 | |
| preprocessor defines. See the doxygen documentation on
 | |
| <code>compiletime_settings.h</code> and <code>features.h</code> for details.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| For some algorithms, the desired variant can be chosen at compile-time by
 | |
| appending a tag object. The available options are specific to the particular
 | |
| algorithm (class).
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| For the "embarrassingly parallel" algorithms, there is only one "tag object
 | |
| type", the enum _Parallelism.
 | |
| It takes one of the following values,
 | |
| <code>__gnu_parallel::parallel_tag</code>,
 | |
| <code>__gnu_parallel::balanced_tag</code>,
 | |
| <code>__gnu_parallel::unbalanced_tag</code>,
 | |
| <code>__gnu_parallel::omp_loop_tag</code>,
 | |
| <code>__gnu_parallel::omp_loop_static_tag</code>.
 | |
| This means that the actual parallelization strategy is chosen at run-time.
 | |
| (Choosing the variants at compile-time will come soon.)
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| For the following algorithms in general, we have
 | |
| <code>__gnu_parallel::parallel_tag</code> and
 | |
| <code>__gnu_parallel::default_parallel_tag</code>, in addition to
 | |
| <code>__gnu_parallel::sequential_tag</code>.
 | |
| <code>__gnu_parallel::default_parallel_tag</code> chooses the default
 | |
| algorithm at compile-time, as does omitting the tag.
 | |
| <code>__gnu_parallel::parallel_tag</code> postpones the decision to run-time
 | |
| (see next section).
 | |
| For all tags, the number of threads desired for this call can optionally be
 | |
| passed to the respective tag's constructor.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| The <code>multiway_merge</code> algorithm comes with the additional choices,
 | |
| <code>__gnu_parallel::exact_tag</code> and
 | |
| <code>__gnu_parallel::sampling_tag</code>.
 | |
| Exact and sampling are the two available splitting strategies.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
 | |
| several additional choices, namely
 | |
| <code>__gnu_parallel::multiway_mergesort_tag</code>,
 | |
| <code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
 | |
| <code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
 | |
| <code>__gnu_parallel::quicksort_tag</code>, and
 | |
| <code>__gnu_parallel::balanced_quicksort_tag</code>.
 | |
| Multiway mergesort comes with the two splitting strategies for multi-way
 | |
| merging. The quicksort options cannot be used for <code>stable_sort</code>.
 | |
| </para>
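<para>
The sketch below shows how such tags might be passed. The helper names
and the macro <code>EXAMPLE_USE_PARALLEL</code> are hypothetical, and the
thread count of 2 is an arbitrary choice. The guard is an assumption that
lets the source build without <literal>-fopenmp</literal>, in which case
the sequential standard algorithms are used instead.
</para>

```cpp
#include <algorithm>
#include <vector>

#if defined(__GLIBCXX__) && defined(_OPENMP)
# include <parallel/algorithm>
# define EXAMPLE_USE_PARALLEL 1
#else
# define EXAMPLE_USE_PARALLEL 0
#endif

// Hypothetical helper: requests balanced quicksort on two threads.
inline void
sort_with_variant(std::vector<int>& v)
{
#if EXAMPLE_USE_PARALLEL
  __gnu_parallel::sort(v.begin(), v.end(),
                       __gnu_parallel::balanced_quicksort_tag(2));
#else
  std::sort(v.begin(), v.end());
#endif
}

// Hypothetical helper: the quicksort tags cannot be used with
// stable_sort, so a multiway mergesort tag is passed instead.
inline void
stable_sort_with_variant(std::vector<int>& v)
{
#if EXAMPLE_USE_PARALLEL
  __gnu_parallel::stable_sort(v.begin(), v.end(),
                              __gnu_parallel::multiway_mergesort_tag(2));
#else
  std::stable_sort(v.begin(), v.end());
#endif
}
```

<para>
For short sequences the library may still choose the sequential code
path, so the tag expresses a preference for the parallel variant and is
not a guarantee that threads are started.
</para>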
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>
 | |
|   
 | |
| 
 | |
| <para>
 | |
| The default parallelization strategy, the choice of specific algorithm
 | |
| strategy, the minimum threshold limits for individual parallel
 | |
| algorithms, and aspects of the underlying hardware can be specified as
 | |
| desired via manipulation
 | |
| of <classname>__gnu_parallel::_Settings</classname> member data.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| First off, the choice of parallelization strategy: serial, parallel,
 | |
| or heuristically deduced. This corresponds
 | |
| to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
 | |
| value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
 | |
| type. Choices
 | |
| include: <type>heuristic</type>, <type>force_sequential</type>,
 | |
| and <type>force_parallel</type>. The default is <type>heuristic</type>.
 | |
| </para>
 | |
| 
 | |
| 
 | |
| <para>
 | |
| Next, the sub-choices for algorithm variant, if not fixed at compile-time.
 | |
| Specific algorithms like <function>find</function> or <function>sort</function>
 | |
| can be implemented in multiple ways: when this is the case,
 | |
| a <classname>__gnu_parallel::_Settings</classname> member exists to
 | |
| pick the default strategy. For
 | |
| example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
 | |
| take any value of
 | |
| enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
 | |
| or <type>QS_BALANCED</type>.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| Likewise for setting the minimal threshold for algorithm
 | |
| parallelization.  Parallelism always incurs some overhead. Thus, it is
 | |
| not helpful to parallelize operations on very small sets of
 | |
| data. Because of this, measures are taken to avoid parallelizing below
 | |
| a certain, pre-determined threshold. For each algorithm, a minimum
 | |
| problem size is encoded as a variable in the
 | |
| active <classname>__gnu_parallel::_Settings</classname> object.  This
 | |
| threshold variable follows this naming scheme:
 | |
| <code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>.  So,
 | |
| for <function>fill</function>, the threshold variable
 | |
| is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
 | |
| </para>
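<para>
The naming scheme can be illustrated with a short sketch that changes the
thresholds for <function>sort</function> and <function>fill</function>.
The helper name <function>set_sort_fill_threshold</function> is
hypothetical, and the sketch assumes the threshold members have
type <type>__gnu_parallel::_SequenceIndex</type>, as declared in
<filename class="headerfile">parallel/settings.h</filename>.
</para>

```cpp
#include <parallel/settings.h>

// Hypothetical helper (not part of libstdc++): inputs with fewer than
// n elements are then handled sequentially by sort and fill.
void set_sort_fill_threshold(__gnu_parallel::_SequenceIndex n)
{
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();
  s.sort_minimal_n = n;   // [algorithm]_minimal_n with algorithm = sort
  s.fill_minimal_n = n;   // [algorithm]_minimal_n with algorithm = fill
  __gnu_parallel::_Settings::set(s);
}
```

<para>
A suitable value depends on the element type and the cost of the operation,
so it is best chosen by measurement rather than by rule.
</para>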
 | |
| 
 | |
| <para>
 | |
| Finally, hardware details like L1/L2 cache size can be hardwired
 | |
| via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
 | |
| </para>
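<para>
A sketch of overriding the cache parameters follows. It assumes that both
members are byte counts of type <type>unsigned long long</type>, which is
how <filename class="headerfile">parallel/settings.h</filename> declares
them; the helper name <function>set_cache_sizes</function> is hypothetical.
</para>

```cpp
#include <parallel/settings.h>

// Hypothetical helper (not part of libstdc++): record the cache sizes
// of the target machine, given in bytes.
void set_cache_sizes(unsigned long long l1_bytes, unsigned long long l2_bytes)
{
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();
  s.L1_cache_size = l1_bytes;
  s.L2_cache_size = l2_bytes;
  __gnu_parallel::_Settings::set(s);
}
```

<para>
For instance, <code>set_cache_sizes(32ULL &lt;&lt; 10, 1ULL &lt;&lt; 20)</code>
would describe a machine with a 32 KiB L1 cache and a 1 MiB L2 cache.
</para>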
 | |
| 
 | |
 | |
| 
 | |
| <para>
 | |
| All these configuration variables can be changed by the user, if
 | |
| desired.
 | |
| There exists one global instance of the class <classname>_Settings</classname>,
 | |
| i.e., it is a singleton. It can be read and written by calling
 | |
| <code>__gnu_parallel::_Settings::get</code> and
 | |
| <code>__gnu_parallel::_Settings::set</code>, respectively.
 | |
| Please note that the first call returns a const object, so direct manipulation
 | |
| is forbidden.
 | |
| See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html">
 | |
|   <filename class="headerfile">settings.h</filename></link>
 | |
| for complete details.
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| A small example of tuning the default:
 | |
| </para>
 | |
| 
 | |
| <programlisting>
 | |
| #include <parallel/algorithm>
 | |
| #include <parallel/settings.h>
 | |
| 
 | |
| int main()
 | |
| {
 | |
|   __gnu_parallel::_Settings s;
 | |
|   s.algorithm_strategy = __gnu_parallel::force_parallel;
 | |
|   __gnu_parallel::_Settings::set(s);
 | |
| 
 | |
|   // Do work... all algorithms will be parallelized, always.
 | |
| 
 | |
|   return 0;
 | |
| }
 | |
| </programlisting>
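<para>
The example above changes the settings for the remainder of the program.
When a change should apply only to one region of code, one possible
approach is to save the current settings and restore them afterwards. The
sketch below does this with a small RAII guard. The
names <classname>ScopedSettings</classname>
and <function>run_sequentially</function> are hypothetical and not part of
the library, and the sketch assumes that no other thread modifies the
global settings concurrently.
</para>

```cpp
#include <parallel/settings.h>

// Hypothetical RAII helper (not part of libstdc++): snapshots the global
// settings on construction and restores them on destruction.
class ScopedSettings
{
public:
  ScopedSettings() : saved_(__gnu_parallel::_Settings::get()) { }
  ~ScopedSettings() { __gnu_parallel::_Settings::set(saved_); }

private:
  ScopedSettings(const ScopedSettings&);            // non-copyable
  ScopedSettings& operator=(const ScopedSettings&); // non-assignable

  __gnu_parallel::_Settings saved_;
};

// Forces sequential execution inside this function only and returns the
// strategy that was active while the guard was alive.
__gnu_parallel::_AlgorithmStrategy run_sequentially()
{
  ScopedSettings guard;  // previous settings come back when guard dies

  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();
  s.algorithm_strategy = __gnu_parallel::force_sequential;
  __gnu_parallel::_Settings::set(s);

  // ... call algorithms here; they run sequentially ...

  return __gnu_parallel::_Settings::get().algorithm_strategy;
}
```

<para>
Because the guard restores a full copy of the settings, any other member
changed inside the scope is reverted as well.
</para>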
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>
 | |
|   
 | |
| 
 | |
| <para> One namespace contains versions of code that are always
 | |
| explicitly sequential:
 | |
| <code>__gnu_serial</code>.
 | |
| </para>
 | |
| 
 | |
| <para> Two namespaces contain the parallel mode:
 | |
| <code>std::__parallel</code> and <code>__gnu_parallel</code>.
 | |
| </para>
 | |
| 
 | |
| <para> Parallel implementations of standard components, including
 | |
| template helpers to select parallelism, are defined in <code>namespace
 | |
| std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
 | |
| <function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
 | |
| implementations are injected into <code>namespace
 | |
| __gnu_parallel</code> with using declarations.
 | |
| </para>
 | |
| 
 | |
| <para> Support and general infrastructure are in <code>namespace
 | |
| __gnu_parallel</code>.
 | |
| </para>
 | |
| 
 | |
| <para> More information, and an organized index of types and functions
 | |
| related to the parallel mode on a per-namespace basis, can be found in
 | |
| the generated source documentation.
 | |
| </para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
 | |
| <?dbhtml filename="parallel_mode_test.html"?>
 | |
|   
 | |
| 
 | |
|   <para>
 | |
|     Both the normal conformance and regression tests and the
 | |
|     supplemental performance tests work.
 | |
|   </para>
 | |
| 
 | |
|   <para>
 | |
|     To run the conformance and regression tests with the parallel mode
 | |
|     active, use:
 | |
|   </para>
 | |
| 
 | |
|   <screen>
 | |
|   <userinput>make check-parallel</userinput>
 | |
|   </screen>
 | |
| 
 | |
|   <para>
 | |
|     The log and summary files for conformance testing are in the
 | |
|     <filename class="directory">testsuite/parallel</filename> directory.
 | |
|   </para>
 | |
| 
 | |
|   <para>
 | |
|     To run the performance tests with the parallel mode active, use:
 | |
|   </para>
 | |
| 
 | |
|   <screen>
 | |
|   <userinput>make check-performance-parallel</userinput>
 | |
|   </screen>
 | |
| 
 | |
|   <para>
 | |
|     The results of performance testing are in the
 | |
|     <filename class="directory">testsuite</filename> directory, in the file
 | |
|     <filename>libstdc++_performance.sum</filename>. In addition, the
 | |
|     policy-based containers have their own visualizations, which have
 | |
|     more software dependencies than the usual bare-bones text
 | |
|     file, and can be generated by using the <code>make
 | |
|     doc-performance</code> rule in the testsuite's Makefile.
 | |
| </para>
 | |
| </section>
 | |
| 
 | |
| <bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>
 | |
| 
 | |
| 
 | |
|   <biblioentry>
 | |
|     <citetitle>
 | |
|       Parallelization of Bulk Operations for STL Dictionaries
 | |
|     </citetitle>
 | |
| 
 | |
|     <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
 | |
|     <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>
 | |
| 
 | |
|     <copyright>
 | |
|       <year>2007</year>
 | |
|       <holder/>
 | |
|     </copyright>
 | |
| 
 | |
|     <publisher>
 | |
|       <publishername>
 | |
| 	Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
 | |
|       </publishername>
 | |
|     </publisher>
 | |
|   </biblioentry>
 | |
| 
 | |
|   <biblioentry>
 | |
|     <citetitle>
 | |
|       The Multi-Core Standard Template Library
 | |
|     </citetitle>
 | |
| 
 | |
|     <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
 | |
|     <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
 | |
|     <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>
 | |
| 
 | |
|     <copyright>
 | |
|       <year>2007</year>
 | |
|       <holder/>
 | |
|     </copyright>
 | |
| 
 | |
|     <publisher>
 | |
|       <publishername>
 | |
| 	 Euro-Par 2007: Parallel Processing. (LNCS 4641)
 | |
|       </publishername>
 | |
|     </publisher>
 | |
|   </biblioentry>
 | |
| 
 | |
| </bibliography>
 | |
| 
 | |
| </chapter>
 |