Skip to content
Navigation Menu
{{ message }}
forked from taskflow/taskflow
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathA0ParallelFor.html
More file actions
138 lines (136 loc) · 18.6 KB
/
Copy pathA0ParallelFor.html
File metadata and controls
138 lines (136 loc) · 18.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
<!-- HTML header for doxygen 1.8.13-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.14"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Taskflow Handbook</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<link rel="icon" type="image/x-icon" href="favicon.ico" />
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
$(document).ready(initResizable);
/* @license-end */</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname"><a href="https://github.com/taskflow/taskflow">Taskflow</a>
 <span id="projectnumber">2.6.0-master-branch</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.14 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
var searchBox = new SearchBox("searchBox", "search",false,'Search');
/* @license-end */
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
$(function() {
initMenu('',true,false,'search.php','Search');
$(document).ready(function() { init_search(); });
});
/* @license-end */</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
$(document).ready(function(){initNavTree('A0ParallelFor.html','');});
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">A0: Parallel Iterations </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Taskflow provides template function that constructs a task to perform parallel iterations over a range of items.</p>
<h1><a class="anchor" id="IndexBasedParallelFor"></a>
Index-based Parallel Iterations</h1>
<p>Index-based parallel-for performs parallel iterations over a range <code>[beg, end)</code> with the given <code>step</code> size. The task created by <a class="el" href="classtf_1_1FlowBuilder.html#ab8417b211b18bb1e0f45a049331f084d" title="constructs an index-based parallel-for task ">tf::Taskflow::for_each_index(B&& first, E&& last, S&& step, C&& callable)</a> represents parallel execution of the following loop:</p>
<div class="fragment"><div class="line"><span class="comment">// positive step</span></div><div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> i=first; i<last; i+=step) {</div><div class="line"> callable(i);</div><div class="line">}</div><div class="line"></div><div class="line"><span class="comment">// negative step</span></div><div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> i=first; i>last; i+=step) {</div><div class="line"> callable(i);</div><div class="line">}</div></div><!-- fragment --><p>We support only integer-based range. The range can go positive or negative direction.</p>
<div class="fragment"><div class="line">taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#ab8417b211b18bb1e0f45a049331f084d">for_each_index</a>(0, 100, 2, [](<span class="keywordtype">int</span> i) { }); <span class="comment">// loop 50 items with a positive step</span></div><div class="line">taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#ab8417b211b18bb1e0f45a049331f084d">for_each_index</a>(100, 0, -2, [](<span class="keywordtype">int</span> i) { }); <span class="comment">// loop 50 items with a negative step</span></div></div><!-- fragment --><p>Notice that either positive or negative direction is defined in terms of the range, <code>[first, last)</code>, where <code>end</code> is excluded. In the positive case, the 50 items are 0, 2, 4, 6, 8, ..., 96, 98. In the negative case, the 50 items are 100, 98, 96, 04, ... 4, 2. An example of the Taskflow graph for the positive case under 12 workers is depicted below:</p>
<div class="image">
<object type="image/svg+xml" data="parallel_for_1.svg" width="100%">parallel_for_1.svg</object>
</div>
<p>The index types, <code>B</code>, <code>E</code>, and <code>S</code>, are templates to preserve the variable types and their underlying types must be of the same <em>integral</em> type (e.g., <code>int</code>, <code>size_t</code>, <code>unsigned</code>). By default, <a class="el" href="classtf_1_1FlowBuilder.html#ab8417b211b18bb1e0f45a049331f084d" title="constructs an index-based parallel-for task ">tf::Taskflow::for_each_index</a> creates a task that spawns a subflow (see <a class="el" href="chapter3.html">C3: Dynamic Tasking</a>) to run iterations in parallel. The subflow closure captures all input arguments through perfect forwarding to form a stateful closure such that any changes on the arguments will be visible to the execution context of the subflow. For example:</p>
<div class="fragment"><div class="line"><span class="keywordtype">int</span>* vec;</div><div class="line"><span class="keywordtype">int</span> first, last;</div><div class="line"></div><div class="line"><span class="keyword">auto</span> init = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a796e29175380f70246cf2a5639adc437">emplace</a>([&](){</div><div class="line"> first = 0;</div><div class="line"> last = 1000;</div><div class="line"> vec = <span class="keyword">new</span> <span class="keywordtype">int</span>[1000]; </div><div class="line">});</div><div class="line"></div><div class="line"><span class="keyword">auto</span> pf = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#ab8417b211b18bb1e0f45a049331f084d">for_each_index</a>(std::ref(first), std::ref(last), 1, [&](<span class="keywordtype">int</span> i) {</div><div class="line"> <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> << <span class="stringliteral">"parallel iteration on index "</span> << vec[i] << <span class="charliteral">'\n'</span>;</div><div class="line">});</div><div class="line"></div><div class="line">init.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(pf);</div></div><!-- fragment --><p>When <code>init</code> finishes, the parallel-for task <code>pf</code> will see <code>first</code> as 0 and <code>last</code> as 1000 and performs parallel iterations over the 1000 items. This property is especially important for task graph parallelism, because users can define end-to-end parallelism through stateful closures that marshal parameter exchange between dependent tasks.</p>
<h1><a class="anchor" id="IteratorBasedParallelFor"></a>
Iterator-based Parallel Iterations</h1>
<p>Iterator-based parallel-for performs parallel iterations over a range specified by two <a href="https://en.cppreference.com/w/cpp/iterator/iterator">STL-styled iterators</a>, <code>first</code> and <code>last</code>. The task created by <a class="el" href="classtf_1_1FlowBuilder.html#a564252001be679600b20ca9ed9920f6a" title="constructs a STL-styled parallel-for task ">tf::Taskflow::for_each(B&& first, E&& last, C&& callable)</a> represents a parallel execution of the following loop:</p>
<div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keyword">auto</span> i=first; i<last; i++) {</div><div class="line"> callable(*i);</div><div class="line">}</div></div><!-- fragment --><p>By default, <a class="el" href="classtf_1_1FlowBuilder.html#a564252001be679600b20ca9ed9920f6a" title="constructs a STL-styled parallel-for task ">tf::Taskflow::for_each(B&& first, E&& last, C&& callable)</a> creates a task that spawns a subflow (see <a class="el" href="chapter3.html">C3: Dynamic Tasking</a>) that applies the callable to the object obtained by dereferencing every iterator in the range <code>[first, last)</code>. It is user's responsibility for ensuring the range is valid within the execution of the parallel-for task. Iterators must have the post-increment operator ++ defined. This version of parallel-for applies to all iterable STL containers.</p>
<div class="fragment"><div class="line"><a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/container/vector.html">std::vector<int></a> vec = {1, 2, 3, 4, 5};</div><div class="line">taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a564252001be679600b20ca9ed9920f6a">for_each</a>(vec.begin(), vec.end(), [](<span class="keywordtype">int</span> i){ </div><div class="line"> <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> << <span class="stringliteral">"parallel for on item "</span> << i << <span class="charliteral">'\n'</span>; </div><div class="line">});</div><div class="line"></div><div class="line"><a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/container/list.html">std::list<std::string></a> list = {<span class="stringliteral">"hi"</span>, <span class="stringliteral">"from"</span>, <span class="stringliteral">"t"</span>, <span class="stringliteral">"a"</span>, <span class="stringliteral">"s"</span>, <span class="stringliteral">"k"</span>, <span class="stringliteral">"f"</span>, <span class="stringliteral">"low"</span>};</div><div class="line">taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a564252001be679600b20ca9ed9920f6a">for_each</a>(list.begin(), list.end(), [](<span class="keyword">const</span> <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/string/basic_string.html">std::string</a>& str){ </div><div class="line"> <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> << <span class="stringliteral">"parallel for on item "</span> << str << <span class="charliteral">'\n'</span>; </div><div class="line">});</div></div><!-- fragment --><p>Similar to index-based parallel-for, the iterator types are templates to enable users to leverage the property of stateful closure. For example:</p>
<div class="fragment"><div class="line"><a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/container/vector.html">std::vector<int></a> vec;</div><div class="line"><a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/container/vector.html">std::vector<int>::iterator</a> first, last;;</div><div class="line"></div><div class="line"><span class="keyword">auto</span> init = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a796e29175380f70246cf2a5639adc437">emplace</a>([&](){</div><div class="line"> vec.resize(1000);</div><div class="line"> first = vec.begin();</div><div class="line"> last = vec.end();</div><div class="line">});</div><div class="line"></div><div class="line"><span class="keyword">auto</span> pf = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a564252001be679600b20ca9ed9920f6a">for_each</a>(std::ref(first), std::ref(last), 1, [&](<span class="keywordtype">int</span> i) {</div><div class="line"> <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> << <span class="stringliteral">"parallel iteration on item "</span> << i << <span class="charliteral">'\n'</span>;</div><div class="line">});</div><div class="line"></div><div class="line">init.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(pf);</div></div><!-- fragment --><p>When <code>init</code> finishes, the parallel-for task <code>pf</code> will see <code>first</code> pointing to the beginning of <code>vec</code> and <code>last</code> pointing to the end of <code>vec</code> and performs parallel iterations over the 1000 items. The two tasks form an end-to-end task graph where the parameters of parallel-for are computed on the fly.</p>
<h1><a class="anchor" id="PartitionAlgorithms"></a>
Partition Algorithms</h1>
<p>At runtime, the parallel-for task automatically partitions the range into <em>chunks</em> and assign each chunk a task in the spawned subflow. Inspired by the <a href="http://jakascorner.com/blog/2016/06/omp-for-scheduling.html">scheduling algorithms of OpenMP</a>, we support three partition algorithms, <em>static</em> partition, <em>dynamic</em> partition, and <em>guided</em> partition.</p>
<div class="fragment"><div class="line"><span class="keywordtype">size_t</span> first = 0, last = 1000, step = 1, chunk_size = 100;</div><div class="line"></div><div class="line"><span class="keyword">auto</span> task1 = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a71f204c670ca5857e5527d6000ba73c4">for_each_index_static</a> (first, last, step, [](<span class="keyword">auto</span>){ }, chunk_size)</div><div class="line"><span class="keyword">auto</span> task1 = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a930d0343226874d1d40b9e230cccdd52">for_each_index_dynamic</a>(first, last, step, [](<span class="keyword">auto</span>){ }, chunk_size)</div><div class="line"><span class="keyword">auto</span> task1 = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a942449afd25e656b9e14cb526cbd8ad2">for_each_index_guided</a> (first, last, step, [](<span class="keyword">auto</span>){ }, chunk_size)</div><div class="line"></div><div class="line"><span class="comment">// same syntax for the iterator-based parallel-for</span></div><div class="line"><span class="comment">// ...</span></div></div><!-- fragment --><p>Each of these methods takes an additional unsigned argument of the chunk size.</p>
<ul>
<li>The static partition algorithm divides the workload into <em>equal-size</em> chunks. If the given chunk size is zero, it distributes the workload evenly across workers. Static partition has the lowest scheduling overhead but the least optimal workload distribution (i.e., load balancing). </li>
<li>The dynamic partition algorithm dynamically assigns chunks to threads using a data dispatching loop. Dynamic partition has the highest scheduling overhead but the optimal workload distribution in particular when the chunk size equals one. </li>
<li>The guided partition algorithm (1) starts with big chunks proportional to the number of unassigned iterations divided by the number of workers and (2) then makes them progressively smaller until the chunk size reaches at the given size. Guided partition attempts to seek a balance between scheduling overhead and workload distribution.</li>
</ul>
<p>The picture below illustrates the three partition algorithms.</p>
<div class="image">
<img src="parallel_for_partition_algorithms.png" alt="parallel_for_partition_algorithms.png" width="95%"/>
</div>
<p>By default, <a class="el" href="classtf_1_1FlowBuilder.html#ab8417b211b18bb1e0f45a049331f084d" title="constructs an index-based parallel-for task ">tf::Taskflow::for_each_index</a> and <a class="el" href="classtf_1_1FlowBuilder.html#a564252001be679600b20ca9ed9920f6a" title="constructs a STL-styled parallel-for task ">tf::Taskflow::for_each</a> adopt the <em>guided</em> partition algorithm with chunk size equal to one. In practice, guided partition produces decent performance in many applications and is the default of Taskflow's parallel-for algorithm. However, depending on the workload requirement, users may explicitly call for static, dynamic, or guided partition algorithms with a specified chunk size. </p>
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="Algorithms.html">Algorithms</a></li>
<li class="footer">Generated by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.14 </li>
</ul>
</div>
</body>
</html>
You can’t perform that action at this time.
