StackThreads/MP: version 0.77 User's Guide
Let's now move our focus to the parallel Fibonacci function.
@ 1 void pfib(int n, int * r, st_join_counter_t *c) @ 2 { @ 3 if (n < 2) { @ 4 *r = 1; /* write the result */ @ 5 st_join_counter_finish(c); /* say I have done */ @ 6 } else { @ 7 int a, b; @ 8 st_join_counter_t cc[1]; @ 9 ST_POLLING(); /* check if somebody is asking work 10 and free stack */ 11 st_join_counter_init(cc, 2); /* initialize join counter */ 12 ST_THREAD_CREATE(pfib(n - 1, &a, cc)); /* fork fib(n-1, ...) */ 13 pfib(n - 2, &b, cc); /* fork fib(n-2, ...) */ 14 st_join_counter_wait(cc); /* wait for child's completion */ 15 *r = a + b; /* write the result */ 16 st_join_counter_finish(c); /* say I have done */ 17 ST_POLLING(); /* check if somebody is asking work 18 and free stack */ 19 } 20 }
The algorithm is a simple parallel recursion in which the first recursive call is done in parallel with the caller. This is done by
ST_THREAD_CREATE(fib(n - 1, &a, cc));
two additional parameters. The second parameter, r, is the pointer to a storage to which the result value will be stored. The third parameter, c, is a pointer to a storage through which the function signals the end of its computation to the caller.
The third parameter is necessary because, once a procedure call is done
in parallel with the caller, the caller at some point needs to make sure
that the procedure is finished. This synchronization is done via what we
call "synchronization counter." The type of the counter is
st_join_counter_t
. A storage for a
counter is declared at line 8:
st_join_counter_t cc[1];
Line 11 initializes the counter to 2.
st_join_counter_init_(cc, 2); /* initialize join counter */
This declares that two procedures will be associated to the counter cc. The two procedures are called at line 12 and 13.
ST_THREAD_CREATE(pfib(n - 1, &a, cc)); /* fork fib(n-1, ...) */ @ pfib(n - 2, &b, cc); /* fork fib(n-2, ...) */
cc
is given to both procedure calls.
When pfib
is finished, it decrements the counter received through
the third argument at line 5 and 16.
st_join_counter_finish(c);
When two procedures are finished, the value of the counter will become zero, because it is initialized to two and decremented twice. At line 14, the parent waits for the value of the counter to become zero.
st_join_counter_wait(cc);
Note that the procedure first writes the return value and then decrements the counter. Conversely, the parent first waits on the counter and then reads the return value. This order is important; this guarantees that when the parent reads the value, the called procedure has written a value. If, for example, the procedure first decrements the counter and then writes the result value, it is possible for the parent to observe that a counter becomes zero before the callee has written the result value.
If you feel that this program is very awkward just for writing small things like fib, you are right. This example is meant to describe primitives and mention general caution in parallel programming. You can usually make the program syntactically nicer by having a data structure that embeds the synchronization counter and wrapper functions that encapsulate synchronization. In the above pfib function, for example, you might define a data structure "int_box" as follows:
typedef struct int_box { @ int val; @ st_join_counter_t c[1]; } * int_box_t;
int int_box_read(int_box_t b), as follows:
void int_box_write(int_box_t b, int x) { @ b->val = x; @ st_join_counter_finish(b->c); } int int_box_read(int_box_t b, int x) { @ st_join_counter_wait(b->c); @ return b->val; }
It is also not difficult to add a new synchronization primitive for different constraints. See Implementing Synchronization Primitives for the bottom-level machinery for implementing new synchronization primitives.
Up to this point, we have explained that a thread is created by
ST_THREAD_CREATE
and threads can be synchronized by a
synchronization counter. Remaining things are calls to ST_POLLING
at line 9 and 17. Well, it does not have any visible effect. One thing
that happens in ST_POLLING
is that the current processor checks
if an idle processor is requesting a thread to the current processor,
and gives a thread if any (and the current processor has enough
threads). You do not have to understand the details about how threads
are migrated or when a thread belongs to a particular processor, but you
have to understand that threads never get executed in parallel
unless you occasionally call ST_POLLING
, which
distributes work if there are idle processors. See Where should you insert ST POLLING for
hints as to where to insert ST_POLLING
.
In summary, converting a sequential program into a parallel one involves the following steps:
ST_CREATE_THREAD
where parallelism is
necessary. Unlike traditional thread libraries, creating threads much
more than the number of processors is no problem in StackThreads/MP.
ST_POLLING
to periodically distribute work
among processors.