Basic Primitives

StackThreads/MP: version 0.77 User's Guide

2.4: Basic Primitives

Let's now move our focus to the parallel Fibonacci function.

@ 1  void pfib(int n, int * r, st_join_counter_t *c)
@ 2  {
@ 3    if (n < 2) {
@ 4      *r = 1;                     /* write the result */
@ 5      st_join_counter_finish(c);  /* say I have done */
@ 6    } else {
@ 7      int a, b;
@ 8      st_join_counter_t cc[1];
@ 9      ST_POLLING();               /* check if somebody is asking work 
10                                     and free stack */
11      st_join_counter_init(cc, 2); /* initialize join counter */
12      ST_THREAD_CREATE(pfib(n - 1, &a, cc)); /* fork fib(n-1, ...) */
13      pfib(n - 2, &b, cc);        /* fork fib(n-2, ...) */
14      st_join_counter_wait(cc);   /* wait for child's completion */
15      *r = a + b;                 /* write the result */
16      st_join_counter_finish(c);  /* say I have done */
17      ST_POLLING();               /* check if somebody is asking work 
18                                     and free stack */
19    }
20  }

The algorithm is a simple parallel recursion in which the first recursive call is done in parallel with the caller. This is done by

        ST_THREAD_CREATE(fib(n - 1, &a, cc));

two additional parameters. The second parameter, r, is the pointer to a storage to which the result value will be stored. The third parameter, c, is a pointer to a storage through which the function signals the end of its computation to the caller.

The third parameter is necessary because, once a procedure call is done in parallel with the caller, the caller at some point needs to make sure that the procedure is finished. This synchronization is done via what we call "synchronization counter." The type of the counter is st_join_counter_t. A storage for a counter is declared at line 8:

     st_join_counter_t cc[1];

Line 11 initializes the counter to 2.

    st_join_counter_init_(cc, 2);            /* initialize join counter */

This declares that two procedures will be associated to the counter cc. The two procedures are called at line 12 and 13.

    ST_THREAD_CREATE(pfib(n - 1, &a, cc)); /* fork fib(n-1, ...) */
@    pfib(n - 2, &b, cc);       /* fork fib(n-2, ...) */

cc is given to both procedure calls.

When pfib is finished, it decrements the counter received through the third argument at line 5 and 16.

    st_join_counter_finish(c);

When two procedures are finished, the value of the counter will become zero, because it is initialized to two and decremented twice. At line 14, the parent waits for the value of the counter to become zero.

    st_join_counter_wait(cc);

Note that the procedure first writes the return value and then decrements the counter. Conversely, the parent first waits on the counter and then reads the return value. This order is important; this guarantees that when the parent reads the value, the called procedure has written a value. If, for example, the procedure first decrements the counter and then writes the result value, it is possible for the parent to observe that a counter becomes zero before the callee has written the result value.

If you feel that this program is very awkward just for writing small things like fib, you are right. This example is meant to describe primitives and mention general caution in parallel programming. You can usually make the program syntactically nicer by having a data structure that embeds the synchronization counter and wrapper functions that encapsulate synchronization. In the above pfib function, for example, you might define a data structure "int_box" as follows:

typedef struct int_box
{
@   int val;
@   st_join_counter_t c[1];
} * int_box_t;

int int_box_read(int_box_t b), as follows:

void int_box_write(int_box_t b, int x)
{
@  b->val = x;
@  st_join_counter_finish(b->c);
}

int int_box_read(int_box_t b, int x)
{
@  st_join_counter_wait(b->c);
@  return b->val; 
}

It is also not difficult to add a new synchronization primitive for different constraints. See Implementing Synchronization Primitives for the bottom-level machinery for implementing new synchronization primitives.

Up to this point, we have explained that a thread is created by ST_THREAD_CREATE and threads can be synchronized by a synchronization counter. Remaining things are calls to ST_POLLING at line 9 and 17. Well, it does not have any visible effect. One thing that happens in ST_POLLING is that the current processor checks if an idle processor is requesting a thread to the current processor, and gives a thread if any (and the current processor has enough threads). You do not have to understand the details about how threads are migrated or when a thread belongs to a particular processor, but you have to understand that threads never get executed in parallel unless you occasionally call ST_POLLING, which distributes work if there are idle processors. See Where should you insert ST POLLING for hints as to where to insert ST_POLLING.

In summary, converting a sequential program into a parallel one involves the following steps:

Extract Parallelism: Insert ST_CREATE_THREAD where parallelism is necessary. Unlike traditional thread libraries, creating threads much more than the number of processors is no problem in StackThreads/MP.
Introduce Synchronization: Insert synchronization to make sure whatever condition you want to guarantee is satisfied at an appropriate point. For example, wait for another thread to write a result or wait for another thread to release some resources.
Distribute Work: Insert ST_POLLING to periodically distribute work among processors.