StackThreads/MP: version 0.77 User's Guide

2.3: Running

If you successfully compiled the sample program, you should now have an executable named fib (or fib.exe on Windows NT). Run it by typing its name at the command line, like this:

        harp:366% ./fib

You will see the following message on the terminal:

        harp:366% ./fib
        pfib: 10714 ms on 1 processors, sfib: 2114 ms

By default, a StackThreads/MP program runs on a single processor. You can use multiple processors by adding the -nw option on the command line. For example, to use 10 processors, try this:

        harp:367% ./fib -nw 10
        pfib: 1091 ms on 10 processors, sfib: 2062 ms

st_main does not see the command line arguments -nw 10 in its argv; the runtime system removes them before entering st_main (see Command Line Options Common for All StackThreadsMP Programs for more options).

Assuming you create a large number of threads, performance is generally best when the number you give to -nw equals the number of available processors (on small-scale systems, such as 4-processor machines) or is slightly smaller than it (on medium- or large-scale systems, such as 16- or 64-processor machines). Giving a number larger than the number of processors generally does not improve performance. Also, do not try to use many processors when the machine is heavily loaded.

You may notice that, on a single processor, the parallel version (pfib) runs much slower than the sequential version (sfib): about 5 times slower. Simple hacks can reduce this to roughly 3 times, but for very small functions like fib, overhead of this magnitude is the current state of the art. In practice, the granularity of a thread is typically much larger. StackThreads/MP tolerates threads whose granularity is around 200 instructions well (see A recommended programming style for performance for advice on achieving good performance on StackThreads/MP).

The number given by -nw is unrelated to the number of threads you can create in the program. By giving -nw 10, the program spawns 10 OS-level threads during its initialization. An OS-level thread is a thread directly supported by the OS, such as an LWP on Solaris. We hereafter call an OS-level thread a worker, to avoid confusion between threads supported by StackThreads/MP (created via ST_THREAD_CREATE) and OS-level threads. The StackThreads/MP runtime system dispatches dynamically created threads to workers. Also note that although we say -nw specifies the number of processors, it actually specifies the number of workers. In particular, a worker is not permanently bound to a particular processor. Confusing workers with processors does not normally cause any problem.