StackThreads/MP: version 0.77 User's Guide
If you successfully compiled the sample program, you should have obtained the executable fib (or fib.exe on Windows NT). Run the program normally by typing its name at the command line, like this:
harp:366% ./fib
You will see the following message on the terminal:
harp:366% ./fib
pfib: 10714 ms on 1 processors, sfib: 2114 ms
By default, a StackThreads/MP program runs on a single processor. You can use multiple processors by adding the -nw option on the command line. For example, to use 10 processors, type:
harp:367% ./fib -nw 10
pfib: 1091 ms on 10 processors, sfib: 2062 ms
Note that st_main does not see the command line arguments -nw 10 in its argv; the runtime system removes them before entering st_main (see Command Line Options Common for All StackThreads/MP Programs for more options).
Assuming you create a large number of threads, performance is generally best when the value given to -nw equals the number of available processors (on small-scale systems, such as 4-processor machines) or is slightly smaller than it (on medium- or large-scale systems, such as 16- or 64-processor machines). Giving a value larger than the number of processors generally does not improve performance. Also, avoid using many processors when the machine is heavily loaded.
You may notice that the parallel version (pfib) runs about 5 times slower than the sequential version (sfib) on one processor. Some simple hacks can bring this down to about 3 times, but for very small functions like fib, overhead of this magnitude is the current state of the art. In practice, threads are typically much coarser-grained than this. StackThreads/MP copes well with threads whose granularity is around 200 instructions (see A recommended programming style for performance for advice on achieving good performance with StackThreads/MP).
Note that the number given by -nw is independent of the number of threads you can create in the program. Giving -nw 10 makes the program spawn 10 OS-level threads during its initialization. An OS-level thread is a thread directly supported by the OS, such as an LWP on Solaris. Hereafter we call an OS-level thread a worker, to distinguish threads supported by StackThreads/MP (created via ST_THREAD_CREATE) from OS-level threads. The StackThreads/MP runtime system dispatches dynamically created threads to workers. Also note that although we say -nw specifies the number of processors, it actually specifies the number of workers; in particular, a worker is not permanently bound to a particular processor. Confusing workers with processors normally causes no problems.