StackThreads/MP: version 0.77 User's Guide
The most convenient way is to run your program with the -tp
option. For example,
harp:367% ./fib -nw 10 -tp
pfib: 1091 ms on 10 processors, sfib: 2062 ms
This produces a set of files named 00stprof.xx.yy (where xx and yy are
numbers) in the current directory. With 10 processors, you typically
get 10 files.
You may want to profile a particular section of your program. In such
cases, add a call to st_begin_profile() where you want to begin
profiling and a call to st_end_profile() where you want to finish
profiling. Currently, you can call them only once in a program run.
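A minimal sketch of how the two calls might bracket a region of interest.
The stub definitions of st_begin_profile() and st_end_profile() are
placeholders so that the sketch compiles without the library; the
no-argument, void form is an assumption, and fib() and profiled_fib() are
hypothetical names used only for illustration.

```c
#include <assert.h>

/* Placeholder stand-ins so this sketch compiles on its own.
   In a real program, delete these definitions and use the ones
   provided by StackThreads/MP; the void/no-argument form shown
   here is an assumption, not the library's documented signature. */
static int begin_calls = 0;
static int end_calls = 0;
void st_begin_profile(void) { begin_calls++; }
void st_end_profile(void) { end_calls++; }

/* Hypothetical workload: plain sequential Fibonacci. */
static long fib(int n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Profile only the computation, not the setup or the output
   around it. Each profiling call is made exactly once. */
long profiled_fib(int n)
{
    long result;

    assert(n >= 0);
    st_begin_profile();   /* profiling starts here */
    result = fib(n);
    st_end_profile();     /* profiling stops here */
    return result;
}
```

Because the calls can be made only once per run, a function like
profiled_fib() should itself be called only once.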
The resolution of profiling is, by default, 100 microseconds. Each
processor measures how much time it spends in each state (busy, idle,
etc.) and, every 100 microseconds, determines the dominating state of
that period. The log file records the state of each period. You can
change the resolution of profiling with the command line option
--time_profile_resolution S, where S specifies the length of a period
in microseconds. Specifying a larger number saves space, but the result
may be less accurate. Specifying a smaller number makes the result more
reliable at the expense of space.
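The idea of a dominating state, and the space trade-off of the
resolution, can be illustrated with a short sketch. This is not the
library's code: the state names, dominating_state() and periods_for()
are hypothetical, and the tie-breaking rule is an assumption.

```c
#include <assert.h>

/* Hypothetical state set; the guide names only "busy, idle, etc." */
enum state { STATE_BUSY, STATE_IDLE, STATE_OTHER, N_STATES };

/* Return the state in which the most time was spent during one
   period. usec[s] is the time, in microseconds, spent in state s.
   Ties go to the lowest-numbered state (an assumption). */
enum state dominating_state(const long usec[N_STATES])
{
    int best = 0;
    int s;

    for (s = 1; s < N_STATES; s++) {
        if (usec[s] > usec[best])
            best = s;
    }
    return (enum state)best;
}

/* Number of periods needed to cover run_usec microseconds at the
   given resolution, rounded up. */
long periods_for(long run_usec, long resolution_usec)
{
    assert(resolution_usec > 0);
    return (run_usec + resolution_usec - 1) / resolution_usec;
}
```

For the 1091 ms run shown earlier, periods_for(1091000, 100) gives 10910
periods per processor at the default resolution, while a resolution of
1000 microseconds gives 1091.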
Each processor keeps a fixed-size buffer for accumulating profile
entries and saves it to a file when the buffer overflows (and when
profiling finishes). Saving may introduce a large Heisenberg effect into
your profiling. You can increase the size of the in-memory buffer with
--time_profile_buffer_size N, where N specifies the number of entries in
the in-memory buffer. The default is 8100. N does not necessarily
represent the number of periods you can tolerate before the in-memory
profile must be saved to secondary storage, because a single entry
describes a number of consecutive periods in a single state. For
example, if a processor is busy most of the time, the necessary storage
will be quite small.
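The effect of storing runs of periods rather than individual periods can
be sketched as follows. The layout is an assumption, not the library's
actual data structure: struct entry, struct profile_buf, record_period()
and the tiny BUF_ENTRIES capacity are hypothetical, and the "flush" only
counts how often a real profiler would have to write to a file.

```c
#include <assert.h>

#define BUF_ENTRIES 4   /* tiny for illustration; the default is 8100 */

/* One entry covers a run of consecutive periods in the same state. */
struct entry {
    int state;
    long periods;
};

struct profile_buf {
    struct entry e[BUF_ENTRIES];
    int used;      /* entries currently held in memory */
    int flushes;   /* how many times the buffer had to be saved */
};

/* Record one period. A period in the same state as the previous one
   extends the last entry; a state change needs a new entry, and a
   full buffer is flushed first. */
void record_period(struct profile_buf *b, int state)
{
    assert(b != 0);
    if (b->used > 0 && b->e[b->used - 1].state == state) {
        b->e[b->used - 1].periods++;
        return;
    }
    if (b->used == BUF_ENTRIES) {
        b->flushes++;   /* a real profiler would write the file here */
        b->used = 0;
    }
    b->e[b->used].state = state;
    b->e[b->used].periods = 1;
    b->used++;
}
```

In this sketch, a processor that stays in one state for 1000 periods
uses a single entry and never flushes, whereas one that changes state
every period fills the four entries after four periods.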