StackThreads/MP: version 0.77 User's Guide
If you apply the patches under gccpatch to your gcc, you need not worry about this problem (see Patches to GCC for how to apply the patches).
In practice, this problem occurs only on Pentium, and only when you write a nested procedure call to a procedure that calls ST_THREAD_CREATE, ST_POLLING, ST_STEAL_REQ_CHECK, or any other synchronization primitive (mutex, semaphore, or a procedure compiled by stgcc). For example, if you write f(g(x), 1, 2, 3, 4) and g may call st_mutex_wait, this is a nested procedure call. In such cases, outer procedures may receive wrong arguments. In this example, f may observe wrong arguments (more specifically, 1, 2, 3, and 4 may not be correctly passed to f).
To work around this problem without patching gcc, avoid nested procedure calls (at least on Pentium), and do not call the above primitives within C++ constructors (because constructors are often called implicitly to convert arguments).
In theory, this problem may occur more widely as compilers become cleverer. The execution scheme of StackThreads/MP manages the stack pointer so that it always points to the `top of the stack,' above which no unfinished frames are allocated. In particular, the stack pointer may not point to the currently running frame. Therefore, the value of the stack pointer may differ from gcc's idea of it. This mismatch causes a problem if gcc exploits its idea of the stack pointer to optimize things (see How does it work for more information about how StackThreads/MP manages the stack pointer).
The calling convention on a particular CPU determines how the stack pointer is managed. The compiler may assume that the stack pointer after a procedure call has a certain statically computable offset from its original value, according to the calling convention. More specifically, in most calling conventions, the value of the stack pointer does not change across a procedure call; in other cases (e.g., the `Pascal' convention on Pentium), the callee is responsible for cleaning up arguments pushed on the stack. In either case, the compiler knows where the stack pointer points after a procedure call (more precisely, it statically knows how much it differs from its original value), and if it wishes, it may exploit this fact to optimize argument passing in procedure calls.
Suppose, for simplicity, a hypothetical convention in which the ith argument to a procedure is stored at SP[4(i-1)], where SP is the stack pointer, and SP does not change across a procedure call. Given a procedure call f(g(x), 1, 2, 3, 4), for example, the compiler may generate the following sequence:
     1  /* write 1, 2, 3, 4 */
     2  SP[ 4] = 1;
     3  SP[ 8] = 2;
     4  SP[12] = 3;
     5  SP[16] = 4;
     6  /* call g(x) */
     7  SP[ 0] = x;
     8  r = call g;
     9  /* write the result value */
    10  SP[ 0] = r;
    11  call f;
Since g is nested inside f, the code first performs the procedure call r = g(x) (line 8), and then f(r, 1, 2, 3, 4) (line 11). Before calling g, however, it puts some of the arguments to f at SP[4] ... SP[16] (lines 2 -- 5). By line 10, after g has returned, the value of SP is the same as its value at line 5, so f correctly receives arguments 1 ... 4.
The fundamental problem here is that the argument-writing sequence for f was interrupted by another procedure call, g. This was safe in the original convention because a procedure call does not change SP, but it is not safe in the StackThreads/MP convention, in which SP points to whatever place happens to be the top of the stack at that point. In general, we must tell the compiler that SP becomes `unknown' after any procedure call.
As described, this problem in theory occurs whenever the compiler does a good job of placing arguments to procedures. It may in theory occur on any architecture and for almost any program, even if the program does not use a literally nested procedure call (note that there is no reason why the compiler cannot do the same optimization for the non-nested version of the above example, which is { r = g(x); f(r, 1, 2, 3, 4) }).
Let us now turn to what the current gcc does, and let me describe why I think this problem occurs only for literally nested procedure calls on Pentium. First of all, compilers use SP as the base register when they write arguments to procedures. For other purposes, the frame pointer (FP) is used. This is because FP never changes within a single procedure, whereas SP in general does because of alloca. Therefore named locations (e.g., local variables) are accessed via FP. On the other hand, arguments to procedures are going to be accessed via the FP of the called procedure, which is the SP of the current procedure. Therefore arguments are written via SP. This limits the situations in which a compiler exploits the fact that SP is known after a procedure call. To summarize, a compiler exploits this fact only to schedule argument passing to procedure calls.
On all CPUs I know of except for Pentium, the first several arguments (typically 4 to 6 words) are passed in registers. This makes the optimization less important on RISC CPUs. Second, the optimization is potentially complex because alloca makes SP essentially unknown. Therefore any compiler that tries to optimize argument passing must take this into account. A compiler cannot simply assume ``SP is constant.''