Floating SP Problem

StackThreads/MP: version 0.77 User's Guide

10.1: Floating SP Problem

If you apply the patches under gccpatch to your gcc, you need not worry about this problem (see Patches to GCC for how to apply them).

In practice, this problem occurs only on Pentium, and only when you write a nested procedure call whose inner procedure calls ST_THREAD_CREATE, ST_POLLING, ST_STEAL_REQ_CHECK, or any other synchronization primitive (mutex, semaphore, procedure compiled by stgcc). For example, f(g(x), 1, 2, 3, 4) is a nested procedure call of this kind if g may call st_mutex_wait. In such cases, the outer procedure may receive wrong arguments. In this example, f may observe wrong arguments (more specifically, 1, 2, 3, and 4 may not be passed correctly to f).

To work around this problem without patching gcc, avoid nested procedure calls (at least on Pentium), and do not call the above primitives within C++ constructors (because constructors are often called implicitly to convert arguments).
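The first half of the workaround can be sketched as follows. The procedures f and g here are hypothetical stand-ins, not part of StackThreads/MP; in a real program g would be a procedure that may call one of the primitives above. The sketch shows only the shape of the rewrite: the inner call is hoisted into a temporary so that it finishes before any argument of f is written.

```c
/* Hypothetical stand-in for a procedure that may block or switch
   threads (in a real program it might call st_mutex_wait). */
static int g(int x)
{
    return x + 1;
}

/* Hypothetical outer procedure. Its result depends on every argument,
   so a wrongly passed argument would change the result. */
static int f(int r, int a, int b, int c, int d)
{
    return r * 10000 + a * 1000 + b * 100 + c * 10 + d;
}

/* Nested form: this is the pattern to avoid. */
static int nested(int x)
{
    return f(g(x), 1, 2, 3, 4);
}

/* Workaround: complete the call to g first, then call f. */
static int hoisted(int x)
{
    int r = g(x);
    return f(r, 1, 2, 3, 4);
}
```

With an ordinary calling convention both forms compute the same value (for example, nested(4) and hoisted(4) both yield 51234). The hoisted form is simply the one that, in practice, does not let the call to g interrupt the argument-writing sequence for f.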

In theory, this problem may occur more widely as compilers become cleverer. The execution scheme of StackThreads/MP manages the stack pointer so that it always points to the `top of the stack,' above which no unfinished frames are allocated. In particular, the stack pointer may not point to the currently running frame. Therefore, the value of the stack pointer may differ from gcc's idea of it. This mismatch causes a problem if gcc relies on its idea of the stack pointer to optimize things (see How does it work for more about how StackThreads/MP manages the stack pointer).

The calling convention on a particular CPU determines how the stack pointer is managed. The compiler may assume that the stack pointer after a procedure call has a certain statically computable offset from its original value, according to the calling convention. More specifically, in most calling conventions, the value of the stack pointer does not change across a procedure call; in others (e.g., the `Pascal' convention on Pentium), the callee is responsible for cleaning up the arguments pushed on the stack. In either case, the compiler knows where the stack pointer points after a procedure call (more precisely, it statically knows how much it differs from its original value), and if it wishes, it may exploit this fact to optimize argument passing in procedure calls.

Suppose, for simplicity, a hypothetical convention in which the i-th argument to a procedure is stored at SP[4(i-1)], where SP is the stack pointer, and SP does not change across a procedure call. Given a procedure call f(g(x), 1, 2, 3, 4), for example, the compiler may generate the following sequence:

 1       /* write 1, 2, 3, 4 */
 2       SP[ 4] = 1;
 3       SP[ 8] = 2;
 4       SP[12] = 3;
 5       SP[16] = 4;
 6       /* call g(x) */
 7       SP[ 0] = x;
 8       r = call g;
 9       /* write the result value */
10       SP[ 0] = r;
11       call f;

Since g is nested inside f, the code first performs the procedure call r = g(x) (line 8), and then f(r, 1, 2, 3, 4) (line 11). Before calling g, however, it stores some of the arguments to f at SP[4] ... SP[16] (lines 2 -- 5). Under this convention, the value of SP after g returns (at line 10) is the same as its value at line 5, and therefore f correctly receives the arguments 1 ... 4.

The fundamental problem here is that the argument-writing sequence for f is interrupted by another procedure call, the call to g. This is safe under the original convention, because a procedure call does not change SP, but it is not safe under the StackThreads/MP convention, in which SP points to whatever place happens to be the top of the stack at that point. In general, we must tell the compiler that SP becomes `unknown' after any procedure call.
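The effect can be imitated with a small simulation. This is only an illustrative sketch and does not use or reproduce StackThreads/MP itself: the names SimStack, sim_f, sim_g, run_sequence, and the sp_shift parameter are all hypothetical, the stack is an array of words indexed by a simulated SP, and the direction in which SP moves is an arbitrary choice. The procedure run_sequence replays the eleven-line sequence for f(g(x), 1, 2, 3, 4); sp_shift says how far SP has moved when g returns.

```c
#include <stddef.h>

enum { STACK_WORDS = 64 };

/* Simulated stack: argument i of a call (counting from 0) lives at
   words[sp + i], mirroring SP[4(i-1)] with 4-byte words. */
typedef struct {
    int words[STACK_WORDS];
    size_t sp;
} SimStack;

/* f reads its five arguments relative to the SP it is entered with. */
static int sim_f(const SimStack *s)
{
    const int *a = &s->words[s->sp];
    return a[0] * 10000 + a[1] * 1000 + a[2] * 100 + a[3] * 10 + a[4];
}

/* g reads x at SP[0] and returns x + 1. On return, SP has moved by
   sp_shift words: 0 models an ordinary convention, a nonzero value
   models SP pointing to a different top of the stack. */
static int sim_g(SimStack *s, size_t sp_shift)
{
    int x = s->words[s->sp];
    s->sp += sp_shift;
    return x + 1;
}

/* Replays the compiler-generated sequence for f(g(x), 1, 2, 3, 4). */
static int run_sequence(size_t sp_shift, int x)
{
    SimStack s = { {0}, 8 };      /* all words zero, SP at word 8 */
    int r;

    s.words[s.sp + 1] = 1;        /* lines 2 -- 5: write 1, 2, 3, 4 */
    s.words[s.sp + 2] = 2;
    s.words[s.sp + 3] = 3;
    s.words[s.sp + 4] = 4;
    s.words[s.sp + 0] = x;        /* line 7 */
    r = sim_g(&s, sp_shift);      /* line 8: SP may move here */
    s.words[s.sp + 0] = r;        /* line 10: uses the SP after g */
    return sim_f(&s);             /* line 11 */
}
```

In this sketch run_sequence(0, 4) returns 51234, that is, f sees (5, 1, 2, 3, 4). With run_sequence(8, 4) the result is 50000: the return value of g is written relative to the new SP and arrives correctly, but 1, 2, 3, and 4 were written relative to the old SP, so f reads unrelated (here zero) words in their place.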

As described, this problem in theory occurs whenever the compiler does a good job of placing arguments to procedures. It may in theory occur on any architecture and for almost any program, even one that does not use a literally nested procedure call (note that there is no reason why the compiler cannot perform the same optimization for the non-nested version of the above example, which is { r = g(x); f(r, 1, 2, 3, 4); }).

Let us now turn to what the current gcc does, and let me describe why I think this problem occurs only for literally nested procedure calls on Pentium. First of all, compilers use SP as the base register only when they write arguments to procedures. For other purposes, the frame pointer (FP) is used. This is because FP never changes within a single procedure, whereas SP in general does because of alloca. Therefore named locations (e.g., local variables) are accessed via FP. On the other hand, arguments to procedures are going to be accessed via the FP of the called procedure, which is the SP of the current procedure. Therefore arguments are written via SP. This limits the situations in which a compiler exploits the fact that SP is known after a procedure call. To summarize, a compiler exploits this fact only to schedule argument passing for procedure calls.

On all CPUs I know of except Pentium, the first several arguments (typically 4 to 6 words) are passed in registers. This makes the optimization less important on RISC CPUs. Second, the optimization is potentially complex because alloca makes SP essentially unknown. Therefore any compiler that tries to optimize argument passing must take this into account. A compiler cannot simply assume ``SP is constant.''