How does it work

StackThreads/MP: version 0.77 User's Guide

9.1: How does it work, basically?

Suppose procedure f calls ST_THREAD_CREATE(g()), f performs a normal C procedure call g(). What is important here is that it performs the procedure call just as if it were a sequential call. It also records the fact that the procedure call is a fork point. This way, the runtime system knows that the rest of f can progress even when g is blocked, or when an idle worker shows up. The exact method by which we record a fork point is unimportant; we note that it only takes a couple of instructions.

If g() terminates without blocking, it returns to f just as a normal procedure returns. What happens when g blocks? A thread blocks , for example, g calls st_mutex_lock(j), but m is currently locked by another thread. In this case, we must be able to suspend execution of g for now and later resume it when m is unlocked. We achieve this by splitting the stack at the fork point where g was spawned; the activation frame of g is logically decoupled from the stack and the rest of f continues (when f is still not migrated to another worker). g's activation frame physically remains in the original stack, occupying the top portion of the stack. Stack pointer is kept above g, while frame pointer now points to f (we use a terminology as if a stack grows upward. A frame x is above y if x is nearer to the growing end of the stack than y). Subsequent procedure calls by f allocate a frame above g. When g later restarts its computation, g frame is logically connected to the current frame and becomes the child of it. This frame may of course be different from g's original caller. When g terminates, the control returns to the new parent, not the original caller.

In general, each worker maintains its stack pointer below any unfinished frame. A procedure call or ST_THREAD_CREATE allocates a new frame above stack pointer. Frames of blocked (yet unfinished) threads are logically decoupled from the stack, but physically stay within the stack. Frames of restarted threads are reconnected to the stack.

Leaving blocked frames in their original positions and making further allocations on top of them, we may leave free spaces in a middle of a stack. We do not try to reuse such holes within a stack. We always allocate a new frame on top of the stack. Once a region becomes free in a middle of a stack, it is reused only when no spaces are occupied above it. When such a situation occurs, stack pointer is reset to top of the unfinished frames. Exact details as to how to reclaim such spaces is not described here; we only mention that such reclamation of space occurs when the programmer calls ST_POLLING. This is the first reason why you have to call ST_POLLING periodically.

What about thread migration? Interestingly, a thread migration is just a combination of thread blocks and a resume. Suppose that both f and g are in worker W's stack and W wants to migrate f to another (idle) worker U. At this point, W temporarily blocks g and resumes f. Next f also temporarily blocks and resumes the parent of f. The parent of f passes the context of f to U through shared-memory and then restarts g. U picks up the context of f and restarts it.

In general, to migrate a thread from W to U, W first blocks all frame above the migrating thread. Next W blocks the migrating thread too. W then restarts threads above the migrating thread. The migrating thread is picked up by U and restarted. This way, the migrating thread is pulled out from one stack and reconnected to another.

A thread migration also occurs when ST_POLLING is called. When a worker becomes idle, it randomly selects another worker and sends a request. ST_POLLING checks whether a request has arrived to the calling worker and if it has, performs the above operation to migrate a thread to the idle worker. This is the second (and the last) reason why you have to call ST_POLLING periodically.

The remaining question is, when a worker gets a request, how does it select a thread to migrate. We simply migrate the thread at the bottom of the stack. For example, suppose f performs ST_THREAD_CREATE(g()), g performs ST_THREAD_CREATE(h()), and ST_POLLING() finds a request from an idle worker. In this case, f migrates. When the second request soon arrives to the same worker, then g migrates. When call tree is fairly balanced, as in the Fibonacci example (See An Example), this strategy works very well; all workers will soon get a sub-tree which is fairly near to the root and concentrate on their local work. With this task distribution strategy in mind, let's go on to the next section.