Skip to content

Garbage Collector Algorithm

John Skaller edited this page Nov 29, 2018 · 7 revisions

The Felix GC uses a naive mark sweep algorithm. Copying collectors may be more efficient but require objects to be movable and so are not suitable for use in a system designed to interface with C/C++.

The collector operates in 4 phases.

Root Finding

Initially, all objects are assumed unreachable. The collector searches the machine stacks of all threads to find potential pointers into managed objects and adds the bases of these objects to the set of register roots to find the initial root set.

Mark phase

Starting with the root set, the collector marks objects it knows about reachable. The scan proceeds with a somewhat unique algorithm.

From any pointer, we find the base pointer. Recall a pointer could be interior to the object. Then we find the shape of the object, which contains an array of the offsets into the object of any pointer it contains. Each of these potential pointers is fetched and provided it is not NULL added to a buffer of potentially reachable objects.

The collector then checks if these pointers are interior to managed pointers, finds the base if they are, and then recursively examines the object pointed at.

In this process, the same pointer can be added to the buffer many times.

Pointer buffer

The buffer used is a Judy1Array, which model a set of machine words. It is the fastest known data structure. Because it models a set, attempts to add an already contained value do not change the set.

The function register_pointer() is used to add a value to the buffer. This function is available to the public. The reason for this is to allow the programmer to write a custom scanner for an object such as an STL vector of Felix pointers.

The recursion is limited by a fixed bound of 50, to ensure scanning of long lists does not blow the machine stack. The buffer containing the pointers to be analysed is also bounded.

After the buffer has been filled, or the recursion bottoms out, the buffer is sequentially examined. Any value which is an interior pointer causes the base object to be marked as reachable and retained in the buffer. If the object is already marked reachable that entry is deleted. Foreign pointers are deleted. Interior pointers are converted to their bases.

At the end of this process, the buffer contains base pointers to managed objects which have not been examined. The algorithm then proceeds to examine them, removing them from the buffer as it does so. The examination is the same recursive analysis, regarding the buffer as a collection of roots.

When the buffer is finally empty, more roots from the root set are put into the buffer. The mark phase of the collector terminates when all the roots have been consumed and the buffer of unexamined pointers is empty.

Parallel marking algorithm

If the FLX_GCTHREADS environment variable was set to an integer greater than 1 when the system started up, then the specified number of threads is used to perform the mark phase. All the threads use the same data structures start from the same initial conditions, and terminate from the same final conditions.

Locking is used ONLY on the buffer object. Surprisingly this is not only sufficient, it is maximally optimal. Given the same pointer, two threads are going to perform the same recursive descent. It would seem to lead to a lot of duplication of effort. This is not the case! Instead, because access to the buffer is serialised, once a thread has safely stashed a pointer in the buffer the recursion path through that pointer is blocked. The blockage is required even for a single thread, to prevent an infinite loop when pointers form a cycle. With the serialisation, this path cutting also prevents a thread following the same path as another with some lag.

There is actually a race going on here, but the race has a definite winner, and it is in fact this very competition which makes the algorithm so efficient. Once one thread gets ahead of another, the second thread must follow a different path. It is possible due to the presence of cycles that the paths will converge again. As before the contention is automatically resolved.

Destruction Phase

After the mark algorithm is complete, the unmarked objects from the total set of objects are known to be unreachable. The destruction phase runs their registered finalisers. It does not delete the objects yet.

The finaliser of an object defaults to its C++ destructor. When binding a primitive in Felix, a finaliser can be provided if the destructor is not what is desired. Compiler generated objects always use the C++ destructor.

Finalisers for C objects allow emulation of C++ destruction.

There is another use for finalisers, however. It is possible for a finaliser to force the finalisation of another object before itself, thereby imposing a partial order on otherwise random finalisation. For example if a file object contains a pointer to a lock object, and the lock must be released before the file is closed, the file's finaliser can destroy the lock before closing the file. To ensure the lock is not destroyed twice, it can be marked as released.

This is why the destruction phase does NOT delete any objects.

The destruction phase is single threaded.

Reaping Phase

After the destruction phase, the unreachable objects are deallocated. The reaping phase is single threaded.