-
-
Notifications
You must be signed in to change notification settings - Fork 44
Garbage Collector Algorithm
The Felix GC uses a naive mark sweep algorithm. Copying collectors may be more efficient but require objects to be movable and so are not suitable for use in a system designed to interface with C/C++.
The collector operates in 4 phases.
Initially, all objects are assumed unreachable. The collector searches the machine stacks of all threads to find potential pointers into managed objects and adds the bases of these objects to the set of register roots to find the initial root set.
Starting with the root set, the collector marks objects it knows about reachable. It then finds the pointers in each of these objects, marks what the point at as reachable if the pointers are into managed objects, and then recursively scans those objects.
The destruction phase runs through the complete set of managed objects, checking to see if each one is reachable. If not, the objects finaliser is executed, and the object is removed from the managed set and placed into a set of objects to be deleted.
During this phase, all the objects scheduled for deletion in the destruction phase are deallocated.
A finaliser is a C function which is required to release any unmanaged resources associated with an object, it has the following interface:
typedef void finaliser_t (collector_t*, void*);
The finaliser of an object defaults to its C++ destructor. When binding a primitive in Felix, a finaliser can be provided if the destructor is not what is desired. Compiler generated objects always use the C++ destructor.
Finalisers for C objects allow emulation of C++ destruction.
There is another use for finalisers, however. It is possible for a finaliser to force the finalisation of another object before itself, thereby imposing a partial order on otherwise random finalisation. For example if a file object contains a pointer to a lock object, and the lock must be released before the file is closed, the file's finaliser can destroy the lock before closing the file. To ensure the lock is not destroyed twice, it can be marked as released.
A scanner is a C function that is responsible for finding all the pointers in an object. It is a C function with the following interface:
typedef void *scanner_t(collector_t*, gc_shape_t *, void *, size_t, int);
The arguments, in order are: a pointer to the collector, a pointer to the registered shape of the object, a pointer to some arbitrary scanner specific data, the size of the object as allocated, and an integer which restricts the depth of any recursions that the scanner can make, to protect the machine stack from overflow.
When the programmer defines a primitive type by binding it to a C/C++ they may provide a scanner for that type. If no scanner is provided, the object should not contain managed pointers. More precisely, these pointers are weak, and do not contribute to the determination as to whether what they point at is reachable.
The scan object method is a generic method used to invoke the scanner of an object. It checks if its argument is a pointer interior to a managed object. If not, it does nothing.
If it is, the pointer is adjusted to the base of the managed object, the shape of the object is found, and the scanner registered in the shape is called.
void flx_collector_t::scan_object(void *p, int reclimit)
{
again:
memdata_t memdata = check_interior(p);
if(memdata.head == NULL) return;
size_t dyncount = get_used((void*)memdata.head);
if(memdata.pshape->scanner) {
void *r = memdata.pshape->scanner(this, memdata.pshape,memdata.head,dyncount,reclimit);
if (r) { p = r; goto again; }
}
}
The scanner is required to return NULL unless the pointer is managed, and it is the sole pointer in the object. In this case the scan_object method does a self-tail recursion optimisation by jumping back to the start of its body.
The garbage collector provides a method register_pointer()
.
When a scanner finds a pointer, it calls it, passing the pointer
and recursion limit, possibly decremented by one.
The method is shown here, it is the core of the algorithm:
void flx_collector_t::register_pointer(void *q, int reclimit)
{
if (inrange(q)) {
if(reclimit==0)
{
if(gcthreads>1)
{
::std::unique_lock< ::std::mutex> dummy(j_tmp_lock);
Judy1Set(&j_tmp,(Word_t)q,&je);
j_tmp_cv.notify_one();
}
else {
Judy1Set(&j_tmp,(Word_t)q,&je);
}
if(tracefile)
fprintf(tracefile,"IT: %p\n",q);
}
else scan_object(q, reclimit-1);
}
}
In detail: the method first checks if the machine word is between the bounds of memory allocated so far. The allocation method keeps track of the lowest allocated address, and the highest allocated address plus the size of that object. Any address value outside this range cannot be a pointer to a managed object. This is an optimisation.
The next step is fundamental. We check the recursion limit. If we have not hit the limit on recursions, we call the scan_object method with the recursion limit decremented. This will eventually call register_pointer again. This is the recursive descent examining a subtree of the data structure for pointers.
If we have hit the recursion bound, there are two options.
In the single threaded case, we add the pointer to the j_tmp
buffer. This buffer is a Judy1Array, which is an ultra-high performance
implementation of a set of machine words. The implementation uses a cache
tuned digital trie. Because it models a set, adding a value already in
the set does not cause any change. This is an additional optimisation.
In the case of a multi-threaded scanning operation, a condition variable is used to serialise access to the buffer.
When the Felix compiler generates an object of a product
type, it constructs a shape object containing pointer
to the standard scanner scan_by_offsets
. This scanner
uses an array of offsets into the object which the
compiler puts into the shape table to find the
pointers in the object. For nested products, the offset
array is linearised by the compiler.
NOTE: there is a design fault here. A product component cannot use a custom scanner. A pointer to an object with custom scanner is fine!
Felix uses a buffer to hold discovered pointers which have not yet been checked to see if they point at a managed object.
The buffer used is a Judy1Array
, which model a set of machine words.
It is the fastest known data structure. Because it models a set,
attempts to add an already contained value do not change the set.
The function register_pointer()
is used to add a value to the
buffer. This function is available to the public. The reason for this
is to allow the programmer to write a custom scanner for an object
such as an STL vector of Felix pointers.
From any pointer, we find the base pointer. Recall a pointer could be interior to the object. Then we find the shape of the object, which contains an array of the offsets into the object of any pointer it contains. Each of these potential pointers is fetched and provided it is not NULL added to a buffer of potentially reachable objects.
The collector then checks if these pointers are interior to managed pointers, finds the base if they are, and then recursively examines the object pointed at.
In this process, the same pointer can be added to the buffer many times.
The recursion is limited by a fixed bound of 50, to ensure scanning of long lists does not blow the machine stack. The buffer containing the pointers to be analysed is also bounded.
After the buffer has been filled, or the recursion bottoms out, the buffer is sequentially examined. Any value which is an interior pointer causes the base object to be marked as reachable and retained in the buffer. If the object is already marked reachable that entry is deleted. Foreign pointers are deleted. Interior pointers are converted to their bases.
At the end of this process, the buffer contains base pointers to managed objects which have not been examined. The algorithm then proceeds to examine them, removing them from the buffer as it does so. The examination is the same recursive analysis, regarding the buffer as a collection of roots.
When the buffer is finally empty, more roots from the root set are put into the buffer. The mark phase of the collector terminates when all the roots have been consumed and the buffer of unexamined pointers is empty.
If the FLX_GCTHREADS
environment variable was set to an integer greater
than 1 when the system started up, then the specified number of threads
is used to perform the mark phase. All the threads use the same data structures
start from the same initial conditions, and terminate from the same final
conditions.
Locking is used ONLY on the buffer object. Surprisingly this is not only sufficient, it is maximally optimal. Given the same pointer, two threads are going to perform the same recursive descent. It would seem to lead to a lot of duplication of effort. This is not the case! Instead, because access to the buffer is serialised, once a thread has safely stashed a pointer in the buffer the recursion path through that pointer is blocked. The blockage is required even for a single thread, to prevent an infinite loop when pointers form a cycle. With the serialisation, this path cutting also prevents a thread following the same path as another with some lag.
There is actually a race going on here, but the race has a definite winner, and it is in fact this very competition which makes the algorithm so efficient. Once one thread gets ahead of another, the second thread must follow a different path. It is possible due to the presence of cycles that the paths will converge again. As before the contention is automatically resolved.
After the mark algorithm is complete, the unmarked objects from the total set of objects are known to be unreachable. The destruction phase runs their registered finalisers. It does not delete the objects yet.
This is why the destruction phase does NOT delete any objects.
The destruction phase is single threaded.
After the destruction phase, the unreachable objects are deallocated. The reaping phase is single threaded.