Process gc preserve #58

Merged: 9 commits, Jun 24, 2024

10 changes: 10 additions & 0 deletions src/gc.c
@@ -3583,6 +3583,16 @@ JL_DLLEXPORT void jl_gc_wb1_noinline(const void *parent) JL_NOTSAFEPOINT
     jl_unreachable();
 }
 
+JL_DLLEXPORT void jl_gc_preserve_begin_hook(int n, ...) JL_NOTSAFEPOINT
+{
+    jl_unreachable();
+}
+
+JL_DLLEXPORT void jl_gc_preserve_end_hook(void) JL_NOTSAFEPOINT
+{
+    jl_unreachable();
+}
+
 JL_DLLEXPORT void jl_gc_wb2_noinline(const void *parent, const void *ptr) JL_NOTSAFEPOINT
 {
     jl_unreachable();

2 changes: 2 additions & 0 deletions src/jl_exported_funcs.inc
@@ -191,6 +191,8 @@
     XX(jl_gc_pool_alloc_instrumented) \
     XX(jl_gc_queue_multiroot) \
     XX(jl_gc_queue_root) \
+    XX(jl_gc_preserve_begin_hook) \
+    XX(jl_gc_preserve_end_hook) \
     XX(jl_gc_wb1_noinline) \
     XX(jl_gc_wb2_noinline) \
     XX(jl_gc_wb_binding_noinline) \
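The trailing backslashes show that the XX(name) entries in jl_exported_funcs.inc are part of a multi-line macro, i.e. an X-macro list: the list is written once and each includer decides what XX expands to. A minimal, self-contained sketch of the technique, assuming hypothetical names throughout (EXAMPLE_FUNCS, hook_begin, hook_end, example_lookup); it is not Julia's actual expansion of the list:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical functions standing in for exported runtime entry points. */
static int hook_begin(void) { return 1; }
static int hook_end(void) { return 2; }

/* The list is written once; each user decides what XX(name) expands to. */
#define EXAMPLE_FUNCS(XX) \
    XX(hook_begin) \
    XX(hook_end)

/* Expansion 1: a table of the function names as strings. */
#define XX_NAME(fn) #fn,
static const char *const example_names[] = { EXAMPLE_FUNCS(XX_NAME) };
#undef XX_NAME

/* Expansion 2: a parallel table of function pointers, in the same order. */
typedef int (*example_fn)(void);
#define XX_PTR(fn) fn,
static const example_fn example_ptrs[] = { EXAMPLE_FUNCS(XX_PTR) };
#undef XX_PTR

#define EXAMPLE_COUNT (sizeof(example_names) / sizeof(example_names[0]))

/* Look a function up by name; returns NULL if it is not in the list. */
static example_fn example_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < EXAMPLE_COUNT; i++)
        if (strcmp(example_names[i], name) == 0)
            return example_ptrs[i];
    return NULL;
}
```

Adding an entry such as XX(jl_gc_preserve_begin_hook) to the list therefore updates every expansion at once, which is why the PR only needs two new lines in this file.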

4 changes: 4 additions & 0 deletions src/julia.h
@@ -2110,6 +2110,10 @@ typedef struct _jl_task_t {
     int8_t threadpoolid;
     // saved gc stack top for context switches
     jl_gcframe_t *gcstack;
+#ifdef MMTK_GC
+    // GC stack of objects that need to be transitively pinned
+    jl_gcframe_t *tpin_gcstack;
+#endif
     size_t world_age;
     // quick lookup for current ptls
     jl_ptls_t ptls; // == jl_all_tls_states[tid]

34 changes: 0 additions & 34 deletions src/llvm-alloc-opt.cpp
@@ -53,26 +53,6 @@ STATISTIC(RemovedGCPreserve, "Total number of GC preserve instructions removed")
 
 namespace {
 
-static void removeGCPreserve(CallInst *call, Instruction *val)

Member

Why is this function removed? The compiler removes GC preserve calls when the values being preserved are themselves removed or are moved to a stack allocation. I don't see why we would want to remove this optimization. I'm also concerned that, in the cases where the GC preserve is kept, the preserved value is replaced with a null pointer, and preserving a null pointer is meaningless.

Author

I think I misunderstood the code twice:
(1) I thought I was deleting the GC preserve after it had been lowered, i.e., after the object arguments to preserve_begin had been pushed to the shadow stack, but I don't think that is the case.
(2) Since the if case only covered pass.gc_preserve_begin_func, I thought I would end up with dangling hooks for preserve_end, which would cause problems with the push and pop hook functions; but the matching pop functions are also removed.
Can you explain what you meant by "moved to stack allocation"? I'm trying to understand whether removing those cases could cause a correctness issue.

Member

Can you explain what you meant by "moved to stack allocation"?

If I understand correctly, this optimization turns heap allocations into stack allocations where possible:

void Optimizer::moveToStack(CallInst *orig_inst, size_t sz, bool has_ref)

If the optimization replaces a heap-allocated value x with a stack allocation, all uses of x must be fixed up by replace_inst:
auto replace_inst = [&] (Instruction *user) {

If we have gc_preserve(x), replace_inst either removes the gc_preserve (removeGCPreserve()) or replaces x with the alloca buff on the stack:
if (pass.gc_preserve_begin_func == callee) {
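The two outcomes described above can be modeled as a small decision function. This is a hedged plain-C sketch with hypothetical names (preserve_arg, preserve_action, handle_preserve_use); it is not the LLVM code in llvm-alloc-opt.cpp, only the branching it performs on has_ref for a gc_preserve_begin use of a promoted value, following the logic of removeGCPreserve shown in this diff:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one operand of a gc_preserve_begin call. */
typedef struct {
    const void *value; /* what is being preserved */
    bool is_constant;  /* true once the operand is a constant (e.g. null) */
} preserve_arg;

typedef enum {
    PRESERVE_REWRITTEN_TO_STACK, /* has_ref: x replaced by the alloca buffer */
    PRESERVE_KEPT_WITH_NULL,     /* !has_ref, other non-constant operands remain */
    PRESERVE_ERASED              /* !has_ref and every operand is now constant */
} preserve_action;

/* Handle the use of the promoted value x at operand index idx. */
static preserve_action handle_preserve_use(preserve_arg *args, size_t nargs,
                                           size_t idx, bool has_ref,
                                           const void *stack_buf)
{
    assert(idx < nargs);
    if (has_ref) {
        /* Keep preserving, but now the stack copy. */
        args[idx].value = stack_buf;
        return PRESERVE_REWRITTEN_TO_STACK;
    }
    /* Like replaceUsesOfWith(val, Constant::getNullValue(...)). */
    args[idx].value = NULL;
    args[idx].is_constant = true;
    for (size_t i = 0; i < nargs; i++)
        if (!args[i].is_constant)
            return PRESERVE_KEPT_WITH_NULL; /* call stays, now with a null operand */
    return PRESERVE_ERASED; /* begin and its matching ends can be erased */
}
```

The PRESERVE_KEPT_WITH_NULL outcome is the case the review refers to: the call survives, but one of its operands has become a null constant.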

Member

If you remove the code that checks for the gc_preserve_begin_func case, the call falls into the default case:

Value *replace = has_ref ? (Value*)buff : Constant::getNullValue(orig_i->getType());

That either replaces x with the alloca buff (which is fine) or replaces x with null. In the latter case, null pointers can reach the preserve begin hook, and the hook will have to deal with them.

I don't think this causes correctness issues, but a GC preserve call on null pointers is meaningless.
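If null operands do reach the runtime, one option is for the begin hook to skip them before recording anything. A minimal sketch, assuming a count-prefixed variadic hook in the style of jl_gc_preserve_begin_hook; the name preserve_collect_nonnull and the caller-provided output array are hypothetical:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical: copy the non-null pointers among the n variadic arguments
 * into out (capacity cap) and return how many were kept. */
static size_t preserve_collect_nonnull(void **out, size_t cap, size_t n, ...)
{
    va_list args;
    size_t kept = 0;
    va_start(args, n);
    for (size_t i = 0; i < n; i++) {
        void *p = va_arg(args, void *);
        if (p == NULL)
            continue; /* nothing to pin for a null operand */
        assert(kept < cap);
        out[kept++] = p;
    }
    va_end(args);
    return kept;
}
```

Variadic arguments must be passed with the exact type read by va_arg, so callers should cast each pointer (including NULL) to void *.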

Author (@udesou), Jun 20, 2024

Got it! In that case, removing that particular GC preserve might actually be problematic, because we need to transitively pin everything in the object's transitive closure, regardless of whether the object itself is allocated on the heap or on the stack.

If we have gc_preserve(x), replace_inst either removes the gc_preserve (removeGCPreserve()) or replaces x with the alloca buff on the stack.

Okay, then it should be fine, since the GC preserve is removed only if nothing refers to the object (has_ref == 0, which I believe also accounts for any possible C call).
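Transitively pinning an object means pinning every object reachable from it, not just the object itself, which is why it matters even when the root lives on the stack. A hedged sketch over a toy object graph; the toy_obj type, its fixed-size field array, and the bounded worklist are hypothetical, and MMTk's actual pinning API is not shown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FIELDS 4
#define TOY_MAX_WORK 64

/* Hypothetical heap object: a pin bit plus outgoing references. */
typedef struct toy_obj {
    bool pinned;
    size_t nfields;
    struct toy_obj *fields[TOY_MAX_FIELDS];
} toy_obj;

/* Pin root and everything reachable from it with an iterative DFS.
 * The pin bit doubles as the visited mark, so cycles terminate; this toy
 * assumes pin bits are only ever set by this function.
 * Returns the number of objects newly pinned. */
static size_t toy_pin_transitively(toy_obj *root)
{
    toy_obj *work[TOY_MAX_WORK];
    size_t top = 0, count = 0;
    if (root == NULL || root->pinned)
        return 0;
    root->pinned = true;
    work[top++] = root;
    while (top > 0) {
        toy_obj *obj = work[--top];
        count++;
        for (size_t i = 0; i < obj->nfields; i++) {
            toy_obj *child = obj->fields[i];
            if (child != NULL && !child->pinned) {
                child->pinned = true;
                assert(top < TOY_MAX_WORK);
                work[top++] = child;
            }
        }
    }
    return count;
}
```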

-{
-    ++RemovedGCPreserve;
-    auto replace = Constant::getNullValue(val->getType());
-    call->replaceUsesOfWith(val, replace);
-    call->setAttributes(AttributeList());
-    for (auto &arg: call->args()) {
-        if (!isa<Constant>(arg.get())) {
-            return;
-        }
-    }
-    while (!call->use_empty()) {
-        auto end = cast<Instruction>(*call->user_begin());
-        // gc_preserve_end returns void.
-        assert(end->use_empty());
-        end->eraseFromParent();
-    }
-    call->eraseFromParent();
-}
-
 /**
  * Promote `julia.gc_alloc_obj` which do not have escaping root to a alloca.
  * Uses that are not considered to escape the object (i.e. heap address) includes,
@@ -652,16 +632,6 @@ void Optimizer::moveToStack(CallInst *orig_inst, size_t sz, bool has_ref)
                 call->eraseFromParent();
                 return;
             }
-            // Also remove the preserve intrinsics so that it can be better optimized.
-            if (pass.gc_preserve_begin_func == callee) {
-                if (has_ref) {
-                    call->replaceUsesOfWith(orig_i, buff);
-                }
-                else {
-                    removeGCPreserve(call, orig_i);
-                }
-                return;
-            }
             if (pass.write_barrier_func == callee ||
                 pass.write_barrier_binding_func == callee) {
                 ++RemovedWriteBarriers;
@@ -761,10 +731,6 @@ void Optimizer::removeAlloc(CallInst *orig_inst)
             }
             else if (auto call = dyn_cast<CallInst>(user)) {
                 auto callee = call->getCalledOperand();
-                if (pass.gc_preserve_begin_func == callee) {
-                    removeGCPreserve(call, orig_i);
-                    return;
-                }
                 if (pass.typeof_func == callee) {
                     ++RemovedTypeofs;
                     call->replaceAllUsesWith(tag);

12 changes: 8 additions & 4 deletions src/llvm-final-gc-lowering.cpp
@@ -52,6 +52,8 @@ struct FinalLowerGC: private JuliaPassContext {
     Function *bigAllocFunc;
     Function *allocTypedFunc;
 #ifdef MMTK_GC
+    Function *gcPreserveBeginHookFunc;
+    Function *gcPreserveEndHookFunc;
     Function *writeBarrier1Func;
     Function *writeBarrier2Func;
     Function *writeBarrierBindingFunc;
@@ -145,7 +147,7 @@ void FinalLowerGC::lowerPushGCFrame(CallInst *target, Function &F)
     IRBuilder<> builder(target->getContext());
     builder.SetInsertPoint(&*(++BasicBlock::iterator(target)));
     StoreInst *inst = builder.CreateAlignedStore(
-        ConstantInt::get(getSizeTy(F.getContext()), JL_GC_ENCODE_PUSHARGS(nRoots)),
+        ConstantInt::get(getSizeTy(F.getContext()), JL_GC_ENCODE_PUSHARGS_NO_TPIN(nRoots)),
         builder.CreateBitCast(
             builder.CreateConstInBoundsGEP1_32(T_prjlvalue, gcframe, 0),
             getSizeTy(F.getContext())->getPointerTo()),
@@ -407,12 +409,14 @@ bool FinalLowerGC::doInitialization(Module &M) {
     bigAllocFunc = getOrDeclare(jl_well_known::GCBigAlloc);
     allocTypedFunc = getOrDeclare(jl_well_known::GCAllocTyped);
 #ifdef MMTK_GC
+    gcPreserveBeginHookFunc = getOrDeclare(jl_well_known::GCPreserveBeginHook);
+    gcPreserveEndHookFunc = getOrDeclare(jl_well_known::GCPreserveEndHook);
     writeBarrier1Func = getOrDeclare(jl_well_known::GCWriteBarrier1);
     writeBarrier2Func = getOrDeclare(jl_well_known::GCWriteBarrier2);
     writeBarrierBindingFunc = getOrDeclare(jl_well_known::GCWriteBarrierBinding);
     writeBarrier1SlowFunc = getOrDeclare(jl_well_known::GCWriteBarrier1Slow);
     writeBarrier2SlowFunc = getOrDeclare(jl_well_known::GCWriteBarrier2Slow);
-    GlobalValue *functionList[] = {queueRootFunc, poolAllocFunc, bigAllocFunc, writeBarrier1Func, writeBarrier2Func, writeBarrierBindingFunc, writeBarrier1SlowFunc, writeBarrier2SlowFunc};
+    GlobalValue *functionList[] = {queueRootFunc, poolAllocFunc, bigAllocFunc, gcPreserveBeginHookFunc, gcPreserveEndHookFunc, writeBarrier1Func, writeBarrier2Func, writeBarrierBindingFunc, writeBarrier1SlowFunc, writeBarrier2SlowFunc};
 #else
     GlobalValue *functionList[] = {queueRootFunc, queueBindingFunc, poolAllocFunc, bigAllocFunc, allocTypedFunc};
 #endif
@@ -432,8 +436,8 @@ bool FinalLowerGC::doInitialization(Module &M) {
 bool FinalLowerGC::doFinalization(Module &M)
 {
 #ifdef MMTK_GC
-    GlobalValue *functionList[] = {queueRootFunc, poolAllocFunc, bigAllocFunc, writeBarrier1Func, writeBarrier2Func, writeBarrierBindingFunc, writeBarrier1SlowFunc, writeBarrier2SlowFunc};
-    queueRootFunc = poolAllocFunc = bigAllocFunc = writeBarrier1Func = writeBarrier2Func = writeBarrierBindingFunc = writeBarrier1SlowFunc = writeBarrier2SlowFunc = nullptr;
+    GlobalValue *functionList[] = {queueRootFunc, poolAllocFunc, bigAllocFunc, gcPreserveBeginHookFunc, gcPreserveEndHookFunc, writeBarrier1Func, writeBarrier2Func, writeBarrierBindingFunc, writeBarrier1SlowFunc, writeBarrier2SlowFunc};
+    queueRootFunc = poolAllocFunc = bigAllocFunc = gcPreserveBeginHookFunc = gcPreserveEndHookFunc = writeBarrier1Func = writeBarrier2Func = writeBarrierBindingFunc = writeBarrier1SlowFunc = writeBarrier2SlowFunc = nullptr;
 #else
     GlobalValue *functionList[] = {queueRootFunc, queueBindingFunc, poolAllocFunc, bigAllocFunc, allocTypedFunc};
     queueRootFunc = queueBindingFunc = poolAllocFunc = bigAllocFunc = allocTypedFunc = nullptr;

47 changes: 45 additions & 2 deletions src/llvm-late-gc-lowering.cpp
@@ -2308,9 +2308,52 @@ bool LateLowerGCFrame::CleanupIR(Function &F, State *S, bool *CFGModified) {
             continue;
         }
         Value *callee = CI->getCalledOperand();
-        if (callee && (callee == gc_flush_func || callee == gc_preserve_begin_func
-                || callee == gc_preserve_end_func)) {
+        if (callee && (callee == gc_flush_func)) {
             /* No replacement */
+        } else if (callee && (callee == gc_preserve_begin_func)) {
+            /* Replace with a call to the hook functions */
+            // Initialize an IR builder.
+            IRBuilder<> builder(CI);
+
+            builder.SetCurrentDebugLocation(CI->getDebugLoc());
+            size_t nargs = 0;
+            State S2(F);
+
+            std::vector<Value*> args;
+            for (Use &U : CI->args()) {
+                Value *V = U;
+                if (isa<Constant>(V))
+                    continue;
+                if (isa<PointerType>(V->getType())) {
+                    if (isSpecialPtr(V->getType())) {
+                        int Num = Number(S2, V);
+                        if (Num >= 0) {
+                            nargs++;
+                            Value *Val = GetPtrForNumber(S2, Num, CI);
+                            args.push_back(Val);
+                        }
+                    }
+                } else {
+                    std::vector<int> Nums = NumberAll(S2, V);
+                    for (int Num : Nums) {
+                        if (Num < 0)
+                            continue;
+                        Value *Val = GetPtrForNumber(S2, Num, CI);
+                        args.push_back(Val);
+                        nargs++;
+                    }
+                }
+            }
+            args.insert(args.begin(), ConstantInt::get(T_size, nargs));
+
+            ArrayRef<Value*> args_llvm = ArrayRef<Value*>(args);
+            builder.CreateCall(getOrDeclare(jl_well_known::GCPreserveBeginHook), args_llvm );

Member

GCPreserveBeginHook is only compiled when MMTK_GC is set, but this code is executed in all builds, so the stock build would fail here. The same applies to GCPreserveEndHook below.

Author

Fixed. It turns out the stock build was already broken because a file was not being compiled in the Makefile; I've fixed that too, and both builds (stock and MMTk) should now work.
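One common way to keep shared lowering code building in both configurations is to compile anything that names the MMTk-only hooks under the same MMTK_GC guard used for their declarations. A hedged C sketch of that pattern with hypothetical names (EXAMPLE_HAS_PRESERVE_HOOKS, emitted_hook_calls, lower_preserve_begin); the fix actually applied in the PR is not shown in this diff and may differ:

```c
#include <assert.h>

/* 1 when the MMTk-only preserve hooks exist in this build, else 0. */
#ifdef MMTK_GC
#define EXAMPLE_HAS_PRESERVE_HOOKS 1
#else
#define EXAMPLE_HAS_PRESERVE_HOOKS 0
#endif

/* Hypothetical counter standing in for "emitted a call to the begin hook". */
static int emitted_hook_calls = 0;

/* Hypothetical lowering step for one gc_preserve_begin marker. The hook is
 * referenced only in MMTk builds; the stock build keeps the earlier
 * behavior of dropping the marker with no replacement. */
static void lower_preserve_begin(void)
{
#ifdef MMTK_GC
    emitted_hook_calls++; /* MMTk build: emit the hook call */
#else
    /* stock build: no replacement */
#endif
}
```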

+        } else if (callee && (callee == gc_preserve_end_func)) {
+            /* Replace with a call to the hook functions */
+            // Initialize an IR builder.
+            IRBuilder<> builder(CI);
+            builder.SetCurrentDebugLocation(CI->getDebugLoc());
+            builder.CreateCall(getOrDeclare(jl_well_known::GCPreserveEndHook), {});
         } else if (pointer_from_objref_func != nullptr && callee == pointer_from_objref_func) {
             auto *obj = CI->getOperand(0);
             auto *ASCI = new AddrSpaceCastInst(obj, JuliaType::get_pjlvalue_ty(obj->getContext()), "", CI);
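The lowering above flattens the operands of a gc_preserve_begin call into a leading count followed by pointers: constants are skipped, a tracked pointer operand contributes one value if it gets a non-negative number, and a non-pointer operand contributes every tracked pointer that NumberAll finds in it (negative numbers are skipped). A toy C model of that flattening, with hypothetical types, where a NULL slot stands in for an untracked (negative) number:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operand model mirroring the three branches of the lowering. */
typedef enum { TOY_CONSTANT, TOY_POINTER, TOY_AGGREGATE } toy_kind;

typedef struct {
    toy_kind kind;
    size_t nslots;  /* 1 for TOY_POINTER, the field count for TOY_AGGREGATE */
    void *slots[4]; /* NULL marks an untracked slot (a negative number) */
} toy_operand;

/* Write the flattened pointer list to out and return its length; the
 * lowering passes that length as the leading size_t argument of the hook. */
static size_t toy_flatten(const toy_operand *ops, size_t nops,
                          void **out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < nops; i++) {
        if (ops[i].kind == TOY_CONSTANT)
            continue; /* like isa<Constant>(V) */
        for (size_t j = 0; j < ops[i].nslots; j++) {
            if (ops[i].slots[j] == NULL)
                continue; /* like Num < 0 */
            assert(n < cap);
            out[n++] = ops[i].slots[j];
        }
    }
    return n;
}
```

The resulting call has the shape hook(n, p1, ..., pn), which is what the variadic runtime hook in src/mmtk-gc.c reads back with va_arg.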

30 changes: 30 additions & 0 deletions src/llvm-pass-helpers.cpp
@@ -334,6 +334,8 @@ namespace jl_well_known {
     static const char *GC_QUEUE_BINDING_NAME = XSTR(jl_gc_queue_binding);
     static const char *GC_ALLOC_TYPED_NAME = XSTR(jl_gc_alloc_typed);
 #ifdef MMTK_GC
+    static const char *GC_PRESERVE_BEGIN_HOOK_NAME = XSTR(jl_gc_preserve_begin_hook);
+    static const char *GC_PRESERVE_END_HOOK_NAME = XSTR(jl_gc_preserve_end_hook);
     static const char *GC_WB_1_NAME = XSTR(jl_gc_wb1_noinline);
     static const char *GC_WB_2_NAME = XSTR(jl_gc_wb2_noinline);
     static const char *GC_WB_BINDING_NAME = XSTR(jl_gc_wb_binding_noinline);
@@ -424,6 +426,34 @@ namespace jl_well_known {
         });
 
 #ifdef MMTK_GC
+    const WellKnownFunctionDescription GCPreserveBeginHook(
+        GC_PRESERVE_BEGIN_HOOK_NAME,
+        [](const JuliaPassContext &context) {
+            auto func = Function::Create(
+                FunctionType::get(
+                    Type::getVoidTy(context.getLLVMContext()),
+                    { T_size_t(context) },
+                    true),
+                Function::ExternalLinkage,
+                GC_PRESERVE_BEGIN_HOOK_NAME);
+
+            func->addFnAttr(Attribute::InaccessibleMemOrArgMemOnly);
+            return func;
+        });
+
+    const WellKnownFunctionDescription GCPreserveEndHook(
+        GC_PRESERVE_END_HOOK_NAME,
+        [](const JuliaPassContext &context) {
+            auto func = Function::Create(
+                FunctionType::get(
+                    Type::getVoidTy(context.getLLVMContext()),
+                    { },
+                    false),
+                Function::ExternalLinkage,
+                GC_PRESERVE_END_HOOK_NAME);
+            func->addFnAttr(Attribute::InaccessibleMemOrArgMemOnly);
+            return func;
+        });
     const WellKnownFunctionDescription GCWriteBarrier1(
         GC_WB_1_NAME,
         [](const JuliaPassContext &context) {

4 changes: 4 additions & 0 deletions src/llvm-pass-helpers.h
@@ -134,6 +134,8 @@ namespace jl_intrinsics {
     extern const IntrinsicDescription safepoint;
 
 #ifdef MMTK_GC
+    extern const IntrinsicDescription gcPreserveBeginHook;
+    extern const IntrinsicDescription gcPreserveEndHook;
     extern const IntrinsicDescription writeBarrier1;
     extern const IntrinsicDescription writeBarrier2;
     extern const IntrinsicDescription writeBarrierBinding;
@@ -168,6 +170,8 @@ namespace jl_well_known {
     extern const WellKnownFunctionDescription GCAllocTyped;
 
 #ifdef MMTK_GC
+    extern const WellKnownFunctionDescription GCPreserveBeginHook;
+    extern const WellKnownFunctionDescription GCPreserveEndHook;
     extern const WellKnownFunctionDescription GCWriteBarrier1;
     extern const WellKnownFunctionDescription GCWriteBarrier2;
     extern const WellKnownFunctionDescription GCWriteBarrierBinding;

39 changes: 39 additions & 0 deletions src/mmtk-gc.c
@@ -570,6 +570,45 @@ JL_DLLEXPORT void jl_gc_array_ptr_copy(jl_array_t *dest, void **dest_p, jl_array
     mmtk_memory_region_copy(&ptls->mmtk_mutator, jl_array_owner(src), src_p, jl_array_owner(dest), dest_p, n);
 }
 
+#define jl_p_tpin_gcstack (jl_current_task->tpin_gcstack)
+
+#define JL_GC_PUSHARGS_TPIN_ROOT_OBJS(rts_var,n) \
+    rts_var = ((jl_value_t**)malloc(((n)+2)*sizeof(jl_value_t*)))+2; \

Member

These need to be clarified:

  1. Why do you need to use malloc instead of alloca, as in
    rts_var = ((jl_value_t**)alloca(((n)+2)*sizeof(jl_value_t*)))+2; \
  2. Why do you need a separate tpin_gcstack in the task? Couldn't you simply put the tpinned roots in the normal gcstack?
  3. If JL_GC_PUSHARGS_TPIN_ROOT_OBJS and tpin_gcstack are only used by GC preserve, it is probably better to call them gc preserve frames or something similar. Calling them tpin is more confusing, since you can clearly push tpin roots to the normal stack using the existing JL_GC_PUSH.
  4. Add some comments explaining why it is implemented like this.

+    ((void**)rts_var)[-2] = (void*)JL_GC_ENCODE_PUSHARGS(n); \
+    ((void**)rts_var)[-1] = jl_p_tpin_gcstack; \
+    memset((void*)rts_var, 0, (n)*sizeof(jl_value_t*)); \
+    jl_p_tpin_gcstack = (jl_gcframe_t*)&(((void**)rts_var)[-2]); \
+
+#define JL_GC_POP_TPIN_ROOT_OBJS() \
+    jl_gcframe_t *curr = jl_p_tpin_gcstack; \
+    if(curr) { \
+        (jl_p_tpin_gcstack = jl_p_tpin_gcstack->prev); \
+        free(curr); \
+    }
+
+// Add each argument as a tpin root object.
+// However, we cannot use JL_GC_PUSH and JL_GC_POP since the slots should live
+// beyond this function. Instead, we maintain a tpin stack by mallocing/freeing
+// the frames for each of the preserve regions we encounter
+JL_DLLEXPORT void jl_gc_preserve_begin_hook(int n, ...) JL_NOTSAFEPOINT
+{
+    jl_value_t** frame;
+    JL_GC_PUSHARGS_TPIN_ROOT_OBJS(frame, n);
+    if (n == 0) return;
+
+    va_list args;
+    va_start(args, n);
+    for (int i = 0; i < n; i++) {
+        frame[i] = va_arg(args, jl_value_t *);
+    }
+    va_end(args);
+}
+
+JL_DLLEXPORT void jl_gc_preserve_end_hook(void) JL_NOTSAFEPOINT
+{
+    JL_GC_POP_TPIN_ROOT_OBJS();
+}
+
 // No inline write barrier -- only used for debugging
 JL_DLLEXPORT void jl_gc_wb1_noinline(const void *parent) JL_NOTSAFEPOINT
 {
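The comment above jl_gc_preserve_begin_hook gives the reason for malloc: the begin hook returns long before the preserve region ends, so a frame alloca'd inside the hook would be dead by the time the end hook runs. A hedged, self-contained model of that design with hypothetical names (toy_frame, toy_tpin_stack, toy_preserve_begin, toy_preserve_end); it mirrors the [count][prev][slots...] layout built by JL_GC_PUSHARGS_TPIN_ROOT_OBJS but stores a raw count instead of the JL_GC_ENCODE_PUSHARGS encoding, and uses a global head in place of the per-task tpin_gcstack:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical frame: count, link to the previous frame, then the roots. */
typedef struct toy_frame {
    size_t nroots;
    struct toy_frame *prev;
    void *roots[]; /* flexible array member */
} toy_frame;

/* Stand-in for the per-task tpin_gcstack head. */
static toy_frame *toy_tpin_stack = NULL;

/* Begin hook: the frame must outlive this call, so it is heap-allocated.
 * Returns 0 on success, -1 if allocation fails. */
static int toy_preserve_begin(size_t n, ...)
{
    toy_frame *f = malloc(sizeof(toy_frame) + n * sizeof(void *));
    if (f == NULL)
        return -1;
    f->nroots = n;
    f->prev = toy_tpin_stack;
    va_list args;
    va_start(args, n);
    for (size_t i = 0; i < n; i++)
        f->roots[i] = va_arg(args, void *);
    va_end(args);
    toy_tpin_stack = f; /* push */
    return 0;
}

/* End hook: takes no arguments and pops the innermost frame, so preserve
 * regions must nest in LIFO order. */
static void toy_preserve_end(void)
{
    toy_frame *f = toy_tpin_stack;
    if (f != NULL) {
        toy_tpin_stack = f->prev;
        free(f);
    }
}
```

As in the diff, a frame is pushed even when n == 0, so every begin can be matched by exactly one end. The sketch takes the count as size_t, matching the T_size_t first parameter of the LLVM-side GCPreserveBeginHook declaration; the C hook in the diff declares it as int.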