Equivalent programs show different behaviour #508

pclayton · 2023-08-17T20:24:19Z

I have an example where a finalizer is unexpectedly not being run. However, the finalizer is run if I transform the code to something (that I thought was) equivalent. This is using MLton 20210117.

A highly reduced example is attached which is built with the command make mlton. The code contains a function declaration

fun theFunction ({n, ...} : t) = ...

and two references to the function of the form theFunction self. When run, it produces the output

finalized a1
finalized obj1

so obj2 is not finalized. If the code is changed to declare

fun theFunction n =

and the two references to the function changed to theFunction (#n self), then it produces the output

finalized a1
finalized obj1
finalized obj2

so obj2 is finalized. Therefore, finalization of obj2 depends on whether the record component n is selected by pattern matching in the function declaration or in the argument of each call.

The reason that finalization does not occur in the original example is probably due to my alternative implementation of FINALIZABLE in the structure Finalizable (which is based on my Poly/ML implementation). At exit, the finalizers are cleaned up by code registered with both MLtonSignal.handleGC and OS.Process.atExit. There is no code that empties the list of pending finalizers at exit as the in the MLton implementation. Instead, data races are avoided by using the module SharedVar which protects access to a shared variable using MLton.Thread.atomically. It can be seen that this protection has an effect for the list of pending finalizers in this example: in the original example, changing

fun foldmap r f x = MLton.Thread.atomically (fn () => foldmap' r f x)

to

fun foldmap r f x = foldmap' r f x

i.e. removing the protection from the module SharedVar produces the output

finalized a1
finalized obj1
finalized a1
free(): double free detected in tcache 2
Aborted (core dumped)

Possibly I am misusing MLton.Thread.atomically or have done something wrong. Still, I am surprised that two seemingly equivalent programs can give different results.

The text was updated successfully, but these errors were encountered:

MatthewFluet · 2023-08-21T01:39:41Z

Using structure Finalizable = MLton.Finalizable produces the output:

finalized obj2
finalized obj1
finalized a1

whether using theFunction self or theFunction (#n self).

I'm not sure why using MLton.Thread.atomically would eliminate the need for a loop in the atExit handler. The purpose of MLton.Thread.atomically is simply to prevent implicit switching to a signal handler thread. Looking at the code:

    fun finalize doGC =
      let
        val () = doGC ()
        val cleanedSome' = cleanPendingList marshalNow globalPendingList
      in
        cleanedSome'
      end

    local
      val onGC = fn () => ignore (cleanPendingList marshalNow globalPendingList)

      fun onExit () =
        let
          val () = while finalize MLton.GC.collect do ()
        in
            ()
        end

      val () = MLtonSignal.handleGC onGC
      val () = OS.Process.atExit onExit
    in
    end

there is no guarantee that all finalizers will be run. In particular, it admits the following sequence of actions:

program concludes, call onExit handler.
call finalize MLton.GC.collect from onExit
call MLton.GC.collect() from finalize
switch to GC signal handler, call onGC
call cleanPendingList from onGC
cleanPendingList returns true (ran a finalizer); having removed that finalizer, another object (with a finalizer) becomes unreachable, but has not yet been collected
onGC returns
GC signal handler concludes, switch back to main thread
resume after MLton.GC.collect() in finalize
call cleanPendingList from finalize
cleanPendingList returns false, because no GC has yet occurred to collect the object with a finalizer
finalize returns false
while finalize MLton.GC.collect do () terminates
onExit returns
program terminates without running final finalizer

However, changing onExit to:

      fun onExit () =
        let
           (* val () = while finalize MLton.GC.collect do () *)
           val localPendingList =
              SharedVar.new (SharedVar.foldmap globalPendingList (fn ((), l) => (l, [])) ())
           fun loop () =
              let
                 val () = MLton.GC.collect ()
              in
                 if cleanPendingList marshalNow localPendingList
                    then loop ()
                    else ()
              end
        in
           loop ()
        end

does not change the behavior of the program (no finalized obj2 message printed).

pclayton · 2023-09-08T18:56:57Z

Using structure Finalizable = MLton.Finalizable produces the output:
finalized obj2
finalized obj1
finalized a1
whether using theFunction self or theFunction (#n self).

The original example does not use MLton.Finalizable because it makes use of additional finalizable features. All use of the new features was eliminated when reducing the example. (I will raise issues to discuss these new features.)

I'm not sure why using MLton.Thread.atomically would eliminate the need for a loop in the atExit handler.

There is a while-loop in the atExit handler. Have I misunderstood your comment?

The purpose of MLton.Thread.atomically is simply to prevent implicit switching to a signal handler thread. Looking at the code:

    fun finalize doGC =
      let
        val () = doGC ()
        val cleanedSome' = cleanPendingList marshalNow globalPendingList
      in
        cleanedSome'
      end

    local
      val onGC = fn () => ignore (cleanPendingList marshalNow globalPendingList)

      fun onExit () =
        let
          val () = while finalize MLton.GC.collect do ()
        in
            ()
        end

      val () = MLtonSignal.handleGC onGC
      val () = OS.Process.atExit onExit
    in
    end

there is no guarantee that all finalizers will be run. In particular, it admits the following sequence of actions:

* program concludes, call `onExit` handler.

* call `finalize MLton.GC.collect` from `onExit`

* call `MLton.GC.collect()` from `finalize`

* switch to GC signal handler, call `onGC`

* call `cleanPendingList` from `onGC`

* `cleanPendingList` returns `true` (ran a finalizer); having removed that finalizer, another object (with a finalizer) becomes unreachable, but has not yet been collected

* `onGC` returns

* GC signal handler concludes, switch back to main thread

* resume after `MLton.GC.collect()` in `finalize`

* call `cleanPendingList` from `finalize`

* `cleanPendingList` returns `false`, because no GC has yet occurred to collect the object with a finalizer

* `finalize` returns `false`

* `while finalize MLton.GC.collect do ()` terminates

* `onExit` returns

* program terminates without running final finalizer

Good point. The handler cannot rely on the result of finalize MLton.GC.collect as you observe. By the way, this issue is much less likely in the original example where finalize inhibits one round of cleaning by the handleGC handler to prevent double-cleaning. However, this is just for performance - to avoid excessive cleaning - and the issue may still occur.

However, changing onExit to:

      fun onExit () =
        let
           (* val () = while finalize MLton.GC.collect do () *)
           val localPendingList =
              SharedVar.new (SharedVar.foldmap globalPendingList (fn ((), l) => (l, [])) ())
           fun loop () =
              let
                 val () = MLton.GC.collect ()
              in
                 if cleanPendingList marshalNow localPendingList
                    then loop ()
                    else ()
              end
        in
           loop ()
        end

I think this would lose finalizers as you observed in a message to MLton-user. It would be possible to apply a similar fix but this becomes quite a lot of code in the original example because there are 'finalizer contexts' which involves a set of pending lists, once for each context. I have tried a different approach, using a boolean notifier, to ensure that finalize returns false only if no cleaning occurred between GC and the function returning. I think this makes the following code work:

while finalize MLton.GC.collect do ()

The updated example is attached.

does not change the behavior of the program (no finalized obj2 message printed).

Also, the updated example does not change the behavior (no finalized obj2 message printed). It's strange.

MatthewFluet · 2023-09-12T20:18:14Z

I'm not sure why using MLton.Thread.atomically would eliminate the need for a loop in the atExit handler.

There is a while-loop in the atExit handler. Have I misunderstood your comment?

I was responding to the following from the original post:

There is no code that empties the list of pending finalizers at exit as the in the MLton implementation. Instead, data races are avoided by using the module SharedVar which protects access to a shared variable using MLton.Thread.atomically.

I misunderstood what was meant by "empties the list of pending finalizers". I took it to mean "the loop that keeps running finalizers", but I realize now that it means "setting r := []".

However, changing onExit to:

      fun onExit () =
        let
           (* val () = while finalize MLton.GC.collect do () *)
           val localPendingList =
              SharedVar.new (SharedVar.foldmap globalPendingList (fn ((), l) => (l, [])) ())
           fun loop () =
              let
                 val () = MLton.GC.collect ()
              in
                 if cleanPendingList marshalNow localPendingList
                    then loop ()
                    else ()
              end
        in
           loop ()
        end

I think this would lose finalizers as you observed in a message to MLton-user.

Agreed, though it shouldn't make a difference in this example, because no finalizers are created by finalizers.

Also, the updated example does not change the behavior (no finalized obj2 message printed). It's strange.

I haven't looked at the updated example in detail, but agreed that is a strange behavior that no finalized obj2 message is printed.

@MatthewFluet

These finalize functions are expected to return false only if no cleaning occurred between evaluation of the `doGC` function argument and the function returning, to indicate that further GC would not find any other unreachable finalizable values, but this was not guaranteed [1]: after one of these finalize functions evaluated its `doGC` function argument, another thread (Poly/ML) or handler (MLton) could clean some pending lists so the finalize function performed no further cleaning and returned false, even though further GC could make more finalizable values reachable, enabling more finalizers to be run. This issue is fixed by introducing a list of notifiers with each pending list. A notifier is a `bool ref`. When a pending list is cleaned, its notifiers are all set to true. A finalize function adds a notifier before evaluating `doGC` and removes the notifier after cleaning, returning the notifier value. Therefore a finalize function is guaranteed to detect any cleaning between evaluation of the `doGC` function argument and the returning. Note that cleaning outside this window may be detected causing the finalize function to return true when it could return false but this does not matter and is unlikely. Another issue with these functions is that they caused the next round of cleaning by the GC handler to be skipped, to avoid double-cleaning, but the `doGC` function argument may not invoke GC: it could simply be the identity function. Therefore cleaning after a subsequent GC would not occur when it should. The issue is avoided by using a different method to reduce the likelihood of double-cleaning: cleaning in the GC handler cleans a pending list only if it has no notifiers. 1. As pointed out by @MatthewFluet MLton/mlton#508 (comment)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Equivalent programs show different behaviour #508

Equivalent programs show different behaviour #508

pclayton commented Aug 17, 2023

MatthewFluet commented Aug 21, 2023

pclayton commented Sep 8, 2023

MatthewFluet commented Sep 12, 2023

Equivalent programs show different behaviour #508

Equivalent programs show different behaviour #508

Comments

pclayton commented Aug 17, 2023

MatthewFluet commented Aug 21, 2023

pclayton commented Sep 8, 2023

MatthewFluet commented Sep 12, 2023