Memory Management

With Giraffe Library, memory management of foreign (i.e. non-SML) objects on the heap is automatic. Therefore, it is not necessary to call functions to allocate or free memory. For reference-counted foreign objects, it is neither necessary nor possible to increment or decrement reference counts: a reference is held by the SML runtime while the object is needed, i.e. reachable in the SML program state. Given this, any API documentation relating to ownership, memory allocation and deallocation, and reference counts can be ignored. Once a foreign object is no longer needed, it is finalized: its memory is freed or the reference from SML is dropped.

Implementation details

To determine when foreign objects can be finalized, their use is tracked using weak references: once the SML garbage collector finds that a foreign object is no longer reachable, the object is finalized. With this approach, there are the following potential issues:

  1. The SML garbage collector may trigger finalization at any time while SML code is executing. However, finalizing an unreachable GObject instance while the code of another GObject instance is executing could corrupt data, because implementations of GObject classes are not generally thread-safe.

  2. The SML garbage collector cannot detect reference cycles involving foreign objects, so the objects in such a cycle are never finalized.

  3. The SML garbage collector may not run automatically as often as needed, delaying finalization and causing unnecessary memory usage.

There are a few things SML programs must do to ensure that foreign objects are finalized once no longer required. These are summarized in the following list and described in further detail in the following sections.

  1. Support asynchronous or synchronous GObject finalization

    Both asynchronous and synchronous finalization can be used in the same program. Asynchronous finalization requires a running main context and finalization is performed automatically in a source callback. This can be enabled using

    Giraffe.Finalize.enableAsyncInContext NONE NONE

    where the NONE arguments specify the default main context and default priority for the source callback, respectively. Synchronous finalization occurs where a program evaluates

    Giraffe.Finalize.sync Giraffe.GC.full

    or just

    Giraffe.Finalize.sync (fn () => ())

    if garbage collection does not need to be performed first.

  2. Avoid reference cycles involving foreign objects

    When connecting a function f to handle a signal of an object obj, e.g.

    Signal.connect obj (Namespace.Type.aSig, f)

    f should not depend on the SML value obj, either directly or indirectly. To refer to the object, f should use its argument. If f refers to another object whose handler function refers to obj, the handler functions should use weak references to those objects in the cycle that are not available as arguments. If f does depend on the SML value obj, the object cannot be finalized until the handler function is disconnected (using Signal.handlerDisconnect).

  3. Trigger garbage collection explicitly

    If the SML runtime does not trigger garbage collection often enough, memory use may become unnecessarily high, requiring the program to trigger garbage collection explicitly using Giraffe.GC.full.

The debugging output produced by options of the environment variable GIRAFFE_DEBUG can be used to check when finalization occurs. With the option log-mem, ownership of foreign objects from the SML runtime is logged and finalization is indicated by messages of the form

[giraffe-debug-mem] timestamp free ...

With the option log-finalizers-pending-on-exit, the numbers of foreign objects that could not be finalized when the application exits are logged in a message of the form

[giraffe-debug-finalizers-pending-on-exit] n1 n2 ...

Any non-zero value suggests that a reference cycle is present. This debug option is recommended during development to detect reference cycles early. If a non-zero value does occur, the unfinalized objects can be identified by forcing their finalization using the option force-finalization-on-exit and logging their finalization using the option log-mem. These options can be combined as follows:

GIRAFFE_DEBUG={force-finalization-on-exit,log-{finalizers-pending-on-exit,mem}}

Then, the finalization log messages following [giraffe-debug-finalizers-pending-on-exit] are for the objects that could not be finalized. Note that forcing finalization could, in principle, fail, but at least some unfinalized object should be identifiable, giving an indication of the reference cycle.

Supporting asynchronous or synchronous GObject finalization

Implementation details

For both Poly/ML and MLton, finalizer functions triggered following garbage collection are run in a separate thread from the main application. (In the case of Poly/ML, this is a separate OS thread.) Consequently, a finalizer function may run concurrently with the application. A GObject instance is finalized by dropping the reference held by the SML runtime (using g_object_unref internally). If this is the last reference to the instance, this invokes GObject code to destroy the instance. Therefore, using a finalizer function to immediately finalize a GObject instance could result in code to destroy the instance running concurrently with the main application, which could be running the code of another GObject. This is problematic because the implementation of many GObject classes (including all GTK classes) is not thread-safe.

Giraffe Library assumes that no GObject implementation is thread-safe and ensures that no GObject code runs concurrently. This requires GObject finalization to be performed from the main application thread. To achieve this, the finalizer function either

  • adds an idle source function that performs GObject finalization to the main context of the application (asynchronous), or

  • does nothing, requiring the application code to explicitly call finalization (synchronous).

With asynchronous finalization, a single idle source function can perform finalization of many GObject instances.

Giraffe Library requires an application to finalize GObject instances asynchronously or synchronously, possibly using both methods. If neither method is used, no GObject instance will be finalized until the application exits, which may suffice for very simple applications but would generally cause unnecessary memory use.

Asynchronous finalization allows foreign objects to be finalized automatically following garbage collection. It requires a running main context and is enabled using

Giraffe.Finalize.enableAsyncInContext optContext optPriority

optContext specifies the main context to use: its value is either NONE, to use the thread-default main context, or SOME context. optPriority specifies the priority to use for the source function that performs finalization: its value is either NONE, to use the default idle priority, GLib.PRIORITY_DEFAULT_IDLE, or SOME priority.

Typically, an application would evaluate

Giraffe.Finalize.enableAsyncInContext NONE NONE

before running the default main context.
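
The ordering described above can be sketched as a small wrapper. This is only an illustration and not part of Giraffe Library: runWithAsyncFinalization and its parameter runMain are hypothetical names, where runMain stands for whatever function the application uses to run the default main context and is assumed to block until the main loop quits.

```sml
(* Hypothetical helper, not provided by Giraffe Library: enable
 * asynchronous finalization, then run the default main context.
 * runMain is supplied by the application. *)
fun runWithAsyncFinalization (runMain : unit -> unit) : unit =
  let
    (* NONE NONE: thread-default main context, default idle priority.
     * The result is discarded with val _ so that this sketch does not
     * depend on the exact result type. *)
    val _ = Giraffe.Finalize.enableAsyncInContext NONE NONE
  in
    runMain ()
  end
```

Enabling asynchronous finalization first means the idle source that performs finalization is already in place when the main context starts dispatching.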

Synchronous finalization causes foreign objects to be finalized at specific points in the application code. It occurs where the application evaluates

Giraffe.Finalize.sync doGC

doGC is a function that is evaluated before finalizing any unreachable foreign objects. Typically, it would be Giraffe.GC.full to perform garbage collection. Any foreign objects found unreachable as a result of evaluating doGC are guaranteed to be finalized before Giraffe.Finalize.sync returns.

The expression

Giraffe.Finalize.sync Giraffe.GC.full

is not generally equivalent to

Giraffe.GC.full ();
Giraffe.Finalize.sync (fn () => ())

The latter does not guarantee that foreign objects found unreachable by Giraffe.GC.full are finalized before Giraffe.Finalize.sync returns: if asynchronous finalization is enabled, those objects may instead be finalized asynchronously some time later.
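
As an illustration of finalizing at specific points in the application code, the following sketch collects garbage and finalizes unreachable foreign objects after each item of work. The names processEach and process are hypothetical, the function is assumed to be called from the main application thread, and the result of Giraffe.Finalize.sync is passed to ignore because its result type is assumed here rather than taken from the text above.

```sml
(* Hypothetical helper: apply process to each item and, after each one,
 * run a full garbage collection followed by synchronous finalization of
 * any foreign objects found unreachable. *)
fun processEach (process : 'a -> unit) (items : 'a list) : unit =
  List.app
    (fn item =>
       (
         process item;
         (* doGC = Giraffe.GC.full, evaluated inside sync as recommended *)
         ignore (Giraffe.Finalize.sync Giraffe.GC.full)
       ))
    items
```

Passing Giraffe.GC.full as doGC, rather than calling it beforehand, keeps the guarantee that objects found unreachable are finalized before each call returns.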

Avoiding reference cycles involving foreign objects

Implementation details

A reference cycle involves two or more objects where each object depends on itself via the other objects in the cycle. For example, a reference cycle involving objects A, B and C exists if A references B, B references C and C references A. If there are no other references to A, B and C, then they are unreachable and no longer required. An SML garbage collector is able to detect reference cycles involving SML objects but it cannot know that a foreign object effectively holds a reference to an SML object (in the sense that destroying the foreign object would remove the reference to the SML object) nor can it know what else references a foreign object. Therefore, a foreign object in a reference cycle cannot be finalized nor can any SML objects in the cycle be garbage-collected.

In practice, the only way a foreign object references an SML object is when it holds a reference to an SML function to call. The most common example is when an SML function f is connected to handle emission of a signal from a GObject instance obj: obj effectively holds a reference to f because the reference to f persists until obj is destroyed (unless f is disconnected). If f references the SML value obj, either directly or indirectly via other objects, then there is a reference cycle. In this case, it would be possible to finalize obj once its only reference is (directly or indirectly) from the SML function, because the handler would be removed during destruction of obj. However, the SML garbage collector cannot determine this, so neither obj nor any SML object in the reference cycle will ever be garbage-collected.

Therefore, for each expression of the form

Signal.connect obj (Namespace.Type.aSig, f)

or

Signal.connectAfter obj (Namespace.Type.aSig, f)

f must not reference the SML value obj, either directly or indirectly via other objects, to allow obj to be finalized while f is still connected.

If f does reference obj, then the signal handler must be disconnected to allow obj to be finalized. This can be done explicitly using Signal.handlerDisconnect or, for Gtk3 only, using the functions in the special structure ChildSignal, which allow automatic disconnection of a handler function when some widget receives the “destroy” signal. Still, disconnecting handler functions is a burden, so it is better that f does not reference obj but uses another reference to the same object, as described below.

With Gtk3, objects of subclasses of GtkWidget could have signal handlers that create a reference cycle and yet are still cleaned up automatically. This can happen because all signal handlers are disconnected, breaking the reference cycle, when a widget is destroyed. (It is this behaviour that enables functions in the special structure ChildSignal to connect a handler to the “destroy” signal of a widget without then preventing finalization of the widget.)

Implementation details

In Gtk3, widgets are typically destroyed using gtk_widget_destroy, which calls g_object_run_dispose, which ultimately calls the dispose function of the GObject class, where all signal handlers are disconnected.

This behaviour should not be relied upon because it does not avoid the issue for objects of classes not derived from GtkWidget, e.g. GtkTextBuffer, and does not carry over to Gtk4.

When a handler function f references the object whose signal it is handling, it should reference the object using the SML value from its argument. (This uses a new reference to the object that is obtained each time f runs, allowing the reference to be released afterwards.) Such a function f can be connected to handle the signal on the object obj as follows:

val _ = Signal.connect obj (Namespace.Type.aSig, f)

Connecting an equivalent handler function that references the object using the SML value obj instead of its argument, as follows, would introduce a reference cycle:

val _ = Signal.connect obj (Namespace.Type.aSig, fn _ => f obj)  (* causes reference cycle *)

When an object obj1 has a handler function, f1 obj2, that references an object obj2, and obj2 has a handler function, f2 obj1, that references obj1, there is a reference cycle. For example, connecting to signals as follows prevents finalization of obj1 and obj2:

val _ = Signal.connect obj1 (someSignalName1Sig, f1 obj2)
val _ = Signal.connect obj2 (someSignalName2Sig, f2 obj1)  (* causes reference cycle *)

In this case, the arguments of the handler functions do not provide SML values for all the required objects, so a more general solution is required: weak references to obj1 and obj2 are created and, when run, each handler function gets a reference from the weak reference. This requires the handler functions to handle the case when a reference cannot be obtained. The example above can be fixed as follows:

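(* Weak references do not keep obj1 and obj2 reachable *)
val obj1' = Weak.new obj1
val obj2' = Weak.new obj2
(* Each handler takes its own object as its argument and obtains the
   other object from the weak reference each time it runs *)
fun f1' obj1 =
  case Weak.get obj2' of
    SOME obj2 => f1 obj2 obj1
  | NONE => …
fun f2' obj2 =
  case Weak.get obj1' of
    SOME obj1 => f2 obj1 obj2
  | NONE => …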
val _ = Signal.connect obj1 (someSignalName1Sig, f1')
val _ = Signal.connect obj2 (someSignalName2Sig, f2')
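
The pattern shared by f1' and f2' can be factored into a helper. The following is a minimal sketch, not part of Giraffe Library: withWeak and onGone are hypothetical names, and Weak.get is assumed to behave as in the example above, returning SOME while the object is still available and NONE otherwise.

```sml
(* Hypothetical helper: build a handler function that obtains a weakly
 * referenced object each time it runs, so that the handler itself does
 * not keep that object reachable. *)
fun withWeak weakObj (handler, onGone) arg =
  case Weak.get weakObj of
    SOME obj => handler obj arg   (* reference obtained: run the handler *)
  | NONE => onGone arg            (* no reference: application-defined *)
```

With this helper, f1' could be written as withWeak obj2' (f1, onGone1) and f2' as withWeak obj1' (f2, onGone2), where onGone1 and onGone2 are whatever the application chooses to do when a reference cannot be obtained.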

Triggering garbage collection explicitly

Each SML compiler has its own criteria for triggering garbage collection, which may not take into account memory allocated for non-SML objects. This may result in garbage collection occurring infrequently, causing unnecessarily high memory use and, when it does occur, a perceptible delay that may be detrimental to the user experience.

If infrequent garbage collection causes an issue, the application should trigger garbage collection explicitly by evaluating Giraffe.GC.full (). (This could be evaluated as part of synchronous GObject finalization using Giraffe.Finalize.sync Giraffe.GC.full.) The function Giraffe.GC.full is synchronous, so it does not return until garbage collection (but not finalization) is complete.

Exactly where an application should trigger garbage collection depends on the application. For example, if timing is unimportant, garbage collection could be triggered periodically using a timeout callback. The following code adds a timeout callback to trigger garbage collection every 5 seconds indefinitely:

val _ =
  GLib.timeoutAddSeconds (
    GLib.PRIORITY_LOW,
    5,
    fn () => (Giraffe.GC.full (); GLib.SOURCE_CONTINUE)
  )
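
A variant of the callback above can also perform synchronous finalization after each collection. This is a sketch under two assumptions that are not stated above: that the timeout callback runs on the main application thread, and that Giraffe.Finalize.sync may be evaluated from within a source callback. The result of Giraffe.Finalize.sync is passed to ignore because its result type is assumed.

```sml
val _ =
  GLib.timeoutAddSeconds (
    GLib.PRIORITY_LOW,
    5,
    fn () =>
      (
        (* collect garbage, then finalize the foreign objects found
         * unreachable before the callback returns *)
        ignore (Giraffe.Finalize.sync Giraffe.GC.full);
        GLib.SOURCE_CONTINUE
      )
  )
```

This may suit an application that does not enable asynchronous finalization but still wants foreign objects released at regular intervals.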