Write Barriers

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished.

Ruby’s garbage collection implementation is complex and confusing - yet powerful and beautiful at the same time. Chapter 13 summarizes and explains advanced aspects of Ruby’s garbage collector. I love learning about the difficult bits at a deep enough level to be able to explain them to someone else. In RUM I often show snippets from Ruby's C source code in gray callouts. These sections are for readers who already understand C syntax, or who would like to learn it. These can give you sign posts in case you ever decide to read Ruby’s source code for yourself.

Chapter 13: Incremental and Generational Garbage Collection

Incremental GC3
Marking in More Detail4
Stop The World Garbage Collection8
Incremental Garbage Collection10
How Ruby Implements Incremental GC11
Write Barriers12
A Write Barrier in Action14
Experiment 13-1: What happens if GC can’t free enough slots?15
Generational GC22
Keeping Track of an Object’s Age23
How Marking and Sweeping Work Together24
Seeing Generational GC In Action26
Write Barriers Again29
The Remember Set and Unprotected Values30
Garbage Collection Bitmaps31
Major and Minor GCs34
Experiment 13-2: Major and Minor GCs35
Summary40

Write Barriers

Write barriers warn Ruby’s garbage collection algorithm whenever your program creates a new object that might have to be marked. You can think of the barriers as small boxes that surround arrays, hashes and other data structures you might have in your program. For example, in Listing 13-1 Ruby places a write barrier around arr, the array that contains all of the new objects:


Figure 13-9: A Write Barrier

Figure 13-9 repeats Figure 13-6, which showed the array arr, just after Ruby removed it from the mark stack, along with all of the elements of arr, still on the stack. However, on the left the dotted rectangle represents a writer barrier around this array. (In reality, there’s nothing special about this array; Ruby uses write barriers for all arrays, hashes and other data structures.)

The write barrier, as the name implies, jumps into action whenever your program writes to the array inside. In this example, the barrier will be triggered when the program runs the line of code at (3) in Listing 13-1 inside the loop:

(3) arr.push(Object.new)

If your program was running between incremental GC steps when Ruby added a new element to the array, Ruby would move the array back on to the mark stack:


Figure 13-10: Triggering a Write Barrier Pushes the Array Back on the Mark Stack

Figure 13-10 shows the array arr on the left, inside of a write barrier. The line of code just above, arr.push, triggers the write barrier because it writes to the array’s contents. This triggers the write barrier, which moves the array arr back on to the mark stack, shown on the right. Now during the next GC step, Ruby will process the array’s children again, even if it had processed them earlier. This allows Ruby to find and mark the new object just added to the array.

The mark stack is how Ruby remembers its place inside of the GC algorithm, between one incremental GC and the next. Whenever an incremental GC step starts, Ruby continues to mark the objects present on the mark stack that were pending from last time or that were modified in the meantime.

A Write Barrier in Action

Whenever your program modifies an array, hash or other value protected by write barriers, Ruby calls this C code internally. rb_gc_impl_writebarrier_remember at (1) in Listing 13-2 pushes a modified object back on to the mark stack, as shown above in Figure 13-10:

(1) void rb_gc_impl_writebarrier_remember(void *objspace_ptr, VALUE obj)
    {
        rb_objspace_t *objspace = objspace_ptr;
    
        gc_report(1, objspace, "rb_gc_writebarrier_remember: %s\n", rb_obj_info(obj));
    
(2)     if (is_incremental_marking(objspace)) {
(3)         if (RVALUE_BLACK_P(objspace, obj)) {
(4)             gc_grey(objspace, obj);
            }
        }
        else {
(5)         if (RVALUE_OLD_P(objspace, obj)) {
                rgengc_remember(objspace, obj);
            }
        }
    }
Listing 13-2: The rb_gc_writebarrier_remember function

Ruby first checks at (2) whether it is currently in the midst of an incremental garbage collection. If it is, Ruby continues to the line at (3), and checks whether the given object was already completely processed: whether Ruby marked it and all of its children. (Recall the non-inclusive term “black” from the tricolor GC algorithm.) For example, in Figure 13-9 the arr value was completed processed, since Ruby marked it and also moved all of its children on to the mark stack, removing arr from the mark stack.

If Ruby did already mark the value, and if Ruby already pushed the value’s children on to the mark stack, then Ruby knows it needs to process the value again - since your program modified it. In this case, Ruby continues on to the line at (4) and calls gc_grey, which pushes the value back on to the mark stack. Later Ruby will iterate over its children again, pushing each of them back on to the mark stack.

Looking ahead to the next section, Generational GC on page 22, write barriers use the code in the else clause at (5) to handle “old” values. We’ll cover Generational GC next. But first, Experiment 13-1 will take a look at incremental GC in action.