Write Barriers
I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished.
Ruby’s garbage collection implementation is complex and confusing - yet powerful and beautiful at the same time. Chapter 13 summarizes and explains advanced aspects of Ruby’s garbage collector. I love learning about the difficult bits at a deep enough level to be able to explain them to someone else. In RUM I often show snippets from Ruby's C source code in gray callouts. These sections are for readers who already understand C syntax, or who would like to learn it. These can give you sign posts in case you ever decide to read Ruby’s source code for yourself.
Chapter 13: Incremental and Generational Garbage Collection
Incremental GC | 3 |
Marking in More Detail | 4 |
Stop The World Garbage Collection | 8 |
Incremental Garbage Collection | 10 |
How Ruby Implements Incremental GC | 11 |
Write Barriers | 12 |
A Write Barrier in Action | 14 |
Experiment 13-1: What happens if GC can’t free enough slots? | 15 |
Generational GC | 22 |
Keeping Track of an Object’s Age | 23 |
How Marking and Sweeping Work Together | 24 |
Seeing Generational GC In Action | 26 |
Write Barriers Again | 29 |
The Remember Set and Unprotected Values | 30 |
Garbage Collection Bitmaps | 31 |
Major and Minor GCs | 34 |
Experiment 13-2: Major and Minor GCs | 35 |
Summary | 40 |
Write Barriers
Write barriers warn Ruby’s garbage collection algorithm whenever your program creates a new object that might have to be marked. You can think of the barriers as small boxes that surround arrays, hashes and other data structures you might have in your program. For example, in Listing 13-1 Ruby places a write barrier around arr, the array that contains all of the new objects:
Figure 13-9: A Write Barrier
Figure 13-9 repeats Figure 13-6, which showed the array arr, just after Ruby removed it from the mark stack, along with all of the elements of arr, still on the stack. However, on the left the dotted rectangle represents a writer barrier around this array. (In reality, there’s nothing special about this array; Ruby uses write barriers for all arrays, hashes and other data structures.)
The write barrier, as the name implies, jumps into action whenever your program writes to the array inside. In this example, the barrier will be triggered when the program runs the line of code at (3) in Listing 13-1 inside the loop:
(3) arr.push(Object.new)
If your program was running between incremental GC steps when Ruby added a new element to the array, Ruby would move the array back on to the mark stack:
Figure 13-10: Triggering a Write Barrier Pushes the Array Back on the Mark Stack
Figure 13-10 shows the array arr on the left, inside of a write barrier. The line of code just above, arr.push, triggers the write barrier because it writes to the array’s contents. This triggers the write barrier, which moves the array arr back on to the mark stack, shown on the right. Now during the next GC step, Ruby will process the array’s children again, even if it had processed them earlier. This allows Ruby to find and mark the new object just added to the array.
The mark stack is how Ruby remembers its place inside of the GC algorithm, between one incremental GC and the next. Whenever an incremental GC step starts, Ruby continues to mark the objects present on the mark stack that were pending from last time or that were modified in the meantime.
A Write Barrier in Action
Whenever your program modifies an array, hash or other value protected by write
barriers, Ruby calls this C code internally. rb_gc_impl_writebarrier_remember
at
(1) in Listing 13-2 pushes a modified object back on to the mark stack, as shown
above in Figure 13-10:
(1) void rb_gc_impl_writebarrier_remember(void *objspace_ptr, VALUE obj) { rb_objspace_t *objspace = objspace_ptr; gc_report(1, objspace, "rb_gc_writebarrier_remember: %s\n", rb_obj_info(obj)); (2) if (is_incremental_marking(objspace)) { (3) if (RVALUE_BLACK_P(objspace, obj)) { (4) gc_grey(objspace, obj); } } else { (5) if (RVALUE_OLD_P(objspace, obj)) { rgengc_remember(objspace, obj); } } }
Ruby first checks at (2) whether it is currently in the midst of an incremental
garbage collection. If it is, Ruby continues to the line at (3), and checks
whether the given object was already completely processed: whether Ruby marked
it and all of its children. (Recall the non-inclusive term “black” from the
tricolor GC algorithm.) For example, in Figure 13-9 the arr
value was
completed processed, since Ruby marked it and also moved all of its children on
to the mark stack, removing arr
from the mark stack.
If Ruby did already mark the value, and if Ruby already pushed the value’s
children on to the mark stack, then Ruby knows it needs to process the value
again - since your program modified it. In this case, Ruby continues on to the
line at (4) and calls gc_grey
, which pushes the value back on to the mark
stack. Later Ruby will iterate over its children again, pushing each of them
back on to the mark stack.
Looking ahead to the next section, Generational GC on page 22, write barriers use the code in the else clause at (5) to handle “old” values. We’ll cover Generational GC next. But first, Experiment 13-1 will take a look at incremental GC in action.