How to GC in C(++)

GC`s, Garbage Collectors, are systems to manage memory. Even the good-old C-runtime library has its GC. free, malloc and the others. Now, these aren’t the best memory managers there are. They cause fragmented memory, and scatter the memory of your application over the whole memory rendering the great processor l1 and l2 cache almost useless.

I won’t talk about how to use the C ‘GC’, but how to implement new-generation alike Garbage Collectors in C(++).

The best GC`s can be found in Intermediate Language Virtual Machines which keep track of every object`s type in the memory and can therefore freely move objects in the memory. The great advantage of this is that object that have survived about as much Garbage Collects tend to link a lot to eachother and use eachother. That group survivors is called a generation. When you put a generation close to eachother in the memory, it will usualy only take 1 RAM memory load for the CPU, caching the whole generation in the CPU cache, for a while which is a lot quicker considering that l2-cache is about a thousand times faster than normal RAM.

The problem in C(++) is that you can’t just move objects. You don’t know what part of the memory of an object is a pointer, and what is just a mere integer. Therefore it is impossible to make a generation based Garbage Collector for you just can’t move stuff.

Allocating a big chunks and putting the objects of the C(++) application however using a custom allocation function will generate some additional performance above the traditional malloc although it still isn’t perfect.

One way to get it to work is to let the GC know what a pointer is and what now. This can be done by letting the first 4 (or 8 bytes in case of a 64bit CPU) be a pointer to a function which returns an array of offsets which resemble the pointers in the type.

Now the GC knows what the pointers are in a structure :-D.

The GC can now move the object by updating the pointers from the other objects to the new location of the object!

The only problem with this is that there can’t be pointers to the object, or inside the object from a not referenced pointer exposing object or a object that doesn’t exposes it pointers at all.

To overcome this I guess it could be feasable to add a flag byte in each object that exposes it pointers which allows it to specify a function that can be called when moving the object or when an object referenced is moved.

I’ve tried out some ways to get this all wrapped up in a nice framework (which is very feasable using my ‘objects in c-framework’ or something similar (easier) in C++ using inheritance).

I`m afraid however that such a GC comes too late for by the time it is reliable the Intermediate Languages would have gained way more ground for these can implement way more optimalizations, including in the GC, during runtime for they tend to use an Ahead of Time and Just in Time compiler.

Feedback on this would be welcome :-).

One thought on “How to GC in C(++)”

  1. Nice intro article to garbage collection.

    The garbage collector is what has continually forced me to rewrite PX again and again. On this current build, the GC was the first part of the VM that I built.

    Fragmentation only goes away on a managed “heap” in the technique you described (known as “mark and compact”). Most other GC uses “mark and sweep” which doesn’t allow moving of objects. This does still have some fragmentation however a good algorithm can reduce fragmentation greatly and it doesn’t require the amount of overhead needed for “mark and compact”.

    The trickiest part of “mark and compact” is updating all references to an object when an object moves. The common answer is have a list (linked usually allow in PX, they’re not “linked” together, will be more an array once i fix it) of “handles” that point to an object that be garbage collected. And then, rather than using pointers to the objects is a member variable, you point to the handle, and then when the object is moved you update the handle pointer to the new location. This way all references to an objects is updated with the changing of a single pointer. This requires additional memory that needs to be there at all times (the handle is freed once the object is freed) but is the fastest way I know to do this.

    The other way is to make a huge list of all objects that survive the garbage collection and their new locations in memory. Then traverse memory again, updating each individual member to point to the moved objects. This allows the overhead to be only needed during the actual collection, but you have to traverse memory twice, which can be quite slow in large heaps.

    Another tricky thing is keeping track of what objects are still considered “active”. You need to maintain some sort of list that can be added to or automated from any part of your program that can create objects.

    Just my 2 cents on GC and things that cause the PxVM’s GC to take so long to build. Hope this helps if you ever decide to build one.

    (Maybe I should include different garabge collection techniques in my “Algorithms” tutorials. Nothing too in depth just a general overview…)

Leave a Reply

Your email address will not be published. Required fields are marked *