memory – Intrepid Blog

Ulrich Drepper on Memory

Kernel hacker Ulrich Drepper has written a lengthy paper on the design and especially performance characteristics of memory of modern consumer hardware and it’s written for programmers. A great read so far (it’s published in parts by LWN):

What every programmer should know about memory by Ulrich Drepper.

The GC: Tracing, reversing pointer to object

The problem

Whilest tracing down garbage during the Garbage Collection, as described in earlier posts, it is required to find mark an object pointed to from another object.

This could lead to problems when it is allowed to point anywhere inside an object, or when the object information bit can’t be found by a pointer to the start of the object.

In both cases the Gc should somehow translate a pointer (eg. 0x1234) to the head of the object it points in (eg. 0x1200). The problem here is how to efficiently get 0x1200 from 0x1234.

Avoiding the problem

This problem can be circumvented by disallowing pointers pointing to inside the object and instead require all pointers to point to the start of the object. This will only work when the header of the object has a fixed size, otherwise it will still be impossible to work back to the start of the header, which contains the flags you want to alter when tracing garbage.

Pointers inside a managed object aren’t required and could easily be removed. They can be very usefull though in some situations. One of these situations is being able to inline a private referenced instance into the contained class, of which is known that it won’t be modified, which would result in pointers inside the container class. It will only increase performance in some situations, but it will be usefull to have, and is optional.

A bigger issue are pointers that point to the start of an object with a header in front of it which has got a dynamic size. Either the header shouldn’t be of dynamic size, and a fixed (big) size, or instead of to the start of the object, pointers should point to the header of an object. In the latter case each pointer derefence should require an ‘add’ call to increase the address to match the start of the object, with the additional logic required to attain the size of the header which can be significant depending on the implementation. These extra opcodes for each pointer derefence could not be easily be worked away even when using a JIT because of multi-threading, and certainly could not be avoided when using C. The fixed header size isn’t nice either. For it probably would end up being at least 7 bytes on a 64bit platform (flags + pointer).

Oh, and there is also the java way which uses a handle for each object. Which basicly means that an integer pointer isn’t int*, but int** in java. Where the pointer in between contains the meta-data about the object. This is even more less ideal for it requires an extra derefence for each derefence and it results in twice the amount of pointers to track.

As far as I concern this problem can’t be avoided, when dealing with a Gc that is required to interop with non-managed code. With a JIT it could be done though, but this requires a runtime performance/garbage collection performance trade-off. And I personally like a bit slower garbage collect instead of slower runtime.

Look-up structure

Before tracing the garbage collection usually walks through the heap to mark all objects white and put those in the root-set in the gray-list. During this first walk through the heap a temporary look-up structure could be build to allow efficient look-up during trace.

When only allowing pointers to the start of the object it would be rather easy to do. A modulus-based bucket like implementation, a hash table without the hash-part, would do. This would allow a nice even redistribution of entries throughout the table, for it uses a modulus.

When allowing pointers in the middle of an object it becomes more tricky, because you can’t precompute the pointers to put in the table. You need to have a structure which allows you to look to the left and find the object the pointer points into. Although most pointers still point to the start of the object, the case that they could point inside the object shouldn’t be concidered.

Instead of a modulus-based bucket, division-based buckets should be used. This creates a problem for the likely distribution of blocks between the buckets will vary heavily. One whole bucket could be empty when there is a big allocation in the memory it represents, where almost all small objects which represent most of the pointers could be concentrated in just two buckets.

Bucket-tree

To compensate for this a bucket-tree could be used. Basicly when a bucket contains a lot of elements it would split itself in a new set of buckets. This way a minimal amount of buckets could be wasted on really big allocations, and a lot of buckets could be allocated for small allocations. The look-up time would be a bit worse, but it would be neglectable to the huge lookup time a normal search through the heap would require.

Special treatment for pointers pointing inside objects

When there are only a one or two pointers in the whole heap that point inside another object they could be treated seperately instead, and the rest could keep using the modulus-based buckets.

One way to give those pointers a special treatment is marking them, and resolving to which object they point during the initial walk through the heap where finding objects pointed to from pointers is a lot easier. (pointers needed to be looked up could be sorted by address, and be checked during the walk, which is very efficient).

Another way would be to simulate pointers pointing inside another pointer by making the pointer consist out of an object pointer and a offset from there, which could be cached by the JIT, or expanded before the garbage collection.

Yet another way would be to walk the modulus-based buckets by hand when the pointer isn’t found.

The best method would depend on the implementation of the rest of the Gc.

About modulus-based buckets

I`m still uncertain about whether the distribution in a modulus-based bucket can be trusted at all times. Putting a bucket-tree in default, for worst case would be nice, just to be sure.

Update: some more possibilities:

Binary search

When we put pointers to all allocated blocks sorted somewhere in the memory, one could use binary search, or other search algorithms to search in sorted lists, to find the pointer. When adding shortcuts each ~64K this could be quite efficient. I’ve implemented this method for my Garbage Collector temporarily for it is amongst the easier to implement and I suspect the fastest. Actually, I hope it is the fastest.

I’ll do some tests on different algorithms on a dump of a ‘modal’ application’s managed heap, when I’ve finished the Gc sufficiently to make an application work with it.

The GC: Tracing Garbage

Most Garbage Collectors are tracing. Instead of using Reference Counting, they trace down unused objects (garbage) by tracing from the root set of objects and mark all objects found. Those who weren’t marked have to garbage.

The Root-set

The root-set are all objects that are definitely live objects. The root set typically consists out of the objects linked from the stack and registers of each thread. Also object vital to the Gc and globals are in the root-set.

Tri-Color algorithm

An easy way to represent tracing through objects is the tri-color method. There are three colors an object can have: white, gray and black. All white objects haven’t seem to be referenced yet from black objects which are sure to be non-garbage. The gray objects are those that are referenced by black objects but didn’t have all of their references checked. Sounds confusing? A more practical explanation:

Colour all objects white.
Colour the root-set objects gray.
(Repeat this as long as there are gray coloured objects) Pick a gray object. Colour all objects referenced to from that gray object gray too. (Except those who are black). And colour itself black.
All black objects are now reachable, and all white objects are unreachable. Free those that are white!

Pop-mark-push referenced

An implementation of the Tri-Color algorithm can be accomplished by putting a mark on each object which either can be 1 or 0. 1 representing a black object, and 0 representing a white object. A stack list can be used to keep track of the gray objects:

NB do not confuse this stack, which is a normal Last In First Out list with the call stack.

Mark all objects 0 (white).
Push all objects in the root-set on the gray-stack.
(while there are objects in the gray-stack) Pop an object from the gray-stack. Mark it 1 (black) and push all referenced objects from that object on the gray-stack that aren’t marked 1 (thus white).
All objects still marked 0 (white) can be freed rid of.

Generations

Studies have shown that in almost all applications most allocations have a lifetime less than a few miliseconds, although the rest of the allocations would have a far greater lifespan. When collecting every few miliseconds would cause unnesessary tracing through the long living objects which would still take up most of the memory, and when collecting every few seconds the uncollected short-lifed objects would have spammed the heap and would make the garbage collector locality-of-reference unfriendly (everything worked on wouldn’t fit in one l2-cache load and therefore would be a lot slower).

The older objects which are in comparison with the newer objects highly unlikely to be garbage could be searched less frequently. This could be done by seperating the heap into generations. Each generation is basicly a piece of the heap. A typical garbage collection would only target the youngest few generation. Everytime an object survives a garbage collection it’s moved into an older generation, and finally becomes part of the oldest generation that is only garbage collected very infrequently.

Unpredictable garbage collection duration

The duration of the garbage collections would be hard to predict and could vary a lot. Although generation based gc’s could increase the amount of garbage collections and decrease the total time in comparison with one big, applications that use a garbage collection can not guarentee (practically) uninterupted execution.

Advantages

No extra fields required to keep track of the reference count.
There hasn’t got to be registration of references to objects, cause the gc keeps traces those down.
Hasn’t got any problem with circular references.
Outperforms reference Gc. The registration for each reference in the reference counter Gc takes more time than tracing down that reference. And it doesn’t do unnesessary checking of older objects when generations are kept in mind.

Disadvantages

Way harder to implement.
All pointers inside an object should be known.
Worse latency than reference counter. Although it outperforms refcounter, a garbage collects usually requires the whole program to be halted, which is a really bad thing for some applications.

Conclusion

Tracing gc‘s have shown to easily outperform reference counter gc‘s. The prerequeste to know all pointers in an object can be solved easily for interpreted and jit-ted languages using the gc and can also be managed even in C with effort. The real problem remains the arbitrary and unpredictable length of garbage collections. There are ways to make tracing gc’s have a better latency, which I`ll discuss in coming posts.

The GC: Reference Counter

The simplest form of Garbage Collector is the Reference Counter Garbage Collector.

Each time an object is referenced (a pointer to it is made) the reference count on the object is incremented. If the pointer isn’t used anymore the reference count is decreased. If the reference count hits 0 the object will be freed.

Python uses a Reference Counter.

Advantages

A reference counter does its tracking at runtime. There is no big garbage collection, and the execution of the program won’t be interupted for a while when the garbage is collected.
It’s very easy to implement.

Disadvantages

Instead of the required malloc free pair required for each allocation without Gc, it now requires add_ref free_ref for each pointer/reference to an allocation. When someone forgets to release its reference the object persists.
Circular references will never be collected. An example of a circular reference is a list that contains as an item a reference to itself. When the list is released its count will remain 1 because inside the list it refers to itself and will therefore never be collected.
A RefCounter GC doesn’t keep track of pointers into the managed heap, and therefore can’t move any object because it`ll break pointers. This makes the fragmentation of the RefCounter GC as bad as malloc‘s fragmentation.

Circular references

Circular references can be fixed by adding a trace algorithm. This algorithm will trace through all objects from the root objects (pointers on the stack, etc.) and mark them. The objects that aren’t marked aren’t accessible and therefore are subject to Garbage Collection.

The problem is that a RefCounter Gc usually doesn’t keep track of what pointers are in objects, and that there could be pointers outside the managed heap which only use the add_ref and free_ref. Python struggled with this problem and it implemented the a trace algorithm that depended on the programmer’s of not-python-modules (of which pointers aren’t known) to implement the list of references to objects that also contain objects. By traversing links that way circular dependencies could be found without hurting unmanaged pointers, but this has a high performance hit on Python, and it is still adviced to avoid circular references.

Conclusion

RefCounter Gc’s are easy to implement, and a good idea for scripting languages which know all references so that circular references can be dealt with accoringly. But they still have a lot of disadvantages which other kinds of Gc’s solve a lot better, and more elegantly.

The GC: Introduction

I’ve been working on Garbage Collectors recently, and I`ll share some of my experiences with them with you on my blog.

First off, what is a Garbage Collector?

A Garbage Collector, a.k.a GC is an algorithm that keeps track of your objects in the memory and gets rid of objects that aren’t used (referenced to) anymore, a.k.a garbage collecting. This can only be accomplished when the GC knows all pointers to the objects managed (objects on the managed part of the heap). And they are therefore almost always used with an interpreted/jit-ted language, like Java, Python or C#.

Since the first GC’s (reference counters) they have gained a lot more responsibility and perform a lot more tasks, where collecting the garbage is just one of the things they do.

I`ll discuss the various kinds and aspects of Garbage Collectors with their advantages and disadvantages in coming blog posts.

You can read all articles about the gc in its category.