The problem
Whilest tracing down garbage during the Garbage Collection, as described in earlier posts, it is required to find mark an object pointed to from another object.
This could lead to problems when it is allowed to point anywhere inside an object, or when the object information bit can’t be found by a pointer to the start of the object.
In both cases the Gc should somehow translate a pointer (eg. 0x1234
) to the head of the object it points in (eg. 0x1200
). The problem here is how to efficiently get 0x1200
from 0x1234
.
Avoiding the problem
This problem can be circumvented by disallowing pointers pointing to inside the object and instead require all pointers to point to the start of the object. This will only work when the header of the object has a fixed size, otherwise it will still be impossible to work back to the start of the header, which contains the flags you want to alter when tracing garbage.
Pointers inside a managed object aren’t required and could easily be removed. They can be very usefull though in some situations. One of these situations is being able to inline a private referenced instance into the contained class, of which is known that it won’t be modified, which would result in pointers inside the container class. It will only increase performance in some situations, but it will be usefull to have, and is optional.
A bigger issue are pointers that point to the start of an object with a header in front of it which has got a dynamic size. Either the header shouldn’t be of dynamic size, and a fixed (big) size, or instead of to the start of the object, pointers should point to the header of an object. In the latter case each pointer derefence should require an ‘add’ call to increase the address to match the start of the object, with the additional logic required to attain the size of the header which can be significant depending on the implementation. These extra opcodes for each pointer derefence could not be easily be worked away even when using a JIT because of multi-threading, and certainly could not be avoided when using C. The fixed header size isn’t nice either. For it probably would end up being at least 7 bytes on a 64bit platform (flags + pointer).
Oh, and there is also the java way which uses a handle for each object. Which basicly means that an integer pointer isn’t int*, but int** in java. Where the pointer in between contains the meta-data about the object. This is even more less ideal for it requires an extra derefence for each derefence and it results in twice the amount of pointers to track.
As far as I concern this problem can’t be avoided, when dealing with a Gc that is required to interop with non-managed code. With a JIT it could be done though, but this requires a runtime performance/garbage collection performance trade-off. And I personally like a bit slower garbage collect instead of slower runtime.
Look-up structure
Before tracing the garbage collection usually walks through the heap to mark all objects white and put those in the root-set in the gray-list. During this first walk through the heap a temporary look-up structure could be build to allow efficient look-up during trace.
When only allowing pointers to the start of the object it would be rather easy to do. A modulus-based bucket like implementation, a hash table without the hash-part, would do. This would allow a nice even redistribution of entries throughout the table, for it uses a modulus.
When allowing pointers in the middle of an object it becomes more tricky, because you can’t precompute the pointers to put in the table. You need to have a structure which allows you to look to the left and find the object the pointer points into. Although most pointers still point to the start of the object, the case that they could point inside the object shouldn’t be concidered.
Instead of a modulus-based bucket, division-based buckets should be used. This creates a problem for the likely distribution of blocks between the buckets will vary heavily. One whole bucket could be empty when there is a big allocation in the memory it represents, where almost all small objects which represent most of the pointers could be concentrated in just two buckets.
Bucket-tree
To compensate for this a bucket-tree could be used. Basicly when a bucket contains a lot of elements it would split itself in a new set of buckets. This way a minimal amount of buckets could be wasted on really big allocations, and a lot of buckets could be allocated for small allocations. The look-up time would be a bit worse, but it would be neglectable to the huge lookup time a normal search through the heap would require.
Special treatment for pointers pointing inside objects
When there are only a one or two pointers in the whole heap that point inside another object they could be treated seperately instead, and the rest could keep using the modulus-based buckets.
One way to give those pointers a special treatment is marking them, and resolving to which object they point during the initial walk through the heap where finding objects pointed to from pointers is a lot easier. (pointers needed to be looked up could be sorted by address, and be checked during the walk, which is very efficient).
Another way would be to simulate pointers pointing inside another pointer by making the pointer consist out of an object pointer and a offset from there, which could be cached by the JIT, or expanded before the garbage collection.
Yet another way would be to walk the modulus-based buckets by hand when the pointer isn’t found.
The best method would depend on the implementation of the rest of the Gc.
About modulus-based buckets
I`m still uncertain about whether the distribution in a modulus-based bucket can be trusted at all times. Putting a bucket-tree in default, for worst case would be nice, just to be sure.
Update: some more possibilities:
Binary search
When we put pointers to all allocated blocks sorted somewhere in the memory, one could use binary search, or other search algorithms to search in sorted lists, to find the pointer. When adding shortcuts each ~64K this could be quite efficient. I’ve implemented this method for my Garbage Collector temporarily for it is amongst the easier to implement and I suspect the fastest. Actually, I hope it is the fastest.
I’ll do some tests on different algorithms on a dump of a ‘modal’ application’s managed heap, when I’ve finished the Gc sufficiently to make an application work with it.