While working on a design of a .pak files, which are nothing more then archives, I noticed most existing .pak files are just very simple. This for they put all files after eachother which makes it really hard to even just add a file, or delete one.

Therefor I started thinking about a way to make an efficient archive file for read and write access.

There are basicly 2 ways to accomplish this:

  • Clustering
  • Fragmenting

The archive consists out of several files, and an index which is counted as a file. The start of every file is pointed to from the index by its offset. When a file is deleted, its record in the index is null-ed, which is lateron filled with a new entry and in the meantime ignored (for the only file that can start at position 0 is the index itself).

A file itself is fragmented, every fragment starts with 4 bytes marking the length of the fragment and after that again 4 bytes marking the offset of the next fragment this time. In case there is no next fragment present the next-cluster-offset is 0. The first fragment of every file contains after the 8 bytes marking the length and offset of the next fragment some information about the file like the name.

For the index itself is a file, it also is fragmented and can grow, and shrink.

The 4 byte offset value could be default set to 8 bytes in case files are going to be bigger than 2 Gig, and the oposite in case you are dealing with small files. Although this can decrease the size of files which are frequently edited it isn’t possible to increase or decrease to 8 or 2 bytes in runtime.

The clustered way is practicly the same as the fragmented way except for that every fragment is padded to a multiple of the cluster size. Therefore the ‘next fragment’ value can now be a cluster number instead of an offset. This will both increase the speed of writing due to having always a buffer, also defragmenting will be faster. The files itself will also be able to retain a 2 byte next-cluster pointer longer. The only downside is that files will just be larger.

The fragmented stream wraps these methods pretty efficiently at the moment.
I am thinking to implement an inteligent algorithm which checks which streams are often used which are then placed at the end of the file with a buffer. This also could be applied for files which are marked to be expanding.