Skip navigation

Recently, I’ve been rather distracted by reddit and haven’t put too much work into writing the compiler (well, translator).  BUT That’s different now… I got bored of it.  Now this post is actually about compression, so I’ll just skip over to that.

I was thinking about reverse code engineering and compression formats.  Now I’ve dabbled with the idea of writing a proprietary format before, but I’ve never gotten motivated enough to do so.  I actually found a use for a proprietary compression format though.  The use being my fileserver.  While I could use a known format, I’ve decided a custom compression format combined with a custom encryption should work for privacy reasons.  The encryption is already done.  That is, however, another story and not the topic.

 

So, this compression is defined as follows:

struct Header{
char magic_1 : ‘Q’; //Quaid’s
char magic_2 : ‘C’; //Compression
char magic_3 : ‘G’; //Garbage :D

unsigned char entry_size : 2; //length of each dictionary entry; default to 2. if 0 the dictionary  entries are variable length.
unsigned char dictionary_size; //length of the dictionary… I have it programatically assigned.

Dictionary_Entry* Dictionary; //array of dictionary_entries (defined bellow)
unsigned char* data; //the file data

};

 

struct Dictionary_Entry{
unsigned char index; //well, the byte representation used to denote the entry
unsigned char* entry; //data for the entry
};

The algorithm should first parse the file for any unused byte numbers (1-255). 0 is not used and cannot be part of a compression entry unless it is variable length entries, in which case, the entries are 0 terminated.
The unused numbers are placed into an array as possible indexes. Next the file data is parsed for byte pairs (or whatever the designated length is) as well as the occurrences of each byte pair. Now, there is some math involved to figure out what number of dictionary entries (under the maximum allowed) will yield the smallest compressed size. After that is done, the indexes used are assigned a byte pair. Then all occurrences of those byte pairs are replaced with the corresponding index number. And… That’s all!

It’s fairly simple. I’m sure it’s been done before, but I don’t really care. I have most of the code already written by this point. And I assume it will take a bit of tweaking. I actually am using an Array class that’s part of my personal standard library. Even with that I ended up adding four new functions to it. I expect to be done tomorrow, but I know that my predictions don’t always tend to be correct. If anything a very rudimentary implementation will be made.

-cQuaid

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.