Text Compression Challenge

Please post your code and add a link to it here. On this page, state the compression you achieved in bytes and give a brief outline of how you tackled it.

David's entry at the JB Forum: This code compressed 25382 bytes down to 14892 bytes. It finds the most common characters and stores them at the beginning of the compressed string. It then assigns a bit sequence to each character (the more common the character, the shorter its sequence), joins the sequences into one bit stream, and converts that stream back into ASCII characters. The decompression routine does this in reverse, which works because every character it needs is listed at the beginning of the string.
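The idea of giving shorter bit sequences to more common characters can be sketched in Python as follows. This is not David's code: the entry does not say how the bit sequences are chosen, so the sketch substitutes Huffman coding as one well-known way to do it. The names `build_codes`, `huffman_compress`, and `huffman_decompress` are made up for illustration. As a simplification, the code table is returned alongside the packed bytes instead of being written at the start of the compressed string.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return {char: bit string}; frequent chars get shorter codes (Huffman)."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {char: code built so far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_compress(text):
    """Return (code table, number of padding bits, packed bytes)."""
    codes = build_codes(text)
    bits = "".join(codes[ch] for ch in text)
    pad = (8 - len(bits) % 8) % 8          # pad to a whole number of bytes
    bits += "0" * pad
    payload = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return codes, pad, payload


def huffman_decompress(codes, pad, payload):
    """Reverse huffman_compress using the same code table."""
    decode = {code: ch for ch, code in codes.items()}
    bits = "".join(f"{byte:08b}" for byte in payload)
    if pad:
        bits = bits[:-pad]
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:              # codes are prefix-free
            out.append(decode[current])
            current = ""
    return "".join(out)
```

A real entry would also have to store the character list (and enough information to rebuild the codes) inside the compressed string, which adds to the byte count reported above.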

AltBas entry at the JB Forum: It compresses the 23582-byte file to 13840 bytes by replacing trigraphs and digraphs with one-byte codes. Chars 9 and 32-126 are shifted up to start at 159, chars outside this range are prefixed with chr$(0), and chr$(1) replaces the CR/LF pair. The static tri/digraph dictionary was derived from the Jargon File Version 3, a 1.1 MB text file, so I assume its tri/digraphs are close to the normal English character distribution. The tri/di dictionary itself is compressed as 5-bit data. I also tried adding in bit-coding, but that didn't compress as small as the tri/di replacements.
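The byte layout described in this entry can be sketched in Python as follows. This is not the AltBas code. The eight-entry `NGRAMS` list is a made-up stand-in for the real dictionary, and placing the tri/digraph codes in the range 2 to 158 is an assumption based on the codes that 0, 1, and 159 and above leave free. The exact literal mapping (9 to 159, 32-126 to 160-254) is likewise one reading of "shifted up to start at 159". The sketch assumes every character has a code below 256 and leaves out the 5-bit packing of the dictionary.

```python
# Hypothetical dictionary; trigraphs come first so they are tried first.
NGRAMS = ["the", "ing", "and", "th", "he", "in", "er", "an"]
FIRST_NGRAM_CODE = 2   # assumption: codes 2..158 are tri/digraphs
LITERAL_BASE = 159     # tab -> 159, chars 32..126 -> 160..254


def literal_code(ch):
    """Return the shifted one-byte code for ch, or None if it must be escaped."""
    o = ord(ch)
    if o == 9:
        return LITERAL_BASE
    if 32 <= o <= 126:
        return LITERAL_BASE + 1 + (o - 32)
    return None


def ngram_compress(text):
    out = bytearray()
    i = 0
    while i < len(text):
        if text.startswith("\r\n", i):
            out.append(1)                        # CR/LF pair -> single byte 1
            i += 2
            continue
        for idx, gram in enumerate(NGRAMS):
            if text.startswith(gram, i):
                out.append(FIRST_NGRAM_CODE + idx)
                i += len(gram)
                break
        else:                                    # no tri/digraph matched here
            code = literal_code(text[i])
            if code is None:
                out += bytes([0, ord(text[i])])  # escape: 0, then the raw char
            else:
                out.append(code)
            i += 1
    return bytes(out)


def ngram_decompress(data):
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0:                               # escaped raw char follows
            out.append(chr(data[i + 1]))
            i += 2
            continue
        if b == 1:
            out.append("\r\n")
        elif b < LITERAL_BASE:
            out.append(NGRAMS[b - FIRST_NGRAM_CODE])
        elif b == LITERAL_BASE:
            out.append("\t")
        else:
            out.append(chr(b - LITERAL_BASE - 1 + 32))
        i += 1
    return "".join(out)
```

With this toy dictionary, `ngram_compress("the thing")` produces 4 bytes: one each for "the", the space, "th", and "ing".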