Lossless compression based on a variation of tries

tarfeef101 94b85fe490 migrating remotes, changed some comments %!s(int64=6) %!d(string=hai) anos
.nfs0000000003fe627900002241 d217a4d0cb Removed debug lines from print method %!s(int64=7) %!d(string=hai) anos
README.md 1bf738f6ef Created a README %!s(int64=7) %!d(string=hai) anos
a.out 50f8288068 Updated test file %!s(int64=7) %!d(string=hai) anos
source1.h 94b85fe490 migrating remotes, changed some comments %!s(int64=6) %!d(string=hai) anos
source2.h 8ef1079219 removing unnecessary continues... so tedious %!s(int64=7) %!d(string=hai) anos
source3.h 8fa43f72d6 changed compression calculations to subtract constant size of root nodes %!s(int64=6) %!d(string=hai) anos
tasks.txt 94b85fe490 migrating remotes, changed some comments %!s(int64=6) %!d(string=hai) anos
test.cc b511b76b44 bugfixes for features in last commit %!s(int64=7) %!d(string=hai) anos
test.exe 3e05f3520a i fixed all the things :) %!s(int64=7) %!d(string=hai) anos
test.obj 3e05f3520a i fixed all the things :) %!s(int64=7) %!d(string=hai) anos
test2.cc 12a6ed7b8c added tasks.txt to keep track of changes to be made.not sure what changes are actually present in the tests... but that's what diff is for %!s(int64=7) %!d(string=hai) anos
test3.cc 7d91fae20c fixed compilation errors for test3, started testing/debugging. currently, delete is bugged. whether or not that is because of delete or improper insertion remains to be seen. right now, we have the 4th digit of the string pointing to null where it should point to the 5th element... %!s(int64=6) %!d(string=hai) anos
testfile 8fa43f72d6 changed compression calculations to subtract constant size of root nodes %!s(int64=6) %!d(string=hai) anos
testfile2 12a6ed7b8c added tasks.txt to keep track of changes to be made.not sure what changes are actually present in the tests... but that's what diff is for %!s(int64=7) %!d(string=hai) anos
testfile3 8fa43f72d6 changed compression calculations to subtract constant size of root nodes %!s(int64=6) %!d(string=hai) anos

README.md

Triehard-Library

A simple library for Triehards, a variation on a compressed binary trie.

Compared to Patricia Tries, these tries do not store "compressed" substrings as the full length of the non-shared part. These take an entirely different approach. Since patricia tries take non-shared substrings and just store them in one node, they reduce the number of nodes, but not storage space. This also means searching them takes, one average, log (m) (where m is height of the tree) plus k, where k is the length of the string since a full comparison still occurs.

On the other hand, triehards compress by reducing the actual amount of data being stored. Each node has a pointer to its children (as usual), a flag to indicate if a stored value ends there (as usual), and a magnitude. This magnitude represents the amount of repeat digits represented by this node. So, if we have a string 11011110, there are 4 nodes, of magnitudes 2, 1, 4, and 1, respectively. This allows for compression based on repeat digits, without increasing the size (each node is of constant size, and there are less nodes than the total length of all strings when compression is possible). This makes the storage overhead lower than O(n) where n is the length of all strings.

As for searching, these retain the same efficiency as a standard trie O(k), compared to the longer search times of patricia tries. Since the compression is based on repeat digits instead of unshared substrings, this allows for standard searching to still occur.