[Zlib-devel] deflateSetDictionary(): How to determine "most commonly used strings"?
Ralf Junker
ralfjunker at gmx.de
Fri Apr 16 16:12:51 EDT 2010
On 16.04.2010 21:29, Greg Roelofs wrote:
>> To achieve better compression of HTML text, I wonder about any
>> recommendations on how an optimal set of dictionary strings is
>> best generated from typical data? What kind of "strings" help
>> zlib to compress best?
> Is there any reason you can't just run zlib on some typical HTML
> files, perhaps concatenated, and dump the strings corresponding to
> (distance, length) pairs?
That's exactly what I'd like to do, but I fail to find the zlib API
function call to dump the strings. Anything I am missing in zlib.h?
> I suspect that would give you a pretty good idea. You would want to
> do some statistical analysis on the results (frequencies of
> occurrence, turnover rate, etc.), but there's nothing like actually
> looking at real data to get you started...
I am running my own statistical analysis right now, but so far without
any support from zlib, and I do not know much about its internals.
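
For illustration only, here is a minimal sketch of what such a standalone
analysis could look like, independent of zlib internals. It counts how many
sample documents contain each short substring and concatenates the best
candidates. The function names, the length limits, and the scoring rule
(document count times string length) are all assumptions made for this
sketch, not anything zlib prescribes. It is a brute-force approach that only
suits small sample sets.

```python
from collections import Counter


def count_substrings(samples, min_len=4, max_len=16):
    """Return a Counter mapping each substring to the number of samples containing it."""
    counts = Counter()
    for data in samples:
        seen = set()  # count each substring at most once per sample
        for n in range(min_len, max_len + 1):
            for i in range(len(data) - n + 1):
                seen.add(data[i:i + n])
        counts.update(seen)
    return counts


def build_dictionary(samples, max_size=32768, min_docs=2):
    """Concatenate frequent substrings, highest-scoring last (hypothetical helper)."""
    counts = count_substrings(samples)
    # Assumed score: number of samples containing the string times its length.
    ranked = sorted(
        ((docs * len(s), s) for s, docs in counts.items() if docs >= min_docs),
        reverse=True,
    )
    chosen, size = [], 0
    for _score, s in ranked:
        if any(s in kept for kept in chosen):
            continue  # already covered by a longer string we kept
        if size + len(s) > max_size:
            continue  # does not fit; a shorter candidate still might
        chosen.append(s)
        size += len(s)
    chosen.reverse()  # best candidates go to the end of the dictionary
    return b"".join(chosen)
```

The sketch does not merge overlapping strings, so the resulting dictionary
is larger than it needs to be. The 32768-byte default reflects the deflate
window size, beyond which dictionary content cannot be referenced.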
What kind of dictionary strings does deflateSetDictionary() work
best with?
What kind of compression improvements can I typically expect?
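
One way to answer the second question empirically is to compress the same
input with and without a preset dictionary and compare sizes. The sketch
below uses Python's zlib module, whose zdict argument is handed to
deflateSetDictionary() underneath. The helper names and the sample page are
hypothetical. The zlib documentation recommends placing the most commonly
used strings towards the end of the dictionary. Note also that a zlib stream
with a preset dictionary carries an extra 4-byte dictionary checksum, so very
small inputs need to save more than that to come out ahead.

```python
import zlib


def deflate_size(data, zdict=None):
    """Return the zlib-compressed size of data, optionally with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


def roundtrip(data, zdict):
    """Compress and decompress with the same dictionary; both sides must share it."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    packed = comp.compress(data) + comp.flush()
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(packed) + decomp.flush()


if __name__ == "__main__":
    # Hypothetical dictionary: an HTML skeleton typical of the expected input.
    html_dict = b"<html><head><title></title></head><body><p></p></body></html>"
    page = b"<html><head><title>Hello</title></head><body><p>Hi</p></body></html>"
    print("without dictionary:", deflate_size(page))
    print("with dictionary:   ", deflate_size(page, html_dict))
    assert roundtrip(page, html_dict) == page
```

Running this kind of comparison over a representative set of real pages,
rather than a single toy input, gives a more honest picture of the gain.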
Ralf