[Zlib-devel] deflateSetDictionary(): How to determine "most commonly used strings"?
Greg Roelofs
newt at pobox.com
Fri Apr 16 15:29:30 EDT 2010
> To achieve better compression of HTML text, I wonder about any
> recommendations on how an optimal set of dictionary strings is best
> generated from typical data? What kind of "strings" help zlib to
> compress best?
Is there any reason you can't just run zlib on some typical HTML files,
perhaps concatenated, and dump the strings corresponding to (distance,
length) pairs? I suspect that would give you a pretty good idea of which
strings zlib actually reuses. You would want to do some statistical
analysis on the results (frequencies of occurrence, turnover rate, etc.),
but there's nothing like actually looking at real data to get you started.
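
As a rough illustration of that approach, here is a minimal Python sketch.
zlib does not expose its (distance, length) pairs through the public API,
so this uses a simple greedy LZ77-style scan as a stand-in for zlib's own
matcher; it will not pick exactly the same matches deflate does. The names
matched_strings and build_dictionary, the max_chain search limit, and the
"occurrences times length" ranking are all assumptions made for the example,
not anything zlib prescribes. The most useful strings are placed at the end
of the dictionary, as the zlib documentation recommends.

```python
import zlib
from collections import Counter, defaultdict

# Deflate's match limits: 3..258 bytes, within a 32 KiB window.
MIN_MATCH, MAX_MATCH, WINDOW = 3, 258, 32768


def matched_strings(data: bytes, max_chain: int = 32) -> Counter:
    """Greedy LZ77-style scan (a stand-in for zlib's matcher).

    Counts every string that would be emitted as a (distance, length) pair.
    """
    counts = Counter()
    positions = defaultdict(list)  # 3-byte prefix -> earlier start offsets
    i, n = 0, len(data)
    while i < n:
        best_len = 0
        if i + MIN_MATCH <= n:
            limit = min(MAX_MATCH, n - i)
            # Try the most recent candidates first, up to max_chain of them.
            for j in reversed(positions[data[i:i + MIN_MATCH]][-max_chain:]):
                if i - j > WINDOW:
                    break
                length = MIN_MATCH
                while length < limit and data[j + length] == data[i + length]:
                    length += 1
                best_len = max(best_len, length)
        if best_len >= MIN_MATCH:
            counts[data[i:i + best_len]] += 1
            step = best_len
        else:
            step = 1  # no usable match: this byte would be a literal
        # Index every position we are about to skip over.
        for k in range(i, min(i + step, n - MIN_MATCH + 1)):
            positions[data[k:k + MIN_MATCH]].append(k)
        i += step
    return counts


def build_dictionary(counts: Counter, max_size: int = 32768) -> bytes:
    """Keep the highest-scoring strings; most valuable ones go last."""
    # Assumed heuristic: score = occurrences * length (rough bytes saved).
    ranked = sorted(counts.items(), key=lambda kv: kv[1] * len(kv[0]),
                    reverse=True)
    chosen, size = [], 0
    for string, _ in ranked:
        if size + len(string) > max_size:
            break
        chosen.append(string)
        size += len(string)
    return b"".join(reversed(chosen))


# Hypothetical stand-in for "typical HTML files, perhaps concatenated".
sample = b"<html><head><title>x</title></head><body><p>hi</p></body></html>" * 4
zdict = build_dictionary(matched_strings(sample))

page = b"<html><body><p>new page</p></body></html>"
comp = zlib.compressobj(zdict=zdict)
packed = comp.compress(page) + comp.flush()

decomp = zlib.decompressobj(zdict=zdict)
restored = decomp.decompress(packed) + decomp.flush()
```

The same dictionary bytes must be supplied on both the compressing and the
decompressing side. Comparing len(packed) with and without zdict on held-out
pages is one way to check whether a candidate dictionary actually helps.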
Greg