[Zlib-devel] speed longest_match() using perfect hashing of tri-grams

John Reiser jreiser at bitwagon.com
Tue Jul 24 11:43:48 EDT 2012


Using code for hashing which is perfect for tri-grams (no false matches
for any substring of length 3), then it is possible to speed up deflate()
by around 20% when the length of the output is about 40% to 50% of the
length of the input.  This is the commonly-expected case where compression
is most useful in practice.

The cost is about 1/2 megabyte more space in deflate_state, plus a
20% *penalty* in time if the input compresses too little (such as if
already compressed) or compresses too much.  If already compressed, then
there is almost no searching in longest_match() anyway, so reducing the
search buys almost nothing.  If the input compresses by a factor much larger
than 2.5 then the existing search usually was cut off by s->max_chain_length,
and the remaining true matches still approach or exceed that limit.
Both of the bad cases see little or no reduction in search, but have
paid for the perfect hashing.  In the good case, the perfect hashing
significantly shortens the search.

Any interest?

-- 




More information about the Zlib-devel mailing list