[Zlib-devel] patch-in-progress: vectorized adler32 calculation

Tue Apr 13 19:39:14 EDT 2010

Hi all,

thanks for all the feedback. It helped greatly to make up my mind
on how to proceed. So as not to spam the list, I would like to
answer in a single post.

I will port the code to C with intrinsics - probably next weekend.
That solves a number of issues:

* single source for ICC, GCC and MSVC
* largely uniform x86 / x64 code
* no hassle with different ABIs (x64)
* included in the standard build procedure
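
To give an idea of what "C with intrinsics" could look like, here is
a minimal sketch. It is not the patch itself: the function names, the
16-byte block size and the choice of SSE2 are assumptions made for
illustration, and it reduces modulo 65521 after every block, which is
simple but slower than necessary. Without SSE2 it falls back to the
scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>          /* SSE2 intrinsics, same header on GCC, ICC, MSVC */
#define SKETCH_HAVE_SSE2 1
#endif

#define ADLER_BASE 65521u       /* largest prime below 2^16 */

/* Reference loop: one byte at a time. */
static uint32_t adler32_scalar(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = (adler >> 16) & 0xffffu;
    while (len--) {
        s1 = (s1 + *buf++) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return (s2 << 16) | s1;
}

/* Hypothetical SSE2 variant. For a block b[0..15]:
 *   s2 += 16*s1 + sum((16 - i) * b[i]),  s1 += sum(b[i]). */
static uint32_t adler32_sse2(uint32_t adler, const unsigned char *buf, size_t len)
{
#ifdef SKETCH_HAVE_SSE2
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = (adler >> 16) & 0xffffu;
    const __m128i zero = _mm_setzero_si128();
    /* _mm_set_epi16 lists the highest lane first, so byte 0 gets weight 16. */
    const __m128i w_lo = _mm_set_epi16(9, 10, 11, 12, 13, 14, 15, 16);
    const __m128i w_hi = _mm_set_epi16(1, 2, 3, 4, 5, 6, 7, 8);

    while (len >= 16) {
        __m128i v   = _mm_loadu_si128((const __m128i *)buf);
        /* Plain byte sum: two partial sums, one per 64-bit half. */
        __m128i sad = _mm_sad_epu8(v, zero);
        uint32_t sum = (uint32_t)_mm_cvtsi128_si32(sad)
                     + (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(sad, 8));
        /* Weighted sum: widen bytes to 16 bit, multiply-add with weights. */
        __m128i lo   = _mm_unpacklo_epi8(v, zero);
        __m128i hi   = _mm_unpackhi_epi8(v, zero);
        __m128i prod = _mm_add_epi32(_mm_madd_epi16(lo, w_lo),
                                     _mm_madd_epi16(hi, w_hi));
        prod = _mm_add_epi32(prod, _mm_srli_si128(prod, 8));
        prod = _mm_add_epi32(prod, _mm_srli_si128(prod, 4));
        uint32_t wsum = (uint32_t)_mm_cvtsi128_si32(prod);

        s2 = (s2 + 16u * s1 + wsum) % ADLER_BASE;
        s1 = (s1 + sum) % ADLER_BASE;
        buf += 16;
        len -= 16;
    }
    /* Remaining tail bytes go through the scalar loop. */
    return adler32_scalar((s2 << 16) | s1, buf, len);
#else
    return adler32_scalar(adler, buf, len);
#endif
}
```

Both functions take the running checksum as first argument (1 for a
fresh stream) and should return identical results for any input.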

Detecting the CPU features at runtime is necessary so that generic
binaries (e.g. certified svn server installers) can use the optimal
code for the respective processor.
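
A rough sketch of such a runtime check follows. All names in it are
hypothetical, and which instruction sets to test for is an assumption
(MMX, SSE2 and SSSE3 are used as examples). It reads CPUID leaf 1 via
__cpuid on MSVC and __get_cpuid on GCC-compatible compilers, then
picks a function pointer once. The "vector" functions are stand-ins
that simply call the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define SKETCH_X86_MSVC 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_X86_GNU 1
#endif

/* Hypothetical feature flags, named for this sketch only. */
enum { CPU_MMX = 1, CPU_SSE2 = 2, CPU_SSSE3 = 4 };

static unsigned detect_cpu_features(void)
{
    unsigned ecx = 0, edx = 0, features = 0;
#if defined(SKETCH_X86_MSVC)
    int regs[4];
    __cpuid(regs, 0);                  /* regs[0] = highest standard leaf */
    if (regs[0] >= 1) {
        __cpuid(regs, 1);
        ecx = (unsigned)regs[2];
        edx = (unsigned)regs[3];
    }
#elif defined(SKETCH_X86_GNU)
    unsigned eax = 0, ebx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        ecx = edx = 0;                 /* leaf 1 unavailable */
#endif
    /* CPUID leaf 1: EDX bit 23 = MMX, EDX bit 26 = SSE2, ECX bit 9 = SSSE3. */
    if (edx & (1u << 23)) features |= CPU_MMX;
    if (edx & (1u << 26)) features |= CPU_SSE2;
    if (ecx & (1u << 9))  features |= CPU_SSSE3;
    return features;                   /* 0 on non-x86 builds */
}

typedef uint32_t (*adler32_fn)(uint32_t, const unsigned char *, size_t);

static uint32_t adler32_scalar(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = (adler >> 16) & 0xffffu;
    while (len--) {
        s1 = (s1 + *buf++) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}

/* Stand-ins: real vector kernels would replace these bodies. */
static uint32_t adler32_sse2_stub(uint32_t a, const unsigned char *b, size_t n)
{
    return adler32_scalar(a, b, n);
}
static uint32_t adler32_ssse3_stub(uint32_t a, const unsigned char *b, size_t n)
{
    return adler32_scalar(a, b, n);
}

/* Pick the best available implementation for the given feature mask. */
static adler32_fn select_adler32(unsigned features)
{
    if (features & CPU_SSSE3) return adler32_ssse3_stub;
    if (features & CPU_SSE2)  return adler32_sse2_stub;
    return adler32_scalar;
}

static adler32_fn adler32_impl = NULL;

/* Detection runs on the first call only. Not synchronized: a threaded
 * program would need to call this once before starting other threads. */
static uint32_t adler32_dispatch(uint32_t adler, const unsigned char *buf, size_t len)
{
    if (adler32_impl == NULL)
        adler32_impl = select_adler32(detect_cpu_features());
    return adler32_impl(adler, buf, len);
}
```

Callers would only ever see adler32_dispatch(), so one binary can run
on any x86 processor and still use the fastest code path available.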

The MSVC deficiencies shown in the link posted by Török seem
to be limited to initialization code (setting data). As part of the
detection phase, all necessary data structures can be prepared
once, so MSVC's issues won't hurt the following runs.
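
The "prepare once" idea could be sketched like this. The table layout
and all names are assumptions for illustration. The point is only
that constants are written to memory a single time, so the hot loop
reads them instead of rebuilding them on every call. An intrinsics
version would then load the table with _mm_load_si128 (given suitable
alignment) instead of using _mm_set_* in the inner function.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK 16

/* Hypothetical container for everything the fast path needs. */
typedef struct {
    int      ready;                    /* set after first initialization */
    uint16_t weights[SKETCH_BLOCK];    /* 16, 15, ..., 1 */
} adler_tables;

static adler_tables g_tables;          /* zero-initialized, so ready == 0 */

/* Build the tables on first use; later calls just return the pointer.
 * Not synchronized: meant to run during the single detection phase. */
static const adler_tables *adler_tables_get(void)
{
    if (!g_tables.ready) {
        size_t i;
        for (i = 0; i < SKETCH_BLOCK; i++)
            g_tables.weights[i] = (uint16_t)(SKETCH_BLOCK - i);
        g_tables.ready = 1;
    }
    return &g_tables;
}

/* Hot-path helper: weighted byte sum of one 16-byte block, using only
 * the prepared table (no constant setup here). */
static uint32_t weighted_sum16(const adler_tables *t, const unsigned char *p)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < SKETCH_BLOCK; i++)
        sum += (uint32_t)t->weights[i] * p[i];
    return sum;
}
```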

I have not yet decided whether to support MMX at all. It would
only be used on processors more than 5 years old, would get little
test coverage and would yield only a modest performance improvement.

All performance figures were obtained from actual measurements,
which indicate even slightly better numbers than the 15%->5%
improvement. Due to the dependency on buffer sizes, compression
rates etc., I posted somewhat more conservative figures.

The overall performance of the deflate() function is already quite
impressive: For real-world repository data, I measured ~200 MB
inflated data per sec. Buffer sizes etc. seem to be o.k. (a few kB
on average) and changing the granularity would be very hard to
do anyway.

-- Stefan^2.