[Zlib-devel] patch-in-progress: vectorized adler32 calculation

Sun Apr 11 17:35:56 EDT 2010

On Apr 11, 2010, at 12:56 PM, Stefan Fuhrmann wrote:
> So, I looked into it. ~15% of the zlib runtime is spent in adler32
> and the C implementation is as fast as it gets (close to 1 byte
> per cycle). The attached masm32 code provides a vectorized
> version of the hotspot of that function.
...
> ; *    adler32_fast_ssse3 ... fastest code, requires SSSE3 CPU feature
> ; *    adler32_fast_sse2  ... almost as fast, requires SSE2 CPU feature
> ; *    adler32_fast_mmx   ... slowest code, for old CPUs
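
For reference, the scalar hot loop under discussion looks roughly like the sketch below. It is a minimal illustration, not the zlib source and not the attached masm32 code: the names adler32_scalar, ADLER_BASE and ADLER_NMAX are made up here, and the unrolling a tuned C version would use is left out. The inner loop carries a dependency from one byte to the next (b needs the updated a), which is what a vectorized version has to restructure.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* largest prime below 2^16 */
#define ADLER_NMAX 5552u   /* bytes that can be summed before b could overflow 32 bits */

/* Illustrative scalar Adler-32 update; pass adler = 1 for a fresh checksum. */
uint32_t adler32_scalar(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffffu;          /* running sum of bytes */
    uint32_t b = (adler >> 16) & 0xffffu;  /* running sum of the a values */

    while (len > 0) {
        /* Defer the expensive modulo: reduce once per chunk, not per byte. */
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= chunk;
        while (chunk--) {
            a += *buf++;  /* hot loop: two dependent adds per input byte */
            b += a;
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Any timing comparison against the SSSE3/SSE2/MMX variants would need to check that they return the same value as a loop like this for the same input.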

Stefan,

So how much faster are those than the C code for adler32?

Mark