[Zlib-devel] patch-in-progress: vectorized adler32 calculation
Mark Adler
madler at alumni.caltech.edu
Sun Apr 11 17:35:56 EDT 2010
On Apr 11, 2010, at 12:56 PM, Stefan Fuhrmann wrote:
> So, I looked into it. ~15% of the zlib runtime is spent in adler32
> and the C implementation is as fast as it gets (close to 1 byte
> per cycle). The attached masm32 code provides a vectorized
> version of the hotspot of that function.
...
> ; * adler32_fast_ssse3 ... fastest code, requires SSSE3 CPU feature
> ; * adler32_fast_sse2 ... almost as fast, requires SSE2 CPU feature
> ; * adler32_fast_mmx ... slowest code, for old CPUs
Stefan,
So how much faster are those than the C code for adler32?
Mark
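
For context on the "C code" being compared against, a minimal sketch of a
scalar, byte-at-a-time Adler-32 follows. It is an illustration only: the name
adler32_simple is hypothetical, and this is not zlib's adler32(), which is
more heavily optimized (it defers the modulo reduction and unrolls the loop).

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */

/* Hypothetical reference version, not zlib's implementation.
 * a is the running sum of bytes (starting at 1),
 * b is the running sum of the successive values of a. */
uint32_t adler32_simple(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;  /* reduce every step: simple but slow */
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;              /* b in the high half, a in the low half */
}
```

The dependency of b on every intermediate value of a is what makes the loop
a natural target for the vectorized variants listed above.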