[Zlib-devel] patch-in-progress: vectorized adler32 calculation

Török Edwin edwintorok at gmail.com
Mon Apr 12 04:34:52 EDT 2010


On 04/12/2010 11:17 AM, Stefan Fuhrmann wrote:
> Mark Adler wrote:
>> On Apr 11, 2010, at 12:56 PM, Stefan Fuhrmann wrote:
>>> So, I looked into it. ~15% of the zlib runtime is spent in adler32
>>> and the C implementation is as fast as it gets (close to 1 byte
>>> per cycle). The attached masm32 code provides a vectorized
>>> version of the hotspot of that function.
>> ...
>>> ; * adler32_fast_ssse3 ... fastest code, requires SSSE3 CPU feature
>>> ; * adler32_fast_sse2 ... almost as fast, requires SSE2 CPU feature
>>> ; * adler32_fast_mmx ... slowest code, for old CPUs
>>
>> Stefan,
>>
>> So how much faster are those than the C code for adler32?
> Throughput:
> SSSE3: ~3.5 bytes / clock tick
> SSE2: ~3 bytes / clock tick
> MMX: ~1.5 bytes / clock tick
> C-Code: ~1 byte / clock tick
>
> IOW, for x86 processors released since 2003 (SSE2), the checksum portion
> is reduced from ~15% to ~5% or better of the inflate runtime.

Have you considered writing the SSE code using compiler intrinsics in C?
They are supported on all the major compilers: GCC, ICC, and MSVC, and 
they appear to work quite well on GCC:
http://www.liranuna.com/sse-intrinsics-optimizations-in-popular-compilers/

The advantages would be:
- the compiler could inline the SSE-optimized adler32 (assuming you put
it into a .h or .inc file so the compiler sees the implementation), and
maybe even do some constant propagation
- you would get SSE-optimized x86-64 code too (your code is 32-bit only
from what I can tell). All x86-64 CPUs have at least SSE2.
- you could choose which variant to use based on preprocessor defines,
so that if zlib is compiled with -march=foo, it'll only build the SSE
variants that work on foo
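
To make the points above concrete, here is a rough sketch of how an
intrinsics version with preprocessor-based selection might look. This is
not a port of Stefan's assembly; it is a plain SSE2 formulation written
for illustration only. The names adler32_demo, adler32_sse2,
adler32_scalar and ADLER32_HAVE_SSE2 are made up for this example and do
not exist in zlib.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile-time selection: GCC/Clang define __SSE2__ when -march/-msse2
 * allows it; MSVC signals it via _M_X64 or _M_IX86_FP >= 2. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define ADLER32_HAVE_SSE2 1   /* hypothetical macro, for this sketch only */
#endif

#define ADLER_BASE 65521u     /* largest prime below 65536 */
#define ADLER_NMAX 5552u      /* max bytes before a 32-bit sum can overflow;
                                 conveniently a multiple of 16 */

/* Portable reference: defer the modulo for up to ADLER_NMAX bytes. */
static uint32_t adler32_scalar(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    uint32_t a = adler & 0xffff, b = (adler >> 16) & 0xffff;
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) { a += *buf++; b += a; }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}

#ifdef ADLER32_HAVE_SSE2
/* Add up the four 32-bit lanes of a vector. */
static uint32_t hsum_epi32(__m128i v)
{
    uint32_t lane[4];
    _mm_storeu_si128((__m128i *)lane, v);
    return lane[0] + lane[1] + lane[2] + lane[3];
}

static uint32_t adler32_sse2(uint32_t adler, const unsigned char *buf,
                             size_t len)
{
    const __m128i zero = _mm_setzero_si128();
    /* Byte i of a 16-byte block contributes (16 - i) times to b. */
    const __m128i w_lo = _mm_set_epi16(9, 10, 11, 12, 13, 14, 15, 16);
    const __m128i w_hi = _mm_set_epi16(1, 2, 3, 4, 5, 6, 7, 8);
    uint32_t a = adler & 0xffff, b = (adler >> 16) & 0xffff;

    while (len >= 16) {
        size_t blocks = (len < ADLER_NMAX ? len : ADLER_NMAX) / 16;
        size_t k;
        __m128i sum = zero, prefix = zero, weighted = zero;
        uint64_t b64;

        for (k = 0; k < blocks; k++) {
            __m128i v = _mm_loadu_si128((const __m128i *)(buf + 16 * k));
            /* prefix collects the byte sum seen before each block */
            prefix = _mm_add_epi32(prefix, sum);
            sum = _mm_add_epi32(sum, _mm_sad_epu8(v, zero));
            weighted = _mm_add_epi32(weighted,
                _mm_madd_epi16(_mm_unpacklo_epi8(v, zero), w_lo));
            weighted = _mm_add_epi32(weighted,
                _mm_madd_epi16(_mm_unpackhi_epi8(v, zero), w_hi));
        }
        /* b grows by 16 * (a at the start of each block) plus the
         * weighted byte sums; combine in 64 bits, then reduce. */
        b64 = (uint64_t)b
            + 16 * ((uint64_t)blocks * a + hsum_epi32(prefix))
            + hsum_epi32(weighted);
        b = (uint32_t)(b64 % ADLER_BASE);
        a = (a + hsum_epi32(sum)) % ADLER_BASE;
        buf += blocks * 16;
        len -= blocks * 16;
    }
    /* Fewer than 16 bytes left: finish with the scalar loop. */
    return adler32_scalar((b << 16) | a, buf, len);
}
#endif

/* static inline in a header lets the compiler inline it into callers. */
static inline uint32_t adler32_demo(uint32_t adler,
                                    const unsigned char *buf, size_t len)
{
#ifdef ADLER32_HAVE_SSE2
    return adler32_sse2(adler, buf, len);
#else
    return adler32_scalar(adler, buf, len);
#endif
}
```

The same source builds for both 32-bit and 64-bit targets, and an SSSE3
variant could be added behind an #ifdef __SSSE3__ in the same way. I have
not measured this, so it says nothing about how it compares to the
throughput numbers quoted above.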

The disadvantage is that MSVC generates horrible code with SSE
intrinsics, at least according to that blog post, so MSVC builds would
still need a .asm file, but the .asm could be generated from the C
source by using mingw, for example.

Best regards,
--Edwin



