[Zlib-devel] patch-in-progress: vectorized adler32 calculation
Török Edwin
edwintorok at gmail.com
Mon Apr 12 04:34:52 EDT 2010
On 04/12/2010 11:17 AM, Stefan Fuhrmann wrote:
> Mark Adler wrote:
>> On Apr 11, 2010, at 12:56 PM, Stefan Fuhrmann wrote:
>>> So, I looked into it. ~15% of the zlib runtime is spent in adler32
>>> and the C implementation is as fast as it gets (close to 1 byte
>>> per cycle). The attached masm32 code provides a vectorized
>>> version of the hotspot of that function.
>> ...
>>> ; * adler32_fast_ssse3 ... fastest code, requires SSSE3 CPU feature
>>> ; * adler32_fast_sse2 ... almost as fast, requires SSE2 CPU feature
>>> ; * adler32_fast_mmx ... slowest code, for old CPUs
>>
>> Stefan,
>>
>> So how much faster are those than the C code for adler32?
> Throughput:
> SSSE3: ~3.5 bytes / clock tick
> SSE2: ~3 bytes / clock tick
> MMX: ~1.5 bytes / clock tick
> C-Code: ~1 byte / clock tick
>
> IOW, for x86 processors released since 2003 (SSE2), the checksum portion
> is reduced from ~15% to ~5% or better of the inflate runtime.
Have you considered writing the SSE code using compiler intrinsics in C?
They are supported on all the major compilers: GCC, ICC, and MSVC, and
they appear to work quite well on GCC:
http://www.liranuna.com/sse-intrinsics-optimizations-in-popular-compilers/
The advantages would be:
- the compiler could inline the SSE-optimized adler32 (assuming you put
it into a .h, or .inc file so the compiler sees the implementation), and
maybe even do some constant propagation
- you would get SSE optimized x86-64 code too (your code is 32-bit only
from what I can tell). ALL x86-64 CPUs have at least SSE2.
- you could choose which variant to use based on preprocessor defines,
so that if zlib is compiled with -march=foo, it'll only build the SSE
variant that works on foo
The disadvantage is that MSVC generates horrible code from SSE
intrinsics, at least according to that blog post, so MSVC builds would
still need a .asm file. That file could be generated from the C source,
for example by compiling it with mingw.
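
To make the intrinsics suggestion concrete, here is a rough sketch of an
SSE2 variant selected by preprocessor defines. It is not derived from
your asm and has not been benchmarked. The names adler32_sketch,
adler32_scalar, hsum_epi32, ADLER_BASE and ADLER_NMAX are made up for
illustration, and the feature test assumes the compiler predefines
__SSE2__ (GCC) or _M_X64 / _M_IX86_FP (MSVC).

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* largest prime below 65536 */
#define ADLER_NMAX 5552u  /* bytes that can be summed before a modulo is due */

/* Portable fallback: the usual byte-at-a-time loop. */
static uint32_t adler32_scalar(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    uint32_t a = adler & 0xffff, b = (adler >> 16) & 0xffff;
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) { a += *buf++; b += a; }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}

#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>

/* Add up the four 32-bit lanes of v. */
static uint32_t hsum_epi32(__m128i v)
{
    uint32_t t[4];
    _mm_storeu_si128((__m128i *)t, v);
    return t[0] + t[1] + t[2] + t[3];
}

/* SSE2 variant: 16 bytes per iteration, scalar code for the tail. */
static uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    const __m128i zero = _mm_setzero_si128();
    /* Byte i of a block contributes (16 - i) times to b. */
    const __m128i w_lo = _mm_set_epi16(9, 10, 11, 12, 13, 14, 15, 16);
    const __m128i w_hi = _mm_set_epi16(1, 2, 3, 4, 5, 6, 7, 8);
    uint32_t a = adler & 0xffff, b = (adler >> 16) & 0xffff;

    while (len >= 16) {
        size_t blocks = len / 16;
        if (blocks > ADLER_NMAX / 16) blocks = ADLER_NMAX / 16;
        len -= blocks * 16;

        uint64_t n = blocks;
        __m128i vsum = zero;      /* byte sums of the blocks seen so far */
        __m128i vprefix = zero;   /* sum of vsum as it was before each block */
        __m128i vweighted = zero; /* sum of (16 - i) * byte[i] per block */
        for (; blocks > 0; blocks--, buf += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)buf);
            vprefix = _mm_add_epi32(vprefix, vsum);
            vsum = _mm_add_epi32(vsum, _mm_sad_epu8(v, zero));
            vweighted = _mm_add_epi32(vweighted,
                _mm_madd_epi16(_mm_unpacklo_epi8(v, zero), w_lo));
            vweighted = _mm_add_epi32(vweighted,
                _mm_madd_epi16(_mm_unpackhi_epi8(v, zero), w_hi));
        }
        /* Each block adds 16 * (a before the block) plus its weighted sum. */
        uint64_t b64 = b + 16 * (n * a + hsum_epi32(vprefix))
                         + hsum_epi32(vweighted);
        a = (a + hsum_epi32(vsum)) % ADLER_BASE;
        b = (uint32_t)(b64 % ADLER_BASE);
    }
    return adler32_scalar((b << 16) | a, buf, len);
}
#else
/* No SSE2 at compile time: use the portable loop. */
static uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    return adler32_scalar(adler, buf, len);
}
#endif
```

Built with -march=foo on GCC, only the branch that foo supports gets
compiled, and on x86-64 the SSE2 branch is always taken. An SSSE3
variant could be added as another #if branch in the same way.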
Best regards,
--Edwin