[Zlib-devel] patch-in-progress: vectorized adler32 calculation

John Bowler jbowler at frontiernet.net
Mon Apr 12 11:26:12 EDT 2010


From: Stefan Fuhrmann
>Throughput:
>SSSE3: ~3.5 bytes / clock tick
>SSE2:  ~3 bytes / clock tick
>MMX:  ~1.5 bytes / clock tick
>C-Code: ~1 byte / clock tick
>
>IOW, for x86 processors released since 2003 (SSE2), the checksum portion
>is reduced from ~15% to ~5% or better of the inflate runtime.

Did you measure that last figure?  Bytes-per-clock-tick calculations rarely translate into elapsed time in practical applications when the data is in large buffers, because sooner or later the CPU stalls waiting for data from memory.
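
One way to check is to time the checksum over buffers of different sizes rather than counting ticks. The following is only a minimal sketch under stated assumptions: `adler32_scalar` is a plain byte-at-a-time reference implementation (not zlib's code and not the vectorized patch), and `measure_bytes_per_second` is a hypothetical helper that reports the rate seen by `clock()`, which counts CPU time including time spent stalled on memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */

/* Plain scalar Adler-32, one byte at a time. Reference version only;
 * this is not zlib's optimized adler32() and not the SIMD patch. */
uint32_t adler32_scalar(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Hypothetical helper: checksum a buffer of `len` bytes `passes` times and
 * return the observed rate in bytes per second of CPU time.
 * Returns 0.0 if allocation fails or the run is too short for clock(). */
double measure_bytes_per_second(size_t len, int passes, uint32_t *checksum_out)
{
    assert(len > 0 && passes > 0);
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return 0.0;

    /* Fill with non-constant data so the pages are really touched. */
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(i * 31u + 7u);

    uint32_t sum = 1;  /* Adler-32 starts from 1 */
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        sum = adler32_scalar(sum, buf, len);
    clock_t end = clock();
    free(buf);

    /* Hand the checksum back so the compiler cannot drop the loop. */
    if (checksum_out != NULL)
        *checksum_out = sum;

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds > 0.0 ? ((double)len * passes) / seconds : 0.0;
}
```

Calling the helper once with a buffer that fits in cache (say 16 KiB with many passes) and once with a buffer much larger than the last-level cache (say 256 MiB with a few passes) gives two rates to compare. If the large-buffer rate is clearly lower, memory rather than the checksum arithmetic is setting the pace. Dividing either rate by the CPU clock frequency gives a bytes-per-tick figure comparable to the ones quoted above.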

John Bowler <jbowler at acm.org>





