[Zlib-devel] patch-in-progress: vectorized adler32 calculation

Sun Apr 11 15:56:58 EDT 2010

Hi there,

I'm currently in the process of tuning the SVN backend code.
After trimming the fat all over the place, I finally reached
a point where zlib accounts for almost 40% of the runtime.

So, I looked into it. ~15% of the zlib runtime is spent in adler32
and the C implementation is as fast as it gets (close to 1 byte
per cycle). The attached masm32 code provides a vectorized
version of the hotspot of that function. Details can be found
at the top of the ASM files. Most of the code deals with
reading parameters and aligning the source buffer.
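
For readers without the attachments at hand, here is a rough C
sketch of why the adler32 inner loop lends itself to vectorization.
It is not the attached masm32 code. The function names, the macro
names, and the block size of 16 are assumptions made for
illustration only. For a block of n bytes x[0..n-1], the two
running sums can be updated as a' = a + sum(x[i]) and
b' = b + n*a + sum((n-i)*x[i]), so each block reduces to a plain
sum and a weighted sum that SIMD units can compute in parallel.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552u  /* bytes that can be summed before a 32-bit overflow */
#define BLOCK      16u    /* assumed block size, e.g. one 128-bit register */

/* Reference version: one byte at a time, reduced after every byte. */
static uint32_t adler32_simple(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_BASE;
        b = (b + a) % ADLER_BASE;
    }
    return (b << 16) | a;
}

/* Block version: same result, but each 16-byte block is folded in
 * through a plain sum and a weighted sum. The inner for-loop is the
 * part a vectorized implementation would replace. */
static uint32_t adler32_blocked(uint32_t adler, const unsigned char *buf,
                                size_t len)
{
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;

    while (len > 0) {
        /* Defer the modulo: at most ADLER_NMAX bytes between reductions. */
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= chunk;

        while (chunk >= BLOCK) {
            uint32_t sum = 0, wsum = 0;
            for (unsigned i = 0; i < BLOCK; i++) {
                sum  += buf[i];
                wsum += (BLOCK - i) * buf[i]; /* weights 16, 15, ..., 1 */
            }
            b += BLOCK * a + wsum; /* uses a from before this block */
            a += sum;
            buf += BLOCK;
            chunk -= BLOCK;
        }
        while (chunk-- > 0) { /* leftover bytes, scalar */
            a += *buf++;
            b += a;
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Both functions take the running checksum as their first argument,
with 1 as the initial value, in the same way as zlib's adler32().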

I would like to see that code in one of the next zlib releases.
So, please let me know whether the quality is acceptable
and what else has to be done. Currently, I'm working on
masm64 and gcc variants. Makefile changes will certainly
also be on the list.

Furthermore, the assembly implementations of inflate_fast
have a number of performance issues (string ops and maybe
register stalls), some of which I have already fixed. But that
part is not ready for review yet.

-- Stefan^2.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: adlerq32.asm
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20100411/115ea3b7/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: adler32-patched.c
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20100411/115ea3b7/attachment.c>