[Zlib-devel] patch-in-progress: vectorized adler32 calculation

Mon Apr 12 04:31:04 EDT 2010

Don't assume all processors have MMX.
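
One way to avoid that assumption is to pick the routine at run time. The
following is only a rough sketch, not code from the patch: it assumes a GCC-
or clang-compatible compiler on x86 for the feature test
(__builtin_cpu_supports), and the names adler32_portable, adler32_mmx_stub
and adler32_select are hypothetical. The "MMX" routine here is a placeholder
that simply calls the portable one; the real assembly routine would go there.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*adler32_fn)(uint32_t, const unsigned char *, size_t);

/* Portable fallback: deliberately simple, reduces modulo 65521 per byte. */
static uint32_t adler32_portable(uint32_t adler, const unsigned char *buf,
                                 size_t len)
{
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;
    while (len--) {
        a = (a + *buf++) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Placeholder for a vectorized routine (hypothetical). */
static uint32_t adler32_mmx_stub(uint32_t adler, const unsigned char *buf,
                                 size_t len)
{
    return adler32_portable(adler, buf, len);
}

/* Returns nonzero only when the CPU reports MMX support. */
static int cpu_has_mmx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("mmx") != 0;
#else
    return 0;   /* unknown compiler or non-x86 target: assume no MMX */
#endif
}

/* Choose the implementation once; callers use the returned pointer. */
static adler32_fn adler32_select(void)
{
    return cpu_has_mmx() ? adler32_mmx_stub : adler32_portable;
}
```

Whichever branch is taken, the result must be identical, so the portable
routine doubles as a reference for testing the vectorized one.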

-----Original Message-----
From: zlib-devel-bounces at madler.net [mailto:zlib-devel-bounces at madler.net] On
Behalf Of Stefan Fuhrmann
Sent: Sunday, April 11, 2010 21:57
To: zlib-devel at madler.net
Subject: [Zlib-devel] patch-in-progress: vectorized adler32 calculation

Hi there,

I'm currently in the process of tuning the SVN backend code.
After trimming the fat all over the place, I finally reached a point
where zlib accounts for almost 40% of the runtime.

So, I looked into it. ~15% of the zlib runtime is spent in adler32 and the C
implementation is as fast as it gets (close to 1 byte per cycle). The
attached masm32 code provides a vectorized version of the hotspot of that
function. Details can be found at the top of the ASM files. Most of the code
deals with reading parameters and aligning the source buffer.
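
For readers who do not have the function in mind, the hotspot is the inner
byte loop of the checksum. The following is a minimal scalar sketch of
Adler-32 in C, not the attached assembly and not zlib's actual source; the
name adler32_scalar is hypothetical. It keeps two running sums and defers the
modulo reduction to once per 5552-byte chunk, the largest chunk for which the
32-bit sums cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD  65521u  /* largest prime below 2^16 */
#define ADLER_NMAX 5552u   /* max bytes between reductions without overflow */

/* Hypothetical name; takes a running checksum (start with 1). */
static uint32_t adler32_scalar(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    uint32_t a = adler & 0xffffu;          /* sum of bytes, plus 1 */
    uint32_t b = (adler >> 16) & 0xffffu;  /* sum of the intermediate a values */

    while (len > 0) {
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= chunk;
        while (chunk--) {                  /* the hotspot: two adds per byte */
            a += *buf++;
            b += a;
        }
        a %= ADLER_MOD;                    /* reduce once per chunk */
        b %= ADLER_MOD;
    }
    return (b << 16) | a;
}
```

Because b depends on every intermediate value of a, the loop is a serial
dependency chain; a vectorized version has to restructure the sums so that
several bytes can be accumulated per step.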

I would like to see that code in one of the next zlib releases.
So please let me know whether the quality is acceptable and what else has
to be done. Currently, I'm working on masm64 and gcc variants. Makefile
changes will certainly also be on the list.

Furthermore, the assembly implementations of fast_inflate have a number of
performance issues (string ops and maybe register stalls), some of which I
have already fixed. But that part is not ready for review yet.

-- Stefan^2.