[Zlib-devel] crc32 big/little endian

Joakim Tjernlund joakim.tjernlund at transmode.se
Thu Apr 22 06:44:54 EDT 2010


>
> On 04/22/2010 05:43 AM, Mark Adler wrote:
> > John and Edwin,
> >
> > Thank you for all of the testing.  My conclusion for now is just to
> > leave crc32 as is.
> >
>
> Yeah, it is very compiler/CPU dependent, see below.
>
> On 04/22/2010 02:57 AM, John Bowler wrote:
> > From: Joakim Tjernlund
> >> gcc has always had a hard time optimizing crc32. I recently discovered that
> >> -O1 was noticeably faster than -O2 with gcc 4.3.4 in some crc32 tests I was
> >> doing a while back.
> >
> > Wow, you are correct.  Silly me - I just blindly assumed that -O1 would be
> > slightly worse than -O2 (and this is *true* on ARM gcc 3.4.4, where -O1
> > performs worst of all but still much better than -O0).  Here's an updated BYFOUR table:
> >
> > buffer    -O3    -Os    -O2    -O1    -O0
> >     64  18644  19035  18650  17279  40816
> >    128  17060  17250  17080  17755  36057
> >    256  16280  16366  16276  15802  34619
> >    512  15874  15926  15890  14901  33596
> >   1024  15902  15928  15903  14650  33742
> >   2048  15722  15710  15699  14311  32548
> >   4096  15586  15602  15586  14129  33543
> >   8192  15624  15590  15587  14080  34835
> >  16384  18162  18146  18149  17126  37775
>
> Here are some comparisons on my x86-64 (Intel Core 2 Quad Q9550),
> comparing only compilers, and optimization levels.
> Buffer size is 16384 (8192 is slower for me: 17120 vs 16700).
> I used -march=native for all compilers.
>
> The numbers are percentages: 100% is the slowest (18000) and 82.2% is
> the fastest (14800).
>
>                  uint64 crc32               uint32 crc32
>                   -O1   -O2   -O3   -Os      -O1   -O2   -O3   -Os
> gcc-4.4          86.7  92.8  92.8  98.9     90.0  92.8  92.8 100.0
> gcc-4.5          93.9  92.8  92.8  98.9     92.2  92.8  92.8  99.4
> clang 2.6        82.2  88.9  89.4  88.9     85.6  86.7  86.7  86.7
> clang 2.7pre2    92.2  90.6  92.2  90.6     86.1  85.6  87.2  84.4
>
> The absolute best is clang 2.6 -O1, 64-bit crc32.
> The second best is clang 2.7pre2 -Os, 32-bit crc32.
>
> I can post the generated assembly, if anybody is interested in it.
>
> So it looks like gcc generates better code with 64-bit crc32, and clang
> with 32-bit crc32 (except clang 2.6 -O1, which is interesting).
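
For reference, here is a minimal slicing-by-4 CRC-32 sketch in the style of zlib's little-endian BYFOUR loop. It assumes that the uint64/uint32 columns above differ only in the width of the integer type holding the running CRC in the inner loop; that reading, the CRC_WORD macro and the names crc_tab, crc_tab_init and crc32_by4 are hypothetical, not zlib's actual code. Input words are assembled byte by byte in little-endian order, so the result does not depend on host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the "uint64" / "uint32" variants differ only in the integer
   type holding the running CRC. CRC_WORD is a hypothetical knob; build
   with -DCRC_WORD=uint64_t to try the 64-bit variant. */
#ifndef CRC_WORD
#define CRC_WORD uint32_t
#endif

static uint32_t crc_tab[4][256];   /* hypothetical name, not zlib's */

/* Build the four slicing tables for the reflected CRC-32 polynomial. */
static void crc_tab_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_tab[0][n] = c;
    }
    /* crc_tab[k][n]: contribution of byte n followed by k zero bytes. */
    for (uint32_t n = 0; n < 256; n++)
        for (int k = 1; k < 4; k++)
            crc_tab[k][n] = (crc_tab[k - 1][n] >> 8) ^
                            crc_tab[0][crc_tab[k - 1][n] & 0xff];
}

/* Slicing-by-4 CRC-32 update; chainable like zlib's crc32(crc, buf, len). */
static uint32_t crc32_by4(uint32_t crc, const unsigned char *buf, size_t len)
{
    CRC_WORD c = (CRC_WORD)(crc ^ 0xffffffffu);

    while (len >= 4) {
        /* Assemble a little-endian word byte by byte (no pointer casts,
           no alignment or host byte-order assumptions). */
        c ^= (CRC_WORD)buf[0] | (CRC_WORD)buf[1] << 8 |
             (CRC_WORD)buf[2] << 16 | (CRC_WORD)buf[3] << 24;
        c = crc_tab[3][c & 0xff] ^ crc_tab[2][(c >> 8) & 0xff] ^
            crc_tab[1][(c >> 16) & 0xff] ^ crc_tab[0][(c >> 24) & 0xff];
        buf += 4;
        len -= 4;
    }
    while (len--)                       /* 0..3 leftover bytes */
        c = crc_tab[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
    return (uint32_t)c ^ 0xffffffffu;
}
```

Call crc_tab_init() once first; crc32_by4(0, (const unsigned char *)"123456789", 9) should then give the standard CRC-32 check value 0xCBF43926 with either CRC_WORD setting.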

-Os on clang seems unimplemented and mapped to -O2 (or similar), and
u32 has the best absolute results.

To me, this test only shows that gcc sucks at optimizing crc32
and could use some help. If you rearrange the code, similar to my
test patch but keeping the 32-byte unrolling, you might get better
results. Introducing a helper pointer for the tables has helped in the past.
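
The suggestion above can be sketched roughly as follows. This is a hypothetical illustration, not the test patch referred to: it keeps a 32-byte unrolled main loop (eight 4-byte slicing steps, like zlib's DOLIT32) and reads the tables through a local helper pointer t instead of the global array, which may let the compiler keep the table base in a register. The names crc_table, make_crc_table and crc32_unrolled are assumptions for illustration. Whether this actually helps depends on the compiler and CPU, as the numbers above show.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[4][256];

/* Little-endian slicing tables of the kind zlib's BYFOUR code uses
   (reflected CRC-32 polynomial). */
static void make_crc_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[0][n] = c;
    }
    for (uint32_t n = 0; n < 256; n++)
        for (int k = 1; k < 4; k++)
            crc_table[k][n] = (crc_table[k - 1][n] >> 8) ^
                              crc_table[0][crc_table[k - 1][n] & 0xff];
}

/* One slicing-by-4 step; reads the tables through the local pointer t. */
#define DOLIT4 do {                                                   \
        c ^= (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |               \
             (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;         \
        buf += 4;                                                     \
        c = t[3][c & 0xff] ^ t[2][(c >> 8) & 0xff] ^                  \
            t[1][(c >> 16) & 0xff] ^ t[0][c >> 24];                   \
    } while (0)
/* 32 bytes per iteration: eight DOLIT4 steps. */
#define DOLIT32 do { DOLIT4; DOLIT4; DOLIT4; DOLIT4;                  \
                     DOLIT4; DOLIT4; DOLIT4; DOLIT4; } while (0)

static uint32_t crc32_unrolled(uint32_t crc, const unsigned char *buf,
                               size_t len)
{
    /* Hypothetical "helper pointer": a local const copy of the table
       base, which may let the compiler keep it in a register. */
    uint32_t (*const t)[256] = crc_table;
    uint32_t c = crc ^ 0xffffffffu;

    while (len >= 32) { DOLIT32; len -= 32; }
    while (len >= 4)  { DOLIT4;  len -= 4;  }
    while (len--)                       /* 0..3 leftover bytes */
        c = t[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
    return c ^ 0xffffffffu;
}
```

Call make_crc_table() once before the first crc32_unrolled() call; like zlib's crc32(), the function can be chained across buffers by passing the previous result as crc.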




