[Zlib-devel] crc32 big/little endian
Joakim Tjernlund
joakim.tjernlund at transmode.se
Thu Apr 22 06:44:54 EDT 2010
>
> On 04/22/2010 05:43 AM, Mark Adler wrote:
> > John and Edwin,
> >
> > Thank you for all of the testing. My conclusion for now is just to
> > leave crc32 as is.
> >
>
> Yeah, it is very compiler/CPU dependent, see below.
>
> On 04/22/2010 02:57 AM, John Bowler wrote:
> > From: Joakim Tjernlund
> >> gcc has always had a hard time optimizing crc32. I recently discovered that
> >> -O1 was noticeably faster than -O2 with gcc 4.3.4 in some crc32 tests I was
> >> doing a while back.
> >
> > Wow, you are correct. Silly me - I just blindly assumed that -O1 would be
> > slightly worse than -O2 (and this is *true* on ARM gcc 3.4.4 where -O1
> > performs worst of all but still much better than -O0). Here's an updated
> > BYFOUR table:
> >
> > buffer    -O3    -Os    -O2    -O1    -O0
> >     64  18644  19035  18650  17279  40816
> >    128  17060  17250  17080  17755  36057
> >    256  16280  16366  16276  15802  34619
> >    512  15874  15926  15890  14901  33596
> >   1024  15902  15928  15903  14650  33742
> >   2048  15722  15710  15699  14311  32548
> >   4096  15586  15602  15586  14129  33543
> >   8192  15624  15590  15587  14080  34835
> >  16384  18162  18146  18149  17126  37775
>
> Here are some comparisons on my x86-64 (Intel Core 2 Quad Q9550),
> comparing only compilers and optimization levels.
> Buffer size is 16384 (8192 is slower for me: 17120 vs 16700).
> I used -march=native for all compilers.
>
> The numbers are in percentages, 100% is slowest (18000), and 82.2% is
> fastest (14800).
>
>                (uint64)                  (uint32)
>                 -O1   -O2   -O3   -Os     -O1   -O2   -O3   -Os
> gcc-4.4        86.7  92.8  92.8  98.9    90.0  92.8  92.8 100.0
> gcc-4.5        93.9  92.8  92.8  98.9    92.2  92.8  92.8  99.4
> clang 2.6      82.2  88.9  89.4  88.9    85.6  86.7  86.7  86.7
> clang 2.7pre2  92.2  90.6  92.2  90.6    86.1  85.6  87.2  84.4
>
> The absolute best is clang 2.6 -O1, 64-bit crc32.
> The second best is clang 2.7pre2 -Os, 32-bit crc32.
>
> I can post the generated assembly, if anybody is interested in it.
>
> So it looks like gcc generates better code with 64-bit crc32, and clang
> with 32-bit crc32 (except clang 2.6 -O1, which is interesting).
-Os on clang seems to be unimplemented and mapped to -O2 (or similar), and
u32 has the best absolute results.

To me this test only shows that gcc is poor at optimizing crc32
and could use some help. If you rearrange the code, similar to my
test patch but keeping the 32-byte unrolling, you might get better
results. Introducing a helper pointer for the tables used to help in the past.
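
To illustrate the last suggestion, here is a minimal sketch of a
little-endian, four-bytes-at-a-time crc32 loop that reads the tables through
local pointers instead of indexing the global array directly. It is not the
test patch mentioned above and it is not zlib's actual code: the names
make_crc_table_sketch, crc32_by4_sketch and t0..t3 are made up for this
example, the 32-byte unrolling is left out for brevity, and whether the local
pointers help at all will depend on the compiler and CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Four lookup tables for the reflected CRC-32 polynomial 0xEDB88320. */
static uint32_t crc_tab[4][256];

/* Hypothetical helper: fill the tables once before computing any CRC. */
static void make_crc_table_sketch(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_tab[0][n] = c;
    }
    /* Table k advances table k-1 by one additional zero byte. */
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = crc_tab[0][n];
        for (int k = 1; k < 4; k++) {
            c = crc_tab[0][c & 0xff] ^ (c >> 8);
            crc_tab[k][n] = c;
        }
    }
}

/* Hypothetical four-bytes-at-a-time CRC-32; same calling style as crc32(). */
static uint32_t crc32_by4_sketch(uint32_t crc, const unsigned char *buf,
                                 size_t len)
{
    /* The "helper pointers": local aliases for the four tables, so the
       compiler can keep the table addresses in registers (an assumption
       about what helps, not a guarantee). */
    const uint32_t *t0 = crc_tab[0], *t1 = crc_tab[1];
    const uint32_t *t2 = crc_tab[2], *t3 = crc_tab[3];
    uint32_t c = ~crc;

    while (len >= 4) {
        /* Assemble the word byte by byte so the result does not depend on
           host endianness or on buffer alignment. */
        c ^= (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
             (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
        c = t3[c & 0xff] ^ t2[(c >> 8) & 0xff] ^
            t1[(c >> 16) & 0xff] ^ t0[c >> 24];
        buf += 4;
        len -= 4;
    }
    while (len--)                      /* leftover bytes, one at a time */
        c = t0[(c ^ *buf++) & 0xff] ^ (c >> 8);
    return ~c;
}
```

Call make_crc_table_sketch() once, then crc32_by4_sketch(0, buf, len); the
return value can be passed back in as crc to continue over a further buffer.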