[Zlib-devel] crc32 big/little endian
Joakim Tjernlund
joakim.tjernlund at transmode.se
Wed Apr 21 16:11:07 EDT 2010
>
> On 4/21/2010 1:05 PM, Török Edwin wrote:
> > On 04/21/2010 07:49 PM, Török Edwin wrote:
> >> On 04/21/2010 06:42 PM, Mark Adler wrote:
> >>>
> >>> (It's good we're having this discussion, because I just noticed that
> >>> the tables really only need to be four bytes per entry, not eight, so
> >>> those numbers could all be cut in half. On the one hand, it may be faster
> >>> for the machine to access the native word size in the array when 64 bits.
> >>> On the other hand, the table would take less cache space if it were half
> >>> the size. Any opinions on that? Should we force the tables to use a
> >>> four-byte integer type?)
> >>
> >> My opinion on these matters is that you should always benchmark it.
> >> Preferably on more than one CPU.
> >>
> >> Unless someone beats me to it I'll write a short benchmark code and
> >> report results.
> >
> > Attached is a test program.
> >
> > Usage:
> > $ gcc crc32test.c crc32.c -O3 -lm
> > $ taskset -c 1 ./a.out
> >
> > Results:
> >
> > 1. with uint32_t and #include stdint.h in crc32.h:
> > Timing crc32 (usec): 17019 average, 111 stddev,95% confidence
> > interval: 16797 - 17241;
> > Timing crc32 (usec): 16998 average, 23 stddev,95% confidence
> > interval: 16952 - 17044;
> > Timing crc32 (usec): 16991 average, 12 stddev,95% confidence
> > interval: 16967 - 17015;
> >
> > 2. with unsigned long in crc32.h:
> > Timing crc32 (usec): 16676 average, 73 stddev,95% confidence
> > interval: 16530 - 16822;
> > Timing crc32 (usec): 16677 average, 27 stddev,95% confidence
> > interval: 16623 - 16731;
> > Timing crc32 (usec): 16665 average, 24 stddev,95% confidence
> > interval: 16617 - 16713;
> >
> > The difference is not much...
>
> Inasmuch as 4 KB is not much, this does measurably favor native word
> alignment, which is as we would have expected. In terms of optimizations,
> this would be trading a negligible speed optimization for a more negligible
> and far less portable size optimization.
Not sure I follow you here. Are you suggesting that using u32 is less portable
than using long? I hope not: crc32 is 32-bit based, so the tables should
follow that too. I would say that long is less portable.
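To make the size argument concrete, here is a minimal sketch and not the code in zlib's crc32.c. It assumes a single 256-entry table for the usual reflected CRC-32 polynomial 0xEDB88320. The helper names make_crc_table, crc32_bytewise, table_bytes_u32 and table_bytes_long are made up for illustration. With uint32_t every entry is exactly four bytes on any platform that provides the type, while with unsigned long the entry size depends on the ABI.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC_TABLE_ENTRIES 256

/* Fill a 256-entry table for the reflected CRC-32 polynomial 0xEDB88320. */
static void make_crc_table(uint32_t table[CRC_TABLE_ENTRIES])
{
    uint32_t n;
    int k;

    for (n = 0; n < CRC_TABLE_ENTRIES; n++) {
        uint32_t c = n;
        for (k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        table[n] = c;
    }
}

/* Plain byte-at-a-time CRC-32 over buf, using the table above. */
static uint32_t crc32_bytewise(const uint32_t table[CRC_TABLE_ENTRIES],
                               const unsigned char *buf, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    size_t i;

    for (i = 0; i < len; i++)
        c = table[(c ^ buf[i]) & 0xFFu] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}

/* Table footprint with a fixed 32-bit entry type: always 1024 bytes. */
static size_t table_bytes_u32(void)
{
    return CRC_TABLE_ENTRIES * sizeof(uint32_t);
}

/* Table footprint with unsigned long entries: depends on the platform ABI. */
static size_t table_bytes_long(void)
{
    return CRC_TABLE_ENTRIES * sizeof(unsigned long);
}
```

Where unsigned long is eight bytes, table_bytes_long() is twice table_bytes_u32(), and the upper half of every entry is always zero because a CRC-32 value never needs more than 32 bits.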
>
> You won't get to the cache miss scenario until you actually interrupt
> zlib processing with other cache intensive operations, and that won't
> happen on a single chain of compress/uncompress tests.
Exactly, one cache miss costs many cycles to load from main memory and you
do see that here.
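One way a benchmark could approximate the "interrupted by other cache intensive operations" case is to walk a large scratch buffer between timed crc32 calls. The following is only a sketch of that idea. The function evict_cache and the 64-byte line size are assumptions, and nothing here guarantees eviction on any particular CPU; the scratch buffer would need to be larger than the last-level cache for it to have a chance of working.

```c
#include <stddef.h>

/* Assumed cache line size in bytes; real hardware may differ. */
#define ASSUMED_LINE_BYTES 64

/*
 * Touch one byte in every assumed cache line of scratch, so that data
 * used earlier (such as a CRC table) is likely to be pushed out of cache.
 * Returns the number of lines touched.
 */
static size_t evict_cache(volatile unsigned char *scratch, size_t len)
{
    size_t touched = 0;
    size_t i;

    for (i = 0; i < len; i += ASSUMED_LINE_BYTES) {
        scratch[i]++;   /* read-modify-write forces the line to be loaded */
        touched++;
    }
    return touched;
}
```

Calling evict_cache() before each timed run, and keeping it outside the timed region, would show whether the smaller table helps once its entries have to be fetched from main memory again.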