[Zlib-devel] crc32 big/little endian

Wed Apr 21 14:21:09 EDT 2010

On 4/21/2010 1:05 PM, Török Edwin wrote:
> On 04/21/2010 07:49 PM, Török Edwin wrote:
>> On 04/21/2010 06:42 PM, Mark Adler wrote:
>>>
>>> (It's good we're having this discussion, because I just noticed that the tables really only need to be four bytes per entry, not eight, so those numbers could all be cut in half.  On the other hand, it may be faster for the machine to access native word-sized entries in the array on a 64-bit machine.  On the other hand, the table would take less cache space if it were half the size.  Any opinions on that?  Should we force the tables to use a four-byte integer type?)
>>
>> My opinion on these matters is that you should always benchmark it.
>> Preferably on more than one CPU.
>>
>> Unless someone beats me to it, I'll write a short benchmark and
>> report the results.
> 
> Attached is the test program.
> 
> Usage:
> $ gcc crc32test.c crc32.c -O3 -lm
> $ taskset -c 1 ./a.out
> 
> Results:
> 
> 1. with uint32_t and #include <stdint.h> in crc32.h:
> Timing crc32 (usec): 17019 average, 111 stddev, 95% confidence interval: 16797 - 17241;
> Timing crc32 (usec): 16998 average, 23 stddev, 95% confidence interval: 16952 - 17044;
> Timing crc32 (usec): 16991 average, 12 stddev, 95% confidence interval: 16967 - 17015;
> 
> 2. with unsigned long in crc32.h:
> Timing crc32 (usec): 16676 average, 73 stddev, 95% confidence interval: 16530 - 16822;
> Timing crc32 (usec): 16677 average, 27 stddev, 95% confidence interval: 16623 - 16731;
> Timing crc32 (usec): 16665 average, 24 stddev, 95% confidence interval: 16617 - 16713;
> 
> The difference is not much...
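The attached crc32test.c is not reproduced in this post. A minimal, hypothetical sketch of the same kind of comparison follows: it stores the standard reflected CRC-32 table (polynomial 0xEDB88320) once with uint32_t entries and once with unsigned long entries, and times an otherwise identical byte-at-a-time loop over each. It is a simplification, not zlib's crc32.c (which processes four bytes per step using several tables), and every name in it is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* The same 256-entry CRC-32 table, stored with two entry types. */
static uint32_t tab_u32[256];
static unsigned long tab_ulong[256];

/* Fill both tables for the reflected CRC-32 polynomial 0xEDB88320. */
static void make_tables(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        tab_u32[n] = c;
        tab_ulong[n] = c;
    }
}

/* Byte-at-a-time CRC-32 reading 4-byte table entries. */
static uint32_t crc_u32(const unsigned char *p, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    while (len--)
        c = tab_u32[(c ^ *p++) & 0xFF] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}

/* Identical loop reading unsigned long entries (8 bytes on LP64). */
static unsigned long crc_ulong(const unsigned char *p, size_t len)
{
    unsigned long c = 0xFFFFFFFFUL;
    while (len--)
        c = tab_ulong[(c ^ *p++) & 0xFF] ^ (c >> 8);
    return c ^ 0xFFFFFFFFUL;
}

/* CPU seconds for `reps` passes over buf with the chosen entry type. */
static double time_crc(int use_ulong, const unsigned char *buf, size_t len,
                       int reps, unsigned long *sink)
{
    assert(reps > 0);
    unsigned long acc = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        acc ^= use_ulong ? crc_ulong(buf, len) : (unsigned long)crc_u32(buf, len);
    *sink = acc;  /* keep results live so the loop is not optimized away */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call make_tables() once, then compare time_crc(0, ...) against time_crc(1, ...) over a buffer of a few megabytes, pinned to one core (for instance with taskset, as above) and repeated several times. Both functions should return 0xCBF43926 for the nine bytes "123456789", which is a quick check that the two table types compute the same CRC.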

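Inasmuch as 4 KB is not much, this does measurably favor native word
alignment, which is what we would have expected: the unsigned long runs are
about 2% faster.  In terms of optimizations, switching to uint32_t would trade
away a negligible speed gain for an even more negligible, and far less
portable, size saving (uint32_t requires <stdint.h>, which not every compiler
zlib targets provides).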
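To put rough numbers on both sides of that trade, the sketch below averages the three reported runs for each variant and computes table sizes. Two assumptions are made here, not taken from crc32.c: an LP64 platform such as x86-64 Linux, where unsigned long is 8 bytes (on 64-bit Windows it is 4 bytes, so the two variants would be identical), and four 256-entry tables in the hot path, as in a four-bytes-at-a-time CRC. The helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Per-run averages (usec) copied from the three runs of each variant. */
static const double u32_runs[]   = {17019.0, 16998.0, 16991.0};
static const double ulong_runs[] = {16676.0, 16677.0, 16665.0};

/* Arithmetic mean of n values. */
static double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* How much slower `slow` is than `fast`, in percent. */
static double percent_slower(double slow, double fast)
{
    return 100.0 * (slow - fast) / fast;
}

/* Bytes used by `ntables` CRC tables of 256 entries of `entry_size` bytes.
   ASSUMPTION: four tables in the hot path (four bytes per step). */
static size_t table_bytes(size_t ntables, size_t entry_size)
{
    return ntables * 256u * entry_size;
}
```

With the reported figures, the uint32_t runs average about 330 usec slower, so percent_slower(mean(u32_runs, 3), mean(ulong_runs, 3)) comes out just under 2%. Meanwhile table_bytes(4, 8) - table_bytes(4, 4) is 4096 bytes, which is where a 4 KB difference would come from under the four-table assumption.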

You won't get to the cache-miss scenario until you actually interrupt
zlib processing with other cache-intensive operations, and that won't
happen in a single chain of compress/uncompress tests.
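One hypothetical way to approximate that interrupted case is to read through a buffer larger than the last-level cache between CRC calls, so the table has probably been evicted when the next call starts, and to time only the CRC calls themselves. The sketch assumes POSIX clock_gettime and 64-byte cache lines, uses a single-table CRC as a stand-in for zlib's crc32(), and leaves the buffer size to the caller; all names are illustrative.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static uint32_t crc_table[256];

/* Build the reflected CRC-32 table (polynomial 0xEDB88320). */
static void make_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

/* Byte-at-a-time CRC-32 of len bytes. */
static uint32_t crc32_bytes(const unsigned char *p, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    while (len--)
        c = crc_table[(c ^ *p++) & 0xFF] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}

/* Monotonic time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Read one byte per (assumed) 64-byte cache line of a large buffer,
   standing in for other cache-intensive work between zlib calls. */
static unsigned long thrash(const volatile unsigned char *big, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i += 64)
        sum += big[i];
    return sum;
}

/* Total nanoseconds spent inside `reps` CRC calls on msg. When big is
   non-NULL, the cache is thrashed (untimed) before each call, so each
   CRC likely starts with a cold table. */
static double time_crc_calls(const unsigned char *msg, size_t len, int reps,
                             const volatile unsigned char *big, size_t big_len,
                             uint32_t *sink)
{
    assert(reps > 0);
    double total = 0.0;
    uint32_t acc = 0;
    for (int r = 0; r < reps; r++) {
        if (big != NULL)
            (void)thrash(big, big_len);
        double t0 = now_ns();
        acc ^= crc32_bytes(msg, len);
        total += now_ns() - t0;
    }
    *sink = acc;  /* keep the CRC results live */
    return total;
}
```

Comparing a run with big set to NULL against one with, for example, a 64 MB buffer (an assumption; pick anything larger than the last-level cache of the machine under test) is one way to estimate the cold-table cost that a tight compress/uncompress loop never sees. As with the numbers above, it is worth repeating on more than one CPU.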