[Zlib-devel] crc32 big/little endian
John Bowler
jbowler at frontiernet.net
Wed Apr 21 17:08:50 EDT 2010
From: Török Edwin
> Unless someone beats me to it I'll write a short benchmark code and
> report results.
Thanks for the program... I was surprised by the choice of a 16384 byte buffer as input to crc32, so I modified the program (attached) to test buffer size, NOBYFOUR and performance on ARM.
The buffer size choice has a major impact on speed on x86 Prescott but optimization levels (so long as optimization is done) only have a small effect:
Buffer       -O3       -Os       -O2       -O0
64         18644     19035     18650     40816
128        17060     17250     17080     36057
256        16280     16366     16276     34619
512        15874     15926     15890     33596
1024       15902     15928     15903     33742
2048       15722     15710     15699     32548
4096       15586     15602     15586     33543
8192       15624     15590     15587     34835
16384      18162     18146     18149     37775
text       13473     12481     12293     13746
data         296       296       296       296
bss        16420     16396     16420     16420
total      30189     29173     29709     30462
error        <1%       <1%       <1%     5-10%
That's compiled with gcc 4.3.4. The 'optimal' buffer size is 4096 or 8192 bytes. 16384 bytes has an (approximately) 16% speed cost. Decreasing to 512 bytes has little effect on speed.
On x86, repeating these experiments with -DNOBYFOUR, the times go up by a factor of around 2.5 throughout.
On ARM, however, using gcc 3.4.4 (gcc 4 probably has substantially better ARM support), a different picture emerges. The buffer size behavior no longer occurs (this is an XScale ARM system running on a LinkSys NSLU - SlugOS), but -O2 now consistently gives best performance, and the penalty of using NOBYFOUR is almost gone - indeed there is a speed improvement over the BYFOUR code for the 64 byte buffer (50528 us vs 51879 us). Here are the 'BYFOUR' figures (this is for one tenth of the data used on x86):
Buffer       -O3       -Os       -O2       -O0
64         53476     53147     51879    142842
128        47443     46498     45408    123773
256        44212     43190     41955    114270
512        42613     41508     40210    109485
1024       41816     40701     39346    107129
2048       41406     40261     38919    105972
4096       41214     40080     38719    105328
8192       41132     39955     38604    105046
16384      41079     39906     38567    104928
text       17651     17427     17471     21083
data         308       308       308       308
bss        16392     16392     16392     16392
total      34351     34127     34171     37783
error   0.2-0.5%  0.2-0.5%  0.2-0.6%  0.1-0.3%
And with -DNOBYFOUR, with the times given as a percentage of the above (the text/data/bss sizes are absolute):
Buffer       -O3       -Os       -O2       -O0
64          105%       95%       97%      111%
128         114%      105%      107%      125%
256         120%      112%      114%      134%
512         123%      115%      117%      139%
1024        125%      117%      119%      142%
2048        126%      118%      120%      143%
4096        127%      118%      121%      144%
8192        127%      119%      121%      144%
16384       127%      119%      121%      144%
text        8787      8603      8643     10403
data         308       308       308       308
bss        16392     16392     16392     16392
total      25487     25303     25343     27103
error   0.2-0.5%  0.2-0.5%  0.2-0.3%  0.1-0.2%
The best speed on ARM is 38567 us, obtained with -O2, BYFOUR and a 16 Kbyte buffer; the best NOBYFOUR speed is 21% slower with the same buffer and optimization settings.
Conclusion? Well, there is no conclusion - the best approach depends on compiler, architecture and, perhaps most telling, the size of the buffers coming in to the crc32 function - something that may be difficult to control.
John Bowler <jbowler at acm.org>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crc32test.c
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20100421/57a224cd/attachment.c>