[Zlib-devel] CRC speed
Greg Roelofs
newt at pobox.com
Sun Feb 8 19:58:15 EST 2004
After burning about 30 DVDs' worth of backups, one begins to desire--
nay, lust for!--performance in ways never truly appreciated before.
As a result, I've been testing and updating "check," a CRC-32/Adler-32
checksum utility I maintain but that is based largely on Mark's code.
(It's excruciatingly handy for verifying backups without doing a much
longer, more heavily I/O-bound--and sometimes impossible--byte-for-byte
comparison of files. It also helped me find a pair of flaky bits in my
RAM, but that's another story. See http://freshmeat.net/projects/checkcrc/
and http://www.memtest86.com/ for details, respectively.)
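
To illustrate the kind of verification described above, here is a minimal sketch in Python rather than the C that check itself is written in. It uses the standard-library `zlib.crc32` and `zlib.adler32` bindings; the names `file_checksums` and `same_contents` and the 16 KB default buffer are assumptions made for this example, not part of check.

```python
import io
import zlib


def file_checksums(stream, buffer_size=16 * 1024):
    """Return (crc32, adler32) of everything readable from a binary stream."""
    crc = 0      # zlib's starting value for CRC-32
    adler = 1    # zlib's starting value for Adler-32
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:
            break
        # Both functions take the running value as their second argument,
        # so the data never has to be held in memory all at once.
        crc = zlib.crc32(chunk, crc)
        adler = zlib.adler32(chunk, adler)
    return crc & 0xFFFFFFFF, adler & 0xFFFFFFFF


def same_contents(stream_a, stream_b):
    """Hypothetical helper: compare two streams by checksum only."""
    return file_checksums(stream_a) == file_checksums(stream_b)
```

A file would be opened in binary mode, e.g. `with open(path, "rb") as f: file_checksums(f)`. Matching checksums make identical contents very likely but do not prove it, which is the trade-off against a byte-for-byte comparison.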
Anyway, after trying various levels of loop-unrolling (absolutely no
difference with gcc 2.95.3) and buffer sizes (about 1% improvement
from 4KB to 32KB), I remembered zlib 1.2.x and hacked in a trivial
USE_ZLIB section. Holy buckets, what a difference! Contrary to what
the ChangeLog says...
    - New and improved crc32()
        - About 50% faster, thanks to suggestions from Rodney Brown
...it's actually about 80% faster. (Well, 80% faster than check's old
code, but a fairly careful inspection didn't turn up any obvious
differences between that and the "normal" zlib CRC-32 code. Not tested,
though.)
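
For reference, the "80% faster" figure can be reproduced from the average wall-clock times in the appended table (0:04.873 for the old 4 KB code, 0:02.697 for the zlib 16 KB code). The sketch below treats "faster" as a throughput ratio, which is the reading that matches the 80.7% in the table; the function names are made up for this example.

```python
def percent_faster(old_seconds, new_seconds):
    """Throughput gain: how much more data per second the new code handles."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def percent_less_time(old_seconds, new_seconds):
    """Reduction in elapsed time for the same amount of data."""
    return (1.0 - new_seconds / old_seconds) * 100.0


old_avg = 4.873  # old inline code, 4 KB buffer (average from the table)
new_avg = 2.697  # USE_ZLIB code, 16 KB buffer (average from the table)

print(f"{percent_faster(old_avg, new_avg):.1f}% faster")       # 80.7% faster
print(f"{percent_less_time(old_avg, new_avg):.1f}% less time")  # 44.7% less time
```

Both numbers describe the same measurement. Note that they are based on total elapsed time, which includes roughly 1.3 s of system time in every run, so the gain in the checksum code alone is larger than either figure suggests.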
So...well done, Rodney! And thanks to Mark for incorporating it so
nicely into zlib. And apologies for taking so long to get around to
testing it, but I was insufficiently motivated until now. ;-) I'll
get the new version of check uploaded later today, I hope.
Btw, a 16 KB buffer proved best for the new code. Full results appended.
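
A buffer-size comparison of this sort can be sketched as follows. This is not the procedure used for the table below, which timed the C utility reading a cached file with the shell's `time`. It is a rough Python analogue that feeds in-memory data to `zlib.crc32` in pieces, so it measures checksum cost only and no read() overhead. The names `crc32_chunked` and `time_chunk_sizes` are hypothetical, and the numbers it produces will not match those in the table.

```python
import time
import zlib


def crc32_chunked(data, chunk_size):
    """CRC-32 of data, fed to zlib.crc32 in chunk_size pieces."""
    crc = 0
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    for start in range(0, len(view), chunk_size):
        crc = zlib.crc32(view[start:start + chunk_size], crc)
    return crc & 0xFFFFFFFF


def time_chunk_sizes(data, sizes_kb=(4, 8, 16, 32, 64), repeats=5):
    """Return {size_kb: best wall-clock seconds over `repeats` runs}."""
    results = {}
    for kb in sizes_kb:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            crc32_chunked(data, kb * 1024)
            best = min(best, time.perf_counter() - t0)
        results[kb] = best
    return results


if __name__ == "__main__":
    sample = bytes(range(256)) * (64 * 1024)  # 16 MB of test data
    for kb, seconds in time_chunk_sizes(sample).items():
        print(f"{kb:3d} KB: {seconds:.4f} s")
```

Taking the best of several runs reduces noise from other processes; the table instead lists every run and averages them.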
Greg
* Table of timings for completely cached, 696402058-byte test file on 1 GB,
* 1.4 GHz Athlon XP ("1600+") running Linux 2.4.24 with one memory address
* excluded due to flaky behavior (found during 6-hour memtest86 run). All
* versions compiled with gcc 2.95.3 (Slackware 8.0).
*
* old, inline version:
*
* 4 KB (check 4.2): 3.620u 1.240s 0:04.89 99.3% 0+0k 0+0io 102pf+0w
* 3.560u 1.320s 0:04.87 100.2% 0+0k 0+0io 102pf+0w
* 3.570u 1.290s 0:04.86 100.0% 0+0k 0+0io 102pf+0w
* 3.520u 1.350s 0:04.86 100.2% 0+0k 0+0io 102pf+0w
* 3.710u 1.170s 0:04.87 100.2% 0+0k 0+0io 102pf+0w
*
* 4 KB: 3.570u 1.310s 0:04.91 99.3% 0+0k 0+0io 102pf+0w
* 3.410u 1.450s 0:04.86 100.0% 0+0k 0+0io 102pf+0w
* 3.590u 1.280s 0:04.86 100.2% 0+0k 0+0io 102pf+0w
* 3.530u 1.350s 0:04.88 100.0% 0+0k 0+0io 102pf+0w
* 3.700u 1.180s 0:04.87 100.2% 0+0k 0+0io 102pf+0w
* avg = 0:04.873 (for both 4 KB sets)
*
* 8 KB: 3.510u 1.300s 0:04.85 99.1% 0+0k 0+0io 102pf+0w
* 3.480u 1.350s 0:04.83 100.0% 0+0k 0+0io 102pf+0w
* 3.470u 1.340s 0:04.82 99.7% 0+0k 0+0io 102pf+0w
* 3.510u 1.310s 0:04.81 100.2% 0+0k 0+0io 102pf+0w
* 3.480u 1.340s 0:04.81 100.2% 0+0k 0+0io 102pf+0w
*
* 16 KB: 3.650u 1.170s 0:04.85 99.3% 0+0k 0+0io 102pf+0w
* 3.550u 1.260s 0:04.82 99.7% 0+0k 0+0io 102pf+0w
* 3.560u 1.260s 0:04.81 100.2% 0+0k 0+0io 102pf+0w
* 3.570u 1.240s 0:04.81 100.0% 0+0k 0+0io 102pf+0w
* 3.500u 1.320s 0:04.81 100.2% 0+0k 0+0io 102pf+0w
* avg = 0:04.820
*
* 32 KB: 3.520u 1.290s 0:04.85 99.1% 0+0k 0+0io 102pf+0w
* 3.540u 1.270s 0:04.81 100.0% 0+0k 0+0io 102pf+0w
* 3.480u 1.330s 0:04.80 100.2% 0+0k 0+0io 102pf+0w
* 3.420u 1.390s 0:04.81 100.0% 0+0k 0+0io 102pf+0w
* 3.530u 1.280s 0:04.81 100.0% 0+0k 0+0io 102pf+0w
* avg = 0:04.816
*
* 64 KB: 3.590u 1.310s 0:04.93 99.3% 0+0k 0+0io 102pf+0w
* 3.680u 1.220s 0:04.90 100.0% 0+0k 0+0io 102pf+0w
* 3.530u 1.380s 0:04.90 100.2% 0+0k 0+0io 102pf+0w
* 3.620u 1.280s 0:04.90 100.0% 0+0k 0+0io 102pf+0w
* 3.560u 1.340s 0:04.90 100.0% 0+0k 0+0io 102pf+0w
*
* USE_ZLIB version:
*
* 4 KB: 1.440u 1.310s 0:02.75 100.0% 0+0k 0+0io 103pf+0w
* 1.440u 1.320s 0:02.75 100.3% 0+0k 0+0io 103pf+0w
* 1.490u 1.260s 0:02.75 100.0% 0+0k 0+0io 103pf+0w
* 1.450u 1.300s 0:02.75 100.0% 0+0k 0+0io 103pf+0w
* 1.320u 1.450s 0:02.77 100.0% 0+0k 0+0io 103pf+0w
* 1.540u 1.230s 0:02.77 100.0% 0+0k 0+0io 103pf+0w
* 1.610u 1.150s 0:02.75 100.3% 0+0k 0+0io 103pf+0w
* 1.590u 1.160s 0:02.75 100.0% 0+0k 0+0io 103pf+0w
* 1.350u 1.410s 0:02.76 100.0% 0+0k 0+0io 103pf+0w
* 1.580u 1.190s 0:02.77 100.0% 0+0k 0+0io 103pf+0w
* 1.460u 1.300s 0:02.75 100.3% 0+0k 0+0io 103pf+0w
*
* 8 KB: 1.480u 1.260s 0:02.78 98.5% 0+0k 0+0io 103pf+0w
* 1.500u 1.240s 0:02.73 100.3% 0+0k 0+0io 103pf+0w
* 1.450u 1.280s 0:02.73 100.0% 0+0k 0+0io 103pf+0w
* 1.280u 1.470s 0:02.75 100.0% 0+0k 0+0io 103pf+0w
* 1.420u 1.320s 0:02.73 100.3% 0+0k 0+0io 103pf+0w
*
* 16 KB: 1.290u 1.410s 0:02.73 98.9% 0+0k 0+0io 103pf+0w
* 1.460u 1.240s 0:02.70 100.0% 0+0k 0+0io 103pf+0w
* 1.410u 1.290s 0:02.69 100.3% 0+0k 0+0io 103pf+0w
* 1.500u 1.190s 0:02.69 100.0% 0+0k 0+0io 103pf+0w
* 1.360u 1.340s 0:02.70 100.0% 0+0k 0+0io 103pf+0w
* 1.310u 1.380s 0:02.69 100.0% 0+0k 0+0io 103pf+0w
* 1.500u 1.200s 0:02.69 100.3% 0+0k 0+0io 103pf+0w
* 1.320u 1.380s 0:02.70 100.0% 0+0k 0+0io 103pf+0w
* 1.420u 1.280s 0:02.69 100.3% 0+0k 0+0io 103pf+0w
* 1.330u 1.360s 0:02.69 100.0% 0+0k 0+0io 103pf+0w
* 1.380u 1.320s 0:02.70 100.0% 0+0k 0+0io 103pf+0w
* avg = 0:02.697 (80.7% faster than old 4 KB code)
*
* 32 KB: 1.440u 1.280s 0:02.76 98.5% 0+0k 0+0io 103pf+0w
* 1.360u 1.350s 0:02.71 100.0% 0+0k 0+0io 103pf+0w
* 1.320u 1.390s 0:02.71 100.0% 0+0k 0+0io 103pf+0w
* 1.460u 1.260s 0:02.71 100.3% 0+0k 0+0io 103pf+0w
* 1.370u 1.340s 0:02.71 100.0% 0+0k 0+0io 103pf+0w
*
* 64 KB: 1.340u 1.480s 0:02.87 98.2% 0+0k 0+0io 103pf+0w
* 1.340u 1.480s 0:02.81 100.3% 0+0k 0+0io 103pf+0w
* 1.470u 1.350s 0:02.81 100.3% 0+0k 0+0io 103pf+0w
* 1.480u 1.340s 0:02.81 100.3% 0+0k 0+0io 103pf+0w
* 1.430u 1.380s 0:02.81 100.0% 0+0k 0+0io 103pf+0w