[Zlib-devel] CRC speed

Greg Roelofs newt at pobox.com
Sun Feb 8 19:58:15 EST 2004


After burning about 30 DVDs' worth of backups, one begins to desire--
nay, lust for!--performance in ways never truly appreciated before.
As a result, I've been testing and updating "check," a CRC-32/Adler-32
checksum utility that I maintain but that is based largely on Mark's
code.

(It's excruciatingly handy for verifying backups without doing a much
longer, more heavily I/O-bound--and sometimes impossible--byte-for-byte
comparison of files.  It also helped me find a pair of flaky bits in my
RAM, but that's another story.  See http://freshmeat.net/projects/checkcrc/
and http://www.memtest86.com/ for details, respectively.)

Anyway, after trying various levels of loop-unrolling (absolutely no
difference with gcc 2.95.3) and buffer sizes (about 1% improvement
going from 4 KB to 32 KB), I remembered zlib 1.2.x and hacked in a
trivial USE_ZLIB section.  Holy buckets, what a difference!  Contrary
to what the ChangeLog says...

- New and improved crc32()
    - About 50% faster, thanks to suggestions from Rodney Brown

...it's actually about 80% faster in elapsed time.  (Well, 80% faster
than check's old code, but a fairly careful inspection didn't turn up
any obvious differences between that and the "normal" zlib CRC-32 code.
I haven't tested that, though.)

So...well done, Rodney!  And thanks to Mark for incorporating it so
nicely into zlib.  And apologies for taking so long to get around to
testing it, but I was insufficiently motivated until now. ;-)  I'll
get the new version of check uploaded later today, I hope.

Btw, a 16 KB buffer proved best for the new code.  Full results appended.
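
For anyone wanting to try the same kind of buffered loop, here is a
rough sketch.  It is not check's source: it is Python rather than C, it
leans on the standard zlib module's crc32() (which calls into the zlib
library), and the function name file_crc32 and its 16 KB default are
purely illustrative choices.

```python
import zlib

def file_crc32(path, bufsize=16 * 1024):
    """Return the CRC-32 of a file, read in bufsize-byte chunks."""
    crc = 0  # starting value for a fresh CRC-32
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            # Feed the running value back in so the final result equals
            # a single crc32() call over the whole file.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF  # keep it an unsigned 32-bit value

if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print("%08x  %s" % (file_crc32(name), name))
```

The buffer size only changes how the file is read, not the checksum, so
it can be varied freely when timing.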

Greg

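 * Table of timings for a completely cached, 696402058-byte test file on a
 * 1.4 GHz Athlon XP ("1600+") with 1 GB of RAM, running Linux 2.4.24 with
 * one memory address excluded due to flaky behavior (found during a 6-hour
 * memtest86 run).  All versions compiled with gcc 2.95.3 (Slackware 8.0).
 * Leading columns are user CPU seconds (u), system CPU seconds (s), and
 * elapsed time (m:ss); the averages below are of elapsed time.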
 *
 * old, inline version:
 *
 *   4 KB:  3.620u 1.240s 0:04.89 99.3%     0+0k 0+0io 102pf+0w
 *  (check  3.560u 1.320s 0:04.87 100.2%    0+0k 0+0io 102pf+0w
 *    4.2)  3.570u 1.290s 0:04.86 100.0%    0+0k 0+0io 102pf+0w
 *          3.520u 1.350s 0:04.86 100.2%    0+0k 0+0io 102pf+0w
 *          3.710u 1.170s 0:04.87 100.2%    0+0k 0+0io 102pf+0w
 *
 *   4 KB:  3.570u 1.310s 0:04.91 99.3%     0+0k 0+0io 102pf+0w
 *          3.410u 1.450s 0:04.86 100.0%    0+0k 0+0io 102pf+0w
 *          3.590u 1.280s 0:04.86 100.2%    0+0k 0+0io 102pf+0w
 *          3.530u 1.350s 0:04.88 100.0%    0+0k 0+0io 102pf+0w
 *          3.700u 1.180s 0:04.87 100.2%    0+0k 0+0io 102pf+0w
 *                  avg = 0:04.873 (for both 4 KB sets)
 *
 *   8 KB:  3.510u 1.300s 0:04.85 99.1%     0+0k 0+0io 102pf+0w
 *          3.480u 1.350s 0:04.83 100.0%    0+0k 0+0io 102pf+0w
 *          3.470u 1.340s 0:04.82 99.7%     0+0k 0+0io 102pf+0w
 *          3.510u 1.310s 0:04.81 100.2%    0+0k 0+0io 102pf+0w
 *          3.480u 1.340s 0:04.81 100.2%    0+0k 0+0io 102pf+0w
 *
 *  16 KB:  3.650u 1.170s 0:04.85 99.3%     0+0k 0+0io 102pf+0w
 *          3.550u 1.260s 0:04.82 99.7%     0+0k 0+0io 102pf+0w
 *          3.560u 1.260s 0:04.81 100.2%    0+0k 0+0io 102pf+0w
 *          3.570u 1.240s 0:04.81 100.0%    0+0k 0+0io 102pf+0w
 *          3.500u 1.320s 0:04.81 100.2%    0+0k 0+0io 102pf+0w
 *                  avg = 0:04.820
 *
 *  32 KB:  3.520u 1.290s 0:04.85 99.1%     0+0k 0+0io 102pf+0w
 *          3.540u 1.270s 0:04.81 100.0%    0+0k 0+0io 102pf+0w
 *          3.480u 1.330s 0:04.80 100.2%    0+0k 0+0io 102pf+0w
 *          3.420u 1.390s 0:04.81 100.0%    0+0k 0+0io 102pf+0w
 *          3.530u 1.280s 0:04.81 100.0%    0+0k 0+0io 102pf+0w
 *                  avg = 0:04.816
 *
 *  64 KB:  3.590u 1.310s 0:04.93 99.3%     0+0k 0+0io 102pf+0w
 *          3.680u 1.220s 0:04.90 100.0%    0+0k 0+0io 102pf+0w
 *          3.530u 1.380s 0:04.90 100.2%    0+0k 0+0io 102pf+0w
 *          3.620u 1.280s 0:04.90 100.0%    0+0k 0+0io 102pf+0w
 *          3.560u 1.340s 0:04.90 100.0%    0+0k 0+0io 102pf+0w
 *
 * USE_ZLIB version:
 *
 *   4 KB:  1.440u 1.310s 0:02.75 100.0%    0+0k 0+0io 103pf+0w
 *          1.440u 1.320s 0:02.75 100.3%    0+0k 0+0io 103pf+0w
 *          1.490u 1.260s 0:02.75 100.0%    0+0k 0+0io 103pf+0w
 *          1.450u 1.300s 0:02.75 100.0%    0+0k 0+0io 103pf+0w
 *          1.320u 1.450s 0:02.77 100.0%    0+0k 0+0io 103pf+0w
 *          1.540u 1.230s 0:02.77 100.0%    0+0k 0+0io 103pf+0w
 *          1.610u 1.150s 0:02.75 100.3%    0+0k 0+0io 103pf+0w
 *          1.590u 1.160s 0:02.75 100.0%    0+0k 0+0io 103pf+0w
 *          1.350u 1.410s 0:02.76 100.0%    0+0k 0+0io 103pf+0w
 *          1.580u 1.190s 0:02.77 100.0%    0+0k 0+0io 103pf+0w
 *          1.460u 1.300s 0:02.75 100.3%    0+0k 0+0io 103pf+0w
 *
 *   8 KB:  1.480u 1.260s 0:02.78 98.5%     0+0k 0+0io 103pf+0w
 *          1.500u 1.240s 0:02.73 100.3%    0+0k 0+0io 103pf+0w
 *          1.450u 1.280s 0:02.73 100.0%    0+0k 0+0io 103pf+0w
 *          1.280u 1.470s 0:02.75 100.0%    0+0k 0+0io 103pf+0w
 *          1.420u 1.320s 0:02.73 100.3%    0+0k 0+0io 103pf+0w
 *
 *  16 KB:  1.290u 1.410s 0:02.73 98.9%     0+0k 0+0io 103pf+0w
 *          1.460u 1.240s 0:02.70 100.0%    0+0k 0+0io 103pf+0w
 *          1.410u 1.290s 0:02.69 100.3%    0+0k 0+0io 103pf+0w
 *          1.500u 1.190s 0:02.69 100.0%    0+0k 0+0io 103pf+0w
 *          1.360u 1.340s 0:02.70 100.0%    0+0k 0+0io 103pf+0w
 *          1.310u 1.380s 0:02.69 100.0%    0+0k 0+0io 103pf+0w
 *          1.500u 1.200s 0:02.69 100.3%    0+0k 0+0io 103pf+0w
 *          1.320u 1.380s 0:02.70 100.0%    0+0k 0+0io 103pf+0w
 *          1.420u 1.280s 0:02.69 100.3%    0+0k 0+0io 103pf+0w
 *          1.330u 1.360s 0:02.69 100.0%    0+0k 0+0io 103pf+0w
 *          1.380u 1.320s 0:02.70 100.0%    0+0k 0+0io 103pf+0w
 *                  avg = 0:02.697 (80.7% faster than old 4 KB code)
 *
 *  32 KB:  1.440u 1.280s 0:02.76 98.5%     0+0k 0+0io 103pf+0w
 *          1.360u 1.350s 0:02.71 100.0%    0+0k 0+0io 103pf+0w
 *          1.320u 1.390s 0:02.71 100.0%    0+0k 0+0io 103pf+0w
 *          1.460u 1.260s 0:02.71 100.3%    0+0k 0+0io 103pf+0w
 *          1.370u 1.340s 0:02.71 100.0%    0+0k 0+0io 103pf+0w
 *
 *  64 KB:  1.340u 1.480s 0:02.87 98.2%     0+0k 0+0io 103pf+0w
 *          1.340u 1.480s 0:02.81 100.3%    0+0k 0+0io 103pf+0w
 *          1.470u 1.350s 0:02.81 100.3%    0+0k 0+0io 103pf+0w
 *          1.480u 1.340s 0:02.81 100.3%    0+0k 0+0io 103pf+0w
 *          1.430u 1.380s 0:02.81 100.0%    0+0k 0+0io 103pf+0w
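
The 80.7% figure can be reproduced from the elapsed-time column alone.
A small sketch of the arithmetic, with the times copied from the table
above (the variable names are just for illustration):

```python
# Elapsed (wall-clock) seconds copied from the timing table.
old_4k = [4.89, 4.87, 4.86, 4.86, 4.87,
          4.91, 4.86, 4.86, 4.88, 4.87]      # old inline code, both 4 KB sets
zlib_16k = [2.73, 2.70, 2.69, 2.69, 2.70, 2.69,
            2.69, 2.70, 2.69, 2.69, 2.70]    # USE_ZLIB version, 16 KB buffer

def mean(xs):
    return sum(xs) / len(xs)

old_avg = mean(old_4k)
new_avg = mean(zlib_16k)

# "X% faster" here means the ratio of elapsed times, minus one.
speedup_pct = (old_avg / new_avg - 1.0) * 100.0

print("old avg %.3f s" % old_avg)      # 4.873
print("new avg %.3f s" % new_avg)      # 2.697
print("faster  %.1f%%" % speedup_pct)  # 80.7
```

Note that this compares elapsed time, which includes the roughly 1.3
seconds of system time present in both versions.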



