Faithful zlib beta testers,
Version 4 of the new inflate code is available for testing here:
http://www.alumni.caltech.edu/~madler/infnew-4.tar.gz
This new code can be applied to zlib 1.1.4 as follows (Unix example):
    gnutar xvfz zlib-1.1.4.tar.gz
    cd zlib-1.1.4
    gnutar xvfz ../infnew-4.tar.gz
    patch zlib.h zlib.h-1.2.diff
    rm infblock.* infcodes.* infutil.*
    ./configure
    [ ... make your own modifications to Makefile as needed ... ]
    make test
    [ ... test zlib with your applications ... ]
This version includes more speed improvements, compiler compatibility
improvements, and a new inflate interface for use in file-based
decompressors such as unzip and gzip. This variant of inflate(),
called inflateBack(), provides a call-back interface that works much
like the original inflate code used by unzip and gzip, and it is
faster still than the new inflate(). The interface is documented in
the patched zlib.h.
Here is a speed comparison between zlib 1.1.4 and infnew-4 (the first
column is the output buffer size):
zlib 1.1.4:
  1 x 600 MHz 750CXe processor, 99.837592 MHz bus
     256: 0.2870 sec, 192013382 inst, 1.115 inst per cycle
     512: 0.2192 sec, 151989337 inst, 1.156 inst per cycle
    1024: 0.1788 sec, 128623948 inst, 1.199 inst per cycle
    2048: 0.1544 sec, 114256952 inst, 1.233 inst per cycle
    4096: 0.1420 sec, 106751746 inst, 1.253 inst per cycle
    8192: 0.1377 sec, 102656765 inst, 1.242 inst per cycle
   16384: 0.1395 sec, 100514787 inst, 1.201 inst per cycle
   32768: 0.1399 sec,  99506693 inst, 1.185 inst per cycle
   65536: 0.1442 sec,  99545023 inst, 1.151 inst per cycle
  131072: 0.1606 sec,  99681804 inst, 1.035 inst per cycle
  262144: 0.1768 sec,  99782505 inst, 0.940 inst per cycle
infnew-4:
  1 x 600 MHz 750CXe processor, 99.837592 MHz bus
     256: 0.2525 sec, 158781073 inst, 1.048 inst per cycle
     512: 0.1849 sec, 113193590 inst, 1.020 inst per cycle
    1024: 0.1472 sec,  88909876 inst, 1.007 inst per cycle
    2048: 0.1286 sec,  76522082 inst, 0.992 inst per cycle
    4096: 0.1184 sec,  69929635 inst, 0.984 inst per cycle
    8192: 0.1131 sec,  66239457 inst, 0.976 inst per cycle
   16384: 0.1138 sec,  64006565 inst, 0.937 inst per cycle
   32768: 0.1159 sec,  62526964 inst, 0.899 inst per cycle
   65536: 0.1126 sec,  60237558 inst, 0.892 inst per cycle
  131072: 0.1203 sec,  59100449 inst, 0.819 inst per cycle
  262144: 0.1287 sec,  58530035 inst, 0.758 inst per cycle
And here is a speed comparison, using the same test file, between the
old inflate code used by gzip and inflateBack():
gzip 1.3.3's inflate:
  1 x 600 MHz 750CXe processor, 99.837592 MHz bus
   32768: 0.1347 sec, 111298775 inst, 1.377 inst per cycle
inflateBack():
  1 x 600 MHz 750CXe processor, 99.837592 MHz bus
   32768: 0.1008 sec,  59795039 inst, 0.989 inst per cycle
Inflate's speed tends to be bound more by memory reads and writes than
by instruction execution, which is why the new code retires fewer
instructions per cycle. So while the number of instructions executed
to do the same job has been reduced by 40% to almost 50%, the speed on
my processor has increased by only 20% to 30% or so. Your mileage may
vary.
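As a rough check of those percentages, the 32768-byte rows (the only output buffer size that appears in both comparisons) can be worked through with two small helpers. The function names are hypothetical; "percent faster" here means old time divided by new time, minus one, which is one common convention among several.

```c
#include <assert.h>

/* Percent of instructions eliminated: (old - new) / old, times 100. */
static double percent_fewer(double old_inst, double new_inst)
{
    assert(old_inst > 0.0);
    return 100.0 * (old_inst - new_inst) / old_inst;
}

/* Percent speed increase: old time / new time, minus 1, times 100. */
static double percent_faster(double old_sec, double new_sec)
{
    assert(new_sec > 0.0);
    return 100.0 * (old_sec / new_sec - 1.0);
}
```

By this arithmetic, infnew-4 executes about 37% fewer instructions than zlib 1.1.4 at 32768 bytes yet runs about 21% faster, and inflateBack() executes about 46% fewer instructions than gzip 1.3.3's inflate yet runs about 34% faster.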
mark