[Zlib-devel] Inflate 1.2beta3 available for testing

Mark Adler madler at alumni.caltech.edu
Mon Dec 23 00:33:01 EST 2002


On Tuesday, December 10, 2002, at 03:17  PM, Paul Marquess wrote:
> 3. Interestingly, for small buffer sizes, between 10k and 100k, 1.1.4 
> seemed to be better than beta2.

I have made several changes to improve the speed of the new inflate, 
including making it faster than the old inflate for all buffer sizes.  
You can get it here:

     http://www.alumni.caltech.edu/~madler/infnew-3.tar.gz

Here's how to use it, from a previous email (substitute infnew-3.tar.gz 
for infnew-0.tar.gz):

> As an example, to test on Unix, get zlib-1.1.4.tar.gz and 
> infnew-0.tar.gz into a directory, and then in that directory do:
>
>     rm -rf zlib-1.1.4
>     gnutar xfz zlib-1.1.4.tar.gz
>     cd zlib-1.1.4
>     gnutar xfz ../infnew-0.tar.gz
>     rm infblock.* infcodes.* infutil.*
>     ./configure
>     ... any personal modifications you'd like to make to the Makefile 
> ...
>     make test
>     ... test zlib with your applications ...

This version uses pre-increments (*++ptr) instead of post-increments 
(*ptr++) in inffast.c.  The pre-increments are faster on my development 
machine, a PowerPC.  However, you can compile inffast.c with POSTINC 
defined and it will use post-increments instead.  I would appreciate it 
if you can test it with both and let me know your processor and 
compiler, whether you see a speed difference, and if so, in what 
direction.  Thanks.
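
For readers who want to see what such a compile-time switch might look like, here is a minimal sketch. It is not copied from inffast.c: the names OFF, PUP and copy_bytes are chosen for illustration only, and the only detail taken from the message is that defining POSTINC selects post-increments. Because the pre-increment form starts each pointer one element early, this sketch assumes the caller guarantees a valid byte before both regions.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the increment style at compile time: pre-increment by default,
   post-increment when POSTINC is defined (as the message describes).
   OFF and PUP are illustrative names, not taken from the message. */
#ifdef POSTINC
#  define OFF 0              /* pointers start at the first byte */
#  define PUP(p) (*(p)++)    /* access the byte, then advance */
#else
#  define OFF 1              /* pointers start one byte before the first */
#  define PUP(p) (*++(p))    /* advance, then access the byte */
#endif

/* Copy len bytes from src to dst with the selected increment style.
   Assumption: when POSTINC is not defined, dst - 1 and src - 1 must point
   into the same arrays as dst and src, since forming a pointer before the
   start of an array is not valid C. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t len)
{
    unsigned char *out;
    const unsigned char *in;

    assert(dst != NULL && src != NULL);
    out = dst - OFF;
    in = src - OFF;
    while (len--)
        PUP(out) = PUP(in);
}
```

Building the same file once as-is and once with -DPOSTINC, then timing both, is one way to make the comparison requested above.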

Below are some speed results on a PowerPC using gcc 3.1 with various 
buffer sizes, first with zlib 1.1.5b (same speed as 1.1.4), and second 
with zlib inflate 1.2beta3.  The test file is 1.7 MB compressed, 8.1 MB 
uncompressed.  The data shows that while the number of instructions 
continues to decrease with buffer size, the execution time reaches a 
minimum at around 16K buffer for 1.1.5 and 64K for 1.2.  This is 
probably due to my data caches, L1=32K and L2=256K, recycling for the 
larger buffer sizes.  Across the buffer sizes, the speed improvement of 
the new inflate on my machine is around 20%.

mark


zlib 1.1.5:
1 x 600 MHz 750CXe processor, 99.837592 MHz bus
     128: 0.3268 seconds, 216314449 instructions, 1.103 instructions per cycle
     256: 0.2995 seconds, 202935084 instructions, 1.129 instructions per cycle
     512: 0.2265 seconds, 159291467 instructions, 1.172 instructions per cycle
    1024: 0.1835 seconds, 133755272 instructions, 1.215 instructions per cycle
    2048: 0.1572 seconds, 118016598 instructions, 1.251 instructions per cycle
    4096: 0.1438 seconds, 109795163 instructions, 1.272 instructions per cycle
    8192: 0.1383 seconds, 105321670 instructions, 1.269 instructions per cycle
   16384: 0.1386 seconds, 102966284 instructions, 1.238 instructions per cycle
   32768: 0.1384 seconds, 101860486 instructions, 1.227 instructions per cycle
   65536: 0.1500 seconds, 101960908 instructions, 1.133 instructions per cycle
  131072: 0.1612 seconds, 102088824 instructions, 1.055 instructions per cycle
  262144: 0.1818 seconds, 102216276 instructions, 0.937 instructions per cycle
  524288: 0.1879 seconds, 102394686 instructions, 0.908 instructions per cycle

1.2beta3:
1 x 600 MHz 750CXe processor, 99.837592 MHz bus
     128: 0.2632 seconds, 164827918 instructions, 1.044 instructions per cycle
     256: 0.2511 seconds, 158781076 instructions, 1.054 instructions per cycle
     512: 0.1840 seconds, 116061843 instructions, 1.051 instructions per cycle
    1024: 0.1467 seconds, 92641948 instructions, 1.052 instructions per cycle
    2048: 0.1285 seconds, 80272626 instructions, 1.041 instructions per cycle
    4096: 0.1188 seconds, 73265033 instructions, 1.028 instructions per cycle
    8192: 0.1134 seconds, 68992061 instructions, 1.014 instructions per cycle
   16384: 0.1137 seconds, 66161731 instructions, 0.970 instructions per cycle
   32768: 0.1168 seconds, 63946216 instructions, 0.912 instructions per cycle
   65536: 0.1118 seconds, 60943639 instructions, 0.909 instructions per cycle
  131072: 0.1193 seconds, 59473639 instructions, 0.831 instructions per cycle
  262144: 0.1278 seconds, 58719887 instructions, 0.766 instructions per cycle
  524288: 0.1321 seconds, 58324107 instructions, 0.736 instructions per cycle
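
The figures in the two lists can be cross-checked with a little arithmetic. The sketch below shows one way to do that, assuming the clock runs at exactly the nominal 600 MHz; the function names are hypothetical and are not part of zlib or of the timing tool used for these measurements.

```c
#include <assert.h>

/* Percentage by which the run time drops going from old_seconds
   to new_seconds. */
double time_reduction_percent(double old_seconds, double new_seconds)
{
    assert(old_seconds > 0.0);
    return 100.0 * (old_seconds - new_seconds) / old_seconds;
}

/* Instructions per cycle: instructions divided by elapsed cycles, where
   elapsed cycles = seconds * clock_hz. The clock rate is an assumption
   supplied by the caller (600e6 for the nominal 600 MHz quoted above). */
double instructions_per_cycle(double instructions, double seconds,
                              double clock_hz)
{
    assert(seconds > 0.0 && clock_hz > 0.0);
    return instructions / (seconds * clock_hz);
}
```

For the 8192-byte row, time_reduction_percent(0.1383, 0.1134) comes to about 18%, and instructions_per_cycle(105321670, 0.1383, 600e6) comes to about 1.269, matching the value listed for zlib 1.1.5.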




