[Zlib-devel] infnew-5 available for testing

Chris Anderson christop at fellspt.charm.net
Thu Jan 2 16:24:01 EST 2003


Here are the runs that were done without writing output to /dev/null.
As best I can tell, this saved about 0.5 seconds at the smaller buffer
sizes and less than that at the larger sizes.
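
The timing harness itself is not shown in this message, and the clock and
time columns are not defined. A rough sketch of such a harness follows,
assuming that clock is process CPU time, that time is elapsed wall-clock
time, and that zbuflen caps the output produced per inflate call. The
function name inflate_timed and the test input are hypothetical, and the
output is simply counted and dropped rather than written anywhere.

```python
import time
import zlib


def inflate_timed(gz_data, zbuflen):
    """Inflate gzip data, at most zbuflen output bytes per call, discarding the output.

    Returns (total_output_bytes, cpu_seconds, wall_seconds).
    """
    # wbits=31 selects a 32K window plus gzip header/trailer handling.
    d = zlib.decompressobj(wbits=31)
    total = 0
    cpu0, wall0 = time.process_time(), time.perf_counter()
    pending = gz_data
    while pending and not d.eof:
        chunk = d.decompress(pending, zbuflen)  # second argument caps the output size
        total += len(chunk)                     # count the bytes, then drop them
        pending = d.unconsumed_tail
    total += len(d.flush())                     # anything still buffered inside zlib
    return total, time.process_time() - cpu0, time.perf_counter() - wall0


if __name__ == "__main__":
    # Hypothetical test input; the data set used for the runs is not given in the post.
    raw = bytes(range(256)) * 4096
    c = zlib.compressobj(wbits=31)
    gz = c.compress(raw) + c.flush()
    for zbuflen in (1024, 4096, 16384, 65536):
        n, cpu, wall = inflate_timed(gz, zbuflen)
        print(f"zbuflen {zbuflen:6d}, clock {cpu:.3f}, time {wall:.3f}")
```

In this sketch zlib checks the gzip trailer CRC internally, whereas the
application profiled in this message computes it in its own CRC32::calc.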

I also found the crc32 code I was using to be slower than the BYFOUR
crc_table[4][256] code posted in the zlib-dev-list archives.  When I coded
my gz stream (the original HTTP gzip content-encoding problem I was trying
to solve), I didn't realize the crc32 took so much of the deflate time
(Doh!), and used the one I already had instead of the one in zlib.  The
downside of using the BYFOUR tables seems to be some CPU cache thrashing
when the buffer gets big (zbuflen = 64K).  There are some gprof profiles
of my application below.
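
For readers who have not seen the two approaches side by side, here is a
minimal sketch of a byte-at-a-time table CRC-32 next to a four-tables,
four-bytes-per-step variant in the spirit of BYFOUR. It is an illustration
only, not the code from the archives or from my application. The names
make_tables, crc32_bytewise and crc32_by_four are made up for the example,
and the word is assembled little-endian.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial used by gzip


def make_tables():
    """Build four 256-entry tables; table 0 alone is the usual byte-wise table."""
    t = [[0] * 256 for _ in range(4)]
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        t[0][n] = c
    for n in range(256):
        c = t[0][n]
        for k in range(1, 4):
            # table k holds the CRC of byte n followed by k zero bytes
            c = t[0][c & 0xFF] ^ (c >> 8)
            t[k][n] = c
    return t


TABLES = make_tables()


def crc32_bytewise(data, crc=0):
    """One table lookup per input byte."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = TABLES[0][(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32_by_four(data, crc=0):
    """Four input bytes per step, one lookup in each of the four tables."""
    crc ^= 0xFFFFFFFF
    n4 = len(data) - len(data) % 4
    for i in range(0, n4, 4):
        crc ^= int.from_bytes(data[i:i + 4], "little")
        crc = (TABLES[3][crc & 0xFF] ^ TABLES[2][(crc >> 8) & 0xFF]
               ^ TABLES[1][(crc >> 16) & 0xFF] ^ TABLES[0][crc >> 24])
    for b in data[n4:]:  # leftover 0 to 3 bytes
        crc = TABLES[0][(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc32_bytewise(msg)), hex(crc32_by_four(msg)), hex(zlib.crc32(msg)))
```

With 32-bit entries the four tables take 4 KB instead of 1 KB, which may
be part of the cache pressure mentioned above.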

Another observation is that the infnew code is faster when the profiler
feedback optimization (-prof_use) is thrown at it.  The original
zlib-1.1.4 doesn't get as much speedup with this optimization, but is
pretty fast without it anyway.


zlib-1.1.4 + infnew-5
---------------------

zlib-1.1.4 + infnew-5, icc -O3 -tpp6
zbuflen   1024, clock 13.550, time 13.720
zbuflen   1024, clock 13.540, time 13.703
zbuflen   2048, clock 12.990, time 13.148
zbuflen   2048, clock 13.000, time 13.156
zbuflen   4096, clock 12.670, time 12.834
zbuflen   4096, clock 12.680, time 12.829
zbuflen   8192, clock 12.470, time 12.626
zbuflen   8192, clock 12.470, time 12.625
zbuflen  16384, clock 12.180, time 12.375
zbuflen  16384, clock 12.270, time 12.460
zbuflen  32768, clock 12.590, time 12.787
zbuflen  32768, clock 12.540, time 12.751
zbuflen  65536, clock 13.630, time 13.842
zbuflen  65536, clock 13.680, time 13.901

zlib-1.1.4 + infnew-5, icc -O3 -tpp6 -prof_use
zbuflen   1024, clock 12.060, time 12.219
zbuflen   1024, clock 12.090, time 12.242
zbuflen   2048, clock 11.640, time 11.783
zbuflen   2048, clock 11.650, time 11.796
zbuflen   4096, clock 11.430, time 11.586
zbuflen   4096, clock 11.440, time 11.587
zbuflen   8192, clock 11.320, time 11.485
zbuflen   8192, clock 11.320, time 11.465
zbuflen  16384, clock 11.320, time 11.515
zbuflen  16384, clock 11.230, time 11.419
zbuflen  32768, clock 11.260, time 11.459
zbuflen  32768, clock 11.180, time 11.365
zbuflen  65536, clock 12.160, time 12.359
zbuflen  65536, clock 12.150, time 12.362


zlib-1.1.4 + infnew-3
---------------------

zlib-1.1.4 + infnew-3, icc -O3 -tpp6
zbuflen   1024, clock 13.090, time 13.249
zbuflen   1024, clock 13.090, time 13.243
zbuflen   2048, clock 12.600, time 12.767
zbuflen   2048, clock 12.590, time 12.761
zbuflen   4096, clock 12.350, time 12.502
zbuflen   4096, clock 12.340, time 12.502
zbuflen   8192, clock 12.160, time 12.318
zbuflen   8192, clock 12.160, time 12.329
zbuflen  16384, clock 11.990, time 12.179
zbuflen  16384, clock 11.950, time 12.140
zbuflen  32768, clock 12.290, time 12.490
zbuflen  32768, clock 12.180, time 12.380
zbuflen  65536, clock 13.310, time 13.534
zbuflen  65536, clock 13.330, time 13.553

zlib-1.1.4 + infnew-3, icc -O3 -tpp6 -prof_use
zbuflen   1024, clock 12.260, time 12.434
zbuflen   1024, clock 12.280, time 12.429
zbuflen   2048, clock 11.790, time 11.950
zbuflen   2048, clock 11.810, time 11.966
zbuflen   4096, clock 11.530, time 11.693
zbuflen   4096, clock 11.550, time 11.686
zbuflen   8192, clock 11.360, time 11.527
zbuflen   8192, clock 11.380, time 11.522
zbuflen  16384, clock 11.170, time 11.366
zbuflen  16384, clock 11.240, time 11.404
zbuflen  32768, clock 11.310, time 11.498
zbuflen  32768, clock 11.410, time 11.588
zbuflen  65536, clock 12.270, time 12.484
zbuflen  65536, clock 12.230, time 12.423


zlib-1.1.4
----------

zlib-1.1.4, icc -O3 -tpp6
zbuflen   1024, clock 11.880, time 12.035
zbuflen   1024, clock 11.880, time 12.028
zbuflen   2048, clock 11.660, time 11.817
zbuflen   2048, clock 11.660, time 11.803
zbuflen   4096, clock 11.620, time 11.773
zbuflen   4096, clock 11.590, time 11.741
zbuflen   8192, clock 11.630, time 11.786
zbuflen   8192, clock 11.680, time 11.818
zbuflen  16384, clock 11.930, time 12.146
zbuflen  16384, clock 12.160, time 12.343
zbuflen  32768, clock 12.630, time 12.842
zbuflen  32768, clock 12.390, time 12.580
zbuflen  65536, clock 14.080, time 14.329
zbuflen  65536, clock 14.030, time 14.256

zlib-1.1.4, icc -O3 -tpp6 -prof_use
zbuflen   1024, clock 11.680, time 11.831
zbuflen   1024, clock 11.690, time 11.830
zbuflen   2048, clock 11.470, time 11.633
zbuflen   2048, clock 11.470, time 11.624
zbuflen   4096, clock 11.390, time 11.546
zbuflen   4096, clock 11.390, time 11.547
zbuflen   8192, clock 11.410, time 11.560
zbuflen   8192, clock 11.420, time 11.575
zbuflen  16384, clock 11.550, time 11.738
zbuflen  16384, clock 11.550, time 11.728
zbuflen  32768, clock 12.400, time 12.621
zbuflen  32768, clock 12.540, time 12.745
zbuflen  65536, clock 13.870, time 14.101
zbuflen  65536, clock 13.960, time 14.186
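
The -prof_use observation can be checked against the clock figures above.
The sketch below averages the two zbuflen 1024 runs of each build and
prints how much clock time -prof_use saves. The helper names mean and
percent_saved are made up for the example; the numbers are copied from
the tables.

```python
def mean(xs):
    return sum(xs) / len(xs)


def percent_saved(base, tuned):
    """Percentage reduction in mean clock time going from base to tuned."""
    return 100.0 * (mean(base) - mean(tuned)) / mean(base)


# clock values from the zbuflen 1024 rows: (without -prof_use, with -prof_use)
runs = {
    "zlib-1.1.4 + infnew-5": ([13.550, 13.540], [12.060, 12.090]),
    "zlib-1.1.4 + infnew-3": ([13.090, 13.090], [12.260, 12.280]),
    "zlib-1.1.4": ([11.880, 11.880], [11.680, 11.690]),
}

for name, (plain, prof) in runs.items():
    print(f"{name:22s} {percent_saved(plain, prof):5.1f}% less clock time with -prof_use")
```

At this buffer size that works out to roughly 11% for infnew-5, about 6%
for infnew-3, and under 2% for the original zlib-1.1.4.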


In the profiles below, I believe the updatewindow is icc's statistics
gathering function.

zlib-1.1.4 + infnew-5 profiles
------------------------------

zlib-1.1.4 + infnew-5, icc -qp -O3 -tpp6, gprof zbuflen 1024
%    cumulative   self              self    total
time   seconds   seconds   calls  us/call  us/call  name
66.29     8.37     8.37   344785    24.27    24.27  inflate_fast
14.54    10.20     1.84   246099     7.46     7.46  CRC32::calc
 3.61    12.14     0.46    51711     8.80     8.80  inflate_table
 3.08    12.53     0.39   246099     1.58     1.58  updatewindow
 0.25    12.56     0.03   246231     0.13     0.21  InputStream::readBytes
 0.22    12.58     0.03   246100     0.11    51.01  GZInputStream::uncompress
 0.11    12.61     0.01   140294     0.10     0.15  StdioInputStream::fillBuf

zlib-1.1.4 + infnew-5, icc -qp -O3 -tpp6, gprof zbuflen 4096
%    cumulative   self              self    total
time   seconds   seconds   calls  us/call  us/call  name
73.16     8.76     8.76   102086    85.85    85.85  inflate_fast
15.39    10.61     1.84    61549    29.96    29.96  CRC32::calc
 3.46    11.54     0.41    61549     6.73     6.73  updatewindow
 3.31    11.94     0.40    51711     7.67     7.67  inflate_table
 0.11    11.95     0.01    61560     0.22     0.35  InputStream::readBytes
 0.08    11.96     0.01    61550     0.16   194.11  GZInputStream::uncompress
 0.07    11.98     0.01    35075     0.22     0.22  StdioInputStream::fillBuf

zlib-1.1.4 + infnew-5, icc -qp -O3 -tpp6, gprof zbuflen 16384
%    cumulative   self              self    total
time   seconds   seconds   calls  ms/call  ms/call  name
74.03     8.94     8.94    38633     0.23     0.23  inflate_fast
15.07    10.76     1.82    15389     0.12     0.12  CRC32::calc
 3.62    11.72     0.44    51711     0.01     0.01  inflate_table
 2.90    12.07     0.35    15389     0.02     0.67  inflate
 0.05    12.07     0.01    15392     0.00     0.00  InputStream::readBytes
 0.02    12.08     0.00    15390     0.00     0.78  GZInputStream::uncompress
 0.00    12.08     0.00     6622     0.00     1.82  ZInputStream::fillBuf

zlib-1.1.4 + infnew-5, icc -qp -O3 -tpp6, gprof zbuflen 65536
%    cumulative   self              self    total
time   seconds   seconds   calls  ms/call  ms/call  name
73.16     9.51     9.51    22635     0.42     0.42  inflate_fast
18.87    11.96     2.45     3848     0.64     0.64  CRC32::calc
 2.63    12.74     0.34     3848     0.09     2.74  inflate
 1.82    12.98     0.24     3848     0.06     0.06  updatewindow
 0.08    12.99     0.01     3849     0.00     3.37  GZInputStream::uncompress
 0.03    13.00     0.00     1657     0.00     7.84  ZInputStream::fillBuf
 0.00    13.00     0.00     3851     0.00     0.00  InputStream::readBytes
 0.00    13.00     0.00     2194     0.00     0.00  StdioInputStream::fillBuf
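
The cache remark about large buffers can be put in per-byte terms using
the CRC32::calc rows of the four profiles above. This is a sketch that
assumes every CRC32::calc call checksums one full buffer of zbuflen bytes;
the call counts are consistent with that, since calls times zbuflen comes
to about 252 MB in all four runs. The helper name ns_per_byte is made up
for the example.

```python
# (zbuflen, CRC32::calc calls, CRC32::calc self seconds) from the infnew-5 profiles
rows = [
    (1024, 246099, 1.84),
    (4096, 61549, 1.84),
    (16384, 15389, 1.82),
    (65536, 3848, 2.45),
]


def ns_per_byte(zbuflen, calls, self_seconds):
    # Assumption: every call checksums one full buffer of zbuflen bytes.
    return self_seconds / (calls * zbuflen) * 1e9


for zbuflen, calls, secs in rows:
    print(f"zbuflen {zbuflen:6d}: {calls * zbuflen / 1e6:6.1f} MB, "
          f"{ns_per_byte(zbuflen, calls, secs):.2f} ns/byte")
```

Under that assumption the CRC costs about 7.2 to 7.3 ns per byte up to
16K buffers and about 9.7 ns per byte at 64K, roughly a third more.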


zlib-1.1.4 profiles
-------------------

zlib-1.1.4, icc -qp -O3 -tpp6, gprof zbuflen 1024
%    cumulative   self              self    total
time   seconds   seconds   calls  us/call  us/call  name
72.84     8.59     8.59   182758    46.98    46.98  inflate_fast
15.05    10.36     1.77   141587    12.53    12.53  CRC32::calc
 3.86    10.81     0.46    51711     8.80     8.80  huft_build
 3.46    11.22     0.41   156853     2.60    58.25  inflate_codes
 2.19    11.48     0.26   318856     0.81     0.81  inflate_flush
 1.54    11.66     0.18   141587     1.28    69.92  inflate_blocks
 0.28    11.70     0.03   246231     0.13     0.22  InputStream::readBytes
 0.27    11.73     0.03   105937     0.29   110.73  ZInputStream::fillBuf
 0.20    11.75     0.02   141588     0.17    82.63  GZInputStream::uncompress
 0.10    11.76     0.01   140294     0.08     0.15  StdioInputStream::fillBuf
 0.08    11.77     0.01   140294     0.07     0.07  SysFile::read

zlib-1.1.4, icc -qp -O3 -tpp6, gprof zbuflen 4096
%    cumulative   self              self    total
time   seconds   seconds   calls  us/call  us/call  name
74.92     8.68     8.68    68849   126.13   126.13  inflate_fast
14.14    10.32     1.64    35515    46.14    46.14  CRC32::calc
 4.25    10.81     0.49    51711     9.52     9.52  huft_build
 2.60    11.12     0.30   106968     2.81     2.81  inflate_flush
 2.28    11.38     0.26    52276     5.04   175.00  inflate_codes
 1.52    11.55     0.18    35515     4.95   279.59  inflate_blocks
 0.10    11.57     0.01    26485     0.44   437.23  ZInputStream::fillBuf
 0.07    11.57     0.01    68952     0.11     0.14  zFreeFunc
 0.07    11.58     0.01    61560     0.13     0.16  InputStream::readBytes

zlib-1.1.4, icc -qp -O3 -tpp6, gprof zbuflen 16384
%    cumulative   self              self    total
time   seconds   seconds   calls  ms/call  ms/call  name
72.70     8.60     8.60    41198     0.21     0.21  inflate_fast
15.60    10.45     1.85    11026     0.17     0.17  CRC32::calc
 4.29    10.95     0.51    51711     0.01     0.01  huft_build
 3.63    11.38     0.43    64661     0.01     0.01  inflate_flush
 1.83    11.60     0.22    28162     0.01     0.33  inflate_codes
 1.63    11.79     0.19    11026     0.02     0.90  inflate_blocks
 0.07    11.80     0.01    15392     0.00     0.00  InputStream::readBytes
 0.05    11.81     0.01    68952     0.00     0.00  zAllocFunc
 0.05    11.81     0.01    11027     0.00     1.07  GZInputStream::uncompress

zlib-1.1.4, icc -qp -O3 -tpp6, gprof zbuflen 65536
%    cumulative   self              self    total
time   seconds   seconds   calls  ms/call  ms/call  name
64.38     9.10     9.10    35534     0.26     0.26  inflate_fast
20.39    11.98     2.88     3597     0.80     0.80  CRC32::calc
 8.01    13.12     1.13    47732     0.02     0.02  inflate_flush
 3.77    13.65     0.53    51711     0.01     0.01  huft_build
 1.60    13.88     0.23    20812     0.01     0.50  inflate_codes
 1.42    14.08     0.20     3597     0.06     3.12  inflate_blocks
 0.17    14.10     0.02     3598     0.01     3.92  GZInputStream::uncompress
 0.07    14.11     0.01     3851     0.00     0.00  InputStream::readBytes
 0.03    14.12     0.00    68952     0.00     0.00  zFreeFunc




