[Zlib-devel] infnew-5 available for testing
Chris Anderson
christop at fellspt.charm.net
Thu Jan 2 16:24:01 EST 2003
Here are the runs that were done without writing the output to /dev/null.
As best I can tell, skipping the write saved about 0.5 secs at the smaller
buffer sizes and less than that at the larger sizes.
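For anyone who wants to try a comparable run, here is a rough Python sketch of a timing loop that inflates a gzip stream with a fixed buffer size and discards the output. It is not the harness used for the numbers in this post. It assumes that "clock" means CPU time and "time" means wall-clock time, and that zbuflen bounds both the input and the output buffer; the function name timed_inflate is made up for illustration.

```python
import gzip
import time
import zlib


def timed_inflate(gz_bytes, zbuflen):
    """Inflate a gzip stream zbuflen bytes at a time, discarding the output.

    Returns (output_bytes, cpu_seconds, wall_seconds).
    """
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: gzip framing
    total = 0
    pos = 0
    cpu0 = time.process_time()   # assumed equivalent of the "clock" column
    wall0 = time.perf_counter()  # assumed equivalent of the "time" column
    while not d.eof:
        # Finish input that zlib could not consume before reading more.
        data = d.unconsumed_tail
        if not data:
            data = gz_bytes[pos:pos + zbuflen]
            pos += len(data)
        out = d.decompress(data, zbuflen)  # at most zbuflen bytes of output
        total += len(out)                  # output is counted, never written
        if not data and not out:
            break  # input ran out before the end of the stream
    return total, time.process_time() - cpu0, time.perf_counter() - wall0


if __name__ == "__main__":
    sample = gzip.compress(bytes(range(256)) * 4096)
    for zbuflen in (1024, 4096, 16384, 65536):
        n, cpu, wall = timed_inflate(sample, zbuflen)
        print(f"zbuflen {zbuflen}, clock {cpu:.3f}, time {wall:.3f}")
```

Absolute numbers from an interpreted loop like this will not be comparable to the C runs; only the shape of the loop is the point.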
I also found that the crc32 code I was using is slower than the BYFOUR
crc_table[4][256] version posted in the zlib-dev-list archives. When I
coded my gz stream (the original http gzip content-encoding problem I was
trying to solve), I didn't realize that crc32 took so much of the inflate
time (Doh!), so I used the one I already had instead of the one in zlib.
The downside of the BYFOUR tables seems to be some CPU cache thrashing
when the buffer gets big (zbuflen = 64K). There are some gprofs of my
application below.
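To make the BYFOUR idea concrete, here is a small Python sketch of a four-table CRC-32 that consumes four bytes per step, next to the classic one-table byte-wise loop. It is my reading of the technique, not a copy of the posted code, and the names make_tables, crc32_bytewise and crc32_byfour are made up for illustration. In C the four tables take 4 x 256 x 4 = 4096 bytes instead of 1024, which is the extra cache footprint mentioned above.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial used by gzip


def make_tables():
    """Build four 256-entry tables; tables[0] is the usual byte-wise table."""
    tables = [[0] * 256 for _ in range(4)]
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        tables[0][n] = c
    # tables[k][n] is the CRC state after byte n followed by k zero bytes.
    for n in range(256):
        c = tables[0][n]
        for k in range(1, 4):
            c = tables[0][c & 0xFF] ^ (c >> 8)
            tables[k][n] = c
    return tables


TABLES = make_tables()


def crc32_bytewise(data, crc=0):
    """One table lookup per input byte."""
    c = crc ^ 0xFFFFFFFF
    for b in data:
        c = TABLES[0][(c ^ b) & 0xFF] ^ (c >> 8)
    return c ^ 0xFFFFFFFF


def crc32_byfour(data, crc=0):
    """Four table lookups per 32-bit little-endian word of input."""
    c = crc ^ 0xFFFFFFFF
    i, n = 0, len(data)
    while n - i >= 4:
        c ^= int.from_bytes(data[i:i + 4], "little")
        c = (TABLES[3][c & 0xFF] ^ TABLES[2][(c >> 8) & 0xFF]
             ^ TABLES[1][(c >> 16) & 0xFF] ^ TABLES[0][c >> 24])
        i += 4
    while i < n:  # trailing 1 to 3 bytes
        c = TABLES[0][(c ^ data[i]) & 0xFF] ^ (c >> 8)
        i += 1
    return c ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc32_byfour(msg)), hex(zlib.crc32(msg)))
```

Both functions should agree with zlib.crc32 on any input; the speed difference only shows up in compiled code, where the four lookups per word are independent of each other.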
Another observation is that the infnew code gets noticeably faster when
the profiler feedback optimization (-prof_use) is thrown at it. The
original zlib-1.1.4 doesn't get as much speedup from this optimization,
but it is pretty fast without it anyway.
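As a quick check on the size of that effect, the snippet below works out the percentage gain from -prof_use using the clock values of the zbuflen 8192 runs in this post. Picking 8192 as the sample point is arbitrary, and the helper names are made up for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)


def pct_faster(before, after):
    """Percent reduction in mean time going from `before` to `after`."""
    b, a = mean(before), mean(after)
    return 100.0 * (b - a) / b


# clock seconds at zbuflen 8192, without and with -prof_use
infnew5 = pct_faster([12.470, 12.470], [11.320, 11.320])
zlib114 = pct_faster([11.630, 11.680], [11.410, 11.420])

print(f"infnew-5:   {infnew5:.1f}% faster with -prof_use")
print(f"zlib-1.1.4: {zlib114:.1f}% faster with -prof_use")
```

At this buffer size that comes to roughly 9% for infnew-5 against roughly 2% for stock zlib-1.1.4.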
zlib-1.1.4 + infnew-5
---------------------
zlib-1.1.4 + infnew-5, icc -O3 -tpp6
zbuflen 1024, clock 13.550, time 13.720
zbuflen 1024, clock 13.540, time 13.703
zbuflen 2048, clock 12.990, time 13.148
zbuflen 2048, clock 13.000, time 13.156
zbuflen 4096, clock 12.670, time 12.834
zbuflen 4096, clock 12.680, time 12.829
zbuflen 8192, clock 12.470, time 12.626
zbuflen 8192, clock 12.470, time 12.625
zbuflen 16384, clock 12.180, time 12.375
zbuflen 16384, clock 12.270, time 12.460
zbuflen 32768, clock 12.590, time 12.787
zbuflen 32768, clock 12.540, time 12.751
zbuflen 65536, clock 13.630, time 13.842
zbuflen 65536, clock 13.680, time 13.901
zlib-1.1.4 + infnew-5, icc -O3 -tpp6 -prof_use
zbuflen 1024, clock 12.060, time 12.219
zbuflen 1024, clock 12.090, time 12.242
zbuflen 2048, clock 11.640, time 11.783
zbuflen 2048, clock 11.650, time 11.796
zbuflen 4096, clock 11.430, time 11.586
zbuflen 4096, clock 11.440, time 11.587
zbuflen 8192, clock 11.320, time 11.485
zbuflen 8192, clock 11.320, time 11.465
zbuflen 16384, clock 11.320, time 11.515
zbuflen 16384, clock 11.230, time 11.419
zbuflen 32768, clock 11.260, time 11.459
zbuflen 32768, clock 11.180, time 11.365
zbuflen 65536, clock 12.160, time 12.359
zbuflen 65536, clock 12.150, time 12.362
zlib-1.1.4 + infnew-3
---------------------
zlib-1.1.4 + infnew-3, icc -O3 -tpp6
zbuflen 1024, clock 13.090, time 13.249
zbuflen 1024, clock 13.090, time 13.243
zbuflen 2048, clock 12.600, time 12.767
zbuflen 2048, clock 12.590, time 12.761
zbuflen 4096, clock 12.350, time 12.502
zbuflen 4096, clock 12.340, time 12.502
zbuflen 8192, clock 12.160, time 12.318
zbuflen 8192, clock 12.160, time 12.329
zbuflen 16384, clock 11.990, time 12.179
zbuflen 16384, clock 11.950, time 12.140
zbuflen 32768, clock 12.290, time 12.490
zbuflen 32768, clock 12.180, time 12.380
zbuflen 65536, clock 13.310, time 13.534
zbuflen 65536, clock 13.330, time 13.553
zlib-1.1.4 + infnew-3, icc -O3 -tpp6 -prof_use
zbuflen 1024, clock 12.260, time 12.434
zbuflen 1024, clock 12.280, time 12.429
zbuflen 2048, clock 11.790, time 11.950
zbuflen 2048, clock 11.810, time 11.966
zbuflen 4096, clock 11.530, time 11.693
zbuflen 4096, clock 11.550, time 11.686
zbuflen 8192, clock 11.360, time 11.527
zbuflen 8192, clock 11.380, time 11.522
zbuflen 16384, clock 11.170, time 11.366
zbuflen 16384, clock 11.240, time 11.404
zbuflen 32768, clock 11.310, time 11.498
zbuflen 32768, clock 11.410, time 11.588
zbuflen 65536, clock 12.270, time 12.484
zbuflen 65536, clock 12.230, time 12.423
zlib-1.1.4
----------
zlib-1.1.4, icc -O3 -tpp6
zbuflen 1024, clock 11.880, time 12.035
zbuflen 1024, clock 11.880, time 12.028
zbuflen 2048, clock 11.660, time 11.817
zbuflen 2048, clock 11.660, time 11.803
zbuflen 4096, clock 11.620, time 11.773
zbuflen 4096, clock 11.590, time 11.741
zbuflen 8192, clock 11.630, time 11.786
zbuflen 8192, clock 11.680, time 11.818
zbuflen 16384, clock 11.930, time 12.146
zbuflen 16384, clock 12.160, time 12.343
zbuflen 32768, clock 12.630, time 12.842
zbuflen 32768, clock 12.390, time 12.580
zbuflen 65536, clock 14.080, time 14.329
zbuflen 65536, clock 14.030, time 14.256
zlib-1.1.4, icc -O3 -tpp6 -prof_use
zbuflen 1024, clock 11.680, time 11.831
zbuflen 1024, clock 11.690, time 11.830
zbuflen 2048, clock 11.470, time 11.633
zbuflen 2048, clock 11.470, time 11.624
zbuflen 4096, clock 11.390, time 11.546
zbuflen 4096, clock 11.390, time 11.547
zbuflen 8192, clock 11.410, time 11.560
zbuflen 8192, clock 11.420, time 11.575
zbuflen 16384, clock 11.550, time 11.738
zbuflen 16384, clock 11.550, time 11.728
zbuflen 32768, clock 12.400, time 12.621
zbuflen 32768, clock 12.540, time 12.745
zbuflen 65536, clock 13.870, time 14.101
zbuflen 65536, clock 13.960, time 14.186
In the profiles below, I believe updatewindow is icc's statistics
gathering function.
zlib-1.1.4 + infnew-5 profiles
------------------------------
zlib-1.1.4 + infnew-5, icc -qp -O3 -tpp6, gprof zbuflen 1024
%time cumulative-secs self-secs calls self-us/call total-us/call name
66.29 8.37 8.37 344785 24.27 24.27 inflate_fast
14.54 10.20 1.84 246099 7.46 7.46 CRC32::calc
3.61 12.14 0.46 51711 8.80 8.80 inflate_table
3.08 12.53 0.39 246099 1.58 1.58 updatewindow
0.25 12.56 0.03 246231 0.13 0.21 InputStream::readBytes
0.22 12.58 0.03 246100 0.11 51.01 GZInputStream::uncompress
0.11 12.61 0.01 140294 0.10 0.15 StdioInputStream::fillBuf
zlib-1.1.4 + infnew-5, icc -qp -O3 -tpp6, gprof zbuflen 4096
%time cumulative-secs self-secs calls self-us/call total-us/call name
73.16 8.76 8.76 102086 85.85 85.85 inflate_fast
15.39 10.61 1.84 61549 29.96 29.96 CRC32::calc
3.46 11.54 0.41 61549 6.73 6.73 updatewindow
3.31 11.94 0.40 51711 7.67 7.67 inflate_table
0.11 11.95 0.01 61560 0.22 0.35 InputStream::readBytes
0.08 11.96 0.01 61550 0.16 194.11 GZInputStream::uncompress
0.07 11.98 0.01 35075 0.22 0.22 StdioInputStream::fillBuf
zlib-1.1.4 + infnew-5, icc -qp -O3 -tpp6, gprof zbuflen 16384
%time cumulative-secs self-secs calls self-ms/call total-ms/call name
74.03 8.94 8.94 38633 0.23 0.23 inflate_fast
15.07 10.76 1.82 15389 0.12 0.12 CRC32::calc
3.62 11.72 0.44 51711 0.01 0.01 inflate_table
2.90 12.07 0.35 15389 0.02 0.67 inflate
0.05 12.07 0.01 15392 0.00 0.00 InputStream::readBytes
0.02 12.08 0.00 15390 0.00 0.78 GZInputStream::uncompress
0.00 12.08 0.00 6622 0.00 1.82 ZInputStream::fillBuf
zlib-1.1.4 + infnew-5, icc -qp -O3 -tpp6, gprof zbuflen 65536
%time cumulative-secs self-secs calls self-ms/call total-ms/call name
73.16 9.51 9.51 22635 0.42 0.42 inflate_fast
18.87 11.96 2.45 3848 0.64 0.64 CRC32::calc
2.63 12.74 0.34 3848 0.09 2.74 inflate
1.82 12.98 0.24 3848 0.06 0.06 updatewindow
0.08 12.99 0.01 3849 0.00 3.37 GZInputStream::uncompress
0.03 13.00 0.00 1657 0.00 7.84 ZInputStream::fillBuf
0.00 13.00 0.00 3851 0.00 0.00 InputStream::readBytes
0.00 13.00 0.00 2194 0.00 0.00 StdioInputStream::fillBuf
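One way to look at the 64K cache effect in the four infnew-5 profiles above is to turn the CRC32::calc rows into a cost per byte. The sketch below does that arithmetic. It assumes every CRC32::calc call checksums one full buffer of zbuflen bytes, which the profiles do not actually state, so treat the result as a rough indication only.

```python
# (zbuflen, CRC32::calc self seconds, calls) from the infnew-5 gprof tables
rows = [
    (1024, 1.84, 246099),
    (4096, 1.84, 61549),
    (16384, 1.82, 15389),
    (65536, 2.45, 3848),
]


def ns_per_byte(zbuflen, self_seconds, calls):
    # Assumption: each call processes exactly zbuflen bytes.
    return self_seconds / (calls * zbuflen) * 1e9


crc_cost = {z: ns_per_byte(z, s, c) for z, s, c in rows}
for z, ns in crc_cost.items():
    print(f"zbuflen {z:6d}: {ns:.1f} ns/byte")
```

Under that assumption the cost stays a little above 7 ns/byte up to 16K and climbs to nearly 10 ns/byte at 64K.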
zlib-1.1.4 profiles
-------------------
zlib-1.1.4, icc -qp -O3 -tpp6, gprof zbuflen 1024
%time cumulative-secs self-secs calls self-us/call total-us/call name
72.84 8.59 8.59 182758 46.98 46.98 inflate_fast
15.05 10.36 1.77 141587 12.53 12.53 CRC32::calc
3.86 10.81 0.46 51711 8.80 8.80 huft_build
3.46 11.22 0.41 156853 2.60 58.25 inflate_codes
2.19 11.48 0.26 318856 0.81 0.81 inflate_flush
1.54 11.66 0.18 141587 1.28 69.92 inflate_blocks
0.28 11.70 0.03 246231 0.13 0.22 InputStream::readBytes
0.27 11.73 0.03 105937 0.29 110.73 ZInputStream::fillBuf
0.20 11.75 0.02 141588 0.17 82.63 GZInputStream::uncompress
0.10 11.76 0.01 140294 0.08 0.15 StdioInputStream::fillBuf
0.08 11.77 0.01 140294 0.07 0.07 SysFile::read
zlib-1.1.4, icc -qp -O3 -tpp6, gprof zbuflen 4096
%time cumulative-secs self-secs calls self-us/call total-us/call name
74.92 8.68 8.68 68849 126.13 126.13 inflate_fast
14.14 10.32 1.64 35515 46.14 46.14 CRC32::calc
4.25 10.81 0.49 51711 9.52 9.52 huft_build
2.60 11.12 0.30 106968 2.81 2.81 inflate_flush
2.28 11.38 0.26 52276 5.04 175.00 inflate_codes
1.52 11.55 0.18 35515 4.95 279.59 inflate_blocks
0.10 11.57 0.01 26485 0.44 437.23 ZInputStream::fillBuf
0.07 11.57 0.01 68952 0.11 0.14 zFreeFunc
0.07 11.58 0.01 61560 0.13 0.16 InputStream::readBytes
zlib-1.1.4, icc -qp -O3 -tpp6, gprof zbuflen 16384
%time cumulative-secs self-secs calls self-ms/call total-ms/call name
72.70 8.60 8.60 41198 0.21 0.21 inflate_fast
15.60 10.45 1.85 11026 0.17 0.17 CRC32::calc
4.29 10.95 0.51 51711 0.01 0.01 huft_build
3.63 11.38 0.43 64661 0.01 0.01 inflate_flush
1.83 11.60 0.22 28162 0.01 0.33 inflate_codes
1.63 11.79 0.19 11026 0.02 0.90 inflate_blocks
0.07 11.80 0.01 15392 0.00 0.00 InputStream::readBytes
0.05 11.81 0.01 68952 0.00 0.00 zAllocFunc
0.05 11.81 0.01 11027 0.00 1.07 GZInputStream::uncompress
zlib-1.1.4, icc -qp -O3 -tpp6, gprof zbuflen 65536
%time cumulative-secs self-secs calls self-ms/call total-ms/call name
64.38 9.10 9.10 35534 0.26 0.26 inflate_fast
20.39 11.98 2.88 3597 0.80 0.80 CRC32::calc
8.01 13.12 1.13 47732 0.02 0.02 inflate_flush
3.77 13.65 0.53 51711 0.01 0.01 huft_build
1.60 13.88 0.23 20812 0.01 0.50 inflate_codes
1.42 14.08 0.20 3597 0.06 3.12 inflate_blocks
0.17 14.10 0.02 3598 0.01 3.92 GZInputStream::uncompress
0.07 14.11 0.01 3851 0.00 0.00 InputStream::readBytes
0.03 14.12 0.00 68952 0.00 0.00 zFreeFunc