[Zlib-devel] trees.c patch proposal to reduce block header size

Frédéric Kayser f.kayser at free.fr
Fri Nov 18 07:16:54 EST 2011


Hello,
I've started to look if some of the optimizations done by Deflopt and Defluff could be back ported to zlib.
This usually only saves a few bits in each dynamic block header and eventually a couple of bytes on the entire stream.

My first attempt is useful when using Z_HUFFMAN_ONLY or Z_RLE strategies.
Here is a comment found in trees.c:
    /* The pkzip format requires that at least one distance code exists,
     * and that at least one bit should be sent even if there is only one
     * possible code. So to avoid special checks later on we force at least
     * two codes of non zero frequency.
     */
This is indeed suboptimal, my patch produces single code and even a dummy (zero length) single code.

*** Z_HUFFMAN_ONLY like files (no length/distance pairs) ***
Zlib produces this type of headers:
 [5] HLIT  257 (raw:0)
 [5] HDIST   2 (raw:1)
 [4] HCLEN  18 (raw:14)
...
[3] d_01 CL (val: 1) (distance 1)
[3] d_02 CL (val: 1) (distance 2)

HLIT is equal to 257 this means that there's actually no length recorded in the header (only 256 literal plus the End of Block).
In this case the two distance codes (HDIST = 2) are totally useless (the deflate stream cannot contain length/distance pairs since there are no length at all), but as stated in the comment at least one distance code has to be recorded, my patch produces this:
 [5] HLIT  257 (raw:0)
 [5] HDIST   1 (raw:0)
 [4] HCLEN  10 (raw:6)
...
[5] d_01 CL (val: 0) (distance 1)
This time HDIST = 1 (the minimum) thus only one distance code is stored and since it's never used it size is set to zero (in this particular sample it also has the side effect of reducing HCLEN).
Attached files: demo0.txt.gz (produced by minigzip) and demo0_kay_patch1.gz (produced by a patched minigzip).

*** Z_RLE like files (distance=1 all the time) ***
Zlib produces this type of headers:
 [5] HLIT  267 (raw:10)
 [5] HDIST   2 (raw:1)
 [4] HCLEN  18 (raw:14)
...
[6] ZREP (9 times)
 [_] l_01 CL (val: 0) (length 3)
 [_] l_02 CL (val: 0) (length 4)
 [_] l_03 CL (val: 0) (length 5)
 [_] l_04 CL (val: 0) (length 6)
 [_] l_05 CL (val: 0) (length 7)
 [_] l_06 CL (val: 0) (length 8)
 [_] l_07 CL (val: 0) (length 9)
 [_] l_08 CL (val: 0) (length 10)
 [_] l_09 CL (val: 0) (lengths [11-12])
 [3] l_10 CL (val: 7) (lengths [13-14])
 [3] d_01 CL (val: 1) (distance 1)
 [3] d_02 CL (val: 1) (distance 2)
Here again we have two distance codes, the first one is effectively used but the second is useless.
My patch produces this:
 [5] HLIT  267 (raw:10)
 [5] HDIST   1 (raw:0)
 [4] HCLEN  18 (raw:14)
...
[6] ZREP (9 times)
 [_] l_01 CL (val: 0) (length 3)
 [_] l_02 CL (val: 0) (length 4)
 [_] l_03 CL (val: 0) (length 5)
 [_] l_04 CL (val: 0) (length 6)
 [_] l_05 CL (val: 0) (length 7)
 [_] l_06 CL (val: 0) (length 8)
 [_] l_07 CL (val: 0) (length 9)
 [_] l_08 CL (val: 0) (length 10)
 [_] l_09 CL (val: 0) (lengths [11-12])
 [2] l_10 CL (val: 7) (lengths [13-14])
 [4] d_01 CL (val: 1) (distance 1)
This time only one distance code is recorded.
Attached files: demo1.txt.gz (produced by minigzip) and demo1_kay_patch1.gz (produced by a patched minigzip).

*** only one distance code but different from zero ***
Zlib produces this type of headers:
 [5] HLIT  267 (raw:10)
 [5] HDIST   6 (raw:5)
 [4] HCLEN  18 (raw:14)
...
[5] ZREP (9 times)
 [_] l_01 CL (val: 0) (length 3)
 [_] l_02 CL (val: 0) (length 4)
 [_] l_03 CL (val: 0) (length 5)
 [_] l_04 CL (val: 0) (length 6)
 [_] l_05 CL (val: 0) (length 7)
 [_] l_06 CL (val: 0) (length 8)
 [_] l_07 CL (val: 0) (length 9)
 [_] l_08 CL (val: 0) (length 10)
 [_] l_09 CL (val: 0) (lengths [11-12])
 [3] l_10 CL (val: 7) (lengths [13-14])
 [4] d_01 CL (val: 1) (distance 1)
 [5] ZREP (4 times)
 [_] d_02 CL (val: 0) (distance 2)
 [_] d_03 CL (val: 0) (distance 3)
 [_] d_04 CL (val: 0) (distance 4)
 [_] d_05 CL (val: 0) (distances [5-6])
 [4] d_06 CL (val: 1) (distances [7-8])
Here the distance code actually used is the last one (distance = 7) the first one is useless and has been introduced by zlib.
My patch produces this:
 [5] HLIT  267 (raw:10)
 [5] HDIST   6 (raw:5)
 [4] HCLEN  18 (raw:14)
...
[5] ZREP (9 times)
 [_] l_01 CL (val: 0) (length 3)
 [_] l_02 CL (val: 0) (length 4)
 [_] l_03 CL (val: 0) (length 5)
 [_] l_04 CL (val: 0) (length 6)
 [_] l_05 CL (val: 0) (length 7)
 [_] l_06 CL (val: 0) (length 8)
 [_] l_07 CL (val: 0) (length 9)
 [_] l_08 CL (val: 0) (length 10)
 [_] l_09 CL (val: 0) (lengths [11-12])
 [3] l_10 CL (val: 7) (lengths [13-14])
 [5] ZREP (5 times)
 [_] d_01 CL (val: 0) (distance 1)
 [_] d_02 CL (val: 0) (distance 2)
 [_] d_03 CL (val: 0) (distance 3)
 [_] d_04 CL (val: 0) (distance 4)
 [_] d_05 CL (val: 0) (distances [5-6])
 [4] d_06 CL (val: 1) (distances [7-8])
This time distance = 1 is set to zero and costs nothing in the header since it's part of a repeat x zeros RLE encoding.
Attached files: demo3.txt.gz (produced by minigzip) and demo1_kay_patch3.gz (produced by a patched minigzip).

The second part of the patch ensures full bit precision when selecting between static, dynamic and stored blocks.

Best regards
Frederic Kayser
-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo0.txt.gz
Type: application/x-gzip
Size: 124 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20111118/537a95ae/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo1.txt.gz
Type: application/x-gzip
Size: 129 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20111118/537a95ae/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo3.txt.gz
Type: application/x-gzip
Size: 129 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20111118/537a95ae/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo0_kay_patch1.gz
Type: application/x-gzip
Size: 121 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20111118/537a95ae/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo1_kay_patch1.gz
Type: application/x-gzip
Size: 128 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20111118/537a95ae/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo3_kay_patch1.gz
Type: application/x-gzip
Size: 129 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20111118/537a95ae/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kay_patch1.diff
Type: text/x-patch
Size: 2904 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20111118/537a95ae/attachment-0006.bin>


More information about the Zlib-devel mailing list