[Zlib-devel] Z_RLE deflation

Mon Jun 6 01:48:37 EDT 2005

Hi, everyone,

After a fairly long absence from this list, I'm back and slowly catching
up with the latest messages. Although I did not participate to the
"known bugs in distance-checking code?" discussion, I am glad that you
eventually decided not to reduce the inflater's speed for the sake of
silently accepting invalid (and potentially dangerous) deflate streams.

But more importantly, I would like to make the following clarification:

On Sun, 27 Feb 2005, Mark Adler wrote:

> On Feb 27, 2005, at 6:20 PM, Greg Roelofs wrote:
>> OK, that's close to what I'd surmised based on experimentation.  Close,
>> but not quite identical:  levels 1-3 and 4-9 differ from each other.
>
> Greg,
>
> Thanks for testing that.  I think it's a bug.  Z_RLE is always supposed to
> use the fast longest-match routine, which is the relevant difference between
> levels 1-3 and 4-9.  I'll put that on my list of things to look at.

No, that is NOT a bug. It's the default behavior, as given by the
difference between deflate_fast() and deflate_slow(). Z_RLE is approx.
equally fast when using deflate_fast (levels 1-3) and deflate_slow
(levels 4-9), but the recommendation is to run it at the higher
compression level, for a slightly better performance. If you look at
Greg's report, you can see what I mean:

   IDAT length with method 137 (fm 0 zl 3 zs 3) =   241049
   IDAT length with method 138 (fm 1 zl 3 zs 3) =   329234
   ...
   IDAT length with method 143 (fm 0 zl 4 zs 3) =   240994
   IDAT length with method 144 (fm 1 zl 4 zs 3) =   329163
   ...

Remark that 241049 > 240994, and 329234 > 329163. So the latter (zl=4)
is better.

The explanation is provided below.

The "true" RLE is achieved by deflate_slow(). Due to a speedup quirk of
deflate_fast(), the runs that are longer than 258 are encoded optimally
by deflate_slow(), but not by deflate_fast().

Assume an "aaa...a" run that is 300 bytes long.

deflate_slow() encodes it as you would expect:

  literal 'a'
  distance -1, length 258
  distance -1, length 41

(If the run is right at the beginning of the file, pay attention that
the literal 'a' appears twice, as the current deflate implementation is
unable to produce distance codes that point to the very first literal.)

On the other hand, deflate_fast() encodes it as follows:

  literal 'a'
  distance -1, length 258
  literal 'a'
  distance -1, length 40

This is obviously inefficient, and the reason lies in the line 1509 in
deflate.c (zlib version 1.2.2.3):

         /* Insert new strings in the hash table only if the match length
          * is not too large. This saves time but degrades compression.
          */
  #ifndef FASTEST
>>       if (s->match_length <= s->max_insert_length &&
             s->lookahead >= MIN_MATCH) {
             s->match_length--; /* string at strstart already in table
*/
             do { ...

The comment from the original source file says it all ("... This saves
time but degrades compression.")
s->max_insert_length is 4, 5, or 6, depending if the compression level
is 1, 2, or 3. If the run is longer than 258, then s->match_length is
258, and the code after the "do" instruction (specifically, the
INSERT_STRING macros) won't get executed. This causes the remaining run
to be encoded sub-optimally.

Now, we could complicate the deflater, to use deflate_slow() under
Z_RLE, no matter what the compression level is; or, we could simplify it
by removing the above code (between #ifndef FASTEST...#endif) altogether
and have a slightly slower but slightly better compression for the
levels 1,2,3 regardless of the strategy; or we could leave it as is.

My first choice is the third alternative (leave the code as is); my
second choice (but not much behind the first) is the second alternative
(make the deflation slightly slower and slightly better-compressed --
recall that this is only for the levels 1,2 and 3, and recall that we're
in 2005). The first alternative is not so good, in my opinion.

Other opinions?
Cosmin