[Zlib-devel] pigz 2.1.4 output differs for -p 1 and -p 2

wpilorz at gmail.com wpilorz at gmail.com
Sun Nov 16 18:20:07 EST 2008


On Mon, Nov 17, 2008 at 12:02:06AM +0100, wpilorz at gmail.com wrote:
> On Sat, Nov 15, 2008 at 01:40:30PM -0800, Mark Adler wrote:
> > On Nov 13, 2008, at 5:24 PM, Mark Adler wrote:
> > >This behavior is odd.  In theory, Z_FULL_FLUSH should erase any  
> > >memory of the previous data, yet somehow the size of the next block  
> > >is different when using Z_FULL_FLUSH as opposed to deflateReset().   
> > >deflateReset() is doing a "better" job of erasing the memory of the  
> > >previous compression.  This may in fact be a bug in deflate in zlib.
> > 
> > 
> > I found where that difference was coming from.  Below is a patch to  
> > deflate to correct the problem.
> > 
> > Now I'll look into remedies for the differences due to  
> > deflateSetDictionary(), i.e. when not using -i.
> > 
> > Mark
> > 
> > 
> > *** zlib-1.2.3.3/deflate.c	2006-09-03 23:57:22.000000000 -0700
> > --- zlib-1.2.3.4/deflate.c	2008-11-15 13:13:58.000000000 -0800
> > ***************
> > *** 847,852 ****
> > --- 847,854 ----
> >                    */
> >                   if (flush == Z_FULL_FLUSH) {
> >                       CLEAR_HASH(s);             /* forget history */
> > +                     if (s->lookahead == 0)
> > +                         s->strstart = 0;
> >                   }
> >               }
> >               flush_pending(strm);
> > 
> > 
> I have applied two patches to zlib 1.2.3 (the one above and the one that patches lines 332-339 of deflate.c),
> compiled zlib, and linked pigz (compiled with -O3) against the resulting static libz.a.
> 
> For some input, pigz with options -n -T -i -p 1 generated invalid data (tests
> run on CentOS 5 Linux, PIII i386 CPU):
> 
> $ perl -we 'use strict; for my $i (1 .. 99_999) { my $imod= $i % 101; my $itx=""; for (1 .. $imod) { $itx .= "X". ($i^$_); }; print "Line_$i : $itx\n"}' | md5sum -b;
> 2d74cda70d7653466d7072d07563e55d *-
> 
> $ for op in '-i -p 1' '-i -p 2'; do perl -we 'use strict; for my $i (1 .. 99_999) { my $imod= $i % 101; my $itx=""; for (1 .. $imod) { $itx .= "X". ($i^$_); }; print "Line_$i : $itx\n"}' | /usr/local/pigz_081116/pigz  -n -T $op | md5sum -b; done
> 627ae1f31111d3db9c7901a3d135b71d *-
> f49f22039d758be59b6fe766d2c5f154 *-
> 
> $ for op in '-i -p 1' '-i -p 2'; do perl -we 'use strict; for my $i (1 .. 99_999) { my $imod= $i % 101; my $itx=""; for (1 .. $imod) { $itx .= "X". ($i^$_); }; print "Line_$i : $itx\n"}' | /usr/local/pigz_081116/pigz  -n -T $op | zcat | md5sum -b; done
> 
> zcat: stdin: invalid compressed data--crc error
> 
> zcat: stdin: invalid compressed data--length error
> c9cccdd56734dc1c48e85fbb3be40ad4 *-
> 2d74cda70d7653466d7072d07563e55d *-
> 
> 
> I have patched plain zlib-1.2.3. Is that OK, or are there any other patches which should also be applied?
> (The directory names in your patches might suggest there have been some other patches applied to zlib-1.2.3)
> 
> Also, I have disk data (about 285 MB) which generates valid but different results when compressed through
> pigz -6 -n -T -p 1
> and 
> pigz -6 -n -T -p 2
> 
> I will try to find something that could be sent via email for that.
> (PS. that data also gives an invalid stream (not accepted by zcat)
>  when compressed with /usr/local/pigz_081116/pigz -6 -n -T -i -p 1
> )
> 
> 
OK, got real-world test data for the case without -i:

$ wget http://www.adobe.com/devnet/acrobat/pdfs/reader_overview.pdf
[...]
Saving to: `reader_overview.pdf'
00:13:49 (107 KB/s) - `reader_overview.pdf' saved [976214/976214]
$ sha1sum -b reader_overview.pdf
2b89f57e7d5ea090a415f62b2e551931a8907477 *reader_overview.pdf
$ md5sum -b reader_overview.pdf
3ccd761d09f71f2d328d1da0867f8bc2 *reader_overview.pdf
$ env LANG=C TZ=GMT ls -l reader_overview.pdf
-rw-rw-r-- 1 wp wp 976214 Sep 10 20:12 reader_overview.pdf
$ for op in '-i -p 1' '-i -p 2' '-p 1' '-p 2'; do cat reader_overview.pdf | /usr/local/pigz_081116/pigz -6 -n -T $op | md5sum -b; done
b28f0c862ddf58f0c00c3a153ee291dd *-
5b74a773a72571e11491541021f6bcb6 *-
5b3bbc526623f765e83bdf8b783782ef *-
e1ad03dc26c71a12a57db75dbe6ceb1b *-


So all four option sets give different compressed output, but each stream decompresses correctly:

$ for op in '-i -p 1' '-i -p 2' '-p 1' '-p 2'; do cat reader_overview.pdf | /usr/local/pigz_081116/pigz -6 -n -T $op | zcat | md5sum -b; done
3ccd761d09f71f2d328d1da0867f8bc2 *-
3ccd761d09f71f2d328d1da0867f8bc2 *-
3ccd761d09f71f2d328d1da0867f8bc2 *-
3ccd761d09f71f2d328d1da0867f8bc2 *-
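
In case anyone wants to repeat this comparison on other files, here is a
minimal sketch of the same check as a shell function. The names PIGZ, digest
and check_roundtrip are my own and not part of pigz or zlib. gzip is only an
assumed stand-in when PIGZ is not set, and it does not take pigz-specific
options such as -p or -i, so pass those only when PIGZ points at pigz.

```shell
#!/bin/sh
# Sketch: compress FILE with the given options, then report whether the
# decompressed stream matches the original, plus the md5 of the compressed
# stream so different option sets can be compared.

# Compressor under test (assumption: defaults to gzip as a stand-in).
# Example: PIGZ=/usr/local/pigz_081116/pigz
PIGZ=${PIGZ:-gzip}

# Print only the md5 digest of stdin.
digest() { md5sum -b | cut -d' ' -f1; }

# check_roundtrip FILE OPTIONS...
# Prints one line: status, md5 of the compressed stream, options used.
check_roundtrip() {
    file=$1; shift
    want=$(digest < "$file")
    packed=$("$PIGZ" -c "$@" < "$file" | digest)
    # Decompression errors (crc error, length error) stay visible on stderr.
    got=$("$PIGZ" -c "$@" < "$file" | gzip -dc | digest)
    if [ "$got" = "$want" ]; then status=ok; else status=MISMATCH; fi
    printf '%s  %s  %s\n' "$status" "$packed" "$*"
}
```

With PIGZ set to the pigz binary, the run above would correspond to:

    for op in '-i -p 1' '-i -p 2' '-p 1' '-p 2'; do check_roundtrip reader_overview.pdf -6 -n -T $op; done

A line starting with MISMATCH marks an option set whose output does not
decompress back to the input.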


Best regards,

Wojtek




