[Zlib-devel] new stuff

Chris Anderson christop at charm.net
Mon Dec 29 21:14:46 EST 2003


I added amd64 asm to
http://www.eetbeetee.org/zlib/zlib-1.2.1/contrib/inflate86/inffas86.c

The 64 bit version uses the full 64 bits of the hold register, like the
mmx version.  Since the hold register has at least 32 bits at the top of
the do loop, it also unrolls the literal decoding part of the loop so
that it does something like this:

do {
  if (bits < 32) {
    /* refill: load 4 more input bytes above the bits already held */
    hold += (unsigned long)*(unsigned int *)in << bits;
    in += 4;
    bits += 32;
  }
  this = lcode[hold & lmask];
  op = (unsigned)(this.bits);
  hold >>= op;
  bits -= op;
  op = (unsigned)(this.op);
  if (op == 0) {
    *out++ = this.val;
    this = lcode[hold & lmask];
  dolen:
    op = (unsigned)(this.bits);
    hold >>= op;
    bits -= op;
    op = (unsigned)(this.op);
    if (op == 0) {
      *out++ = this.val;
      continue;
    }
  }
  ...

That gains at most 2% speed.  I noticed that the speed gain from using
inffas86.c is next to nothing when the compression level is 1, and I
believe that is because more time is spent decoding literals and less
time in the weird x86 string copy instructions like rep movsb.  With the
unrolling, the function also has to do an extra bounds check to make
sure there is at least 1 more input code and 1 more output byte
available.
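
The refill step described above can be sketched in portable C.  This is
only an illustration under stated assumptions, not the code in
inffas86.c: the bitreader type and the refill32 and take_bits names are
hypothetical, and the input is assembled byte by byte rather than loaded
through a cast pointer.  The point is that one 32-bit refill leaves
enough bits for two literal codes, since a deflate code is at most 15
bits long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit reader; the names are illustrative, not zlib's. */
typedef struct {
    const unsigned char *in;  /* next unread input byte */
    uint64_t hold;            /* bit accumulator, least significant bit first */
    unsigned bits;            /* number of valid bits in hold */
} bitreader;

/* Top up the accumulator so that it holds at least 32 bits.
   The caller must guarantee 4 readable bytes at br->in. */
static void refill32(bitreader *br)
{
    if (br->bits < 32) {
        /* little-endian assembly, independent of host byte order */
        uint32_t w = (uint32_t)br->in[0]
                   | (uint32_t)br->in[1] << 8
                   | (uint32_t)br->in[2] << 16
                   | (uint32_t)br->in[3] << 24;
        br->hold += (uint64_t)w << br->bits;
        br->in += 4;
        br->bits += 32;
    }
}

/* Remove the low n bits from the accumulator and return them. */
static unsigned take_bits(bitreader *br, unsigned n)
{
    assert(n <= br->bits && n < 32);
    unsigned v = (unsigned)(br->hold & ((1u << n) - 1));
    br->hold >>= n;
    br->bits -= n;
    return v;
}
```

Calling refill32 once at the top of a loop and then take_bits twice
mirrors the unrolled shape: the second lookup needs no new refill.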

I also made the copy operate on words instead of bytes when the distance
is > 1:

if (op & 1)                 /* odd length: copy one byte first */
  *out++ = *from++;
op >>= 1;                   /* remaining length in 16-bit words */
do {
  *(ush *)out = *(ush *)from;
  out += 2;
  from += 2;
} while (--op);

This requires a target that tolerates unaligned 16-bit loads and stores,
as x86 does.  That gains at most another 2 percentage points.
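
To see why the word copy is only safe when the distance is > 1, here is
a small standalone sketch.  The function names are made up for this
example and memcpy stands in for the unaligned short pointer access, so
it is not the code from inffas86.c.  With a distance of at least 2, each
2-byte source word ends at or before the destination word, so the word
copy produces the same bytes as the byte copy even when the match
overlaps its own output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: byte-wise overlapping copy of len bytes from out - dist. */
static unsigned char *copy_match_bytes(unsigned char *out, size_t dist,
                                       size_t len)
{
    const unsigned char *from = out - dist;
    while (len--)
        *out++ = *from++;
    return out;
}

/* Same copy, two bytes at a time.  Requires dist >= 2: with dist == 1
   a word load would read a byte that has not been written yet. */
static unsigned char *copy_match_words(unsigned char *out, size_t dist,
                                       size_t len)
{
    const unsigned char *from = out - dist;
    assert(dist >= 2);
    if (len & 1)                /* odd length: copy one byte first */
        *out++ = *from++;
    len >>= 1;                  /* remaining length in 16-bit words */
    while (len--) {
        uint16_t w;
        memcpy(&w, from, 2);    /* unaligned-safe 16-bit load */
        memcpy(out, &w, 2);     /* unaligned-safe 16-bit store */
        from += 2;
        out += 2;
    }
    return out;
}
```

For example, starting from "abc" and copying 7 bytes at distance 3,
both functions extend the buffer to "abcabcabca".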

Here's the speed difference with my favorite test data on an Athlon 64
3000+ / 512K cache version (gcc3.2).

# 32 bit version
$ make CFLAGS="-O3 -m32 -DUSE_MMAP -fomit-frame-pointer"
$ time minigzip -d < ../mozilla-source-1.3.tar.gz > /dev/null

real    0m1.623s
user    0m1.610s
sys     0m0.010s

# 64 bit version
$ make CFLAGS="-O3 -DUSE_MMAP -fomit-frame-pointer"
$ time minigzip -d < ../mozilla-source-1.3.tar.gz > /dev/null

real    0m1.487s
user    0m1.480s
sys     0m0.010s

# 32 bit inffas86.c version
$ make CFLAGS="-O3 -m32 -DUSE_MMAP -DASMINF -fomit-frame-pointer" \
OBJA=inffas86.o
$ time minigzip -d < ../mozilla-source-1.3.tar.gz > /dev/null

real    0m1.242s
user    0m1.220s
sys     0m0.020s

# 64 bit inffas86.c version
$ make CFLAGS="-O3 -DUSE_MMAP -DASMINF -fomit-frame-pointer" \
OBJA=inffas86.o
$ time minigzip -d < ../mozilla-source-1.3.tar.gz > /dev/null

real    0m1.187s
user    0m1.170s
sys     0m0.020s

---

And on a 2.4 GHz P4 (gcc 3.2):

$ make CFLAGS="-O3 -DUSE_MMAP -fomit-frame-pointer"
$ time minigzip -d < ../mozilla-source-1.3.tar.gz > /dev/null

real    0m2.403s
user    0m2.385s
sys     0m0.018s

$ make CFLAGS="-O3 -DUSE_MMAP -DASMINF -fomit-frame-pointer" \
OBJA=inffas86.o
$ time minigzip -d < ../mozilla-source-1.3.tar.gz > /dev/null

real    0m1.936s
user    0m1.902s
sys     0m0.033s
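
For comparing the runs, the "real" times above can be turned into the
percentage of wall-clock time saved by the inffas86.c build.  The
snippet below is just that arithmetic; the struct, table, and function
names are made up for this example, and the numbers are copied from the
timings above.

```c
#include <assert.h>
#include <stdio.h>

/* One pair of "real" timings from the post, in seconds. */
struct run {
    const char *label;
    double c_secs;    /* plain C build */
    double asm_secs;  /* build with -DASMINF and inffas86.o */
};

static const struct run runs[] = {
    { "Athlon 64, 32 bit", 1.623, 1.242 },
    { "Athlon 64, 64 bit", 1.487, 1.187 },
    { "P4 2.4 GHz",        2.403, 1.936 },
};

/* Percentage of run time saved going from base to fast. */
static double percent_saved(double base, double fast)
{
    assert(base > 0.0);
    return 100.0 * (base - fast) / base;
}

/* Print one line per run with the time saved by the asm build. */
static void print_savings(void)
{
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-18s %.1f%% less time\n", runs[i].label,
               percent_saved(runs[i].c_secs, runs[i].asm_secs));
}
```

That works out to roughly 23%, 20% and 19% less time for the three
pairs of runs, in the order listed.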





