[Zlib-devel] infnew-5 available for testing
Chris Anderson
christop at fellspt.charm.net
Thu Jan 9 23:58:01 EST 2003
On Thu, 9 Jan 2003, Mark Adler wrote:
> On Thursday, January 9, 2003, at 05:43 PM, Chris Anderson wrote:
>
> > One of the differences
> > between icc -O3 and icc -O3 -prof_use is that icc -O3 unrolls this loop
> > and icc -O3 -prof_use does not (infnew-5/inffast.c):
>
> This does not surprise me. I tried several variations on the amount of
> loop unrolling and used the optimal amount for the distribution of
> lengths in typical deflate streams (I should say optimal for my
> processor).
>
That explains a lot.
I just tried icc version 7.0 (I was using 6.0). It no longer unrolls
that loop with -O3, and speed improved slightly. Moreover, POSTINC
improved times with both -O3 and -O3 -prof_use:
icc -O3
zbuflen 16384, clock 11.530, time 12.173
zbuflen 16384, clock 11.550, time 12.066
icc -O3 -DPOSTINC
zbuflen 16384, clock 11.440, time 12.004
zbuflen 16384, clock 11.520, time 12.045
icc -O3 -prof_use
zbuflen 16384, clock 10.850, time 11.385
zbuflen 16384, clock 10.930, time 11.449
icc -O3 -prof_use -DPOSTINC
zbuflen 16384, clock 10.670, time 11.226
zbuflen 16384, clock 10.720, time 11.242
The Intel gods must have been listening.
> > The asm for this loop without unrolling looks like this
> ...
> > movl 48(%esp), %ebx #234.25
> > addl $-3, %ebx #234.25
> > movl %ebx, 48(%esp) #234.25
>
> That's interesting. I would think that any self-respecting compiler
> would keep the loop counter in a register instead of on the stack.
> Then again, perhaps I'm more used to a processor with 32 registers than
> one with eight.
>
> mark
>
Looks like gcc does something similar, but it doesn't save the 2
instructions with POSTINC:
gcc3.2 -O3 -DPOSTINC (15 instructions)
.L46:
movb (%edx), %cl
movb %cl, (%edi)
incl %edx
movb (%edx), %cl
incl %edi
movb %cl, (%edi)
incl %edx
incl %edi
movb (%edx), %cl
movb %cl, (%edi)
subl $3, -64(%ebp)
incl %edx
incl %edi
cmpl $2, -64(%ebp)
ja .L46
icc7 -O3 -DPOSTINC (13 instructions, w/o POSTINC it was 15)
..B1.37: # Preds ..B1.37 ..B1.36 # Infreq
movb (%edx), %bl #231.36
movb %bl, (%ebp) #231.25
movb 1(%edx), %cl #232.36
lea 1(%edx), %edi #231.36
lea 1(%ebp), %ebx #231.25
movb %cl, 1(%ebp) #232.25
movb 2(%edx), %cl #233.36
movb %cl, 2(%ebp) #233.25
addl $-3, %eax #234.25
addl $3, %edx #233.36
addl $3, %ebp #233.25
cmpl $2, %eax #235.30
ja ..B1.37 # Prob 90% #235.30
But, the times are better with POSTINC!
gcc3.2 -O3 -DPOSTINC
zbuflen 16384, clock 11.940, time 12.481
zbuflen 16384, clock 12.020, time 12.592
gcc3.2 -O3
zbuflen 16384, clock 12.680, time 13.257
zbuflen 16384, clock 12.680, time 13.226
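For readers without the source at hand, the loop behind these listings can be sketched in C roughly as follows. This is a reconstruction from the asm above (three byte moves, subtract 3, branch while the count is above 2), not the actual infnew-5/inffast.c code. The function names copy3_postinc and copy3_preinc are hypothetical, and the sketch assumes -DPOSTINC selects post-increment pointer access in place of pre-increment.

```c
#include <stddef.h>

/* Post-increment form (assumed to be what -DPOSTINC selects):
   out and from point AT the next byte to write/read. */
static unsigned char *copy3_postinc(unsigned char *out,
                                    const unsigned char *from,
                                    size_t len)
{
    while (len > 2) {            /* copy three bytes per iteration */
        *out++ = *from++;
        *out++ = *from++;
        *out++ = *from++;
        len -= 3;
    }
    if (len) {                   /* 1 or 2 bytes left over */
        *out++ = *from++;
        if (len > 1)
            *out++ = *from++;
    }
    return out;                  /* one past the last byte written */
}

/* Pre-increment form: out and from point one byte BEFORE the next
   byte to write/read, so each access bumps the pointer first. */
static unsigned char *copy3_preinc(unsigned char *out,
                                   const unsigned char *from,
                                   size_t len)
{
    while (len > 2) {
        *++out = *++from;
        *++out = *++from;
        *++out = *++from;
        len -= 3;
    }
    if (len) {
        *++out = *++from;
        if (len > 1)
            *++out = *++from;
    }
    return out;                  /* the last byte written */
}
```

The listings above are a do/while with the test at the bottom, which only works if the count is at least 3 on entry. The sketch tests at the top so that it is safe for any length.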