[Zlib-devel] Trivial inflate_table() perf improvement

Steve Snyder swsnyder at insightbb.com
Tue Nov 20 11:34:08 EST 2007


Here's a trivial performance improvement.

While running Mozilla code in Intel's VTune profiler recently, the 
profiler noted a performance problem in zlib's inflate_table().  There 
is a stall when a 16-bit value is written, then a 32-bit value that 
contains that 16-bit value is immediately read back.  The stall occurs 
because the 32-bit read must wait for the 16-bit write to complete.  

I've attached a patch to remedy the problem.

The original code wrote 3 constant structure members to memory, then 
read the 4-byte structure back and copied it twice to the target 
array, incrementing the pointer on each copy.  That totals 7 writes to 
memory and 4 reads, in 27 bytes of x86 code.  

The new code creates the constant structure at build time.  It is read 
from memory at runtime and copied twice to the target array.  The 
pointer is incremented once, after both structs have been written.  
Totals: 3 writes and 3 reads, in 19 bytes of x86 code.  
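
For anyone who wants to try the two variants outside of zlib, here is a 
minimal standalone sketch.  It assumes the code struct layout from zlib 
1.2.3's inftrees.h, drops the FAR qualifier, and renames the local "this" 
to "here" so the file also builds as C++.  fill_original(), 
fill_modified() and same_result() are hypothetical names used only for 
this illustration; they are not part of zlib or of the attached patch.

```c
#include <assert.h>
#include <string.h>

/* Assumed to match the layout in zlib 1.2.3's inftrees.h (FAR omitted). */
typedef struct {
    unsigned char op;      /* operation, extra bits, table bits */
    unsigned char bits;    /* bits in this part of the code */
    unsigned short val;    /* offset in table or code value */
} code;

/* Built at compile time, so no narrow stores are needed at runtime. */
static const code invalid_code = { 64, 1, 0 };

/* Original pattern: three narrow stores to a local struct, followed by
   a whole-struct read of that same local. */
static void fill_original(code **table)
{
    code here;
    here.op = (unsigned char)64;
    here.bits = (unsigned char)1;
    here.val = (unsigned short)0;
    *(*table)++ = here;
    *(*table)++ = here;
}

/* Modified pattern: copy the prebuilt constant into both slots, then
   advance the caller's pointer once. */
static void fill_modified(code **table)
{
    code * const t = *table;
    t[0] = t[1] = invalid_code;
    (*table) += 2;
}

/* Hypothetical check: both variants write the same two entries and
   leave the table pointer advanced by two. */
static int same_result(void)
{
    code a[2], b[2];
    code *pa = a, *pb = b;

    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    fill_original(&pa);
    fill_modified(&pb);
    return pa == a + 2 && pb == b + 2 && memcmp(a, b, sizeof a) == 0;
}
```

Whether a given compiler emits the narrow-store/wide-load sequence for 
fill_original() depends on the compiler and its optimization settings, so 
the generated code should be inspected rather than assumed.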

The code below was built with MSVC 7.1 (VS2003), with optimization 
enabled, from zlib v1.2.3 source.

The "trivial" comes from the apparent infrequency with which this code 
is called in normal use.

Original code:
--------------

        this.op = (unsigned char)64;
        this.bits = (unsigned char)1;
        this.val = (unsigned short)0;
        *(*table)++ = this;
00A4EEA5 8B 45 14         mov         eax,dword ptr [table] 
00A4EEA8 8B 10            mov         edx,dword ptr [eax] 
00A4EEAA C6 45 10 40      mov         byte ptr [this],40h 
00A4EEAE C6 45 11 01      mov         byte ptr [ebp+11h],1 
00A4EEB2 66 89 4D 12      mov         word ptr [ebp+12h],cx 
00A4EEB6 8B 4D 10         mov         ecx,dword ptr [this] 
00A4EEB9 89 0A            mov         dword ptr [edx],ecx 
00A4EEBB 83 00 04         add         dword ptr [eax],4 
00A4EEBE 8B 10            mov         edx,dword ptr [eax] 
        *(*table)++ = this;
00A4EEC0 89 0A            mov         dword ptr [edx],ecx 
00A4EEC2 83 00 04         add         dword ptr [eax],4 


Modified code:
--------------

        code FAR * const t = *table;
00A4EEA5 8B 45 14          mov         eax,dword ptr [table] 
00A4EEA8 8B 08             mov         ecx,dword ptr [eax] 
        t[0] = t[1] = invalid_code;
00A4EEAA 8B 15 C8 AF BF 00 mov         edx,dword ptr [invalid_code] 
00A4EEB0 89 51 04          mov         dword ptr [ecx+4],edx 
00A4EEB3 89 11             mov         dword ptr [ecx],edx 
        (*table) += 2;
00A4EEB5 83 00 08          add         dword ptr [eax],8 

-----------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: inflate_table-stall.patch
Type: text/x-diff
Size: 1440 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20071120/040d0d89/attachment.bin>

