[Zlib-devel] Trivial inflate_table() perf improvement
Steve Snyder
swsnyder at insightbb.com
Tue Nov 20 11:34:08 EST 2007
Here's a trivial performance improvement.
While running Mozilla code in Intel's VTune profiler recently, the
profiler noted a performance problem in zlib's inflate_table(). There
is a stall when a 16-bit value is written, then a 32-bit value that
contains that 16-bit value is immediately read back. The stall occurs
because the 32-bit read must wait for the 16-bit write to complete.
I've attached a patch to remedy the problem.
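For readers who want to see the access pattern in isolation, here is a minimal sketch in C. The type name entry and the function build_then_read() are hypothetical and not taken from zlib; the struct only mimics a 4-byte record with two byte fields and one 16-bit field. Whether a given compiler actually emits narrow stores followed by a wide load, and whether a given CPU stalls on that sequence, depends on the toolchain and hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte record: two 8-bit fields and one 16-bit field. */
typedef struct {
    unsigned char op;
    unsigned char bits;
    unsigned short val;
} entry;

/* Writes the fields one at a time, then reads all four bytes back as a
 * single 32-bit value. Compiled naively, that is three narrow stores
 * followed at once by one wide load covering the same bytes, which is
 * the pattern described in the post. */
uint32_t build_then_read(void)
{
    entry e;
    uint32_t whole;

    assert(sizeof e == sizeof whole);   /* sketch assumes no padding */

    e.op = 64;                          /* 8-bit write  */
    e.bits = 1;                         /* 8-bit write  */
    e.val = 0;                          /* 16-bit write */

    memcpy(&whole, &e, sizeof whole);   /* 32-bit read of the same bytes */
    return whole;
}
```

An optimizing compiler may fold all of this into a constant, so the sketch shows the shape of the problem rather than a guaranteed reproduction of it.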
The original code wrote 3 constant structure members to memory, then
read the 4-byte structure back and copied it twice to the target
array, incrementing the pointer on each copy. That totals 7 writes to
memory and 4 reads, in 27 bytes of x86 code.
The new code creates the constant structure at build time. It is read
from memory at runtime and copied twice to the target array. The
pointer is incremented once, after both structs have been written.
Totals: 3 writes and 3 reads, in 19 bytes of x86 code.
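The two approaches can be compared side by side in a self-contained sketch. Several things here are assumptions: the code struct is redeclared locally and only modeled on zlib's 4-byte table entry, the initializer given for invalid_code is a guess at what the patch defines (the post names the constant but does not show its definition), the local is called this_entry rather than this, and the FAR qualifier is omitted. The helper names fill_original(), fill_modified() and fills_match() are hypothetical.

```c
#include <string.h>

/* Modeled on zlib's table entry; redeclared so the sketch stands alone. */
typedef struct {
    unsigned char op;
    unsigned char bits;
    unsigned short val;
} code;

/* Assumed definition of the build-time constant named in the post. */
static const code invalid_code = { 64, 1, 0 };

/* Original approach: build the entry at run time, then copy it twice,
 * advancing the table pointer after each copy. */
static void fill_original(code **table)
{
    code this_entry;
    this_entry.op = (unsigned char)64;
    this_entry.bits = (unsigned char)1;
    this_entry.val = (unsigned short)0;
    *(*table)++ = this_entry;
    *(*table)++ = this_entry;
}

/* Modified approach: copy the constant into both slots, then advance
 * the table pointer once. */
static void fill_modified(code **table)
{
    code * const t = *table;
    t[0] = t[1] = invalid_code;
    (*table) += 2;
}

/* Returns 1 when both approaches write identical entries and leave the
 * table pointer two entries past where it started. */
int fills_match(void)
{
    code a[2], b[2];
    code *pa = a, *pb = b;

    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    fill_original(&pa);
    fill_modified(&pb);

    return pa == a + 2 && pb == b + 2 && memcmp(a, b, sizeof a) == 0;
}
```

The sketch only checks that the two versions are functionally equivalent. It says nothing about timing, which would have to be measured on the target compiler and CPU.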
The code below was built with MSVC 7.1 (VS2003), with optimization
enabled, from zlib v1.2.3 source.
I call it "trivial" because this code appears to be called
infrequently in normal use.
Original code:
--------------
    this.op = (unsigned char)64;
    this.bits = (unsigned char)1;
    this.val = (unsigned short)0;
    *(*table)++ = this;
00A4EEA5 8B 45 14          mov   eax,dword ptr [table]
00A4EEA8 8B 10             mov   edx,dword ptr [eax]
00A4EEAA C6 45 10 40       mov   byte ptr [this],40h
00A4EEAE C6 45 11 01       mov   byte ptr [ebp+11h],1
00A4EEB2 66 89 4D 12       mov   word ptr [ebp+12h],cx
00A4EEB6 8B 4D 10          mov   ecx,dword ptr [this]
00A4EEB9 89 0A             mov   dword ptr [edx],ecx
00A4EEBB 83 00 04          add   dword ptr [eax],4
00A4EEBE 8B 10             mov   edx,dword ptr [eax]
    *(*table)++ = this;
00A4EEC0 89 0A             mov   dword ptr [edx],ecx
00A4EEC2 83 00 04          add   dword ptr [eax],4
Modified code:
--------------
    code FAR * const t = *table;
00A4EEA5 8B 45 14          mov   eax,dword ptr [table]
00A4EEA8 8B 08             mov   ecx,dword ptr [eax]
    t[0] = t[1] = invalid_code;
00A4EEAA 8B 15 C8 AF BF 00 mov   edx,dword ptr [invalid_code]
00A4EEB0 89 51 04          mov   dword ptr [ecx+4],edx
00A4EEB3 89 11             mov   dword ptr [ecx],edx
    (*table) += 2;
00A4EEB5 83 00 08          add   dword ptr [eax],8
-----------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: inflate_table-stall.patch
Type: text/x-diff
Size: 1440 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20071120/040d0d89/attachment.bin>