[Zlib-devel] Mark Adler, where are you?

Daniel Richard G. oss at teragram.com
Sat Jul 21 00:14:18 EDT 2012


On Sat, 21 Jul 2012, Jan Seiffert wrote:

> Maybe it's time for an official "make perf_test"

Sounds like a fine idea. Like the POV-Ray benchmark, it could become 
popular as a system performance metric in its own right.

>> That's a lot of trouble to go to to make uFastInt a 32-bit integer on
>> x86-64. Why not just use uint32_t unconditionally?
>
> Because I wasn't sure. zlib is designed and written to cater to old/small
> systems where int is just 16 bits. So I could probably have chosen
> uint16_fastest_t.
>
> An 8-bit µC would really hate using 32 bits for these counters/indices.

Okay, rephrase: unconditionally for all systems with a 32-bit int. I 
wasn't thinking about ancient and embedded systems, but those do need to 
be covered, too.

> This patch was more to bang some C language foo into place. The question 
> is not whether a CPU is faster doing 64-bit arithmetic. At best it is 
> exactly the same (minus really wacky systems). The patch is about 
> minimizing type conversions and sign-/zero-extends mandated by C by using 
> the native register size. For some CPUs that's a uLong, but not all... 
> Then the patch ran into the problem that a 64-bit CPU can be slower doing 
> 64-bit arithmetic.

My understanding is that this is broadly the case for 64-bit CPUs; they 
can still push 32-bit words around faster. Unless there's an exception, 
you could just use uint32_t for all modern 32- and 64-bit systems. (Unless 
some 32-bit systems can push around 16-bit words faster...)

> The key lies in creating a special type in the first place so
> you can easily redirect all uses in the program to the right type. 
>
> Choosing a different type for __x86_64__ by ifdefs is a kludge, no doubt,
> but at least as long as you can confine the ugliness to one place...

Well, that's what config headers are for...
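
To make the config-header idea concrete, here is a rough sketch of how the type selection could be confined to one place. The name ufast_t is a hypothetical stand-in for uFastInt, the UINT_MAX test is my assumption about how one might detect a 32-bit int, and none of this is taken from the actual patch:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* UINT_MAX, CHAR_BIT */
#include <stdint.h>   /* uint32_t */

/* ufast_t is a hypothetical stand-in for the uFastInt type under discussion. */
#if UINT_MAX >= 0xFFFFFFFFu
/* int has at least 32 bits: use a 32-bit type, on 64-bit targets as well. */
typedef uint32_t ufast_t;
#else
/* 16-bit int (old or small systems): stay at the native int width. */
typedef unsigned int ufast_t;
#endif

static_assert(sizeof(ufast_t) * CHAR_BIT >= 16,
              "ufast_t must hold at least 16 bits");

/* Example use: counters and indices all go through the one typedef. */
static ufast_t count_nonzero(const unsigned char *buf, ufast_t len)
{
    ufast_t i, n = 0;

    for (i = 0; i < len; i++)
        if (buf[i] != 0)
            n++;
    return n;
}
```

If some platform turns out to prefer a different width, only the #if block has to change; the code using ufast_t stays as it is.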

> Sure, malloc isn't cheap, but note: this is the memory allocator you 
> pass in with your struct deflate. I, for example, use a special 
> thread-local allocator: 200 instructions and the alloc is done. A memset 
> of a 128 KB area needs ca. 128 KB / 8 * 3 = 49k instructions on a 64-bit 
> CPU (if you have a postincrement pointer). That gets worse very fast. And 
> general-purpose allocators are also often very fast, if they are worth 
> their salt.

Okay, we've gone off on a tangent here.

I added a zmemzero() call (i.e. memset()) to deflateInit2_(), right after 
a malloc() that is used in the course of initializing a deflate_state 
struct. This is an initialization routine, not an inner loop. If I stick 
an fprintf() in there, and run "make check" with a static build, it gets 
called exactly fourteen times.
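
For illustration only, the pattern amounts to something like the following. This is a standalone sketch, not the zlib code: alloc_cleared_buffer is a made-up helper, and plain malloc()/memset() stand in for the zalloc callback and zmemzero():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of zlib: allocate a buffer and clear it
 * once at initialization time, so that no later read can see
 * indeterminate bytes. */
static unsigned char *alloc_cleared_buffer(size_t size)
{
    unsigned char *buf;

    assert(size > 0);
    buf = (unsigned char *)malloc(size);
    if (buf == NULL)
        return NULL;          /* the caller reports the allocation failure */
    memset(buf, 0, size);     /* one-time cost, paid per init call only */
    return buf;
}
```

The memset() runs once per initialization, so its cost scales with the number of init calls, not with the amount of data compressed.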

> And slow if done unconditionally. I can already see the patch in two 
> years from Google or some embedded dev to remove it.

LOL

> If it has line numbers, yes.
> A small reproducer would also be OK.

I just used the zlib "example" program. With current git master source, 
here's the interesting bit:

     ==14644== 27188 errors in context 1 of 1:
     ==14644== Conditional jump or move depends on uninitialised value(s)
     ==14644==    at 0x405120: fill_window (deflate.c:1442)
     ==14644==    by 0x406123: deflate_slow (deflate.c:1743)
     ==14644==    by 0x404179: deflate (deflate.c:901)
     ==14644==    by 0x401752: test_large_deflate (example.c:319)
     ==14644==    by 0x402303: main (example.c:587)
     ==14644==  Uninitialised value was created by a heap allocation
     ==14644==    at 0x4C28F9F: malloc (vg_replace_malloc.c:236)
     ==14644==    by 0x40EB9F: zcalloc (zutil.c:310)
     ==14644==    by 0x402656: deflateInit2_ (deflate.c:294)
     ==14644==    by 0x4023E5: deflateInit_ (deflate.c:207)
     ==14644==    by 0x4015C6: test_large_deflate (example.c:290)
     ==14644==    by 0x402303: main (example.c:587)


--Daniel


-- 
Daniel Richard G. || danielg at teragram.com || Software Developer
Teragram Linguistic Technologies (a division of SAS)
http://www.teragram.com/

