[Zlib-devel] Mark Adler, where are you?
Daniel Richard G.
oss at teragram.com
Sat Jul 21 00:14:18 EDT 2012
On Sat, 21 Jul 2012, Jan Seiffert wrote:
> Maybe it's time for an official "make perf_test"
Sounds like a fine idea. Like the POV-Ray benchmark, it could become
popular as a system performance metric in its own right.
>> That's a lot of trouble to go to to make uFastInt a 32-bit integer on
>> x86-64. Why not just use uint32_t unconditionally?
>
> Because I wasn't sure. zlib is designed and written to cater to old/small
> systems where int is just 16 bits. So I could probably have chosen
> uint_fast16_t.
>
> An 8-bit µC would really hate having to use 32 bits for these counters/indices.
Okay, rephrase: unconditionally for all systems with a 32-bit int. I
wasn't thinking about ancient and embedded systems, but those do need to
be covered, too.
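For reference, C99's <stdint.h> already has types meant for exactly this: uint_fast16_t and uint_fast32_t are "the fastest unsigned type with at least 16 (resp. 32) bits", and the actual width is left to the implementation. The sketch below only reports what the local compiler and C library chose; the helper names fast16_bits() and fast32_bits() are made up for illustration. Note that glibc on x86-64 makes both of these 64 bits wide, so on that platform they would not by themselves give the 32-bit arithmetic discussed in this thread.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helpers: report the width, in bits, that this
   implementation chose for the C99 "fastest minimum-width" types.
   The standard guarantees only a lower bound (16 and 32 bits). */
unsigned fast16_bits(void)
{
    return (unsigned)(sizeof(uint_fast16_t) * CHAR_BIT);
}

unsigned fast32_bits(void)
{
    return (unsigned)(sizeof(uint_fast32_t) * CHAR_BIT);
}
```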
> This patch was more to bang some C language foo into place. The question
> is not whether a CPU is faster doing 64-bit arithmetic. At best it is exactly
> the same (minus really wacky systems). The patch is about minimizing the type
> conversions and sign/zero extensions mandated by C by using the native
> register size. For some CPUs that's a uLong, but not all... Then the
> patch ran into the problem that a 64-bit CPU can be slower doing 64-bit
> arithmetic.
My understanding is that this is broadly the case for 64-bit CPUs; they
can still push 32-bit words around faster. Unless there's an exception,
you could just use uint32_t for all modern 32- and 64-bit systems. (Unless
some 32-bit systems can push around 16-bit words faster...)
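One way to see what the index type does to code generation is to compile two otherwise identical loops with `cc -O2 -S` and compare the assembly. This is a hypothetical sketch (the function names are made up), not a benchmark. Whether a separate zero- or sign-extension instruction actually appears depends on the compiler, the target, and whether the optimizer can prove the index never wraps. On x86-64, for instance, writing a 32-bit register already clears the upper half, so zero-extending an unsigned 32-bit index is often free.

```c
#include <stddef.h>
#include <stdint.h>

/* Same loop with a 32-bit index: on a 64-bit target, i must be widened
   to pointer width before it is added to p. Depending on compiler and
   target, that may cost an extension instruction or be folded away. */
unsigned long sum_u32_index(const unsigned char *p, uint32_t n)
{
    unsigned long s = 0;
    for (uint32_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Same loop with a pointer-width index: no widening is needed. */
unsigned long sum_size_index(const unsigned char *p, size_t n)
{
    unsigned long s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}
```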
> The key lies in creating a special type in the first place so
> you can easily redirect all uses in the program to the right type.
>
> Choosing a different type for __x86_64__ by ifdefs is a crutch, no doubt,
> but it is tolerable as long as you can confine the ugliness to one place...
Well, that's what config headers are for...
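A config-header fragment along those lines might look like the following. It is only a sketch: the selection rule (a 32-bit type wherever int has at least 32 bits, plain unsigned int otherwise, so 16-bit-int systems keep their native width) is an assumption drawn from this thread, not what the patch actually does, and ufastint_bits() is a hypothetical helper for checking the choice.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical config-header fragment: every counter/index is declared
   as uFastInt, so changing the choice here redirects all uses at once. */
#if UINT_MAX >= 0xFFFFFFFFUL
typedef uint32_t uFastInt;      /* int has >= 32 bits: use 32-bit arithmetic */
#else
typedef unsigned int uFastInt;  /* 16-bit int: keep the native width */
#endif

/* Width of the selected type in bits. */
unsigned ufastint_bits(void)
{
    return (unsigned)(sizeof(uFastInt) * CHAR_BIT);
}
```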
> Sure, malloc isn't cheap, but note: this is the memory allocator you
> pass in with your struct deflate. I, for example, use a thread-local
> special allocator: 200 instructions and the alloc is done. Memsetting a
> 128 KB area needs ca. 128 KB / 8 * 3 = 49k instructions on a 64-bit CPU
> (if you have a post-increment pointer). That gets worse very fast. And
> general-purpose allocators are also often very fast, if they are worth
> their salt.
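Spelled out, the memset estimate above (8-byte stores, about three instructions per store, 128 KB taken as 128 × 1024 bytes) is:

```latex
\frac{128 \times 1024\ \text{bytes}}{8\ \text{bytes/store}} = 16\,384\ \text{stores},
\qquad
16\,384\ \text{stores} \times 3\ \frac{\text{instructions}}{\text{store}} = 49\,152 \approx 49\text{k instructions}
```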
Okay, we've gone off on a tangent here.
I added a zmemzero() call (i.e. memset()) to deflateInit2_(), right after
a malloc() that is used in the course of initializing a deflate_state
struct. This is an initialization routine, not an inner loop. If I stick
an fprintf() in there, and run "make check" with a static build, it gets
called exactly fourteen times.
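An alternative that leaves zlib itself untouched would be for the application to pass in a zeroing allocator through the z_stream zalloc/zfree/opaque fields, so that only callers who opt in (for example, for valgrind runs) pay for the zeroing. This is a hypothetical sketch: the names zeroing_alloc() and zeroing_free() are made up, and the signatures assume a typical build where zlib's voidpf is void * and uInt is unsigned int, matching alloc_func and free_func. Since zlib's default zcalloc() uses plain malloc() (see the trace below), a calloc()-backed allocator should have much the same effect on the flagged allocation as the added zmemzero() call.

```c
#include <stdlib.h>

/* Hypothetical application-side allocator shaped like zlib's alloc_func.
   calloc() returns zero-filled memory, so buffers start initialized. */
void *zeroing_alloc(void *opaque, unsigned int items, unsigned int size)
{
    (void)opaque;               /* no per-stream state needed */
    return calloc(items, size);
}

/* Matching release function, shaped like zlib's free_func. */
void zeroing_free(void *opaque, void *address)
{
    (void)opaque;
    free(address);
}
```

It would be installed before calling deflateInit() with strm.zalloc = zeroing_alloc; strm.zfree = zeroing_free; strm.opaque = Z_NULL;.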
> And slow if done unconditionally. I can already see the patch in two
> years from Google or some embedded dev to remove it.
LOL
> If it has line numbers, yes.
> A small reproducer would also be OK.
I just used the zlib "example" program. With current git master source,
here's the interesting bit:
==14644== 27188 errors in context 1 of 1:
==14644== Conditional jump or move depends on uninitialised value(s)
==14644== at 0x405120: fill_window (deflate.c:1442)
==14644== by 0x406123: deflate_slow (deflate.c:1743)
==14644== by 0x404179: deflate (deflate.c:901)
==14644== by 0x401752: test_large_deflate (example.c:319)
==14644== by 0x402303: main (example.c:587)
==14644== Uninitialised value was created by a heap allocation
==14644== at 0x4C28F9F: malloc (vg_replace_malloc.c:236)
==14644== by 0x40EB9F: zcalloc (zutil.c:310)
==14644== by 0x402656: deflateInit2_ (deflate.c:294)
==14644== by 0x4023E5: deflateInit_ (deflate.c:207)
==14644== by 0x4015C6: test_large_deflate (example.c:290)
==14644== by 0x402303: main (example.c:587)
--Daniel
--
Daniel Richard G. || danielg at teragram.com || Software Developer
Teragram Linguistic Technologies (a division of SAS)
http://www.teragram.com/