[Zlib-devel] [3/8][RFC V3 Patch] Add special ARM Adler32 version
Jan Seiffert
kaffeemonster at googlemail.com
Sun May 1 17:48:46 EDT 2011
2011/4/24 Jan Seiffert <kaffeemonster at googlemail.com>:
> This adds a NEON version, an iWMMXt version for Intel (now Marvell)
> StrongARM and a version of Adler32 using the ARMv6 DSP instructions.
>
Thanks again to Edwin Török, the NEON and ARMv6 DSP versions are now
tested and fixed.
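Every one of these variants has to produce exactly the same checksum as zlib's portable C code, which is why the a: values are identical between the orig and vectorized rows in the tables below. For readers who want the baseline in front of them, here is a minimal scalar sketch. It is not the code from the patch; adler32_ref, ADLER_BASE and ADLER_NMAX are illustrative names, and it assumes the same deferred-modulo scheme as zlib's generic adler32() (BASE 65521, reduce at most every 5552 bytes).

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552u  /* max bytes before the 32-bit sums could overflow */

/* Scalar Adler-32: a = 1 + sum of all bytes, b = sum of every running a,
   both mod 65521; the result packs them as (b << 16) | a.
   Start with adler = 1 for a fresh checksum. */
static uint32_t adler32_ref(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;

    while (len > 0) {
        /* Process up to NMAX bytes, then do the (expensive) modulo once. */
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++;
            b += a;
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

A SIMD version is only correct if it returns the same value as this loop for every input length, including the odd lengths (159999, 159997, ...) the benchmark deliberately uses.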
The good news is NEON:
an i.MX515 at 800MHz (arm7l) with NEON
-------- orig ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 4010 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 2990 ms
a: 0x733CB174, 10000 * 159998 bytes t: 4060 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 4050 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 4060 ms
a: 0x1902A382, 10000 * 159984 bytes t: 4060 ms
-------- vec ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 1450 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 1450 ms
a: 0x733CB174, 10000 * 159998 bytes t: 1460 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 1450 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 1460 ms
a: 0x1902A382, 10000 * 159984 bytes t: 1450 ms
speedup: 2.765517
The bad news is ARMv6 DSP:
an i.MX515 at 800MHz (arm7l) with ARMv6 DSP
-------- orig ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 4040 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 3880 ms
a: 0x733CB174, 10000 * 159998 bytes t: 4070 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 4060 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 4050 ms
a: 0x1902A382, 10000 * 159984 bytes t: 4060 ms
-------- vec2 ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 4240 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 4250 ms
a: 0x733CB174, 10000 * 159998 bytes t: 3300 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 4240 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 4240 ms
a: 0x1902A382, 10000 * 159984 bytes t: 4250 ms
speedup: 0.952830
Yes, it's slower, by about 5%. I tried some other tricks, such as more
unrolling, but to no avail.
The code is now disabled and kept only for reference, in case I come
up with a better idea, though I doubt it.
(Some timings show odd drops; I noticed them. I chalk them up to a
fluke in kernel timekeeping: they are one-offs and wrong, and running
the benchmark again always puts the drop on some other run.)
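The printed speedup figures match the ratio of the first orig timing to the first vectorized timing (4010/1450 and 4040/4240). As a hedged sketch, not part of the posted benchmark harness, the helpers below compute that ratio and a median-based variant; speedup, median_ms and cmp_double are hypothetical names. Taking the median of the six rows ignores the one-off drops (2990, 3880 and 3300 ms), and the conclusion does not change: NEON stays well above 1, ARMv6 DSP stays below 1.

```c
#include <stddef.h>
#include <stdlib.h>

/* Baseline time divided by vectorized time: > 1 is faster, < 1 is slower. */
static double speedup(double orig_ms, double vec_ms)
{
    return orig_ms / vec_ms;
}

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *x, const void *y)
{
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Median of n timings (sorts the array in place). A single bogus sample
   shifts the median far less than it shifts any one raw measurement. */
static double median_ms(double *t, size_t n)
{
    qsort(t, n, sizeof *t, cmp_double);
    return n % 2 ? t[n / 2] : (t[n / 2 - 1] + t[n / 2]) / 2.0;
}
```

With the NEON table, the medians are 4055 ms (orig) and 1450 ms (vec), so the median-based ratio is slightly higher than the printed 2.765517 because the 2990 ms outlier is discarded.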
The attached patch is the incremental diff against the previous
version. Pushed to git.
So the only untested versions now are iWMMXt and MIPS DSP ASE.
And I have a gut feeling the MIPS DSP ASE version will also be slower...
Greetings
Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 03.1-arm.patch
Type: text/x-patch
Size: 11794 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20110501/0f991802/attachment.bin>