[Zlib-devel] crc32 big/little endian
John Bowler
jbowler at frontiernet.net
Fri Apr 23 17:40:51 EDT 2010
From: Joakim Tjernlund
>> What's a "help ptr"?
>
>t0 = tab[0];
>t1 = tab[1];
[etc]
>Then use tx in the CRC macros. This is just a guess, but I do know
>such things helped on PPC with gcc ~3.4.6
They shouldn't, because tab[0] etc. are clearly common sub-expressions. On the contrary, on a machine suffering from register pressure they may hinder performance (because the compiler might feel obliged to actually store t0/t1/t2/t3, though I don't see why a good compiler wouldn't just eliminate them if necessary).
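For concreteness, here is a minimal C sketch of the two forms being compared, assuming zlib-style little-endian BYFOUR tables `tab[4][256]` of 32-bit entries (reflected CRC-32 polynomial 0xEDB88320). The names `tab`, `make_tables`, `step_direct` and `step_helper` are hypothetical and not part of zlib's API. `step_direct` indexes `tab[c][i]` directly; `step_helper` first copies the row addresses into the suggested `t0`..`t3` "help pointers". Since `tab[c]` with a constant `c` is just a fixed address, both functions express the same computation, and an optimizing compiler is free to generate the same code for either.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical zlib-style BYFOUR tables (reflected CRC-32, polynomial
   0xEDB88320): tab[0] is the byte-wise table, and tab[k][n] is tab[k-1][n]
   advanced through one more zero byte. Each row is 256 * 4 = 1024 bytes. */
static uint32_t tab[4][256];

static void make_tables(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        tab[0][n] = c;
    }
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = tab[0][n];
        for (int k = 1; k < 4; k++) {
            c = tab[0][c & 0xff] ^ (c >> 8);
            tab[k][n] = c;
        }
    }
    assert(tab[0][128] == 0xEDB88320u);  /* sanity check on the table */
}

/* One four-byte step written as tab[c][i], with c a constant in each term. */
static uint32_t step_direct(uint32_t c) {
    return tab[3][c & 0xff] ^ tab[2][(c >> 8) & 0xff] ^
           tab[1][(c >> 16) & 0xff] ^ tab[0][c >> 24];
}

/* The same step with the row addresses first copied into "help pointers". */
static uint32_t step_helper(uint32_t c) {
    const uint32_t *t0 = tab[0], *t1 = tab[1], *t2 = tab[2], *t3 = tab[3];
    return t3[c & 0xff] ^ t2[(c >> 8) & 0xff] ^
           t1[(c >> 16) & 0xff] ^ t0[c >> 24];
}
```

After calling `make_tables()` once, `step_direct(x) == step_helper(x)` for every 32-bit `x`; whether the generated code actually differs can only be seen in the compiler's output for a given target.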
The expressions in question have the form:
tab[c][i]
where 'c' is a constant 0, 1, 2 or 3, 'i' is a byte-sized index, and the tables are 1024 bytes each, so this reduces to byte arithmetic:
(tab + 1024*c + 4*i)
Leaving this explicit in the code doesn't make any assumptions about how it is evaluated, and it is clear that '1024*c' is a constant throughout.
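The reduction to byte arithmetic can be checked directly. This sketch assumes four tables of 256 four-byte (`uint32_t`) entries, which is what makes each table 1024 bytes; `tab`, `offset_via_index` and `offset_explicit` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: four tables of 256 four-byte entries, so each table
   occupies 1024 bytes and the tables are contiguous in memory. */
static uint32_t tab[4][256];

/* Byte offset of tab[c][i] from the start of tab, derived from the indexing. */
static size_t offset_via_index(int c, int i) {
    assert(c >= 0 && c < 4 && i >= 0 && i < 256);
    return (size_t)((const char *)&tab[c][i] - (const char *)tab);
}

/* The same offset spelled out as byte arithmetic: 1024*c + 4*i. */
static size_t offset_explicit(int c, int i) {
    return 1024u * (size_t)c + 4u * (size_t)i;
}
```

For every `c` in 0..3 and `i` in 0..255 the two offsets agree, and `sizeof tab[0]` is 1024. Which instructions actually compute that offset is left entirely to the compiler.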
On ARM, arithmetic can be performed within load instructions, but only in limited combinations. There are a vast number of ways of arranging the apparently simple expression above and, really, only the compiler knows which one is best. Because arithmetic can be moved between real arithmetic (ADD/SUB) instructions and loads, and because all of this affects pipeline stalls, it's best left to the compiler.
What I've seen over the last couple of days is that the BYFOUR change does help; that the crc32 code takes about 4 times as long as the corresponding memset (which is amazing, because memset is highly optimized); and that aligning, even to 32-byte boundaries, has a bigger effect than tweaking loop index variables.
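A rough sketch of what the BYFOUR change does, assuming the reflected CRC-32 polynomial (0xEDB88320) and zlib-style little-endian tables: the byte-wise loop makes one dependent table lookup per byte, while the four-byte loop XORs in a whole word and then does four independent lookups into `tab[0]`..`tab[3]`. The function names are hypothetical. zlib's own BYFOUR code reads whole 32-bit words from the buffer; this sketch instead assembles each word from bytes, so it does not depend on alignment or host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zlib-style BYFOUR tables: tab[k][n] is tab[k-1][n]
   advanced through one more zero byte. */
static uint32_t tab[4][256];

static void make_tables(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        tab[0][n] = c;
    }
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = tab[0][n];
        for (int k = 1; k < 4; k++) {
            c = tab[0][c & 0xff] ^ (c >> 8);
            tab[k][n] = c;
        }
    }
    assert(tab[0][128] == 0xEDB88320u);  /* sanity check on the table */
}

/* Reference version: one table lookup per byte. */
static uint32_t crc32_bytewise(uint32_t crc, const unsigned char *buf, size_t len) {
    uint32_t c = crc ^ 0xFFFFFFFFu;
    while (len--)
        c = tab[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}

/* BYFOUR-style version: four bytes per iteration, four independent lookups. */
static uint32_t crc32_by4(uint32_t crc, const unsigned char *buf, size_t len) {
    uint32_t c = crc ^ 0xFFFFFFFFu;
    while (len >= 4) {
        /* Little-endian word built from bytes: works for any alignment. */
        c ^= (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
             ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
        c = tab[3][c & 0xff] ^ tab[2][(c >> 8) & 0xff] ^
            tab[1][(c >> 16) & 0xff] ^ tab[0][c >> 24];
        buf += 4;
        len -= 4;
    }
    while (len--)  /* remaining 0..3 bytes, one at a time */
        c = tab[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}
```

After `make_tables()`, both functions return 0xCBF43926 (the standard CRC-32 check value) for the nine bytes "123456789", and they agree with each other for any buffer, length and starting offset.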
I also tried changing the code to use an end pointer (buf+len) in place of 'len'. I didn't manage to prove anything (other than that it's easy to introduce bugs even in trivial rewrites).
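The two loop styles can be sketched as follows, assuming a byte-wise CRC-32 table (reflected polynomial 0xEDB88320); `tab0`, `make_table`, `crc_len` and `crc_end` are hypothetical names. The only difference is whether the loop is controlled by a count of remaining bytes or by comparing against `buf + len`; the comment marks the kind of off-by-one slip that makes even a trivial rewrite easy to get wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise CRC-32 table (reflected polynomial 0xEDB88320). */
static uint32_t tab0[256];

static void make_table(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        tab0[n] = c;
    }
    assert(tab0[128] == 0xEDB88320u);  /* sanity check on the table */
}

/* Loop controlled by counting 'len' down to zero. */
static uint32_t crc_len(uint32_t crc, const unsigned char *buf, size_t len) {
    uint32_t c = crc ^ 0xFFFFFFFFu;
    while (len--)
        c = tab0[(c ^ *buf++) & 0xff] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}

/* Same loop controlled by an end pointer, buf + len, instead of 'len'. */
static uint32_t crc_end(uint32_t crc, const unsigned char *buf, size_t len) {
    const unsigned char *end = buf + len;
    uint32_t c = crc ^ 0xFFFFFFFFu;
    while (buf < end)  /* writing '<=' here would read one byte past the end */
        c = tab0[(c ^ *buf++) & 0xff] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}
```

Both should give 0xCBF43926 for "123456789", and chaining calls, e.g. `crc_end(crc_len(0, msg, 4), msg + 4, 5)`, gives the same result as a single call over all nine bytes.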
John Bowler <jbowler at acm.org>