[Zlib-devel] [PATCH] deflate.c: identify slide_Pos() for later optimization

Jan Seiffert kaffeemonster at googlemail.com
Tue Jul 24 05:26:20 EDT 2012


John Reiser wrote:
> Modern "multimedia" vectorized hardware instructions can speed deflate().
> For higher-end x86* CPUs the speedup might be 2% to 3% of total CPU time.
> On a slower CPU, or with a compiler plus instruction decoder that suffer
> longer latency after a branch (such as gcc for some PowerPC chips)
> then the improvement might be 5% to 8%.
> 
> The attached patch introduces a new subroutine slide_Pos() in deflate.c
> which identifies the operation that is subject to optimization.
> The opportunity arises when sliding the window.  The vectors head[]
> and prev[] of substring indices are adjusted using saturating subtraction.
> A very good compiler should be able to recognize and vectorize the operation
> from the patched source.  If not, then any compiler which can inline a local
> subroutine should give code which is no worse than the unmodified version.
> A compiler which does not inline slide_Pos might introduce a penalty
> approximately equal to the cost of two internal subroutine calls.
> 
> If there is interest, then I will follow with assembly-language versions
> of slide_Pos for i686/x86_64 (with runtime selection among several variants
> according to actual hardware capabilities), PowerPC AltiVec (compile-time
> selection) and ARM NEON (compile-time selection).
> 


https://github.com/kaffeemonster/zlib/commit/aa30d206d93755c1f5c287b78dd7d165fb0998a8
https://github.com/kaffeemonster/zlib/commit/b42095f4f19fba56e8dc22bfc95b3a73eb7988a8
https://github.com/kaffeemonster/zlib/commit/4745b24f63ec6d32b53bbdf2f9a851169e748c0f
https://github.com/kaffeemonster/zlib/commit/f4336d6c7e25c3168018e77b6c120c184c75c6b9
https://github.com/kaffeemonster/zlib/commit/43131d40c8a456b063e86b9ad590e5cc08830fb6
https://github.com/kaffeemonster/zlib/commit/f228e868bc70342d23acabbf6e85a0dac9382beb
https://github.com/kaffeemonster/zlib/commit/ab6b44cffd59b0ad69b1ae6343d69a2d53e500c8
https://github.com/kaffeemonster/zlib/commit/536f865979818781ef5afc0c2a305747d8c9b2bd
https://github.com/kaffeemonster/zlib/commit/c8366d438ece79dfb98874def0eaa524a619c6a7

And no, most compilers do not recognize that this is a subtraction with saturation.
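
For readers unfamiliar with the operation, here is a minimal sketch of the
idea. It is not taken from the patch or from the commits above. The names
slide_pos_scalar, slide_pos_sse2 and slide_pos are hypothetical, and Pos is
assumed to be a 16-bit unsigned type as in zlib. The scalar loop has the same
shape as the window-slide loop in deflate.c: entries below wsize become 0
(NIL), all others are reduced by wsize. On x86 with SSE2 the same result comes
from an unsigned saturating subtract on eight entries at a time.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Pos;   /* assumption: 16-bit unsigned, like zlib's Pos */

/* Scalar form: the conditional is an unsigned saturating subtraction,
 * i.e. max(p[i] - wsize, 0), written as a compare plus select. */
static void slide_pos_scalar(Pos *p, size_t n, unsigned wsize)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned m = p[i];
        p[i] = (Pos)(m >= wsize ? m - wsize : 0);
    }
}

#if defined(__SSE2__)
#include <emmintrin.h>

/* SSE2 form: _mm_subs_epu16 (PSUBUSW) subtracts with unsigned
 * saturation in eight 16-bit lanes; leftover entries go through
 * the scalar loop. */
static void slide_pos_sse2(Pos *p, size_t n, unsigned wsize)
{
    const __m128i w = _mm_set1_epi16((short)wsize);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        _mm_storeu_si128((__m128i *)(p + i), _mm_subs_epu16(v, w));
    }
    slide_pos_scalar(p + i, n - i, wsize);
}
#endif

/* Compile-time selection only; runtime CPU detection is left out. */
static void slide_pos(Pos *p, size_t n, unsigned wsize)
{
#if defined(__SSE2__)
    slide_pos_sse2(p, n, wsize);
#else
    slide_pos_scalar(p, n, wsize);
#endif
}
```

Calling slide_pos(head, hash_size, wsize) and slide_pos(prev, wsize, wsize)
would then replace the two hand-written loops. The argument names here are
placeholders, not the fields of the actual deflate state.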



Greetings
	Jan

-- 
A UDP packet walks into a



