[Zlib-devel] [7/6][RFC V2.1 Patch] Blackfin implementation

Fri Apr 8 17:32:54 EDT 2011

2011/4/8 Mike Frysinger <vapier at gentoo.org>:
> On Friday, April 08, 2011 15:21:44 Jan Seiffert wrote:
>> 2011/4/8 Mike Frysinger <vapier at gentoo.org>:
[snip]
>> > the core often times is running at 5x or 6x the speed of system devices,
>> > so any external memory i/o is probably going to dominate the stalls.
>>
>> Yes, probably.
>> Unfortunately I don't think the CPU can reorder instructions (esp.
>> loads), so that one could fetch the next 4 bytes while doing the
>> calculation on the last 4 bytes.
>> Or am I wrong?
>
> there is no insn reordering in the Blackfin architecture as doing so explodes
> silicon size ... which is bad for embedded.  it does have an interlocked
> pipeline so that loads/stores are "backgrounded", and the stall doesn't occur
> until the result is actually used (or the result is available in which case
> there is no stall).
>
>        /* A 32bit fetch is put onto the bus and R0 is marked */
>        R0 = [P0];
>        ... do some stuff without R0 in the pipeline ...
>        /* If the load has not yet completed, we stall here */
>        R0 += 1;
>
> the Blackfin PRM explains this bit of magic starting at "Load/Store Operation"
> on page 6-68.

Nice, but, hmmm.
I tried that by moving the loads 4 instructions ahead of their use. It
helped, but not very much (1.7 -> 1.9). For that I had to use another
loop, like I envisioned this afternoon, but after adding all the n--
needed, it's slower than the loop in the patch (1.7 -> 1.5).
Argh, I can only issue an Ireg -= Mreg in parallel, but I can't MAC by an Ireg.

Two MACs, nice, but if you can't feed them...
I must be missing something.
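
To make the "load early, use late" idea concrete, here is a rough sketch in
plain C rather than Blackfin assembly. It is not the loop from the patch.
The names (load_le32, sum_word, sums_simple, sums_load_ahead) are made up
for illustration, the two running sums are an Adler-32-style assumption, and
the modulo reduction is left out to keep it short. The only point is the
ordering: the next word is fetched before the arithmetic on the current one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit word from 4 bytes, lowest first. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: fold the 4 bytes of w into the running sums.
 * s1 is the byte sum, s2 is the sum of all intermediate s1 values. */
static void sum_word(uint32_t w, uint32_t *s1, uint32_t *s2)
{
    for (int i = 0; i < 4; i++) {
        *s1 += (w >> (8 * i)) & 0xff;
        *s2 += *s1;
    }
}

/* Byte-wise reference loop, no load scheduling at all. */
void sums_simple(const uint8_t *buf, size_t len, uint32_t *s1, uint32_t *s2)
{
    for (size_t i = 0; i < len; i++) {
        *s1 += buf[i];
        *s2 += *s1;
    }
}

/* Same result, but the load for the next word is issued before the
 * arithmetic on the current word, so on an interlocked pipeline the
 * fetch can overlap with the calculation. Assumes len is a multiple of 4. */
void sums_load_ahead(const uint8_t *buf, size_t len, uint32_t *s1, uint32_t *s2)
{
    assert(len % 4 == 0);
    if (len == 0)
        return;

    uint32_t cur = load_le32(buf);              /* prime the first word */
    for (size_t i = 4; i < len; i += 4) {
        uint32_t next = load_le32(buf + i);     /* start the next fetch */
        sum_word(cur, s1, s2);                  /* work on the old word */
        cur = next;
    }
    sum_word(cur, s1, s2);                      /* drain the last word */
}
```

Whether a compiler keeps that ordering, and whether it gains anything on the
real hardware, is a separate question. The extra prime and drain steps are
the kind of loop overhead that can eat the gain again.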

> -mike
>

Greetings
Jan

-- 
Murphy's Law of Combat
Rule #3: "Never forget that your weapon was manufactured by the
lowest bidder"