Re: [MiNT] C bit
> > > Compiler output (-m68030 -O3):
> > >
> > > _calc_load_average:
> > > ...
> > > move.l _uptime,a0
> > > lea $0001(a0),a1
> > > move.l a1,_uptime
> > > add.w #$00c8,_uptimetick
> > > move.l a0,d1 /* notice: this is the UNUPDATED value of _uptime!!!! */
> > > addq.l #$01,d1 /* how sucky: now they update it again, while this value is already in a1!!! */
> > > move.l d1,d2
> > > mulu.l #$cccccccd,d0-d2
> > > lsr.l #$02,d0
> > > move.l d0,a1
> > > lea (a1,d0.l*4),a1
> > > move.l a1,d0
> > > cmp.l d1,d0
> > > bne.s L459
> > > ...
> > >
> > > Comments?
> > >
> > Yeah:
> ...
> > NB!! To make things worse there is redundancy. Notice how they first
> > add 1 to uptime and then do the same thing all over again.
>
> The optimization of the modulo is probably done by itself. Anyway, this
> inefficiency only costs two cycles.
>
> > It seems this code is quite long and the GNU dudes obviously did
> > their best to get rid of the costly divu.l dn:dn instruction. Very smart
>
> I'd guess that this optimization is done in the hardware independent part
> of GCC and that code may assume a larger difference between MUL and DIV
> timings than the '030 has.
>
Perhaps I am missing something, but IMHO the remainder is always smaller
than the divisor (and never needs a bigger size). So to calculate the
remainder of a division by the constant 5, the "normal" divu.w can be used,
which is 2 cycles faster than the mulu.l used (including the time for
fetching operands, Johan, not only the time for the opcode itself).
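
For reference, here is a minimal C sketch of what the quoted mulu.l/lsr.l/lea
sequence appears to compute. It assumes the original source was a test along
the lines of "uptime % 5 == 0", which is only my reading of the listing. The
function names div5_by_mul and is_multiple_of_5 are made up for illustration
and do not come from the MiNT sources.

```c
#include <stdint.h>

/*
 * Quotient n / 5 without a divide instruction.
 * 0xCCCCCCCD is ceil(2^34 / 5), so the top bits of the 64-bit product
 * n * 0xCCCCCCCD hold n / 5 for every 32-bit n.
 * Corresponds to: mulu.l #$cccccccd (64-bit result), then lsr.l #2
 * on the high longword.
 */
static uint32_t div5_by_mul(uint32_t n)
{
    uint64_t product = (uint64_t)n * 0xCCCCCCCDu;
    uint32_t high = (uint32_t)(product >> 32); /* high longword of the product */
    return high >> 2;                          /* total shift: 34 bits */
}

/*
 * Test n % 5 == 0 by rebuilding q * 5 as q + q * 4 and comparing with n.
 * Corresponds to: lea (a1,d0.l*4),a1 followed by cmp.l / bne.s.
 */
static int is_multiple_of_5(uint32_t n)
{
    uint32_t q = div5_by_mul(n);
    return q + q * 4 == n;
}
```

Calling is_multiple_of_5(uptime) would then stand in for the comparison that
ends in the bne.s above.
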
> > though I ask myself if they really achieved something here. This is a
> > typical generic kind of optimisation I expect from a compiler and can hardly
> > be called efficient. In fact the code grows quite large.
>
> If it's efficient or not depends on how long the code actually takes to
> execute, of course. Code size isn't very often a problem nowadays, but it
> would of course be bad if it caused the cache to be badly utilized.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Indeed.
> MULU.L never takes more than 44 cycles on the '030.
44 cycles plus "fetch immediate effective address time", which is 2 cycles
in this case.
Gtx,
--
Konrad M.Kokoszkiewicz
|mail: draco@atari.org | Atari Falcon030 user |
|http://www.obta.uw.edu.pl/~draco/ | Moderator gregis LATINE |
|http://draco.atari.org | (loquentium) |
** Ea natura multitudinis est,
** aut servit humiliter, aut superbe dominatur (Liv. XXIV,25)
*************************************************************
** Such is the nature of the crowd:
** it either serves humbly or lords it arrogantly.