
Re: [MiNT] C bit



> > > Compiler output (-m68030 -O3):
> > > 
> > > _calc_load_average:
> > > 	...
> > > 	move.l	_uptime,a0 
> > > 	lea	$0001(a0),a1
> > > 	move.l	a1,_uptime
> > > 	add.w	#$00c8,_uptimetick
> > > 	move.l	a0,d1 /* notice: this is the UNUPDATED value of _uptime!!!! */
> > > 	addq.l	#$01,d1 /* how sucky: now they update it again, while this value is already in a1!!!*/
> > > 	move.l	d1,d2
> > > 	mulu.l	#$cccccccd,d0-d2
> > > 	lsr.l	#$02,d0
> > > 	move.l	d0,a1
> > > 	lea	(a1,d0.l*4),a1
> > > 	move.l	a1,d0
> > > 	cmp.l	d1,d0
> > > 	bne.s	L459
> > > 	...
> > > 
> > > Comments?
> > > 
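To make the listing easier to follow, here is a rough C mirror of it, one statement per instruction (or pair of instructions). The kernel source is not quoted in the thread, so the variable types (a 32-bit _uptime, a 16-bit _uptimetick) are assumptions read off the move.l/add.w sizes, and tick_and_test() is a hypothetical name. The central step is that multiplying by $cccccccd, keeping the high longword and shifting it right by 2 (34 bits in total) gives x / 5 for every 32-bit x, so comparing 5*q with the value tests uptime % 5 == 0 without a divide.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel globals; sizes follow the
   move.l / add.w in the listing, the real declarations are not shown. */
static uint32_t uptime;
static uint16_t uptimetick;

/* Mirrors the quoted listing instruction by instruction. Returns 1 when
   the bne.s to L459 would NOT be taken, i.e. when the new uptime % 5 == 0. */
static int tick_and_test(void)
{
    uint32_t a0 = uptime;             /* move.l  _uptime,a0             */
    uint32_t a1 = a0 + 1;             /* lea     $0001(a0),a1           */
    uptime = a1;                      /* move.l  a1,_uptime             */
    uptimetick += 200;                /* add.w   #$00c8,_uptimetick     */
    uint32_t d1 = a0 + 1;             /* move.l a0,d1 / addq.l #$01,d1  */

    /* mulu.l #$cccccccd: 32x32 -> 64-bit product; the high longword
       ends up in d0 (the low longword, in d2, is not used in the
       quoted part of the listing). */
    uint32_t d0 = (uint32_t)(((uint64_t)d1 * 0xCCCCCCCDu) >> 32);
    d0 >>= 2;                         /* lsr.l #$02,d0: now d0 == d1 / 5 */

    a1 = d0 + d0 * 4;                 /* move.l d0,a1 / lea (a1,d0.l*4),a1 */
    return a1 == d1;                  /* move.l a1,d0 / cmp.l d1,d0     */
}
```

If this reading is right, the branch to L459 is taken whenever the new _uptime is not a multiple of 5, so the code after the bne.s runs on every fifth call.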
> > 	Yeah:
> ...
> > 	NB!! To make things worse there is redundancy. Notice how they first
> > add 1 to uptime and then do the same thing all over again.
> 
> The modulo is probably optimized on its own, without regard to the
> increment just before it. Anyway, this inefficiency only costs two cycles.
> 
> > 	It seems this code is quite long and the GNU dudes obviously did
> > their best to get rid of the costly divu.l dn:dn instruction. Very smart
> 
> I'd guess that this optimization is done in the hardware-independent part
> of GCC, and that this code may assume a larger difference between MUL and
> DIV timings than the '030 has.
>

Perhaps I am missing something, but IMHO the remainder is never larger
than the divisor (nor does it need a wider register). So to calculate the
remainder of a division by the constant 5, the "normal" divu.w can be used,
which is 2 cycles faster than the mulu.l used (including the time for
fetching operands, Johan, not only the time for the opcode itself).
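One detail worth checking before swapping in divu.w: the remainder does fit in 16 bits, but DIVU.W (32-bit dividend, 16-bit divisor) also has to store a 16-bit quotient in the low word of the destination register, with the remainder in the high word. If the quotient exceeds $ffff, the V flag is set and the register is left unchanged, so no remainder is delivered at all. With a divisor of 5 that happens once _uptime reaches 5 * 65536 = 327680; if _uptime counts seconds (an assumption, suggested by the 200-tick increment), that is after about 3.8 days. The sketch below models the instruction in C; divu_w() and uptime_mod5_divuw() are hypothetical names, and the divide-by-zero trap is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the destination data register and the V flag after DIVU.W. */
typedef struct {
    uint32_t reg;      /* remainder in the upper word, quotient in the lower */
    bool overflow;     /* V flag: quotient did not fit in 16 bits */
} divuw_result;

/* Sketch of DIVU.W <ea>,Dn: 32-bit dividend / 16-bit divisor.
   On overflow V is set and Dn keeps its old value.
   The caller must pass a nonzero divisor (the CPU would trap). */
static divuw_result divu_w(uint32_t dividend, uint16_t divisor)
{
    divuw_result r = { dividend, false };
    uint32_t q = dividend / divisor;
    uint32_t rem = dividend % divisor;

    if (q > 0xFFFFu) {
        r.overflow = true;             /* register left unchanged */
        return r;
    }
    r.reg = (rem << 16) | q;           /* rem < divisor, so this fits */
    return r;
}

/* Hypothetical helper: uptime % 5 via DIVU.W; returns false when the
   instruction would overflow and therefore yield no remainder. */
static bool uptime_mod5_divuw(uint32_t uptime, uint16_t *rem)
{
    divuw_result r = divu_w(uptime, 5);
    if (r.overflow)
        return false;
    *rem = (uint16_t)(r.reg >> 16);    /* remainder lives in the high word */
    return true;
}
```

In other words, divu.w would only be a drop-in replacement while _uptime stays below 327680.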
 
> > though I wonder whether they really achieved anything here. This is a
> > typical generic kind of optimisation I would expect from a compiler, and it
> > can hardly be called efficient. In fact, the code grows quite large.
> 
> Whether it's efficient or not depends on how long the code actually takes
> to execute, of course. Code size isn't often a problem nowadays, but it
> would of course be bad if it caused the cache to be badly utilized.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Indeed.

> MULU.L never takes more than 44 cycles on the '030.

44 cycles plus "fetch immediate effective address time", which is 2 cycles
in this case.
 
Gtx,

--
Konrad M.Kokoszkiewicz
|mail: draco@atari.org                  |  Atari Falcon030 user   |
|http://www.obta.uw.edu.pl/~draco/      | Moderator gregis LATINE |
|http://draco.atari.org                 |       (loquentium)      |

** Ea natura multitudinis est,
** aut servit humiliter, aut superbe dominatur (Liv. XXIV,25)
*************************************************************
** It is normal for the common people either to serve humbly
** or to lord it over others insolently.