
Re: [MiNT] Re: GCC



> > void myfunc(void)
> 
> Don't you like ANSI C?

I prefer to make things explicit if there's no good reason not to.
You'll find me adding unnecessary casts and parentheses too.

> Here are the generated assembler from the latest gcc. I used the 

That's actually better than I get with egcs 1.1.1 (the code I posted was
from an old article and might have been created by gcc 2.5.8 or something
like that)!
GCC seems to have caught up with Lattice when it comes to making use
of the m68k instruction set.  :-)

>  * -O3

You know that this will auto-inline functions, I hope?

>  * -fomit-frame-pointer

Always a good one to add.

>  * -fno-defer-pop

Uhm, that will actually slow the code down since you'll get extra adds
to register a7. Might make it somewhat easier to debug by hand, though.

>  * -fexpensive-optimizations
>  * -fstrength-reduce

Both of those are included by default even with -O2 according to all
gcc documentation I've ever seen.
The only difference between -O3 and -O2 is function inlining according
to my docs.

Naturally, for a program such as this, enabling loop unrolling (using
-funroll-loops) will really help the speed too.
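
As a rough illustration of what unrolling amounts to, here is a hand-written
sketch (not compiler output). The function names are made up, and the count
of 25 is an assumption taken from the compare against 24 in the listings:

```c
/* Hypothetical example: add x to 25 longwords, plain and unrolled by 5. */
#define ROW_LEN 25              /* assumed from the "cmpl ... 24 / jge" test */

void add_simple(long *row, long x)
{
    int i;

    for (i = 0; i < ROW_LEN; i++)
        row[i] += x;
}

void add_unrolled(long *row, long x)
{
    int i;

    /* 25 is divisible by 5, so no clean-up loop is needed here. */
    for (i = 0; i < ROW_LEN; i += 5) {
        row[i]     += x;
        row[i + 1] += x;
        row[i + 2] += x;
        row[i + 3] += x;
        row[i + 4] += x;
    }
}
```

The unrolled version does the loop test and branch 5 times instead of 25,
at the cost of larger code.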


Now on to the code:

L5:
        clrl a0@(-2500)		; clrl a0@ with the second compiler
        subql #4,a0
        dbra d2,L5
        clrw d2
        subql #1,d2
        jcc L5

A pity the compiler didn't realize that there's no need for the last
two lines (since d2 is set to a constant number < 65536). Of course,
it doesn't matter when it comes to performance.
It's nice to see that the latest compiler removes that silly offset,
but neither one realizes that it can use pre-decrement (or post-increment,
which would be the most obvious choice).
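
For reference, this is the kind of C that maps most directly onto the
post-increment addressing mode. It is only a sketch: the function name is
hypothetical, and whether a given GCC version actually emits 'clrl a0@+'
for it is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original program: clear n longwords. */
void clear_longs(long *p, size_t n)
{
    while (n--)
        *p++ = 0;       /* store, then step the pointer: clrl a0@+ in spirit */
}
```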

L14:
        addl d3,a1@(d0:l)	; 3 instructions with the second compiler
        addql #4,d0		;  but then this isn't needed instead
        addql #1,d1
        moveq #24,d5
        cmpl d1,d5
        jge L14

So at least one version of GCC realizes that only two instructions are
needed to do the actual work (I'm not sure, though, if the sequence above
would actually be faster on all/any of the processors).
The '24' is still loaded in every iteration of the loop for some reason.
I don't know why the compilers don't realize that neither d1 nor d5 is
used in the loop. The strength reduction should tell it that it can
remove them completely or use for example a dbra.
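
To make the point concrete, here is the transformation done by hand in C.
This is a sketch, not the original source: both function names are
hypothetical, and the count of 25 is read off the compare against 24.

```c
/* Indexed form: needs an index, a bound, and a compare per iteration. */
void add_indexed(long *row, long x)
{
    int i;

    for (i = 0; i <= 24; i++)
        row[i] += x;
}

/* Count-down form: one pointer and one counter tested against zero,
 * which is the shape a dbra-style loop wants. */
void add_countdown(long *row, long x)
{
    int n = 25;

    do {
        *row++ += x;
    } while (--n);
}
```

Both functions touch the same 25 longwords; the second just has no index
or constant left to keep in a register.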

L10:
        clrl d1			; The first compiler only needed
        movel d4,a0		;  two instructions here...
        addl d3,a0
        .even
...
        lea a1@(625),a1		; ...and five here.
        moveq #100,d5
        addl d5,d3
        addql #1,d2
        moveq #24,d5
        cmpl d2,d5
        jge L10

The first compiler recognizes that the '24' is reused here (which is a bit
strange since it didn't realize it in the loop itself, where it's much
more important). The second one, however, has decided to use register
d5 as some kind of 'general constant register', putting the number '100'
in there, and then of course it also needs to reload the '24'. Very silly
since there are still unused registers available if you really must put
the constant in one before comparing with it.
The same strength reduction argument as above applies here too.


So, the first compiler is smarter about some things and the second about
others. What we need is some kind of combination...

-- 
  Chalmers University   | Why are these |  e-mail:   rand@cd.chalmers.se
     of Technology      |  .signatures  |            johan@rand.thn.htu.se
                        | so hard to do |  WWW/ftp:  rand.thn.htu.se
   Gothenburg, Sweden   |     well?     |            (MGIFv5, QLem, BAD MOOD)