convention is very efficient and simple to implement on these processors. Microsoft has used the stdcall calling convention everywhere in the Win32 API: the caller pushes the arguments, calls a function from a system DLL, and that's all. The function removes its own arguments before it returns.
Correct me if I'm wrong, but we do have such an instruction as well (removed in 68060), and it's called RTD (ReTurn and Deallocate). This is the main reason the question occurred to me: the trap handler could easily check the function number (as it does right now), execute the subroutine, and return with RTD #size, where size would be read from a lookup table indexed by function number.
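To make the lookup-table idea concrete, here is a rough sketch in C rather than 68k assembly. It only simulates the stack-pointer bookkeeping: the function number sits on top of the stack with its arguments below it, and the dispatcher drops them all after the call. Everything in it is made up for illustration (the function numbers, fn_add, fn_neg, the table layout), and it ignores the exception frame and the user/supervisor stack split that a real trap handler has to deal with. Also note that RTD takes its size as an immediate operand, so a table-driven size would really mean adjusting the stack pointer by hand, which is what the sketch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated stack of 16-bit words, growing downward like the 68k stack. */
#define STACK_WORDS 64
static uint16_t stack_mem[STACK_WORDS];
static size_t sp = STACK_WORDS;  /* index of the top word; STACK_WORDS means empty */

static void push_word(uint16_t w) {
    assert(sp > 0);              /* no room left on the simulated stack */
    stack_mem[--sp] = w;
}

static size_t stack_depth(void) {
    return STACK_WORDS - sp;     /* number of words currently pushed */
}

/* Hypothetical system functions; args[0] is the first argument. */
enum { FN_ADD = 0, FN_NEG = 1, FN_COUNT };

static uint16_t fn_add(const uint16_t *args) {
    return (uint16_t)(args[0] + args[1]);
}

static uint16_t fn_neg(const uint16_t *args) {
    return (uint16_t)(0u - args[0]);
}

typedef uint16_t (*handler_fn)(const uint16_t *args);

/* Lookup table: handler plus the number of argument words it consumes. */
struct entry {
    handler_fn handler;
    size_t arg_words;
};

static const struct entry table[FN_COUNT] = {
    [FN_ADD] = { fn_add, 2 },
    [FN_NEG] = { fn_neg, 1 },
};

/* Stands in for the trap handler: read the function number from the top of
 * the stack, call the handler, then remove the function number and the
 * arguments so the caller has nothing to clean up. */
static uint16_t trap_dispatch(void) {
    assert(sp < STACK_WORDS);                      /* function number present */
    uint16_t fn = stack_mem[sp];
    assert(fn < FN_COUNT);
    const struct entry *e = &table[fn];
    assert(sp + 1 + e->arg_words <= STACK_WORDS);  /* all arguments present */
    uint16_t result = e->handler(&stack_mem[sp + 1]);
    sp += 1 + e->arg_words;                        /* callee-side cleanup */
    return result;
}
```

A caller would push the arguments in reverse order, then the function number, and call trap_dispatch(); afterwards stack_depth() is back to what it was before the pushes, with no caller-side correction.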
NB: On Atari, if your stack is big enough, you don't have to clean it up after each system call. You can make a big cleanup at the end... or never.
Interesting idea :)