
Re: pipes & ptys



itschere@TechFak.Uni-Bielefeld.DE writes:
> Hi everywhere,
> 
>   After having played a bit with the VDI version of MGR - and wondering
> about why it is so slow when more than one window is open - I've hacked
> a small experimental version of a window manager. Just one window which
> can't be resized or removed, just to see what speed _can_ be achieved.
> 
>   Since some says it's so much optimized that I doubt I can get graphics
               days? :)
> output any faster ;-) But still it is not very much faster than the MGR.
> 
>   When trying to track down the problem, I stumbled over the pipes through
> which data gets passed. Normally they're reasonably fast, say, fast enough.
> But if I open a pipe as a pseudo tty and use 1-byte reads/writes - which
> looks normal for shell i/o to me

 hmm user processes doing 1-byte read()/write()s _is_ slow. :(  system
calls are expensive...  (and the library adds its share too...)

 for example, take the difference between curses+mintlibs and curses+gnulib.
i think what makes gnulib-linked curses programs look so sloow is
only that gnulib's console write() turns everything into 1-byte writes.
curses+mintlib is already faster with MiNT than without... (at least
for me on /dev/fasttext or /dev/vt* :-)
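
 a minimal sketch of the point above, assuming a POSIX-style write():
it counts how many system calls it takes to push the same buffer out
one byte at a time versus in one go.  write_bytewise and write_chunked
are hypothetical helper names made up for this illustration, they are
not taken from gnulib or mintlib.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write n bytes one at a time, the way an
 * unbuffered console write() in a library might.  Returns the number
 * of write() calls made, or -1 on error. */
static long write_bytewise(int fd, const char *buf, size_t n)
{
    long calls = 0;
    for (size_t i = 0; i < n; i++) {
        if (write(fd, buf + i, 1) != 1)
            return -1;
        calls++;
    }
    return calls;
}

/* Hypothetical helper: hand the whole buffer to write() and only loop
 * if the kernel accepts less than requested.  Returns the number of
 * write() calls made, or -1 on error. */
static long write_chunked(int fd, const char *buf, size_t n)
{
    long calls = 0;
    size_t done = 0;
    while (done < n) {
        ssize_t r = write(fd, buf + done, n - done);
        if (r <= 0)
            return -1;
        done += (size_t)r;
        calls++;
    }
    return calls;
}
```

 every call in the first loop pays the full system call overhead for a
single byte, so the cost grows with the byte count rather than with the
number of logical writes.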

>  - I can't even get 1K/sec through it, and
> all this on a TT. Is there really so much overhead in the tty routines in
> tty.c? Mightn't this perhaps be worth improving, since not just MGR or
> my tricky window manager suffer from this bottleneck, but everything else
> which works with pseudo ttys also...
> 
>   Would anybody like to start a discussion about that?

 not sure how much can be done about 1-byte IO (i fear the library
and system call overhead are the biggest part already...)  i think the
first thing to do would be to change gnulib and all your other programs
that still do it to avoid 1-byte IO where possible. (might need a few
if (__mint) and Fcntls on ttys but the result should be worth it...)
and then improve tty.c & friends so that long reads/writes can get _really_
efficient.  currently the problem is they do a lot of device-level
1-char IO and shuffling bytes into 32-bit `chars' and back... i.e. when a
pty slave writes 1k that gets expanded into 4k, and when the master then
does a 1k read it's collapsed again.  and what's worse, pipefs' read then
gets called 1000 times instead of once!  that is what makes ptys
slow... (and serial ports too, but i said that before. :)
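
 to make the expand/collapse cost concrete, here is a small model of it.
this is only a sketch and not the actual tty.c or pipefs code: the
function names, the cell_reads counter and the use of uint32_t for the
32-bit `chars' are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts per-character reads, standing in for the device read that
 * gets called once per character in this model. */
static unsigned long cell_reads;

/* Slave side: expand n bytes into n 32-bit cells.
 * Returns the number of bytes the cells occupy in the queue. */
static size_t expand_to_cells(const unsigned char *src, size_t n,
                              uint32_t *cells)
{
    for (size_t i = 0; i < n; i++)
        cells[i] = src[i];
    return n * sizeof cells[0];
}

/* One "device read" of a single cell. */
static uint32_t read_one_cell(const uint32_t *cells, size_t i)
{
    cell_reads++;
    return cells[i];
}

/* Master side: collapse n cells back into n bytes, one cell per call.
 * Returns the number of bytes produced. */
static size_t collapse_from_cells(const uint32_t *cells, size_t n,
                                  unsigned char *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)read_one_cell(cells, i);
    return n;
}
```

 in this model a 1024-byte write occupies 4096 bytes in the queue and
costs 1024 per-cell reads to get back out, which is the overhead
described above.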

 now how to fix this (i mean _really_ fix this. so that a 1k read ends up
as one 1k device read whenever possible, without additional moving data
around) and stay compatible with existing devices?  here is an idea...

1. add 2 optional functions to the DEVDRV struct, for now i call them bread
and bwrite. (NULL means they are not there)  they work like device read
and write, only with bytes instead of longs.  (btw there are 3 longs
reserved in DEVDRV now and i can think of at least 2 more functions to
add later, readv and writev...  so extend the struct somehow?)

2. if bwrite is there use it instead of write in tty_write (also in
bflush, midiws...), if the write is RAW just check for job control and
return (*f->dev->bwrite)(f, buf, nbytes);
 and if bread is there do the same at least for RAW reads in tty_read.
(this includes reading pty masters...)

3. add bread and bwrite functions to the pty device.  the slave's output
pipe can then be changed to use bytes directly. (i think.  the other
direction of course not...)

4. add support for CLOCAL, HUPCL, VMIN etc and make _real_ modem devices...
with bread/bwrite they finally could get decent thruput without 99% CPU load.
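
 steps 1 and 2 could look roughly like the sketch below.  it is a
simplified model and not the real MiNT DEVDRV or FILEPTR: the
struct devdrv_sketch and struct fileptr_sketch types, the dev_calls and
bytes_out counters and the cell_write/byte_write drivers are all
hypothetical, and the job control check is left out.  only the names
bwrite and tty_write and the dispatch expression come from the proposal.

```c
#include <assert.h>
#include <stddef.h>

struct fileptr_sketch;

/* Hypothetical, cut-down driver table. */
struct devdrv_sketch {
    /* existing style: buf holds one long per character */
    long (*write)(struct fileptr_sketch *f, const char *buf, long nbytes);
    /* proposed, optional: buf holds plain bytes; NULL if not there */
    long (*bwrite)(struct fileptr_sketch *f, const char *buf, long nbytes);
};

struct fileptr_sketch {
    const struct devdrv_sketch *dev;
    long dev_calls;  /* instrumentation: driver calls made */
    long bytes_out;  /* instrumentation: characters delivered */
};

/* Old-style driver write: nbytes counts bytes of long-sized cells. */
static long cell_write(struct fileptr_sketch *f, const char *buf, long nbytes)
{
    (void)buf;
    f->dev_calls++;
    f->bytes_out += nbytes / (long)sizeof(long);
    return nbytes;
}

/* Proposed byte-oriented driver write. */
static long byte_write(struct fileptr_sketch *f, const char *buf, long nbytes)
{
    (void)buf;
    f->dev_calls++;
    f->bytes_out += nbytes;
    return nbytes;
}

/* RAW write path: one bwrite call if the driver has it, otherwise
 * fall back to expanding each byte into a long, one call per char. */
static long tty_write_raw(struct fileptr_sketch *f, const char *buf, long nbytes)
{
    if (f->dev->bwrite)
        return (*f->dev->bwrite)(f, buf, nbytes);

    long done = 0;
    while (done < nbytes) {
        long cell = (unsigned char)buf[done];
        if ((*f->dev->write)(f, (const char *)&cell, (long)sizeof cell)
                != (long)sizeof cell)
            break;
        done++;
    }
    return done;
}
```

 because a NULL bwrite just selects the old path, drivers that know
nothing about the new entries would keep working unchanged in this
model, while a driver that provides bwrite sees a 1k write as one call.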

 this is just an idea i got and i have no idea if and when i could do
all this... :)  but what do you think?

 cheers
	Juergen
-- 
J"urgen Lock / nox@jelal.north.de / UUCP: ..!uunet!unido!uniol!jelal!nox
								...ohne Gewehr
PGP public key fingerprint =  8A 18 58 54 03 7B FC 12  1F 8B 63 C7 19 27 CF DA