
Re: [MiNT] State of the Union



Hi guys,

Forgive me for responding to two people at once, but I think it reads better, even though it's really long, and I apologize for that as well.  I'm going to end up being the black sheep again, I can tell.  I'm very opinionated in some ways, so please forgive me if I rub anyone the wrong way.  However, I do at least attempt to explain every possible reason for my thinking, in detail.  Maybe I can get someone to think in another direction, or you can do the same for me.  And even if you skip the rant in the middle, please read the end.

Frank Naumann wrote:
Ah, I see. Personally I wanted to go into the kqueue() direction as 
introduced and used by the BSD systems. This solution looks very easy to 
use and is extremely flexible and extensible. It's basically a 
file descriptor and you add a list of events (determined by id and a 
filter) you want to wait for.

epoll() is about the same, only for Linux.
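
For anyone who hasn't seen it, here is a rough sketch of the kqueue() pattern Frank is describing.  This is the BSD API, not anything MiNT provides today, and the names sockfd and conn_state are just placeholders:

/* Sketch of the kqueue() pattern: one descriptor for the whole event set,
 * events identified by (ident, filter), registered once, with a user pointer
 * handed back when the event fires.  Illustrative only. */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stddef.h>

static void watch_and_wait(int kq, int sockfd, void *conn_state)
{
    /* kq comes from kqueue(): a single file descriptor representing the set */
    struct kevent change;
    struct kevent events[64];
    int n, i;

    /* Register interest once: ident = sockfd, filter = readable,
     * udata = pointer handed straight back when the event fires. */
    EV_SET(&change, sockfd, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, conn_state);
    kevent(kq, &change, 1, NULL, 0, NULL);

    /* Wait; each returned kevent carries the descriptor and our pointer. */
    n = kevent(kq, NULL, 0, events, 64, NULL);
    for (i = 0; i < n; i++) {
        /* events[i].ident is the descriptor, events[i].udata is conn_state */
    }
}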

Howard Chu said:
Requiring use of Fcntl() or any other system call to alter the set of
events imposes too much overhead. On servers where the set of
descriptors of interest changes frequently, this kind of approach is a
serious loser.

Well, most of the studies I've seen, and all the benchmarks and test applications, show that kqueue() and epoll() perform significantly better than select() and poll() when you have large numbers of file descriptors, which is the exact opposite of what you claim.  Perhaps you should find a design where you aren't constantly changing the set of descriptors!  I believe kqueue() and epoll() both have some quirks, but they perform similarly, and they handle more file descriptors than any MiNT program is likely to see, i.e. this won't be the bottleneck of the application.  Also, under Linux the trip across the user/kernel barrier costs next to nothing.  On Solaris, MiNT, and many other OSes an extra OS call is a slight issue, but perhaps not as much of one as the alternative!

The problem, as I see it, is that if you notify the OS of what file handles you want, you are indeed taking a slight performance hit for the extra OS call, but you are only going to be doing this once per new file handle.  The most common case where this is critical is in web servers, so this is done when a new socket is opened, and another call is done when it's closed.  The OS knows exactly what files it's watching the rest of the time.  This information stays static between calls.
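
To make that concrete, here is roughly what the register-once pattern looks like with Linux epoll().  The function names are made up for illustration; the point is that epoll_ctl() is called once when the handle appears and once when it goes away:

/* Sketch of the register-once pattern with Linux epoll(). */
#include <sys/epoll.h>
#include <unistd.h>

/* Called once when a new socket is accepted. */
static int watch_connection(int epfd, int fd, void *conn)
{
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.ptr = conn;                  /* our per-connection state */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);   /* one call per new handle */
}

/* Called once when the connection goes away. */
static int unwatch_connection(int epfd, int fd)
{
    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);          /* one call per close */
    return close(fd);
}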

By comparison, something like select() can have a different set of file descriptors for each and every call to select().  This means lots of internal housekeeping by the kernel for every select() call.  If you have 30 applications that call select(), you aren't going to check 30 bitmaps to see whether each application is waiting on an event when it happens.  When the IO comes in, you need to know exactly which application(s) to wake up.  I don't see the kernel parsing a bitmap of 8000 bits x 30 applications to set this up on each and every call to select().
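
For comparison, a bare-bones select() loop looks something like the sketch below.  Notice that the whole bitmap has to be rebuilt, handed to the kernel, and then scanned again on every single iteration:

/* Sketch of why select() forces per-call housekeeping: the fd_set is a
 * value-result argument, so the full set is rebuilt and re-scanned each loop. */
#include <sys/select.h>

static void select_loop(const int *fds, int nfds)
{
    for (;;) {
        fd_set readable;
        int i, maxfd = -1;

        FD_ZERO(&readable);
        for (i = 0; i < nfds; i++) {        /* rebuild the bitmap every time... */
            FD_SET(fds[i], &readable);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
            break;

        for (i = 0; i < nfds; i++)          /* ...and scan it again afterwards */
            if (FD_ISSET(fds[i], &readable))
                ;                           /* handle fds[i] */
    }
}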

With kqueue() and epoll(), this relationship is set up once, and I think that is where you get the scalability.  Making this work well in userspace isn't difficult at all, because the interface supports a few user-supplied data fields that let you reach the data structures you need immediately, without having to check a bitmask; at least that's true of epoll, as I don't use kqueue much (I use event libraries that abstract away the difference and allow portability to either system).

Now, for a system with a large number of open descriptors, just parsing that bitmask takes real time, and then you still need to find the actual data that goes with the descriptor.  Keeping a massive table is obviously insane if you want to scale to any GEMDOS descriptor number that might be returned, because you can't guarantee that file handle numbers get reused as connections are created and deleted.  You could have 1201 file descriptors open, and the first 1200 get closed.  Your table would have to be at least 1201 entries even though you are now using only one.  Will you do a search on the file handle to find the information you need?  Sure, you can use a binary tree or whatever, but epoll returns the file handle, and a user-set pointer, right to the user.  The record keeping isn't "passed on to userspace" as you claim.  The kernel does all the work for you!
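
Here is a rough sketch of that difference.  struct connection and by_handle are hypothetical; the point is that with epoll the kernel stores your pointer in the event itself, so the handle-indexed table (or tree) never has to exist:

/* Sketch of the lookup problem.  With a select()-style result you need some
 * handle-keyed structure to get from "handle 1201 is ready" back to your data.
 * With epoll the pointer comes back inside the event. */
#include <sys/epoll.h>

struct connection { int fd; /* buffers, parser state, priority, ... */ };

/* select()-style: table must span the largest handle ever returned,
 * and stays that big even when only one handle is still in use. */
static struct connection *by_handle[8192];

/* epoll-style: no table at all -- the pointer was stored at EPOLL_CTL_ADD time. */
static void on_ready(struct epoll_event *ev)
{
    struct connection *c = ev->data.ptr;   /* straight to our data */
    (void)c;                               /* go to work on it */
}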

Now, an mmap() of a shared region sounds nice, but under MiNT?  I don't see it being a good solution for the Atari, and if you look at Fselect(), you are passing pointers anyway.  GEMDOS can write the data directly to the structure.  Only a pointer is passed.  The same goes for epoll().  There isn't a big copy problem here.  And you can't just change the shared memory region and hope that the kernel will reflect the changes.  It doesn't know you changed something, and it's NOT going to check your table every time an IO event happens.

From the link Howard Chu posted:
Another point where select (and poll) wins is that there is a fast mapping from the input set to the result set - i.e., if you want to know "did event #5 occur?" you can find out in constant time, because it's just a fixed bitfield lookup. For all the other mechanisms that either return events one at a time or in a flat list, you have to iterate thru the list to look for "event #5". They have thus taken the linear search that the kernel does for select and kicked it out into userland, thus perceiving a great savings in kernel CPU time but not really improving life for the application developer. There is an obvious way to solve both of these problems with no additional cost - do both.

select() and poll() never win, unless you are writing your program in reverse.  Maybe you are missing how an event-driven program works.  I don't want to check whether something happened on file descriptor #5.  I NEVER have to iterate through the returned structures looking for descriptor #5.  This is where your logic breaks down, and why the existing stuff doesn't work for you.  Userland isn't doing any linear searches!

All you want to do is know when you have to deal with an event.  When the events come in, you deal with each one, in turn, as fast as they come in.  The kernel gives you a struct that tells you which file handle needs to be worked with (handy), plus a pointer that you gave it earlier.  That pointer should lead to whatever information you need, in a nice struct, to deal with the event that just came in: function pointers, data arrays, state information, priority, whatever.  When you are done with that one, grab the next event.
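
A minimal loop in that style might look like the sketch below.  struct handler and on_event are just illustrative names for the per-connection struct and its callback:

/* Sketch of the dispatch loop: the kernel hands back the pointer we registered,
 * and the struct it points at carries everything needed to handle the event. */
#include <sys/epoll.h>

struct handler {
    int fd;
    void (*on_event)(struct handler *self, unsigned events);
    /* per-connection state lives here too */
};

static void event_loop(int epfd)
{
    struct epoll_event events[64];

    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);
        if (n < 0)
            break;                                   /* or handle EINTR and retry */
        for (int i = 0; i < n; i++) {
            struct handler *h = events[i].data.ptr;  /* pointer we registered */
            h->on_event(h, events[i].events);        /* deal with it, move on */
        }
    }
}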

If you want to do priorities instead of FIFO, then you can easily read the events, grab a priority number from your passed structure, and drop the pointer into the proper priority queue.  You can then service your priority queues directly, or have a separate thread do it (with the right locking).  Having a table of indirect indexes just creates more record keeping.  I'm not exactly sure how this indirection idea lets you do priority ... do you sort the returned list or something so you can handle the lower indexes first?  The sorting would be worse than the pointer copy I outlined above, especially considering that anything at the highest priority would be handled immediately, without any queuing.
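
A rough sketch of that priority variation, again with made-up names (struct conn, MAXPRIO, QDEPTH):

/* Sketch: read the events, pull a priority number from the per-connection
 * struct, and queue the pointer instead of handling it inline. */
#include <sys/epoll.h>

#define MAXPRIO 4
#define QDEPTH  256

struct conn {
    int fd;
    int prio;                           /* 0 = highest priority */
    void (*handle)(struct conn *);
};

static struct conn *queue[MAXPRIO][QDEPTH];
static int qlen[MAXPRIO];

static void collect_and_dispatch(int epfd)
{
    struct epoll_event events[64];
    int n = epoll_wait(epfd, events, 64, -1);

    for (int i = 0; i < n; i++) {
        struct conn *c = events[i].data.ptr;
        if (c->prio == 0)
            c->handle(c);               /* highest priority: no queuing at all */
        else if (qlen[c->prio] < QDEPTH)
            queue[c->prio][qlen[c->prio]++] = c;
    }

    for (int p = 1; p < MAXPRIO; p++)   /* then drain the rest, highest first */
        while (qlen[p] > 0) {
            struct conn *c = queue[p][--qlen[p]];
            c->handle(c);
        }
}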

Again, from the link Howard Chu posted:
This approach completely eliminates any copying of event
records/descriptor lists between user/kernel space, so constructing
event lists and modifying them is essentially zero-cost. Also, since the
array of events remains in place, the application can inspect higher
priority events of interest by direct reference.

When you say that constructing event lists is zero cost, do you actually mean that the kernel works with this memory-mapped data directly?  How does the kernel know when you change something if you don't make an OS call to tell it?  Do you expect the kernel to scan this shared memory region constantly?

Frank Naumann wrote:
> I further suggested that to implement this cleanly, the entire AES could
> be represented as a wrapper library around GEMDOS calls that talk to the
> device driver.  For example, appl_init() might open the device and get
> the file handle, appl_exit() would close it, evnt_multi() would set
> parameters and read from the device file handle and write into the
> supplied return values, etc.

Yes, this can be done. But the question is if it's necessary. For 
backward compatibility you have to provide the old API anyway and if you 
want to use the new features you need to define a new API.

I'm interested in what other ideas you have for solving the "unified event loop" situation.  I've already pitched one idea, but I haven't yet heard specifically where the problems are that need to be redesigned.

For example, should AES events come through a GEMDOS handle (as I mentioned), or should GEMDOS file handles be a special form of AES event (the expanded evnt_multi approach), or should they be on equal ground, both coming from a more flexible interface?

-- Evan