Re: [MiNT] State of the Union
Evan K. Langlois wrote:
Frank Naumann wrote:
Ah, I see. Personally I wanted to go into the kqueue() direction as
introduced and used by the BSD systems. This solution looks very easy to
use and is extremely flexible and extensible. It's basically a
filedescriptor and you add a list of events (determined by id and a
filter) you want to wait for.
epoll() is about the same, only for Linux.
epoll() only works for filedescriptors and is not extensible. kqueue()
works for a variety of other things, such as signals and filesystem
events, and has a well-defined extension mechanism.
Howard Chu said:
Requiring use of Fcntl() or any other system call to alter the set of
events imposes too much overhead. On servers where the set of
descriptors of interest changes frequently, this kind of approach is a
serious loser.
Well, most of the studies I've seen and all the benchmarks and test
applications all show that kqueue() and epoll() perform significantly
better than select() and poll() when you have large numbers of file
descriptors, which is the exact opposite of what you claim.
I've seen those studies too. They had me convinced enough to go
implement epoll() for OpenLDAP. Too bad their results were not
applicable. They only model the behavior of a web server, like Apache.
They are not general enough to represent all the types of client/server
interaction one encounters on the internet.
Perhaps you
should find a method where you aren't constantly changing the set of
descriptors!
Don't be stupid. It is possible to design an event mechanism that
efficiently handles all classes of application load, as I have
described. Not doing so, and forcing all applications to work only one
way, is ridiculous.
I believe kqueue() and epoll() both have some quirks, but they perform
similarly even with more file descriptors than any MiNT program is
likely to see, i.e., this won't be the bottleneck of the application.
I agree that the class of problems any MiNT program is likely to see is
probably different from the problem I was faced with. In that regard,
I'm willing to let most of this discussion drop.
Also, under Linux the trip across the user/kernel barrier is nothing.
Nonsense. I have the benchmarks that prove this statement to be false.
The problem, as I see it, is that if you notify the OS of what file
handles you want, you are indeed taking a slight performance hit for the
extra OS call, but you are only going to be doing this once per new file
handle. The most common case where this is critical is in web servers,
so this is done when a new socket is opened, and another call is done
when it's closed. The OS knows exactly which files it's watching the rest
of the time. This information stays static between calls.
Again, it's not up to you to dictate the connection behavior of a system
/ protocol.
By comparison, something like select() can have a different set of file
descriptors for each and every call to select(). This means lots of
internal housekeeping by the kernel for every select() call. If you
have 30 applications that call select() you aren't going to check 30
bitmaps to see if that application is waiting on that event when it
happens. When the IO comes in, you'll need to know exactly which
application(s) need to be woken up. I can't see the kernel parsing a
bitmap of 8000 bits x 30 applications to set this up on each and every
call to select().
I'm not saying that select is perfect, otherwise I wouldn't have
bothered to write my equeue() description.
Now, an mmap() of a shared region sounds nice, but under MiNT? I
don't see it being a good solution for the Atari, and if you look at
Fselect(), you are passing pointers anyway. GEMDOS can write the data
directly to the structure. Only a pointer is passed. The same with
epoll(). There isn't a big copy problem here. You can't just change
the shared memory region and hope that the kernel will reflect the
changes. It doesn't know you changed something, and it's NOT going to
check your table every time an IO event happens.
You're not paying attention. Here's how it works:
you provide a set of descriptors of interest to the kernel in a table
of event structures.
the kernel creates pointers from its internal file structures that
point back to copies of your event structures.
when an event happens on the descriptor, the kernel checks to see if
the event pointer is set, and if so, it examines the event structure and
performs whatever event wakeup is required.
So far, that's exactly what kqueue/epoll do. Now the added feature in
equeue:
the event structure is directly accessible from user and kernel
context, it is not copied by the kernel. The user can set a MASK bit in
the event structure at any time. If an event happens on a descriptor,
and the kernel checks the event pointer, and sees the MASK bit set, it
ignores the event. Like I said, this is similar to the sigprocmask concept.
Thus, you can change things in the shared memory region and they will
take effect directly without requiring a system call.
From the link Howard Chu posted:
Another point where select (and poll) wins is that there is a fast
mapping from the input set to the result set - i.e., if you want to
know "did event #5 occur?" you can find out in constant time, because
it's just a fixed bitfield lookup. For all the other mechanisms that
either return events one at a time or in a flat list, you have to
iterate thru the list to look for "event #5". They have thus taken the
linear search that the kernel does for select and kicked it out into
userland, thus perceiving a great savings in kernel CPU time but not
really improving life for the application developer. There is an
obvious way to solve both of these problems with no additional cost -
do both.
select() and poll() never win, unless you are writing your program in
reverse. Maybe you are missing how an event-driven program works.
Once again, you're assuming that there's only one way to do things, and
that your way is the right way for all purposes. Anybody who thinks that
is pretty much always wrong.
I don't want to check if something happened on file descriptor
#5. I NEVER have to iterate through the returned structures looking for
descriptor #5. This is where your logic breaks and why the existing
stuff doesn't work for you. Userland isn't doing any linear searches!
Maybe you don't use prioritized events, but other applications do.
If you want to do priorities instead of FIFO, then you can easily read
the events, grab a priority number from your passed structure, and drop
the pointer into the proper priority queue. You can then service your
priority queue directly, or have a separate thread do it (with the right
locking).
Again, that introduces more unnecessary overhead. Your approach requires
a complete roundtrip through the event list to find the priority events
and place them on the queue, then a second roundtrip through the list to
handle the normal events. Such an approach will never scale. This is
exactly what I meant about pushing the linear search overhead into userland.
Having a table of indirect indexes just creates more record
keeping. I'm not exactly sure how this indirection idea allows you to
do priority ... do you sort the returned list or something so you can
handle the lower indexes first?
No. I have an array of event structures, each element in the array
corresponds to a descriptor of interest. My app happens to know that
descriptor #5 is "special", and that it occupies slot #3 in the array
of structures. When the event call returns, I can
immediately check slot #3 and test the flags to see whether an event was
triggered. Once that's handled, I go through the list of offsets
handling each remaining event in turn.
From the link Howard Chu posted:
This approach completely eliminates any copying of event
records/descriptor lists between user/kernel space, so constructing
event lists and modifying them is essentially zero-cost. Also, since the
array of events remains in place, the application can inspect higher
priority events of interest by direct reference.
When you say that constructing event lists is zero cost, do you actually
mean to say that the kernel works with this memory mapped data
directly?
Yes.
How does the kernel know when you change something if you
don't make an OS call to tell it? Do you expect the kernel to scan
this shared memory region constantly?
No, that would be stupid. You perform one system call when you want to
wait on events. As part of that system call, you can also indicate
whether the event table has changed. This is all explained in my equeue
writeup.
It's OK to be opinionated, but better to have well-founded facts on
which those opinions are based. It doesn't sound to me like you've ever
written an application / server that handles thousands of clients and
hundreds of thousands of operations per second. I have, many times.
I'm interested in what other ideas you have on solving the "unified
event loop" situation. I've already pitched one idea, but not yet heard
specifically where the problems are that need to be redesigned.
For example, should AES events come through a GEMDOS handle (as I
mentioned), or should GEMDOS file handles be a special form of AES event
(the expanded evnt_multi approach), or should they be on equal ground,
both coming from a more flexible interface?
Don't be like Sun and forget about signals. epoll() certainly is an
improvement for some class of applications on Linux, but without a
totally integrated kitchen-sink approach like kqueue/equeue you're going
to go through all this churn and still have an unsolved event management
problem.
--
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support