Frank Naumann wrote:
> Hello!
>
>>> 1) a buggy kernel module can crash the kernel in the same way as
>>> any other buggy kernel routine anyway
>>
>> If a module corrupts kernel memory, yes. But does the whole kernel need to be killed when a module makes a wrong kfree()?
>
> Yes. The problem is that you don't know the reason. You even answered this yourself: "... If a module corrupts kernel memory, yes. ..."
>
> So, how about this: a kernel module corrupts memory, for example kmalloc's internal data. On the next call of kfree(), the kmalloc code sees that its data structures are destroyed. What can it do now except halt the kernel?

Nothing. But if a user has AES and vconsoles (let's imagine we'll have vconsoles in the future) and AES does a wrong kfree(), do we have to halt everything?

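To make the two cases in this exchange concrete, here is a minimal sketch of a toy fixed-slot allocator that tells a "wrong kfree()" (double free, or a pointer the allocator never handed out) apart from a destroyed header. It is an illustration only, not the MiNT kmalloc: every name in it (toy_kmalloc, toy_kfree, the magic values, the status codes) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not the real kmalloc. */
#define TOY_MAGIC_LIVE 0xA110CA7EU  /* block is currently handed out */
#define TOY_MAGIC_FREE 0xF4EEB10CU  /* block is available */
#define TOY_SLOTS      8
#define TOY_SLOT_SIZE  64

enum toy_status { TOY_OK, TOY_DOUBLE_FREE, TOY_NOT_OURS, TOY_CORRUPT };

struct toy_block {
    uint32_t magic;                       /* header checked on every free */
    unsigned char payload[TOY_SLOT_SIZE];
};

static struct toy_block toy_pool[TOY_SLOTS];

/* Mark every slot as free; must be called once before use. */
void toy_init(void)
{
    for (size_t i = 0; i < TOY_SLOTS; i++)
        toy_pool[i].magic = TOY_MAGIC_FREE;
}

/* Hand out the first free slot, or NULL if none is left. */
void *toy_kmalloc(void)
{
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (toy_pool[i].magic == TOY_MAGIC_FREE) {
            toy_pool[i].magic = TOY_MAGIC_LIVE;
            return toy_pool[i].payload;
        }
    }
    return NULL;
}

/* Free a slot and report what kind of problem, if any, was seen. */
enum toy_status toy_kfree(void *p)
{
    const size_t off = offsetof(struct toy_block, payload);
    uintptr_t base = (uintptr_t)toy_pool;
    uintptr_t addr = (uintptr_t)p;

    /* Pointer outside the pool: the caller is wrong, our data is intact. */
    if (addr < base + off || addr >= base + sizeof toy_pool)
        return TOY_NOT_OURS;

    /* Pointer inside the pool but not at the start of a payload. */
    size_t rel = (size_t)(addr - base);
    if (rel % sizeof(struct toy_block) != off)
        return TOY_NOT_OURS;

    struct toy_block *b = &toy_pool[rel / sizeof(struct toy_block)];
    if (b->magic == TOY_MAGIC_FREE)
        return TOY_DOUBLE_FREE;   /* caller bug, header still sane */
    if (b->magic != TOY_MAGIC_LIVE)
        return TOY_CORRUPT;       /* header overwritten: cannot be trusted */

    b->magic = TOY_MAGIC_FREE;
    return TOY_OK;
}
```

In this sketch, TOY_DOUBLE_FREE and TOY_NOT_OURS are the kind of result a caller could in principle survive, while TOY_CORRUPT is the case where nothing sensible is left to do. Whether a real kernel can draw that line reliably is exactly what is being argued here.
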
> Additionally, if a kernel module has such a bad failure, you will see it during development. Such bugs don't live very long. So the chance of encountering such a bug from which you can recover is very, very small.

I'm still young (25 actually), but I've never seen an application that had all its bugs spotted at the development/testing stage.

>>> 2) tracking resource usage will be a never ending story at this level
>>> you will for sure overlook tons of dependencies, thus introducing(!) lots of
>>> new bugs instead of removing bugs
>>
>> I thought about it only because we would need to do cleanup when we kill a module, the same thing we do when we kill a process.
>
> I don't think it makes much sense to try to recover from such failures. First, the dependencies will lead into hell. Second, the chance of encountering a bug from which you can recover is very small, as explained above. It's much more realistic that you have a really bad bug from which you can't recover.

What dependencies? Module dependencies? I do not think we'll have many module dependencies.

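For reference, the kind of cleanup under discussion could look roughly like the sketch below: every allocation is tagged with an owner id, so that everything belonging to a killed module can be released in one pass. This is a hypothetical illustration, not existing kernel code; toy_alloc_for, toy_release_all, toy_count_for and the fixed-size table are all assumptions, and it covers only memory. Other kinds of resources and any cross-module references are not handled, which is where the dependency concern comes in.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; memory is the only resource tracked. */
#define TOY_MAX_TRACKED 32

struct toy_track {
    void *ptr;    /* NULL means the table entry is unused */
    int   owner;  /* id of the module that asked for the memory */
};

static struct toy_track toy_table[TOY_MAX_TRACKED];

/* Allocate memory and remember which owner it belongs to. */
void *toy_alloc_for(int owner, size_t size)
{
    for (size_t i = 0; i < TOY_MAX_TRACKED; i++) {
        if (toy_table[i].ptr == NULL) {
            void *p = malloc(size);
            if (p == NULL)
                return NULL;
            toy_table[i].ptr = p;
            toy_table[i].owner = owner;
            return p;
        }
    }
    return NULL;  /* tracking table is full */
}

/* Count the live allocations of one owner. */
int toy_count_for(int owner)
{
    int n = 0;
    for (size_t i = 0; i < TOY_MAX_TRACKED; i++)
        if (toy_table[i].ptr != NULL && toy_table[i].owner == owner)
            n++;
    return n;
}

/* Release everything one owner still holds; returns how many blocks. */
int toy_release_all(int owner)
{
    int freed = 0;
    for (size_t i = 0; i < TOY_MAX_TRACKED; i++) {
        if (toy_table[i].ptr != NULL && toy_table[i].owner == owner) {
            free(toy_table[i].ptr);
            toy_table[i].ptr = NULL;
            freed++;
        }
    }
    return freed;
}
```

Killing a module with id 1 would then end with toy_release_all(1), the same way process exit releases a process's memory.
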
> So why add a lot of new code (with a lot of new bugs) to try to recover from a situation that most likely won't happen? And even if it happens, the new bugs in the new code will stop you :-)
>
>>> 3) such fatal bugs you are able to detect are fixed in a short time
>>> (bugs you can detect are maybe wrong API usage or such simple things)
>>
>> Better safe than sorry.
>
> That's why there are a lot of asserts in the kernel API routines to verify the right usage. If a kernel programmer does something wrong, he will see it on the next test.

Assert is a one-way path. No way to recover. That is proper if the kernel really can't continue, but if there is a way to recover, we should try.

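The difference between the two styles can be sketched as follows. The first routine checks its argument with assert() and never returns on misuse; the second reports the misuse to the caller and marks the offending module, so the caller could decide to unload only that module. All names here (toy_module, toy_set_timeout_strict, toy_set_timeout_checked, TOY_EINVAL) are hypothetical and do not come from the real kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; not the real kernel API. */
#define TOY_EINVAL (-1)

struct toy_module {
    const char *name;
    int faulted;   /* set when the module misused the API */
};

static int toy_timeout_ms = 100;

/* One-way path: a bad argument aborts everything (unless NDEBUG is set). */
void toy_set_timeout_strict(int ms)
{
    assert(ms >= 0);
    toy_timeout_ms = ms;
}

/* Recoverable path: reject the call, flag the module, change nothing. */
int toy_set_timeout_checked(struct toy_module *m, int ms)
{
    if (ms < 0) {
        m->faulted = 1;
        return TOY_EINVAL;
    }
    toy_timeout_ms = ms;
    return 0;
}
```

The checked variant only helps when the misuse is caught before any state has been changed. It does nothing for the corrupted-memory case.
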
>> Personally I trust no code. I just have hope ;). Speaking seriously: the more code executes in kernel space, the more bugs we have in kernel space, because there is no program without bugs.
>
> And you want to add a lot of new code for a very tricky and sensitive recovery mechanism that isn't helpful in most cases.

Better safe than sorry ;)

-- 
Semper Fidelis
Adam Klobukowski
atari@gabo.pl