yield() sucks anyway, so it depends which flavour of suckage you prefer.
From here.
Nikita Youshchenko’s web space
I’ve spent a couple of hours today searching for information on how to make an area allocated in the kernel by vmalloc() accessible to a user-space process.
For an area allocated with kmalloc(), there is remap_pfn_range(), which can be called from the driver’s mmap() method. But for a vmalloc()-allocated area, it very much looked like it was necessary either to walk pgd’s/pmd’s/pte’s by hand, or to handle per-page faults with the vma’s nopage() method …
… fortunately, it is not that bad. There is remap_vmalloc_range() in recent 2.6 kernels, and there are also vmalloc_user() and vmalloc_32_user() helper functions to prepare memory for use with remap_vmalloc_range().
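As a rough illustration of how these helpers fit together, a driver's mmap() method could look something like the sketch below. The demo_* names and the buffer size are hypothetical, and this is kernel-side code: it only builds against kernel headers as part of a module, not as a stand-alone program.

```c
/* Sketch only: kernel-side fragment, builds as part of a module. */
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/vmalloc.h>

#define DEMO_BUF_SIZE (16 * PAGE_SIZE)  /* hypothetical buffer size */

static void *demo_buf;                  /* hypothetical shared buffer */

/* Call once, e.g. from module init; pair with vfree(demo_buf) on exit. */
static int demo_alloc(void)
{
    /* vmalloc_user() returns zeroed memory that is marked as mappable
     * to user space, which remap_vmalloc_range() insists on. */
    demo_buf = vmalloc_user(DEMO_BUF_SIZE);
    return demo_buf ? 0 : -ENOMEM;
}

static int demo_mmap(struct file *filp, struct vm_area_struct *vma)
{
    /* Maps the vmalloc area into the caller's vma, starting at the
     * page offset requested in mmap(); returns -EINVAL on a bad range. */
    return remap_vmalloc_range(vma, demo_buf, vma->vm_pgoff);
}

static const struct file_operations demo_fops = {
    .owner = THIS_MODULE,
    .mmap  = demo_mmap,
};
```

With something like this in place, user space would simply mmap() the device file and see the same pages the kernel writes through demo_buf.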
It was just somewhat tricky to find without prior knowledge.
In past years we used a setup with almost everything (network services, user desktop sessions, chroot environments for different projects, etc.) running on a single server. It looked somewhat cool that our Linux could handle all that.
Of course there have been all sorts of problems. Some were easily fixable, some not. But the worst problem has always been when hardware started to malfunction. Our “server of everything” has run on different hardware, mostly PC-style. Some hardware worked more stably, some less, but several times during those years we faced situations where the server crashed every few hours and we had a hard time finding out what was wrong.
Things changed a couple of years ago. At last our lab found a way to obtain modern server hardware. On those servers, we deployed a set of Xen-based virtual machines and distributed services among those. There have been zero hardware problems and just a few software issues over two years. No crashes. No hangs. It looked wonderful compared to what was before.
… until we decided to move the servers to a server room located in another part of our building, to reduce noise in the lab.
After the move, reconnection and initial configuration, things at first seemed to work. But during the night one server suddenly rebooted for no visible reason.
The next workday, another server rebooted while running a handful of user sessions. For people working on thin clients in the lab, it looked like a sudden desktop hang. Because of that, the first idea was that the network infrastructure in the building had failed (before, we had all the equipment nearby; now we used a network that is not ours to connect to the servers). So we went to the person who controls it … and he successfully pinged another of our servers.
Next we thought that the server room had power supply issues. Silly idea… everything works except our servers. But we decided to connect the failed server to our own UPS. It started, worked for some minutes, and rebooted again.
After two years with things working, we just could not accept the idea that our so-well-working setup had decided to break.
And actually it was our setup that had problems. After some investigation, we found a way to reproduce the crash reliably. The problem was in the kernel; I reported it as Bug #542250. After looking into the code, I even found how to fix it, tested the fix on the failing server, and it stopped crashing. I’ve sent the patch to the bug report.
The bug is Xen-specific, and may show up only on systems with several CPUs. But it is not dom0-specific: domU’s may also be affected if they run the -xen flavour of the kernel and have more than one (virtual) CPU.
We did not face this bug before the move because dom0 on the servers had worked without a reboot for months, so some relatively old kernel was running (this is not very good security-wise, but since the dom0’s have network interfaces in a restricted network only, running an older kernel is not that dangerous). Perhaps the kernel that was actually running did not contain the bug.
I hope the fix will be included in some lenny update; if not, I will have to rebuild the kernel locally, both for i386 and amd64, after every kernel update.
Today’s hero is a company named Port, with their can4linux drivers.
These people demonstrate wonderful knowledge of how to do mutual exclusion in the kernel. Here is an example:
if(0 == atomic_read(&Can_isopen[minor])) {
    /* first time called, initialize hardware and global data */
    ...
}
... /* much more code, without any locks held */
atomic_inc(&Can_isopen[minor]);
Looks like they are sure that using an atomic here will help them catch the first call reliably.
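The snippet above races because the check and the increment are two separate steps: two callers can both read zero before either of them increments. One common way to close such a window is to keep the check, the initialization and the increment inside a single critical section. Below is a minimal user-space sketch of that idea using a pthread mutex instead of kernel locking; all names in it are made up for illustration, and it is not taken from can4linux.

```c
#include <pthread.h>

#define DEMO_THREADS 8

static pthread_mutex_t open_lock = PTHREAD_MUTEX_INITIALIZER;
static int open_count;  /* protected by open_lock */
static int init_calls;  /* how many times the "hardware init" ran */

/* Stand-in for a driver's open(): the check, the one-time init and the
 * increment all happen under one lock, so exactly one caller can ever
 * observe open_count == 0. */
static void demo_open(void)
{
    pthread_mutex_lock(&open_lock);
    if (open_count == 0)
        init_calls++;   /* first opener would initialize hardware here */
    open_count++;
    pthread_mutex_unlock(&open_lock);
}

static void *opener(void *arg)
{
    (void)arg;
    demo_open();
    return NULL;
}

/* Run DEMO_THREADS concurrent openers; returns how often init ran. */
static int run_openers(void)
{
    pthread_t t[DEMO_THREADS];
    int i;

    open_count = 0;
    init_calls = 0;
    for (i = 0; i < DEMO_THREADS; i++)
        pthread_create(&t[i], NULL, opener, NULL);
    for (i = 0; i < DEMO_THREADS; i++)
        pthread_join(t[i], NULL);
    return init_calls;
}
```

However the threads are scheduled, run_openers() reports exactly one initialization, which is the guarantee the atomic_read()/atomic_inc() pair cannot give.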
One more example:
spin_lock(&waitflag_lock);
for(i = 0; i < CAN_MAX_OPEN; i++) {
    if(CanWaitFlag[minor][i] == 0) break;
}
spin_unlock(&waitflag_lock);
And now they think that i is a reliable index of a zero array element.
Both examples taken from their open() routine.
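For the second snippet, the lock is dropped before the slot is actually taken, so another caller can find the very same index (and the loop may also finish with no free slot at all). A possible pattern, sketched here in user space with a pthread mutex and hypothetical names, is to mark the slot as used before unlocking and to report the "nothing free" case explicitly.

```c
#include <pthread.h>

#define DEMO_MAX_OPEN 4

static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
static int slot_used[DEMO_MAX_OPEN];  /* protected by slot_lock */

/* Find a free slot and claim it while still holding the lock.
 * Returns the slot index, or -1 when every slot is taken. */
static int claim_slot(void)
{
    int i, found = -1;

    pthread_mutex_lock(&slot_lock);
    for (i = 0; i < DEMO_MAX_OPEN; i++) {
        if (slot_used[i] == 0) {
            slot_used[i] = 1;   /* claimed before the lock is released */
            found = i;
            break;
        }
    }
    pthread_mutex_unlock(&slot_lock);
    return found;
}

/* Give a previously claimed slot back. */
static void release_slot(int i)
{
    pthread_mutex_lock(&slot_lock);
    slot_used[i] = 0;
    pthread_mutex_unlock(&slot_lock);
}
```

The index returned by claim_slot() stays valid after the unlock because no other caller can see that slot as free until release_slot() is called.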
There are numerous other issues in the code - races, improper use of kernel infrastructure, etc. A very good example of out-of-community, no-review development.
Anyway, they are still in business, and do offer all that to their customers. Let's wish them good luck.