Thanks to LWN, I’ve discovered an in-progress effort to write a serious book about parallel programming, run as a real open-source project that people can contribute to.

The book is titled Is Parallel Programming Hard, And, If So, What Can You Do About It?, edited by Paul E. McKenney. It is still far from complete, but even in its current state it is definitely recommended reading.

Many things are explained using Linux kernel code and environment, so it is also a good kernel resource.

As for the project: things are managed in a git repository, with commits from several people. The book sources are in TeX.

Dec 26, 2010
 

If you ever develop a filesystem and your readpage() method touches the page contents with the CPU (e.g. memcpy() data from wherever, or decompress it, or whatever else), do not forget to flush the data cache! There is a flush_dcache_page() routine exactly for that. If you miss the call, you will have a long, frustrating debugging session with userspace crashes that are hard to reproduce, at points that look perfectly correct… because the correct code is still sitting in the data cache, and the CPU executes garbage instead.

If you ever have a strange and difficult debugging case, never follow guesses too far. Your “hey, something is likely wrong in this freshly-coded routine!..” may be totally wrong, so reading it for the 100th time won’t help. The actual problem may be somewhere else entirely, and the only way to reliably locate it (if it is reproducible, of course) is to extract information about what is going on, step by step. Even if that is as difficult as analysing disassembly for an unfamiliar architecture near the crash point.

Pay attention to the details. Init crashes? At what location? Add a printk() in the crash handler to find out the crash address. Use objdump on the binary (this time it was /lib/ld-linux.so.3) to check the instructions near that address. What, it crashes at a memory write but survives a memory write to the same page 2 instructions earlier? How could that happen? Maybe an interrupt in between and page table breakage inside the handler? But then, why is it that reproducible? The interrupt always lands within a 2-instruction range, and even inserted printk()s don’t change that? Impossible… But then… what does “crashing” actually mean? Where is the kernel entered? At DATA ABORT? Grr… no longer reproducible after adding a printk() to the DATA ABORT handler… Save a couple of values instead of calling printk() and show them later at the crash point?.. Aha, now reproducible again, but the last DATA ABORT was at a different address! So it is not DATA ABORT… then what? Search for other exception paths, try printk()s there… What? Illegal instruction? Why? Dump the memory locations near pc at the crash point… everything is binary-equal to what is in the executable file? What the hell is going on here? Maybe… caching?.. caching? caching!

 

The Linux kernel build system (aka kbuild) is perfect.

Kbuild is absolutely reliable: in many years of everyday work, I have not faced a single case of a broken build.

Kbuild always rebuilds only what is really needed, even in complex cases when the working directory was switched to a different point in the git tree, and/or some configuration keys have been changed.

Kbuild requires very small Makefiles. In a typical case, it may be just a single line! Or a few lines describing what is built here and from what sources. But if needed, kbuild is flexible enough to tune build options at per-target granularity.

The only thing wrong with kbuild is that it is for the kernel only…

… or is it? Kbuild has support for building user-space programs. That is used to build tools used by the kernel build process, including complex things such as the Qt-based configuration utility.

With a simple hack, this support may be activated for a typical use case in embedded Linux kernel development: an out-of-tree kernel module plus several userspace support programs.

Here is how to do that:

01 obj-m := module1.o module2.o
02
03 hostprogs-m := test
04 HOSTLOADLIBES_test = -lpthread
05
06 KDIR ?= /path/to/kernel/tree/
07
08 modules:
09        make -C $(KDIR) KBUILD_EXTMOD=`pwd` modules
10
11 clean:
12        make -C $(KDIR) KBUILD_EXTMOD=`pwd` clean
13
14 __build: $(hostprogs-m)

Line 1 is a common one-line-style kbuild Makefile. Lines 6-12 are a standard addition for using kbuild outside the kernel tree without much typing.

The magic is at line 14. It is a hack: it uses kbuild’s internals, namely the __build default goal. It adds a dependency on $(hostprogs-m) to __build, thus activating kbuild’s support for building userspace targets. Lines 3-4 use that support to build the ‘test’ executable from the test.c source and link it against libpthread. The syntax is similar to everything else in kbuild, and is documented at the top of the scripts/Makefile.host file in the kernel source tree.

Exactly the same trick may be used for a pure userspace build. However, if the build requires more than simple compilation of C/C++ files into executables or shared libraries, things become more complex, since additional hooks into kbuild may be needed.

Nov 19, 2009
 

yield() sucks anyway, so it depends which flavour of suckage you prefer.

From here.

 

I’ve spent a couple of hours today searching for information on how to make an area allocated in the kernel by vmalloc() accessible to a user-space process.

For an area allocated with kmalloc(), there is remap_pfn_range(), which can be called from the driver’s mmap() method. But for a vmalloc()-allocated area, it very much looked like it is necessary either to walk pde’s/pmd’s/pte’s by hand, or to handle per-page faults with the vma’s nopage() method …

… fortunately it is not that bad. There is remap_vmalloc_range() in recent 2.6 kernels, and there are also the vmalloc_user() and vmalloc_32_user() helper functions, which prepare memory for use with remap_vmalloc_range().

It was just somewhat tricky to find without prior knowledge :) .

 

In past years we used a setup with almost everything (network services, user desktop sessions, chroot environments for different projects, etc.) running on a single server. It looked somewhat cool that our Linux could handle all that.

Of course there were all sorts of problems. Some were easy to fix, some not. But the worst problem was always malfunctioning hardware. Our “server of everything” ran on different hardware over time, mostly PC-style. Some hardware was more stable, some less, but several times during those years we faced situations where the server crashed every few hours and we had a hard time finding out what was wrong.

Things changed a couple of years ago. At last our lab found a way to obtain modern server hardware. On those servers, we deployed a set of Xen-based virtual machines and distributed the services among them. There were zero hardware problems and just a few software issues over two years. No crashes. No hangs. It looked wonderful compared to what came before.

… until we decided to move the servers to a server room in another part of our building, to reduce noise in the lab.

After the move, reconnection and initial configuration, things at first appeared to work. But during the night one server suddenly rebooted for no visible reason.

The next workday, another server rebooted while running a handful of user sessions. For people working on thin clients in the lab, it looked like a sudden desktop hang. Because of that, our first idea was that the network infrastructure in the building had failed (before, we had all the equipment nearby; now we were using a network that was not ours to connect to the servers). So we went to the person who controls it … and he pinged another of our servers successfully.

Next we thought that the server room had power supply issues. Silly idea… everything works except our servers. But we decided to connect the failed server to our own UPS. It started, worked for a few minutes, and rebooted again.

After two years with things working, we just could not accept the idea that our so-well-working setup had decided to break.

And actually it was our setup that had the problem. After some investigation, we found a way to reproduce the crash reliably. The problem was in the kernel; I reported it as Bug #542250. After looking into the code, I even found how to fix it, tested the fix on the failing server, and it stopped crashing. I’ve sent the patch to the bug report.

The bug is Xen-specific, and may show up only on systems with several CPUs. But it is not dom0-specific: domUs may also be affected if they run the -xen flavour of the kernel and have more than one (virtual) CPU.

We did not hit this bug before the move because dom0 on the servers had been running without a reboot for months, so a relatively old kernel was running (this is not very good security-wise, but since the dom0s have network interfaces in a restricted network only, running an older kernel is not that dangerous). Perhaps the kernel that was actually running did not contain the bug.

I hope the fix will be included in some lenny update; if not, I will have to rebuild the kernel locally, both for i386 and amd64, after every kernel update.

 

Today’s hero is a company named Port, with their can4linux drivers.

These people demonstrate wonderful knowledge of how to do mutual exclusion in the kernel. Here is an example:

    if(0 == atomic_read(&Can_isopen[minor])) {
        /* first time called, initialize hardware and global data */
        ...
    }
    ... /* much more code, without any locks held */
    atomic_inc(&Can_isopen[minor]);

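They seem sure that using an atomic here will let them catch the first call reliably. It will not: two concurrent open() calls can both read 0 before either one increments the counter, so both run the initialization.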
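
For comparison, here is a minimal userspace sketch of the pattern done safely: the "is this the first open?" check, the initialization and the counter update all happen under one lock. It uses a pthread mutex as a stand-in for whatever kernel lock a driver would take. All names here (open_lock, open_count, init_calls, hw_init(), device_open()) are hypothetical and are not taken from can4linux.

```c
#include <pthread.h>

/* Hypothetical stand-ins; not can4linux identifiers. */
static pthread_mutex_t open_lock = PTHREAD_MUTEX_INITIALIZER;
static int open_count;   /* number of current openers */
static int init_calls;   /* how many times hw_init() actually ran */

/* Placeholder for "initialize hardware and global data". */
static void hw_init(void)
{
    init_calls++;
}

/*
 * The check, the initialization and the increment form one critical
 * section, so a second opener cannot observe open_count == 0 while
 * the first opener is still initializing.
 */
static void device_open(void)
{
    pthread_mutex_lock(&open_lock);
    if (open_count == 0)
        hw_init();
    open_count++;
    pthread_mutex_unlock(&open_lock);
}
```

With this shape, any number of concurrent device_open() calls run hw_init() exactly once, because nobody can look at open_count between the check and the increment.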

One more example:

    spin_lock(&waitflag_lock);
    for(i = 0; i < CAN_MAX_OPEN; i++) {
        if(CanWaitFlag[minor][i] == 0) break;
    }
    spin_unlock(&waitflag_lock);

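And now they think that i is a reliable index of a zero array element. It is not: the lock is dropped before the element is claimed, so another open() can pick the same index.

Both examples are taken from their open() routine.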
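
A search like this is only useful if the slot is claimed before the lock is released. Below is a minimal userspace sketch of that find-and-claim pattern, again with a pthread mutex standing in for the kernel spinlock. MAX_OPEN, slot_lock, slot_used, claim_slot() and release_slot() are hypothetical names chosen for illustration, not can4linux code.

```c
#include <pthread.h>

#define MAX_OPEN 4   /* hypothetical stand-in for CAN_MAX_OPEN */

static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
static int slot_used[MAX_OPEN];

/*
 * Find a free slot and mark it used in the same critical section.
 * Returns the claimed index, or -1 if every slot is taken.
 */
static int claim_slot(void)
{
    int i, found = -1;

    pthread_mutex_lock(&slot_lock);
    for (i = 0; i < MAX_OPEN; i++) {
        if (slot_used[i] == 0) {
            slot_used[i] = 1;   /* claim while still holding the lock */
            found = i;
            break;
        }
    }
    pthread_mutex_unlock(&slot_lock);
    return found;
}

/* Give a previously claimed slot back. */
static void release_slot(int i)
{
    pthread_mutex_lock(&slot_lock);
    slot_used[i] = 0;
    pthread_mutex_unlock(&slot_lock);
}
```

The returned index stays valid after the unlock because the slot is already marked as taken; the "array is full" case is also reported explicitly instead of leaving i equal to the array size.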

There are numerous other issues in the code: races, improper use of kernel infrastructure, etc. It is a very good example of out-of-community, no-review development.

Anyway, they are still in business, and do offer all that to their customers. Let's wish them good luck :) .

© 2011 yoush.homelinux.org