Patrick's playground: November 2009 Archives

Mon Nov 23 16:57:44 CET 2009

Christmas Wishlist

Oh well. I don't really celebrate Christmas (once in a decade is enough). And it's a bit early to be in the Christmas spirit (which most years seems to be a mix of being stressed and being rude to others).
Still there's a few things every geek can use, and maybe there's a magic elf out there who can make a wish come true, eh?
Now if you, by chance, feel the need to help me out there's a few little things I could use. My current buildbox is quite nice, but I'm hitting a ceiling. It's not so much the CPU (quadcore ftw), but the IO that's hurting. And I'm getting a bit cramped with space - 65GB distfiles alone!

So here's the things I would really appreciate:

SSD, SATA-II, 32GB or larger
harddisk, SATA-II, 750G or larger, at least two (RAID-1)
SATA Controller, PCI or PCI-E, "something reliable"
Memory, 4x4GB, for great justice

Having those bits would help a lot - let's see:
an SSD is really nice to compile in. Tremendously helps performance. I currently have "only" 8GB, which limits what I can build (boost with FEATURES="test" needs more!) and how much I can build in parallel.
Harddisks - all the distfiles, binpkgs and other things add up. At current prices a small disk makes little sense, and I'd want some reliability, so I'd want at least two disks (raid1) or more (raid5).
Since I have a "desktop" mainboard there's only four sata ports. This means I can currently add 0 extra disks. Which sucks a bit :) So I'd need a sata controller to really enjoy any goodies.
And lastly even with the current 8GB RAM I'm hitting a saturation point. It's insane, but with 4 or more chroots doing stuff it can get painfully tight. Again, desktop mainboard, so I'd need 4x4GB to have an advantage (or get a new mainboard with all the complications that involves)

All that just to compile faster. Aren't we a decadent bunch of hackers? :)

Posted by Patrick | Permalink

Sat Nov 21 16:58:06 CET 2009

Random things

Now with 3D working quite well on my radeon (HD4650) I've tested a few games. Somewhere in the last 3 days the performance got a good kick, the improvements are noticeable.
Quake3 by default is limited to 1600x1200 and pretty much maxes out at 90fps all the time. It feels really smooth, and the resolution is quite nice too :)
UT2004 runs at 1920x1200, looks awesome, but is a bit too slow. Around 1280x1024 the performance is high enough to play well.
Nexuiz at 1920x1200 has a rather wobbly performance. Sometimes it goes down to about one frame per second, then spikes back to 30+. Around 1280x1024 it too becomes nicely playable.
That's quite awesome for an open driver, and it's about the first time I've heard the graphics card fan kick in like that.

And people were right, you really only need the -9999 versions of mesa + libdrm + xf86-video-ati.

About the portage options and all that, darkside has blogged about some in the past. It's nice to have some numbers to put next to the --jobs magic :)
And KingTaco had found the CPU hotplugging madness quite some time before me. Plus ça change, plus c'est la même chose.

I've been slacking a bit, now I'm feeling almost guilty for letting bugs rot. It's interesting to see how motivation works. Sometimes I even wonder why I spend any time on such things - as soon as I fix a bug upstream does a new release, a security issue is found or some other form of breakage. It's a bit frustrating to see this endless stream of "work" coming in ...
But I'm quite happy to see more and more involvement from users. It's good to see people trying to help. Most seem to lack confidence, but what they lack in skills they make up for by learning at an insane pace. So as long as there is someone to guide them a bit they are totally awesome. Which means that if I can get some more people motivated I can finally resume infinite slacking because I've been made redundant. I think that should be the goal of every package maintainer :)

Thu Nov 19 00:43:58 CET 2009

CPU Hotswapping and how to disable processors

Here's something awesome I found mostly by accident:
In recent kernels the support for hotswapping CPUs works on x86/amd64 architectures. I stumbled over it in the 2.6.32 menuconfig and couldn't help but wonder whether it actually works. So I had a look and found this gem:

# cat /proc/interrupts | grep CPU
            CPU0       CPU1       CPU2       CPU3

Very boring, 4 processors.

echo 0 > /sys/devices/system/cpu/cpu3/online

And we just knocked out one!
We see that in dmesg:

kvm: disabling virtualization on CPU3
CPU 3 is now offline

Hmm, are you thinking what I'm thinking?

kvm: disabling virtualization on CPU2
CPU 2 is now offline
kvm: disabling virtualization on CPU1
CPU 1 is now offline
SMP alternatives: switching to UP code

Wheeee. I just castrated it to a single core! I actually didn't check if the kernel lets me take CPU0 offline. That would be hilarious. Anyway ...

echo 1 > /sys/devices/system/cpu/cpu1/online

And we just gained a CPU:

SMP alternatives: switching to SMP code
Booting processor 1 APIC 0x1 ip 0x6000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5200.20 BogoMIPS (lpj=10400418)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 1/0x1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU1: AMD Phenom(tm) 9950 Quad-Core Processor stepping 03
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
kvm: enabling virtualization on CPU1

This is seriously wicked. Now I just need to figure out how to bolt that onto powermanagement so that the machine knocks out cores when idle and powersaves. Linux never gets boring ...
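
If you want to play with this without typing the sysfs paths every time, the echo commands above can be wrapped roughly like this. It's only a sketch: the function names and the optional sysfs-root argument are my own invention (the argument exists so you can try the functions on a fake directory tree first), and on a real system the writes need root. As far as I know cpu0 often has no "online" file at all, so the sketch refuses to touch a CPU without one.

```shell
# Sketch: toggle CPUs through sysfs. Function names and the optional
# sysfs-root argument are illustrative, not part of any existing tool.

# cpu_set_online CPU STATE [SYSFS_ROOT]
# STATE is 1 (online) or 0 (offline). Needs root on a real system.
cpu_set_online() {
    cpu="$1"
    state="$2"
    sysfs="${3:-/sys/devices/system/cpu}"
    case "$state" in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 1 ;;
    esac
    ctl="$sysfs/cpu$cpu/online"
    if [ ! -e "$ctl" ]; then
        echo "cpu$cpu has no online switch" >&2
        return 1
    fi
    echo "$state" > "$ctl"
}

# cpu_states [SYSFS_ROOT]
# Print "cpuN state" for every CPU that has an online switch.
cpu_states() {
    sysfs="${1:-/sys/devices/system/cpu}"
    for ctl in "$sysfs"/cpu[0-9]*/online; do
        [ -e "$ctl" ] || continue
        dir="${ctl%/online}"
        echo "${dir##*/} $(cat "$ctl")"
    done
}
```

With that loaded, "cpu_set_online 3 0" knocks out the fourth core and "cpu_set_online 3 1" brings it back, and cpu_states shows what is currently switched on.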

Tue Nov 17 20:15:18 CET 2009

"Random crashes" with glibc 2.10 and 2.11

As documented in this bug (which mirrors an upstream bug here), there's a bug in glibc 2.10 and 2.11, and this one seems to be easy to hit. Multithreaded apps "randomly" crash with "Invalid free" and other confusing errors. A hackaround is to unset or empty the environment variable "MALLOC_CHECK_". For me setting MALLOC_CHECK_="" before starting some of the affected packages seems to completely hide the error. Now we can only hope that the Gentoo glibc gets this patch soon.
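
To apply the hackaround to a single program instead of the whole session, something like the following should do. The two wrapper names are made up for this example; the only things they rely on are "env -u" and a plain variable assignment in front of the command.

```shell
# Sketch: apply the MALLOC_CHECK_ hackaround to one command only.
# Both function names are made up for this example.

# Run a command with MALLOC_CHECK_ removed from its environment.
without_malloc_check() {
    env -u MALLOC_CHECK_ "$@"
}

# Run a command with MALLOC_CHECK_ set, but empty.
with_empty_malloc_check() {
    MALLOC_CHECK_="" "$@"
}
```

Usage would be "without_malloc_check some_affected_app", where some_affected_app stands for whatever is crashing for you. The rest of your shell session keeps its old setting either way.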

Tue Nov 17 19:47:10 CET 2009

Awesome portage options

For the rest of this post I'll only consider portage 2.2. Most options are in portage 2.1 already, but I'm a lazy bum, so I don't compare to see what's what.

You can set EMERGE_DEFAULT_OPTS in /etc/make.conf, but if you add --ask you will have trouble running emerge from a script. --ignore-default-opts disables those defaults so you can run emerge --sync in a cronjob again.

Sets are really great, --list-sets shows you which are available. Just have a look, there are some nice ones - "security", "installed", "unavailable" ... they can help streamline some tasks. I find their names quite self-explanatory.

If you want to put something into the world file without rebuilding it use --noreplace, and if you want to remove it again use --deselect.

--nospinner disables that funny rotating spinner thingy so you can save precious bandwidth when connected remotely, and --quiet hides most of the output, which can be nice if you don't want to be hypnotized by scrolling compile output. For the OCD crowd --quiet-build might be nice as it doesn't show the compile output on console, but redirects to logfiles. --changelog is neat for seeing the log messages for that update, this often shows fixed bugs or other issues you might care about. --color with a parameter y or n toggles colorized output. And of course --alphabetical. The horror of unsorted output!

Sometimes people are confused that emerge -e world tries to update packages that emerge -uND world misses. That is usually caused by build-only dependencies. --with-bdeps=y and --complete-graph are good options to modify portage behaviour.

If you're on a fast machine and in a hurry you can try to set --jobs X with a reasonable value of X. Think about memory needs and such before setting it to infinity minus one! With --keep-going it gets really easy to not have the whole process stopped on the first failed package. This is not without issues, but it avoids the --resume --skipfirst in a loop tricks. If --jobs seems too hard to calibrate, --load-average=LOAD may help to limit it.
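
Putting a few of the options so far together, here is a rough sketch of how they might be combined. The function names are made up, and the jobs and load numbers you pass in are up to you and your hardware; nothing here is a recommended value.

```shell
# Sketch: combining some of the emerge options discussed above.
# Function names are made up; pick jobs/load values that fit your machine.

# Sync from cron without tripping over --ask in EMERGE_DEFAULT_OPTS.
cron_sync() {
    emerge --ignore-default-opts --sync --quiet
}

# world_update JOBS LOAD
# A parallel "-uND world" that also considers build-time dependencies
# and carries on after a failed package.
world_update() {
    jobs="$1"
    load="$2"
    emerge --update --newuse --deep \
        --with-bdeps=y --complete-graph \
        --jobs="$jobs" --load-average="$load" \
        --keep-going @world
}
```

For example "world_update 4 6" would start up to four builds at once, but stop spawning new ones while the load average is above 6.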

For the scripters --columns might be nice, it tweaks the output to be more script friendly.

Support for binary packages has grown considerably, there's support for local (-k / -K) and remote (-g / -G) binpkg repositories. And you can use --buildpkg and --buildpkgonly to create them (they are stored in PKGDIR). There's --binpkg-respect-use to only install the packages that have USE flags set the same as the current configuration - it's a very powerful mechanism if you need to support Gentoo on multiple machines and don't want to compile that much.

I hope y'all enjoyed this little lesson in RTFM, there's plenty of other options to discover. Don't be afraid of the documentation, it doesn't bite and makes your life easier :)

Mon Nov 16 16:29:32 CET 2009

Configuring Portage

Few people take the time to actually read through the documentation, but if you have some time to spare "man make.conf" is a great read.
For example you can pre-set some CLI options like --ask or --verbose in EMERGE_DEFAULT_OPTS so you never have to type them again. Especially the FEATURES variable has some interesting bits:

buildpkg builds packages of everything
buildsyspkg builds only packages of the system set, which is awesome for recovery and doesn't take much space.

keepwork keeps the $WORKDIR and can be quite useful for debugging purposes (but not for general use)
noclean leaves even more there.
fail-clean is the opposite, it always wipes the build directories. Useful if you build on a small (but fast) disk or tmpfs.
installsources installs all the package sources to /usr/src/debug/, which can be used for debugging, but eats lots of space. Together with splitdebug it offers some really great debugging convenience.
test-fail-continue helps when you just want to have the tests run for logging purposes, but don't want the package to not be installed if tests fail. Most people won't need this.

split-elog and split-log features are quite interesting if you do logging.

Logging can be very nice to have, and portage has lots of configuration options for it.
PORTAGE_ELOG_SYSTEM defines how the log data is sent, be it through syslog, email or just to a file. Or completely custom?
And you can do combinations like PORTAGE_ELOG_SYSTEM="mail:warn,error syslog:* save".
PORTAGE_ELOG_CLASSES defines what you want to log - warnings, errors, qa warnings, everything ... it's your choice.

Of course there are lots of other configuration options:
PORTAGE_NICENESS can be useful when you don't want portage to interfere with anything else.
PORTAGE_IONICE_COMMAND needs ionice (or an equivalent tool) and can be used to make the disk activity of portage a bit less distracting. Both features may increase the time needed to install things, but will make portage more benign so you can still do things while it runs.

Also you can change almost all directories - PORTDIR, DISTDIR, PKGDIR and so on. This allows you to make portage behave a lot more like you want it (unless the defaults satisfy you already ...)
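
To make that a bit more concrete, here is what a make.conf fragment using some of these variables might look like. All values are illustrative assumptions, not recommendations: the two directory paths are made up, the niceness is an arbitrary pick, the ionice line follows the form I remember from make.conf.example, and the mail part of the elog setting needs further mail configuration before it does anything useful.

```shell
# /etc/make.conf fragment - illustrative values only, adjust to taste

# CLI options you never want to type again
EMERGE_DEFAULT_OPTS="--ask --verbose"

# binpkgs of the system set, clean up after failures, split up the logs
FEATURES="buildsyspkg fail-clean split-elog split-log"

# what to log, and where to send it (combination taken from the text above)
PORTAGE_ELOG_CLASSES="warn error qa"
PORTAGE_ELOG_SYSTEM="mail:warn,error syslog:* save"

# be nice to the rest of the system
PORTAGE_NICENESS="10"
PORTAGE_IONICE_COMMAND="ionice -c 3 -p \${PID}"

# hypothetical locations on a bigger disk
DISTDIR="/mnt/big/distfiles"
PKGDIR="/mnt/big/binpkgs"
```

The backslash in front of ${PID} matters: portage is supposed to fill in the PID itself, so the variable must reach it unexpanded.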

Sun Nov 15 19:08:48 CET 2009

HOWTO Radeon (opensource) + R700 + 3D / OpenGL

Finally I stopped slacking for long enough to fix a few bits of my desktop, and the results are grrrrrrreat.
Now I have (OpenGL!) full effects in KDE4 as opposed to the slightly less bouncy XRender-accelerated thingies before. Performance is pretty awesome (but then the HD4650 shouldn't even notice those few effects).
What you need:

~arch install (I'm not going to care to find out what minimal versions you need)
x11 overlay
a really recent kernel

And with really recent I mean "at least 2.6.32". At the time of writing that hasn't been released, so a 2.6.32-rc6 git-sources has to substitute for me. From what I've read you might have to disable framebuffer for things to work well, but as I'm usually seeing text mode for ~30 seconds every month I don't care enough to find out. I'm lazy!
In the kernel config you need to enable DRM and especially the radeon bits. Device Drivers -> Graphics -> Direct Rendering Manager is the "most important" bit there.
The following packages were suggested in a few places, I have no idea if that is the minimal set. But you'll have to unmask:

>=x11-libs/libdrm-9999
>=media-libs/mesa-9999
>=x11-base/xorg-server-9999
>=x11-proto/fixesproto-9999
>=x11-proto/xextproto-9999
>=x11-proto/xf86vidmodeproto-9999
>=x11-proto/renderproto-9999
>=x11-proto/recordproto-9999
>=x11-proto/inputproto-9999
>=x11-proto/xineramaproto-9999
>=x11-proto/bigreqsproto-9999
>=x11-proto/xf86driproto-9999
>=x11-proto/xf86dgaproto-9999
>=x11-proto/xcmiscproto-9999
>=x11-base/xorg-drivers-9999
>=x11-libs/libXext-9999
>=x11-libs/libXi-9999
>=x11-proto/xproto-9999
>=x11-libs/libX11-9999
>=x11-libs/libxcb-9999
>=x11-proto/xcb-proto-9999

Now go forth and rebuild all your shiny new packages.
If you managed to build that and reboot your new kernel things should look pretty much as before. The only "obvious" hints I've found to test are the output of glxinfo (has changed quite a bit) and that KDE4 allows me to use OpenGL now. And maybe the wobbly windows effect was a giveaway :)

I'm positively surprised that things have progressed this far, and I'm happy to finally be able to use more of my graphics card :)

EDIT: Seems that this is not the minimal set of packages and configuration needed. Some people suggested -9999 packages of mesa + libdrm + xf86-video-ati only. If that works, even better :)

Fri Nov 13 13:20:59 CET 2009

The OOM Killer (and how to make it less annoying)

The Linux kernel has lots of complexity in memory management. Swap lets the system go beyond the size of real memory so applications can use "more". But still, at some point, you might exhaust all available memory.

The next application that requests memory (usually through malloc) will cause the kernel some trouble. It can either deny the request (which often causes hilarious results in the application) or free some memory somehow. (About the hilarity: Many coders assume that a malloc will always succeed. If it doesn't you'll get interesting misbehaviour like segmentation faults. Lots of fun to debug ...)

So, how does the kernel free memory? It can't just ask some other processes to surrender some. But it can terminate processes! It's a terminally stupid idea, but it's so stupid that it often works. And the handler for that is, obviously, the out-of-memory killer.
There's a very nice bit of information hidden in /proc to tell you what the oom-killer would do if it had to run now.

/proc/$pid/oom_score

contains the current score of the process with PID $pid. You could just compare them and see who is good and who is bad. And you can adjust it - a rarely used protection, but it might just help the oom-killer to act more sanely and less psychotically.

/proc/$pid/oom_adj

That's a numerical value used as a multiplier. Valid values are in the range -16 to +15, plus the special value -17, which disables oom-killing altogether for this process. The heuristic is quite complex, to quote:

The process to be killed in an out-of-memory situation is selected among all others
based on its badness score. This value equals the original memory size of the process
and is then updated according to its CPU time (utime + stime) and the
run time (uptime - start time). The longer it runs the smaller is the score.
Badness score is divided by the square root of the CPU time and then by
the double square root of the run time.
Swapped out tasks are killed first. Half of each child's memory size is added to
the parent's score if they do not share the same memory. Thus forking servers
are the prime candidates to be killed. Having only one 'hungry' child will make
parent less preferable than the child.
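
The two /proc files mentioned above can be poked at with a few lines of shell. This is a sketch only: the function names are invented, the optional proc-root argument exists just so the functions can be tried on a fake directory tree, and writing oom_adj on a real system needs root. Newer kernels have, as far as I know, moved on to a different knob called oom_score_adj, so treat oom_adj as the interface of the kernels discussed here.

```shell
# Sketch: look at oom_score and set oom_adj. Function names are invented.

# oom_top [COUNT] [PROC_ROOT]
# Print "score pid" for the COUNT processes the oom-killer likes best.
oom_top() {
    count="${1:-5}"
    proc_root="${2:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # a process may exit while we are looking at it
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        printf '%s %s\n' "$score" "${dir##*/}"
    done | sort -rn | head -n "$count"
}

# oom_protect PID [PROC_ROOT]
# Exempt one process from the oom-killer with the special value -17.
oom_protect() {
    pid="$1"
    proc_root="${2:-/proc}"
    printf '%s\n' '-17' > "$proc_root/$pid/oom_adj"
}
```

So "oom_top 3" shows the three most likely victims right now, and "oom_protect" with the PID of your sshd keeps at least that one out of the line of fire.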

On some systems you might not ever want to have the oom-killer strike. It's just a hilariously bad idea to kill random processes. And you can even disable it:

The sysctl vm.overcommit_memory variable (also represented in /proc/sys/vm/overcommit_memory) defines the behaviour. To summarize: 0 is the default, where the kernel uses some heuristics and allows allocating more memory than available (which is what can trigger the nice OOM assassin). 1 always allows overcommit, no questions asked. And finally a value of 2 disables overcommitting and limits application memory to the size of (swap + ram*ratio), where the ratio is the percentage set in vm.overcommit_ratio. This means that worst case you'll disallow a request when there's still physical memory available, but you'll never have to trigger Mr.OOM-Killer.

What is best? That depends on what you do and how you want things to fail. overcommit_memory = 2 will cause memory allocation failures, but your machine will always be "alive". overcommit_memory = 0 might let you allocate more memory, but you risk getting any process killed by the oom-killer. Sucks to have sshd killed on a server - maybe it's not the best idea to have a psychotic process assassin? But it's your choice, so do what you want to do :)
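
If you're wondering how much memory overcommit_memory = 2 would actually leave you, the (swap + ram*ratio) formula is easy to evaluate by hand. A small sketch, with invented function names; the ratio is the percentage from vm.overcommit_ratio (default 50). The kernel's own number is the CommitLimit line in /proc/meminfo, and it may differ slightly from this estimate because the kernel also accounts for things like huge pages.

```shell
# Sketch: estimate the commit limit for overcommit_memory=2.
# Function names are invented for this example.

# commit_limit_kb RAM_KB SWAP_KB RATIO_PERCENT
commit_limit_kb() {
    ram_kb="$1"
    swap_kb="$2"
    ratio="$3"
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# meminfo_kb KEY [FILE]
# Read one value (in kB) from /proc/meminfo, e.g. MemTotal or SwapTotal.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# On a live system the pieces fit together like this:
#   commit_limit_kb "$(meminfo_kb MemTotal)" "$(meminfo_kb SwapTotal)" \
#       "$(cat /proc/sys/vm/overcommit_ratio)"
```

With 8GB RAM, 2GB swap and the default ratio of 50 that comes out at 6GB, so applications would start seeing allocation failures well before the physical memory is full.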

Tue Nov 10 20:18:46 CET 2009

The mystery of swappiness

For the longest time operating systems have been able to handle swap. In short swap extends physical memory with slow diskspace so that applications can use more memory than there is available.
On most unix systems the swap is in a dedicated partition because that has the lowest overhead. Plus you don't risk running out of diskspace when you want to swap, so things are quite predictable and nice. Linux has a very nice knob you can turn to affect the swap policy. It will not avoid swapping (in some situations you will have to), but it will affect how and when swap is used. That knob is /proc/sys/vm/swappiness.

The kernel default is a value of 60. The value can be between 0 and 100 and is effectively a percentage. It is used roughly in the following way:
If all available memory is exhausted (application memory, buffers and filesystem cache) and any memory allocation is requested the kernel needs to free a few pages of memory. It can either swap out application memory or drop some filesystem cache. The "swappiness" knob affects the probability which one is chosen.
This means that at a swappiness of 0 the kernel will try to never swap out a process, and at 100 it will try to always swap out processes and keep the filesystem cache intact. So with the default, if you use more than ca. 40% of your memory for applications and the rest is used as filesystem cache it will already start swapping a bit. The hilarious result is that you may end up swapping a lot with lots of memory left - think of a machine with 64GB RAM! If you try to use 32G memory you'll be in swap hell.
That default might have been good with machines with less than 256MB RAM, but with current desktops and servers it is usually not optimal.
Now you might be tempted to tune it down to 0. Avoid swap. Swap is slow. All is good?
Not quite. At 0 your machine will try to avoid swapping until the last moment. Then it will have killed all filesystem cache (so every file operation will hit the disks) and in addition to that you start swapping like a madman. The result is usually a "swap storm" that hits very suddenly. At the point where you might need some performance your machine doesn't provide it and might just be unresponsive to your input for a few minutes.
The other end (a value near 100) might make sense for a file server, but then it might be cheaper to just not run extra services on a machine that is very loaded already. I don't really see a usecase for a swappiness of 100 except maybe on machines that are very memory-limited.
On my desktop I've found a swappiness of 10-20 to be the sweet spot. This means that when 80%+ of memory is used by applications the machine will start swapping, but it's a more gradual hit and not an instant kill. And because there's still some filesystem cache the responsiveness for starting new processes (like a login shell ;) ) is still high enough to allow recovery from this pessimal system state.
Still your goal for optimal performance should be to avoid swapping. Disk access is slower than RAM by a factor of 1000 or more!
I've seen servers achieve roughly double the throughput with the right swappiness value - it can avoid an expensive hardware upgrade. Of course that's not all the tuning advice I have, so if you wish to discuss that feel free to send me a mail and maybe I can prove to you that Gentoo is the fastest penguin out there ...
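
Turning the knob itself is a one-liner, but here is a slightly more careful sketch that checks the value first. The function names are made up, the optional second argument only exists so you can try it on a scratch file, and the 0-100 check simply follows the range given above. Writing the real knob needs root.

```shell
# Sketch: read and set the swappiness knob. Function names are made up.

# swappiness_get [KNOB_FILE]
swappiness_get() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# swappiness_set VALUE [KNOB_FILE]
# Accepts only integers from 0 to 100, the range described in the text.
swappiness_set() {
    value="$1"
    knob="${2:-/proc/sys/vm/swappiness}"
    case "$value" in
        ''|*[!0-9]*) echo "swappiness must be an integer" >&2; return 1 ;;
    esac
    if ! [ "$value" -le 100 ] 2>/dev/null; then
        echo "swappiness must be between 0 and 100" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$knob"
}
```

"swappiness_set 10" changes the running system only. To keep the value across reboots, put a line "vm.swappiness = 10" into /etc/sysctl.conf.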

Maybe I should discuss the OOM killer too - most people have seen it, but few know who it is and why he goes killing processes.
