Thu Jul 28 23:07:08 CEST 2011

Reboot

After spending quite a bit more time than expected in Berlin I've finally returned "home". Having access to more than a suitcase of stuff can be convenient (although it appears to be an optional luxury now).

Due to some unfortunate hardware failures the fastest working CPU I have locally is a single-core Athlon64. It's quite fascinating to see how bloated everything has become: what used to be a droolworthy CPU not so long ago is now the lower end for a simple desktop. Memory consumption in particular is insane - Thunderbird easily absorbs 5GB RAM if you push it a bit. Firefox seems to grow at a rate of ~200MB/day and needs to be restarted regularly. So much sadness.

I'm slowly catching up with Gentoo things, and it appears that I've found two new minions to recruit, just as I had realized that I have some time and motivation. I like it. More people means less work per person, so less burnout, fewer abandoned packages and so on. More better happy.

Thanks to the work of sochotnicky our Ohloh Statistics have finally been updated. There are some interesting results - for example the number of committers has been roughly constant for the last 2-3 years, which means that recruiting is at least absorbing the normal attrition. But I think we need to do one better and get things growing ... how else are we supposed to keep everything in good shape?
Which also makes me think about bugs and how to squish them most effectively. There are so many open bugs that I find it hard to get an overview of what is "most urgent" and which bugs are trivial ones that might take just 5 minutes of work to fix. So we should definitely revive the BugDays and make it easier for people to get involved and provide us with fixes. Right now I have no motivation (and not enough hardware) to do any tinderboxing, as I can easily divert all the processing power I have into bugfix testing. And that's going to make people happier, on average, than finding even more bugs ;) (although we need to improve on both ends - and we need more metrics so we know where we are and where we are going).

And on it goes, the infinite hamster wheel of progress - who wants to help?

Posted by Patrick | Permalink

Fri Jul 1 19:44:26 CEST 2011

MDMA

The Monitoring-Driven Master Administration


For software dev we have Test-Driven Development (TDD), unit tests and all that machinery. The goal of all those methods is to catch errors, ideally before they can hurt anyone.
Detecting problems early saves you lots of time and frustration and makes changing and improving things easier.

For admins, we now have monitoring-driven administration:

(1) Set up monitoring. Watch it fail and notify you.

(2) Set up the service.

(3) Watch all monitoring switch to green.
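
The three steps above can be sketched in a few lines of Python. This is only an illustration, not a real monitoring setup: the check is a plain TCP connect, the "service" is a listening socket on localhost, and the names check_tcp and demo are made up for this example.

```python
import socket


def check_tcp(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def demo():
    """Run the check before and after the (stand-in) service exists."""
    # Find a local port that nothing is listening on right now.
    probe = socket.socket()
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()

    # (1) Monitoring comes first and is expected to fail.
    results = [check_tcp("127.0.0.1", port)]

    # (2) Set up the service (here just a listening socket).
    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    try:
        # (3) The same check should now switch to green.
        results.append(check_tcp("127.0.0.1", port))
    finally:
        server.close()
    return results


if __name__ == "__main__":
    print(demo())
```

Seeing the check go from failing to passing is the whole point: it shows that the check really looks at the service you just deployed.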

There are some simple rules to be followed:
No service can be deployed without monitoring. If the monitoring reports anything critical, it needs to be fixed. If there are warnings, they should be fixed too, either by tackling the underlying problem or by adjusting the monitoring threshold.

The default state is all green - no warnings, no errors (except during the test/integration phase of new services). Any warnings or errors should trigger you into fix-this-stuff mode.
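
A minimal sketch of these rules, assuming the common three-state model (OK, WARNING, CRITICAL) and a metric where higher is worse, such as disk usage in percent. The function names and the threshold values are invented for illustration.

```python
def classify(value, warn, crit):
    """Map a measurement to a state; higher values are worse."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


def all_green(states):
    """The default state: every check reports OK."""
    return all(state == "OK" for state in states)


if __name__ == "__main__":
    usage = 85  # hypothetical disk usage in percent
    print(classify(usage, warn=80, crit=95))   # a warning: go fix something
    # Option 1: tackle the problem (clean up the disk).
    print(classify(60, warn=80, crit=95))
    # Option 2: decide 85% is fine and adjust the threshold.
    print(classify(usage, warn=90, crit=95))
```

Either option gets you back to all green; what is not allowed is leaving the warning in place and learning to ignore it.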

Rationale:
If the monitoring is already green when you enable a service, you never find out whether you are monitoring the right bits. Maybe the check for free disk space is running locally instead of remotely? It will always look good, even if the actual service is in a failed state.
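
To make the local-versus-remote mistake concrete, here is a toy simulation. The host names and free-disk numbers are made up, and a dictionary stands in for actually querying the machines.

```python
# Hypothetical free disk space in percent per host. In a real setup these
# values would come from the hosts themselves, not from a dictionary.
FREE_DISK = {"monitor": 70, "mailserver": 2}
LOCAL_HOST = "monitor"


def check_disk_wrong(target, min_free=10):
    """Buggy check: ignores target and measures the monitoring host."""
    return FREE_DISK[LOCAL_HOST] >= min_free


def check_disk_right(target, min_free=10):
    """Correct check: measures the host it was asked about."""
    return FREE_DISK[target] >= min_free


if __name__ == "__main__":
    print(check_disk_wrong("mailserver"))  # green, although the disk is nearly full
    print(check_disk_right("mailserver"))  # red, as it should be
```

The buggy check can never go red for the mail server, which is exactly what you would notice if you had watched it fail before declaring the service done.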

Having any warnings means that something is in a state you consider not-good. Either fix the service or the monitoring thresholds.

Results:
You'll be able to sleep a lot better if you get your daily status email and you know that everything is working fine. Then you can focus on improving things instead of playing infinite fireman.

If this sounds like stating the obvious, well, most good ideas are ...

Posted by Patrick | Permalink