Fri Jan 24 07:00:53 CET 2014

NTP: Working around bad hardware since 1842

After my recent troubles with NTP and excessive time drift things have settled down.

For reasons unknown to me the time drift on the problem server changed from -330ppm to +2.3ppm. I'm not quite sure how to interpret that.
Comparing some other machines I have access to (drift in ppm):
Old P4:         -292.238
Dell R510:        12.428
Another R510:     13.232
Random amd64:    -28.438
Another amd64:   -23.323
A newish Xeon:    -7.296
So the general trend seems to be older = more time drift. And machines with "same" hardware appear to have similar drift factors.
A drift of under 30ppm means at most about 2.6sec/day (1ppm works out to roughly 0.09sec/day). Without extra correction that's tolerable (up to a minute or so a month), but still unsatisfactory.

The old P4 drifting at 300ppm means it'll be getting close to 3 minutes a week away from "real time" - that's enough to cause problems if you rely on it.
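
The ppm-to-seconds conversion is plain multiplication. A minimal Python sketch (the function name is my own invention, not part of any NTP tool, and it assumes the drift stays constant over the interval):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_seconds(ppm, interval_seconds):
    """Clock error (in seconds) accumulated over interval_seconds
    at a constant drift of ppm parts-per-million."""
    return ppm * 1e-6 * interval_seconds


if __name__ == "__main__":
    # One ppm over a day, and the old P4's roughly 300 ppm over a week.
    print(f"1 ppm over a day:    {drift_seconds(1, SECONDS_PER_DAY):.4f} s")
    print(f"300 ppm over a week: {drift_seconds(300, 7 * SECONDS_PER_DAY):.1f} s")
```

So 1ppm is a bit under 0.09 seconds a day, and 300ppm adds up to about 181 seconds a week, which is where the "close to 3 minutes" comes from.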

I think the lesson in this is "manufacturers use the cheapest parts they can get away with", so every computer should have a time correction mechanism (NTP, DCF77, GPS - it doesn't matter which, as long as you correct it). It's also reasonable to assume that environmental factors (heat, hardware aging, changes in the supply voltage, ...) will change the time drift unpredictably.

And I thought timekeeping was a problem solved two centuries ago ...

Posted by Patrick | Permalink

Fri Jan 17 04:24:40 CET 2014

Unexpected fun with NTP

This morning I had to fix an unexpected dovecot "failure" by restarting it. Apparently it only tolerates time jumps of less than seven seconds.
The trigger of this oopsie is NTP:
Jan 16 23:52:53 stupidserver ntpd[27668]: synchronized to 202.112.10.36, stratum 3
Jan 16 23:52:45 stupidserver ntpd[27668]: time reset -7.732856 s
Riiight. That's not nice, but why does it jump around so much? Looks like the time behaviour got worse over the last few days:
Jan 15 19:34:18 stupidserver ntpd[27668]: no servers reachable
Jan 15 19:59:56 stupidserver ntpd[27668]: synchronized to 202.112.10.36, stratum 2
Jan 15 20:06:22 stupidserver ntpd[27668]: time reset +0.533773 s
...
Jan 16 11:47:33 stupidserver ntpd[27668]: synchronized to 202.112.10.36, stratum 2
Jan 16 11:47:30 stupidserver ntpd[27668]: time reset -2.966137 s
...
Jan 16 18:14:28 stupidserver ntpd[27668]: synchronized to 202.112.10.36, stratum 2
Jan 16 18:15:27 stupidserver ntpd[27668]: time reset -4.223295 s
...
Jan 16 23:52:53 stupidserver ntpd[27668]: synchronized to 202.112.10.36, stratum 3
Jan 16 23:52:45 stupidserver ntpd[27668]: time reset -7.732856 s
That's an offset of more than 1sec/h, and that's with ntpd correcting at around 330 PPM. The docs say: "The capture range of the loop is 500 PPM at an interval of 64s decreasing by a factor of two for each doubling of interval." (PPM = parts-per-million)
In other words, if the drift is above 500 PPM ntpd may be forced to reset (step) the clock because it can't slew it fast enough. And it looks like this situation was caused either by a failing mainboard RTC, or by a screwed up ntp server (since it always sync'ed to the same one).
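
To get a feeling for that rule, here is a small Python sketch of the quoted sentence. It is only my reading of the docs, not ntpd's actual code: the function names are made up, and treating "halved for each doubling" as simple inverse proportion between the power-of-two intervals is an assumption.

```python
def capture_range_ppm(poll_interval_s, base_ppm=500.0, base_interval_s=64.0):
    """Capture range per the quoted rule: base_ppm at base_interval_s,
    halved for each doubling of the poll interval.

    Assumption: intervals between powers of two are interpolated as
    simple inverse proportion; the quote says nothing below 64 s.
    """
    if poll_interval_s < base_interval_s:
        raise ValueError("the quoted rule only covers intervals of 64 s and up")
    return base_ppm * base_interval_s / poll_interval_s


def can_slew(drift_ppm, poll_interval_s):
    """True if the drift is within the capture range at this poll interval."""
    return abs(drift_ppm) <= capture_range_ppm(poll_interval_s)


if __name__ == "__main__":
    for interval in (64, 128, 256, 512, 1024):
        print(interval, capture_range_ppm(interval), can_slew(-330, interval))
```

If that reading is right, a drift of 330 PPM is inside the range at a 64s interval (500 PPM) but already outside it at 128s (250 PPM), which would fit the repeated resets.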

I've tried two things to avoid this time skipping:
1) Change the ntp servers used to something more "local" - the global pool.ntp.org may not be as reliable as servers geographically close to you
2) Remove the drift file to force the system to re-learn
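
Before deleting the drift file it's worth looking at what ntpd has learned so far. The file holds the frequency offset in PPM as a plain number, so a few lines of Python are enough to read it. This is a sketch: the helper name is mine, and the path in the example is an assumption (check the driftfile line in your ntp.conf).

```python
from pathlib import Path


def read_drift_ppm(path):
    """Return the frequency offset (PPM) stored in an ntpd drift file,
    or None if the file is missing, empty or not a number."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    fields = text.split()
    if not fields:
        return None
    try:
        return float(fields[0])
    except ValueError:
        return None


if __name__ == "__main__":
    # Assumed location; it differs between distributions and configs.
    print(read_drift_ppm("/var/lib/ntp/ntp.drift"))
```

Once the file is removed and ntpd restarted, the function returns None until ntpd writes a fresh value.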

The results, at first glance, look promising:
Jan 17 10:48:37 stupidserver ntpd[3059]: kernel time sync status 0040
Jan 17 10:52:55 stupidserver ntpd[3059]: synchronized to 202.120.2.101, stratum 3
Jan 17 10:52:50 stupidserver ntpd[3059]: time reset -5.023639 s
Jan 17 10:57:54 stupidserver ntpd[3059]: synchronized to 202.120.2.101, stratum 3
Jan 17 11:01:08 stupidserver ntpd[3059]: synchronized to 202.73.36.32, stratum 1
Jan 17 11:05:34 stupidserver ntpd[3059]: kernel time sync enabled 0001
So after an initial 5-second skip it managed to sync twice without abnormal drift. Let's hope that it's going to stay sane ...
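
Rather than hoping, the log can be checked for steps big enough to upset dovecot. A rough Python sketch: the regex is written against the log lines quoted in this post only, the function names are mine, and the 7 second limit is just what I observed, not something taken from the dovecot docs.

```python
import re

# Matches lines like "... ntpd[27668]: time reset -7.732856 s"
RESET_RE = re.compile(r"ntpd\[\d+\]: time reset ([+-]\d+\.\d+) s")


def time_resets(lines):
    """Return the step sizes (seconds) of all ntpd 'time reset' log lines."""
    resets = []
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            resets.append(float(match.group(1)))
    return resets


def exceeds(resets, limit_s):
    """Return the resets whose magnitude is at least limit_s seconds."""
    return [r for r in resets if abs(r) >= limit_s]


# Log lines copied from this post.
SAMPLE = """\
Jan 15 20:06:22 stupidserver ntpd[27668]: time reset +0.533773 s
Jan 16 11:47:30 stupidserver ntpd[27668]: time reset -2.966137 s
Jan 16 18:15:27 stupidserver ntpd[27668]: time reset -4.223295 s
Jan 16 23:52:45 stupidserver ntpd[27668]: time reset -7.732856 s
Jan 17 10:52:50 stupidserver ntpd[3059]: time reset -5.023639 s
Jan 17 10:57:54 stupidserver ntpd[3059]: synchronized to 202.120.2.101, stratum 3
""".splitlines()

if __name__ == "__main__":
    print(exceeds(time_resets(SAMPLE), 7.0))
```

Fed with the real log from a cronjob, an empty result means no step was large enough to need a dovecot restart.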

Posted by Patrick | Permalink

Thu Jan 16 07:36:03 CET 2014

EAPI usage in tree

Total number of ebuilds: 37807

EAPI 0:  5959  15.78%
EAPI 1:   370   0.98%
EAPI 2:  3335   8.82%
EAPI 3:  3005   7.95%
EAPI 4: 12385  32.76%
EAPI 5: 12746  33.72%
That looks quite good: EAPI 5 has grown very well, EAPI 1 is almost gone.

EAPI 0 is still needlessly common, and EAPI 2 and 3 should be deprecated.
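
Numbers like these can be collected by walking the tree and looking for the EAPI= line in each ebuild. The Python below is a sketch of that idea and not necessarily the script that produced the table: the function names are made up, the tree location in the example is an assumption, and it only does a naive text match instead of sourcing the ebuild. An ebuild without an EAPI line counts as EAPI 0.

```python
import os
import re
from collections import Counter

# Naive match of an assignment like EAPI=5 or EAPI="4" at the start of a line.
EAPI_RE = re.compile(r"""^EAPI=["']?([A-Za-z0-9._+-]+)""", re.MULTILINE)


def ebuild_eapi(text):
    """Return the EAPI declared in an ebuild's text, "0" if none is set."""
    match = EAPI_RE.search(text)
    return match.group(1) if match else "0"


def count_eapis(tree):
    """Count ebuilds per EAPI below the directory tree."""
    counts = Counter()
    for root, _dirs, files in os.walk(tree):
        for name in files:
            if name.endswith(".ebuild"):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    counts[ebuild_eapi(fh.read())] += 1
    return counts


def report(counts):
    """Format the counts as a small table with percentages."""
    total = sum(counts.values())
    lines = [f"Total number of ebuilds: {total}", ""]
    for eapi in sorted(counts):
        share = 100 * counts[eapi] / total
        lines.append(f"EAPI {eapi}: {counts[eapi]:5d} {share:6.2f}%")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed tree location; adjust to wherever your tree lives.
    print(report(count_eapis("/usr/portage")))
```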

Update: Now running as a cronjerb, Output here, History here

Posted by Patrick | Permalink