Chrony Rate Fluctuations

The following graphs show the fluctuations in the rates of the system clock and of the real time clock on a variety of computers on the theory network. Until June all were synchronized against the same system, a stratum 2 ntp server on campus (the time delay to that machine from any of these computers is on the order of hundreds of microseconds). As the top graph shows, that server had a 3-4 msec sawtooth drift against GPS time. Thereafter, string was synchronized against a stratum 1 server which was itself synchronized against GPS. In Sept 2007, string was put onto ntp and synchronized against a stratum 0 GPS clock (a Garmin 18LVC GPS receiver with a PPS output), against which it maintains a roughly 2-3 microsecond offset. All of the other clocks are synchronized by chrony against string, which is less than a msec away from all of them via switches.

The following graphs plot the rate of the system clock vs the ntp server (red line, left hand scale) and the rate of the RTC (the real time clock, i.e. the CMOS clock) vs the system clock (dotted lines, right hand scale) against the time in days after 00:00 on the date shown. The rates are in units of microseconds per second (ppm). Chrony determines the first rate by comparing the readings of the system clock with the times obtained from the NTP server, and uses it to adjust the rate of the system clock; the RTC rate is determined by comparing the RTC with the system clock. Note that the strong correlation between the two sets of rate fluctuations suggests that the system clock is the primary source of noise, and that in general the RTC has better stability than the system clock.
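As a rough illustration of how a rate in microseconds per second can be obtained from a series of clock comparisons, the sketch below fits a straight line to offset samples by least squares and reports the slope. This is only a minimal sketch of the general idea, not chrony's actual code; the function name and the sample values are hypothetical.

```python
def estimate_rate_ppm(times_s, offsets_us):
    """Least-squares slope of offset (microseconds) against time (seconds).

    A slope of 1 microsecond per second corresponds to 1 ppm.
    """
    n = len(times_s)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times_s) / n
    mean_o = sum(offsets_us) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (o - mean_o)
              for t, o in zip(times_s, offsets_us))
    return sxy / sxx


# Hypothetical samples (not measured data): a clock that gains
# 2 microseconds every second relative to the reference.
times = [0.0, 100.0, 200.0, 300.0]
offsets = [5.0, 205.0, 405.0, 605.0]
rate = estimate_rate_ppm(times, offsets)
print(f"estimated rate: {rate:.3f} ppm")
```

The same fit applies to either comparison in the graphs: system clock against the NTP server, or RTC against the system clock.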

In the graphs for the week ending Feb 11, the huge instability of one of the machines, info, and of the other machines after they were restarted on Feb 9, is unexplained. There seems to be an instability in the operation of chrony. The semblance of order after the 10th was restored by decreasing maxupdateskew to 1/5 (from unlimited). Before that restart dilaton had the smallest rate fluctuations of any of the clocks, but not afterwards.
Well, I have finally tracked down the problem. That stratum 2 server stinks. I got a gps device with a PPS output, which I hooked up to a couple of the machines. The most interesting is string, which had some of the most unstable behaviour with chrony. In the following graph I have plotted the response of string to the gps clock (with chrony switched off), to the stratum 2 server and to a stratum 1 server. The huge regular sawtooth waves come from the stratum 2 server. Not only is that system on average about 3 ms fast, its offset also varies regularly. The stratum 1 server is very much better behaved: considering that it is almost 10 msec away (peer delay), it differs from the gps time by only a few tens of microseconds. (The "line" across the top is the gps time, with a width, i.e. a jitter, of about 3 microseconds. The jagged line starting at 24 hr is the stratum 1 server, while the huge oscillation is the supposed stratum 2 source.) It may be that because the stratum 2 server is running SunOS, its kernel cannot regulate the system clock properly, leading to this behaviour.
(Note that in each case exactly the same overall drift has been removed from the data, i.e. the drift was determined from the GPS clock and then the same drift was removed from each of the other graphs.)
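The drift removal described in the note above can be sketched as follows: fit a straight line to the GPS series only, then subtract that one slope from every series, so that the differences between the sources are preserved. This is a minimal sketch under assumed names and made-up numbers, not the script actually used for the graphs.

```python
def fit_line(times, values):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def remove_drift(times, values, slope):
    """Subtract slope * t from each value; the slope is shared by all series."""
    return [v - slope * t for t, v in zip(times, values)]


# Hypothetical offsets in microseconds, sampled hourly (illustrative only).
times_hr = [0.0, 1.0, 2.0, 3.0]
gps = [0.0, 10.0, 20.0, 30.0]                # reference series
other = [3000.0, 3012.0, 3018.0, 3030.0]     # some other time source

# The drift is determined from the GPS series alone ...
slope, _ = fit_line(times_hr, gps)

# ... and exactly the same drift is removed from every series.
gps_flat = remove_drift(times_hr, gps, slope)
other_flat = remove_drift(times_hr, other, slope)
print(slope, gps_flat, other_flat)
```

Fitting each series separately would instead hide any rate difference between the sources, which is why a single common drift is used.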
What is interesting is that while the gps spikes are all late (by a few microseconds), both of the ntp sources are early. This seems to imply that the outbound ntp packets take slightly longer than the inbound packets.

On Apr 14 all of the machines except dilaton and string were changed to get their primary time from string, which gets its time from an upstream server. Dilaton got its time from a time server located at Microsoft, but was switched to string on Apr 15. In August, string was switched to running ntp with a Garmin 18LVC gps receiver delivering PPS signals to ntp. The accuracy of string then became of the order of a microsecond.

In Nov, the bottom graphs were added. These give the measured offsets and round trip delay times, with string as the stratum 0 source, from each of the machines. The large (up to 1 sec) round trip times seem to be due to problems with the switches installed in Physics (Cisco Gigabit switches), which seem to insert latencies of up to 2 seconds in routing the ntp packets between the various machines and string. monopole, charge, gauge, boson, dilaton, flory, info and fluxon are all on the same set of switches, so the delays come from single switches.
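The offsets and round trip delays discussed above come from the standard NTP exchange of four timestamps. The sketch below computes both quantities and shows why unequal outbound and return delays bias the measured offset by half of their difference. The helper simulate_exchange and all of the numbers are hypothetical and purely illustrative; the sign of the bias on the actual graphs depends on the plotting convention, which is not assumed here.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client send time (client clock)
    t2: server receive time (server clock)
    t3: server send time (server clock)
    t4: client receive time (client clock)
    Returns (offset of server relative to client, round trip delay).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, d_out, d_back, t1=0.0, server_hold=0.0):
    """Build the four timestamps for given one-way delays (hypothetical helper)."""
    t2 = t1 + d_out + true_offset             # server clock when the request arrives
    t3 = t2 + server_hold                     # server clock when the reply leaves
    t4 = t1 + d_out + server_hold + d_back    # client clock when the reply arrives
    return t1, t2, t3, t4


# Illustrative values in microseconds: the clocks agree exactly, but the
# outbound packet takes 300 us and the returning packet only 200 us.
stamps = simulate_exchange(true_offset=0.0, d_out=300.0, d_back=200.0)
offset, delay = ntp_offset_delay(*stamps)
print(f"measured offset {offset} us, round trip delay {delay} us")
```

With equal one-way delays the measured offset equals the true offset; the client cannot detect an asymmetry from the four timestamps alone, which is why an independent reference such as the PPS signal is needed to reveal it.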

This is especially obvious in the week ending Feb 18. Some of the machines have huge (10 ppm) fluctuations in the rate while, at exactly the same time, others (e.g. charge) are running with fluctuations in the 0.2 ppm range. I.e., these fluctuations are not coming from the source. They seem to be inherent in the way chrony is setting the rates.

Since the time between comparisons of the system clock with the NTP server is of the order of 100-1000 sec (peer delay is typically 0.6 ms), the rate noise in the case of the best system would correspond to a drift of less than a millisecond between comparisons.
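The arithmetic behind this estimate is simply rate error times interval. As a small worked example, assuming a rate fluctuation of 0.2 ppm (the figure quoted above for the better machines) and the two ends of the 100-1000 sec comparison interval:

```python
def accumulated_drift_us(rate_error_ppm, interval_s):
    """Drift in microseconds: 1 ppm is 1 microsecond of error per second."""
    return rate_error_ppm * interval_s


RATE_NOISE_PPM = 0.2  # assumed rate fluctuation of a well-behaved machine

for interval in (100, 1000):
    drift = accumulated_drift_us(RATE_NOISE_PPM, interval)
    print(f"{interval:5d} s between comparisons: "
          f"{drift:.0f} us ({drift / 1000:.2f} ms)")
```

Even at the longest interval the accumulated drift stays well under a millisecond, while a machine fluctuating by 10 ppm would drift by about 10 ms over the same 1000 sec.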

    One 450MHz Intel Pentium III Processor, 128M RAM, 903.19 Bogomips Total
    Two 450MHz Intel Pentium III Processors, 256M RAM, 1805.35 Bogomips Total
    One 750MHz Intel Pentium III Processor, 256M RAM, 1498.05 Bogomips Total
    One 750MHz Intel Pentium III Processor, 256M RAM, 1498.00 Bogomips Total
    One 750MHz Intel Pentium III Processor, 384M RAM, 1498.05 Bogomips Total
    One 935MHz Intel Pentium III Processor, 384M RAM, 1872.92 Bogomips Total
    One 935MHz Intel Pentium III Processor, 256M RAM, 1872.86 Bogomips Total
    One 1.6GHz Intel Pentium 4 Processor, 512M RAM, 3194.28 Bogomips Total
    One 2.67GHz Intel Pentium 4 Processor, 0.99GB RAM, 5339.53 Bogomips Total
    Two 2.8GHz Intel Pentium 4 Processors, 0.98GB RAM, 11179.02 Bogomips Total
    Two 3GHz Intel Pentium 4 Processors, 0.99GB RAM, 12008.29 Bogomips Total
    Two 3GHz Intel Pentium D Processors, 1GB RAM, 12008.64 Bogomips Total

These rate fluctuations do not represent the actual clock accuracy (in general chrony keeps the clocks to within a millisecond or less), but they do represent the stability of the onboard system clock (driven from the bus frequency) and to some extent of the real time clock. Chrony measures the real time clock against the system clock, so an unstable system clock would produce an apparently unstable real time clock. In general the RTC seems to be more stable than the system clock: the correlated fluctuations in the system and RTC rates suggest that a fair amount of the apparent RTC instability comes from the system clock rather than from the RTC itself.
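To see why an unstable system clock shows up in the RTC graphs, consider a simplified model in which each clock has a rate error relative to true time and the RTC is measured only against the system clock. To first order the apparent RTC rate is the RTC's own rate error minus that of the system clock. The sketch below is only this toy model, with made-up numbers; it is not how chrony computes the RTC rate.

```python
def apparent_rtc_rate_ppm(rtc_true_ppm, sys_true_ppm):
    """Rate of the RTC as seen from the system clock, in ppm.

    Both arguments are rate errors relative to true time. The exact ratio
    (1 + r) / (1 + s) - 1 is approximately r - s for small errors.
    """
    rtc = 1.0 + rtc_true_ppm * 1e-6
    system = 1.0 + sys_true_ppm * 1e-6
    return (rtc / system - 1.0) * 1e6


# Hypothetical values: a perfectly steady RTC that is 5 ppm fast, and a
# system clock whose rate wanders between 8 and 12 ppm.
rtc_true = 5.0
sys_rates = [10.0, 12.0, 8.0, 10.0]

apparent = [apparent_rtc_rate_ppm(rtc_true, s) for s in sys_rates]
for s, a in zip(sys_rates, apparent):
    print(f"system clock {s:5.1f} ppm -> apparent RTC rate {a:6.2f} ppm")
```

In this model the RTC itself never changes rate, yet its apparent rate follows every fluctuation of the system clock, which is the kind of correlation seen between the two curves.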