Recently, I was working on a research topic for Red Hat Insights, a hosted service designed to help people proactively identify and resolve technical issues in Red Hat products. Around that time a Chinese romantic comedy film, "I Belonged to You", was released. On hearing the name, I thought to myself, "that title couldn't be any better for this post". As a line in the film goes, "I'm only a passerby in your world". So is the leap second! Another leap second is coming soon, so let's cherish it this time. These little moments in time can be incredibly challenging, and also incredibly interesting. But before we start talking about leap seconds, let's introduce some background about time itself.
What's UTC?
UTC (Coordinated Universal Time) is the official standard for the current time. UTC evolved from the former GMT (Greenwich Mean Time), which was once used to set the clocks on ships before they left for a long journey. GMT had earlier been adopted as the world's standard time, but one reason it was replaced is that it was based on mean solar time, and newer methods of time measurement showed that mean solar time itself varies significantly. The two main components of the name are:
- Universal means that the time can be used everywhere in the world, meaning that it is independent of time zones (i.e. it's not local time). To convert UTC to local time, one has to add or subtract the offset of the local time zone.
- Coordinated means that several institutions contribute their estimate of the current time, and UTC is built by combining these estimates.
What's TAI?
International Atomic Time (TAI, from the French name Temps Atomique International) is defined as the weighted average of the time kept by about 200 atomic clocks in over 50 national laboratories worldwide. It is the basis for Coordinated Universal Time (UTC), which is used for civil timekeeping all over the Earth's surface, and for Terrestrial Time, which is used for astronomical calculations. Since 30 June 2015, when another leap second was added, TAI has been exactly 36 seconds ahead of UTC. The 36 seconds result from the initial difference of 10 seconds at the start of 1972, plus 26 leap seconds inserted into UTC since 1972. One second of TAI is a constant duration defined by cesium radiation. TAI times are identified by year, month, day, hour, minute, and second. There are exactly 86,400 TAI seconds in every TAI day.
What's a leap second?
A leap second is a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation. Since this system of correction was implemented in 1972, 26 leap seconds have been inserted, the most recent on June 30, 2015 at 23:59:60 UTC, and the next leap second will be inserted on December 31, 2016, at 23:59:60 UTC.
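The offset arithmetic (10 initial seconds plus one per inserted leap second) can be checked with a few lines of Python. This is only a minimal sketch, not code from any NTP implementation: the `LEAP_SECOND_DATES` table and the `tai_minus_utc` helper are names made up for illustration, and the table covers only the positive leap seconds up to the one scheduled for December 31, 2016.

```python
from datetime import date

# UTC dates whose final minute carried an inserted second (23:59:60).
# Illustrative table; it ends with the leap second scheduled for 2016-12-31.
LEAP_SECOND_DATES = [
    date(1972, 6, 30), date(1972, 12, 31), date(1973, 12, 31),
    date(1974, 12, 31), date(1975, 12, 31), date(1976, 12, 31),
    date(1977, 12, 31), date(1978, 12, 31), date(1979, 12, 31),
    date(1981, 6, 30), date(1982, 6, 30), date(1983, 6, 30),
    date(1985, 6, 30), date(1987, 12, 31), date(1989, 12, 31),
    date(1990, 12, 31), date(1992, 6, 30), date(1993, 6, 30),
    date(1994, 6, 30), date(1995, 12, 31), date(1997, 6, 30),
    date(1998, 12, 31), date(2005, 12, 31), date(2008, 12, 31),
    date(2012, 6, 30), date(2015, 6, 30), date(2016, 12, 31),
]

INITIAL_TAI_UTC_OFFSET = 10  # seconds, at the start of 1972


def tai_minus_utc(day):
    """Return TAI - UTC in seconds at 00:00:00 UTC on `day` (1972 or later)."""
    if day < date(1972, 1, 1):
        raise ValueError("this sketch only covers 1972 and later")
    # A leap second at the end of `leap_day` is in effect from the next day on.
    inserted = sum(1 for leap_day in LEAP_SECOND_DATES if leap_day < day)
    return INITIAL_TAI_UTC_OFFSET + inserted


print(tai_minus_utc(date(2015, 7, 1)))  # 10 + 26 = 36
print(tai_minus_utc(date(2017, 1, 1)))  # 10 + 27 = 37
```

A real system would read the leap second list from a maintained data source rather than hard-coding it.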
What's Unix time?
Unix time (also known as POSIX time or Epoch time) is a system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds. It is used widely in Unix-like and many other operating systems and file formats. Because it does not handle leap seconds, it is neither a linear representation of time nor a true representation of UTC. Unix time may be checked on most Unix systems by typing date +%s on the command line. For example:
$ date +%s --date='2016-10-22T14:56:00Z'
1477148160
$ date +%s --date='Jan 01 1970 00:00:00 UTC'
0
What's the trouble when representing a leap second?
The Unix time number increases by exactly 86,400 each day, regardless of how long the day is. When a leap second occurs, the UTC day is not exactly 86,400 seconds long, so that a discontinuity occurs in the Unix time number. For a positive leap second, it is inserted between second 23:59:59 of a chosen UTC calendar date (the last day of a month, usually June 30 or December 31) and second 00:00:00 of the following date. This extra second is displayed on UTC clocks as 23:59:60. A negative leap second would suppress second 23:59:59 of the last day of a chosen month, so that second 23:59:58 of that date would be followed immediately by second 00:00:00 of the following date.
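To see the discontinuity in numbers, here is a small Python sketch. It relies on `calendar.timegm`, which applies the plain POSIX arithmetic (86,400 seconds per day) and does not validate the seconds field, so it accepts 23:59:60. The `unix_time` wrapper is just an illustrative helper.

```python
import calendar


def unix_time(year, month, day, hour, minute, second):
    """Unix time for a UTC calendar time, using plain POSIX arithmetic."""
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


before = unix_time(2015, 6, 30, 23, 59, 59)
leap = unix_time(2015, 6, 30, 23, 59, 60)  # the inserted second
after = unix_time(2015, 7, 1, 0, 0, 0)

# Three distinct UTC seconds, but only two distinct Unix time numbers:
print(before, leap, after)  # 1435708799 1435708800 1435708800
```

The inserted second 23:59:60 has no Unix time number of its own, which is exactly why something has to give when a leap second occurs.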
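How does the Linux kernel implement a leap second?
The Linux kernel handles leap second processing in its own way. Taking the RHEL 7.2 GA kernel as an example, we can trace the code path as follows. It starts with second_overflow() in kernel/time/ntp.c; its callers accumulate_nsecs_to_secs() and update_wall_time() live in kernel/time/timekeeping.c: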
/*
* this routine handles the overflow of the microsecond field
*
* The tricky bits of code to handle the accurate clock support
* were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
* They were originally developed for SUN and DEC kernels.
* All the kudos should go to Dave for this stuff.
*
* Also handles leap second processing, and returns leap offset
*/
int second_overflow(unsigned long secs)
{
s64 delta;
int leap = 0;
/*
* Leap second processing. If in leap-insert state at the end of the
* day, the system clock is set back one second; if in leap-delete
* state, the system clock is set ahead one second.
*/
switch (time_state) {
case TIME_OK:
if (time_status & STA_INS) {
time_state = TIME_INS;
ntp_next_leap_sec = secs + SECS_PER_DAY -
(secs % SECS_PER_DAY);
} else if (time_status & STA_DEL) {
time_state = TIME_DEL;
ntp_next_leap_sec = secs + SECS_PER_DAY -
((secs+1) % SECS_PER_DAY);
}
break;
case TIME_INS:
if (!(time_status & STA_INS)) {
ntp_next_leap_sec = TIME64_MAX;
time_state = TIME_OK;
} else if (secs % SECS_PER_DAY == 0) {
leap = -1;
time_state = TIME_OOP;
printk(KERN_NOTICE
"Clock: inserting leap second 23:59:60 UTC\n"); //<<<------------ SEEING THIS MSG IF LEAP SECOND EVENT IS HANDLED VIA THIS ROUTINE
}
break;
case TIME_DEL:
if (!(time_status & STA_DEL)) {
ntp_next_leap_sec = TIME64_MAX;
time_state = TIME_OK;
} else if ((secs + 1) % SECS_PER_DAY == 0) {
leap = 1;
ntp_next_leap_sec = TIME64_MAX;
time_state = TIME_WAIT;
printk(KERN_NOTICE
"Clock: deleting leap second 23:59:59 UTC\n");
}
break;
case TIME_OOP:
ntp_next_leap_sec = TIME64_MAX;
time_state = TIME_WAIT;
break;
case TIME_WAIT:
if (!(time_status & (STA_INS | STA_DEL)))
time_state = TIME_OK;
break;
}
/* Bump the maxerror field */
time_maxerror += MAXFREQ / NSEC_PER_USEC;
if (time_maxerror > NTP_PHASE_LIMIT) {
time_maxerror = NTP_PHASE_LIMIT;
time_status |= STA_UNSYNC;
}
/* Compute the phase adjustment for the next second */
tick_length = tick_length_base;
delta = ntp_offset_chunk(time_offset);
time_offset -= delta;
tick_length += delta;
/* Check PPS signal */
pps_dec_valid();
if (!time_adjust)
goto out;
if (time_adjust > MAX_TICKADJ) {
time_adjust -= MAX_TICKADJ;
tick_length += MAX_TICKADJ_SCALED;
goto out;
}
if (time_adjust < -MAX_TICKADJ) {
time_adjust += MAX_TICKADJ;
tick_length -= MAX_TICKADJ_SCALED;
goto out;
}
tick_length += (s64)(time_adjust * NSEC_PER_USEC / NTP_INTERVAL_FREQ)
<< NTP_SCALE_SHIFT;
time_adjust = 0;
out:
return leap;
}
/**
* accumulate_nsecs_to_secs - Accumulates nsecs into secs
*
* Helper function that accumulates a the nsecs greater then a second
* from the xtime_nsec field to the xtime_secs field.
* It also calls into the NTP code to handle leapsecond processing.
*
*/
static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
{
u64 nsecps = (u64)NSEC_PER_SEC << tk->shift;
unsigned int clock_set = 0;
while (tk->xtime_nsec >= nsecps) {
int leap;
tk->xtime_nsec -= nsecps;
tk->xtime_sec++;
/* Figure out if its a leap sec and apply if needed */
leap = second_overflow(tk->xtime_sec); //<<<----------------------------
if (unlikely(leap)) {
struct timespec64 ts;
tk->xtime_sec += leap;
ts.tv_sec = leap;
ts.tv_nsec = 0;
tk_set_wall_to_mono(tk,
timespec64_sub(tk->wall_to_monotonic, ts));
__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
clock_set = TK_CLOCK_WAS_SET;
}
}
return clock_set;
}
/**
* update_wall_time - Uses the current clocksource to increment the wall time
*
*/
void update_wall_time(void)
{
struct clocksource *clock;
struct timekeeper *real_tk = &timekeeper;
struct timekeeper *tk = &shadow_timekeeper;
cycle_t offset;
int shift = 0, maxshift;
unsigned int clock_set = 0;
unsigned long flags;
raw_spin_lock_irqsave(&timekeeper_lock, flags);
/* Make sure we're fully resumed: */
if (unlikely(timekeeping_suspended))
goto out;
clock = real_tk->clock;
#ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
offset = real_tk->cycle_interval;
#else
offset = clocksource_delta(clock->read(clock), clock->cycle_last,
clock->mask);
#endif
/* Check if there's really nothing to do */
if (offset < real_tk->cycle_interval)
goto out;
/*
* With NO_HZ we may have to accumulate many cycle_intervals
* (think "ticks") worth of time at once. To do this efficiently,
* we calculate the largest doubling multiple of cycle_intervals
* that is smaller than the offset. We then accumulate that
* chunk in one go, and then try to consume the next smaller
* doubled multiple.
*/
shift = ilog2(offset) - ilog2(tk->cycle_interval);
shift = max(0, shift);
/* Bound shift to one less than what overflows tick_length */
maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
shift = min(shift, maxshift);
while (offset >= tk->cycle_interval) {
offset = logarithmic_accumulation(tk, offset, shift,
&clock_set);
if (offset < tk->cycle_interval<<shift)
shift--;
}
/* correct the clock when NTP error is too big */
timekeeping_adjust(tk, offset);
/*
* XXX This can be killed once everyone converts
* to the new update_vsyscall.
*/
old_vsyscall_fixup(tk);
/*
* Finally, make sure that after the rounding
* xtime_nsec isn't larger than NSEC_PER_SEC
*/
clock_set |= accumulate_nsecs_to_secs(tk); //<<<------------------------
write_seqcount_begin(&timekeeper_seq);
/* Update clock->cycle_last with the new value */
clock->cycle_last = tk->cycle_last;
/*
* Update the real timekeeper.
*
* We could avoid this memcpy by switching pointers, but that
* requires changes to all other timekeeper usage sites as
* well, i.e. move the timekeeper pointer getter into the
* spinlocked/seqcount protected sections. And we trade this
* memcpy under the timekeeper_seq against one before we start
* updating.
*/
memcpy(real_tk, tk, sizeof(*tk));
timekeeping_update(real_tk, clock_set);
write_seqcount_end(&timekeeper_seq);
out:
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
if (clock_set)
/* Have to call _delayed version, since in irq context*/
clock_was_set_delayed();
}
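The state transitions in second_overflow() can be easier to follow in a stripped-down model. The Python sketch below mirrors only the switch statement and the `xtime_sec += leap` step from accumulate_nsecs_to_secs(); everything else (NTP phase adjustment, locking, TAI offset, `ntp_next_leap_sec`) is left out. `LeapModel` and `run_clock` are illustrative names, not kernel symbols, and the state and flag values are the ones I understand linux/timex.h to define.

```python
SECS_PER_DAY = 86400

# Kernel clock states and status bits (values as in linux/timex.h).
TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT = range(5)
STA_INS = 0x0010
STA_DEL = 0x0020


class LeapModel:
    """Toy model of the leap-second state machine in second_overflow()."""

    def __init__(self, time_status):
        self.time_status = time_status
        self.time_state = TIME_OK

    def second_overflow(self, secs):
        """Advance the state machine; return the leap adjustment (-1, 0, +1)."""
        leap = 0
        if self.time_state == TIME_OK:
            if self.time_status & STA_INS:
                self.time_state = TIME_INS
            elif self.time_status & STA_DEL:
                self.time_state = TIME_DEL
        elif self.time_state == TIME_INS:
            if not self.time_status & STA_INS:
                self.time_state = TIME_OK
            elif secs % SECS_PER_DAY == 0:
                leap = -1  # step back one second: 23:59:59 is repeated
                self.time_state = TIME_OOP
        elif self.time_state == TIME_DEL:
            if not self.time_status & STA_DEL:
                self.time_state = TIME_OK
            elif (secs + 1) % SECS_PER_DAY == 0:
                leap = 1  # step ahead one second: 23:59:59 is skipped
                self.time_state = TIME_WAIT
        elif self.time_state == TIME_OOP:
            self.time_state = TIME_WAIT
        elif self.time_state == TIME_WAIT:
            if not self.time_status & (STA_INS | STA_DEL):
                self.time_state = TIME_OK
        return leap


def run_clock(start_secs, ticks, time_status):
    """Tick a wall clock once per second, applying leap adjustments."""
    model = LeapModel(time_status)
    xtime_sec = start_secs
    readings = []
    for _ in range(ticks):
        xtime_sec += 1
        xtime_sec += model.second_overflow(xtime_sec)
        readings.append(xtime_sec)
    return readings


# Start at 2015-06-30 23:59:56 UTC with the insert flag set.
print(run_clock(1435708796, 6, STA_INS))
```

With STA_INS set, the value 1435708799 (23:59:59) shows up twice in the readings; with STA_DEL it would not show up at all.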
With the kernel side in place, how does NTP work with it? When a leap second is coming, the 'add/delete leap second' flag is either set locally on the NTP daemon or received by it. For example, if you're simulating a leap second event, you may set up an NTP server to announce a leap second using a leap seconds list. If your system is a normal NTP client, it will receive the leap second announcement from its upstream NTP servers. In both cases, the NTP daemon by default passes the leap second flag to the kernel through the adjtimex syscall. Let's take a look at the relevant ntp code:
// ntp-4.2.6p5/ntpd/ntp_loopfilter.c
if (pll_control && kern_enable) {
/*
* We initialize the structure for the ntp_adjtime()
* system call. We have to convert everything to
* microseconds or nanoseconds first. Do not update the
* system variables if the ext_enable flag is set. In
* this case, the external clock driver will update the
* variables, which will be read later by the local
* clock driver. Afterwards, remember the time and
* frequency offsets for jitter and stability values and
* to update the frequency file.
*/
memset(&ntv, 0, sizeof(ntv));
if (ext_enable) {
ntv.modes = MOD_STATUS;
} else {
#ifdef STA_NANO
ntv.modes = MOD_BITS | MOD_NANO;
#else /* STA_NANO */
ntv.modes = MOD_BITS;
#endif /* STA_NANO */
if (clock_offset < 0)
dtemp = -.5;
else
dtemp = .5;
#ifdef STA_NANO
ntv.offset = (int32)(clock_offset * 1e9 +
dtemp);
ntv.constant = sys_poll;
#else /* STA_NANO */
ntv.offset = (int32)(clock_offset * 1e6 +
dtemp);
ntv.constant = sys_poll - 4;
#endif /* STA_NANO */
ntv.esterror = (u_int32)(clock_jitter * 1e6);
ntv.maxerror = (u_int32)((sys_rootdelay / 2 +
sys_rootdisp) * 1e6);
ntv.status = STA_PLL;
/*
* Enable/disable the PPS if requested.
*/
if (pps_enable) {
if (!(pll_status & STA_PPSTIME))
report_event(EVNT_KERN,
NULL, "PPS enabled");
ntv.status |= STA_PPSTIME | STA_PPSFREQ;
} else {
if (pll_status & STA_PPSTIME)
report_event(EVNT_KERN,
NULL, "PPS disabled");
ntv.status &= ~(STA_PPSTIME |
STA_PPSFREQ);
}
if (sys_leap == LEAP_ADDSECOND) //<<<------------- ADD SECOND
ntv.status |= STA_INS;
else if (sys_leap == LEAP_DELSECOND) //<<<------------- DEL SECOND
ntv.status |= STA_DEL;
}
/*
* Pass the stuff to the kernel. If it squeals, turn off
* the pps. In any case, fetch the kernel offset,
* frequency and jitter.
*/
if (ntp_adjtime(&ntv) == TIME_ERROR) { //<<<---------------- ADJTIME
if (!(ntv.status & STA_PPSSIGNAL))
report_event(EVNT_KERN, NULL,
"PPS no signal");
}
The 'STA_INS' or 'STA_DEL' flag, passed via the ntp_adjtime() function, directs the kernel to insert or delete a leap second.
How to observe a leap second?
To check whether the upstream NTP servers are announcing a leap second, we can use this command:
# ntpq -c "lassoc" -c "mrv &1 &999 leap,srcadr,stratum"
The 'leap=01' means that it's going to add a second. Sample output as follows:
ind assid status conf reach auth condition last_event cnt
===========================================================
1 54441 9024 yes yes none reject reachable 2
srcadr=flos-desktop-wireles.lan, leap=01, stratum=6
Insertion of a leap second is always scheduled for the end of a month, preferably at the end of June or December, at UTC midnight. So if the day is the end of June or December and a leap second is scheduled, we may see the 'leap_add_sec' or 'leap_del_sec' message in the 'Leap Field' when running this command:
# ntpq -c rv
Sample output as follows:
associd=0 status=4615 leap_add_sec, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Fri Jul 3 05:08:39 UTC 2015 (1)",
processor="x86_64", system="Linux/3.10.0-327.36.2.el7.x86_64", leap=01, stratum=7,
precision=-21, rootdelay=3.886, rootdisp=270.305, refid=192.168.17.62,
reftime=d93da8d6.e5d79437 Tue, Jun 30 2015 23:46:30.897,
clock=d93da8dc.03806f2e Tue, Jun 30 2015 23:46:36.013, peer=53187, tc=6,
mintc=3, offset=7.566, frequency=1.055, sys_jitter=0.000,
clk_jitter=2.675, clk_wander=0.176
The 'leap_add_sec' means that there will be a second insertion after 23:59:59 UTC time of the current day. To verify if the local system has leapsecond bits set already, check the output of ntptime command for 'INS' or 'DEL' flag:
# ntptime | grep status
status 0x2011 (PLL,INS,NANO), #<<<------ INS OR DEL
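If you only have the raw hexadecimal status (for example from the kernel's time_status variable), the flags can be decoded by hand. The following Python sketch does that; the bit values are the STA_* constants as I understand them from linux/timex.h, the short names follow the labels ntptime prints, and `decode_status` is a hypothetical helper, not part of any NTP tool.

```python
# STA_* status bits (values assumed from linux/timex.h), lowest bit first.
STATUS_BITS = [
    (0x0001, "PLL"), (0x0002, "PPSFREQ"), (0x0004, "PPSTIME"),
    (0x0008, "FLL"), (0x0010, "INS"), (0x0020, "DEL"),
    (0x0040, "UNSYNC"), (0x0080, "FREQHOLD"), (0x0100, "PPSSIGNAL"),
    (0x0200, "PPSJITTER"), (0x0400, "PPSWANDER"), (0x0800, "PPSERROR"),
    (0x1000, "CLOCKERR"), (0x2000, "NANO"), (0x4000, "MODE"),
    (0x8000, "CLK"),
]


def decode_status(status):
    """Return the names of the flags set in a timex status word."""
    return [name for mask, name in STATUS_BITS if status & mask]


print(decode_status(0x2011))  # ['PLL', 'INS', 'NANO']
print(decode_status(0x2001))  # ['PLL', 'NANO']
```

A status of 0x2011 therefore means the insert flag is armed, while 0x2001 is the same configuration without a pending leap second.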
In addition to the usual commands above, there are other ways to observe the interaction between the ntp daemon and the kernel. Adding debugging code to ntp and using SystemTap to monitor the values of kernel-internal variables are useful debugging aids. The following logs, collected from my test servers running a custom ntp and kernel, may shed light on these aspects:
How does ntpd pass the INS flag to the kernel?
Please note the changes before and after the time stamp 23:46:19. At that moment ntpd calls adjtimex with status 0x11; the kernel's time_status changes from 0x2001 to 0x2011 (the INS bit, 0x10, is now set), and shortly afterwards time_state moves from 0x0 (TIME_OK) to 0x1 (TIME_INS).
** Systemtap output snippet:
[Tue Jun 30 23:46:16 2015] second_overflow(): entering. ***time_status 0x2001, time_state 0x0***
[Tue Jun 30 23:46:17 2015] second_overflow(): entering. ***time_status 0x2001, time_state 0x0***
[Tue Jun 30 23:46:18 2015] second_overflow(): entering. ***time_status 0x2001, time_state 0x0***
[Tue Jun 30 23:46:19 2015] second_overflow(): entering. ***time_status 0x2001, time_state 0x0***
[Tue Jun 30 23:46:19 2015] sys_adjtimex(): timex.modes 0x203d, timex.status 0x11, comm ntpd, pid 1715
[Tue Jun 30 23:46:19 2015] sys_adjtimex(): timex.modes 0x80, timex.status 0x2011, comm ntpd, pid 1715
[Tue Jun 30 23:46:20 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x0***
[Tue Jun 30 23:46:21 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x1***
[Tue Jun 30 23:46:22 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x1***
[Tue Jun 30 23:46:23 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x1***
** ntpd debugging output snippet:
1 Jul 07:46:18 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
refclock_receive: at 69 127.127.1.0
refclock_sample: n 1 offset 0.000000 disp 0.010000 jitter 0.000000
clock_filter: n 6 off 0.000000 del 0.000000 dsp 0.187528 jit 0.000000
poll_update: at 69 127.127.1.0 poll 6 burst 1 retry 0 head 0 early 2 next 1
**DBG: leapsec:0x336, sys_leap:0x1, file:ntp_timer.c, line:336, func:timer
**DBG: leapsec:0x335, sys_leap:0x1, file:ntp_timer.c, line:355, func:timer
**DBG: leapsec:0x335, sys_leap:0x1, file:ntp_timer.c, line:359, func:timer
1 Jul 07:46:19 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
refclock_receive: at 70 127.127.1.0
refclock_sample: n 1 offset 0.000000 disp 0.010000 jitter 0.000000
clock_filter: n 7 off 0.000000 del 0.000000 dsp 0.062522 jit 0.000000
select: survivor 127.127.1.0 0.000000
select: combine offset 0.000000000 jitter 0.000000000
poll_update: at 70 127.127.1.0 poll 6 burst 0 retry 0 head 0 early 2 next 59
clock_update: at 70 sample 70 associd 20682
local_clock: mu 69 state 5 poll 6 count 6
**DBG: ntv.status:0x11, ntv.modes:0x203d, leap:0x1, file:ntp_loopfilter.c, line:564, func:local_clock
**DBG: ntp_adjtime() called, file:ntp_loopfilter.c, line:577, func:local_clock
**DBG: ntp_adjtime() called, file:ntp_loopfilter.c, line:607, func:local_clock
local_clock: offset 0.000000000 jit 0.000000238 freq -14.661 stab 0.000 poll 6
poll_update: at 70 127.127.1.0 poll 6 burst 0 retry 0 head 0 early 2 next 59
**DBG: leapsec:0x335, sys_leap:0x1, file:ntp_timer.c, line:336, func:timer
**DBG: leapsec:0x334, sys_leap:0x1, file:ntp_timer.c, line:355, func:timer
**DBG: leapsec:0x334, sys_leap:0x1, file:ntp_timer.c, line:359, func:timer
1 Jul 07:46:20 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
**DBG: leapsec:0x334, sys_leap:0x1, file:ntp_timer.c, line:336, func:timer
**DBG: leapsec:0x333, sys_leap:0x1, file:ntp_timer.c, line:355, func:timer
**DBG: leapsec:0x333, sys_leap:0x1, file:ntp_timer.c, line:359, func:timer
1 Jul 07:46:21 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
What do the kernel and ntp do during the leap second event?
The call trace is included deliberately to show the path taken when the leap second is handled by the kernel's time discipline. The debugging output shows what ntpd logged at that moment.
** Systemtap output snippet:
[Tue Jun 30 23:59:56 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x1***
[Tue Jun 30 23:59:57 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x1***
[Tue Jun 30 23:59:58 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x1***
[Tue Jun 30 23:59:59 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x1***
[Wed Jul 1 00:00:00 2015] second_overflow(): entering. ***time_status 0x2011, time_state 0x1***
[Wed Jul 1 00:00:00 2015] second_overflow(): setting leap...
0xffffffff810cad95 : second_overflow+0x225/0x2a0 [kernel] #<<<------------- Call trace for review
0xffffffff810ca241 : update_wall_time+0x271/0x670 [kernel]
0xffffffff810d1e15 : tick_sched_timer+0x25/0x60 [kernel]
0xffffffff8108c667 : __run_hrtimer+0x67/0x210 [kernel]
0xffffffff8108ca69 : hrtimer_interrupt+0xe9/0x220 [kernel]
0xffffffff8151b42b : smp_apic_timer_interrupt+0x3b/0x50 [kernel]
0xffffffff815194bd : apic_timer_interrupt+0x6d/0x80 [kernel]
** ntpd debugging output snippet:
1 Jul 07:59:57 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
**DBG: leapsec:0x3, sys_leap:0x1, file:ntp_timer.c, line:336, func:timer
**DBG: leapsec:0x2, sys_leap:0x1, file:ntp_timer.c, line:355, func:timer
**DBG: leapsec:0x2, sys_leap:0x1, file:ntp_timer.c, line:359, func:timer
1 Jul 07:59:58 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
**DBG: leapsec:0x2, sys_leap:0x1, file:ntp_timer.c, line:336, func:timer
**DBG: leapsec:0x1, sys_leap:0x1, file:ntp_timer.c, line:355, func:timer
**DBG: leapsec:0x1, sys_leap:0x1, file:ntp_timer.c, line:359, func:timer
1 Jul 07:59:59 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
**DBG: leapsec:0x1, sys_leap:0x1, file:ntp_timer.c, line:336, func:timer
**DBG: leapsec:0x0, sys_leap:0x0, file:ntp_timer.c, line:342, func:timer
event at 890 0.0.0.0 051b 0b leap_event #<<<-------------------------------- LEAP EVENT
1 Jul 07:59:59 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
**DBG: leapsec:0x0, sys_leap:0x0, file:ntp_timer.c, line:336, func:timer
1 Jul 08:00:00 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
**DBG: leapsec:0x0, sys_leap:0x0, file:ntp_timer.c, line:336, func:timer
read_network_packet: fd=19 length 48 from 192.168.17.52
fetch_timestamp: system network time stamp: 1435708800.594039
fetch_timestamp: timestamp delta: 0.000175451 (incl. prec fuzz)
1 Jul 08:00:00 ntpd[1715]: input_handler: Processed a gob of fd's in 0.130882 msec
processing timestamp delta 0.000446352 (with prec. fuzz)
receive: at 892 192.168.17.62<-192.168.17.52 flags 19 restrict 5d0
restrict: interval 66 headway 8 limit 64
receive: at 892 192.168.17.62<-192.168.17.52 mode 3 len 48
sendpkt(19, dst=192.168.17.52, src=192.168.17.62, ttl=0, len=48)
transmit: at 892 192.168.17.62->192.168.17.52 mode 4 len 48
processing time for 1 buffers 0.000200014
1 Jul 08:00:01 ntpd[1715]: select(): nfound=-1, error: Interrupted system call
**DBG: leapsec:0x0, sys_leap:0x0, file:ntp_timer.c, line:336, func:timer
** System journal snippet:
Jul 01 07:59:59 testhost kernel: Clock: inserting leap second 23:59:60 UTC
By now we have a basic impression of what a leap second looks like and how the kernel 'perceives' it. Is that enough? Are there any unexpected effects when the time steps back?
No man ever steps in the same river twice?
Leap seconds are a discontinuity in civil time. Time does not continue to increase monotonically; it is stepped by one second. On most Unix-like systems, if a leap second is to be inserted, the kernel simply steps the time back by one second at the beginning of the leap second, so the last second of the UTC day is repeated and duplicate timestamps can occur. Many applications get confused when the system time is stepped back. So even though the kernel handles leap seconds correctly, the way it does so is not optimal for applications. Is there a good way to avoid this?
What's the best practice to handle leap seconds?
The way to handle a leap second depends on whether the Linux system is running an NTP or PTP daemon. Last year my friend Miroslav Lichvar wrote an excellent post on our developer blog that thoroughly explained the different approaches to handling leap seconds. In this section we will focus on the practical use of these methods.
Running ntpd in default mode
As we know, systems running any version of Red Hat Enterprise Linux should automatically account for leap second corrections if they use the NTP daemon to synchronize their local timekeeping with an NTP server. During the last day before a leap second correction, NTP servers should notify their clients that a leap second will occur, and at the end of that day the Linux kernel should add or remove a second, either by making the last second of the day (23:59:59 UTC) occur twice or by removing it entirely. With the default settings of ntpd, you can monitor the time during the insertion of a leap second with this loop:
# while sleep 0.1 ; do date -u -Ins ; done
Sample output:
2015-06-30T23:59:58,018654774+0000
2015-06-30T23:59:58,123359023+0000
2015-06-30T23:59:58,229552900+0000
2015-06-30T23:59:58,334737210+0000
2015-06-30T23:59:58,440948455+0000
2015-06-30T23:59:58,609485640+0000
2015-06-30T23:59:58,716430299+0000
2015-06-30T23:59:58,822196217+0000
2015-06-30T23:59:58,928354352+0000
2015-06-30T23:59:59,033388523+0000 <<<------------ 1st occurrence
2015-06-30T23:59:59,139528389+0000
2015-06-30T23:59:59,245912308+0000
2015-06-30T23:59:59,354560230+0000
2015-06-30T23:59:59,460970930+0000
2015-06-30T23:59:59,578670612+0000
2015-06-30T23:59:59,681923277+0000
2015-06-30T23:59:59,787195607+0000
2015-06-30T23:59:59,892817166+0000
2015-06-30T23:59:59,999465899+0000
2015-06-30T23:59:59,157591495+0000 <<<------------ 2nd occurrence
2015-06-30T23:59:59,264467758+0000
2015-06-30T23:59:59,370565537+0000
2015-06-30T23:59:59,475371544+0000
2015-06-30T23:59:59,581479763+0000
2015-06-30T23:59:59,686079283+0000
2015-06-30T23:59:59,790239515+0000
2015-06-30T23:59:59,896284479+0000
2015-07-01T00:00:00,002516468+0000
2015-07-01T00:00:00,108540140+0000
2015-07-01T00:00:00,215023678+0000
2015-07-01T00:00:00,321339155+0000
2015-07-01T00:00:00,427251596+0000
2015-07-01T00:00:00,532776107+0000
2015-07-01T00:00:00,641236141+0000
2015-07-01T00:00:00,748231556+0000
2015-07-01T00:00:00,853118309+0000
2015-07-01T00:00:00,959433385+0000
Meanwhile a special message will be printed to the system log:
kernel: Clock: inserting leap second 23:59:60 UTC
The leap second is handled by the kernel's time discipline, which keeps the time accurate but not monotonic.
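Spotting the repeated second by eye in a long capture is tedious, so here is a small Python sketch that scans the output of the `date -u -Ins` loop for backward steps. It assumes the exact timestamp layout shown in the sample (comma before nine nanosecond digits, `+0000` suffix); `to_nanoseconds` and `backward_steps` are illustrative helper names.

```python
import calendar
import re
import time

# Matches timestamps such as 2015-06-30T23:59:59,999465899+0000
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),(\d{9})\+0000")


def to_nanoseconds(line):
    """Convert one `date -u -Ins` line to integer nanoseconds since the epoch."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no timestamp found in: %r" % line)
    parsed = time.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return calendar.timegm(parsed) * 1_000_000_000 + int(match.group(2))


def backward_steps(lines):
    """Return the size in nanoseconds of every backward jump between samples."""
    steps = []
    previous = None
    for line in lines:
        if not line.strip():
            continue
        current = to_nanoseconds(line)
        if previous is not None and current < previous:
            steps.append(previous - current)
        previous = current
    return steps


sample = [
    "2015-06-30T23:59:59,892817166+0000",
    "2015-06-30T23:59:59,999465899+0000",
    "2015-06-30T23:59:59,157591495+0000",
    "2015-06-30T23:59:59,264467758+0000",
    "2015-07-01T00:00:00,002516468+0000",
]
print(backward_steps(sample))  # [841874404]
```

The clock appears to jump back by about 0.84 s between two consecutive samples: the one-second backward step minus the roughly 0.16 s that elapsed between them.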
Running ptp
PTP is based on International Atomic Time (TAI). The PTP grandmaster communicates the current offset between UTC and TAI, so that UTC can be computed from the received PTP time. With linuxptp (the PTP implementation on RHEL) and its default options, the system clock continues to run in UTC, and ptp4l and phc2sys set the kernel flag to insert a leap second. The kernel then inserts the leap second as usual.
Can we 'melt' the leap second so that time appears to flow smoothly? That sounds good for applications. People have come up with several ways to achieve this. Let's move on.
Running ntpd in slew mode
For ntpd there's a mode called 'slew' mode, which can be seen in the ntpd man page:
-x
Normally, the time is slewed if the offset is less than the step threshold, which is 128 ms by default, and stepped
if above the threshold. This option sets the threshold to 600 s, which is well within the accuracy window to set
the clock manually. Note: Since the slew rate of typical Unix kernels is limited to 0.5 ms/s, each second of adjustment
requires an amortization interval of 2000 s. Thus, an adjustment as much as 600 s will take almost 14 days to complete.
This option can be used with the -g and -q options. Note: The kernel time discipline is disabled with this option.
When ntpd runs in this mode, the kernel does not 'feel' the leap second at all; the ntp daemon slowly adjusts the time instead. Applications then do not have to deal with an abrupt leap second, since the correction is spread over time. The cost is that once the leap second has occurred, the system time differs from the official time by about one second until the slew completes. Applications interacting with other systems either have to cope with that one-second difference, or the other systems also have to run in slew mode (with the same slew speed). The one-second offset gained after the leap second is measured and corrected later by slewing in normal operation against NTP servers that have already corrected their own clocks.
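The numbers quoted from the man page follow from simple division, which the Python sketch below reproduces. The 0.5 ms/s slew rate (500 microseconds per second) is the figure from the man page; `slew_duration_seconds` is a hypothetical helper used only for this illustration.

```python
def slew_duration_seconds(offset_seconds, slew_rate_us_per_s=500):
    """Seconds needed to slew away `offset_seconds` at the given rate."""
    offset_us = offset_seconds * 1_000_000
    return offset_us / slew_rate_us_per_s


one_leap_second = slew_duration_seconds(1)
max_offset_days = slew_duration_seconds(600) / 86400

print(one_leap_second)            # 2000.0 seconds, a bit over 33 minutes
print(round(max_offset_days, 2))  # 13.89 days, "almost 14 days"
```

So under these assumptions a slewed leap second leaves the clock off by a shrinking amount for roughly half an hour.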
Yet another NTP member, Chrony
Red Hat Enterprise Linux 7 changed the default NTP client to chrony, a full-featured implementation of NTP. When the system clock is synchronized by chronyd, the leap second correction is by default made by the kernel, just as with ntpd. The 'leapsecmode' option tells chrony what to do when a leap second occurs. It can be set to one of four values in the configuration file '/etc/chrony.conf':
system
When inserting a leap second, the kernel steps the system clock backwards by one second when the clock gets to
00:00:00 UTC. When deleting a leap second, it steps forward by one second when the clock gets to 23:59:59 UTC. This
is the default mode when the system driver supports leap seconds.
step
This is similar to the system mode, except the clock is stepped by chronyd instead of the kernel. It can be
useful to avoid bugs in the kernel code that would be executed in the system mode. This is the default mode when the
system driver doesn't support leap seconds.
slew
The clock is corrected by slewing started at 00:00:00 UTC when a leap second is inserted or 23:59:59 UTC when
a leap second is deleted. This may be preferred over the system and step modes when applications running on the
system are sensitive to jumps in the system time and it's acceptable that the clock will be off for a longer time.
On Linux with the default maxslewrate value (see section maxslewrate) the correction takes 12 seconds.
ignore
No correction is applied to the clock for the leap second. The clock will be corrected later in normal operation
when new measurements are made and the estimated offset includes the one second error.
Unlike ntpd with the '-x' option, chronyd with 'leapsecmode slew' option can still be used as a good NTP server. The
local clock is corrected by slew, but the time served to NTP clients is stepped on leap second. The slow local
correction is not visible to the clients.
Heard of the 'Leap Smear' technique, have a try?
When serving time to NTP clients that can't be configured to correct their clocks for a leap second by slewing, or that would correct them at slightly different rates when they need to be kept close together, chrony running in slew mode can be combined with the 'smoothtime' directive to enable a server leap smear. When smearing a leap second, the leap status is suppressed on the server and the served time is corrected slowly by slewing instead of stepping. The clients don't need any special configuration, as they don't know there is a leap second; they simply follow the server time, which eventually brings them back to UTC. Care must be taken to ensure that they synchronize only with NTP servers that smear the leap second in exactly the same way. This feature needs to be used carefully, because the server is intentionally not serving its best estimate of the true time.
At this point, which method of correcting the system clock for a leap second is better? It depends on the application you're running. Before choosing how to handle the leap second, it is good to understand how much your application depends on accurate and precise time, as well as the differences among the methods above. Then you can make intelligent trade-offs based on that knowledge.
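To make the idea of smearing concrete, here is a deliberately simple Python sketch of a linear smear, in which the one-second correction is spread evenly over a window. This is not chrony's algorithm (its 'smoothtime' directive uses its own smoothing process, which is not modeled here); the 24-hour window and the `smear_offset` name are assumptions chosen purely for illustration.

```python
def smear_offset(seconds_since_start, smear_duration=86400.0, leap=1.0):
    """Portion of the leap second absorbed so far under a linear smear.

    Returns 0.0 before the smear window starts and `leap` once it has ended.
    """
    if seconds_since_start <= 0:
        return 0.0
    if seconds_since_start >= smear_duration:
        return leap
    return leap * seconds_since_start / smear_duration


print(smear_offset(0))      # 0.0, nothing absorbed yet
print(smear_offset(43200))  # 0.5, halfway through the window
print(smear_offset(86400))  # 1.0, the whole second has been absorbed
```

Under this model a clock following the smearing server never sees a step; it just runs very slightly slow for the length of the window.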
No NTP no PTP, I'm just a Minimalist
Let's forget about NTP and PTP for a moment and think about where you live. Suppose you live in Canton, China, and are about to travel to Kyoto, Japan. The two places are in different time zones. How is the time kept on systems in these two places? Will the way time is represented on these systems cause any problems? There's another important player we have to mention when we talk about this scenario: the time zone.
Asking about the time in an unfamiliar city
The Time Zone Database (often called tz or zoneinfo) contains code and data that represent the history of local time for many representative locations around the globe. It is updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight-saving rules. With time zone files, a system can convert the system time to local time. The tzdata distribution also includes a 'leapseconds' file, which contains a list of all leap seconds. The data in this file is used by tzcode when one of the "right" timezones is selected. On RHEL 6 or RHEL 7, the difference between a "posix" timezone file and a "right" timezone file can be seen as follows:
# file /usr/share/zoneinfo/Asia/Hong_Kong
/usr/share/zoneinfo/Asia/Hong_Kong: timezone data, version 2, 5 gmt time flags, 5 std time flags, no leap seconds, 69 transition times, 5 abbreviation chars
# file /usr/share/zoneinfo/right/Asia/Hong_Kong
/usr/share/zoneinfo/right/Asia/Hong_Kong: timezone data, version 2, 5 gmt time flags, 5 std time flags, 27 leap seconds, 69 transition times, 5 abbreviation chars
The leap second data is included in the "right" timezone files; in other words, they produce days with 86,401 seconds, and so they are not consistent with POSIX. In fact, if you are using the "right" time zone files, the system clock should be kept in TAI-10 instead of UTC, because they presume that the time_t value of the system clock is a count that includes all leap seconds. To synchronize a client that uses a "right" timezone file, you would need a special NTP server serving time in TAI-10 instead of UTC, so that the client's system clock is kept in TAI-10 and applications can convert the time as expected with the "right" timezones. Note that the NTP implementations we have in Red Hat Enterprise Linux (ntpd and chronyd) don't do this.
Still reluctant to use NTP or PTP?
By default, Linux systems that do not use NTP or PTP to synchronize their time will not correct for leap seconds, and the time reported by these systems will be off by one second relative to UTC after the leap second. The clock should be reset manually after the leap second occurs.
Post Scriptum
I would like to thank Miroslav Lichvar for his professional advice in my writing of this post. After I delivered a technical session to the folks internally, I received great feedback and good questions on different application scenarios. I would like to share the outstanding questions here.
Slew mode of ntpd does not work as expected.
In earlier versions of RHEL 5, RHEL 6, and RHEL 7 there were known issues with slew mode.
- On RHEL 5, system time is not adjusted by a leap second when ntpd runs in slew mode if the version of ntp is 'ntp-4.2.2p1-8' or earlier. (KCS#68712)
- On RHEL 6 and RHEL 7, several ntp versions (ntp-4.2.6p5-18.el7, ntp-4.2.6p5-19.el7_0, ntp-4.2.6p5-1.el6, ntp-4.2.6p5-2.el6_6, ntp-4.2.6p5-2.el6_5, ntp-4.2.6p5-2.el6_5.1) will have the leap second inserted by stepping even when running in slew mode, although the step is made by ntpd instead of the kernel. (KCS#1379783)
- If you're still running RHEL 4, ntpd in RHEL 4 sets the kernel flag even after starting the ntpd service in slew mode. This bug was not fixed in RHEL 4. (KCS#1507793)
Is it necessary to have the latest tzdata package if my RHEL host is an NTP client?
Some people may have heard that the tzdata package must be upgraded whenever a leap second is coming. Is that true? The answer is NO. With an appropriate method of time synchronization, such as NTP or PTP, you do not need to upgrade the tzdata package. (KCS#1465713) Also refer to the discussion of tzdata above.
If an NTP server runs chronyd with the 'leapsecmode ignore' option, can it serve as a public NTP server?
That wouldn't be a good idea as the clients would step their clock at 00:00:00 UTC, but the server would not, so there would be a one-second offset and the clients might step forward again, and then back again when the server figures out it's wrong and steps back.
Chronyd may crash when performing server leap smear
This new bug has been fixed both upstream and in Red Hat Enterprise Linux. (KCS#2759021)
Is there a handy tool that can help detect leap second related issues?
We have Red Hat Insights and Red Hat Access Labs; feel free to try them.
Is there a guiding document for resolving leap second issues in RHEL?
I recommend reading the article "Resolve Leap Second Issues in Red Hat Enterprise Linux". (KCS#15145)
Welcome to the end of this post. Let's look forward to the extra second and enjoy it. ;)
Last updated: January 16, 2017