
C++20 introduces counting_semaphore and binary_semaphore, which support blocking acquire() and non-blocking try_acquire(), as well as timed try_acquire_for() and try_acquire_until(). On platforms that support __platform_wait()/__platform_notify(), we select an implementation strategy based on atomic waiting; otherwise, we attempt to use POSIX semaphores.

If you missed the previous article, read it here: Implementing C++20 atomic waiting in libstdc++

Semaphores in C++

Here's what the ISO C++ Standard has to say on the matter of semaphores:

1 Class template counting_semaphore maintains an internal counter that is initialized when the semaphore is created. The counter is decremented when a thread acquires the semaphore, and is incremented when a thread releases the semaphore. If a thread tries to acquire the semaphore when the counter is zero, the thread will block until another thread increments the counter by releasing the semaphore.

2 least_max_value shall be non-negative; otherwise the program is ill-formed.

3 Concurrent invocations of the member functions of counting_semaphore, other than its destructor, do not introduce data races.

void release(ptrdiff_t update = 1);

8 Preconditions: update >= 0 is true, and update <= max() - counter is true.

9 Effects: Atomically execute counter += update. Then, unblocks any threads that are waiting for counter to be greater than zero.

10 Synchronization: Strongly happens before invocations of try_acquire that observe the result of the effects.

11 Throws: system_error when an exception is required (32.2.2).

12 Error conditions: Any of the error conditions allowed for mutex types (32.5.4.2).

bool try_acquire() noexcept;

13 Effects: Attempts to atomically decrement counter if it is positive, without blocking. If counter is not decremented, there is no effect and try_acquire immediately returns. An implementation may fail to decrement counter even if it is positive. [Note 1 : This spurious failure is normally uncommon, but allows interesting implementations based on a simple compare and exchange (Clause 31). — end note] An implementation should ensure that try_acquire does not consistently return false in the absence of contending semaphore operations.

14 Returns: true if counter was decremented, otherwise false.

void acquire();

15 Effects: Repeatedly performs the following steps, in order:

  • (15.1) — Evaluates try_acquire. If the result is true, returns.
  • (15.2) — Blocks on *this until counter is greater than zero.

16 Throws: system_error when an exception is required (32.2.2).

17 Error conditions: Any of the error conditions allowed for mutex types (32.5.4.2).

template<class Rep, class Period> bool try_acquire_for(const chrono::duration<Rep, Period>& rel_time);
template<class Clock, class Duration> bool try_acquire_until(const chrono::time_point<Clock, Duration>& abs_time);

18 Effects: Repeatedly performs the following steps, in order:

  • (18.1) — Evaluates try_acquire(). If the result is true, returns true.
  • (18.2) — Blocks on *this until counter is greater than zero or until the timeout expires. If it is unblocked by the timeout expiring, returns false. The timeout expires (32.2.4) when the current time is after abs_time (for try_acquire_until) or when at least rel_time has passed from the start of the function (for try_acquire_for).

19 Throws: Timeout-related exceptions (32.2.4), or system_error when a non-timeout-related exception is required (32.2.2).

20 Error conditions: Any of the error conditions allowed for mutex types (32.5.4.2).
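
Before turning to the implementation, here is a minimal usage sketch (not taken from the library sources) that illustrates the semantics above: a semaphore created with a fixed number of tokens bounds how many threads may proceed at once.

    #include <semaphore>
    #include <thread>
    #include <vector>

    // At most 4 of the 10 worker threads can hold a token at any one time.
    std::counting_semaphore<4> tokens(4);

    void worker()
    {
      tokens.acquire();   // blocks while the counter is zero
      // ... at most 4 threads execute this region concurrently ...
      tokens.release();   // increments the counter, possibly unblocking a waiter
    }

    int main()
    {
      std::vector<std::jthread> pool;
      for (int i = 0; i < 10; ++i)
        pool.emplace_back(worker);
    }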

OK, so how to implement it?

If we have POSIX semaphores available, the following type will be defined:

    struct __platform_semaphore
    {
      using __clock_t = chrono::system_clock;
  #ifdef SEM_VALUE_MAX
      static constexpr ptrdiff_t _S_max = SEM_VALUE_MAX;
  #else
      static constexpr ptrdiff_t _S_max = _POSIX_SEM_VALUE_MAX;
  #endif

      explicit __platform_semaphore(ptrdiff_t __count) noexcept
      { sem_init(&_M_semaphore, 0, __count); }

      ~__platform_semaphore()
      { sem_destroy(&_M_semaphore); }

      // ...
    };

It also implements acquire/try_acquire/release:

    struct __platform_semaphore
    {
      // ...

      void
      _M_acquire() noexcept
      {
        for (;;)
        {
          auto __err = sem_wait(&_M_semaphore);
          if (__err && (errno == EINTR))
            continue;
          else if (__err)
            std::terminate();
          else
            break;
        }
      }

      _GLIBCXX_ALWAYS_INLINE bool
      _M_try_acquire() noexcept
      {
        for (;;)
        {
          auto __err = sem_trywait(&_M_semaphore);
          if (__err && (errno == EINTR))
            continue;
          else if (__err && (errno == EAGAIN))
            return false;
          else if (__err)
            std::terminate();
          else
            break;
        }
        return true;
      }

      _GLIBCXX_ALWAYS_INLINE void
      _M_release(std::ptrdiff_t __update) noexcept
      {
        for (; __update != 0; --__update)
        {
          auto __err = sem_post(&_M_semaphore);
          if (__err)
            std::terminate();
        }
      }
      // ...
    };

For the timed wait operations, we have to worry about what the "basis" clock is and how to convert the user-supplied clock. For POSIX semaphores, this is handled as follows:

    struct __platform_semaphore
    {
      using __clock_t = chrono::system_clock;
      // ...

      bool
      _M_try_acquire_until_impl(const chrono::time_point<__clock_t>& __atime)
        noexcept
      {

        auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
        auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);

        struct timespec __ts =
        {
          static_cast<std::time_t>(__s.time_since_epoch().count()),
          static_cast<long>(__ns.count())
        };

        for (;;)
        {
          if (auto __err = sem_timedwait(&_M_semaphore, &__ts))
            {
              if (errno == EINTR)
                continue;
              else if (errno == ETIMEDOUT || errno == EINVAL)
                return false;
              else
                std::terminate();
            }
          else
            break;
        }
        return true;
      }

      template<typename _Clock, typename _Duration>
        bool
        _M_try_acquire_until(const chrono::time_point<_Clock,
                             _Duration>& __atime) noexcept
        {
          if constexpr (std::is_same_v<__clock_t, _Clock>)
            {
              return _M_try_acquire_until_impl(__atime);
            }
          else
            {
              const typename _Clock::time_point __c_entry = _Clock::now();
              const auto __s_entry = __clock_t::now();
              const auto __delta = __atime - __c_entry;
              const auto __s_atime = __s_entry + __delta;
              if (_M_try_acquire_until_impl(__s_atime))
                return true;

              // We got a timeout when measured against __clock_t but
              // we need to check against the caller-supplied clock
              // to tell whether we should return a timeout.
              return (_Clock::now() < __atime);
            }
        }
    };
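
To see why that final re-check against the caller-supplied clock matters, consider a hypothetical caller that hands a steady_clock deadline to a counting_semaphore backed by __platform_semaphore:

    #include <chrono>
    #include <semaphore>

    std::counting_semaphore<8> sem(0);

    bool wait_half_second()
    {
      // steady_clock is not the POSIX semaphore's basis clock (system_clock),
      // so _M_try_acquire_until() measures the remaining time on the caller's
      // clock, re-expresses the deadline on system_clock for sem_timedwait(),
      // and, if that times out, consults the caller's clock again to decide
      // whether to report a timeout.
      auto deadline = std::chrono::steady_clock::now()
                      + std::chrono::milliseconds(500);
      return sem.try_acquire_until(deadline);
    }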

If we have support for atomic wait()/notify_one()/notify_all(), then the following type will be defined:

    struct __atomic_semaphore
    {
      static constexpr ptrdiff_t _S_max = __gnu_cxx::__int_traits<__detail::__platform_wait_t>::__max;
      explicit __atomic_semaphore(__detail::__platform_wait_t __count) noexcept
        : _M_counter(__count)
      {
        __glibcxx_assert(__count >= 0 && __count <= _S_max);
      }


      static _GLIBCXX_ALWAYS_INLINE bool
      _S_do_try_acquire(__detail::__platform_wait_t* __counter) noexcept
      {
        auto __old = __atomic_impl::load(__counter, memory_order::acquire);
        if (__old == 0)
          return false;

        return __atomic_impl::compare_exchange_strong(__counter,
                                                      __old, __old - 1,
                                                      memory_order::acquire,
                                                      memory_order::relaxed);
      }

      _GLIBCXX_ALWAYS_INLINE void
      _M_acquire() noexcept
      {
        auto const __pred = [this]
                            {return _S_do_try_acquire(&this->_M_counter);};
        std::__atomic_wait_address(&_M_counter, __pred);
      }

      bool
      _M_try_acquire() noexcept
      {
        auto const __pred = [this]
                            { return _S_do_try_acquire(&this->_M_counter); };
        return std::__detail::__atomic_spin(__pred);
      }

      // ...
    };

This expects an implementation of __atomic_wait_address that can accept a predicate; however, we have only defined __atomic_wait_address_v to this point (see part 1).

  template<typename _EntersWait>
    struct __waiter
    {
      // ...

      template<typename _Pred>
        void
        _M_do_wait(_Pred __pred) noexcept
        {
          do
           {
             __platform_wait_t __val;
             if (__base_type::_M_do_spin(__pred, __val))
               return;
             __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
           }
          while (!__pred());
        }
    };

    template<typename _Tp, typename _Pred>
      void
      __atomic_wait_address(const _Tp* __addr, _Pred __pred) noexcept
      {
        __detail::__enters_wait __w(__addr);
        __w._M_do_wait(__pred);
      }
 

There is a problem with this formulation: __enters_wait atomically increments and decrements a waiter count, which __atomic_semaphore does not need because its counter already tracks whether anyone may be waiting. In the previous article, we introduced the __enters_wait and __bare_wait types for use by waiters and notifiers, respectively. We now extend that notion to support "bare" waiters as well.

  // This call is to be used by atomic types which track contention externally
  template<typename _Pred>
    void
    __atomic_wait_address_bare(const __detail::__platform_wait_t* __addr,
                               _Pred __pred) noexcept
    {
#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
      do
        {
          __detail::__platform_wait_t __val;
          if (__detail::__bare_wait::_S_do_spin(__addr, __pred, __val))
            return;
          __detail::__platform_wait(__addr, __val);
        }
      while (!__pred());
#else // !_GLIBCXX_HAVE_PLATFORM_WAIT
      __detail::__bare_wait __w(__addr);
      __w._M_do_wait(__pred);
#endif
    }

We also need to introduce a corresponding "bare" notify, which skips checks for waiters:

    struct __waiter_pool
    {
      // ...

      void
      _M_notify(const __platform_wait_t* __addr, bool __all, bool __bare) noexcept
      {
        if (!(__bare || _M_waiting()))
          return;

#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
        __platform_notify(__addr, __all);
#else
        if (__all)
          _M_cv.notify_all();
        else
          _M_cv.notify_one();
#endif
      }
    };
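
For context, the release side of __atomic_semaphore is what drives this bare notification. Here is a simplified sketch (not the exact library code) of how _M_release can be written; it assumes a free function __atomic_notify_address_bare that forwards to the bare form of _M_notify shown above:

    struct __atomic_semaphore
    {
      // ...

      _GLIBCXX_ALWAYS_INLINE void
      _M_release(ptrdiff_t __update) noexcept
      {
        // Add the update with release ordering so that it "strongly happens
        // before" any try_acquire that observes the new value.
        if (0 < __atomic_impl::fetch_add(&_M_counter, __update,
                                         memory_order::release))
          return;   // the counter was already positive, so nothing can be blocked

        // The counter was zero, so threads may be blocked in acquire();
        // wake them via the bare notify, which skips the waiter check.
        std::__atomic_notify_address_bare(&_M_counter, /* __all = */ true);
      }

      // ...
    };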

Supporting timed waits on __atomic_semaphore

The C++ Standard has extensive support for durations and time points and specifies waiting operations in terms of those types:

  template<class Rep, class Period>
    bool try_acquire_for(const chrono::duration<Rep, Period>& rel_time);

  template<class Clock, class Duration>
    bool try_acquire_until(const chrono::time_point<Clock, Duration>& abs_time);

However, <chrono> is not supported in the freestanding (non-hosted) subset of the standard library, but <atomic> is. This means we can't introduce a dependency on <chrono> into <atomic> or its underlying bits/atomic_base.h header. For this reason, all timed waiting functionality is split out into bits/atomic_timed_wait.h. We preferentially use std::chrono::steady_clock as our basis clock on platforms that have futexes or pthread_cond_clockwait; otherwise, we fall back to system_clock and convert all user-supplied time points to that clock:

    #if defined _GLIBCXX_HAVE_LINUX_FUTEX || defined _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
      using __wait_clock_t = chrono::steady_clock;
    #else
      using __wait_clock_t = chrono::system_clock;
    #endif

    template<typename _Clock, typename _Dur>
      __wait_clock_t::time_point
      __to_wait_clock(const chrono::time_point<_Clock, _Dur>& __atime) noexcept
      {
        const typename _Clock::time_point __c_entry = _Clock::now();
        const __wait_clock_t::time_point __w_entry = __wait_clock_t::now();
        const auto __delta = __atime - __c_entry;
        using __w_dur = typename __wait_clock_t::duration;
        return __w_entry + chrono::ceil<__w_dur>(__delta);
      }

    template<typename _Dur>
      __wait_clock_t::time_point
      __to_wait_clock(const chrono::time_point<__wait_clock_t, _Dur>& __atime) noexcept
      {
        using __w_dur = typename __wait_clock_t::duration;
        return chrono::ceil<__w_dur>(__atime);
      }

If a platform supports an efficient __platform_wait(), it is also expected to provide an efficient timed wait version of the same called __platform_wait_until(). For Linux Futexes, we implement this as:

  // returns true if wait ended before timeout
  template<typename _Dur>
    bool
    __platform_wait_until_impl(const __platform_wait_t* __addr,
                               __platform_wait_t __old,
                               const chrono::time_point<__wait_clock_t, _Dur>& __atime) noexcept
    {
      auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
      auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);

      struct timespec __rt =
      {
        static_cast<std::time_t>(__s.time_since_epoch().count()),
        static_cast<long>(__ns.count())
      };

      auto __e = syscall(SYS_futex, __addr,
                         static_cast<int>(__futex_wait_flags::__wait_bitset_private),
                         __old, &__rt, nullptr,
                         static_cast<int>(__futex_wait_flags::__bitset_match_any));

      if (__e)
      {
        if (errno == ETIMEDOUT)
          return false;
        if ((errno != EINTR) && (errno != EAGAIN))
          __throw_system_error(errno);
      }
      return true;
    }

    // returns true if wait ended before timeout
    template<typename _Clock, typename _Dur>
      bool
      __platform_wait_until(const __platform_wait_t* __addr, __platform_wait_t __old,
      const chrono::time_point<_Clock, _Dur>& __atime)
      {
        if constexpr (is_same_v<__wait_clock_t, _Clock>)
          {
            return __platform_wait_until_impl(__addr, __old, __atime);
          }
        else
          {
            if (!__platform_wait_until_impl(__addr, __old,
                                            __to_wait_clock(__atime)))
              {
                // We got a timeout when measured against __clock_t but
                // we need to check against the caller-supplied clock
                // to tell whether we should return a timeout.
                return (_Clock::now() < __atime);
              }
            return true;
          }
      }

As with __platform_wait(), for platforms that do not provide __platform_wait_until(), we fall back to using a mutex and condition variable and defining the following support functions:

    // Returns true if wait ended before timeout.
    // _Clock must be either steady_clock or system_clock.
    template<typename _Clock, typename _Dur>
      bool
      __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
                             const chrono::time_point<_Clock, _Dur>& __atime)
      {
        static_assert(std::__is_one_of<_Clock,
        chrono::steady_clock,
        chrono::system_clock>::value);

        auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
        auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);

        __gthread_time_t __ts =
        {
          static_cast<std::time_t>(__s.time_since_epoch().count()),
          static_cast<long>(__ns.count())
        };

        // if we have a pthreads implementation that supports CLOCK_MONOTONIC
        // and the caller's clock is steady_clock, use that
    #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
        if constexpr (is_same_v<chrono::steady_clock, _Clock>)
          __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
        else
    #endif
        __cv.wait_until(__mx, __ts);
        return _Clock::now() < __atime;
      }

    // returns true if wait ended before timeout
    template<typename _Clock, typename _Dur>
      bool
      __cond_wait_until(__condvar& __cv, mutex& __mx,
      const chrono::time_point<_Clock, _Dur>& __atime)
      {
    #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
        if constexpr (is_same_v<_Clock, chrono::steady_clock>)
          return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
        else
    #endif
        if constexpr (is_same_v<_Clock, chrono::system_clock>)
          return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
        else
          {
            if (!__cond_wait_until_impl(__cv, __mx, __to_wait_clock(__atime)))
              {
                // We got a timeout when measured against __clock_t but
                // we need to check against the caller-supplied clock
                // to tell whether we should return a timeout.
                return (_Clock::now() < __atime);
              }
            return true;
            }
      }

The formulation of __waiter_pool from part 1 assumes that we are in the market for indefinitely blocking waits. However, in this case, we only care about timed waiting. The actual implementation in bits/atomic_wait.h is:

      struct __waiter_pool_base
      {
        // ...
        alignas(_S_align) __platform_wait_t _M_wait = 0;
      #ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
        mutex _M_mtx;
      #endif
        alignas(_S_align) __platform_wait_t _M_ver = 0;
      #ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
        __condvar _M_cv;
      #endif

        void
        _M_enter_wait() noexcept;

        void
        _M_leave_wait() noexcept;

        bool
        _M_waiting() const noexcept;

        void
        _M_notify(const __platform_wait_t* __addr, bool __all, bool __bare) noexcept;

        static __waiter_pool_base&
        _S_for(const void* __addr) noexcept;
      };

For indefinite atomic waits, the derived type __waiter_pool is used:

      struct __waiter_pool : __waiter_pool_base
      {
        void
        _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept;
      };

For timed atomic waits, the derived type __timed_waiter_pool is used:

      struct __timed_waiter_pool : __waiter_pool_base
      {
        // returns true if wait ended before timeout
        template<typename _Clock, typename _Dur>
          bool
          _M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old,
                           const chrono::time_point<_Clock, _Dur>& __atime)
          {
      #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
            return __platform_wait_until(__addr, __old, __atime);
      #else
            __platform_wait_t __val;
            __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
            if (__val == __old)
            {
              lock_guard __l(_M_mtx);
              return __cond_wait_until(_M_cv, _M_mtx, __atime);
            }
            else
              return true; // the value has already changed; no need to block
      #endif // _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
          }
      };

Similarly, __waiter is split into a __waiter_base type:

      template<typename _Tp>
        struct __waiter_base
        {
          using __waiter_type = _Tp;

          __waiter_type& _M_w;
          __platform_wait_t* _M_addr;

          template<typename _Up>
            static __platform_wait_t*
            _S_wait_addr(const _Up* __a, __platform_wait_t* __b);

          static __waiter_type&
          _S_for(const void* __addr) noexcept
          {
            static_assert(sizeof(__waiter_type) == sizeof(__waiter_pool_base));
            auto& res = __waiter_pool_base::_S_for(__addr);
            return reinterpret_cast<__waiter_type&>(res);
          }

          template<typename _Up>
          explicit __waiter_base(const _Up* __addr) noexcept
            : _M_w(_S_for(__addr))
            , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
          { }

          void
          _M_notify(bool __all, bool __bare = false);

          template<typename _Up, typename _ValFn, typename _Spin = __default_spin_policy>
            static bool
            _S_do_spin_v(__platform_wait_t* __addr, const _Up& __old, _ValFn __vfn,
                         __platform_wait_t& __val, _Spin __spin = _Spin{ });

          template<typename _Up, typename _ValFn, typename _Spin = __default_spin_policy>
            bool
            _M_do_spin_v(const _Up& __old, _ValFn __vfn, __platform_wait_t& __val,
                         _Spin __spin = _Spin{ });

          template<typename _Pred, typename _Spin = __default_spin_policy>
            static bool
            _S_do_spin(const __platform_wait_t* __addr, _Pred __pred,
                       __platform_wait_t& __val, _Spin __spin = _Spin{ });

          template<typename _Pred, typename _Spin = __default_spin_policy>
            bool
            _M_do_spin(_Pred __pred, __platform_wait_t& __val, _Spin __spin = _Spin{ });
        };

From which __waiter is derived:

      template<typename _EntersWait>
        struct __waiter : __waiter_base<__waiter_pool>
        {
          using __base_type = __waiter_base<__waiter_pool>;

          template<typename _Tp>
            __waiter(const _Tp* __addr) noexcept;

          // ...

          template<typename _Tp, typename _ValFn>
            void
            _M_do_wait_v(_Tp __old, _ValFn __vfn);

          template<typename _Pred>
            void
            _M_do_wait(_Pred __pred) noexcept;
        };

      using __enters_wait = __waiter<std::true_type>;
      using __bare_wait = __waiter<std::false_type>;

__timed_waiter is similarly derived:

      template<typename _EntersWait>
        struct __timed_waiter : __waiter_base<__timed_waiter_pool>
        {

          using __base_type = __waiter_base<__timed_waiter_pool>;

          template<typename _Tp>
             __timed_waiter(const _Tp* __addr) noexcept;

          // ...

          // returns true if wait ended before timeout
          template<typename _Tp, typename _ValFn, typename _Clock, typename _Dur>
            bool
            _M_do_wait_until_v(_Tp __old, _ValFn __vfn,
            const chrono::time_point<_Clock, _Dur>& __atime) noexcept;

          // returns true if wait ended before timeout
          template<typename _Pred,
                   typename _Clock, typename _Dur>
            bool
            _M_do_wait_until(_Pred __pred, __platform_wait_t __val,
            const chrono::time_point<_Clock, _Dur>& __atime) noexcept;

          // returns true if wait ended before timeout
          template<typename _Pred,
                   typename _Clock, typename _Dur>
            bool
            _M_do_wait_until(_Pred __pred,
                             const chrono::time_point<_Clock, _Dur>& __atime) noexcept;

          template<typename _Tp, typename _ValFn,
                   typename _Rep, typename _Period>
            bool
            _M_do_wait_for_v(_Tp __old, _ValFn __vfn,
                             const chrono::duration<_Rep, _Period>& __rtime) noexcept;

          template<typename _Pred,
                   typename _Rep, typename _Period>
            bool
            _M_do_wait_for(_Pred __pred,
                           const chrono::duration<_Rep, _Period>& __rtime) noexcept;
      };

      using __enters_timed_wait = __timed_waiter<std::true_type>;
      using __bare_timed_wait = __timed_waiter<std::false_type>;

As with __waiter, there are top-level __atomic_wait_address_until/for wrappers that __atomic_semaphore calls into, all of which follow the general form of:

      // returns true if wait ended before timeout
      template<typename _Tp, typename _ValFn,
               typename _Clock, typename _Dur>
        bool
        __atomic_wait_address_until_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
                                      const chrono::time_point<_Clock, _Dur>& __atime) noexcept
        {
          __detail::__enters_timed_wait __w{__addr};
          return __w._M_do_wait_until_v(__old, __vfn, __atime);
        }

      template<typename _Tp, typename _Pred,
               typename _Clock, typename _Dur>
        bool
        __atomic_wait_address_until(const _Tp* __addr, _Pred __pred,
                                    const chrono::time_point<_Clock, _Dur>& __atime) noexcept
        {
          __detail::__enters_timed_wait __w{__addr};
          return __w._M_do_wait_until(__pred, __atime);
        }

There are also versions for "bare" waits, plus _for_v/_for variants that wait for a supplied duration. The actual waiting entry points on __timed_waiter are implemented as follows:

      struct __timed_waiter
      {
        // ...

        template<typename _Tp, typename _ValFn,
                 typename _Clock, typename _Dur>
          bool
          _M_do_wait_until_v(_Tp __old, _ValFn __vfn,
                             const chrono::time_point<_Clock, _Dur>& __atime) noexcept
          {
            __platform_wait_t __val;
            if (_M_do_spin_v(__old, std::move(__vfn), __val,
                             __timed_backoff_spin_policy(__atime)))
              return true;
            return __base_type::_M_w._M_do_wait_until(__base_type::_M_addr, __val, __atime);
          }

        template<typename _Pred,
                 typename _Clock, typename _Dur>
          bool
          _M_do_wait_until(_Pred __pred, __platform_wait_t __val,
                           const chrono::time_point<_Clock, _Dur>& __atime) noexcept
          {
            for (auto __now = _Clock::now(); __now < __atime; __now = _Clock::now())
            {
              if (__base_type::_M_do_spin(__pred, __val,
                                          __timed_backoff_spin_policy(__atime, __now)))
                return true;

              if (__base_type::_M_w._M_do_wait_until(__base_type::_M_addr, __val, __atime)
                    && __pred())
                return true;
            }
            return false;
          }
      };

Aside from the usual differences between the _v and predicate forms of wait, the timed waits introduce a custom spin policy:

      struct __timed_backoff_spin_policy
      {
        __wait_clock_t::time_point _M_deadline;
        __wait_clock_t::time_point _M_t0;

        template<typename _Clock, typename _Dur>
           __timed_backoff_spin_policy(chrono::time_point<_Clock, _Dur> __deadline = _Clock::time_point::max(),
                                       chrono::time_point<_Clock, _Dur> __t0 = _Clock::now()) noexcept
          : _M_deadline(__to_wait_clock(__deadline))
          , _M_t0(__to_wait_clock(__t0))
        { }

        bool
        operator()() const noexcept
        {
          using namespace literals::chrono_literals;
          auto __now = __wait_clock_t::now();
          if (_M_deadline <= __now)
            return false;

          auto __elapsed = __now - _M_t0;
          if (__elapsed > 128ms)
          {
            this_thread::sleep_for(64ms);
          }
          else if (__elapsed > 64us)
          {
            this_thread::sleep_for(__elapsed / 2);
          }
          else if (__elapsed > 4us)
          {
            __thread_yield();
          }
          else
            return false;
          return true;
        }
      };

After the usual spin completes unsatisfied, this policy puts the current thread to sleep for progressively longer intervals, as long as the deadline hasn't been reached. The relative wait_for variants are implemented in terms of the absolute wait_until members:

    struct __timed_waiter
    {
      // ...

      template<typename _Tp, typename _ValFn,
               typename _Rep, typename _Period>
        bool
        _M_do_wait_for_v(_Tp __old, _ValFn __vfn,
                         const chrono::duration<_Rep, _Period>& __rtime) noexcept
        {
          __platform_wait_t __val;
          if (_M_do_spin_v(__old, std::move(__vfn), __val))
            return true;

          if (!__rtime.count())
            return false; // no rtime supplied, and spin did not acquire

          auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);

          return __base_type::_M_w._M_do_wait_until(__base_type::_M_addr,
                                                    __val,
                                                    chrono::steady_clock::now() + __reltime);
        }

        template<typename _Pred,
                 typename _Rep, typename _Period>
          bool
          _M_do_wait_for(_Pred __pred,
          const chrono::duration<_Rep, _Period>& __rtime) noexcept
          {
            __platform_wait_t __val;
            if (__base_type::_M_do_spin(__pred, __val))
              return true;

            if (!__rtime.count())
              return false; // no rtime supplied, and spin did not acquire

            auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);

            return _M_do_wait_until(__pred, __val, chrono::steady_clock::now() + __reltime);
          }
    };

The rest of __atomic_semaphore

With support for atomic timed waits in place, we can define the remaining members of __atomic_semaphore:

  struct __atomic_semaphore
  {
    // ...

    template<typename _Clock, typename _Duration>
      _GLIBCXX_ALWAYS_INLINE bool
      _M_try_acquire_until(const chrono::time_point<_Clock,
                           _Duration>& __atime) noexcept
      {
        auto const __pred =
        [this] { return _S_do_try_acquire(&this->_M_counter); };

        return __atomic_wait_address_until_bare(&_M_counter, __pred, __atime);
      }

      template<typename _Rep, typename _Period>
        _GLIBCXX_ALWAYS_INLINE bool
        _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime) noexcept
        {
          auto const __pred =
          [this] { return _S_do_try_acquire(&this->_M_counter); };

          return __atomic_wait_address_for_bare(&_M_counter, __pred, __rtime);
        }
    };

You might have observed that there are more __atomic_wait_address_for/until variations than are actually used by __atomic_semaphore. C++26 will likely add timed versions of wait() to atomic as well as a version that accepts a predicate. The underlying support for these operations is already present but not yet exposed via the interface of atomic, and the details are subject to change in a future version of GCC.

The rest of <semaphore>

The semaphore implementation to use is conditionally chosen based on the presence of the atomic wait feature-test macro:

      #if defined __cpp_lib_atomic_wait
      using __semaphore_impl = __atomic_semaphore;
      #elif _GLIBCXX_HAVE_POSIX_SEMAPHORE
      using __semaphore_impl = __platform_semaphore;
      #endif

We then implement std::counting_semaphore in terms of __semaphore_impl:

    template<ptrdiff_t __least_max_value = __semaphore_impl::_S_max>
      class counting_semaphore
      {
        static_assert(__least_max_value >= 0);
        static_assert(__least_max_value <= __semaphore_impl::_S_max);

        __semaphore_impl _M_sem;

        public:
        explicit counting_semaphore(ptrdiff_t __desired) noexcept
          : _M_sem(__desired)
          { }

        ~counting_semaphore() = default;

        counting_semaphore(const counting_semaphore&) = delete;
        counting_semaphore& operator=(const counting_semaphore&) = delete;

        static constexpr ptrdiff_t
        max() noexcept
        { return __least_max_value; }

        void
        release(ptrdiff_t __update = 1) noexcept(noexcept(_M_sem._M_release(1)))
        { _M_sem._M_release(__update); }

        void
        acquire() noexcept(noexcept(_M_sem._M_acquire()))
        { _M_sem._M_acquire(); }

        bool
        try_acquire() noexcept(noexcept(_M_sem._M_try_acquire()))
        { return _M_sem._M_try_acquire(); }

        template<typename _Rep, typename _Period>
          bool
          try_acquire_for(const std::chrono::duration<_Rep, _Period>& __rtime)
          { return _M_sem._M_try_acquire_for(__rtime); }

        template<typename _Clock, typename _Dur>
          bool
          try_acquire_until(const std::chrono::time_point<_Clock, _Dur>& __atime)
          { return _M_sem._M_try_acquire_until(__atime); }
      };

And std::binary_semaphore is just a type alias:

      using binary_semaphore = std::counting_semaphore<1>;
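
As a closing illustration (not part of the library), binary_semaphore works well as a lightweight one-shot signal between two threads:

      #include <semaphore>
      #include <thread>

      std::binary_semaphore ready(0);   // starts with no token
      int result;

      int main()
      {
        std::jthread producer([] {
          result = 42;       // publish the data...
          ready.release();   // ...then hand the single token to the consumer
        });

        ready.acquire();     // blocks until the producer calls release()
        // release() strongly happens before the acquire() above returning,
        // so reading result here is not a data race.
        return result == 42 ? 0 : 1;
      }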

Next time

With all of the waiting detail and semaphores out of the way, the next installment will look at <latch> and <barrier>.

Last updated: February 5, 2024