Chapter 2 Multithreading
2.1 Threads
std::thread has no copy operations. It accepts a callable as its work package, whose return value is ignored.

The creator of a std::thread should manage its lifecycle, i.e. it should invoke join() to wait for the thread to finish or detach() to detach itself from the thread. Before join() or detach() is called, the thread is joinable, and the destructor of a joinable thread calls std::terminate. One thing worth noting is that detached threads terminate together with the executable: when the main thread exits, all detached threads exit as well, even if their work packages haven't completed. Take the following as an example:
```cpp
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    std::thread t([] { std::cout << "hello" << std::endl; });
    t.detach();
    // if the next line stays commented out, "hello" may not be printed
    // std::this_thread::sleep_for(std::chrono::milliseconds(1));
    return 0;
}
```

std::thread's constructor is a variadic template, so if you want to pass an argument by reference, you need to wrap it in std::ref, even if the corresponding parameter of the callable is a reference.

We can use the swap() method to swap (in a move fashion) two threads, and std::thread::native_handle() to get the system-specific implementation handle underlying a std::thread.
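Back to the std::ref point: here is a small sketch (my own, not from the book) showing why it is needed when the work package takes its parameter by reference:

```cpp
#include <functional>  // std::ref
#include <iostream>
#include <thread>

void increment(int& counter) { ++counter; }

int main() {
    int counter = 0;
    // std::ref passes a reference into the thread; without it the argument
    // would be decay-copied and the call would not even compile.
    std::thread t(increment, std::ref(counter));
    t.join();
    std::cout << counter << std::endl;  // prints 1
    return 0;
}
```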
2.2 Shared Data
Insertion to and extraction from the global stream objects (like std::cin and std::cout) are thread-safe, although output statements can interleave. In other words, writing to std::cout is not a data race but a race condition (between output statements).

There are many kinds of mutex. The most basic one is std::mutex, which supports lock(), try_lock() and unlock(). Then there is std::recursive_mutex, which the same thread can lock multiple times and which stays locked until it has been unlocked as many times as it was locked. There are also std::timed_mutex and std::recursive_timed_mutex, which support try_lock_for() and try_lock_until(). std::shared_timed_mutex (since C++14) and std::shared_mutex (since C++17) additionally provide the shared family of methods (lock_shared(), try_lock_shared(), unlock_shared()), which can be used to implement a read-write lock (introduced later).

Cool, right? Since we have mutexes, we can write some code like this:
```cpp
std::mutex m;
m.lock();
sharedVariable = getVar();
m.unlock();
```

However, this is quite prone to deadlock because of getVar(): what if it throws an exception? What if it also acquires the mutex m? What if it is a library function that someday gets upgraded with code you know nothing about? So apparently, it's better to avoid calling functions while holding a lock.
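Before moving on to locks, here is a minimal sketch (my own) of the timed variants mentioned earlier; try_lock_for() simply gives up if the mutex cannot be acquired within the given duration:

```cpp
#include <chrono>
#include <iostream>
#include <mutex>

std::timed_mutex m;

void worker() {
    // Give up if the mutex cannot be acquired within 100 milliseconds.
    if (m.try_lock_for(std::chrono::milliseconds(100))) {
        /* critical section */
        m.unlock();
    } else {
        std::cout << "could not get the lock in time" << std::endl;
    }
}
```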
To solve these deadlocks, we can use locks: std::lock_guard, std::unique_lock, std::shared_lock (since C++14) and std::scoped_lock (since C++17).

First let's look at std::lock_guard. Maybe you've heard about RAII. Yep, that's the mechanism std::lock_guard uses to solve the deadlock which happens when you forget to release the lock (maybe because an exception is thrown):

```cpp
{
    std::mutex m;
    std::lock_guard<std::mutex> lockGuard(m);
    /* critical section */
}   // lockGuard's destructor releases the mutex here
```
Then there is std::unique_lock, which is stronger but more expensive than std::lock_guard. For example, it enables you to create the lock without locking the mutex immediately, to lock a mutex recursively, and so on.

One thing worth noting is that we can use std::lock(), which is a variadic template, to lock multiple mutexes in one atomic step:

```cpp
std::mutex a, b;
std::unique_lock<std::mutex> guard1(a, std::defer_lock);  // associate the mutex without locking it
std::unique_lock<std::mutex> guard2(b, std::defer_lock);
std::lock(guard1, guard2);  // locks both mutexes in one atomic step
```
Here comes std::shared_lock, which behaves like std::unique_lock, except that it is used with a std::shared_mutex or std::shared_timed_mutex (introduced before). It can be used to implement a read-write lock. To be more precise, std::lock_guard<std::shared_mutex> or std::unique_lock<std::shared_mutex> serves as the write lock, while std::shared_lock<std::shared_mutex> serves as the read lock. This works because std::shared_mutex supports both the exclusive lock() family and the lock_shared() family of methods, which are invoked by std::unique_lock and std::shared_lock respectively.
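A minimal sketch of such a read-write lock (my own example, assuming C++17 for std::shared_mutex; the map and function names are made up for illustration):

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

std::map<std::string, int> table;
std::shared_mutex tableMutex;

int readEntry(const std::string& key) {
    // Many readers may hold the shared (read) lock at the same time.
    std::shared_lock<std::shared_mutex> readLock(tableMutex);
    auto it = table.find(key);
    return it == table.end() ? -1 : it->second;
}

void writeEntry(const std::string& key, int value) {
    // A writer needs exclusive ownership.
    std::unique_lock<std::shared_mutex> writeLock(tableMutex);
    table[key] = value;
}
```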
Finally there is std::scoped_lock. Still remember the std::lock() function? Yep, they are very similar. Actually, std::scoped_lock's constructor is a variadic template, which 1) behaves like a std::lock_guard when there is just one mutex argument, and 2) invokes std::lock() when there are multiple mutex arguments. In other words, std::scoped_lock can lock many mutexes in one atomic step.

Sometimes we need to ensure that objects are initialized in a thread-safe way (imagine the singleton design pattern). Typically there are three ways to do that (OK, if you count initializing the objects in the main thread before any child thread is created, there are four).
The first is to use constexpr to initialize objects as constant expressions at compile time. Note that an object can be declared constexpr only if its class satisfies some restrictions: for example, it must not have virtual base classes or virtual methods; its constructor must have an empty body (apart from the initializer list) and be a constant expression; its base classes and non-static members must all be initialized (in the initializer list); and so on.
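As a small illustration (my own), a class satisfying those restrictions can produce objects at compile time, so no thread can ever observe them half-initialized:

```cpp
class Point {
public:
    constexpr Point(double x, double y) : x_(x), y_(y) {}
    constexpr double x() const { return x_; }
    constexpr double y() const { return y_; }
private:
    double x_;
    double y_;
};

// Evaluated at compile time, hence thread-safe by construction.
constexpr Point origin(0.0, 0.0);
```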
The second is to use std::call_once and std::once_flag. The semantics are easy to understand: std::call_once is a function that accepts two parameters, a std::once_flag and a callable. We can invoke std::call_once many times with the same std::once_flag, and exactly one of the callables will be executed exactly once. We can use this to implement a singleton:
```cpp
#include <mutex>  // std::once_flag, std::call_once

class MySingleton {
private:
    static std::once_flag initInstanceFlag;
    static MySingleton* instance;
    MySingleton() = default;
    ~MySingleton() = default;
public:
    MySingleton(const MySingleton&) = delete;
    MySingleton& operator=(const MySingleton&) = delete;
    static MySingleton* getInstance() {
        std::call_once(initInstanceFlag, MySingleton::initSingleton);
        return instance;
    }
    static void initSingleton() {
        instance = new MySingleton();
    }
};

MySingleton* MySingleton::instance = nullptr;
std::once_flag MySingleton::initInstanceFlag;
```

The third is static variables with block scope. Such static variables are created exactly once and lazily, which means they aren't created until first used. And since C++11 there is another guarantee: static variables with block scope are created in a thread-safe way (provided the compiler implements this guarantee correctly). So we can write a singleton class like this:
```cpp
class MySingleton {
public:
    static MySingleton& getInstance() {
        static MySingleton instance;  // created lazily and, since C++11, in a thread-safe way
        return instance;
    }
private:
    MySingleton() = default;
    ~MySingleton() = default;
    MySingleton(const MySingleton&) = delete;
    MySingleton& operator=(const MySingleton&) = delete;
};
```
2.3 Thread-Local Data
Actually I've never heard about the thread_local keyword in C++ before. This keyword acts like static:

If it qualifies a variable at namespace scope or a static class member, the variable will be created before its first usage:
```cpp
class A {
public:
    thread_local static int x;
};
thread_local int A::x;
```

If it qualifies a variable in a function, the variable will be created at its first usage:
```cpp
void f() {
    thread_local int a;
}
```
The difference between thread_local and static is that a variable qualified with the former has its lifetime bound to the thread that created it (each thread gets its own copy), while one qualified with the latter has its lifetime bound to the whole program, i.e. to the main thread.
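A small sketch (my own) that makes the per-thread copies visible:

```cpp
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

thread_local std::string greeting = "hello from ";
std::mutex coutMutex;

void sayHello(const std::string& name) {
    greeting += name;  // modifies this thread's own copy only
    std::lock_guard<std::mutex> guard(coutMutex);
    std::cout << greeting << std::endl;
}

int main() {
    std::thread t1(sayHello, "t1");
    std::thread t2(sayHello, "t2");
    t1.join();
    t2.join();
    // prints "hello from t1" and "hello from t2" (in some order)
    return 0;
}
```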
2.4 Condition Variables
std::condition_variable is literally a condition variable; it provides methods like notify_one(), notify_all(), wait() and so on.

The wait() method usually accepts two parameters: the first one is a std::unique_lock and the second one is a callable called the predicate. Let's take a closer look.

Why does the lock need to be a std::unique_lock instead of a std::lock_guard? Be aware that when wait() is invoked, the lock is released; in fact, as we will see below, the lock is acquired and released repeatedly, so we need a std::unique_lock instead of a one-shot std::lock_guard.

Then what's the role of the predicate? When talking about condition variables, we should be aware of two phenomena: the lost wakeup and the spurious wakeup. A lost wakeup means the notification arrives before the wait starts, while a spurious wakeup means the waiting thread wakes up even though no notification was sent. The predicate solves both problems:
```cpp
std::unique_lock<std::mutex> lck(mutex_);
condVar.wait(lck, []{ return dataReady; });
// equivalent to
std::unique_lock<std::mutex> lck(mutex_);
while ( ![]{ return dataReady; }() ) {
    condVar.wait(lck);
}
```
The dataReady in the above example is a flag used to synchronize the notification. It doesn't need to be atomic, but it must be protected by a mutex (we can use a std::lock_guard here):

```cpp
{
    std::lock_guard<std::mutex> lck(mutex_);
    dataReady = true;
}
condVar.notify_one();
```

If it is not protected by a mutex, the modification of dataReady and the notification may happen right after the predicate check but before the condition variable starts waiting, which causes the waiting thread to block forever.
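Putting the pieces together, a minimal sender/waiter sketch (reusing the names from the snippets above) could look like this:

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mutex_;
std::condition_variable condVar;
bool dataReady = false;

void waitingForWork() {
    std::unique_lock<std::mutex> lck(mutex_);
    condVar.wait(lck, [] { return dataReady; });  // handles lost and spurious wakeups
    std::cout << "running the work" << std::endl;
}

void setDataReady() {
    {
        std::lock_guard<std::mutex> lck(mutex_);
        dataReady = true;
    }
    condVar.notify_one();
}

int main() {
    std::thread t1(waitingForWork);
    std::thread t2(setDataReady);
    t1.join();
    t2.join();
    return 0;
}
```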
2.5 Tasks
A task is also a mechanism to perform a work package asynchronously. Different from threads, a task does not necessarily run in another thread. The workflow of a task is to perform the work package and fulfil a promise, whose result can be obtained through a future.

std::async is a simple way to create a task, and its return value is the future of the task. Besides the work package, we can pass in a launch policy when invoking std::async, which can be std::launch::deferred for lazy evaluation or std::launch::async for eager evaluation.

Also, it's not necessary to assign the return value of std::async to a variable. In other words, we can just invoke it and discard its return value, in which case the future is called a fire-and-forget future:

```cpp
std::async(std::launch::async, []{ std::cout << "fire and forget" << std::endl; });
```

Note that we need std::launch::async to make sure evaluation is eager, because we have no future to wait on.

However, there is an inconspicuous drawback here: a future blocks in its destructor until its promise is done. Fire-and-forget futures are temporaries, whose destructor is invoked immediately after the
std::async call that created them. So the asynchrony is actually, umm, a fake one:

```cpp
std::async(std::launch::async, [] {
    std::this_thread::sleep_for(std::chrono::seconds(5));
    std::cout << "first thread" << std::endl;
});  // the temporary future blocks here until its promise is done
std::async(std::launch::async, [] {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    // gets printed after 6 seconds instead of 1
    std::cout << "second thread" << std::endl;
});
```

std::packaged_task is another way to create a task; it is not executed immediately. Its usage typically consists of four steps:
```cpp
// 1. create the task
std::packaged_task<int(int, int)> sumTask([](int a, int b){ return a + b; });
// 2. assign to a future
std::future<int> sumResult = sumTask.get_future();
// 3. do the execution
sumTask(2000, 11);
// 4. wait on the future
sumResult.get();
```

To my understanding, std::async combines the first three steps into a single call. Note that arguments for the callable can still be supplied: they are passed as trailing arguments to std::async itself.

If we want to execute the task and wait on the future multiple times, we need to invoke the reset() method of std::packaged_task.

std::promise can set not only a value but also an exception, via its set_exception() method. In that case, the corresponding future will encounter the exception when invoking its get() method.

std::future offers valid() to check whether a shared state is available, and wait_for() or wait_until() to wait with a timeout. The latter two return a std::future_status, which is a scoped enum with the enumerators deferred, ready and timeout (deferred means the shared state contains a deferred task, i.e. one launched with std::launch::deferred that has not started yet).
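For example (my own sketch), a deferred task keeps reporting std::future_status::deferred from wait_for() until get() or wait() forces its evaluation:

```cpp
#include <chrono>
#include <future>
#include <iostream>

int main() {
    auto fut = std::async(std::launch::deferred, [] { return 42; });
    // The task has not started yet, so wait_for() reports deferred.
    if (fut.wait_for(std::chrono::seconds(0)) == std::future_status::deferred) {
        std::cout << "still deferred" << std::endl;
    }
    std::cout << fut.get() << std::endl;  // forces evaluation, prints 42
    return 0;
}
```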
Different from std::future, std::shared_future is copyable and can be queried multiple times.

We have two ways to get a std::shared_future: by initializing it directly from the get_future() method of std::promise, or by calling the share() method of a std::future. Note that after the invocation of share(), the valid() method of the original std::future returns false.
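A small sketch (my own) where two threads query the same result through copies of a std::shared_future:

```cpp
#include <future>
#include <iostream>
#include <thread>

int main() {
    std::promise<int> prom;
    std::shared_future<int> sharedFut = prom.get_future();  // first way: initialize directly
    auto consumer = [sharedFut] { std::cout << sharedFut.get() << std::endl; };
    std::thread t1(consumer);
    std::thread t2(consumer);
    prom.set_value(2011);  // both copies see the same value (output may interleave)
    t1.join();
    t2.join();
    return 0;
}
```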
valid()method ofstd::futureorstd::shared_futureindicates whether an available shared state exists. In another word, if it returns true,wait()method can be called without exception; if it returns false,wait()will result in an exception.For initialized
std::future, before the firstget(),wait()orshare(), thevalid()will return true; while after that,valid()shall return false. And for initializedstd::shared_future,valid()shall always return true, which means you can always query on astd::shared_future.If the callable used in
std::asyncandstd::packaged_taskthrows an exception, it will be stored in the shared state (just like whatset_exception()method ofstd::promisedoes), and rethrown when queried by future. One thing worth noting is thatstd::current_exception()can be used to get the caught exception in the catch block.voidas the template argument,std::promiseandstd::futurecould be used for notification and synchronization. Compared to condition variables, the task-based notification mechanism could not perform synchronization multiple times (sincestd::promisecould only set its value once andstd::futurecould only query once) but needn’t a shared variable or mutex and isn’t prone to lost wakeup or spurious wakeup.So the conclusion is that if multiple synchronization is not needed, task-based notification mechanism is preferred.