|   |   |  |   |   | 
Store executable instructions. i.e., functions and member functions.
Global and static variables.
Temporary variables.
Dynamic memory allocation. Notice that member variables could be both in stack or heap. It depends on how the object is instantiated.
|  |  |  |
| ——– | ——- | ——- |
| const_cast | CPP Reference |  |
| std::distance | CPP Reference | CPlusPlus.com |
| std::remove_reference | CPP Reference | CPlusPlus.com |
| std::forward | CPP Reference | CPlusPlus.com |
extern "C" {} to disable  name mangling.About intialize list: a class’s member variables will initialize before the constructor is excuted unless you write initializer list in the constructor. Also notice that the order the member variables initialize is not influenced by the order you write the initializer list.
In a class definition, do not write initializer for static members, unless static constexpr.
template<typename T> void foo(const T& x) {
  static int v=1; cout << v++ << endl;
}
int main() {
  foo(1); foo(1.1); // 1 1
}
extern in header file included by all .c files that need it. static is within the compilation unit so every .cpp files that include the header file has its own version of this variable.
    extern int GlobalState; and in one cpp file, write int GlobalState = 1;.// static int a = 1; // don't
int GetA() {
static int a = 1;
return a;
}
class Singleton {
  public:
    static Singleton& Instance() {
      static Singleton instance;
      return instance;
    }
    Singleton(const Singleton&) = delete;
    Singleton(Singleton&&) = delete;
    Singleton& operator=(const Singleton&) = delete;
    Singleton& operator=(Singleton&&) = delete;
  private:
    Singleton() = default;
    ~Singleton() = default;
};
=0 can have definition.=0 asks all the children override this function.child->base::foo()reinterpret_cast for non-pointer/reference: unsigned int j = 0xffffffffu; int jj = reinterpret_cast<int&>(j);.distance(first, second) returns the number of increments for first to get to second. For example, distance(container.begin(), container.end()) returns the contianer’s size.const_cast allows you to modify a constant pointer or reference. Noticed that it won’t work for constant variables.
    int i = 3;
const int& rci = i;
const_cast<int&>(rci) = 4;
const int i = 5;
const int* p = &i;
*const_cast<int*>(p) = 6; // won't work
You can use const_cast to implement T const& operator[](const size_t&) const with T& operator[](const size_t&):
T const& operator[] (const size_t& idx) const {
  return const_cast<T&>(*this)[idx];
}
  while(y) {
    // Carry now contains common set bits of x and y
    carry = x & y ;
    // Sum of bits of x and y where at least one of the bits is not set
    x = x ^ y; 
    // Carry is shifted by one so that adding it to x gives the required sum
    y = carry << 1; }
  return x;
bool hasDifferentSign(int m, int n) {
  return (m ^ n) < 0;
}
int getLowestBit(int n) {
  return n & (-n);
}
bool isPowerOfTwo(int n) {
  return (n & (n - 1)) == 0;
}
bool swap(int& a, int& b) {
  a = a ^ b;
  b = a ^ b;
  a = a ^ b;
}
std::remove_reference: make sure the parameter is value, not reference.
    template<typename T>
void foo(typename remove_reference<T>::type param) {}
And the implementation of std::move needs it:
template<typename T>
typename std::remove_reference<T>::type&& move(T&& t) {
  return static_cast<typename std::remove_reference<T>::type&&>(t);
}
Sample implementation:
template <class T>
struct remove_reference {
  typedef T type;
};
template <class T>
struct remove_reference<T&> {
  typedef T type;
};
template <class T>
struct remove_reference<T&&> {
  typedef T type;
};
std::forward: Perfect forwarding is there to ensure that the argument provided to a function is forwarded to another function (or used within the function) with the same value category (basically r-value vs. l-value) as originally provided.
Use it when forwarding parameters to another funcion inside a funtion that has universal reference because rvalue reference is lvalue.
Sample Implementation (think about Reference Collapsing):
template <typename T>
T&& forward(typename std::remove_reference<T>::type& param) {
  return static_cast<T&&>(param);
}
template <typename T>
T&& forward(typename std::remove_reference<T>::type&& param) {
  return static_cast<T&&>(param);
}
template<template<typename, typename> class Container, typename T, typename TT>
void foo(Container<T, TT>& p) {
  std::cout << typeid(T).name() << " " << typeid(TT).name() << "\n";
}
template<size_t N>
size_t getCStrLength(const char(&str)[N]) {
  return N;
}
int i = 0xffffffff;
unsigned int u = 0xffffffff;
cout << i << "\n";          // -1
cout << u << "\n";          // 4294967295
cout << 0xffffffff << "\n"; // 4294967295
+:
    foo foo::operator+ (const foo& c) const
{
  foo result;
  // ...
  return result;
}
+=:
    foo& foo::operator+= (const foo& c)
{
  //...
  return *this;
}
<< and >>:
    std::ostream& operator<<(std::ostream& os, const T& obj)
{
    // write obj to stream
    return os;
}
std::istream& operator>>(std::istream& is, T& obj)
{
    // read obj from stream
    if (/* T could not be constructed */)
        is.setstate(std::ios::failbit);
    return is;
}
TODO
std::thread can run a function on a new thread.
    #include <thread>
void foo() {}
int main() {
  std::thread worker(foo);
  // do other things...
  worker.join(); // wait for it to complete
}
join() / detach() two times will crash the program.thread’s destructor before join() or detach() will crash the program.main() returns, all the detached threads will be suspended by the OS.std::scoped_lock (C++ 17) or std::lock_guard (C++ 11) to lock a std::mutex so that you don’t need to release the lock manually.std::atomic<T>:
    std::atomic<bool> ready (false);
void foo() {
  while (!ready) std::this_thread::yield(); // use yiled when waiting
  // do something
}
constexpr int WORKER_NUM = 32;
std::counting_semaphore<WORKER_NUM> signal(0);
void Scheduler() {
  signal.release(WORKER_NUM);
}
void Worker() {
  signal.acquire();
}
std::mutex m;
std::condition_variable cv;
bool ready = false;
void Scheduler() {
  {
    std::lock_guard lk(m);
  }
}
void Worker() {
  std::unique_lock lk(m);
  cv.wait(lk, []{ return ready; });
  //...
  lk.unlock();
}
compare_exchange_weak vs. compare_exchange_strong: compare_exchange_weak may not exchange the desired value and return false even if the comparing value equals to the memory. compare_exchange_strong will guarentee the successfulness.T old = *address, assume;
do {
  assume = old;
  old = CAS(address, old, old+1);
} while (old != assume);
int lock = 0;
// ...
while (CAS(&lock, 0, 1) == 1) // try to lock it
  continue;
// ...
lock = 0;
template<typename T>
bool CircularQueue::Pop(T& val) {
  int head;
  do {
    head = _head.load();
    // if (head == _tail.load())
    if (head == _write.load()) // avoid reading the memory that hasn't been written
      return false;
    val = _buffer[head];
  } while (!_head.compare_exchange_strong(head, (head+1) % _capacity))
  return true;
}
template<typename T>
bool CircularQueue::Push(T& val) {
  int tail;
  do { // move the tail pointer
    tail = _tail.load();
    if ((tail+1) % _capacity == _head.load())
      return false;
  } while (!_tail.compare_exchange_strong(tail, (tail+1) % _capacity))
  _buffer[tail] = val;
  while (!_write.compare_exchange_strong(tail, (tail+1) % _capacity)) // move the write pointer after writing
  return true;
}
TODO
int x = 0, y = 0;
// thread 1
x = 1;
r1 = y;
cout << r1 << "\n";
// thread 2
y = 1;
r2 = x;
cout << r2 << "\n";
The results might all be zero because the optimization of the compiler, or different CPU core having different caches. To synchronize the order of memory RW, we can use mutex. We can also use memory barriers (faster than mutex).
Memory barriers will guarantee that the order of RW memories will be the same as the order of the source codes. E.g., thread 1 may execute r1 = y before x = 1 because the optimization by the compiler. Using memory barriers can force the execution order.
int x = 0, y = 0;
// thread 1
x = 1;
STORELOAD_FENCE();
r1 = y;
cout << r1 << "\n";
// thread 2
y = 1;
STORELOAD_FENCE();
r2 = x;
cout << r2 << "\n";
In this case the results might be 1 1, 0 1, 1 0 but never 0 0!
Load-Load
The loads before the barrier will happen before the loads after the barrier.
Store-Store
The stores before the barrier will happen before the stores after the barrier. Think of git push.
Load-Store
Store-Load
Every memory R/W after Acquire stays after Acquire.

Every memory R/W before Release stays before Release.

std::atomic<bool> x { false };
// in thread, write
x.store(true, std::memory_order_deq_cst);
// in thread, read
bool y = x.load(std::memory_order_deq_cst);
Sequence Consistent guarantees the memory RW order is the same as source codes (memory barriers).
It also guarantees if at a certain time t, a thread R/W the atomic variable, all the threads after t will see the results after this R/W. E.g., if thread 1 writes the variable with 1 at time t, then all the threads at t+1 loading the variable will get 1. (think of how it might not be 1 on a multi core CPU)
std::memory_order_relaxed. Used in shared_ptr
Relaxed guarantees the R/W is atomic. It doesn’t guarantee the order of executions of R/W is the same as the source codes. (no memory barriers)
See Memory Barriers.
```std::memory_order_release```
All the memory R/W before Release will be seen by Acquire.

### Consume
```std::memory_order_consume```.
Consume is weaker than Acquire. It won't carry dependancies from memories that not related to Release.

## Memory
- Packing (padding is the size of struct's largest element): 
  ```cpp
  struct A { // size 16
    int a;
    double e;
  };
  struct B { // size 12
    char a; char b; char c;
    int d;
    char e;
  };
  struct C { // size 24.
    // padding is not sizeof(A) but still sizeof(int). nested structures doesn't change padding
    // but sizeof(aa) is still A's size.
    int a;  // 8
    A aa;   // 16
  };
  struct D { // size 24.
    // padding is not sizeof(A) but still sizeof(int). nested structures doesn't change padding
    // but sizeof(aa) is still A's size.
    int a; // 8
    A aa;  // 16
    int b; //8
  };
To disable packing, #pragma pack(1). Without packing, CPU deoes more readings.
class A { // size 1
  char a;
}
class B { // size 16
  char a;
  virtual void foo() {}
}
| realloc | CPP Reference | CPlusPlus.com | 
| calloc | CPP Reference | CPlusPlus.com | 
| std::allocator | CPP Reference | CPlusPlus.com | 
realloc tries to expand or compress the exisiting memories. If there is not enough memory, the old memory block is not freed and null pointer is returned.
calloc takes two arguments and initializes the allocated memory block to zero: int *arr = (int*)calloc(n, sizeof(int));
    calloc is slower than malloc cause it will initialize the memory to 0.char* buffer = new char[sizeof(MyClass)]; MyClass* obj = new (buffer) MyClass();allocator is a wrapper for malloc and free.
    MyClass* buffer = std::allocator.allocate(5);
for (int i = 0; i < 5; ++i) {
  allocator.construct(&buffer[i], MyClass());
}
for (int i = 0; i < 5; ++i) {
  allocator.destroy(&buffer[i]);
}
allocator.deallocate(buffer, 5);
reduce(container.begin(), container.end).
    sum(container).auto multiply = [](int x, int y) { return x * y; }; int product = reduce(container.begin(), container.end(), 1, multiply);
    
  max_element(container.begin(), container.end)
    
  lower_bound, upper_bound or equal_range. The container must be sorted. lower_bound and upper_bound return the iterator. equal_range returns a pair of iterators. lower_bound returns the first element that not less than the query and upper_bound returns the first element that greater than the query.
    
  reverse(container.begin(), container.end()).
    
  partition(firstIt, secondIt, unaryPredicate) can divide the container into two groupds: those for which the predicate returns true and those for which it returns false. It returns the iterator to the first element of the second group. An good example of partition is Quick Sort.
    
  make_heap(container.begin(), container.end()). Directly constructs a heap. The first element is the top.
push_heap(container.begin(), container.end()). Only effective when the container is already a heap except for its last element. You want to insert a new element into the tail of the container and call push_heap.
pop_heap(container.begin(), container.end()). Only effective when the container is already a heap. The top will be swapped to the last element. [begin, end) is still a heap.void prim(vector<vector<ii>> const& adj, int s) {
  priority_queue<ii, vector<ii>, greater<ii>> pq;
  parents.resize(N);
  keys.resize(N, INT_MAX);
  keys[s] = 0;
  pq.emplace(0, s);
  while(!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if(d > keys[u])
      continue; //lazy deletion
    for(auto &[v,w]: adj[u]) {  
      if(w < keys[v]) { // note: difference here
        keys[v] = w;
        parents[v] = u;
        pq.emplace(keys[v], v);
      }
    }
  }
}
vector<int> sortArray(vector<int>& nums) {
  quickSort(nums, 0, nums.size()-1);
  return nums;
}
void quickSort(vector<int>& nums, int left, int right)
{
  if (left >= right) return;
  if (left < 0) return;
  if (right >= nums.size()) return;
  int pivot = nums[(left+right) / 2];
  swap(nums[left], nums[(left+right) / 2]);
  int i = left;
  int j = right;
  while (true)
  {
    while (nums[j] >= pivot && j > i) j--;
    if (j == i) break;
    while (nums[i] <= pivot && j > i) i++;
    if (j == i) break;
    swap(nums[i], nums[j]);
  }
  swap(nums[left], nums[i]);
  quickSort(nums, left, i-1);
  quickSort(nums, i+1, right);
}