Internals Local data

Raymond Chen edited this page Feb 7, 2020 · 1 revision

Process-local data and thread-local data are an internal feature of WIL, not for public consumption.

These types are in the wil::details_abi namespace, rather than the wil::details namespace because they are shared across modules within a process and are therefore held to a much higher stability requirement than WIL itself. Multiple versions of WIL must be able to coexist within a single process.

Do not use these types directly. They are internal implementation details of WIL. The types are documented here to facilitate debugging.

SemaphoreValue class

The SemaphoreValue class permits multiple components to share a 31-bit integer, 62-bit integer, or pointer. The shared value is identified by a string name. For cross-process sharing within a session, the mutex name is some agreed-upon string. For per-process sharing, the mutex name needs to incorporate the process ID to avoid collisions.

In practice, the SemaphoreValue is used for sharing within a process.

Once the shared value is set, it cannot be changed until the shared value is destroyed.

When the last valid SemaphoreValue for a particular name is destructed, the shared value is destroyed.

The SemaphoreValue object must be used in conjunction with a shared mutex (the "serialization mutex") that is held by all clients during any read or write operations. Each semaphore name typically has its own serialization mutex.

The SemaphoreValue class is default-constructible and movable. It is not copyable.

A newly-constructed SemaphoreValue is in the "empty" state.

The SemaphoreValue object can be used from multiple threads, but not from more than one thread simultaneously.

Usage pattern

The (simplified) intended usage pattern is something like this:

SemaphoreValue g_publishedValue;

GetOrCreateThing()
{
    auto mutexName = L"some-well-known-name";
    auto valueName = L"another-well-known-name";

    unique_mutex_nothrow mutex;
    mutex.reset(::CreateMutexExW(nullptr, mutexName, 0, MUTEX_ALL_ACCESS));

    auto lock = mutex.acquire();

    // Must hold the lock when calling TryGetXxx or CreateFromXxx

    RETURN_IF_FAILED(SemaphoreValue::TryGetXxx(valueName, ...));
    if (value already exists)
    {
        // The thing already existed, so return it.
        return value;
    }

    // No thing exists yet, so make one.
    value = make_a_thing();

    // Try to publish the thing.
    SemaphoreValue semaphoreValue;
    RETURN_IF_FAILED(semaphoreValue.CreateFromXxx(valueName, value));

    // Save the SemaphoreValue to keep the published value alive.
    g_publishedValue = std::move(semaphoreValue);

    return value;
}

CleanUpThing()
{
    auto mutexName = L"some-well-known-name";
    auto valueName = L"another-well-known-name";

    unique_mutex_nothrow mutex;
    mutex.reset(::CreateMutexExW(nullptr, mutexName, 0, MUTEX_ALL_ACCESS));

    auto lock = mutex.acquire();

    // Must hold the lock when calling TryGetXxx or CreateFromXxx

    RETURN_IF_FAILED(SemaphoreValue::TryGetXxx(valueName, ...));
    if (value already exists)
    {
        DestroyValue(value); // clean up any resources associated with the value
        g_publishedValue.Destroy(); // unpublish it, if we were the publisher
    }
}

This pattern is simplified because there is a race condition where the value is destroyed by CleanUpThing while another thread is still using it. In practice, the value is either an integer (no destruction required) or a reference-counted pointer (in which case return value; would also bump the reference count before returning it).

Design

The SemaphoreValue class uses a named semaphore to hold a 31-bit integer by recording the integer in the semaphore's token count. For 62-bit integers, two semaphores are used, one for the low-order bits and one for the high-order bits.

Since operations on the semaphore mutate the token count, the caller must ensure that only one caller is accessing the semaphore at a time. This is typically done with yet another mutex.
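To see why serialization is needed, here is a minimal sketch of reading a value out of a token count. It uses a hypothetical in-memory ToySemaphore rather than a real Win32 semaphore, and it assumes the value is read by taking one token and then releasing it (a release reports the previous count, as ReleaseSemaphore does). Whether WIL reads the count in exactly this way is an assumption of the sketch.

```cpp
#include <cstdint>

// Toy stand-in for a counting semaphore (hypothetical; not a Win32 object).
class ToySemaphore
{
public:
    explicit ToySemaphore(std::uint32_t initialCount) : m_count(initialCount) {}

    // Take one token if one is available, like a wait with a zero timeout.
    bool TryAcquire()
    {
        if (m_count == 0)
        {
            return false;
        }
        --m_count;
        return true;
    }

    // Return one token and report the count as it was before the release.
    std::uint32_t Release()
    {
        return m_count++;
    }

private:
    std::uint32_t m_count;
};

// Read the stored value by briefly disturbing the token count.
inline std::uint32_t ReadStoredValue(ToySemaphore& semaphore)
{
    if (!semaphore.TryAcquire())
    {
        return 0; // no tokens at all: the stored value is zero
    }
    // The count before the release was (value - 1), so add the token back.
    return semaphore.Release() + 1;
}
```

Between TryAcquire and Release the count is off by one, so a second reader arriving at that moment would see the wrong value. That is the window the serialization mutex closes.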

__WI_SEMAHPORE_VERSION

__WI_SEMAHPORE_VERSION is a macro which defines a suffix applied to the semaphore name. The version number changes whenever there is a breaking change to the semaphore storage mechanism. So far, there have been no breaking changes, so the macro is at its original value of "_p0".

In the case of a 62-bit integer, the upper 31 bits are stored in a semaphore with the letter h appended to its name.

Sorry about the typo in the word SEMAHPORE.
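As an illustration of the 62-bit case, the following sketch splits a value into the two 31-bit halves that would go into the two semaphores and joins them back. The names Split62, Join62, and SplitValue are hypothetical and are not part of WIL.

```cpp
#include <cstdint>

constexpr std::uint64_t kLow31Mask = 0x7FFFFFFFull;

struct SplitValue
{
    std::uint32_t low;  // bits 0..30, for the base semaphore
    std::uint32_t high; // bits 31..61, for the semaphore with the "h" suffix
};

// Split a 62-bit value into two 31-bit token counts.
constexpr SplitValue Split62(std::uint64_t value)
{
    return { static_cast<std::uint32_t>(value & kLow31Mask),
             static_cast<std::uint32_t>((value >> 31) & kLow31Mask) };
}

// Recombine the two token counts into the original value.
constexpr std::uint64_t Join62(SplitValue parts)
{
    return (static_cast<std::uint64_t>(parts.high) << 31) | parts.low;
}

static_assert(Join62(Split62(0x3FFFFFFFFFFFFFFFull)) == 0x3FFFFFFFFFFFFFFFull,
              "the largest 62-bit value survives a round trip");
```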

Setting the initial value

Set the initial value by calling CreateFromValue or CreateFromPointer. Only the first call will set the initial value. All attempts to set the initial value for a particular name must agree on what kind of value is stored. If one caller uses CreateFromPointer and another uses CreateFromValue, the result is undefined.

template<typename T>
HRESULT CreateFromValue(PCWSTR name, T value);

HRESULT CreateFromPointer(PCWSTR name, void* pointer);
  • The CreateFromValue method stores the specified value into the semaphore. If T is a 32-bit data type or smaller, then the value must fit inside a 31-bit unsigned integer. If T is a 64-bit data type, then the value must fit inside a 62-bit unsigned integer. If the value is out of range, the process fails fast. In practice, this means that T must be an unsigned integer type, because the sign bit will cause the value to exceed the supported range.
  • The CreateFromPointer method stores the specified pointer into the semaphore. The pointer must be 4-byte aligned. This is not a significant limitation in practice. If the pointer is not 4-byte aligned, the process fails fast.
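The range and alignment rules above can be sketched as follows. The helper names (FitsInSemaphore, IsStorablePointer, PackPointer, UnpackPointer) are hypothetical. The idea that the two always-zero low bits of an aligned pointer are dropped so that a 64-bit pointer fits in 62 bits is an assumption of this sketch, not something this page states.

```cpp
#include <cstdint>
#include <type_traits>

// True if the value fits the range described above for its type.
// Negative values convert to very large unsigned values, so they fail.
template <typename T>
constexpr bool FitsInSemaphore(T value)
{
    static_assert(std::is_integral<T>::value, "integer types only");
    return static_cast<std::uint64_t>(value) <=
           (sizeof(T) <= 4 ? 0x7FFFFFFFull : 0x3FFFFFFFFFFFFFFFull);
}

// True if the pointer is 4-byte aligned.
inline bool IsStorablePointer(const void* pointer)
{
    return (reinterpret_cast<std::uintptr_t>(pointer) & 0x3) == 0;
}

// Assumption: store the pointer without its two low (always zero) bits.
inline std::uint64_t PackPointer(const void* pointer)
{
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(pointer)) >> 2;
}

inline void* UnpackPointer(std::uint64_t packed)
{
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(packed << 2));
}
```

The real methods fail fast when a check like this does not pass. The sketch only reports the result.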

These methods must be called with the serialization mutex held.

If a value has already been assigned to the semaphore, then the original value is left unchanged, and the operation is considered to have succeeded.

If the function succeeds, then the SemaphoreValue enters the "valid" state, and the shared value remains valid until the SemaphoreValue is destroyed.

If the function fails, then the SemaphoreValue is in an indeterminate state and should be destroyed. Until that time, the value is poisoned, and others may not be able to create the value.

Known issues: On failure to create the second semaphore (for a 62-bit value), we do not clean up the first semaphore, resulting in the poisoned state mentioned above.

Retrieving the value

The value is retrieved by calling TryGetValue or TryGetPointer. It is the caller's responsibility to read the value in the same way it was written. For example, if the value was created with CreateFromValue<int32_t>(), it must be read with TryGetValue<int32_t>().

template<typename T>
static HRESULT TryGetValue(PCWSTR name, _Out_ T* value, _Out_opt_ bool *retrieved = nullptr);

static HRESULT TryGetPointer(PCWSTR name, _Outptr_result_maybenull_ void** pointer);
| Scenario | Return value | *value on exit | *retrieved on exit |
|---|---|---|---|
| Value not yet created | S_OK | 0 | false |
| Value has been created | S_OK | the created value | true |
| Unable to retrieve value | Error | 0 | false |

The TryGetPointer method does not have a retrieved parameter, so you cannot distinguish between the case where no value has been created and the case where the value nullptr was created. In practice, this is not an issue because the created value is always non-null.

These methods must be called with the serialization mutex held.
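The following toy model shows the first two scenarios listed above, together with the rule that only the first call sets the value. ToySharedValues is a hypothetical in-memory stand-in that borrows the method names for readability. It returns nothing instead of an HRESULT, has no error path, and does no locking.

```cpp
#include <cstdint>
#include <map>
#include <string>

class ToySharedValues
{
public:
    // First call wins: emplace leaves an existing entry unchanged.
    void CreateFromValue(const std::wstring& name, std::uint32_t value)
    {
        m_values.emplace(name, value);
    }

    // Produces 0 and false if no value exists, else the value and true.
    void TryGetValue(const std::wstring& name, std::uint32_t* value,
                     bool* retrieved = nullptr) const
    {
        auto it = m_values.find(name);
        const bool found = (it != m_values.end());
        *value = found ? it->second : 0;
        if (retrieved)
        {
            *retrieved = found;
        }
    }

private:
    std::map<std::wstring, std::uint32_t> m_values;
};
```

The retrieved flag is what lets a caller tell a stored zero apart from a value that was never created.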

Destroy

The Destroy method returns the SemaphoreValue object to the empty state. When there are no more "valid" SemaphoreValue objects for a particular name, the shared value is destroyed.

It is okay to call Destroy on an already-empty SemaphoreValue object.

You do not need to hold the serialization mutex to destroy the SemaphoreValue.

Process-local data storage

The ProcessLocalStorageData<T> template class is a reference-counted wrapper around process-wide data.

In practice, you don't create the object yourself. Instead, you let the Acquire method create the object on demand.

The type T must satisfy the same requirements as the T in manually_managed_shutdown_aware_object<T> (see Shutdown-aware objects). It has the following additional requirements:

  • May not override operator&.
  • Must be interoperable with earlier versions with the same sizeof(T).

The T object is constructed on demand when Acquire is called, and it is destructed when the last Release occurs.

By convention, the first field of the T is an unsigned short size, to allow interop between different versions of the structure.

Design

The lifetime of the ProcessLocalStorageData is managed by a reference count m_refCount. The serialization mutex m_mutex must be held when accessing the reference count.

The m_mutex is the serialization mutex required by the SemaphoreValue. Its name is generated from the name passed to Acquire.

The m_value is the SemaphoreValue that publishes the pointer to the data. Its name is the same as the name of the serialization mutex.

The m_data is the T object itself.

If Release is called during process shutdown, we do not acquire the serialization mutex. This is okay because all other contending threads have been terminated. Some processes corrupt the handle table during shutdown, and bypassing the serialization mutex avoids crashing during shutdown.

Acquire and Release

static HRESULT Acquire(
    PCSTR staticNameWithVersion,
    _Outptr_result_nullonfailure_ ProcessLocalStorageData<T>** data);

void Release();

Call Acquire with an agreed-upon name of the object, which is usually a hard-coded string literal. The Acquire method automatically adds the process ID and the sizeof(T) into the name to make this a versioned per-process object.
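The exact format of the generated name is not shown on this page, so the following is only an illustration of the idea. The helper MakePerProcessName and the underscore-separated format are both made up for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical format: <static name>_<process ID>_<sizeof(T)>.
inline std::string MakePerProcessName(const std::string& staticName,
                                      unsigned long processId,
                                      std::size_t size)
{
    return staticName + "_" + std::to_string(processId) + "_" + std::to_string(size);
}
```

Including the process ID keeps two processes from colliding on the same name. Including sizeof(T) means that two versions of T with different sizes end up with separate objects instead of sharing one incompatible object.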

If the object does not exist, the Acquire method will create it using the default constructor. For POD types, the memory is zero-initialized.

On success, Acquire produces a reference-counted pointer to the ProcessLocalStorageData<T> object.

Call Release when you are finished with the data. This decrements the reference count and destructs the object when the last reference is released.
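The Acquire and Release lifetime can be modeled in a few lines. This is a simplified sketch, not the WIL implementation: ToyLocalStorageData is hypothetical, it takes no name, and a std::mutex with a static pointer stands in for the named mutex and the SemaphoreValue.

```cpp
#include <mutex>
#include <new>

template <typename T>
class ToyLocalStorageData
{
public:
    // Create the object on first use, then hand out counted references.
    static ToyLocalStorageData* Acquire()
    {
        std::lock_guard<std::mutex> lock(s_mutex);
        if (!s_instance)
        {
            s_instance = new (std::nothrow) ToyLocalStorageData();
            if (!s_instance)
            {
                return nullptr;
            }
        }
        ++s_instance->m_refCount;
        return s_instance;
    }

    // Drop one reference; the last one unpublishes and destructs the object.
    void Release()
    {
        std::lock_guard<std::mutex> lock(s_mutex);
        if (--m_refCount == 0)
        {
            s_instance = nullptr;
            delete this;
        }
    }

    T* GetData() { return &m_data; }

private:
    ToyLocalStorageData() = default;
    ~ToyLocalStorageData() = default;

    static std::mutex s_mutex;              // plays the role of the serialization mutex
    static ToyLocalStorageData* s_instance; // plays the role of the published pointer

    unsigned m_refCount = 0;
    T m_data{}; // value-initialized, so POD types start out zeroed
};

template <typename T>
std::mutex ToyLocalStorageData<T>::s_mutex;
template <typename T>
ToyLocalStorageData<T>* ToyLocalStorageData<T>::s_instance = nullptr;
```

Two Acquire calls return the same object. Once both references are released, the next Acquire constructs a fresh one.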

GetData

T* GetData();

After you have used Acquire to obtain a pointer to a ProcessLocalStorageData<T>, use the GetData() method to access the T object inside it. Note that this T object is shared with the entire process, so you will probably need to take additional precautions to ensure thread-safe access.

Process-local storage

The ProcessLocalStorage<T> template class is a smart pointer wrapper around ProcessLocalStorageData<T>. This is what you will be using most of the time.

Constructor

ProcessLocalStorage<T>(PCSTR staticNameWithVersion) noexcept;

The staticNameWithVersion provides the agreed-upon name of the shared object. It must be a pointer to a string whose lifetime encloses that of the ProcessLocalStorage<T> object. In practice, it is always a string literal.

Upon construction, the object is "empty". It does not manage a ProcessLocalStorageData<T> pointer.

Destructor

~ProcessLocalStorage<T>() noexcept;

Upon destruction, the ProcessLocalStorage<T> decrements the reference count of the ProcessLocalStorageData<T> pointer, if one was obtained.

Obtaining access to the shared object

T* GetShared() noexcept;

The GetShared() method attempts to acquire a reference-counted pointer to the shared T object. If successful, it returns a pointer to the shared T object. If not successful, it returns nullptr.

If the function is successful, the returned pointer is valid until the ProcessLocalStorage<T> object is destructed. Conversely, you must destruct the ProcessLocalStorage<T> to release the reference count on the shared T object.

You can call GetShared() multiple times. Once it succeeds, the result is cached and reused for further calls to GetShared().
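The caching behavior of GetShared() can be sketched like this. ToyProcessLocalStorage is a hypothetical model that uses std::shared_ptr and a function-local std::weak_ptr in place of the named, reference-counted ProcessLocalStorageData<T>, so it shows only the shape of the behavior and not how WIL implements it.

```cpp
#include <memory>
#include <mutex>

template <typename T>
class ToyProcessLocalStorage
{
public:
    // Returns the shared object, or nullptr if it could not be obtained.
    T* GetShared() noexcept
    {
        if (!m_cached)
        {
            try
            {
                m_cached = AcquireShared();
            }
            catch (...)
            {
                // Stay empty; the caller sees nullptr and may try again later.
            }
        }
        return m_cached.get();
    }

private:
    // Find the object currently shared by all wrappers, or create it.
    static std::shared_ptr<T> AcquireShared()
    {
        static std::mutex mutex;
        static std::weak_ptr<T> current;

        std::lock_guard<std::mutex> lock(mutex);
        std::shared_ptr<T> shared = current.lock();
        if (!shared)
        {
            shared = std::make_shared<T>();
            current = shared;
        }
        return shared;
    }

    std::shared_ptr<T> m_cached; // released when the wrapper is destructed
};
```

Every wrapper that succeeds sees the same T, and a failed attempt leaves the wrapper empty so that a later call can retry.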

Thread-local storage

The ThreadLocalStorage<T> template class provides thread-local storage in a way that can be coordinated across modules within a process.

The object is default-constructible. It is not copyable or movable.

The T must have a public nonthrowing default constructor and public destructor.

By convention, the first field of the T is an unsigned short size, to allow interop between different versions of the structure.

Design

The storage takes the form of a hash table with a fixed number (10) of buckets. Thread IDs are hashed into buckets by a simple % operation, which means that half of the buckets are unused because thread IDs are in practice always a multiple of 4. We cannot fix this because it would be an ABI-breaking change.
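The bucket arithmetic is easy to check. The sketch below marks which of the 10 buckets can be reached when every thread ID is a multiple of 4. The function name BucketsReachable is hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kBucketCount = 10;

// Mark each bucket that some thread ID in 4, 8, ..., maxThreadId hashes to.
inline std::array<bool, kBucketCount> BucketsReachable(unsigned int maxThreadId)
{
    std::array<bool, kBucketCount> used{};
    for (unsigned int id = 4; id <= maxThreadId; id += 4)
    {
        used[id % kBucketCount] = true;
    }
    return used;
}
```

A multiple of 4 is always even, so only the even-numbered buckets (0, 2, 4, 6, and 8) are ever reached.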

Each bucket is a singly linked list of nodes. Adding a node is done in a lock-free manner.

Nodes are destroyed only when the ThreadLocalStorage destructs, so the ThreadLocalStorage will accumulate empty nodes for threads that have exited.
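A lock-free insertion into a singly linked bucket can be sketched with a compare-and-swap loop. This is an illustration in portable C++ with std::atomic. ToyNode, PushNode, FindNode, and FreeAll are hypothetical names, and inserting at the head of the list is an assumption of the sketch.

```cpp
#include <atomic>
#include <new>

struct ToyNode
{
    unsigned int threadId;
    ToyNode* next;
};

// Prepend a node to the bucket without taking a lock.
inline bool PushNode(std::atomic<ToyNode*>& head, unsigned int threadId)
{
    ToyNode* node = new (std::nothrow) ToyNode{ threadId, nullptr };
    if (!node)
    {
        return false;
    }
    node->next = head.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads the current head into
    // node->next, so the loop simply retries with the fresh value.
    while (!head.compare_exchange_weak(node->next, node,
                                       std::memory_order_release,
                                       std::memory_order_relaxed))
    {
    }
    return true;
}

// Walk the bucket looking for the node that belongs to a thread.
inline ToyNode* FindNode(const std::atomic<ToyNode*>& head, unsigned int threadId)
{
    for (ToyNode* node = head.load(std::memory_order_acquire); node; node = node->next)
    {
        if (node->threadId == threadId)
        {
            return node;
        }
    }
    return nullptr;
}

// Free every node at teardown, when no other thread is using the list.
inline void FreeAll(std::atomic<ToyNode*>& head)
{
    ToyNode* node = head.exchange(nullptr);
    while (node)
    {
        ToyNode* next = node->next;
        delete node;
        node = next;
    }
}
```

Because nodes are removed only at teardown, a reader walking the list never encounters a node that has been freed out from under it, which is what keeps the insert-only scheme simple.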

In practice, the ThreadLocalStorage<T> is used in two places.

  • Shared between modules by putting a ThreadLocalStorage<ThreadLocalData> inside a ProcessLocalStorage. This destructs when the last WIL client DLL unloads from the process.
  • Private to a DLL in the form of a ThreadLocalStorage<ThreadFailureCallbackHolder*>. This destructs when the DLL unloads.

GetLocal

T* GetLocal(bool shouldAllocate = false) noexcept;

Returns a pointer to the T object associated with the current thread. If no such object exists, and shouldAllocate is true, then one is created and returned.

If there is no object associated with the current thread, and either shouldAllocate is false or the memory for the T could not be allocated, then returns nullptr.

Note that nullptr can be returned even if shouldAllocate is true.

Thread-local failure info

The ThreadLocalFailureInfo records a failure that has been observed on a thread. Each entry is used to record a failure observed by a WIL result macro.

| Type | Name | Description |
|---|---|---|
| unsigned short | size | For versioning. |
| unsigned char[2] | reserved1 | Alignment padding. |
| unsigned int | sequenceId | Unique increasing sequence number. |
| HRESULT | hr | The failure code. |
| PCSTR | fileName | The file where the failure occurred. |
| unsigned short | lineNumber | The line number where the failure occurred. |
| unsigned char | failureType | 0 = Exception, 1 = Return, 2 = Log, 3 = FailFast |
| unsigned char | reserved2 | Alignment padding. |
| PCSTR | modulePath | DLL where the failure occurred. |
| void* | returnAddress | Return address. |
| void* | callerReturnAddress | Caller return address. |
| PCWSTR | message | Message. |
| void* | stringBuffer | Buffer for holding fileName, modulePath, and message. |
| size_t | stringBufferSize | Size of stringBuffer buffer. |

The stringBuffer never shrinks.

Methods

void Clear();

Frees the memory in the stringBuffer.

void Set(const FailureInfo& info, unsigned int newSequenceId);

Initializes the fields based on the failure info and sequence ID.

void Get(FailureInfo& info);

Copies the info from the fields back to the specified failure info object.

Thread-local data

The ThreadLocalData assigns failures to a circular list of ThreadLocalFailureInfo structures.

Currently, the 5 most recent errors on a thread are recorded. Errors are ignored if they repeat an already-recorded error for the current subscriber, on the assumption that they are propagations rather than originations.

| Type | Name | Description |
|---|---|---|
| unsigned short | size | For versioning. |
| unsigned int | threadId | The thread this object is assigned to. |
| volatile long* | failureSequenceId | Pointer to a shared value that is used to generate unique IDs. |
| unsigned int | latestSubscribedFailureSequenceId | The sequence ID that was current when the most recent active subscriber joined. |
| ThreadLocalFailureInfo* | errors | Array of entries for recording the most recent errors. |
| unsigned short | errorAllocCount | Size of errors array. |
| unsigned short | errorCurrentIndex | The entry that contains the most recent error. |

The latestSubscribedFailureSequenceId lets us detect whether an error belongs to the current subscriber. If latestSubscribedFailureSequenceId is zero, then there are no subscribers. When a subscriber joins, it sets latestSubscribedFailureSequenceId to the current sequence ID. When a subscriber leaves (which is always LIFO), it resets latestSubscribedFailureSequenceId to its previous value. In this way, the latestSubscribedFailureSequenceId is managed like a stack.
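The stack-like handling of latestSubscribedFailureSequenceId can be sketched with a small RAII helper. ToyThreadData and ToySubscriber are hypothetical, and the plain currentSequenceId field stands in for the shared ID generator.

```cpp
struct ToyThreadData
{
    unsigned int currentSequenceId = 0;                 // stand-in for the shared generator
    unsigned int latestSubscribedFailureSequenceId = 0; // 0 means no subscribers
};

// Joins on construction and leaves on destruction. Because C++ destroys
// local objects in reverse order, subscribers always leave in LIFO order.
class ToySubscriber
{
public:
    explicit ToySubscriber(ToyThreadData& data)
        : m_data(data), m_previous(data.latestSubscribedFailureSequenceId)
    {
        m_data.latestSubscribedFailureSequenceId = m_data.currentSequenceId;
    }

    ~ToySubscriber()
    {
        // Restore whatever the enclosing subscriber (if any) had set.
        m_data.latestSubscribedFailureSequenceId = m_previous;
    }

    ToySubscriber(const ToySubscriber&) = delete;
    ToySubscriber& operator=(const ToySubscriber&) = delete;

private:
    ToyThreadData& m_data;
    unsigned int m_previous;
};
```

Each subscriber remembers the value it replaced, so no separate stack needs to be stored. The saved values in the nested subscribers form the stack.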

The destructor calls Clear() to clean up and free the ThreadLocalFailureInfo structures.

To obtain a ThreadLocalData for the current thread, call one of these functions:

ThreadLocalData* GetThreadLocalDataCache(bool allocate = true);
__forceinline ThreadLocalData* GetThreadLocalData(bool allocate = true);

The allocate parameter specifies whether the function should attempt to create a per-thread ThreadLocalData if one does not already exist. (Note that creation may fail due to low memory.) Returns nullptr on failure.

The two functions are identical. (The second forwards to the first.) I think you are meant to call GetThreadLocalData.

Methods

void Clear();

Cleans up and frees the ThreadLocalFailureInfo structures.

bool EnsureAllocated(bool create = true);

Allocates the errors array if necessary. If create is false, then merely reports whether the errors array has been created.

Creation may fail due to insufficient memory.

void SetLastError(FailureInfo& info);

If this error is new to the current listener, create an entry in the circular buffer to record it and assign it a unique sequence ID.

When the WIL result macros observe a failure, they call g_pfnGetContextAndNotifyFailure which is normally set to GetContextAndNotifyFailure, which calls wil::SetLastError(), which calls ThreadLocalData::SetLastError.

bool GetLastError(_Inout_ wil::FailureInfo& info, unsigned int minSequenceId, HRESULT matchRequirement)

Look through the circular buffer for the oldest event whose sequence ID is at least minSequenceId and which represents the error matchRequirement. If matchRequirement is S_OK, then any error is acceptable.

If found, copy the error information to info and return true.

Otherwise, return false.
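The circular buffer and the search described above can be modeled as follows. This is a simplified sketch: ToyFailure and ToyFailureRing are hypothetical, only the sequence ID and the failure code are kept, a sequence ID of 0 marks an unused slot, and the filtering of repeated errors is left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct ToyFailure
{
    unsigned int sequenceId; // 0 marks an unused slot (assumption of this sketch)
    std::int32_t hr;
};

class ToyFailureRing
{
public:
    // Overwrite the oldest slot with a new entry and a fresh sequence ID.
    void Record(std::int32_t hr)
    {
        m_current = (m_current + 1) % m_entries.size();
        m_entries[m_current] = ToyFailure{ ++m_nextSequenceId, hr };
    }

    // Find the oldest entry with sequenceId >= minSequenceId whose code is
    // matchRequirement. A matchRequirement of 0 (S_OK) accepts any entry.
    bool FindOldestMatch(ToyFailure& info, unsigned int minSequenceId,
                         std::int32_t matchRequirement) const
    {
        // The slot after the newest entry is the oldest, so start there.
        for (std::size_t i = 1; i <= m_entries.size(); ++i)
        {
            const ToyFailure& entry = m_entries[(m_current + i) % m_entries.size()];
            if (entry.sequenceId != 0 && entry.sequenceId >= minSequenceId &&
                (matchRequirement == 0 || entry.hr == matchRequirement))
            {
                info = entry;
                return true;
            }
        }
        return false;
    }

private:
    std::array<ToyFailure, 5> m_entries{}; // the 5 most recent failures
    std::size_t m_current = 0;             // index of the newest entry
    unsigned int m_nextSequenceId = 0;
};
```

After seven failures have been recorded, only the entries with sequence IDs 3 through 7 remain, so a search for the first two comes back empty.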

bool GetCaughtExceptionError(
    _Inout_ wil::FailureInfo& info,
    unsigned int minSequenceId,
    _In_opt_ const DiagnosticsInfo* diagnostics,
    HRESULT matchRequirement,
    void* returnAddress)

This method must be called from inside an exception handler.

Look for a matching error (see GetLastError) that also matches the current exception. If found, copy it to info and return true.

If no such error is found, then create a new one for this exception (FailureType::Log), using the specified returnAddress and diagnostics if provided. Copy that error to info and return true if the operation succeeded.

Process-local data

The ProcessLocalData is shared across all WIL clients in the process.

| Type | Name | Description |
|---|---|---|
| unsigned short | size | For versioning. |
| volatile long | failureSequenceId | Shared generator for unique IDs. |
| ThreadLocalStorage<ThreadLocalData> | threads | Per-thread information. |