Skip to content

Error logging and observation

Chris Guzak edited this page Dec 1, 2023 · 15 revisions

WIL's Error handling helpers allow you to observe and log messages generated by helper macros such as RETURN_IF_FAILED, THROW_IF_FAILED, and LOG_IF_FAILED, either right when the errors are reported (for logging) or after the fact.

If this is not enough for your needs, WIL's error handling helpers also provide other error handling customization features.

How errors are logged and handled by default

By default, WIL will log failures using OutputDebugString if a user-mode debugger is attached to the application process, but do nothing with them otherwise.

So if you have a program like this:

#include <exception>
#include "wil/result.h"

void BlowUp() try
{
    throw std::exception{ "Too bad!" };
} CATCH_LOG()

int main()
{
    LOG_HR(E_UNEXPECTED);
    BlowUp();
}

then when you run it under a debugger you will see WIL log messages like these:

X:\scratch.cpp(12)\scratch.exe!00073551: (caller: 00074A6C) LogHr(1) tid(4ac0) 8000FFFF Catastrophic failure
X:\scratch.cpp(8)\scratch.exe!00073512: (caller: 00073559) LogHr(2) tid(4ac0) 8007023E {Application Error}
The exception %s (0x    Msg:[std::exception: Too bad!] 

but when you run it without a debugger, you won't see any output at all.

If necessary, WIL lets you change how it detects a debugger; see Custom debugger detection.

Customizing error logging and handling with callbacks

You can add to the default logging behavior by registering callback functions with WIL. Be careful: don't write code in these callback functions that affects your code's logic.

SetResultLoggingCallback

void wil::SetResultLoggingCallback(CustomLoggingCallback* callback);

Use wil::SetResultLoggingCallback to set a process-wide callback function that WIL will call each time it logs a failure. The callback returns void and takes a single wil::FailureInfo parameter that describes the failure being logged:

void __stdcall CustomLoggingCallback(wil::FailureInfo const& failure) noexcept;

If you're writing the top-level code of an application, you might place a call to your preferred logging library in the SetResultLoggingCallback callback function. For this purpose, you can use wil::GetFailureLogString to compose a default log message string.

Because SetResultLoggingCallback changes process-wide state, you should call it only at process startup, not in the middle of normal execution, in order to avoid race conditions between threads setting and/or calling the callback.

You can clear a previously set callback by calling SetResultLoggingCallback(nullptr), but you cannot change the callback pointer from one non-null value directly to another.

SetResultLoggingCallback example

int main()
{ 
  // Print every log message to standard error.
  wil::SetResultLoggingCallback([](wil::FailureInfo const &failure) noexcept
  {
    constexpr std::size_t sizeOfLogMessageWithNul = 2048;

    wchar_t logMessage[sizeOfLogMessageWithNul];
    if (SUCCEEDED(wil::GetFailureLogString(logMessage, sizeOfLogMessageWithNul, failure)))
    {
      std::fputws(logMessage, stderr);
    }
  });

  // ...
}

SetResultTelemetryFallback

void wil::SetResultTelemetryFallback(CustomTelemetryFallback *callback);

If your app has a component that uploads reliability metrics, analytics, or telemetry, consider using wil::SetResultTelemetryFallback to set a per module (DLL or EXE) callback function that will be called as an opportunity to send failure messages to this telemetry client. This callback is syntactically similar to the one taken by SetResultLoggingCallback, but not the same:

void __stdcall CustomTelemetryFallback(bool alreadyReportedToTelemetry, wil::FailureInfo const& failure) noexcept;

The difference is that the SetResultTelemetryFallback callback also takes a alreadyReportedToTelemetry parameter. WIL uses this parameter to tell your callback's code whether the message has already been sent up to telemetry via a previously registered ThreadFailureCallback (see below).

The same warnings apply regarding race conditions when calling SetResultTelemetryFallback from multiple threads as for SetResultLoggingCallback.

You can call both SetResultTelemetryFallback and SetResultLoggingCallback in your program, and both callbacks will run.

SetResultTelemetryFallback example

wil::SetResultTelemetryFallback([](bool alreadyReportedToTelemetry, wil::FailureInfo const &failure) noexcept
{
    // ...
});

ThreadFailureCallback

auto wil::ThreadFailureCallback(CustomFailureCallback* callback);

Use wil::ThreadFailureCallback to add a thread-local callback function that WIL will call before outputting each error handling helper log message. The callback is similar to the one accepted by SetResultLoggingCallback, but returns a bool true/false flag instead of void.

bool CustomFailureCallback(wil::FailureInfo const& failure) noexcept;

Return true from the callback when you want to tell the process's SetResultTelemetryFallback callback that the thread has already uploaded a metric or telemetry data point for the failure, so that your process-wide SetResultTelemetryFallback callback doesn't send a duplicate telemetry data point.

You can register as many callbacks as you like with ThreadFailureCallback, and they all will run; each call to ThreadFailureCallback returns an RAII object that will automatically unregister the callback when it is destructed.

ThreadFailureCallback example

auto monitor = wil::ThreadFailureCallback([](wil::FailureInfo const &failure) noexcept
{
    bool didReportToTelemetry = false;
    // ...

    // Set didReportToTelemetry to true if we've sent the failure to the telemetry client.
    // If at least one ThreadFailureCallback callback returns true,
    // then a true alreadyReportedToTelemetry flag will be passed to the
    // SetResultTelemetryFallback callback.
    return didReportToTelemetry;
});

ThreadFailureCallback caveats

The ThreadFailureCallback RAII object must be constructed and destructed on the same thread, and it must follow stack-like behavior: The last one constructed on a thread must be the first one destructed. If you violate this rule, you are likely to crash in ThreadFailureCallbackHolder::GetThreadContext as it tries to walk a corrupted list of callbacks.

Violations of the LIFO rule can occur if an exception occurs that is handled by an outer stack frame, and that exception bypasses C++ destructors, typically because the exception is not covered by the current module's exception handling model. Examples:

  • A structured exception like an access violation occurs and is caught by an outer frame. The /Eha exception handling model detects some but not all of these cases.
  • A C++ exception is thrown from C code and is caught by an outer frame, and the module is compiled with /Ehc (assume C does not throw exceptions).
  • A C++ exception escapes a function marked noexcept and is caught by an outer frame. The /Ehr exception handling model detects and handles this case.

If your module is an out-of-process COM server, make sure to use the IGlobalOptions interface to disable automatic handling of structured exceptions:

wil::CoCreateInstance<IGlobalOptions>(CLSID_GlobalOptions)->
    Set(COMGLB_EXCEPTION_HANDLING, COMGLB_EXCEPTION_DONOT_HANDLE_ANY);

Another source of non-LIFO use is a coroutine that suspends with an active ThreadFailureCallback. The ThreadFailureCallback does not destruct even though control has returned to the caller, which will commonly lead to non-LIFO usage.

GetFailureLogString - printable log message

HRESULT wil::GetFailureLogString(PWSTR destBuffer, size_t destBufferSize, wil::FailureInfo const &failure) noexcept;

wil::GetFailureLogString outputs the same error message that WIL would print to the debugger output for the given error. If you've registered callback functions in order to send log messages to your own logger, but you like the string format already used for the debugger output, you can call GetFailureLogString to obtain the debugger output string without having to parse the error data structure yourself.

You need to allocate your own wchar_t buffer to hold the string output by GetFailureLogString, and then pass the buffer's size including a space for the null terminator.

If you want to change both the debugger output and the result of GetFailureLogString, you can use SetResultMessageCallback.

GetFailureLogString example

See the example for SetResultLoggingCallback.

ThreadFailureCache

The ThreadFailureCache remembers the most recent unique failure that occurred on its thread.

A failure is considered "unique" if has a different error code from the previous error. Errors are commonly propagated up the call stack, resulting in the same error code being reported multiple times. The ThreadFailureCache records only the first such error, which is most likely to be the point of origination.

The ThreadFailureCache is movable but not copyable. See details below for move semantics.

Warning: The ThreadFailureCache relies upon its destructor to clean up per-thread state. If the destructor is bypassed (for example, due to a handled structured exception), the per-thread state points to stack structures that no longer exist, and the code will probably crash in mysterious ways.

ThreadFailureCache constructors

  • ThreadFailureCache()
    Constructs a ThreadFailureCache and begins watching for errors. (See WatchCurrentThread.)

  • ThreadFailureCache(ThreadFailureCache&& other)
    Move constructor transfers the cached failure from the other object.

    The new object begins watching for errors even if the source object was not watching for errors.

    The source object's error-watching state is unchanged. You must stop it manually.

    The source object's error state is invalid. Do not call GetFailureInfo on the old object.

ThreadFailureCache destructor

  • ~ThreadFailureCache()
    If the object was watching for errors, it must be destructed from the same thread that called WatchCurrentThread().

ThreadFailureCache operators

  • ThreadFailureCache&& operator=(ThreadFailureCache&& other)
    Move assignment operator transfers the cached failure from the other object.

    The error-watching state of the source and destination objects is unchanged.

    The source object's failure information is invalid. Do not call GetFailureInfo on the source object.

ThreadFailureCache methods

  • IgnoreCurrentThread()
    Pause watching for errors from the current thread. It is an error to call this method when errors are already being ignored. This method must be called from the same thread that called WatchCurrentThread() most recently.

  • WatchCurrentThread()
    Watch for errors from the current thread, which can different from the thread that called IgnoreCurrentThread(). It is an error to call this method when errors are already being watched.

  • FailureInfo const* GetFailure()
    If a failure has been observed, returns a FailureInfo pointer that describes the most recent unique failure. Returns nullptr if no failure has been observed.

    The returned FailureInfo pointer is valid only until the thread next observes an error. Therefore, you should call GetFailure() only from the thread that is watching, or when the object is not watching at all.

ThreadFailureCache example

The ThreadFailureCache can be used as part of an RAII type that tracks a sequence of operations leading to a goal. If the enclosing type did not reach the success state prior to its destruction (due to early return or exception unwinding), it can use the ThreadFailureCache to infer a possible reason for the failure.

class OperationLogger
{
public:
    OperationLogger() = default;

    ~OperationLogger()
    {
        if (!m_complete)
        {
            auto info = m_cache.GetFailure();
            if (info)
            {
                // log info->hr, info->pszFile, info->uLineNumber, etc.
            }
            else
            {
                // failure not logged by WIL error macro.
                // Could be C++ exception like std::bad_alloc or premature return.
            }            
    }

    void Complete() { m_complete = true; }
private:
    bool m_complete = false;
    ThreadFailureCache m_cache;
};

Remarks

Leak detection tools may report a memory leak in wil::details_abi::ProcessLocalStorageData<wil::details_abi::ProcessLocalData>::MakeAndInitialize.

The first WIL client that uses the error reporting services allocates this data, which is then shared with all other WIL clients. The last WIL client to unload frees the memory. If the process terminates with active WIL clients, then the memory is intentionally leaked. (See Shutdown-aware objects.)

As a result, leak detection tools may generate false positives.

  • The module that allocated the memory is not necessarily the one that will free it. Leak detection tools may expect modules to free all the memory they allocated, unaware that this memory is shared between modules.
  • The memory is intentionally leaked if the process shuts down while there are still active WIL clients.

See Internals Local data for implementation details.

Clone this wiki locally