"Yet another type-safe printf? Isn’t that a solved problem?"
Many C++ formatting libraries claim to be a "type-safe printf" but
they are in reality alternatives to or replacements for printf
.
rostd::printf
is not one of those; it is literally type-safe printf
.
Standard C printf
(and, by extension, C++ std::printf
) is not
portably type-safe.
-
It does not guarantee that the parameters given to the function match the specifiers in the format string. This is not only a source of bugs, it’s also dangerous from a security perspective. Modern compilers typically implement extensions that will identify
printf
-like calls and can provide warnings at compile time when the specifiers don’t match the arguments. -
It is blind to typedefs. A typedef that is a
long int
might be along long int
when compiled for a different platform. The only way to make a typedef portable withprintf
is to give it its own format specifier. Some types have special standard format modifiers, such asstd::size_t
(%zu
). But the vast majority of types are relegated to using macros, as with types likeuint64_t
(PRIu64
) and all user-defined types. -
Some types don’t even have a standard portable definition or associated format specifier. An example of this is the various
std::chrono::duration
types, each of which has an underlying representation defined merely as "signed integer type of at least [N] bits." You cannot printduration.count()
using (for example)PRId64
or%jd
and remain strictly portable.
Special emphasis on the last sentence of point #2: if you want your own
types to be portably type-safe for use with printf
, you must
associate each of those types with their own custom format specifier.
For example:
using Identifier = std::uint64_t;
#define PRI_Identifier PRIu64
Then, the custom format specifier must be used everywhere as such:
printf("The identifier is %" PRI_Identifier ".\n", id);
This is awkward and cumbersome, difficult to read, and requires strict devotion and attention to detail. I have never seen it work out in practice.
As already mentioned, there are many alternatives to printf
available, from
standard iostreams to std::format
to many different open-source libraries.
We at Roku have investigated many of those options over the years. However,
Roku code has a strong history with printf
and we did not want that to
change, for several reasons:
-
Many (almost all) type-safe formatting alternatives cause the compiler to generate more code either at the call point and/or as part of an implementation library. At Roku we are naturally averse to these code size increases.
-
Given how much code needed to change, we were not comfortable adopting a strategy that would revolutionize how we do formatting (as iostreams would, for instance). It is too much work to transition and difficult for developers to adapt to it.
-
We could not afford to sacrifice runtime performance across the board. Even a formatting library that is faster most of the time, but doesn’t match or exceed current performance in all cases, is a risk.
-
There were other internal considerations related to how we do certain things (such as logging) that would make a transition away from
printf
formatting difficult and risky.
What if we could do this:
printf("The identifier is %?.\n", id);
and just print whatever is there.
We know the compiler can do error checking (with GCC, it is accomplished via
the format
attribute extension). The compiler should be able to go the
rest of the way and fix up the string for us, replacing the %?
specifier
with %llu
or whatever the type demands.
If that were possible:
-
The caller would not need to care about what the type of
id
might be. -
If the type changes, all format strings that print this type will automatically adjust, safely.
Fortunately, we at Roku had already enabled C++20. It has lots of powerful features, but one in particular lets us do exactly this:
printf<"The identifier is %?.\n">(id);
Here we take advantage of a feature in C++20 called cNTTP (class Non Type Template Parameter). An instance of a string literal class type can be a template parameter, and with it, you can do almost anything you want at compile time.
The rostd
(pronounced "roasted") library is Roku’s internal extensions to
the C++ standard library. The rostd::printf
family of functions is
part of that library, and rostd::printf
looks like this:
namespace rostd {
template <printx::literal Fmt, typename... Args>
int printf(Args const&... args) noexcept {
auto call = [&](auto const&... args) {
static constexpr auto f = printx::build_fmt<Fmt, Args...>();
return std::printf(f.data, args...);
};
return printx::invoke(call, args...);
}
} // namespace rostd
First note that this formula can be adapted to any printf
-family function.
In this case, it’s a simple wrapper around std::printf
but it will work for
any such function, and implementations of rostd::fprintf
, rostd::sprintf
,
and rostd::snprintf
are provided as well.
There are two function calls in here, both within the rostd::printx
namespace, which do nearly all of the work:
rostd::printx::build_fmt()
-
This function builds a valid
std::printf
format string based on an analysis of the format stringFmt
(template parameter) coupled with the types provided inArgs
. Its job is:-
Look for instances of
%?
and replace them with the correct specifiers for the associated argument types inArgs
. -
Find other (standard) specifiers, verify that they are compatible with the associated type, generating a compile-time error if they are not.
-
Build a new format string containing all the correct specifier replacements, null-terminate and return it.
-
rostd::printx::invoke()
-
This function applies the provided arguments to the function, while also allowing those arguments to be translated to other types. It can therefore provide direct support for types such as:
-
std::string
,std::string_view
,std::filesystem::path
, etc. -
enum
(andenum class
) types -
limited support for user-defined types (by specializing the
printx::detail::traits
template class)
-
Take a look at the source code for more detail. Note that, besides cNTTP,
other new language features used include:
concepts, consteval
, and constexpr virtual
.
Note
|
Why does the format string need to be a template parameter? The string that ends up in the compiled binary is transformed from the one
that is provided in the code. In order to store that transformed string, we
need to know how long it will be, so that we may convert that length to
(part of) a type, like a If we only needed to parse the string to verify its correctness then we could do that. But building a type from its size does not work. It does work for template parameters, because they are constant expressions. |
Here is an example with adjacent functionally-equivalent uses of
std::printf
and rostd::printf
, with their resulting object code
disassemblies below them (clang-13, x86_64). Note any differences in the
object code, if you can find them.
Note
|
The symbol name of the |
#include <cstdio>
void report(std::string_view test_name, std::int64_t duration, unsigned threads) {
std::printf("Test '%.*s' succeeded in %" PRId64 "ms using %u threads.\n",
static_cast<int>(test_name.size()), test_name.data(), duration, threads);
} |
#include <rostd/printx.hpp>
void report(std::string_view test_name, std::int64_t duration, unsigned threads) {
rostd::printf<"Test '%?' succeeded in %?ms using %? threads.\n">
(test_name, duration, threads);
} |
report(std::basic_string_view<char, std::char_traits<char> >, long, unsigned int):
mov r8d, ecx
mov rcx, rdx
mov rdx, rsi
mov rsi, rdi
mov edi, offset .L.str
xor eax, eax
jmp printf # TAILCALL
.L.str:
.asciz "Test '%.*s' succeeded in %ldms using %u threads.\n" |
report(std::basic_string_view<char, std::char_traits<char> >, long, unsigned int):
mov r8d, ecx
mov rcx, rdx
mov rdx, rsi
mov rsi, rdi
mov edi, offset _ZZZN5rostd6printf
xor eax, eax
jmp printf # TAILCALL
_ZZZN5rostd6printf:
.asciz "Test '%.*s' succeeded in %ldms using %u threads.\n" |
The result compiles directly to a call of the underlying
std::printf
function, with a guaranteed-correct format string. It’s
impossible to tell (beyond the ephemeral symbol name) that
rostd::printf
was even used. It compiles away entirely.
This is why rostd::printf
is type-safe printf
. The resulting object
code is printf
and only printf
!
More usage examples are below. Note the following:
-
All standard
printf
format strings are supported; flags, as well as width and precision fields, are supported even in combination with%?
. -
There are no format length modifiers used in these examples. You may use them, but they will be ignored and replaced.
-
When using
%?
, floating point types are printed as-if using%g
. However, any floating point specifier may be used explicitly. -
Types like
std::string
andstd::string_view
can be printed using%?
or%s
.
auto const my_sv = std::string_view{"my string view"};
auto const my_str = std::string{"my string"};
rostd::printf<"%s">(my_sv); // prints "my string view"
rostd::printf<"%s">(my_str); // prints "my string"
auto const my_ull = 12345ull;
rostd::printf<"%u">(my_ull); // prints "12345"
rostd::printf<"%?">(my_ull); // prints "12345"
rostd::printf<"0x%x">(my_ull); // prints "0x3039"
rostd::printf<"%o">(my_ull); // prints "30071"
rostd::printf<"%c">(my_ull); // compile-time error: "format %c expects argument of type char"
auto const my_float = 3.125f;
rostd::printf<"%?">(my_float); // prints "3.125"
rostd::printf<"%A">(my_float); // prints "0X1.9P+1"
rostd::printf<"%e">(my_float); // prints "3.125000e+00"
rostd::printf<"%10?">("right"); // prints " right"
rostd::printf<"%-10?">("left"); // prints "left "
rostd::printf<"%10.2?">(my_str); // prints " my"
rostd::printf<"%-10.2?">(my_str); // prints "my "
Strict error checking is performed on the format strings that are given to
rostd::printf
. However, because C++ does not provide a way to
directly customize the compiler’s error messages, rostd::printf
will
cause the compiler to trigger an error on a line of code that begins with
PRINTX_ERROR
when there is a problem. If such an error occurs, whatever
else is printed, there should be some text in the error message that looks
something like:
rostd-public/rostd/include/rostd/printx.hpp:175:9: note: in expansion of macro ‘PRINTX_ERROR’ 175 | PRINTX_ERROR("format %c expects argument of type char"); | ^~~~~~~~~~~~
The rest of the error message should then give you enough information to trace the problem back to the line of code that caused it.
Given that we’re now running more code at compile time, it is important
to be aware of how it affects compile time performance. The graph below
demonstrates the compile performance (using gcc-10.3) for a rostd::printf
statement with N:[1-64] parameters.
-
The yellow line (printf-int) is the baseline. It is the time to compile a file with one
std::printf
statement that contains N parameters (allint
, in this case). -
The blue line (printx-int) represents the same, but using
rostd::printf
instead ofstd::printf
. -
The orange line (printx-double) represents
rostd::printf
with N parameters that are of typedouble
. -
The gray line (printx-string_view) represents
rostd::printf
with N parameters that are of typestd::string_view
. This diverges significantly because for eachstd::string_view
parameter, two values go on the stack (to be passed tostd::printf
).
Since most printf
calls contain at most a handful of parameters, compile
times are not significantly affected. But even in the rare case of a printf
with dozens of parameters, compile times can still remain reasonable.
Additionally, we believe compilers will get better at this over time.
Take the following hypothetical example:
class Logger {
public:
int log(char const* fmt, ...) __attribute__((format(printf, 2, 3)));
};
You can adapt this to printx
form by changing it to use the formula
described above:
class Logger {
private:
int log(char const* fmt, ...);
public:
template <rostd::printx::literal Fmt, typename... Args>
int log(Args const&... args) {
return rostd::printx::invoke([this](auto const&... args) {
static constexpr auto f = rostd::printx::build_fmt<Fmt, Args...>();
return log(f.data, args...);
}, args...);
}
};
Fixing up all the call points to move the format string into the template parameter can be done by a script written in your favorite text processing language. As already mentioned, the generated binary object code will be equivalent.