Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] [DON'T MERGE YET] Make crypt and crypt_gensalt use thread-local output buffers. #62

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

zackw
Copy link
Collaborator

@zackw zackw commented Nov 12, 2018

This change makes crypt and crypt_gensalt as thread-safe as they can be without changing their interfaces. Solaris already made this change, and it’s being discussed by glibc (with suggestion that it
should be pushed upstream to the C and POSIX standards committees): https://sourceware.org/ml/libc-alpha/2018-10/msg00437.html

Portable programs should still use the r-variants, though, because this is not a guaranteed feature, it doesn’t make them not clobber their output buffers on a second call, and the tradeoff is a sizeable
memory leak (CRYPT_GENSALT_OUTPUT_SIZE or sizeof(crypt_data)) for each thread that calls either function. (Thanks ever so much, C committee, for adding thread-local variables without destructors for them.) (I’m still investigating whether there’s some GNU extension we could use to avoid the memory leak at least on Linux.) (We could use pthread_getspecific, but then we’d have to link libcrypt with libpthread, and this doesn’t seem like enough reason to do that.)

I'd like comments on this, but please don't merge it yet; the memory leak is a scary cost and I would like to be sure there's really no way to avoid it before merging this as-is.

(On the up side, this change shrinks libcrypt.so's .bss segment from 33160 to 192 bytes, which is a nice savings for programs that don't use these functions.)

@zackw
Copy link
Collaborator Author

zackw commented Nov 12, 2018

Oh good, running valgrind on our test suite does detect the memory leak that this introduces. :-)

crypt-gensalt-static.c Outdated Show resolved Hide resolved
crypt-static.c Outdated Show resolved Hide resolved
@besser82
Copy link
Owner

Mhh… Is there any chance to abuse __cxa_thread_atexit_impl to free() the pointers?

@codecov-io
Copy link

codecov-io commented Nov 12, 2018

Codecov Report

Merging #62 into develop will decrease coverage by 0.05%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop     #62      +/-   ##
==========================================
- Coverage    96.46%   96.4%   -0.06%     
==========================================
  Files           32      32              
  Lines         3109    3119      +10     
==========================================
+ Hits          2999    3007       +8     
- Misses         110     112       +2
Impacted Files Coverage Δ
crypt-gensalt-static.c 100% <100%> (ø) ⬆️
crypt-static.c 84.61% <77.77%> (-15.39%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e41352...b080758. Read the comment docs.

@besser82
Copy link
Owner

@zackw I've just pushed a possible abuse of __cxa_thread_atexit_impl to free the pointers at thread termination. The only drawback: Only exists in glibc > 2.18, no other systems supported.

@rfc1036
Copy link

rfc1036 commented Nov 13, 2018

I am not sure that I understand the point of this change.

Current situation:

  • single threaded programs already work fine
  • multithreaded programs probably already use the reentrant API

After this patch:

  • single threaded programs will leak memory unless they switch to the reentrant API
  • no obvious need for the changed API

So this change would force long-lived daemons to use a different API while providing no obvious real world benefits.

@zackw
Copy link
Collaborator Author

zackw commented Nov 13, 2018

@besser82 I was looking into just that this afternoon and I believe I know how to do it in a more universally supported way - the only wrinkle being that one file will need to be compiled as C++, which is enough of a wrinkle that I'm not gonna get to it this week, probably. (If I do it right, libcrypt.so will not require libstdc++.so at runtime.)

@rfc1036 The primary motivation for this change is to match what Solaris already does, and what has been proposed to be done in glibc with all the other APIs that have static internal buffers. See the thread I linked to at the beginning. A secondary motivation is to reduce the size of libcrypt.so's BSS. That said, I'm certainly open to the possibility that it's not worth bothering with.

@besser82
Copy link
Owner

besser82 commented Nov 13, 2018

… that one file will need to be compiled as C++ …

@zackw Wouldn't that imply we need to link libstdc++.so? I mean, compared to that, it would be less overkill to simply link libpthread.so internally as it basically is already present on most systems due to systemd or cryptsetup-luks

@zackw
Copy link
Collaborator Author

zackw commented Nov 13, 2018

Wouldn't that imply we need to link libstdc++.so

No. The short version is that the particular C++ feature needed for this (a static thread_local object, declared at function scope with constructor and destructor) winds up calling __cxa_thread_atexit_impl via a shim that GCC helpfully provides in a static-only library called libsupc++. As long as we don't use any other C++ features (e.g. continuing to do the large allocation with malloc instead of new), we don't need the full C++ runtime. And the advantage of doing it this way is we don't have to worry about whether __cxa_thread_atexit_impl is actually available; the shim takes care of that for us.

it would be less overkill to simply link libpthread.so internally

I feel like we may wind up going there eventually (for instance, I think I will need pthread_once and mutexes to do crypt.conf properly), and maybe that's the better route, but it does impose a cost. Not a portability cost -- threads are a mandatory feature in the current rev of POSIX, and any hypothetical configuration where libc is available but libpthread isn't is going to run into other problems pretty quick -- but a CPU time cost, because single-threaded programs that have libpthread loaded get stuck taking slower paths in several contexts within glibc proper (e.g. suddenly malloc thinks it needs to be doing actual locking).

@besser82
Copy link
Owner

Resolved merge conflicts.

@besser82 besser82 force-pushed the zack/thread-safety branch 2 times, most recently from 60aa1f5 to d24e7ce Compare December 23, 2018 13:30
zackw and others added 2 commits January 25, 2019 11:43
This change makes crypt and crypt_gensalt as thread-safe as they can
be without changing their interfaces.  Solaris already made this
change, and it’s being discussed by glibc (with suggestion that it
should be pushed upstream to the C and POSIX standards committees):
https://sourceware.org/ml/libc-alpha/2018-10/msg00437.html

Portable programs should still use the r-variants, though, because
this is not a guaranteed feature, it doesn’t make them not clobber
their output buffers on a second call, and the tradeoff is a sizeable
memory leak (CRYPT_GENSALT_OUTPUT_SIZE or sizeof(crypt_data)) for each
thread that calls either function.  (Thanks ever so much, C committee,
for adding thread-local variables without destructors for them.)
(I’m still investigating whether there’s some GNU extension we could
use to avoid the memory leak at least on Linux.)  (We could use
pthread_getspecific, but then we’d have to link libcrypt with
libpthread, and this doesn’t seem like enough reason to do that.)
@besser82
Copy link
Owner

Rebased on recent develop, resolved merge conflicts.

@solardiz
Copy link
Collaborator

I think this is a questionable feature, primarily because portable programs must not rely on it anyway, and then there's risk of someone porting system-specific code e.g. from Fedora to another system and not realizing they need to revise the uses of crypt*(). There's already such risk with ports from Solaris, but I think that's a poor excuse for making the matter worse. I wouldn't merge this unless and until glibc actually makes similar changes across the board.

@solardiz
Copy link
Collaborator

I like the NEWS file's wording "This feature is a safety net against sloppy coding. Programs are still strongly encouraged to use the reentrant functions instead". I think the man page's wording should be similar. As currently proposed, I think the man page doesn't sufficiently discourage reliance on the thread-safety. I think we should discourage such reliance even in non-portable programs.

@besser82
Copy link
Owner

Wouldn't that imply we need to link libstdc++.so

No. The short version is that the particular C++ feature needed for this (a static thread_local object, declared at function scope with constructor and destructor) winds up calling __cxa_thread_atexit_impl via a shim that GCC helpfully provides in a static-only library called libsupc++. As long as we don't use any other C++ features (e.g. continuing to do the large allocation with malloc instead of new), we don't need the full C++ runtime. And the advantage of doing it this way is we don't have to worry about whether __cxa_thread_atexit_impl is actually available; the shim takes care of that for us.

Unfortunately libsupc++ isn't portable. Systems with Clang/LLVM only are not guaranteed to ship an equivalent shim-a-like library, tho some of them may ship an externally provided libcxxrt.

Given this fact and the very likely need for pthreads when implementing crypt.conf, I'm biased towards to start using pthreads here as well.

@besser82
Copy link
Owner

Anyways, looking at the code from (Open)Solaris their implementation of thread-safety in the crypt function isn't more then a mutex around the internal buffer of DES setkey and encrypt. They do not use TLS or any other advanced things.

@besser82
Copy link
Owner

Codacy Here is an overview of what got changed by this pull request:

Issues
======
+ Solved 7
- Added 4
           

See the complete overview on Codacy

if (strlen (hash) + 1 > sizeof save_hash)
{
printf ("ERROR: crypt output is too long (%zu > %zu)\n",
strlen (hash) + 1, sizeof save_hash);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
else
printf ("ok: parent: crypt_gensalt = %s\n", setting);
if (strlen (setting) + 1 > sizeof save_setting)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (strlen (setting) + 1 > sizeof save_setting)
{
printf ("ERROR: crypt_gensalt output is too long (%zu > %zu)\n",
strlen (setting) + 1, sizeof save_setting);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
else
printf ("ok: parent: crypt = %s\n", hash);
if (strlen (hash) + 1 > sizeof save_hash)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@solardiz
Copy link
Collaborator

Anyways, looking at the code from (Open)Solaris their implementation of thread-safety in the crypt function isn't more then a mutex around the internal buffer of DES setkey and encrypt. They do not use TLS or any other advanced things.

They do use tsdalloc for crypt output buffer: https://github.com/illumos/illumos-gate/blob/cb6207858a9fcc2feaee22e626912fba281ac969/usr/src/lib/libc/port/gen/crypt.c#L154
and it has some thread magic inside: https://github.com/illumos/illumos-gate/blob/cb6207858a9fcc2feaee22e626912fba281ac969/usr/src/lib/libc/port/gen/tsdalloc.c#L66
(linking here to this same old revision you had referred to in your comment in 2019)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants