deadlock when address sanitizer is used under clang since v0.9.7 #365
This apparently has parallels to our long-standing jemalloc issue (e.g., #130), where jemalloc depends on time-related function calls while libfaketime depends on memory allocation during its own initialisation. The information about clang asan's malloc on AddressSanitizerIncompatiblity does not make me very confident that there is anything we can do on libfaketime's end, but I would love to be proven wrong by someone with a deeper understanding of asan's internal workings.
Older versions of BIND require commenting out qname-minimization configuration. BIND builds linked to jemalloc or Clang ASAN will fail to start on systems with libfaketime versions > 0.9.6: - jemalloc wolfcw/libfaketime#130 - Clang ASAN wolfcw/libfaketime#365
I'm leaving this open for now in case someone can bring some new ideas into this issue, given that we have not been able to solve the similar jemalloc incompatibility (#130) for quite some time. However, there are no solution paths we can pursue at the moment.
Hi! I came up with a
The hack is based on the already existing hack to detect recursive calls to (I also had to fix the usage of the
diff --git a/src/libfaketime.c b/src/libfaketime.c
index f92ecf8..9223934 100644
--- a/src/libfaketime.c
+++ b/src/libfaketime.c
@@ -294,6 +294,7 @@ static bool check_missing_real(const char *name, bool missing)
check_missing_real(#name, (NULL == real_##name))
static int initialized = 0;
+static int initializing = 0;
/* prototypes */
static int fake_gettimeofday(struct timeval *tv);
@@ -2287,7 +2288,7 @@ int clock_gettime(clockid_t clk_id, struct timespec *tp)
fprintf(stderr, "libfaketime: Unexpected recursive calls to clock_gettime() without proper initialization. Trying alternative.\n");
DONT_FAKE_TIME(ftpl_init()) ;
}
- else if (recursion_depth == 3)
+ else if (recursion_depth == 3 || initializing)
{
fprintf(stderr, "libfaketime: Cannot recover from unexpected recursive calls to clock_gettime().\n");
fprintf(stderr, "libfaketime: Please check whether any other libraries are in use that clash with libfaketime.\n");
@@ -2297,6 +2298,7 @@ int clock_gettime(clockid_t clk_id, struct timespec *tp)
tp->tv_sec = 0;
tp->tv_nsec = 0;
}
+ recursion_depth--;
return -1;
}
else {
@@ -2557,6 +2559,7 @@ static void ftpl_init(void)
/* moved up here from below the dlsym calls #130 */
dont_fake = true; // Do not fake times during initialization
dont_fake_final = false;
+ initializing = true;
#ifdef __APPLE__
const char *progname = getprogname();
@@ -2948,6 +2951,7 @@ static void ftpl_init(void)
}
dont_fake = dont_fake_final;
+ initializing = false;
}
Doing this in the earlier branch (so that no time is faked during init) did not make the hang go away. I'm not saying that this is pretty... but it "works" for me!
@psychon and I found out that his "hack" above does not work with C++/clang++. After a little bit of fiddling he came up with another attempt. The following patch seems to work with clang and clang++ for our use cases:
For the record:
We will stick with the patch above for now.
Sounds interesting and glad it works for you so far! However, I'm still trying to figure out how your patch works. :-) It looks like you effectively cap the recursion depth counter at 2 (based on the new recursion_depth--) and effectively replace it with a flag that is set during initialisation. It also no longer attempts ftpl_init() wrapped in DONT_FAKE_TIME(), as suggested to improve the chances of jemalloc compatibility. If that fixes stuff with libasan, I guess we might simplify it a bit (leave out unreachable code) and wrap it in #ifdef for yet another compile-time flag, yes. :-)
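To make the idea of a flag that is set during initialisation concrete, here is a minimal, self-contained model. It is only a sketch: every name in it (model_clock_gettime, model_init, model_allocator_hook, model_bailouts) is hypothetical, and it does not reproduce libfaketime's or ASan's real code paths. It only shows how a re-entrant call that arrives while start-up is still running can be answered with an error instead of recursing further.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are libfaketime or ASan symbols. */
static bool model_initialized = false;
static bool model_initializing = false;
static int model_bailouts = 0;   /* how often the guard short-circuited */

static int model_clock_gettime(long *sec);

/* Models an allocator that asks for the time on every allocation. */
static void model_allocator_hook(void)
{
    long ignored;
    (void)model_clock_gettime(&ignored);
}

/* Models library start-up, which needs to allocate memory. */
static void model_init(void)
{
    model_initializing = true;
    model_allocator_hook();      /* re-enters model_clock_gettime() */
    model_initialized = true;
    model_initializing = false;
}

/* Models the interposed time function. */
static int model_clock_gettime(long *sec)
{
    assert(sec != NULL);
    if (model_initializing) {
        /* Re-entered during start-up: report failure instead of recursing. */
        *sec = 0;
        model_bailouts++;
        return -1;
    }
    if (!model_initialized)
        model_init();
    *sec = 42;                   /* placeholder for the (faked) time */
    return 0;
}
```

In this model the first outer call triggers start-up, the nested call made by the allocator hook fails fast with -1, and the outer call still completes. Whether failing the nested call is acceptable to the real allocator is a separate question that the model does not answer.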
(Hopefully the following is related enough!) We found that ASan and libfaketime have another issue. When freeing memory, a deadlock can happen in libfaketime's code that reads the FAKETIME_TIMESTAMP_FILE. In our case it happens quite deterministically when ASan's quarantine zone is full and ASan actually starts to deallocate memory. See the following stacktrace:
Because ASan calls @psychon suggests that this could be fixed by using If needed, I could try to make a reproducible example. Probably calling I'll check if using
Imagine this simplified model of asan's malloc:
void *asan_malloc(size_t size) {
    struct timespec ts;
    pthread_mutex_lock(&some_global_mutex);
    /* The real arguments are unknown and do not matter for this example. */
    clock_gettime(CLOCK_MONOTONIC, &ts);
    void *result = real_malloc(size);
    pthread_mutex_unlock(&some_global_mutex);
    return result;
}
Important here: The other "ingredient" is
Let's begin with the simple case:
Clang / C
libfaketime has We are back in libfaketime because it hooked The callchain at this point looks something like this:
In my hack/patch above, I introduced the variable
Clang++ / C++
The "new ingredient" here is that the program is linked against
Here, we do not get a call to
Yet another issue
Edit: This should be fixed with #391
As @LtdSauce wrote:
I feel like this is a separate issue, but okay, now I'll have to write down the details here. We are using libfaketime with the environment variable set that causes it to reload the faketime file all the time (sorry, I forgot the name of this env var; I hope you know what I mean). Here, we get again into problems with malloc. Well, actually, this time it is Line 3075 in f836ea3
There are two calls to The earlier call is in Line 3198 in f836ea3
The above is inside the critical section which only ends here: Line 3282 in f836ea3
Thus, this thread already locked this non-recursive mutex and a second
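The situation described here, one thread trying to lock a non-recursive mutex it already holds, can be made visible without hanging. The sketch below is not libfaketime's locking code. It uses a POSIX error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) purely for illustration, and the helper name second_lock_result is hypothetical. With a default mutex the second lock attempt would typically block forever, which matches the observed hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Locks an error-checking mutex twice from the same thread and
 * returns the result of the second attempt. */
static int second_lock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);        /* first lock: succeeds */
    rc = pthread_mutex_lock(&m);   /* second lock by the same thread */

    pthread_mutex_unlock(&m);      /* release the single lock we hold */
    pthread_mutex_destroy(&m);
    return rc;
}
```

On a POSIX-conforming system the second attempt reports EDEADLK rather than blocking. An error-checking mutex is handy as a debugging aid for this class of bug, but it does not remove the underlying re-entrancy.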
Thanks, I'm fine with the I'm also fine with replacing fopen and related calls with open if that avoids recursive calls due to a buffer we don't need anyway, though it might not be a long-term solution if implementation internals in glibc or libasan change (the force-monotonic-fix you mentioned is a good example of a workaround only needed for some middle versions, but not for the older or the current glibc versions).
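As a rough illustration of replacing fopen and related calls with open, the following sketch reads a small file into a caller-provided buffer using only open(), read() and close(), so no stdio buffer is allocated on the way. The helper name read_small_file is hypothetical and this is not the change that went into libfaketime. Whether it avoids the recursion in practice depends on the libc and sanitizer runtime in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads at most cap - 1 bytes from path into buf and NUL-terminates it.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_small_file(const char *path, char *buf, size_t cap)
{
    size_t used = 0;
    int fd;

    if (cap == 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while (used < cap - 1) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                 /* end of file */
        used += (size_t)n;
    }

    close(fd);
    buf[used] = '\0';
    return (ssize_t)used;
}
```

A caller would pass a fixed-size stack buffer, for example char line[256], together with the path of the timestamp file, and then parse the resulting string as before.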
Hi!
We use AddressSanitizer to test our executables. Recently we wanted to decrease test time and tried libfaketime. We then noticed that programs compiled with clang and with address sanitizer freeze with the following message:
When we don't compile them with address sanitizer, everything works as expected. Furthermore, when compiled with gcc and address sanitizer, everything works when we load libasan.so before libfaketime.so. Gcc links it dynamically by default while clang links it statically. We want to avoid using the non-default linkage in clang and keep it linked statically.
I have not found another issue related to address sanitizers.
Environment
OS: Ubuntu Impish 21.10 x86_64
clang: Ubuntu clang version 13.0.0-2
libfaketime: version 0.9.8-9 from apt (and also when built from f26242b)
When linked against 0.9.6 everything seems to work.
Steps to reproduce
To reproduce this issue we created a little program (or just a simple main):
When compiled with the following:
Running under gdb produces the following stacktrace:
Backtrace for `LD_PRELOAD=libfaketime.so.1 FAKETIME="-15d" gdb --args ./a.out`
Backtrace when compiled with clang++
EDIT: Linking the pthread-enabled .so had no effect.
EDIT: linking the sanitizer runtime dynamically like gcc does it had no effect and produced the same backtrace