Large number of threads / limit number of threads ? #7

Open
tfrancart opened this issue Mar 6, 2019 · 4 comments
Comments

@tfrancart

Hello

On a query involving a large number of entities (tens of thousands) and a join between two sources in the federation (each of these entities has a property linking it to an entity in the other source), I am seeing many errors like the following, and the query does not terminate:

[269,472s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,472s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,472s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,473s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,473s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,473s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,473s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,473s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,474s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,474s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,474s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,474s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,474s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,474s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,475s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,475s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,475s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,475s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,476s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,476s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,476s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,476s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,476s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[269,477s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.

My hypothesis is that FedX needs to create a lot of threads and that thread creation fails. How can I control the number of threads being created to avoid such errors?

@aschwarte10
Contributor

@tfrancart Thanks for providing all the feedback, this is really helpful in understanding how FedX behaves in actual use cases.

I just double-checked the code: to execute joins (and also unions) in parallel, I am using a thread pool executor with a defined number of threads. The number of available slots can be configured using the FedX config option Config.getConfig().getJoinWorkerThreads() (defaulting to 20). The thread pool is also backed by a LinkedBlockingQueue (which basically holds the runnables waiting for execution, in order).
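
The pattern described above can be sketched as follows. This is a minimal illustration using the standard java.util.concurrent.ThreadPoolExecutor, not FedX's actual code; the class name BoundedPoolSketch and the method peakConcurrency are hypothetical and exist only for this example. It shows that with a fixed pool size and an unbounded queue, the number of tasks running at once should never exceed the configured number of workers, however many tasks are submitted.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not taken from the FedX code base.
class BoundedPoolSketch {

    /** Runs `tasks` short jobs on `workers` threads and returns the peak number running at once. */
    static int peakConcurrency(int workers, int tasks) {
        // Core size == max size gives a fixed pool; waiting runnables sit in a FIFO queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(1); // stand-in for a remote sub-query
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // already queued tasks still run to completion
        try {
            if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 20 mirrors the default mentioned above; 1000 is an arbitrary task count.
        System.out.println("peak concurrency: " + peakConcurrency(20, 1000));
    }
}
```

If thread creation is bounded this way, the failed pthread_create calls would have to come from somewhere other than this pool, which is why more detail on their origin is useful.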

It is unclear to me how the above error can happen.

The above error messages look like system call errors. Can you explain how you were able to log them? Were they printed to stderr? Are there perhaps more details (e.g. a stack trace showing where the thread creation is attempted)?

Also, I did a quick Google search on the error: the first result indicates that an EAGAIN error occurs if the process has run out of memory. Could you re-run your tests giving your JVM more max memory (e.g. using -Xmx6G) or even more? I am not sure why the process doesn't run into an out-of-memory error, but rather into this kind of error.
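
One way to gather the details asked for here is to count the live threads inside the JVM while the query runs. The pthread_create documentation lists EAGAIN as "insufficient resources to create another thread" or a system-imposed thread limit being reached, so a thread census may narrow things down. The sketch below uses the standard java.lang.management API; the class name ThreadCensusSketch and the grouping by name prefix are assumptions made for illustration and are not part of FedX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper; not part of FedX.
class ThreadCensusSketch {

    /** Counts live threads grouped by thread name with trailing digits, '-' and '#' stripped. */
    static Map<String, Integer> liveThreadsByPrefix() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            // e.g. "pool-1-thread-17" becomes "pool-1-thread"
            String prefix = info.getThreadName().replaceAll("[-#\\d]+$", "");
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        liveThreadsByPrefix().forEach((prefix, n) -> System.out.println(n + "\t" + prefix));
    }
}
```

Calling liveThreadsByPrefix() periodically from the application that runs the federated query would show which group of threads grows before the warnings appear.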

To understand your data a bit better: does it contain many instances of the following pattern?

endpoint 1:
source1:x :isRelated source2:otherEntity

endpoint 2:
source2:otherEntity rdfs:label "Other Entity "

And a query like

SELECT * WHERE {
 ?x :isRelated ?y .
 ?y rdfs:label ?label
}

Is my understanding of your scenario correct?

@tfrancart
Author

tfrancart commented Mar 7, 2019 via email

@aschwarte10
Contributor

Did you already have the chance to investigate the memory settings?

Thanks for the detailed explanation; I will also try to reproduce the scenario. Regarding your question, I will contact you directly via email.

@aschwarte10
Contributor

@tfrancart just as an additional update: as of #13 I added a hash join operator to FedX (note: currently only the implementation exists; it is not yet active). This operator may help in cases like this, where there is a large intermediate result as input to a join.
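
For readers unfamiliar with the technique, a generic in-memory hash join can be sketched as below. This is not the operator added in #13; the class name HashJoinSketch and the representation of rows as String arrays are assumptions for illustration. The left input holds (x, y) pairs such as the results of `?x :isRelated ?y`, and the right input holds (y, label) pairs such as the results of `?y rdfs:label ?label`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic hash join sketch; not the FedX implementation.
class HashJoinSketch {

    /** Joins left rows {x, y} with right rows {y, label} on y, returning rows {x, y, label}. */
    static List<String[]> join(List<String[]> left, List<String[]> right) {
        // Build phase: index the right side by its join key.
        Map<String, List<String>> labelsByY = new HashMap<>();
        for (String[] r : right) {
            labelsByY.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        // Probe phase: one hash lookup per left row, with no per-row remote request.
        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            for (String label : labelsByY.getOrDefault(l[1], Collections.emptyList())) {
                out.add(new String[] {l[0], l[1], label});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
                new String[] {"source1:x", "source2:otherEntity"},
                new String[] {"source1:z", "source2:missing"});
        List<String[]> right = Arrays.asList(
                new String[] {"source2:otherEntity", "Other Entity"},
                new String[] {"source2:unrelated", "Unrelated"});
        for (String[] row : join(left, right)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The design trade-off is that both inputs are fetched in full and joined locally, so the work per intermediate result is a hash lookup rather than a separately scheduled sub-query.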
