Running concurrent clients #107

Closed
mitchelldehaven opened this issue Jul 6, 2020 · 4 comments

Comments

@mitchelldehaven

Is there any documentation on the correct way to run concurrent clients? The README.md reports runtime performance with 6 concurrent clients, but I didn't find anything about this in the documentation.

@kermitt2
Owner

kermitt2 commented Jul 6, 2020

Hello @mitchelldehaven !

I ran the runtime benchmarks with shell scripts, and I use the service from various Java tools, but there are at least two clients that manage concurrent calls and could get you started more easily:

(disclaimer: I've not tested them)
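
For reference, here is a minimal sketch of what a client managing concurrent calls typically does, assuming a single running service and a pool of worker threads. The names `run_concurrent` and `process_one` are hypothetical and not part of this project's API; `process_one` stands in for whatever function sends one document to the service and returns the response.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_concurrent(
    items: Iterable[str],
    process_one: Callable[[str], object],
    n_workers: int = 6,
) -> Dict[str, object]:
    """Apply process_one to every item using up to n_workers threads.

    process_one is a placeholder (an assumption, not the tool's API) for
    the call that submits one document to the service.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Submit everything up front; the pool caps how many run at once.
        futures = {pool.submit(process_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # Record the failure and keep processing the other items.
                results[item] = exc
    return results
```

With a real `process_one` plugged in, `n_workers=6` would correspond to the six concurrent clients mentioned in the README. Because all workers talk to one service instance, nothing in this approach requires copying the project directory.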

@mitchelldehaven
Author

I want to run this in an HPC environment to process thousands of PDFs, but when I try to run it on different worker nodes from the same project directory, Maven seems to dislike this. The naive approach would be to copy the project directory several times, but the project directory is ~100 GB, so I'm unsure whether the approach you were using would avoid this.

@mitchelldehaven
Author

Sorry, I think I found the mistake I was making. It was unrelated to concurrent threads. Thanks!

@kermitt2
Owner

kermitt2 commented Jul 7, 2020

@mitchelldehaven I am actually also trying to run the tool in an HPC environment. It's challenging because the tool is seen more as a service deployed in an environment like an AWS cloud. The issue with the 100 GB resource space is that a shared disk will harm the performance a lot. It works fine on an attached SSD because it uses memory-mapped files, but with shared disk access, it could be a disaster :)
So I am interested in your feedback on this!
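
To illustrate why the storage type matters for memory-mapped files, here is a small sketch of random reads through a memory map. It is a generic Python example, not the tool's actual storage code; `random_reads` is a hypothetical helper and the file it reads would be a stand-in for a large resource database.

```python
import mmap
import os
import random


def random_reads(path: str, n_reads: int, chunk: int = 64) -> int:
    """Read n_reads random chunks from path via a memory map; return bytes read.

    Assumes the file is non-empty and at least `chunk` bytes long.
    """
    size = os.path.getsize(path)
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for _ in range(n_reads):
                offset = random.randrange(0, size - chunk + 1)
                # Slicing the map can trigger a page fault, which the OS
                # serves from its page cache or, on a miss, from the disk.
                total += len(mm[offset:offset + chunk])
    return total
```

Each miss in the page cache becomes a read from the underlying storage, so the cost of scattered lookups like these tracks the latency of that storage. This is consistent with the observation above that an attached SSD works well while a shared disk may not.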

Also note that there is a new release with updated resource databases (Wikidata and Wikipedia as of end of May 2020) and some fixes, and Gradle is now used instead of Maven.
