Running concurrent clients #107
Hello @mitchelldehaven! I ran the runtime benchmarks using shell scripts, and I use the service from various Java tools, but there are at least two clients that manage concurrent calls and could make this easier for you:
(disclaimer: I have not tested them)
I want to run this in an HPC environment to process thousands of PDFs, but when I try to run it on different worker nodes from the same project directory, Maven seems to dislike it. The naive approach would be to copy the project directory several times, but the project directory is about 100 GB, so I'm unsure whether the approach you were using would avoid this.
Sorry, I think I found the mistake I was making. It was unrelated to concurrent threads. Thanks!
@mitchelldehaven I am actually also trying to run the tool in an HPC environment. It's challenging because the tool is designed more as a service deployed in an environment like an AWS cloud. The problem with the 100 GB of resources is that a shared disk will hurt performance a lot. It works fine on an attached SSD because it uses memory-mapped files, but with shared disk access it could be a disaster :) Also note that there is a new release with updated resource databases (Wikidata and Wikipedia as of the end of May 2020) and some fixes, and Gradle is now used instead of Maven.
Is there any documentation on the correct way to run concurrent clients? The README.md reports runtime performance using 6 concurrent clients, but looking through the documentation I didn't find anything on how to set this up.
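One generic way to drive a running service from several concurrent clients is a bounded thread pool that keeps a fixed number of requests in flight. The sketch below is not either of the clients mentioned above and is not part of the project. It assumes a hypothetical `process_one` callable that sends a single PDF to the service and returns the response. The default of 6 workers only mirrors the 6 concurrent clients quoted in the README and is not a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def process_concurrently(
    pdf_paths: Iterable[Path],
    process_one: Callable[[Path], str],
    max_workers: int = 6,
) -> Tuple[Dict[Path, str], Dict[Path, Exception]]:
    """Apply process_one to every path, with at most max_workers calls in flight.

    process_one is a hypothetical callable (an assumption, not a project API),
    e.g. a function that posts one PDF to the running service and returns the
    response body. Returns (results, errors), both keyed by path.
    """
    results: Dict[Path, str] = {}
    errors: Dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every document; the pool limits how many run at once.
        futures = {pool.submit(process_one, path): path for path in pdf_paths}
        # Collect results in completion order rather than submission order.
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return results, errors
```

Threads are a reasonable fit here because each call spends most of its time waiting on the service, not on the Python side. Which HTTP endpoint and request format `process_one` should use depends on the service's API documentation and is not specified here. Failed documents are collected separately, so one bad PDF does not stop a batch of thousands.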