
[Question] - How to improve performance #325

Open
alb-ctrl opened this issue Apr 30, 2023 · 9 comments
@alb-ctrl

Hello, I first found this repo in 05/2022 and only found time to use it again today, 04/2023.
I'm running Python 3.9.9 on a 2020 MacBook Pro (M1, 8 GB memory).
When I first used this, I was on commit 2aa41c1. It ran relatively fast (I tried it again today), although the translation wasn't as good as with the new code:
python3 translate_demo.py --target-lang=ENG --image <path> --mode batch --manga2eng

After getting the latest commit 83c3799, I used Google Translate. It was taking too long, and the translation wasn't any better (almost the same as the old code when comparing the translations).

The one I liked most is Sugoi in the new code, but it is still taking too long: 20 images took 54 minutes with the following command:
python3 -m manga_translator -v --mode batch --translator=sugoi -l ENG --manga2eng --revert-upscaling -I<path>

For anyone who uses a Mac: is there any way, or any configuration, to improve performance?

These are some screenshots:

Screen Shot 2023-04-30 at 4 44 37 PM

@JustFrederik
Contributor

JustFrederik commented Apr 30, 2023

The GPU usage doesn't matter, and Python is only using a single core, so the problem is most likely memory. Could you upload the memory and CPU usage for both commits?

You could host a GPU server and run it there instead of on your Mac. Since Sugoi is an offline translator, its performance depends on ctranslate2; there is nothing that can be done about that.

Have you tried Google Colab? I think it's free and you can use CUDA with it.

cloud gpu free

@alb-ctrl
Author

alb-ctrl commented May 1, 2023

I'm guessing this isn't a smart question, but is there a way to use more cores, and would that help?

This is on the old commit (05/2022); the 20 pages were translated in ~5 min:
Screen Shot 2023-04-30 at 8 18 54 PM

This is on the new code (04/2023):
Screen Shot 2023-04-30 at 8 28 09 PM

The first spike is the first page, which is just the cover art and title.

I'll try the GPU server and/or Google Colab in the near future, but if anyone has suggestions on how to improve performance, they would be greatly appreciated.

@JustFrederik
Contributor

JustFrederik commented May 1, 2023

I'm confused why the usage is evenly distributed. Usually Python only uses one core, and it's a pain to implement multiprocessing; it could be because of a different CPU architecture.

The problem is that you are using too much memory: you have 12 GB of swap, and I don't know how much of it is due to this repo. Are you using the offline translator with the newer version? Python is using 32 GB of memory.

@alb-ctrl
Author

alb-ctrl commented May 1, 2023

I'm using the latest commit with offline translation via Sugoi (which is pretty good; I only tried Google Translate and Sugoi in the new code, and Sugoi gave the best translation). Maybe someone can recommend other translators with better performance.

Just for comparison, this is the memory and CPU usage with only Safari running (I don't use Chrome; Safari is based on WebKit rather than Chromium, but the footprint should be comparable):
Screen Shot 2023-05-01 at 6 21 10 AM

Perhaps I'm overcomplicating things trying to save 5 minutes of translation.
Does anyone know how long it takes to translate ~20 pages on their M1 Mac?
If ~50 min is normal, then there's nothing I can do other than use a better computer.

@JustFrederik
Contributor

JustFrederik commented May 1, 2023

I just ran it on my Mac, but with Google Translate.
Before:
Screenshot 2023-05-01 at 1 04 20 PM

After:
Screenshot 2023-05-01 at 1 09 09 PM

I translated the 4 example images, and this is how long it took: 214.56s user 148.36s system 77% cpu 7:49.55 total.
Without the inpainter it runs much faster: 98.68s user 25.17s system 272% cpu 45.429 total.

So don't use an inpainter, and use an online translator.
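Putting those two suggestions together, the invocation would look roughly like this (a sketch assembled from flags that appear in other commands in this thread; `google` as a translator name is an assumption, and `<path>` is a placeholder for your input directory):

```shell
# Hedged sketch: batch mode with an online translator and the inpainter
# disabled, per the timing comparison above.
python3 -m manga_translator -v --mode batch \
    --translator=google -l ENG --manga2eng \
    --inpainter none -I<path>
```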

@JustFrederik
Contributor

@alb-ctrl why are you using --revert-upscaling?

@alb-ctrl
Author

alb-ctrl commented May 1, 2023

I just tried it without the inpainter and it worked much better: 20 images in ~5 min.
python3 -m manga_translator -v --mode batch --translator=sugoi -l ENG --manga2eng --inpainter none
Thanks a lot!
Screen Shot 2023-05-01 at 7 58 57 AM

I was trying to see if using --revert-upscaling makes the output images smaller. Currently they go from
286157 Apr 30 15:04 1.jpg to 1757664 May 1 07:55 1.png

@BigEmperor26
Contributor

So, if you have an M1 or other Apple silicon Mac, you can use its GPU by passing mps instead of cuda. There is already some work on Apple silicon support, but there is a roadblock: the inpainting phase, which is the slowest part, seemingly cannot run on Apple GPUs.
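As a rough illustration of the device selection being described, with PyTorch this typically looks like the sketch below (the helper name `pick_device` is made up for illustration; it is not the repo's actual code):

```python
# Hedged sketch: prefer CUDA, then Apple's Metal backend (mps), then CPU.
# Assumes PyTorch >= 1.12, the first release with MPS support.
def pick_device() -> str:
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```

Models would then be moved with `model.to(pick_device())`; the point of the thread is that not every stage (notably inpainting) survives that move.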

@BigEmperor26
Contributor

BigEmperor26 commented Dec 11, 2023

As I am working on it with an M1 Pro, moving everything to the Apple GPU via mps shows a large improvement. I was able to test and successfully move to mps:

  • colorization
  • detection
  • ocr
  • upscaler

Unfortunately, the translation library:

  • ctranslate2 does not yet support mps devices, so it has to run on the CPU, leading to worse performance. Using an online translator (like Google) avoids the issue altogether.
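In practical terms this means any mps request has to be downgraded before constructing the ctranslate2 translator, since its `device` argument only accepts CPU/CUDA-style values. A minimal sketch (the helper `translator_device` and the `"model_dir"` path are illustrative, not the repo's code):

```python
# Hedged sketch: ctranslate2 has no "mps" backend, so map any
# unsupported device request down to "cpu" before building the translator.
def translator_device(requested: str) -> str:
    return requested if requested in ("cpu", "cuda") else "cpu"

# Usage (requires ctranslate2 and a converted model; shown for illustration):
# import ctranslate2
# translator = ctranslate2.Translator("model_dir",
#                                     device=translator_device("mps"))
print(translator_device("mps"))
```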

For the inpainting, I am working on it. I tested AotInpainter and it more than doubled performance on a test sample image, going from 13-14 s to 4 s. If I am able to complete the inpainter for lama_mpe and lama_large, I'll open a PR.

PR opened here #533
