How to use deepspeed for XTTS #59

Open
thiswillbeyourgithub opened this issue Sep 14, 2024 · 6 comments
Labels
question Further information is requested

Comments

@thiswillbeyourgithub commented Sep 14, 2024

Hi,

(As per that request)
DeepSpeed seems to be a library that speeds up AI-related code that supports it.

XTTS supports it.

On a non-Windows computer it seems straightforward: just pip install deepspeed, then use the appropriate XTTS argument. The issues seem to arise when we're inside a Docker container. There's also the problem of DeepSpeed increasing the container size above the threshold allowed by ghcr.io.
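
For context, the XTTS argument in question appears to be use_deepspeed in Coqui TTS. A minimal sketch of what enabling it looks like (paths are placeholders, and this assumes deepspeed is already installed):

```python
# Sketch: enable DeepSpeed for XTTS inference in Coqui TTS.
# Paths are placeholders; requires deepspeed to be importable.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")

model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()
```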

If you could give pointers to help users get DeepSpeed working on their end, that would be awesome! I'm a Linux-only person. pip install works perfectly outside of Docker, but when I tried it from a bash shell inside the container I got this error:

pip install deepspeed
Collecting deepspeed
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [9 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-oefym5jg/deepspeed_51914280e6a94d08ba3b952b5df14105/setup.py", line 108, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-install-oefym5jg/deepspeed_51914280e6a94d08ba3b952b5df14105/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

@matatonic (Owner)

There are a few complex parts to this:

  • There are no pre-built wheels that I could find.
  • It needs to be compiled, which requires the CUDA dev kit; done simply, this makes the image about 6 GB larger, and that is too big for ghcr.io.
  • Setting up a dev environment with the right CUDA is beyond what I will support. If you can do that on your own, great: the deepspeed option is available in the server once you get it installed.
  • I can't test anything on Windows, so Windows users are on their own to sort this out so far. A PR or even simple docs may be OK, though.

@matatonic (Owner)

P.S. You might try an older DeepSpeed version, like 0.13; IIRC it was more compatible at the time.
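
For example, a pin along these lines (the exact version spec is illustrative):

```sh
pip install "deepspeed==0.13.*"
```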

@matatonic (Owner)

On Linux, you need to add the CUDA development toolkit, or (at least) switch to a CUDA devel base image; it probably also needs additional dependencies.
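
A rough sketch of the devel-image route (untested; the image tag is illustrative and not necessarily what this project uses):

```sh
# Untested sketch, run inside an Ubuntu-based CUDA *devel* container
# (e.g. nvidia/cuda:12.1.1-devel-ubuntu22.04). The devel images ship
# nvcc and the CUDA headers that deepspeed needs to compile its ops.
export CUDA_HOME=/usr/local/cuda   # deepspeed's setup.py looks here

# The CUDA base images don't ship Python; install it plus pip first.
apt-get update && apt-get install -y --no-install-recommends python3 python3-pip
pip3 install deepspeed
```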

@thiswillbeyourgithub (Author)

Thanks. But does DeepSpeed only improve the checkpoint loading time, or is it faster overall?

@matatonic (Owner)

It should allow running in lower VRAM with good performance (though probably not better than fully loaded). At its core, I think it's essentially an efficient layer-swap space to RAM; I'm not really sure, and it may do more than that. DeepSpeed didn't make any difference at all for me when loading XTTS with sufficient VRAM.

@matatonic added the question label Sep 14, 2024
@thiswillbeyourgithub (Author)

Alright, thank you very much for all this clarification. I've decided not to spend even more time trying, then. Fish quantization and Piper on GPU seem a safer bet for a better latency/speed tradeoff. As far as I'm concerned you can close this. Thanks again!
