I found an example of using Flask to serve API requests. I gave it a try, but when I make concurrent requests, the generated responses come back as garbled text. I suspect this is because two questions are being run through inference at the same time. Is it possible to generate answers concurrently?
There's no support for concurrency, no. You'd need a separate instance for each thread, with its own generator and cache, and some mechanism for sensibly splitting the work between threads, given that the implementation completely occupies the GPU.
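One minimal way to avoid the garbled output, short of running multiple instances, is to funnel every request through a single worker thread that owns the model, so generations never interleave. This is a hedged sketch, not part of the library: `generate` below is a hypothetical stand-in for your actual generator call.

```python
import queue
import threading

def generate(prompt):
    # Hypothetical placeholder for the real single-threaded generator;
    # the actual model call must never run on two threads at once.
    return "response to: " + prompt

request_q = queue.Queue()

def worker():
    # The one thread that owns the model; processes requests strictly in order.
    while True:
        prompt, result_q = request_q.get()
        if prompt is None:  # sentinel for shutdown
            break
        result_q.put(generate(prompt))

threading.Thread(target=worker, daemon=True).start()

def handle_request(prompt):
    # Safe to call from any Flask request thread; blocks until its turn.
    result_q = queue.Queue()
    request_q.put((prompt, result_q))
    return result_q.get()
```

Requests are serialized, so throughput doesn't improve, but concurrent callers each get a clean, uncorrupted response.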
You could possibly have a streaming API that dispatches to multiple generators when there are concurrent requests, but you'd need a lot of VRAM to accommodate that.
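The multi-generator idea above could be sketched as a small dispatcher that round-robins requests across a fixed pool of instances, each guarded by its own lock. Everything here is illustrative: `GeneratorInstance` is a hypothetical wrapper standing in for a real generator plus cache, and each instance would cost a full copy of the model's working VRAM.

```python
import itertools
import threading

class GeneratorInstance:
    # Hypothetical wrapper: in practice this would hold its own
    # generator and cache, each consuming its own VRAM.
    def __init__(self, idx):
        self.idx = idx
        self.lock = threading.Lock()

    def generate(self, prompt):
        with self.lock:  # one request at a time per instance
            return f"[gen{self.idx}] " + prompt

class Dispatcher:
    """Round-robin concurrent requests across a fixed pool of instances."""
    def __init__(self, num_instances):
        self.instances = [GeneratorInstance(i) for i in range(num_instances)]
        self._cycle = itertools.cycle(self.instances)
        self._pick = threading.Lock()  # make instance selection thread-safe

    def generate(self, prompt):
        with self._pick:
            instance = next(self._cycle)
        return instance.generate(prompt)
```

With two instances, two requests can genuinely run in parallel; a third waits on whichever instance's lock it was assigned to.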