Investigate problems with ctranslate2 #75

SebastianBodza · 2023-08-17T08:55:22Z

Is it possible to investigate the problems of ctranslate2 in more detail? The library is one of the fastest and supports token streaming. Unfortunately, no token streaming is possible with beam search, and the performance there is quite bad :/

Is there any way to run the interview locally?

P.S. In the README, cformers2 should be ctranslate2.

SebastianBodza · 2023-08-17T09:04:13Z

Shouldn't the parameter for the beam size in def generate(self, prompt, params) be beam_size instead of num_hypotheses? According to the following:
https://opennmt.net/CTranslate2/python/ctranslate2.Generator.html#ctranslate2.Generator.generate_batch

the-crypt-keeper · 2023-08-20T21:09:39Z

@SebastianBodza I really need to do a better job in the README, yes you can run everything locally.

Create the prompts with prepare.py --template prompts/Wizard-Coder.txt, which should print: Expanded 28 Wizard-Coder prompts to results/prepare_junior-v2_python-javascript_Wizard-Coder.ndjson

Now you can run the ctranslate2 (why does my brain refuse to remember this correctly ugh) interview with:

./interview_cuda.py --runtime ctranslate2 --model_name michaelfeil/ct2fast-WizardCoder-15B-V1.0 --params params/wizardcoder.json --input results/prepare_junior-v2_python-javascript_Wizard-Coder.ndjson

This will download the model from HF if it's not already cached. My initial observation when implementing this runtime in #62 was that if you try params/precise.json instead of params/wizardcoder.json, the results are very different from what every other runtime produces with those settings (and not very good).

As to your second point: that's an interesting thought. There should be two parameters to beam searching, one for the number of beams to consider and another for the size or length of those beams. When I first went through the docs I left with the impression that the beam_size parameter is the beam length, while num_hypotheses is the number of beams, but now I'm not so sure and it's possible I got them backwards. Like you mentioned, beam searching is slow (because each beam is effectively an inference stream), so I tend to stick to simpler sampling for evaluations just because of resource constraints.
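
For reference, a minimal sketch of keeping the two options apart when building keyword arguments for Generator.generate_batch. It assumes the reading of the linked docs in which beam_size is the number of beams kept during search (1 means no beam search) and num_hypotheses is how many finished hypotheses are returned per prompt. The helper name and the input keys (num_beams, top_k, top_p, temperature, max_new_tokens) are hypothetical and are not the names used by this repo's params files.

```python
def build_generate_kwargs(params: dict) -> dict:
    """Translate generic sampling params into generate_batch keyword names.

    The input key names are hypothetical placeholders for illustration.
    """
    beam_size = int(params.get("num_beams", 1))
    kwargs = {
        "beam_size": beam_size,  # beams kept during search; 1 = no beam search
        "num_hypotheses": 1,     # finished hypotheses returned per prompt
        "max_length": int(params.get("max_new_tokens", 512)),
    }
    if beam_size == 1:
        # This sketch only passes sampling options when beam search is off.
        kwargs["sampling_topk"] = int(params.get("top_k", 1))
        kwargs["sampling_topp"] = float(params.get("top_p", 1.0))
        kwargs["sampling_temperature"] = float(params.get("temperature", 1.0))
    return kwargs


if __name__ == "__main__":
    print(build_generate_kwargs({"num_beams": 4}))
    print(build_generate_kwargs({"top_k": 40, "top_p": 0.95, "temperature": 0.7}))
```

The resulting dict could then be splatted into the call, e.g. generator.generate_batch(tokens, **kwargs), so that a mix-up between the two names shows up in one place.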

SebastianBodza · 2023-08-21T11:23:06Z

Thanks for the clarification! I ran some tests locally.
I guess it is rather related to the repetition penalty. Without repetition_penalty and repeat_last_n I get:

Python Passed 85 of 91
JavaScript Passed 75 of 91

However it seems to also be a bit unstable. Another run with the same settings:

Python Passed 88 of 91
JavaScript Passed 82 of 91
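
To put the run-to-run difference in numbers, here is a small sketch that turns the two result sets above into pass rates. It contains no new data, only arithmetic on the four counts quoted above; the function names are made up for illustration.

```python
# (passed, total) per run, copied from the two runs reported above.
RUNS = {
    "Python": [(85, 91), (88, 91)],
    "JavaScript": [(75, 91), (82, 91)],
}


def pass_rates(runs: dict) -> dict:
    """Pass rate in percent (one decimal) for every run of every language."""
    return {
        lang: [round(100 * passed / total, 1) for passed, total in results]
        for lang, results in runs.items()
    }


def spread(runs: dict) -> dict:
    """Difference in passed tests between the best and worst run."""
    return {
        lang: max(p for p, _ in results) - min(p for p, _ in results)
        for lang, results in runs.items()
    }


if __name__ == "__main__":
    print(pass_rates(RUNS))
    print(spread(RUNS))
```

With only two runs this says little about the real variance; more repetitions with identical settings would be needed to judge how unstable the runtime is.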

For the beam_size, I think you are right: num_hypotheses should be correct.

the-crypt-keeper · 2023-08-21T13:11:12Z

@SebastianBodza Yes, something seems to be wrong with the implementation of repeat penalty in this runtime, but I haven't yet dug into the code to see what's up. This isn't normally a complex operation.

If you want to try it on something with repeat penalty that should be otherwise stable, that's the goal of params/greedy.json.

the-crypt-keeper · 2023-08-27T14:17:07Z

I've implemented batching and basic stop-seq support for this runtime, but batching seems to only make the instability problems here worse :/

I wonder if upstream issue #1425 is related and we have some unstable sort related issues happening here..

guillaumekln · 2023-08-31T16:29:20Z

Hi,

The issue related to the callback in batch mode should be fixed in ctranslate2>=3.19.0. The returned batch_ids were mixed up.

However, I'm not sure what the issue with repetition penalty is. For now I suggest forcing this value to 1 for CTranslate2 if this value works for you. In general, repetition penalty should not be needed when using a random sampler.
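
A minimal sketch of how both suggestions could be applied on the caller side: checking that the installed version is at least 3.19.0, and forcing the repetition penalty to 1 before the params reach the runtime. The helper names and the repetition_penalty key in the params dict are assumptions for illustration, not code from this repo or from CTranslate2.

```python
def parse_version(text: str) -> tuple:
    """Turn '3.19.0' into (3, 19, 0). Non-numeric suffixes are not handled."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_batch_callback_fix(installed: str, minimum: str = "3.19.0") -> bool:
    """True if the installed version is at or above the version named above."""
    return parse_version(installed) >= parse_version(minimum)


def neutralize_repetition_penalty(params: dict) -> dict:
    """Return a copy of params with the repetition penalty forced to 1."""
    cleaned = dict(params)  # leave the caller's dict untouched
    cleaned["repetition_penalty"] = 1.0
    return cleaned


if __name__ == "__main__":
    print(has_batch_callback_fix("3.18.0"))
    print(neutralize_repetition_penalty({"repetition_penalty": 1.2, "top_k": 40}))
```

The installed version string could be read with importlib.metadata.version("ctranslate2") when the package is present.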

the-crypt-keeper · 2023-12-09T15:37:59Z

@guillaumekln I am having trouble with this runtime after upgrading my container to CUDA 12.1. It complains: RuntimeError: Library libcublas.so.11 is not found or cannot be loaded

Does ct2 only support CUDA 11 at this time?
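
A quick way to see which cuBLAS sonames the dynamic loader can find inside the container is sketched below. It does not answer which CUDA versions ct2 supports; it only reports what is visible to the process. The helper name is made up, and libcublas.so.12 is included on the assumption that it is the soname shipped with CUDA 12.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for name in ("libcublas.so.11", "libcublas.so.12"):
        status = "loadable" if can_load(name) else "not found"
        print(f"{name}: {status}")
```

If only the .12 soname loads, the error above is consistent with a build linked against CUDA 11 libraries running in a CUDA 12 only image.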

the-crypt-keeper pushed a commit that referenced this issue Aug 20, 2023

#75 ctranslate2, not cformers2. fix interviewers table.

b606d7e

SebastianBodza mentioned this issue Aug 29, 2023

nucleus sampler problem? OpenNMT/CTranslate2#1424

Open

the-crypt-keeper added the stuck Issue is stuck on something label Dec 31, 2023
