
Evaluation on pretrained model #1

Open
nikifori opened this issue Sep 4, 2024 · 1 comment
nikifori commented Sep 4, 2024

Hi,

While trying to run the evaluation on the pretrained model (https://github.com/fgnt/tssep_data/blob/master/egs/libri_css/README.md#steps-to-evaluate-a-pretrained-model), I got this error from the `make tssep_pretrained_eval` command:

FileNotFoundError: [Errno 2] No such file or directory: '~/tssep_data/egs/libri_css/data/ivector/simLibriCSS_oracle_ivectors.json'

I have not changed anything in the config files, nor in `tssep_pretrained_77_62000.yaml`, which is the base. However, I think `simLibriCSS_oracle_ivectors.json` is unnecessary for evaluation, since only the produced i-vectors (`libriCSS_espnet_ivectors.json`) are needed, and `feature_statistics.pkl` for the domain adaptation is downloaded successfully.

Update (4/9/2024)

Just a quick update.

I managed to overcome the previous error by commenting out the
following lines in the config.yaml:

  • line 113: - ~/testing_evaluation_tssep_data/tssep_data/egs/libri_css/data/ivector/simLibriCSS_oracle_ivectors.json
  • line 124: SimLibriCSS-dev: true

and by changing:

  • aux_feature_statistics_domain_adaptation: null (from mean_std to null)
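Taken together, the edits above amount to something like the following in the generated config.yaml (a sketch based only on the lines quoted above; the surrounding keys and exact indentation are assumptions):

```yaml
# line 113: oracle i-vector json commented out
# - ~/testing_evaluation_tssep_data/tssep_data/egs/libri_css/data/ivector/simLibriCSS_oracle_ivectors.json

# line 124: SimLibriCSS dev set disabled
# SimLibriCSS-dev: true

aux_feature_statistics_domain_adaptation: null  # was: mean_std
```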

The problem now is that I get a CUDA out-of-memory error:

Run eval: ~/testing_evaluation_tssep_data/tssep_data/egs/libri_css/tssep_pretrained/eval/62000/1
device: 0
Load feature statistics from cache: ~/testing_evaluation_tssep_data/tssep_data/egs/libri_css/tssep_pretrained/eval/62000/1/cache/feature_statistics.pkl
Use prefetch with threads for dataloading
  0%|                                                                                                                                                       | 0/60 [00:01<?, ?it/s]
ERROR - extract_eval - Failed after 0:00:04!
Traceback (most recent calls WITHOUT Sacred internals):
  File "~/testing_evaluation_tssep_data/tssep_data/tssep_data/eval/run.py", line 246, in main
    eeg.eval(eg=eg)
  File "~/testing_evaluation_tssep_data/tssep_data/tssep_data/eval/experiment.py", line 825, in eval
    self.work(
  File "~/testing_evaluation_tssep_data/tssep_data/tssep_data/eval/experiment.py", line 382, in work
    ex['Observation'] = self.wpe(ex['Observation'])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/testing_evaluation_tssep_data/tssep/tssep/train/enhancer.py", line 336, in __call__
    nara_wpe.torch_wpe.wpe_v6(
  File "~/miniconda3/envs/ivec_train_check/lib/python3.11/site-packages/nara_wpe/torch_wpe.py", line 222, in wpe_v6
    Y_tilde_inverse_power = Y_tilde * inverse_power[..., None, :]
                            ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.60 GiB. GPU 0 has a total capacty of 22.02 GiB of which 14.05 GiB is free. Including non-PyTorch memory, this process has 7.97 GiB memory in use. Of the allocated memory 6.90 GiB is allocated by PyTorch, and 14.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

make: *** [Makefile:13: run] Error 1

But there isn't any argument for the eval batch size, or anything relevant to tweak.
Do you have any thoughts on that?
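As the OOM message itself suggests, one thing to try before re-running is setting `PYTORCH_CUDA_ALLOC_CONF` to limit allocator fragmentation (a sketch; the 512 MiB split size is a guess, and since the failed allocation of 20.6 GiB exceeds the 14.05 GiB free, this alone may not be enough on a 22 GiB card):

```shell
# Ask PyTorch's caching allocator to split large blocks,
# reducing fragmentation; 512 MiB is an assumed value.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

# then re-run the evaluation:
# make tssep_pretrained_eval
```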

Thanks

@boeddeker
Copy link
Member

boeddeker commented Sep 12, 2024

Thanks Konstantinos for reporting this, and sorry for the trouble.
We talked via email, so here is a summary:

  • nara_wpe got an update with a memory-efficient torch WPE implementation, so a 20 GB GPU should now be enough for inference.
  • I forgot to upload and adjust the code for the domain adaptation of the i-vectors. The current master branch of tssep_data now has the changes to do the domain adaptation without access to the training data.
  • Disabling the aux_feature_statistics_domain_adaptation has a large impact on the WER (6% -> >70%), so it is not recommended to do that.
  • If you only have access to a small GPU, you can change the nn_segmenter parameters in the config to use fewer frames. Since the eval minibatch size is already 1, it cannot be reduced.
    • An alternative is to use the CPU for inference, since it usually has more memory than the GPU.
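The CPU fallback mentioned above can usually be forced by hiding the GPUs from PyTorch via `CUDA_VISIBLE_DEVICES` (a sketch, assuming the eval code falls back to the CPU when no GPU is visible; whether the script also exposes an explicit device option is not shown here):

```shell
# Hide all CUDA devices so torch.cuda.is_available() returns False
# and inference runs on the CPU.
export CUDA_VISIBLE_DEVICES=""

# then re-run the evaluation:
# make tssep_pretrained_eval
```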

Update (Sep. 23):

  • The code in the tssep repo is now updated and uses the memory-efficient WPE implementation.
