To simulate cache-aware streaming, you may use the script at ``<NeMo_git_root>/examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py``. It can simulate streaming in single stream or multi-stream mode (in batches) for an ASR model.
Does the script only "simulate" cache-aware streaming, or does it actually perform it? Models trained natively with cache-aware streaming are available, e.g. here. Does calling functions such as conformer_stream_step() repeatedly, as is done in the notebook here, actually perform each streaming step with the appropriate optimizations? Or does it merely produce the same output as cache-aware streaming without the optimization, e.g. by still feeding large batches of context into the model and discarding the redundant results?
See these sources:
NeMo/docs/source/asr/models.rst
Line 236 in cda2a63
NeMo/nemo/collections/asr/parts/mixins/mixins.py
Line 590 in cda2a63
NeMo/nemo/collections/asr/parts/mixins/mixins.py
Line 714 in cda2a63
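To make the distinction I'm asking about concrete, here is a toy sketch in plain Python. It is not NeMo code and does not use any NeMo API: the "layer" is just a causal moving sum, and every name in it (`stream_step`, `recompute_streaming`, and so on) is hypothetical. One version re-feeds all past samples for every chunk and throws away the old outputs, the other keeps a small cache of left context and only processes the new chunk. Both match the offline result, but the cached one touches far fewer samples.

```python
# Toy illustration only: a causal moving sum stands in for a streaming layer.
# None of these functions exist in NeMo; all names are hypothetical.

def offline(signal, kernel_size):
    """Reference output: each step sees itself plus kernel_size-1 past samples."""
    return [sum(signal[max(0, t - kernel_size + 1):t + 1])
            for t in range(len(signal))]


def recompute_streaming(signal, kernel_size, chunk_size):
    """Unoptimized: re-run on everything seen so far, keep only the new outputs."""
    outputs, samples_fed = [], 0
    for end in range(chunk_size, len(signal) + 1, chunk_size):
        seen = signal[:end]
        samples_fed += len(seen)
        outputs.extend(offline(seen, kernel_size)[-chunk_size:])
    return outputs, samples_fed


def stream_step(chunk, cache, kernel_size):
    """Cached step: consumes the new chunk plus kernel_size-1 cached samples."""
    buf = cache + chunk
    out = [sum(buf[i:i + kernel_size]) for i in range(len(chunk))]
    pad = kernel_size - 1
    new_cache = buf[-pad:] if pad > 0 else []
    return out, new_cache


def cached_streaming(signal, kernel_size, chunk_size):
    """Optimized: carry the cache across chunks instead of re-feeding history."""
    cache = [0] * (kernel_size - 1)  # zero left context before the first sample
    outputs, samples_fed = [], 0
    for start in range(0, len(signal), chunk_size):
        chunk = signal[start:start + chunk_size]
        samples_fed += len(cache) + len(chunk)
        out, cache = stream_step(chunk, cache, kernel_size)
        outputs.extend(out)
    return outputs, samples_fed


signal = list(range(1, 17))  # 16 samples, processed in chunks of 4
reference = offline(signal, kernel_size=3)
recomputed, recompute_cost = recompute_streaming(signal, 3, 4)
streamed, cached_cost = cached_streaming(signal, 3, 4)
```

In this toy, `recomputed` and `streamed` are identical to `reference`, so looking at outputs alone cannot tell the two approaches apart. Only the amount of input fed per step differs, and that is the part I'm trying to confirm for conformer_stream_step().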