Speech Note 4.6.0 Beta 2 #145

mkiol · 2024-06-22T12:47:28Z

If you want to try out and test the upcoming release, Speech Note 4.6.0 Beta 2 is available in the "flathub-beta" repository. To enable "flathub-beta", follow these instructions.

Make sure to update the add-on to version 1.2.0 if you are using it.

Release Highlights:

Much quicker STT without GPU acceleration (WhisperCpp models)
OpenVINO acceleration (WhisperCpp models)
Automatic language detection in STT (WhisperCpp and FasterWhisper)
Control tags for advanced TTS processing (dynamically change speed or add silence)
Improved Translator UI

All changes (compared to version 4.5.0):

User Interface
- Speech Note has been translated into Norwegian
- Grouped models.
  Models that provide multiple sub-models (for example, TTS models
  that provide different voices) are shown in groups. This makes it
  easier to find models in the model browser.
Speech to Text
- All Whisper models have been renamed to 'WhisperCpp' to better
  reflect the engine behind them.
- Automatic language detection in STT.
  To automatically detect the language during STT, select one of
  the models that is in the 'Auto detected' category in
  the language list.
- Separate settings for engines.
  The configuration of each engine has been separated in the settings.
  You can separately set the parameters for 'WhisperCpp' and
  'FasterWhisper'. The new configuration parameters that have been
  added to the settings are: 'Number of simultaneous threads',
  'Beam search width', 'Audio context size', 'Use Flash Attention'.
- Quicker decoding with 'WhisperCpp'.
  Optimization for short sentences has been added to 'WhisperCpp'.
  With it, the speed of STT has doubled!
- Support for OpenVINO hardware acceleration in 'WhisperCpp' engine.
  With OpenVINO, decoding on the CPU is much quicker. If you are not using
  GPU acceleration, it is recommended to enable OpenVINO in
  'WhisperCpp' engine settings.
  Currently, OpenVINO is enabled only for CPU acceleration.
- Option for inserting processing statistics.
  A new settings option inserts processing-related information,
  such as processing time and audio length, into the text after
  decoding. This can be useful for comparing the
  performance of different models, engines and their parameters.
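
One common way to compare such statistics is the real-time factor (processing time divided by audio length; lower is faster). The sketch below is only an illustration: the helper name `real_time_factor` and the sample figures are placeholders, not values or a format produced by Speech Note.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio length (lower means faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    return processing_seconds / audio_seconds


# Placeholder figures for illustration only: (processing time, audio length) in seconds.
runs = {
    "config-1": (4.0, 20.0),
    "config-2": (10.0, 20.0),
}

# Order the configurations from fastest to slowest.
ranked = sorted(runs, key=lambda name: real_time_factor(*runs[name]))

for name in ranked:
    print(f"{name}: RTF = {real_time_factor(*runs[name]):.2f}")
```

A value below 1.0 means the audio was decoded faster than its own duration.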
Text to Speech
- Control tags for advanced TTS processing.
  Control tags allow you to dynamically change the speed of
  synthesized text or add silence between sentences.
  To use control tags, insert '{speed: 0.5}' or '{silence: 1s}'
  into the text. For convenience, you can also insert predefined
  control tags using the 'Insert control tag' item in the text context menu.
- Welsh language. The new language is enabled with a 'Piper' voice.
- New 'Piper' voices for Spanish, Italian and English
- New 'RHVoice' voice for Slovak
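
To make the control-tag syntax above concrete, here is a minimal Python sketch that splits text into segments based on '{speed: ...}' and '{silence: ...}' tags. It is not Speech Note's actual parser: the names `parse_control_tags` and `Segment`, the regular expression, and the handling of only an "s" (seconds) suffix are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed tag shape: {name: value}, e.g. {speed: 0.5} or {silence: 1s}.
TAG_RE = re.compile(r"\{\s*(speed|silence)\s*:\s*([^{}]+?)\s*\}")


@dataclass
class Segment:
    text: str              # text to synthesize
    speed: float           # speed in effect for this text
    silence_before: float  # seconds of silence to insert before it


def parse_control_tags(text: str, default_speed: float = 1.0) -> list[Segment]:
    """Split text at control tags and track the speed and pending silence."""
    segments = []
    speed = default_speed
    pending_silence = 0.0
    pos = 0
    for match in TAG_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append(Segment(chunk, speed, pending_silence))
            pending_silence = 0.0
        name, value = match.group(1), match.group(2)
        if name == "speed":
            speed = float(value)
        else:
            # Assumption: the value is given in seconds with an "s" suffix.
            pending_silence += float(value.rstrip("s"))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(Segment(tail, speed, pending_silence))
    return segments


segments = parse_control_tags(
    "Hello. {speed: 0.5} Slow part. {silence: 1s} After pause."
)
for segment in segments:
    print(segment)
```

In this sketch a speed tag stays in effect until the next speed tag, and a silence tag with no text after it is dropped; the real application may behave differently.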
Translator
- Improved Translator UI.
  The 'Translate', 'Switch languages' and 'Add' buttons have been
  placed between the text areas, which is more convenient.
- New models: English to Lithuanian, Croatian to English,
  Latvian to English, Danish to English
- Updated models: Lithuanian to English, Slovenian to English
Flatpak
- New library: OpenVINO version 2024.1.0.15008
- whisper.cpp update to version 1.6.2
- CTranslate2 update to version 4.3.1

gbodley · 2024-06-22T15:22:41Z

.mp3 is not supported, which seems ridiculous for a program whose main feature is to export .mp3. How do we submit examples of what we are talking about?

_We don’t support that file type.

Try again with GIF, JPEG, JPG, MOV, MP4, PNG, SVG, WEBM, CPUPROFILE, CSV, DMP, DOCX, FODG, FODP, FODS, FODT, GZ, JSON, JSONC, LOG, MD, ODF, ODG, ODP, ODS, ODT, PATCH, PDF, PPTX, TGZ, TXT, XLS, XLSX or ZIP._

mkiol · 2024-08-03T13:14:00Z

@gbodley

.mp3 not supported

Sorry for the very late reply. The MP3 format is supported for both import and export. If that doesn't work, could you create a separate issue for this problem? Thanks.

mkiol · 2024-08-03T13:14:38Z

Release 4.6.0 is out, so closing.

gbodley · 2024-08-03T15:35:34Z

I was talking about the GitHub site, not the program.

mkiol · 2024-08-03T15:40:19Z

I was talking about the github site, not the program

Ha ha, now that makes sense :) I was really worried that a key feature was not working.

gbodley · 2024-08-03T16:59:48Z

No. I wanted to post an example of the program's speech output, but for some unknown reason GitHub does not allow mp3 files to be uploaded.

mkiol pinned this issue Jun 22, 2024

mkiol changed the title ~~Speech Note 4.6.0 Beta 1~~ Speech Note 4.6.0 Beta 2 Jul 22, 2024

mkiol mentioned this issue Jul 22, 2024

UX request: provide a button to swap languages (for translator) #143

Closed

mkiol closed this as completed Aug 3, 2024

mkiol unpinned this issue Aug 3, 2024
