Speech Note 4.6.0 Beta 2 #145

Closed
mkiol opened this issue Jun 22, 2024 · 6 comments

mkiol commented Jun 22, 2024

If you want to try out and test the upcoming release, Speech Note 4.6.0 Beta 2 is available in the "flathub-beta" repository. To enable "flathub-beta", follow these instructions.

Make sure to update the add-on to version 1.2.0 if you are using it.

Release Highlights:

  • Much quicker STT without GPU acceleration (WhisperCpp models)
  • OpenVINO acceleration (WhisperCpp models)
  • Automatic language detection in STT (WhisperCpp and FasterWhisper)
  • Control tags for advanced TTS processing (dynamically change speed or add silence)
  • Improved Translator UI

All changes (compared to version 4.5.0):

  • User Interface
    • Speech Note has been translated into Norwegian
    • Grouped models.
      Models that provide multiple sub-models (for example, TTS models
      that provide different voices) are shown in groups. This makes it
      easier to find models in the model browser.
  • Speech to Text
    • All Whisper models have been renamed to 'WhisperCpp'
      to better reflect the engine behind them.
    • Automatic language detection in STT.
      To automatically detect the language during STT, select one of
      the models that is in the 'Auto detected' category in
      the language list.
    • Separate settings for engines.
      The configuration of each engine has been separated in the settings.
      You can separately set the parameters for 'WhisperCpp' and
      'FasterWhisper'. The new configuration parameters that have been
      added to the settings are: 'Number of simultaneous threads',
      'Beam search width', 'Audio context size', 'Use Flash Attention'.
    • Quicker decoding with 'WhisperCpp'.
      Optimization for short sentences has been added to 'WhisperCpp'.
      With it, the speed of STT has doubled!
    • Support for OpenVINO hardware acceleration in 'WhisperCpp' engine.
      With OpenVINO, decoding on CPU is much quicker. If you are not using
      GPU acceleration, it is recommended to enable OpenVINO in
      'WhisperCpp' engine settings.
      Currently, OpenVINO is enabled only for CPU acceleration.
    • Option for inserting processing statistics.
      A new settings option allows inserting processing-related
      information, such as processing time and audio length, into the
      text after decoding. This can be useful for comparing the
      performance of different models, engines and their parameters.
  • Text to Speech
    • Control tags for advanced TTS processing.
      Control tags allow you to dynamically change the speed of
      synthesized text or add silence between sentences.
      To use control tags, insert '{speed: 0.5}' or '{silence: 1s}'
      into the text. For convenience, you can also insert
      predefined control tags using the text context menu 'Insert control tag'.
    • Welsh language. The new language is enabled with a 'Piper' voice.
    • New 'Piper' voices for Spanish, Italian and English
    • New 'RHVoice' voice for Slovak
  • Translator
    • Improved Translator UI.
      The 'Translate', 'Switch languages' and 'Add' buttons have been
      placed between the text areas, which is more convenient.
    • New models: English to Lithuanian, Croatian to English,
      Latvian to English, Danish to English
    • Updated models: Lithuanian to English, Slovenian to English
  • Flatpak
    • New library: OpenVINO version 2024.1.0.15008
    • whisper.cpp update to version 1.6.2
    • CTranslate2 update to version 4.3.1
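
To make the control tag syntax from the Text to Speech notes above more concrete, here is a minimal Python sketch of how a text containing '{speed: 0.5}' and '{silence: 1s}' could be split into steps. This is not Speech Note's actual implementation. The helper `split_control_tags`, the `Speak` and `Silence` types, and the assumption that a speed tag applies to all text that follows it are hypothetical and for illustration only.

```python
import re
from dataclasses import dataclass

# Matches the two tag forms quoted in the release notes,
# e.g. {speed: 0.5} or {silence: 1s}. The exact grammar accepted by
# Speech Note is not documented here, so this pattern is an assumption.
TAG_RE = re.compile(r"\{\s*(speed|silence)\s*:\s*([0-9]*\.?[0-9]+)\s*s?\s*\}")


@dataclass
class Speak:
    text: str
    speed: float


@dataclass
class Silence:
    seconds: float


def split_control_tags(text, default_speed=1.0):
    """Hypothetical helper: split text into Speak and Silence steps."""
    steps = []
    speed = default_speed
    pos = 0
    for match in TAG_RE.finditer(text):
        # Text before the tag is spoken at the speed currently in effect.
        chunk = text[pos:match.start()].strip()
        if chunk:
            steps.append(Speak(chunk, speed))
        name, value = match.group(1), float(match.group(2))
        if name == "speed":
            # Assumption: a speed tag affects everything after it.
            speed = value
        else:
            steps.append(Silence(value))
        pos = match.end()
    # Remaining text after the last tag.
    tail = text[pos:].strip()
    if tail:
        steps.append(Speak(tail, speed))
    return steps


if __name__ == "__main__":
    sample = "Hello there. {silence: 1s} {speed: 0.5} This part is read slowly."
    for step in split_control_tags(sample):
        print(step)
```

In the application itself, the tags are simply typed into the text or inserted via the 'Insert control tag' context menu. The sketch only illustrates how such tags divide a text into spoken and silent parts.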
@mkiol mkiol pinned this issue Jun 22, 2024

gbodley commented Jun 22, 2024

.mp3 is not supported, which seems ridiculous for a program whose main feature is to export .mp3. How do we submit examples of what we are talking about?

_We don’t support that file type.

Try again with GIF, JPEG, JPG, MOV, MP4, PNG, SVG, WEBM, CPUPROFILE, CSV, DMP, DOCX, FODG, FODP, FODS, FODT, GZ, JSON, JSONC, LOG, MD, ODF, ODG, ODP, ODS, ODT, PATCH, PDF, PPTX, TGZ, TXT, XLS, XLSX or ZIP._

@mkiol mkiol changed the title Speech Note 4.6.0 Beta 1 Speech Note 4.6.0 Beta 2 Jul 22, 2024

mkiol commented Aug 3, 2024

@gbodley

.mp3 not supported

Sorry for the very late reply. MP3 format is supported for both import and export. If that doesn't work, could you create a separate "issue" for this problem? Thanks.

mkiol commented Aug 3, 2024

Release 4.6.0 is out, so closing.

@mkiol mkiol closed this as completed Aug 3, 2024
@mkiol mkiol unpinned this issue Aug 3, 2024

gbodley commented Aug 3, 2024 via email

mkiol commented Aug 3, 2024

I was talking about the github site, not the program

Ha ha, now that makes sense :) I was really worried that a key feature was not working.

gbodley commented Aug 3, 2024 via email
