test stt #5634

Open · wants to merge 14 commits into main

Conversation

@DDMeaqua (Contributor) commented on Oct 11, 2024

💻 变更类型 | Change Type

  • feat
  • fix
  • refactor
  • perf
  • style
  • test
  • docs
  • ci
  • chore
  • build

🔀 变更说明 | Description of Change

📝 补充信息 | Additional Information

Summary by CodeRabbit

  • New Features

    • Introduced voice recognition capabilities for the chat application, allowing users to input text via voice.
    • Added a button for toggling voice input, reflecting the current listening state.
    • Enabled text-to-speech (TTS) and speech-to-text (STT) functionalities by default upon application startup.
    • Enhanced settings management with a new speech-to-text configuration list in both English and Chinese locales.
    • Added support for selecting the speech-to-text engine and language in the settings.
    • Improved visual feedback for the chat input actions during the listening state.
  • Bug Fixes

    • Refined event handling for keyboard shortcuts to ensure correct actions are triggered.
  • Chores

    • Simplified comments related to keyboard shortcuts for better clarity.

vercel bot commented on Oct 11, 2024

@DDMeaqua is attempting to deploy a commit to the NextChat Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai bot (Contributor) commented on Oct 11, 2024

Walkthrough

The changes in this pull request enhance the chat application's functionality by integrating voice recognition features. Modifications include the addition of state variables for managing speech recognition, the implementation of methods for starting and stopping voice input, and the incorporation of a new button for toggling voice input in the chat interface. Additionally, the configuration for text-to-speech (TTS) and speech-to-text (STT) functionalities has been updated, and a new API interface for transcription has been introduced.

Changes

File Change Summary
app/components/chat.tsx - Added voice recognition state variables (isListening, isTranscription, speechApi).
- Implemented startListening, stopListening, and onRecognitionEnd methods.
- Updated keyboard event handling for voice input functionality.
- Added button for toggling voice input.
app/components/chat.module.scss - Added .listening class to modify input action styles during voice recognition.
app/components/stt-config.tsx - Introduced STTConfigList component for managing STT settings.
- Added checkbox for enabling STT and dropdown for selecting STT engine and language.
app/store/config.ts - Updated ttsConfig to enable TTS by default.
- Added sttConfig with enable set to true and initialized with DEFAULT_STT_ENGINE.
- Introduced new types and a validator for STT configuration.
app/utils/speech.ts - Added SpeechApi, OpenAITranscriptionApi, and WebTranscriptionApi classes for speech recognition.
- Implemented methods for starting, stopping, and managing transcription.
app/locales/cn.ts - Added new STT section for speech-to-text functionality in the cn locale object.
- Updated StartSpeak and StopSpeak properties in the Chat section.
app/locales/en.ts - Added new STT section for speech-to-text functionality in the en locale object.
- Updated StopSpeak property in the Chat section.
app/client/platforms/openai.ts - Introduced transcription method in ChatGPTApi for audio transcription functionality.
- Updated path method to accept an optional model parameter.
app/components/settings.tsx - Added import for STTConfigList and included it in the Settings component for managing STT configurations.
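
For orientation, below is a minimal sketch of what a Web-Speech-based helper such as WebTranscriptionApi in app/utils/speech.ts might look like. The class name and callback shape follow the walkthrough above; the body is an assumption rather than the PR's actual code, and it relies on the (webkit-prefixed) SpeechRecognition browser API, which Firefox does not provide.

// Sketch only: a browser-based transcription helper built on SpeechRecognition.
export class WebTranscriptionApi {
  private recognition: any;
  private finalTranscript = "";

  constructor(onTranscriptionEnd: (text: string) => void) {
    const SpeechRecognition =
      (window as any).SpeechRecognition ||
      (window as any).webkitSpeechRecognition;
    if (!SpeechRecognition) return; // e.g. Firefox: caller should fall back to a server-side engine
    this.recognition = new SpeechRecognition();
    this.recognition.continuous = true;
    this.recognition.interimResults = true;
    this.recognition.onresult = (event: any) => {
      let transcript = "";
      for (let i = 0; i < event.results.length; i++) {
        transcript += event.results[i][0].transcript;
      }
      this.finalTranscript = transcript;
    };
    // Hand the accumulated text back once recognition stops.
    this.recognition.onend = () => onTranscriptionEnd(this.finalTranscript);
  }

  start() {
    this.recognition?.start();
  }

  stop() {
    this.recognition?.stop();
  }
}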

Possibly related PRs

Suggested labels

enhancement, planned

Suggested reviewers

  • Dogtiti
  • lloydzhou

Poem

In the chat where voices sing,
Bunnies hop and joy they bring.
With every word, a tale unfolds,
In whispers sweet, our chat beholds.
So let us speak and laugh with glee,
In our cozy chat, just you and me! 🐇🎤



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
app/components/chat.tsx (2)

1565-1565: Use consistent language in code comments

The comment on line 1565 is in Chinese (// 快捷键, i.e. "shortcut keys"), whereas the rest of the code comments are in English. For consistency and easier collaboration, it's recommended to use a single language for all code comments.


Line range hint 1566-1597: Prevent multiple event listener registrations

The useEffect hook adds an event listener to window and removes it on cleanup. However, since the dependencies array includes messages, chatStore, and navigate, the effect will re-run and re-register the event listener whenever any of these dependencies change. This can lead to multiple event listeners being attached, causing unexpected behavior or memory leaks.

To fix this, consider adjusting the dependencies array to an empty array [] if the handleKeyDown function does not rely on changing variables. Alternatively, you can memoize the handleKeyDown function using useCallback with the necessary dependencies.

Apply this diff to adjust the dependencies:

-  }, [messages, chatStore, navigate]);
+  }, []);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c5074f0 and ec28338.

📒 Files selected for processing (1)
  • app/components/chat.tsx (1 hunks)
🧰 Additional context used


Your build has completed!

Preview deployment

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
app/client/api.ts (1)

66-74: LGTM! Consider adding a range for the temperature property.

The TranscriptionOptions interface is well-defined and covers essential options for transcription. The use of specific types for model and response_format enhances type safety.

Consider adding a more specific range for the temperature property, as it typically has a limited range (e.g., 0 to 1). You could use a type like this:

temperature?: number & { __brand: 'Temperature' };

And then define a type guard function:

function isValidTemperature(value: number): value is number & { __brand: 'Temperature' } {
  return value >= 0 && value <= 1;
}

This approach would provide better type safety and documentation for the expected range of the temperature property.

app/constant.ts (2)

Line range hint 268-273: LGTM: Comprehensive TTS defaults added

The new constants for Text-to-Speech (TTS) provide a robust set of default settings, covering engines, models, and voices. This addition enhances the flexibility and configurability of the TTS feature.

Consider adding a brief comment explaining the purpose of these constants and their relationship to the TTS feature for improved code documentation.


274-276: LGTM: STT defaults added with good flexibility

The new constants for Speech-to-Text (STT) provide a good set of default settings, including multiple engine options. This addition enhances the flexibility of the STT feature.

Consider adding a brief comment explaining the purpose of these constants and their relationship to the STT feature for improved code documentation.
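
As an illustration of such comments, the STT defaults might be documented roughly as follows; the engine names and literal values below are assumptions, not the actual contents of app/constant.ts.

// Speech-to-text defaults consumed by app/store/config.ts and app/components/chat.tsx.
// NOTE: the literal values here are illustrative only.
export const DEFAULT_STT_ENGINES = ["WebAPI", "OpenAI Whisper"] as const;

// Engine used when the user has not picked one in Settings.
export const DEFAULT_STT_ENGINE = "WebAPI";

// Firefox lacks the SpeechRecognition Web API, so a server-side engine is preferred there.
export const FIREFOX_DEFAULT_STT_ENGINE = "OpenAI Whisper";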

app/components/settings.tsx (1)

86-86: LGTM! Consider grouping related configurations.

The addition of the STTConfigList component and its integration into the Settings component looks good. It follows the existing pattern used for TTSConfigList, which maintains consistency in the codebase.

To improve code organization, consider grouping related configurations together. You could move the STTConfigList next to the TTSConfigList, as they are both related to voice interactions. This would make it easier for developers to find and maintain related features.

 <List>
   <TTSConfigList
     ttsConfig={config.ttsConfig}
     updateConfig={(updater) => {
       const ttsConfig = { ...config.ttsConfig };
       updater(ttsConfig);
       config.update((config) => (config.ttsConfig = ttsConfig));
     }}
   />
+  <STTConfigList
+    sttConfig={config.sttConfig}
+    updateConfig={(updater) => {
+      const sttConfig = { ...config.sttConfig };
+      updater(sttConfig);
+      config.update((config) => (config.sttConfig = sttConfig));
+    }}
+  />
 </List>

- <List>
-  <STTConfigList
-    sttConfig={config.sttConfig}
-    updateConfig={(updater) => {
-      const sttConfig = { ...config.sttConfig };
-      updater(sttConfig);
-      config.update((config) => (config.sttConfig = sttConfig));
-    }}
-  />
- </List>

Also applies to: 1707-1716

app/components/chat.tsx (1)

553-589: Speech recognition functionality implemented.

The addition of speech recognition functionality is well-implemented, with proper state management and error handling. The code considers different STT engines and handles them appropriately.

However, there's a minor improvement to be made:

Consider removing the console.log statement on line 585 before deploying to production:

-    console.log(finalTranscript);
app/client/platforms/openai.ts (1)

185-193: Enhance readability of conditional formData.append statements

The conditional formData.append statements could be formatted for better readability and consistency.

Consider using braces and proper indentation:

if (options.language) {
  formData.append("language", options.language);
}
if (options.prompt) {
  formData.append("prompt", options.prompt);
}
if (options.response_format) {
  formData.append("response_format", options.response_format);
}
if (options.temperature) {
  formData.append("temperature", options.temperature.toString());
}

Alternatively, streamline the appending of optional parameters:

+const optionalParams = ['language', 'prompt', 'response_format', 'temperature'];
+optionalParams.forEach((param) => {
+  if (options[param]) {
+    formData.append(param, options[param].toString());
+  }
+});
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 76faff0 and 8ec4df4.

📒 Files selected for processing (6)
  • app/client/api.ts (2 hunks)
  • app/client/platforms/openai.ts (1 hunks)
  • app/components/chat.tsx (7 hunks)
  • app/components/settings.tsx (2 hunks)
  • app/constant.ts (2 hunks)
  • app/store/config.ts (4 hunks)
🧰 Additional context used
🔇 Additional comments (16)
app/client/api.ts (2)

111-111: LGTM! The transcription method is a good addition.

The new transcription method in the LLMApi abstract class is well-defined and consistent with the TranscriptionOptions interface. The use of Promise<string> as the return type is appropriate for an asynchronous transcription operation.

This addition enhances the API's capabilities and provides a clear contract for implementing transcription functionality across different LLM providers.


Line range hint 66-111: Summary: Transcription feature addition is well-implemented

The changes to app/client/api.ts successfully introduce transcription capabilities to the API. The new TranscriptionOptions interface and the transcription method in the LLMApi abstract class are well-structured and consistent with the existing codebase. These additions follow good TypeScript practices and enhance the API's functionality.

The implementation provides a solid foundation for integrating transcription features across different LLM providers. Great job on this feature addition!

app/constant.ts (3)

153-153: LGTM: Addition of TranscriptionPath

The new TranscriptionPath constant in the OpenaiPath object is correctly defined and aligns with OpenAI's API structure for audio transcription endpoints. This addition supports the integration of speech-to-text functionality.


Line range hint 153-276: Summary: Good additions for speech-related features, minor improvements suggested

The changes to app/constant.ts successfully introduce new constants for Text-to-Speech (TTS) and Speech-to-Text (STT) functionality. These additions provide good flexibility and configuration options for speech-related features. Consider adding brief comments to explain the purpose of these new constants for improved code documentation. Additionally, please provide clarification on the need for a Firefox-specific STT engine default.


276-276: Please clarify the need for a Firefox-specific STT engine

The addition of FIREFOX_DEFAULT_STT_ENGINE suggests a different default Speech-to-Text engine for Firefox browsers. Could you provide more context on why Firefox requires a different default engine? This information would be valuable for understanding the rationale behind this browser-specific configuration.

To verify the necessity of this Firefox-specific setting, we can check for any Firefox-related conditionals in the codebase:

✅ Verification successful

FIREFOX_DEFAULT_STT_ENGINE confirmed

The addition of FIREFOX_DEFAULT_STT_ENGINE is justified by the Firefox-specific conditional found in the codebase:

  • app/utils/speech.ts: if (isFirefox()) return;
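
For context, a small sketch of how that browser check can feed into engine selection; the constant names come from this review, while the helper, its import paths, and the wiring are assumptions rather than the PR's exact code.

import { isFirefox } from "../utils";
import { DEFAULT_STT_ENGINE, FIREFOX_DEFAULT_STT_ENGINE } from "../constant";

// Resolve the STT engine: honour an explicit choice, otherwise use the
// Firefox-specific default where the SpeechRecognition Web API is unavailable.
export function resolveSttEngine(configured?: string): string {
  if (configured) return configured;
  return isFirefox() ? FIREFOX_DEFAULT_STT_ENGINE : DEFAULT_STT_ENGINE;
}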
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Firefox-related conditionals
rg --type typescript --type javascript 'if.*firefox|firefox.*if' -i

Length of output: 104


Script:

#!/bin/bash
# Search for Firefox-related conditionals in .ts and .js files
rg --glob '*.ts' --glob '*.js' 'if.*firefox|firefox.*if' -i

Length of output: 110

app/components/chat.tsx (5)

13-13: New icon import added for voice functionality.

The addition of the VoiceWhiteIcon import is consistent with the existing import style and suggests the introduction of voice-related features to the chat component.


76-76: Firefox-specific utility function imported.

The addition of the isFirefox import suggests the implementation of browser-specific functionality, which is a good practice for ensuring cross-browser compatibility.


102-104: Speech-to-text related constants imported.

The addition of STT-related constant imports (DEFAULT_STT_ENGINE, DEFAULT_TTS_ENGINE, FIREFOX_DEFAULT_STT_ENGINE) indicates the integration of speech-to-text functionality, enhancing the chat component's capabilities.


124-124: Speech transcription APIs imported.

The addition of OpenAITranscriptionApi and WebTranscriptionApi imports provides multiple options for speech-to-text functionality, allowing for flexibility in the implementation.


826-834: Voice input action added to chat interface.

A new ChatAction component for voice input has been successfully integrated into the chat interface. The implementation is conditional based on the STT configuration, ensuring it's only available when enabled. The component correctly toggles between start and stop functionality, maintaining consistency with other ChatAction components.

app/store/config.ts (6)

8-9: Importing DEFAULT_STT_ENGINE and DEFAULT_STT_ENGINES is appropriate.

The addition of DEFAULT_STT_ENGINE and DEFAULT_STT_ENGINES to the imports ensures that the necessary constants for STT functionality are available.


25-25: Definition of STTEngineType is correct.

The type STTEngineType correctly derives from the DEFAULT_STT_ENGINES array, representing valid STT engine options.


87-87: Enabling TTS by default: Confirm if this is intentional.

Changing ttsConfig.enable to true activates text-to-speech by default for all users. Please verify that this aligns with user expectations and does not inadvertently impact those who may not wish to use TTS.


94-97: Adding sttConfig with enable: true: Review default settings.

The new sttConfig enables speech-to-text by default. Consider whether users should opt-in to this feature, as it may require microphone access and could have privacy implications. Ensuring explicit user consent might enhance the user experience.
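
If opt-in behaviour is preferred, a minimal sketch of such a default is shown below; the field names follow this review, DEFAULT_STT_CONFIG is an illustrative name (in the PR the defaults live inside the chat config store), and only enable differs from the PR's value.

import { DEFAULT_STT_ENGINE, DEFAULT_STT_LANGUAGE } from "../constant";

// Opt-in variant of the STT defaults: the microphone is never requested
// until the user explicitly enables speech-to-text in Settings.
export const DEFAULT_STT_CONFIG = {
  enable: false,
  engine: DEFAULT_STT_ENGINE,
  language: DEFAULT_STT_LANGUAGE,
};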


104-104: Definition of STTConfig type is appropriate.

The STTConfig type accurately references ChatConfig["sttConfig"], ensuring type consistency.


134-138: STTConfigValidator is correctly implemented.

The validator for sttConfig properly enforces that the engine property conforms to STTEngineType, maintaining data integrity.

options.onController?.(controller);

try {
const path = this.path(OpenaiPath.TranscriptionPath, options.model);

⚠️ Potential issue

Incorrect number of arguments in this.path method call

In line 200, the this.path method is called with two arguments (OpenaiPath.TranscriptionPath, options.model), but the path method is defined to accept only one argument. This will result in a TypeError.

To fix this, update the path method signature to accept the optional model parameter:

-export class ChatGPTApi implements LLMApi {
-  // Existing code...
-  path(path: string): string {
+export class ChatGPTApi implements LLMApi {
+  // Existing code...
+  path(path: string, model?: string): string {
     // Method implementation...
   }

Ensure that you handle the model parameter within the path method as needed.

Committable suggestion was skipped due to low confidence.

Comment on lines +214 to +217
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
const json = await res.json();
return json.text;

⚠️ Potential issue

Add response status check before parsing JSON

In lines 214-217, the code attempts to parse the JSON response without verifying if the request was successful. If the request fails, res.json() might throw an error or the response might not contain the expected text property.

Consider checking res.ok before parsing the response and handle errors appropriately:

const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
+ if (!res.ok) {
+   const errorText = await res.text();
+   console.error(`[Response] Transcription request failed: ${errorText}`);
+   throw new Error(`Transcription request failed with status ${res.status}`);
+ }
const json = await res.json();
return json.text;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
const json = await res.json();
return json.text;
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
if (!res.ok) {
const errorText = await res.text();
console.error(`[Response] Transcription request failed: ${errorText}`);
throw new Error(`Transcription request failed with status ${res.status}`);
}
const json = await res.json();
return json.text;

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (15)
app/client/platforms/moonshot.ts (2)

67-69: Implement transcription method

The addition of the transcription method is consistent with the LLMApi interface. However, to improve the placeholder implementation:

  1. Add a TODO comment to remind about the implementation.
  2. Consider adding a console warning to alert developers during development.

Here's a suggested improvement:

 transcription(options: TranscriptionOptions): Promise<string> {
-    throw new Error("Method not implemented.");
+    // TODO: Implement transcription method
+    console.warn("transcription method not implemented yet");
+    return Promise.reject(new Error("Method not implemented."));
 }

Line range hint 1-70: Summary: Transcription feature addition

The changes in this file lay the groundwork for adding transcription capabilities to the MoonshotApi class. This aligns with the PR objectives of introducing a new feature and the AI summary mentioning voice recognition features.

However, the current implementation is a placeholder. To complete this feature:

  1. Implement the transcription method with the actual logic for transcribing audio.
  2. Update the speech method if it's related to this feature.
  3. Add unit tests to ensure the new functionality works as expected.
  4. Update any relevant documentation to reflect these new capabilities.

Consider how this transcription feature will integrate with the rest of the application. Will it require updates to the user interface or other parts of the codebase?

app/client/platforms/bytedance.ts (2)

87-89: Method signature looks good, but needs implementation and documentation.

The transcription method has been correctly added to the DoubaoApi class with the appropriate signature. However, it's currently not implemented.

Consider the following suggestions:

  1. Add a TODO comment explaining the intended functionality and timeline for implementation.
  2. Provide documentation for the method, including parameter description and expected return value.
  3. If possible, implement a basic version of the method or provide a more informative error message.

Would you like assistance in drafting the documentation or a basic implementation?


Issues found: the unimplemented transcription method is already referenced across the codebase.

  • app/utils/speech.ts: const transcription = await llm.transcription({ file: audioBlob });
  • app/client/api.ts: abstract transcription(options: TranscriptionOptions): Promise<string>;
  • app/client/platforms/tencent.ts: transcription(options: TranscriptionOptions): Promise<string> {
  • app/client/platforms/iflytek.ts: transcription(options: TranscriptionOptions): Promise<string> {
  • app/client/platforms/google.ts: transcription(options: TranscriptionOptions): Promise<string> {
  • app/client/platforms/baidu.ts: transcription(options: TranscriptionOptions): Promise<string> {
  • app/client/platforms/anthropic.ts: transcription(options: TranscriptionOptions): Promise<string> {
  • app/client/platforms/alibaba.ts: transcription(options: TranscriptionOptions): Promise<string> {
  • app/client/platforms/bytedance.ts: transcription(options: TranscriptionOptions): Promise<string> {
  • app/client/platforms/moonshot.ts: transcription(options: TranscriptionOptions): Promise<string> {
  • app/client/platforms/openai.ts: async transcription(options: TranscriptionOptions): Promise<string> {

It is crucial to implement the transcription method to ensure that all dependent functionalities operate correctly. Alternatively, if the implementation is pending, consider adding proper error handling or temporarily disabling the usages to prevent runtime issues.

🔗 Analysis chain

Line range hint 1-89: Overall impact: Transcription capability added but not yet implemented.

The changes introduce a new transcription method to the DoubaoApi class, expanding its capabilities. However, the method is not yet implemented.

Suggestions for next steps:

  1. Implement the transcription method.
  2. Update any relevant documentation or API references to include this new capability.
  3. Consider adding unit tests for the new method once implemented.
  4. Review other parts of the codebase that might need to be updated to utilize this new functionality.

To ensure no existing code is expecting this method to be functional, we can run the following check:

This will help identify any premature usage of the unimplemented method.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any usage of the transcription method
rg 'transcription\(' --type typescript

Length of output: 75


Script:

#!/bin/bash
# Search for any usage of the transcription method
rg 'transcription\(' --type ts

Length of output: 1109

app/client/platforms/iflytek.ts (1)

Line range hint 27-70: Consider a plan for completing the SparkApi implementation.

The SparkApi class is partially implemented, with some methods like chat being fully functional while others (speech, transcription) are not yet implemented. To ensure a complete and consistent implementation:

  1. Create a roadmap for implementing all remaining methods.
  2. Prioritize the implementation of these methods based on project requirements.
  3. Consider adding TODO comments with issue/ticket numbers for tracking purposes.
  4. Ensure that error messages for unimplemented methods are more informative, possibly indicating that the feature is not supported by this API if that's the case.
  5. Review the LLMApi interface to confirm that all required methods are present in this implementation.

Would you like assistance in creating this implementation roadmap or updating the error messages for unimplemented methods?

app/client/platforms/tencent.ts (1)

97-99: LGTM: Addition of transcription method

The transcription method is correctly added with the appropriate signature. However, it's currently not implemented.

Would you like assistance in implementing this method? We could discuss the requirements and draft an initial implementation.

app/client/platforms/baidu.ts (1)

Line range hint 1-287: Summary of changes and next steps

The changes in this file are focused on adding transcription capability to the ErnieApi class. The additions are consistent with the PR objectives and don't introduce any issues in the existing code. However, the transcription method needs to be implemented to complete this feature.

Next steps:

  1. Implement the transcription method in the ErnieApi class.
  2. Add appropriate error handling and logging to the new method.
  3. Consider adding unit tests for the new functionality.
  4. Update any relevant documentation or comments to reflect these changes.
app/client/platforms/google.ts (2)

72-74: Consider adding a TODO comment and improving the error message.

While it's common to add placeholder methods when implementing new interface requirements, it's beneficial to:

  1. Add a TODO comment to track the need for implementation.
  2. Provide a more informative error message to aid in debugging if the method is accidentally called before implementation.

Consider updating the method as follows:

 transcription(options: TranscriptionOptions): Promise<string> {
-    throw new Error("Method not implemented.");
+    // TODO: Implement transcription method
+    throw new Error("Transcription method not yet implemented in GeminiProApi");
 }

This change will help track the pending implementation and provide clearer error information if the method is called prematurely.


Line range hint 1-324: Summary and Next Steps

The changes to app/client/platforms/google.ts lay the groundwork for adding transcription functionality to the GeminiProApi class. While the implementation is not complete, the structure is in place with the new import and method stub.

To move this feature forward:

  1. Implement the transcription method with the actual logic for transcription.
  2. Add unit tests to cover the new functionality.
  3. Update the PR description with details about the new transcription feature and its intended use.
  4. Consider adding documentation for the new method, especially if it introduces new behavior or requirements for users of the GeminiProApi class.
🧰 Tools
🪛 Biome

[error] 77-77: This aliasing of this is unnecessary.

Arrow functions inherits this from their enclosing scope.
Safe fix: Use this instead of an alias.

(lint/complexity/noUselessThisAlias)

app/client/platforms/anthropic.ts (1)

Line range hint 1-424: Summary: Transcription feature initiated but incomplete.

The changes in this PR introduce the groundwork for a transcription feature in the ClaudeApi class. While the necessary import and method signature have been added, the implementation is still pending. To complete this feature:

  1. Implement the transcription method logic.
  2. Add error handling and appropriate logging.
  3. Update the API documentation to reflect the new capability.
  4. Add unit tests for the new functionality.
  5. Consider adding integration tests if this feature interacts with external services.

These steps will ensure the new feature is fully functional and maintainable.

Consider whether this transcription feature should be a separate service or module, depending on its complexity and potential for reuse across different parts of the application.

app/locales/cn.ts (1)

541-550: LGTM! Consider adding a description for the STT feature.

The new STT (Speech-to-Text) section is well-structured and consistent with the existing localization patterns. The translations are appropriate for the Chinese language.

For improved consistency with other sections, consider adding a brief description of the STT feature, similar to how the TTS section has a description at line 530.

You could add a description like this:

 STT: {
+  Description: {
+    Title: "语音转文本",
+    SubTitle: "将语音输入转换为文本",
+  },
   Enable: {
     Title: "启用语音转文本",
     SubTitle: "启用语音转文本",
   },
   Engine: {
     Title: "转换引擎",
     SubTitle: "音频转换引擎",
   },
 },
app/components/chat.tsx (4)

13-13: Speech recognition feature looks good, consider error handling.

The addition of speech recognition functionality is a great enhancement to the chat interface. The implementation handles different speech recognition engines and provides appropriate fallbacks.

A few suggestions for improvement:

  1. Consider adding error handling for cases where speech recognition fails to initialize or encounters runtime errors.
  2. It might be helpful to add user feedback (e.g., a visual indicator) when speech recognition is active.

Consider adding error handling:

const startListening = async () => {
  if (speechApi) {
    try {
      await speechApi.start();
      setIsListening(true);
    } catch (error) {
      console.error("Failed to start speech recognition:", error);
      showToast("Failed to start speech recognition. Please try again.");
    }
  }
};

Also applies to: 554-590


827-835: Speech input toggle looks good, consider adding aria-label.

The integration of the speech recognition feature into the ChatActions component is well-implemented. The conditional rendering based on the config is a good practice.

Suggestion for improvement:
Add an aria-label to the ChatAction button for better accessibility.

Consider adding an aria-label to the ChatAction:

<ChatAction
  onClick={async () =>
    isListening ? await stopListening() : await startListening()
  }
  text={isListening ? Locale.Chat.StopSpeak : Locale.Chat.StartSpeak}
  icon={<VoiceWhiteIcon />}
  aria-label={isListening ? "Stop speech recognition" : "Start speech recognition"}
/>

Line range hint 1561-1637: Keyboard shortcuts are a great addition, consider extracting to a separate function.

The implementation of keyboard shortcuts significantly improves the application's usability. The shortcuts are well-thought-out and consider different operating systems.

Suggestion for improvement:
Consider extracting the keyboard shortcut logic into a separate function for better code organization and reusability.

Extract the keyboard shortcut logic:

const handleKeyboardShortcuts = useCallback((event: KeyboardEvent) => {
  const isMac = navigator.platform.toUpperCase().indexOf("MAC") >= 0;
  const modifierKey = isMac ? event.metaKey : event.ctrlKey;

  if (modifierKey && event.shiftKey && event.key.toLowerCase() === "o") {
    event.preventDefault();
    chatStore.newSession();
    navigate(Path.Chat);
  } else if (event.shiftKey && event.key.toLowerCase() === "escape") {
    event.preventDefault();
    inputRef.current?.focus();
  }
  // ... (other shortcuts)
}, [messages, chatStore, navigate]);

useEffect(() => {
  window.addEventListener("keydown", handleKeyboardShortcuts);
  return () => {
    window.removeEventListener("keydown", handleKeyboardShortcuts);
  };
}, [handleKeyboardShortcuts]);

Line range hint 1638-1930: Image paste handling is a great addition, consider adding visual feedback.

The implementation of image paste handling for vision models is a valuable feature that enhances the user experience. The code correctly checks if the current model supports vision before processing the pasted image.

Suggestion for improvement:
Consider adding visual feedback when an image is successfully pasted, such as a brief toast notification or a subtle animation.

Add visual feedback for successful image paste:

const handlePaste = useCallback(
  async (event: React.ClipboardEvent<HTMLTextAreaElement>) => {
    // ... existing code ...
    if (file) {
      // ... existing code ...
      setAttachImages(images);
      showToast("Image successfully attached"); // Add this line
    }
    // ... existing code ...
  },
  [attachImages, chatStore],
);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8ec4df4 and 6e38569.

📒 Files selected for processing (12)
  • app/client/platforms/alibaba.ts (2 hunks)
  • app/client/platforms/anthropic.ts (2 hunks)
  • app/client/platforms/baidu.ts (2 hunks)
  • app/client/platforms/bytedance.ts (2 hunks)
  • app/client/platforms/google.ts (2 hunks)
  • app/client/platforms/iflytek.ts (2 hunks)
  • app/client/platforms/moonshot.ts (2 hunks)
  • app/client/platforms/openai.ts (2 hunks)
  • app/client/platforms/tencent.ts (2 hunks)
  • app/components/chat.tsx (7 hunks)
  • app/locales/cn.ts (1 hunks)
  • app/locales/en.ts (1 hunks)
🧰 Additional context used
🔇 Additional comments (13)
app/client/platforms/moonshot.ts (1)

23-23: LGTM: Import of TranscriptionOptions

The addition of TranscriptionOptions to the import statement is consistent with the implementation of the new transcription method.

app/client/platforms/bytedance.ts (1)

17-17: LGTM: Import statement updated correctly.

The addition of TranscriptionOptions to the import statement is consistent with the new transcription method implementation.

app/client/platforms/iflytek.ts (1)

16-16: LGTM: Import statement addition is appropriate.

The addition of TranscriptionOptions to the import statement is consistent with the new transcription method being introduced. This ensures type safety and maintains consistency with the LLMApi interface.

app/client/platforms/alibaba.ts (2)

16-16: LGTM: Import statement added correctly.

The TranscriptionOptions import is correctly added and is consistent with the new transcription method being implemented.


Line range hint 1-295: Summary: Incomplete implementation of transcription feature.

This PR introduces the groundwork for adding a transcription feature to the QwenApi class, which aligns with the PR objective of adding a new feature (test stt). However, the implementation is currently incomplete:

  1. The TranscriptionOptions import has been added correctly.
  2. The transcription method has been added with the correct signature.
  3. The actual implementation of the transcription functionality is missing.

To fully meet the PR objectives:

  1. Complete the implementation of the transcription method.
  2. Add any necessary helper methods or utilities for transcription.
  3. Update the PR description with details about the new transcription feature and its usage.
  4. Consider adding tests for the new functionality.

To ensure no other parts of the codebase are expecting this method to be fully implemented, run the following script:

This will help identify if there are any other parts of the codebase that might be affected by this incomplete implementation.

✅ Verification successful

Verified: No existing usages of QwenApi.transcription or imports of TranscriptionOptions found.

The implementation of the transcription method in the QwenApi class is currently incomplete. However, verification confirms that there are no existing usages of this method or imports of TranscriptionOptions in the codebase. This means that adding the unimplemented method does not impact other parts of the project at this time.

To fully meet the PR objectives:

  1. Complete the implementation of the transcription method.
  2. Add any necessary helper methods or utilities for transcription.
  3. Update the PR description with details about the new transcription feature and its usage.
  4. Consider adding tests for the new functionality.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any usage of the transcription method in the codebase

# Search for any calls to transcription method of QwenApi
echo "Searching for usage of QwenApi.transcription:"
rg --type typescript --type javascript 'QwenApi.*transcription'

# Search for any imports of TranscriptionOptions that might be using it
echo "Searching for imports of TranscriptionOptions:"
rg --type typescript --type javascript 'import.*TranscriptionOptions'

Length of output: 411


Script:

#!/bin/bash
# Description: Check for any usage of the transcription method in the codebase

# Search for any calls to transcription method of QwenApi
echo "Searching for usage of QwenApi.transcription in *.ts and *.js files:"
rg 'QwenApi\.transcription' -g '*.ts' -g '*.js'

# Search for any imports of TranscriptionOptions in *.ts and *.js files
echo "Searching for imports of TranscriptionOptions in *.ts and *.js files:"
rg 'import.*TranscriptionOptions' -g '*.ts' -g '*.js'

Length of output: 401

app/client/platforms/tencent.ts (2)

12-12: LGTM: Import of TranscriptionOptions

The addition of TranscriptionOptions to the import list is correct and necessary for the new transcription method.


Line range hint 1-99: Incomplete implementation of speech-related features

The changes in this file lay the groundwork for speech-to-text and text-to-speech functionality, which aligns with the PR objective of adding a new feature. However, the actual implementation of these features is missing.

To ensure the PR meets its objectives, please clarify:

  1. Is this PR intended to be a partial implementation or a complete feature?
  2. Are there other files in this PR that contain the actual implementations?

If this is meant to be a complete feature, consider updating the PR description to reflect the current state of the implementation.

app/client/platforms/baidu.ts (1)

18-18: LGTM: Import statement addition is appropriate.

The addition of TranscriptionOptions to the import statement is consistent with the new transcription method being added to the ErnieApi class.

app/client/platforms/google.ts (1)

9-9: LGTM: Import statement updated correctly.

The addition of TranscriptionOptions to the import statement is consistent with the new transcription method being added to the GeminiProApi class.

app/client/platforms/anthropic.ts (1)

2-8: LGTM: Import changes are consistent with new feature addition.

The addition of TranscriptionOptions to the imports is in line with the introduction of transcription functionality to the ClaudeApi class.

app/client/platforms/openai.ts (2)

Line range hint 1-223: LGTM with minor suggestions

The implementation of the transcription method and the updates to the ChatGPTApi class are well-done and consistent with the existing code structure. The new functionality for audio transcription is a valuable addition to the API client.

Please consider the suggested improvements in the previous comments to enhance error handling, response parsing, and code consistency. Once these minor adjustments are made, the changes will be ready for merging.


201-201: Verify the path method implementation for the new model parameter

The path method signature has been updated to accept an optional model parameter, which is used in the transcription method. However, the implementation of the path method is not visible in this code snippet.

Please ensure that the path method correctly handles the new model parameter. You can use the following script to verify the implementation:

If the path method implementation is not found or doesn't handle the model parameter, please update it accordingly.

app/components/chat.tsx (1)

Line range hint 1-2099: Overall, excellent enhancements to the chat component.

The changes made to this file significantly improve the functionality and user experience of the chat component. Key improvements include:

  1. Addition of speech recognition capabilities
  2. Implementation of useful keyboard shortcuts
  3. Enhanced image handling for vision models
  4. Improved accessibility and user feedback

These features are well-integrated into the existing component structure, maintaining code consistency and readability. The new capabilities make the chat interface more versatile and user-friendly.

Suggestions for future improvements:

  1. Consider adding unit tests for the new functionality
  2. Explore opportunities for further code modularization, especially for complex logic blocks

Comment on lines +68 to +70
transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

⚠️ Potential issue

Implement the transcription method.

The transcription method has been added with the correct signature, which is good. However, it currently throws a "Method not implemented" error.

To complete this feature:

  1. Implement the transcription logic using the options parameter.
  2. Ensure proper error handling.
  3. Add unit tests for this new method.
  4. Update any relevant documentation.

Would you like assistance in implementing this method or creating a task to track its implementation?

Comment on lines +93 to +95
transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

⚠️ Potential issue

Implement the transcription method.

The transcription method is currently a placeholder. To complete this feature:

  1. Implement the actual transcription logic.
  2. Handle potential errors and edge cases.
  3. Ensure the implementation adheres to the TranscriptionOptions interface.
  4. Add appropriate error handling and logging.
  5. Consider adding unit tests for this new functionality.

Would you like assistance in implementing the transcription method or creating a GitHub issue to track this task?

Comment on lines 96 to +99

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

🛠️ Refactor suggestion

Consider implementing both speech and transcription methods

Both the speech and transcription methods are currently unimplemented. To fully realize the speech-to-text and text-to-speech functionality mentioned in the PR summary, both methods should be implemented.

Would you like assistance in drafting implementations for both methods? This would ensure the full functionality of the new feature.

Comment on lines +85 to +87
transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

⚠️ Potential issue

Implement the transcription method and add documentation.

The transcription method has been added with the correct signature, but it currently throws a "Method not implemented" error. Please implement the method to handle transcription requests. Additionally, consider adding documentation to explain the purpose and expected behavior of this method.

Would you like assistance in implementing the transcription method or creating documentation for it?

Comment on lines +86 to +88
transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

⚠️ Potential issue

Implement the transcription method.

The transcription method has been added to the ClaudeApi class, which is a good start for introducing transcription functionality. However, the method is currently not implemented.

To complete this feature:

  1. Implement the transcription logic using the Anthropic API or any other appropriate service.
  2. Handle potential errors and edge cases.
  3. Add unit tests to verify the functionality.

Would you like assistance in implementing this method or creating a GitHub issue to track this task?

Comment on lines 184 to 223
async transcription(options: TranscriptionOptions): Promise<string> {
const formData = new FormData();
formData.append("file", options.file, "audio.wav");
formData.append("model", options.model ?? "whisper-1");
if (options.language) formData.append("language", options.language);
if (options.prompt) formData.append("prompt", options.prompt);
if (options.response_format)
formData.append("response_format", options.response_format);
if (options.temperature)
formData.append("temperature", options.temperature.toString());

console.log("[Request] openai audio transcriptions payload: ", options);

const controller = new AbortController();
options.onController?.(controller);

try {
const path = this.path(OpenaiPath.TranscriptionPath, options.model);
const headers = getHeaders(true);
const payload = {
method: "POST",
body: formData,
signal: controller.signal,
headers: headers,
};

// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
const json = await res.json();
return json.text;
} catch (e) {
console.log("[Request] failed to make a audio transcriptions request", e);
throw e;
}
}

🛠️ Refactor suggestion

Enhance error handling and response parsing in the transcription method

The transcription method implementation looks good overall, but there are a few areas for improvement:

  1. Error handling could be more specific to provide better debugging information.
  2. The response parsing assumes the presence of a text property without checking.
  3. The headers variable is declared but not used in the fetch call.

Consider applying the following improvements:

  1. Enhance error handling:
 } catch (e) {
-  console.log("[Request] failed to make a audio transcriptions request", e);
-  throw e;
+  console.error("[Request] failed to make an audio transcriptions request", e);
+  throw new Error(`Transcription request failed: ${e.message}`);
 }
  1. Add response status check and error handling:
 const res = await fetch(path, payload);
 clearTimeout(requestTimeoutId);
+if (!res.ok) {
+  throw new Error(`Transcription request failed with status ${res.status}`);
+}
 const json = await res.json();
-return json.text;
+return json.text ?? '';
  1. Use the headers variable in the fetch call:
 const payload = {
   method: "POST",
   body: formData,
   signal: controller.signal,
-  headers: headers,
+  headers,
 };

These changes will improve the robustness and reliability of the transcription method.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async transcription(options: TranscriptionOptions): Promise<string> {
const formData = new FormData();
formData.append("file", options.file, "audio.wav");
formData.append("model", options.model ?? "whisper-1");
if (options.language) formData.append("language", options.language);
if (options.prompt) formData.append("prompt", options.prompt);
if (options.response_format)
formData.append("response_format", options.response_format);
if (options.temperature)
formData.append("temperature", options.temperature.toString());
console.log("[Request] openai audio transcriptions payload: ", options);
const controller = new AbortController();
options.onController?.(controller);
try {
const path = this.path(OpenaiPath.TranscriptionPath, options.model);
const headers = getHeaders(true);
const payload = {
method: "POST",
body: formData,
signal: controller.signal,
headers: headers,
};
// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
const json = await res.json();
return json.text;
} catch (e) {
console.log("[Request] failed to make a audio transcriptions request", e);
throw e;
}
}
async transcription(options: TranscriptionOptions): Promise<string> {
const formData = new FormData();
formData.append("file", options.file, "audio.wav");
formData.append("model", options.model ?? "whisper-1");
if (options.language) formData.append("language", options.language);
if (options.prompt) formData.append("prompt", options.prompt);
if (options.response_format)
formData.append("response_format", options.response_format);
if (options.temperature)
formData.append("temperature", options.temperature.toString());
console.log("[Request] openai audio transcriptions payload: ", options);
const controller = new AbortController();
options.onController?.(controller);
try {
const path = this.path(OpenaiPath.TranscriptionPath, options.model);
const headers = getHeaders(true);
const payload = {
method: "POST",
body: formData,
signal: controller.signal,
headers,
};
// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
if (!res.ok) {
throw new Error(`Transcription request failed with status ${res.status}`);
}
const json = await res.json();
return json.text ?? '';
} catch (e) {
console.error("[Request] failed to make an audio transcriptions request", e);
throw new Error(`Transcription request failed: ${e.message}`);
}
}

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
app/store/config.ts (1)

136-141: Add language validation to STTConfigValidator

The addition of STTConfigValidator is good, but it could be more comprehensive:

  1. The engine validation looks correct.
  2. Consider adding validation for the language property to ensure it matches the expected format or is within a set of valid languages.

Here's a suggested improvement:

export const STTConfigValidator = {
  engine(x: string) {
    return x as STTEngineType;
  },
  language(x: string) {
    // Add logic to validate the language
    // For example:
    // return VALID_LANGUAGES.includes(x) ? x : DEFAULT_STT_LANGUAGE;
    return x;
  },
};

Replace the comment with appropriate validation logic for the language property.
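
One way to fill in that comment is sketched below; VALID_STT_LANGUAGES and validateSttLanguage are hypothetical names introduced for illustration, while DEFAULT_STT_LANGUAGE does exist in app/constant.ts per this review.

import { DEFAULT_STT_LANGUAGE } from "../constant";

// Hypothetical allow-list of BCP-47 tags; adjust to what the chosen engines support.
const VALID_STT_LANGUAGES: readonly string[] = ["en-US", "zh-CN", "ja-JP", "de-DE"];

// Fall back to the default when the stored value is not a supported tag.
export function validateSttLanguage(x: string): string {
  return VALID_STT_LANGUAGES.includes(x) ? x : DEFAULT_STT_LANGUAGE;
}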

app/constant.ts (1)

274-289: LGTM with suggestion: STT constants addition

The new constants for speech-to-text (STT) functionality are well-defined and provide a comprehensive set of options for engines and languages. However, consider the following suggestion:

  • The default language (DEFAULT_STT_LANGUAGE) is set to "zh-CN" (Chinese Simplified). Consider using a more neutral default, such as "en-US", or making it configurable based on the user's locale or preferences.

Would you like assistance in implementing a more flexible default language selection?
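
A minimal sketch of a locale-aware default is shown below; the navigator guard and the "en-US" fallback are assumptions.

// Derive the default STT language from the browser locale when available,
// falling back to "en-US" during server-side rendering or when unset.
export const DEFAULT_STT_LANGUAGE =
  typeof navigator !== "undefined" && navigator.language
    ? navigator.language
    : "en-US";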

app/components/chat.tsx (1)

589-594: Function to handle speech recognition results.

The onRecognitionEnd function processes the final transcript and updates the user input. It also resets the isTranscription state if not using the default engine.

However, there's a minor issue:

Consider removing or replacing the console.log statement with a more appropriate logging mechanism for production code:

- console.log(finalTranscript);
+ // Consider using a proper logging mechanism here
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d7eee52 and 80a7a1d.

⛔ Files ignored due to path filters (2)
  • app/icons/vioce-close.svg is excluded by !**/*.svg
  • app/icons/vioce-open.svg is excluded by !**/*.svg
📒 Files selected for processing (6)
  • app/components/chat.tsx (9 hunks)
  • app/components/stt-config.tsx (1 hunks)
  • app/constant.ts (2 hunks)
  • app/locales/cn.ts (2 hunks)
  • app/locales/en.ts (2 hunks)
  • app/store/config.ts (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/locales/cn.ts
  • app/locales/en.ts
🧰 Additional context used
🪛 Biome
app/components/stt-config.tsx

[error] 23-23: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)


[error] 34-36: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)


[error] 61-62: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)

🔇 Additional comments (18)
app/components/stt-config.tsx (2)

1-11: LGTM: Imports and component declaration are well-structured.

The imports are appropriate for the component's functionality, and the STTConfigList component is correctly exported with properly typed props.


1-75: Overall assessment: Well-implemented component with minor improvements needed.

The STTConfigList component is well-structured and correctly implements the speech-to-text configuration functionality. It handles browser compatibility and provides a good user interface for managing STT settings.

Key points:

  1. The component correctly uses React hooks and props for state management.
  2. Browser compatibility is handled appropriately, especially for Firefox.
  3. The UI is clear and easy to understand.

Areas for improvement:

  1. Refactor state updates to avoid assignments in expressions, as noted in previous comments.
  2. Ensure consistent use of localization throughout the component.
  3. Consider adding prop-types or TypeScript for better type checking of props.

Once these minor issues are addressed, the component will be in excellent shape and ready for production use.


app/store/config.ts (4)

8-10: LGTM: New STT-related constants imported

The addition of DEFAULT_STT_ENGINE, DEFAULT_STT_ENGINES, and DEFAULT_STT_LANGUAGE constants is consistent with the existing structure and supports the new Speech-to-Text functionality.


26-26: LGTM: New STTEngineType added

The addition of STTEngineType is consistent with the existing type definitions and provides type safety for STT engine selection.


106-106: LGTM: New STTConfig type added

The addition of STTConfig type is consistent with the existing structure and provides type safety for the STT configuration.


88-99: Review TTS default setting and approve STT config addition

The changes look good overall:

  1. The new sttConfig is well-structured and uses appropriate default values.
  2. The addition of STT configuration aligns with the PR objective.

However, please review the change to ttsConfig.enable:

Changing ttsConfig.enable to true by default might affect user experience. Consider if this change is intentional and if it might surprise users who are not expecting TTS to be enabled automatically.

✅ Verification successful

Approve TTS and STT Configuration Changes

The updates look good overall:

  1. The new sttConfig is well-structured and uses appropriate default values.
  2. The addition of STT configuration aligns with the PR objective.
  3. Verified that ttsConfig.enable is set to true by default with no overrides disabling it.

No issues found with the current configurations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the impact of enabling TTS by default
rg "ttsConfig.*enable.*false" --type typescript

Length of output: 84


Script:

#!/bin/bash
# Verify the impact of enabling TTS by default by searching for 'ttsConfig.enable = false' in .ts files
rg "ttsConfig\.enable\s*=\s*false" --type-add 'ts:*.ts' --type ts

Length of output: 67

app/constant.ts (3)

153-153: LGTM: Addition of TranscriptionPath constant

The new TranscriptionPath constant in the OpenaiPath object is correctly defined and aligns with OpenAI's API structure for audio transcription endpoints. This addition supports the integration of speech-to-text functionality.


Line range hint 1-291: Summary of changes and recommendations

The additions to app/constant.ts primarily focus on implementing speech-to-text (STT) functionality. The changes are generally well-structured and provide necessary configurations for STT features. However, consider the following recommendations:

  1. Review the default STT language selection to ensure it's appropriate for your user base.
  2. Provide clarification on the Firefox-specific STT engine constant.
  3. Consider adding comments to explain the purpose and usage of new constants, especially browser-specific ones.

These changes will enhance the maintainability and clarity of the code for future development.


290-290: Please clarify: Firefox-specific STT engine constant

A new constant FIREFOX_DEFAULT_STT_ENGINE has been added, set to "OpenAI Whisper". Could you please provide context on why Firefox requires a separate default STT engine? Consider adding a comment to explain the rationale behind this browser-specific constant.
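For instance, a short doc comment along these lines would capture the rationale (the wording is illustrative; the constant value is taken from the PR):

// Firefox does not implement the Web Speech API's SpeechRecognition, so speech
// input falls back to server-side transcription via OpenAI Whisper on Firefox.
export const FIREFOX_DEFAULT_STT_ENGINE = "OpenAI Whisper";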

To better understand the usage of this constant, let's search for its occurrences in the codebase:

This will help us verify if and how this constant is being used in the application.

✅ Verification successful

FIREFOX_DEFAULT_STT_ENGINE Usage Verified

The constant FIREFOX_DEFAULT_STT_ENGINE is defined in app/constant.ts and appropriately used in app/components/chat.tsx to set the STT engine for Firefox browsers. This browser-specific configuration ensures compatibility and optimal performance for Firefox users. No issues found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of FIREFOX_DEFAULT_STT_ENGINE
rg "FIREFOX_DEFAULT_STT_ENGINE" --type typescript

Length of output: 84


Script:

#!/bin/bash
# Corrected search for usage of FIREFOX_DEFAULT_STT_ENGINE in TypeScript files
rg "FIREFOX_DEFAULT_STT_ENGINE" --glob "*.ts" --glob "*.tsx"

Length of output: 289

app/components/chat.tsx (9)

13-14: New icons imported for voice functionality.

Two new icons have been imported: VoiceOpenIcon and VoiceCloseIcon. These are likely used to represent the state of voice input in the UI.


77-77: Added import for Firefox detection.

The isFirefox function has been imported. This is likely used to provide browser-specific behavior for speech recognition.


104-106: New constants imported for speech-to-text configuration.

Constants for default speech-to-text (STT) engines have been added, including a specific one for Firefox. This supports the new voice input feature.


126-126: New import for speech transcription APIs.

The OpenAITranscriptionApi and WebTranscriptionApi have been imported. These are likely used to handle the speech-to-text functionality.


556-558: New state variables for speech recognition.

Three new state variables have been added:

  • isListening: Likely used to track if speech recognition is active.
  • isTranscription: Possibly used to indicate if transcription is in progress.
  • speechApi: Stores the speech recognition API instance.

These variables are crucial for managing the state of the new voice input feature.


573-579: Function to start speech recognition.

The startListening function initializes speech recognition and updates the isListening state. It also shows a toast notification to the user.


580-588: Function to stop speech recognition.

The stopListening function halts speech recognition, updates the isListening state, and shows a toast notification. It also sets isTranscription to true if not using the default engine.


831-840: Added voice input button to ChatActions component.

A new button for toggling voice input has been added to the ChatActions component. It uses the new icons and state variables to represent the current listening state.


Line range hint 1-2000: Summary of speech-to-text integration

The changes in this file successfully integrate speech-to-text functionality into the chat interface. Key additions include:

  1. New state variables for managing speech recognition status.
  2. Functions for starting and stopping speech recognition.
  3. A useEffect hook for initializing the speech API.
  4. UI elements for toggling voice input.

The implementation is generally well-done, with appropriate error handling and user feedback. However, there are a few minor suggestions for improvement:

  1. Consider adding dependencies to the useEffect hook for speech API initialization.
  2. Remove or replace the console.log statement in the onRecognitionEnd function.

Overall, these changes enhance the chat interface by providing users with a voice input option, improving accessibility and user experience.

Comment on lines +14 to +27
<ListItem
title={Locale.Settings.STT.Enable.Title}
subTitle={Locale.Settings.STT.Enable.SubTitle}
>
<input
type="checkbox"
checked={props.sttConfig.enable}
onChange={(e) =>
props.updateConfig(
(config) => (config.enable = e.currentTarget.checked),
)
}
></input>
</ListItem>

⚠️ Potential issue

Refactor the config update to avoid assignment in expression.

The checkbox implementation for enabling/disabling STT is correct. However, the update logic can be improved to address the static analysis warning about assignment in expression.

Consider refactoring the onChange handler as follows:

 onChange={(e) =>
   props.updateConfig(
-    (config) => (config.enable = e.currentTarget.checked),
+    (config) => ({ ...config, enable: e.currentTarget.checked })
   )
 }

This change creates a new object with the updated enable property, which is a more idiomatic way to update state in React and avoids the assignment in expression issue.


Comment on lines 55 to 72
{props.sttConfig.engine === "Web Speech API" && !isFirefox() && (
<ListItem title="语言选择">
<Select
value={props.sttConfig.language}
onChange={(e) => {
props.updateConfig(
(config) => (config.language = e.currentTarget.value),
);
}}
>
{DEFAULT_STT_LANGUAGES.map((v, i) => (
<option value={v} key={i}>
{v}
</option>
))}
</Select>
</ListItem>
)}

⚠️ Potential issue

Refactor the language update and localize the title.

The conditional rendering of the language selection for Web Speech API is implemented correctly. However, there are two issues to address:

  1. The update logic can be improved to address the static analysis warning about assignment in expression.

  2. The title for language selection is in Chinese, which might not be consistent with the rest of the application.

To address the first issue, refactor the onChange handler as follows:

 onChange={(e) => {
   props.updateConfig(
-    (config) => (config.language = e.currentTarget.value),
+    (config) => ({ ...config, language: e.currentTarget.value }),
   );
 }}

This change creates a new object with the updated language property, which is a more idiomatic way to update state in React and avoids the assignment in expression issue.

For the second issue, replace the hardcoded Chinese title with a localized string:
- <ListItem title="语言选择">
+ <ListItem title={Locale.Settings.STT.Language.Title}>

Make sure to add the corresponding translation key in the locale files.


Comment on lines +28 to +54
<ListItem title={Locale.Settings.STT.Engine.Title}>
<Select
value={props.sttConfig.engine}
onChange={(e) => {
props.updateConfig(
(config) =>
(config.engine = STTConfigValidator.engine(
e.currentTarget.value,
)),
);
}}
>
{isFirefox()
? DEFAULT_STT_ENGINES.filter((v) => v !== "Web Speech API").map(
(v, i) => (
<option value={v} key={i}>
{v}
</option>
),
)
: DEFAULT_STT_ENGINES.map((v, i) => (
<option value={v} key={i}>
{v}
</option>
))}
</Select>
</ListItem>

⚠️ Potential issue

Refactor the engine update to avoid assignment in expression.

The STT engine selection implementation is correct and handles browser compatibility well. However, the update logic can be improved to address the static analysis warning about assignment in expression.

Consider refactoring the onChange handler as follows:

 onChange={(e) => {
   props.updateConfig(
-    (config) =>
-      (config.engine = STTConfigValidator.engine(
-        e.currentTarget.value,
-      )),
+    (config) => ({
+      ...config,
+      engine: STTConfigValidator.engine(e.currentTarget.value),
+    }),
   );
 }}

This change creates a new object with the updated engine property, which is a more idiomatic way to update state in React and avoids the assignment in expression issue.


Comment on lines 560 to 571
useEffect(() => {
if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
setSpeechApi(
config.sttConfig.engine !== DEFAULT_STT_ENGINE
? new WebTranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
)
: new OpenAITranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
),
);
}, []);

⚠️ Potential issue

useEffect hook for initializing speech recognition.

This hook initializes the speech recognition API based on the configured engine. It uses a Firefox-specific engine if the browser is Firefox.

However, there's a potential issue:

The useEffect hook has an empty dependency array, which means it will only run once when the component mounts. If config.sttConfig.engine changes, this effect won't re-run. Consider adding the necessary dependencies to the effect:

- useEffect(() => {
+ useEffect(() => {
    if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
    setSpeechApi(
      config.sttConfig.engine !== DEFAULT_STT_ENGINE
        ? new WebTranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          )
        : new OpenAITranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          ),
    );
- }, []);
+ }, [config.sttConfig.engine]);

Also, make sure onRecognitionEnd is memoized with useCallback to prevent unnecessary re-creations of the speech API.

Committable suggestion was skipped due to low confidence.
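A rough sketch of that memoization (assuming useCallback is imported from react and the same setters, props, and config object used in chat.tsx are in scope) could look like:

// Wrap the handler so the effect that builds the speech API does not see a new
// callback identity on every render.
const onRecognitionEnd = useCallback(
  (finalTranscript: string) => {
    if (finalTranscript) props.setUserInput(finalTranscript);
    if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
      setIsTranscription(false);
  },
  [config.sttConfig.engine, props.setUserInput],
);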

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 80a7a1d and 5e514c5.

📒 Files selected for processing (1)
  • app/components/chat.tsx (7 hunks)
🧰 Additional context used
🔇 Additional comments (5)
app/components/chat.tsx (5)

13-14: New icon imports added for voice functionality.

The addition of VoiceOpenIcon and VoiceCloseIcon imports suggests new voice-related features are being implemented. These imports follow the existing naming conventions and file structure.


77-77: Firefox detection utility imported.

The isFirefox utility function is imported, indicating potential Firefox-specific handling in the component. This addition is consistent with the existing import structure.


104-106: Speech-related constants imported.

New constants for speech-to-text (STT) and text-to-speech (TTS) engines are added, including a Firefox-specific STT engine. These additions are consistent with the component's evolving voice-related functionality.


126-126: Transcription API utilities imported.

The addition of OpenAITranscriptionApi and WebTranscriptionApi imports indicates the implementation of multiple transcription methods for speech-to-text functionality. This aligns with the earlier constant imports and enhances the component's voice-related capabilities.


832-840: Voice input action button added with appropriate conditional rendering.

A new voice input action button is implemented within the ChatActions component. The button's appearance and behavior change based on the listening state, and it's only rendered when the STT feature is enabled in the configuration. This implementation provides a user-friendly way to access the new voice input functionality.

Comment on lines 555 to 595
const [isListening, setIsListening] = useState(false);
const [isTranscription, setIsTranscription] = useState(false);
const [speechApi, setSpeechApi] = useState<any>(null);

useEffect(() => {
if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
setSpeechApi(
config.sttConfig.engine !== DEFAULT_STT_ENGINE
? new WebTranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
)
: new OpenAITranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
),
);
}, []);

const startListening = async () => {
showToast(Locale.Chat.StartSpeak);
if (speechApi) {
await speechApi.start();
setIsListening(true);
document.getElementById("chat-input")?.focus();
}
};
const stopListening = async () => {
showToast(Locale.Chat.CloseSpeak);
if (speechApi) {
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(true);
await speechApi.stop();
setIsListening(false);
}
document.getElementById("chat-input")?.focus();
};
const onRecognitionEnd = (finalTranscript: string) => {
console.log(finalTranscript);
if (finalTranscript) props.setUserInput(finalTranscript);
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(false);
};

⚠️ Potential issue

Speech recognition functionality implemented with potential improvements.

The addition of speech recognition state variables and functions enhances the component's capabilities. However, there are two points to consider:

  1. The useEffect hook initializing the speech API has an empty dependency array. This means it will only run once when the component mounts. If config.sttConfig.engine changes, the effect won't re-run. Consider adding the necessary dependencies:
- useEffect(() => {
+ useEffect(() => {
    if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
    setSpeechApi(
      config.sttConfig.engine !== DEFAULT_STT_ENGINE
        ? new WebTranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          )
        : new OpenAITranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          ),
    );
- }, []);
+ }, [config.sttConfig.engine]);
  2. There's a console.log statement in the onRecognitionEnd function. Consider removing it or replacing it with a more appropriate logging mechanism for production code.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
app/utils/speech.ts (1)

56-56: Fix typo in console warning message

There's a typo in the console warning on line 56. "Media Decives" should be "Media Devices".

Apply this diff to correct the typo:

     console.warn("Media Decives will work only with SSL");
+    console.warn("Media Devices will work only with SSL");
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 5e514c5 and b91c224.

📒 Files selected for processing (4)
  • app/components/chat.tsx (7 hunks)
  • app/components/stt-config.tsx (1 hunks)
  • app/store/config.ts (4 hunks)
  • app/utils/speech.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/store/config.ts
🧰 Additional context used
🪛 Biome
app/components/stt-config.tsx

[error] 23-23: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)


[error] 34-36: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)


[error] 61-62: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)

🔇 Additional comments (9)
app/components/stt-config.tsx (5)

1-11: LGTM: Imports and component declaration are well-structured.

The imports cover all necessary dependencies, and the component declaration with its props is correctly defined.


1-75: Overall assessment: Well-implemented STT configuration component with minor improvements needed.

The STTConfigList component effectively manages and displays configuration options for speech-to-text (STT) settings. It handles browser compatibility and conditional rendering appropriately. The suggested refactors will improve code quality by addressing assignment in expression issues and ensuring consistent localization.

Key points:

  1. The component structure and logic are sound.
  2. Update logic for config changes should be refactored to avoid assignments in expressions.
  3. Localization should be applied consistently, especially for the language selection title.

Once these minor improvements are implemented, the component will be more robust and maintainable.



14-27: ⚠️ Potential issue

Refactor the config update to avoid assignment in expression.

The checkbox implementation for enabling/disabling STT is correct. However, the update logic can be improved to address the static analysis warning about assignment in expression.

Consider refactoring the onChange handler as follows:

 onChange={(e) =>
   props.updateConfig(
-    (config) => (config.enable = e.currentTarget.checked),
+    (config) => ({ ...config, enable: e.currentTarget.checked })
   )
 }

This change creates a new object with the updated enable property, which is a more idiomatic way to update state in React and avoids the assignment in expression issue.



28-54: ⚠️ Potential issue

Refactor the engine update to avoid assignment in expression.

The STT engine selection implementation is correct and handles browser compatibility well. However, the update logic can be improved to address the static analysis warning about assignment in expression.

Consider refactoring the onChange handler as follows:

 onChange={(e) => {
   props.updateConfig(
-    (config) =>
-      (config.engine = STTConfigValidator.engine(
-        e.currentTarget.value,
-      )),
+    (config) => ({
+      ...config,
+      engine: STTConfigValidator.engine(e.currentTarget.value),
+    }),
   );
 }}

This change creates a new object with the updated engine property, which is a more idiomatic way to update state in React and avoids the assignment in expression issue.



55-72: ⚠️ Potential issue

Refactor the language update and localize the title.

The conditional rendering of the language selection for Web Speech API is implemented correctly. However, there are two issues to address:

  1. The update logic can be improved to address the static analysis warning about assignment in expression.

  2. The title for language selection is in Chinese, which might not be consistent with the rest of the application.

To address the first issue, refactor the onChange handler as follows:

 onChange={(e) => {
   props.updateConfig(
-    (config) => (config.language = e.currentTarget.value),
+    (config) => ({ ...config, language: e.currentTarget.value }),
   );
 }}

This change creates a new object with the updated language property, which is a more idiomatic way to update state in React and avoids the assignment in expression issue.

For the second issue, replace the hardcoded Chinese title with a localized string:
- <ListItem title="语言选择">
+ <ListItem title={Locale.Settings.STT.Language.Title}>

Make sure to add the corresponding translation key in the locale files.


app/components/chat.tsx (4)

13-14: New voice-related icons added.

The addition of VoiceOpenIcon and VoiceCloseIcon suggests new voice functionality is being introduced to the chat component. The import statements are correctly formatted and follow the existing naming conventions.


77-77: Firefox detection utility added.

The import of isFirefox utility function suggests the introduction of Firefox-specific functionality or checks. This addition is correctly implemented and follows the existing import style.


104-106: Speech-related constants added.

New constants for speech-to-text (STT) and text-to-speech (TTS) engines have been imported, including a Firefox-specific STT engine. These additions align with the previously imported isFirefox utility, suggesting the implementation of browser-specific speech functionality.


126-126: Transcription APIs imported.

The addition of OpenAITranscriptionApi and WebTranscriptionApi indicates the implementation of transcription functionality, likely for speech-to-text features. This change supports multiple transcription services and is correctly implemented.

Comment on lines 555 to 597
const [isListening, setIsListening] = useState(false);
const [isTranscription, setIsTranscription] = useState(false);
const [speechApi, setSpeechApi] = useState<any>(null);

useEffect(() => {
if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
const lang = config.sttConfig.lang;
setSpeechApi(
config.sttConfig.engine !== DEFAULT_STT_ENGINE
? new WebTranscriptionApi(
(transcription) => onRecognitionEnd(transcription),
lang,
)
: new OpenAITranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
),
);
}, []);

const startListening = async () => {
showToast(Locale.Chat.StartSpeak);
if (speechApi) {
await speechApi.start();
setIsListening(true);
document.getElementById("chat-input")?.focus();
}
};
const stopListening = async () => {
showToast(Locale.Chat.CloseSpeak);
if (speechApi) {
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(true);
await speechApi.stop();
setIsListening(false);
}
document.getElementById("chat-input")?.focus();
};
const onRecognitionEnd = (finalTranscript: string) => {
console.log(finalTranscript);
if (finalTranscript) props.setUserInput(finalTranscript);
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(false);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Voice input functionality implemented with potential improvements.

The addition of speech recognition state variables and functions enhances the component's capabilities. However, there are two points to consider:

  1. The useEffect hook initializing the speech API has an empty dependency array. This means it will only run once when the component mounts. If config.sttConfig.engine changes, the effect won't re-run. Consider adding the necessary dependencies:
- useEffect(() => {
+ useEffect(() => {
    if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
    setSpeechApi(
      config.sttConfig.engine !== DEFAULT_STT_ENGINE
        ? new WebTranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          )
        : new OpenAITranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          ),
    );
- }, []);
+ }, [config.sttConfig.engine]);
  2. There's a console.log statement in the onRecognitionEnd function. Consider removing it or replacing it with a more appropriate logging mechanism for production code.

Committable suggestion was skipped due to low confidence.

Comment on lines +35 to +44
// @ts-ignore
navigator.getUserMedia =
// @ts-ignore
navigator.getUserMedia ||
// @ts-ignore
navigator.webkitGetUserMedia ||
// @ts-ignore
navigator.mozGetUserMedia ||
// @ts-ignore
navigator.msGetUserMedia;

⚠️ Potential issue

Remove unnecessary and unused code setting navigator.getUserMedia

The code from lines 35-44 assigns navigator.getUserMedia with cross-browser support shims. However, this variable is not used later in the code since you're using navigator.mediaDevices.getUserMedia directly on line 46, which is the standard and widely supported method. Additionally, navigator.getUserMedia is deprecated. Removing these lines will clean up the code and prevent potential confusion.

Apply this diff to remove the unused code:

     // @ts-ignore
-    navigator.getUserMedia =
-      // @ts-ignore
-      navigator.getUserMedia ||
-      // @ts-ignore
-      navigator.webkitGetUserMedia ||
-      // @ts-ignore
-      navigator.mozGetUserMedia ||
-      // @ts-ignore
-      navigator.msGetUserMedia;
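For reference, a minimal sketch of the standard capture path the class already relies on (the helper name is illustrative; a secure context is still required):

async function getAudioStream(): Promise<MediaStream | null> {
  if (!navigator.mediaDevices?.getUserMedia) {
    console.warn("Media Devices will work only with SSL");
    return null;
  }
  try {
    return await navigator.mediaDevices.getUserMedia({ audio: true });
  } catch (e) {
    console.error("microphone access denied or unavailable", e);
    return null;
  }
}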

Comment on lines +117 to +125
async start(): Promise<void> {
this.listeningStatus = true;
await this.recognitionInstance.start();
}

async stop(): Promise<void> {
this.listeningStatus = false;
await this.recognitionInstance.stop();
}

🛠️ Refactor suggestion

Remove unnecessary async and await in start() and stop() methods

The start() and stop() methods of the Web Speech API's SpeechRecognition are synchronous and do not return promises. Using async and await is unnecessary and may cause confusion. Update the methods to be synchronous by removing async and await, and adjust the method signatures accordingly.

Apply this diff to simplify the methods:

-  async start(): Promise<void> {
+  start(): void {
     this.listeningStatus = true;
     if (!this.recognitionInstance) {
       console.warn("Speech recognition is not supported or has been disabled.");
       return;
     }
-    await this.recognitionInstance.start();
+    this.recognitionInstance.start();
   }

-  async stop(): Promise<void> {
+  stop(): void {
     this.listeningStatus = false;
     if (!this.recognitionInstance) {
       console.warn("Speech recognition is not supported or has been disabled.");
       return;
     }
-    await this.recognitionInstance.stop();
+    this.recognitionInstance.stop();
   }

Comment on lines +98 to +125
if (isFirefox()) return;
const SpeechRecognition =
(window as any).SpeechRecognition ||
(window as any).webkitSpeechRecognition;
this.recognitionInstance = new SpeechRecognition();
this.recognitionInstance.continuous = true;
this.recognitionInstance.interimResults = true;
this.recognitionInstance.lang = lang ?? getSTTLang();
if (transcriptionCallback) {
this.onTranscriptionReceived(transcriptionCallback);
}
this.recognitionInstance.onresult = (event: any) => {
const result = event.results[event.results.length - 1];
if (result.isFinal) {
this.onTranscription(result[0].transcript);
}
};
}

async start(): Promise<void> {
this.listeningStatus = true;
await this.recognitionInstance.start();
}

async stop(): Promise<void> {
this.listeningStatus = false;
await this.recognitionInstance.stop();
}

⚠️ Potential issue

Handle potential null recognitionInstance to prevent runtime errors

If isFirefox() returns true, the constructor exits early, leaving recognitionInstance as null. This can lead to null reference errors when start(), stop(), or isListening() methods are called. To prevent runtime errors, add checks to ensure recognitionInstance is not null before using it.

Apply this diff to add null checks in the methods:

   start(): void {
+    if (!this.recognitionInstance) {
+      console.warn("Speech recognition is not supported or has been disabled.");
+      return;
+    }
     this.listeningStatus = true;
     this.recognitionInstance.start();
   }

   stop(): void {
+    if (!this.recognitionInstance) {
+      console.warn("Speech recognition is not supported or has been disabled.");
+      return;
+    }
     this.listeningStatus = false;
     this.recognitionInstance.stop();
   }

   isListening = () => this.listeningStatus && this.recognitionInstance !== null;

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
app/components/chat.module.scss (1)

78-84: LGTM with a minor suggestion for improvement

The addition of the .listening state for .chat-input-action enhances the visual feedback, likely for voice input functionality. The implementation is consistent with the existing hover styles, providing a cohesive user experience.

To ensure smooth transitions between states, consider adding transition properties to the .listening class:

 &.listening {
   width: var(--full-width);
+  transition: width 0.3s ease;
   .text {
     opacity: 1;
     transform: translate(0);
+    transition: opacity 0.3s ease, transform 0.3s ease;
   }
 }

This will create a smoother visual effect when entering and exiting the listening state, matching the existing hover transition behavior.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b91c224 and 7475519.

📒 Files selected for processing (2)
  • app/components/chat.module.scss (1 hunks)
  • app/components/chat.tsx (9 hunks)
🧰 Additional context used

Comment on lines +13 to +14
import VoiceOpenIcon from "../icons/vioce-open.svg";
import VoiceCloseIcon from "../icons/vioce-close.svg";

⚠️ Potential issue

Fix typo in icon file paths

There's a typo in the file paths for the newly imported voice icons. "vioce" should be "voice".

Please apply the following changes:

-import VoiceOpenIcon from "../icons/vioce-open.svg";
-import VoiceCloseIcon from "../icons/vioce-close.svg";
+import VoiceOpenIcon from "../icons/voice-open.svg";
+import VoiceCloseIcon from "../icons/voice-close.svg";

Comment on lines 595 to 600
const onRecognitionEnd = (finalTranscript: string) => {
console.log(finalTranscript);
if (finalTranscript) props.setUserInput(finalTranscript);
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(false);
};

⚠️ Potential issue

Remove console.log in production code

There's a console.log statement in the onRecognitionEnd function. Consider removing it or replacing it with a more appropriate logging mechanism for production code.

  const onRecognitionEnd = (finalTranscript: string) => {
-   console.log(finalTranscript);
    if (finalTranscript) props.setUserInput(finalTranscript);
    if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
      setIsTranscription(false);
  };

@@ -546,6 +555,50 @@
}
}, [chatStore, currentModel, models]);

const [isListening, setIsListening] = useState(false);
const [isTranscription, setIsTranscription] = useState(false);
const [speechApi, setSpeechApi] = useState<any>(null);

🛠️ Refactor suggestion

Consider using a more specific type for speechApi state

The speechApi state is initialized with any type. Consider using a more specific type to improve type safety.

- const [speechApi, setSpeechApi] = useState<any>(null);
+ const [speechApi, setSpeechApi] = useState<WebTranscriptionApi | OpenAITranscriptionApi | null>(null);

Comment on lines +562 to +575
useEffect(() => {
if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
const lang = config.sttConfig.lang;
setSpeechApi(
config.sttConfig.engine !== DEFAULT_STT_ENGINE
? new WebTranscriptionApi(
(transcription) => onRecognitionEnd(transcription),
lang,
)
: new OpenAITranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
),
);
}, []);

⚠️ Potential issue

Consider adding dependencies to useEffect

The useEffect hook initializing the speech API has an empty dependency array. This means it will only run once when the component mounts. If config.sttConfig.engine or config.sttConfig.lang changes, the effect won't re-run.

Consider adding the necessary dependencies:

- useEffect(() => {
+ useEffect(() => {
    if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
    const lang = config.sttConfig.lang;
    setSpeechApi(
      config.sttConfig.engine !== DEFAULT_STT_ENGINE
        ? new WebTranscriptionApi(
            (transcription) => onRecognitionEnd(transcription),
            lang,
          )
        : new OpenAITranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          ),
    );
- }, []);
+ }, [config.sttConfig.engine, config.sttConfig.lang]);

Also, consider memoizing the onRecognitionEnd function with useCallback to prevent unnecessary re-creations of the speech API.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f0b9068 and b5a5558.

📒 Files selected for processing (1)
  • app/components/chat.tsx (9 hunks)
🧰 Additional context used
🔇 Additional comments (3)
app/components/chat.tsx (3)

13-14: ⚠️ Potential issue

Fix typo in icon file paths

There's a typo in the file paths for the newly imported voice icons. "vioce" should be "voice".

Please apply the following changes:

-import VoiceOpenIcon from "../icons/vioce-open.svg";
-import VoiceCloseIcon from "../icons/vioce-close.svg";
+import VoiceOpenIcon from "../icons/voice-open.svg";
+import VoiceCloseIcon from "../icons/voice-close.svg";

Likely invalid or redundant comment.


560-560: 🛠️ Refactor suggestion

Consider using a more specific type for speechApi state

The speechApi state is initialized with any type. Consider using a more specific type to improve type safety.

- const [speechApi, setSpeechApi] = useState<any>(null);
+ const [speechApi, setSpeechApi] = useState<WebTranscriptionApi | OpenAITranscriptionApi | null>(null);

Likely invalid or redundant comment.


562-575: ⚠️ Potential issue

Consider adding dependencies to useEffect

The useEffect hook initializing the speech API has an empty dependency array. This means it will only run once when the component mounts. If config.sttConfig.engine or config.sttConfig.lang changes, the effect won't re-run.

Consider adding the necessary dependencies:

- useEffect(() => {
+ useEffect(() => {
    if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
    const lang = config.sttConfig.lang;
    setSpeechApi(
      config.sttConfig.engine !== DEFAULT_STT_ENGINE
        ? new WebTranscriptionApi(
            (transcription) => onRecognitionEnd(transcription),
            lang,
          )
        : new OpenAITranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          ),
    );
- }, []);
+ }, [config.sttConfig.engine, config.sttConfig.lang]);

Also, consider memoizing the onRecognitionEnd function with useCallback to prevent unnecessary re-creations of the speech API.

Likely invalid or redundant comment.

Comment on lines +604 to +611
const onRecognitionEnd = (finalTranscript: string) => {
console.log(finalTranscript);
if (finalTranscript) {
props.setUserInput((prevInput) => prevInput + finalTranscript);
}
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(false);
};

⚠️ Potential issue

Remove console.log in production code

There's a console.log statement in the onRecognitionEnd function. Consider removing it or replacing it with a more appropriate logging mechanism for production code.

 const onRecognitionEnd = (finalTranscript: string) => {
-  console.log(finalTranscript);
   if (finalTranscript) {
     props.setUserInput((prevInput) => prevInput + finalTranscript);
   }
   if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
     setIsTranscription(false);
 };

Comment on lines +584 to +603
const startListening = async () => {
  playSound("/Recordingstart.mp3");
  showToast(Locale.Chat.StartSpeak);
  if (speechApi) {
    await speechApi.start();
    setIsListening(true);
    document.getElementById("chat-input")?.focus();
  }
};
const stopListening = async () => {
  showToast(Locale.Chat.CloseSpeak);
  if (speechApi) {
    if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
      setIsTranscription(true);
    await speechApi.stop();
    setIsListening(false);
  }
  playSound("/Recordingdone.mp3");
  document.getElementById("chat-input")?.focus();
};

🛠️ Refactor suggestion

Improve focus management in startListening and stopListening

Both startListening and stopListening functions set focus to the chat input. This might not be ideal for all users, especially those using screen readers. Consider making this behavior configurable:

+ const [autoFocusAfterListening, setAutoFocusAfterListening] = useState(true);

 const startListening = async () => {
   // ... existing code ...
-  document.getElementById("chat-input")?.focus();
+  if (autoFocusAfterListening) {
+    document.getElementById("chat-input")?.focus();
+  }
 };

 const stopListening = async () => {
   // ... existing code ...
-  document.getElementById("chat-input")?.focus();
+  if (autoFocusAfterListening) {
+    document.getElementById("chat-input")?.focus();
+  }
 };

Add a setting in the user preferences to control this behavior.
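
One way to expose that preference, sketched with an assumed autoFocus flag on the existing sttConfig (the field name and default are illustrative, not part of this PR):

// Sketch: both handlers call one guarded helper instead of focusing directly.
// config.sttConfig.autoFocus is a hypothetical user preference.
const focusChatInput = () => {
  if (config.sttConfig.autoFocus ?? true) {
    document.getElementById("chat-input")?.focus();
  }
};

startListening and stopListening would then call focusChatInput() where they currently call focus() directly.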

Committable suggestion was skipped due to low confidence.

@lloydzhou

Please merge in the main branch.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b5a5558 and 0ea650d.

📒 Files selected for processing (7)
  • app/client/platforms/openai.ts (2 hunks)
  • app/components/chat.tsx (9 hunks)
  • app/components/markdown.tsx (1 hunks)
  • app/components/settings.tsx (2 hunks)
  • app/locales/cn.ts (2 hunks)
  • app/locales/en.ts (2 hunks)
  • app/store/config.ts (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • app/components/markdown.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/components/settings.tsx
  • app/store/config.ts
🧰 Additional context used
🔇 Additional comments (11)
app/client/platforms/openai.ts (3)

37-37: LGTM: Import statement updated correctly

The addition of TranscriptionOptions to the import statement is consistent with the new transcription method implementation. This change is necessary and doesn't introduce any issues.


Line range hint 71-71: LGTM: path method signature updated correctly

The path method signature has been updated to include an optional model parameter, as mentioned in the AI-generated summary. This change:

  1. Allows for more flexibility in constructing API paths.
  2. Is consistent with the needs of the new transcription method.
  3. Addresses the existing comment about updating the path method signature.

This update improves the method's versatility without introducing any issues.


Line range hint 1-1: Summary of changes: Audio transcription functionality added

The changes in this file enhance the ChatGPTApi class by:

  1. Adding a new transcription method for audio transcription functionality.
  2. Updating the path method signature to support dynamic API path construction.

These modifications improve the class's capabilities and flexibility. While the implementation is generally solid, there are a few suggested improvements for error handling and response parsing in the transcription method.

Overall, these changes are a valuable addition to the codebase, extending the API's functionality to include audio transcription services.

app/locales/cn.ts (3)

542-551: LGTM: New STT (Speech-to-Text) section added

The new STT section has been successfully added to the Chinese localization file. It includes appropriate translations for enabling the STT feature and selecting the conversion engine. This addition aligns well with the integration of voice recognition features mentioned in the PR summary.


95-97: LGTM: Chat section updated for voice input functionality

The Chat section has been successfully updated to include translations related to voice input functionality. The changes to StartSpeak, CloseSpeak, and StopSpeak properties are clear and appropriate. The StopSpeak property now includes an informative message about the recording state, which enhances user experience.


Line range hint 1-1000: Summary: Successful integration of voice recognition features in Chinese localization

The changes made to app/locales/cn.ts effectively integrate voice recognition features into the Chinese localization. The additions include:

  1. A new STT (Speech-to-Text) section in the Settings object, providing translations for enabling the feature and selecting the conversion engine.
  2. Updated translations in the Chat section for starting, closing, and stopping voice input.

These modifications align well with the PR objectives and maintain consistency with the existing code structure and style. The translations are clear and appropriate for Chinese users.

app/locales/en.ts (1)

Line range hint 1-1064: LGTM with minor suggestions

The changes made to the localization file are appropriate for supporting the new STT (Speech-to-Text) feature. The structure and naming conventions are consistent with the rest of the file. Please consider the two suggestions made earlier:

  1. Fix the inconsistency in the STT Engine subtitle.
  2. Consider renaming the StopSpeak property to better reflect its dual purpose for speaking and recording states.

Once these minor issues are addressed, the localization changes will be fully aligned with the new STT functionality.

app/components/chat.tsx (4)

607-614: ⚠️ Potential issue

Remove console.log in production code

There's a console.log statement in the onRecognitionEnd function. Consider removing it or replacing it with a more appropriate logging mechanism for production code.

 const onRecognitionEnd = (finalTranscript: string) => {
-  console.log(finalTranscript);
   if (finalTranscript) {
     props.setUserInput((prevInput) => prevInput + finalTranscript);
   }
   if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
     setIsTranscription(false);
 };

Likely invalid or redundant comment.


587-606: 🛠️ Refactor suggestion

Improve focus management in startListening and stopListening

Both startListening and stopListening functions set focus to the chat input. This might not be ideal for all users, especially those using screen readers. Consider making this behavior configurable:

+ const [autoFocusAfterListening, setAutoFocusAfterListening] = useState(true);

 const startListening = async () => {
   // ... existing code ...
-  document.getElementById("chat-input")?.focus();
+  if (autoFocusAfterListening) {
+    document.getElementById("chat-input")?.focus();
+  }
 };

 const stopListening = async () => {
   // ... existing code ...
-  document.getElementById("chat-input")?.focus();
+  if (autoFocusAfterListening) {
+    document.getElementById("chat-input")?.focus();
+  }
 };

Add a setting in the user preferences to control this behavior.

Likely invalid or redundant comment.


13-14: ⚠️ Potential issue

Fix typo in icon import paths

There's a typo in the file paths for the newly imported voice icons. "vioce" should be "voice".

Please apply the following changes:

-import VoiceOpenIcon from "../icons/vioce-open.svg";
-import VoiceCloseIcon from "../icons/vioce-close.svg";
+import VoiceOpenIcon from "../icons/voice-open.svg";
+import VoiceCloseIcon from "../icons/voice-close.svg";

Likely invalid or redundant comment.


565-578: ⚠️ Potential issue

Consider adding dependencies to useEffect

The useEffect hook initializing the speech API has an empty dependency array. This means it will only run once when the component mounts. If config.sttConfig.engine or config.sttConfig.lang changes, the effect won't re-run.

Consider adding the necessary dependencies:

- useEffect(() => {
+ useEffect(() => {
    if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
    const lang = config.sttConfig.lang;
    setSpeechApi(
      config.sttConfig.engine !== DEFAULT_STT_ENGINE
        ? new WebTranscriptionApi(
            (transcription) => onRecognitionEnd(transcription),
            lang,
          )
        : new OpenAITranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          ),
    );
- }, []);
+ }, [config.sttConfig.engine, config.sttConfig.lang]);

Also, consider memoizing the onRecognitionEnd function with useCallback to prevent unnecessary re-creations of the speech API.

Likely invalid or redundant comment.

Comment on lines +184 to +223
async transcription(options: TranscriptionOptions): Promise<string> {
  const formData = new FormData();
  formData.append("file", options.file, "audio.wav");
  formData.append("model", options.model ?? "whisper-1");
  if (options.language) formData.append("language", options.language);
  if (options.prompt) formData.append("prompt", options.prompt);
  if (options.response_format)
    formData.append("response_format", options.response_format);
  if (options.temperature)
    formData.append("temperature", options.temperature.toString());

  console.log("[Request] openai audio transcriptions payload: ", options);

  const controller = new AbortController();
  options.onController?.(controller);

  try {
    const path = this.path(OpenaiPath.TranscriptionPath);
    const headers = getHeaders(true);
    const payload = {
      method: "POST",
      body: formData,
      signal: controller.signal,
      headers: headers,
    };

    // make a fetch request
    const requestTimeoutId = setTimeout(
      () => controller.abort(),
      REQUEST_TIMEOUT_MS,
    );
    const res = await fetch(path, payload);
    clearTimeout(requestTimeoutId);
    const json = await res.json();
    return json.text;
  } catch (e) {
    console.log("[Request] failed to make a audio transcriptions request", e);
    throw e;
  }
}

🛠️ Refactor suggestion

Enhance error handling and response parsing in the transcription method

The implementation of the transcription method looks good overall, but there are a few areas for improvement:

  1. Error handling could be more specific to provide better debugging information.
  2. The response parsing assumes the presence of a text property without checking.
  3. The headers variable is declared but not used in the fetch call.

I acknowledge the existing comment about these issues. To address them, consider applying the following improvements:

  1. Enhance error handling:
 } catch (e) {
-  console.log("[Request] failed to make a audio transcriptions request", e);
-  throw e;
+  console.error("[Request] failed to make an audio transcriptions request", e);
+  throw new Error(`Transcription request failed: ${e.message}`);
 }
  2. Add response status check and error handling:
 const res = await fetch(path, payload);
 clearTimeout(requestTimeoutId);
+if (!res.ok) {
+  throw new Error(`Transcription request failed with status ${res.status}`);
+}
 const json = await res.json();
-return json.text;
+return json.text ?? '';
  3. Use the headers variable in the fetch call:
 const payload = {
   method: "POST",
   body: formData,
   signal: controller.signal,
-  headers: headers,
+  headers,
 };

These changes will improve the robustness and reliability of the transcription method.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async transcription(options: TranscriptionOptions): Promise<string> {
  const formData = new FormData();
  formData.append("file", options.file, "audio.wav");
  formData.append("model", options.model ?? "whisper-1");
  if (options.language) formData.append("language", options.language);
  if (options.prompt) formData.append("prompt", options.prompt);
  if (options.response_format)
    formData.append("response_format", options.response_format);
  if (options.temperature)
    formData.append("temperature", options.temperature.toString());
  console.log("[Request] openai audio transcriptions payload: ", options);
  const controller = new AbortController();
  options.onController?.(controller);
  try {
    const path = this.path(OpenaiPath.TranscriptionPath);
    const headers = getHeaders(true);
    const payload = {
      method: "POST",
      body: formData,
      signal: controller.signal,
      headers: headers,
    };
    // make a fetch request
    const requestTimeoutId = setTimeout(
      () => controller.abort(),
      REQUEST_TIMEOUT_MS,
    );
    const res = await fetch(path, payload);
    clearTimeout(requestTimeoutId);
    const json = await res.json();
    return json.text;
  } catch (e) {
    console.log("[Request] failed to make a audio transcriptions request", e);
    throw e;
  }
}

async transcription(options: TranscriptionOptions): Promise<string> {
  const formData = new FormData();
  formData.append("file", options.file, "audio.wav");
  formData.append("model", options.model ?? "whisper-1");
  if (options.language) formData.append("language", options.language);
  if (options.prompt) formData.append("prompt", options.prompt);
  if (options.response_format)
    formData.append("response_format", options.response_format);
  if (options.temperature)
    formData.append("temperature", options.temperature.toString());
  console.log("[Request] openai audio transcriptions payload: ", options);
  const controller = new AbortController();
  options.onController?.(controller);
  try {
    const path = this.path(OpenaiPath.TranscriptionPath);
    const headers = getHeaders(true);
    const payload = {
      method: "POST",
      body: formData,
      signal: controller.signal,
      headers,
    };
    // make a fetch request
    const requestTimeoutId = setTimeout(
      () => controller.abort(),
      REQUEST_TIMEOUT_MS,
    );
    const res = await fetch(path, payload);
    clearTimeout(requestTimeoutId);
    if (!res.ok) {
      throw new Error(`Transcription request failed with status ${res.status}`);
    }
    const json = await res.json();
    return json.text ?? '';
  } catch (e) {
    console.error("[Request] failed to make an audio transcriptions request", e);
    throw new Error(`Transcription request failed: ${e.message}`);
  }
}
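
For context, a hedged sketch of how a caller might hand a recorded clip to this method; the helper name and wiring are assumptions, and only the TranscriptionOptions fields are taken from the code above:

// Sketch: send a recorded Blob to the transcription endpoint and return the text.
async function transcribeRecording(api: ChatGPTApi, recorded: Blob): Promise<string> {
  const text = await api.transcription({
    file: recorded,       // uploaded as multipart form data ("audio.wav")
    model: "whisper-1",   // same default the method falls back to
    language: "en",       // optional hint, handled by the options parsing above
  });
  return text ?? "";
}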

Comment on lines +97 to +98
CloseSpeak: "Stop Speak",
StopSpeak: "Recording...",

🛠️ Refactor suggestion

Consider renaming the StopSpeak property

The StopSpeak property has been updated to include a "Recording..." state, which is likely related to the new STT functionality. However, the property name "StopSpeak" might be confusing when used for both speaking and recording states.

Consider renaming the StopSpeak property to something more generic that covers both speaking and recording actions, such as CloseAudio or StopAudioInteraction. For example:

-CloseSpeak: "Stop Speak",
-StopSpeak: "Recording...",
+CloseAudio: "Stop Speak",
+StopAudioInteraction: "Recording...",

This change would make the property names more descriptive and less ambiguous when used in different contexts.

Committable suggestion was skipped due to low confidence.

Comment on lines +550 to +559
STT: {
  Enable: {
    Title: "Enable STT",
    SubTitle: "Enable Speech-to-Text",
  },
  Engine: {
    Title: "STT Engine",
    SubTitle: "Text-to-Speech Engine",
  },
},

⚠️ Potential issue

Fix inconsistency in STT Engine subtitle

The STT section has been added successfully, but there's an inconsistency in the Engine.SubTitle property. It currently refers to "Text-to-Speech" instead of "Speech-to-Text".

Please update the Engine.SubTitle to correctly reflect Speech-to-Text functionality:

 STT: {
   Enable: {
     Title: "Enable STT",
     SubTitle: "Enable Speech-to-Text",
   },
   Engine: {
     Title: "STT Engine",
-    SubTitle: "Text-to-Speech Engine",
+    SubTitle: "Speech-to-Text Engine",
   },
 },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
STT: {
  Enable: {
    Title: "Enable STT",
    SubTitle: "Enable Speech-to-Text",
  },
  Engine: {
    Title: "STT Engine",
    SubTitle: "Text-to-Speech Engine",
  },
},

STT: {
  Enable: {
    Title: "Enable STT",
    SubTitle: "Enable Speech-to-Text",
  },
  Engine: {
    Title: "STT Engine",
    SubTitle: "Speech-to-Text Engine",
  },
},

Comment on lines +580 to +585
function playSound(fileName: string) {
  const audio = new Audio(fileName);
  audio.play().catch((error) => {
    console.error("error:", error);
  });
}

🛠️ Refactor suggestion

Improve error handling in playSound function

The playSound function catches errors but only logs them to the console. Consider adding user-facing error handling:

 function playSound(fileName: string) {
   const audio = new Audio(fileName);
   audio.play().catch((error) => {
     console.error("error:", error);
+    showToast(Locale.Chat.AudioPlayError);
   });
 }

Also, consider adding a check to ensure the audio file exists before attempting to play it.
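
A combined sketch of both points; the HEAD pre-check and the AudioPlayError locale key are assumptions, not part of this PR:

// Sketch: confirm the asset is reachable, play it, and surface failures to the
// user instead of only logging them.
async function playSound(fileName: string) {
  try {
    const head = await fetch(fileName, { method: "HEAD" });
    if (!head.ok) throw new Error(`audio file not found: ${fileName}`);
    await new Audio(fileName).play();
  } catch (error) {
    console.error("[Audio] failed to play", fileName, error);
    showToast(Locale.Chat.AudioPlayError); // assumed locale key
  }
}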

Committable suggestion was skipped due to low confidence.

@lloydzhou

TTS and STT related work may be put on hold for a while;
the openai realtime api takes priority.

