test stt #5634

Open
wants to merge 14 commits into
base: main
11 changes: 11 additions & 0 deletions app/client/api.ts
@@ -63,6 +63,16 @@ export interface SpeechOptions {
onController?: (controller: AbortController) => void;
}

export interface TranscriptionOptions {
model?: "whisper-1";
file: Blob;
language?: string;
prompt?: string;
response_format?: "json" | "text" | "srt" | "verbose_json" | "vtt";
temperature?: number;
onController?: (controller: AbortController) => void;
}

export interface ChatOptions {
messages: RequestMessage[];
config: LLMConfig;
@@ -98,6 +108,7 @@ export interface LLMModelProvider {
export abstract class LLMApi {
abstract chat(options: ChatOptions): Promise<void>;
abstract speech(options: SpeechOptions): Promise<ArrayBuffer>;
abstract transcription(options: TranscriptionOptions): Promise<string>;
abstract usage(): Promise<LLMUsage>;
abstract models(): Promise<LLMModel[]>;
}
5 changes: 5 additions & 0 deletions app/client/platforms/alibaba.ts
@@ -13,6 +13,7 @@ import {
LLMApi,
LLMModel,
SpeechOptions,
TranscriptionOptions,
MultimodalContent,
} from "../api";
import Locale from "../../locales";
@@ -89,6 +90,10 @@ export class QwenApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines +93 to +95
Contributor

⚠️ Potential issue

Implement the transcription method.

The transcription method is currently a placeholder. To complete this feature:

  1. Implement the actual transcription logic.
  2. Handle potential errors and edge cases.
  3. Ensure the implementation adheres to the TranscriptionOptions interface.
  4. Add appropriate error handling and logging.
  5. Consider adding unit tests for this new functionality.

Would you like assistance in implementing the transcription method or creating a GitHub issue to track this task?
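
One detail about the placeholder itself: because the method is not marked `async`, the `throw` happens synchronously at the call site rather than producing a rejected promise. A caller that only chains `.catch()` on the return value would not see the error, while a caller using `try`/`await` would. The sketch below illustrates the difference with hypothetical stand-in classes (`SyncStub`, `AsyncStub`) and a trimmed `StubOptions` type; none of these names come from the PR.

```typescript
// Trimmed, hypothetical stand-in for TranscriptionOptions.
interface StubOptions {
  language?: string;
}

class SyncStub {
  // Mirrors the placeholder in this PR: throws before any promise exists.
  transcription(_options: StubOptions): Promise<string> {
    throw new Error("Method not implemented.");
  }
}

class AsyncStub {
  // Marking the stub async turns the throw into a rejected promise.
  async transcription(_options: StubOptions): Promise<string> {
    throw new Error("Method not implemented.");
  }
}

// Reports how a failure surfaced: as a synchronous throw or as a rejection.
async function probe(
  call: () => Promise<string>,
): Promise<"sync-throw" | "rejected" | "resolved"> {
  let pending: Promise<string> | undefined;
  try {
    pending = call();
  } catch {
    return "sync-throw";
  }
  try {
    await pending;
    return "resolved";
  } catch {
    return "rejected";
  }
}
```

If callers are expected to treat every provider uniformly, the `async` form keeps the failure inside the promise chain.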


async chat(options: ChatOptions) {
const messages = options.messages.map((v) => ({
role: v.role,
12 changes: 11 additions & 1 deletion app/client/platforms/anthropic.ts
@@ -1,5 +1,11 @@
import { Anthropic, ApiPath } from "@/app/constant";
-import { ChatOptions, getHeaders, LLMApi, SpeechOptions } from "../api";
+import {
+  ChatOptions,
+  getHeaders,
+  LLMApi,
+  SpeechOptions,
+  TranscriptionOptions,
+} from "../api";
import {
useAccessStore,
useAppConfig,
@@ -77,6 +83,10 @@ export class ClaudeApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines +86 to +88
Contributor

⚠️ Potential issue

Implement the transcription method.

The transcription method has been added to the ClaudeApi class, which is a good start for introducing transcription functionality. However, the method is currently not implemented.

To complete this feature:

  1. Implement the transcription logic using the Anthropic API or any other appropriate service.
  2. Handle potential errors and edge cases.
  3. Add unit tests to verify the functionality.

Would you like assistance in implementing this method or creating a GitHub issue to track this task?


extractMessage(res: any) {
console.log("[Response] claude response: ", res);

5 changes: 5 additions & 0 deletions app/client/platforms/baidu.ts
@@ -15,6 +15,7 @@ import {
LLMModel,
MultimodalContent,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import {
@@ -81,6 +82,10 @@ export class ErnieApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines +85 to +87
Contributor

⚠️ Potential issue

Implement the transcription method and add documentation.

The transcription method has been added with the correct signature, but it currently throws a "Method not implemented" error. Please implement the method to handle transcription requests. Additionally, consider adding documentation to explain the purpose and expected behavior of this method.

Would you like assistance in implementing the transcription method or creating documentation for it?


async chat(options: ChatOptions) {
const messages = options.messages.map((v) => ({
// "error_code": 336006, "error_msg": "the role of message with even index in the messages must be user or function",
5 changes: 5 additions & 0 deletions app/client/platforms/bytedance.ts
@@ -14,6 +14,7 @@ import {
LLMModel,
MultimodalContent,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import {
@@ -83,6 +84,10 @@ export class DoubaoApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

async chat(options: ChatOptions) {
const messages = options.messages.map((v) => ({
role: v.role,
5 changes: 5 additions & 0 deletions app/client/platforms/google.ts
@@ -6,6 +6,7 @@ import {
LLMModel,
LLMUsage,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import {
useAccessStore,
@@ -68,6 +69,10 @@ export class GeminiProApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

async chat(options: ChatOptions): Promise<void> {
const apiClient = this;
let multimodal = false;
5 changes: 5 additions & 0 deletions app/client/platforms/iflytek.ts
@@ -13,6 +13,7 @@ import {
LLMApi,
LLMModel,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import {
@@ -64,6 +65,10 @@ export class SparkApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines +68 to +70
Contributor

⚠️ Potential issue

Implement the transcription method.

The transcription method has been added with the correct signature, which is good. However, it currently throws a "Method not implemented" error.

To complete this feature:

  1. Implement the transcription logic using the options parameter.
  2. Ensure proper error handling.
  3. Add unit tests for this new method.
  4. Update any relevant documentation.

Would you like assistance in implementing this method or creating a task to track its implementation?


async chat(options: ChatOptions) {
const messages: ChatOptions["messages"] = [];
for (const v of options.messages) {
5 changes: 5 additions & 0 deletions app/client/platforms/moonshot.ts
@@ -20,6 +20,7 @@ import {
LLMApi,
LLMModel,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import { getClientConfig } from "@/app/config/client";
import { getMessageTextContent } from "@/app/utils";
@@ -63,6 +64,10 @@ export class MoonshotApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

async chat(options: ChatOptions) {
const messages: ChatOptions["messages"] = [];
for (const v of options.messages) {
42 changes: 42 additions & 0 deletions app/client/platforms/openai.ts
@@ -34,6 +34,7 @@ import {
LLMUsage,
MultimodalContent,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import { getClientConfig } from "@/app/config/client";
@@ -180,6 +181,47 @@ export class ChatGPTApi implements LLMApi {
}
}

async transcription(options: TranscriptionOptions): Promise<string> {
const formData = new FormData();
formData.append("file", options.file, "audio.wav");
formData.append("model", options.model ?? "whisper-1");
if (options.language) formData.append("language", options.language);
if (options.prompt) formData.append("prompt", options.prompt);
if (options.response_format)
formData.append("response_format", options.response_format);
if (options.temperature)
formData.append("temperature", options.temperature.toString());

console.log("[Request] openai audio transcriptions payload: ", options);

const controller = new AbortController();
options.onController?.(controller);

try {
const path = this.path(OpenaiPath.TranscriptionPath);
const headers = getHeaders(true);
const payload = {
method: "POST",
body: formData,
signal: controller.signal,
headers: headers,
};

// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
const json = await res.json();
return json.text;
Comment on lines +215 to +218
Contributor

⚠️ Potential issue

Add response status check before parsing JSON

In lines 215-218, the code parses the JSON response without first verifying that the request succeeded. If the request fails, res.json() might throw, or the parsed body might not contain the expected text property.

Consider checking res.ok before parsing the response and handle errors appropriately:

const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
+ if (!res.ok) {
+   const errorText = await res.text();
+   console.error(`[Response] Transcription request failed: ${errorText}`);
+   throw new Error(`Transcription request failed with status ${res.status}`);
+ }
const json = await res.json();
return json.text;
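
A sketch of the same check pulled into a standalone helper, so it can be exercised without a network call. The helper name `readTranscription`, the `ResponseLike` shape, and the `format` parameter are illustrative assumptions, not part of the PR. It also branches on the response format, on the assumption that the endpoint returns a plain-text body rather than JSON when `response_format` is `text`, `srt`, or `vtt`; in that case `res.json()` would fail even for a successful request.

```typescript
type ResponseFormat = "json" | "text" | "srt" | "verbose_json" | "vtt";

// Minimal structural view of a fetch Response; a real Response satisfies it.
interface ResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
  json(): Promise<unknown>;
}

// Hypothetical helper: validates the HTTP status, then reads the transcript.
async function readTranscription(
  res: ResponseLike,
  format: ResponseFormat = "json",
): Promise<string> {
  if (!res.ok) {
    const errorText = await res.text();
    throw new Error(
      `Transcription request failed with status ${res.status}: ${errorText}`,
    );
  }
  if (format === "json" || format === "verbose_json") {
    const body = await res.json();
    const text = (body as { text?: unknown } | null)?.text;
    if (typeof text !== "string") {
      throw new Error("Transcription response has no text field");
    }
    return text;
  }
  // Assumed plain-text formats: return the body as-is.
  return res.text();
}
```

In the method under review this would replace the last two lines of the `try` block, for example `return readTranscription(res, options.response_format)`.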

} catch (e) {
console.log("[Request] failed to make a audio transcriptions request", e);
throw e;
}
}
Comment on lines +184 to +223
Contributor

🛠️ Refactor suggestion

Enhance error handling and response parsing in the transcription method

The implementation of the transcription method looks good overall, but there are a few areas for improvement:

  1. Error handling could be more specific to provide better debugging information.
  2. The response parsing assumes the presence of a text property without checking.
  3. The payload's headers: headers property can be written with object shorthand.

I acknowledge the existing comment about these issues. To address them, consider applying the following improvements:

  1. Enhance error handling:
 } catch (e) {
-  console.log("[Request] failed to make a audio transcriptions request", e);
-  throw e;
+  console.error("[Request] failed to make an audio transcriptions request", e);
+  throw new Error(`Transcription request failed: ${e instanceof Error ? e.message : String(e)}`);
 }
  2. Add response status check and error handling:
 const res = await fetch(path, payload);
 clearTimeout(requestTimeoutId);
+if (!res.ok) {
+  throw new Error(`Transcription request failed with status ${res.status}`);
+}
 const json = await res.json();
-return json.text;
+return json.text ?? '';
  3. Use object shorthand for the headers property in the payload:
 const payload = {
   method: "POST",
   body: formData,
   signal: controller.signal,
-  headers: headers,
+  headers,
 };

These changes will improve the robustness and reliability of the transcription method.
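
One more edge case in the same method: `if (options.temperature)` is falsy for `0`, so an explicit temperature of 0 would never be sent. A minimal sketch of collecting the optional fields with an `undefined` check instead. `buildTranscriptionFields` and `TranscriptionFieldOptions` are hypothetical names; the helper returns plain key/value pairs that the caller would append to the `FormData` next to the file.

```typescript
// Hypothetical subset of TranscriptionOptions: everything except the file
// and the controller callback.
interface TranscriptionFieldOptions {
  model?: "whisper-1";
  language?: string;
  prompt?: string;
  response_format?: "json" | "text" | "srt" | "verbose_json" | "vtt";
  temperature?: number;
}

// Collects the non-file form fields as [name, value] pairs.
function buildTranscriptionFields(
  options: TranscriptionFieldOptions,
): Array<[string, string]> {
  const fields: Array<[string, string]> = [
    ["model", options.model ?? "whisper-1"],
  ];
  // Empty strings are skipped on purpose for the text fields.
  if (options.language) fields.push(["language", options.language]);
  if (options.prompt) fields.push(["prompt", options.prompt]);
  if (options.response_format) {
    fields.push(["response_format", options.response_format]);
  }
  // Compare against undefined so that a temperature of 0 is still sent.
  if (options.temperature !== undefined) {
    fields.push(["temperature", options.temperature.toString()]);
  }
  return fields;
}
```

Usage inside the method could look like `for (const [name, value] of buildTranscriptionFields(options)) formData.append(name, value);`.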

Suggested change
async transcription(options: TranscriptionOptions): Promise<string> {
const formData = new FormData();
formData.append("file", options.file, "audio.wav");
formData.append("model", options.model ?? "whisper-1");
if (options.language) formData.append("language", options.language);
if (options.prompt) formData.append("prompt", options.prompt);
if (options.response_format)
formData.append("response_format", options.response_format);
if (options.temperature)
formData.append("temperature", options.temperature.toString());
console.log("[Request] openai audio transcriptions payload: ", options);
const controller = new AbortController();
options.onController?.(controller);
try {
const path = this.path(OpenaiPath.TranscriptionPath);
const headers = getHeaders(true);
const payload = {
method: "POST",
body: formData,
signal: controller.signal,
headers,
};
// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
if (!res.ok) {
throw new Error(`Transcription request failed with status ${res.status}`);
}
const json = await res.json();
return json.text ?? '';
} catch (e) {
console.error("[Request] failed to make an audio transcriptions request", e);
throw new Error(`Transcription request failed: ${e instanceof Error ? e.message : String(e)}`);
}
}
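
A further observation on the timeout: both the original and the suggested version call `clearTimeout` only after `fetch` resolves, so if `fetch` rejects the timer is left to fire later. A sketch of a small wrapper that clears it in `finally`; `withTimeout` and its parameters are hypothetical names, and the wrapped task stands in for the `fetch` call.

```typescript
// Hypothetical helper: runs a task with an abort-on-timeout signal and
// always clears the timer, whether the task resolves or rejects.
async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  controller: AbortController = new AbortController(),
): Promise<T> {
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timeoutId);
  }
}
```

In the method above, the call might become `withTimeout((signal) => fetch(path, { ...payload, signal }), REQUEST_TIMEOUT_MS, controller)`, reusing the controller already passed to `options.onController`.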


async chat(options: ChatOptions) {
const modelConfig = {
...useAppConfig.getState().modelConfig,
5 changes: 5 additions & 0 deletions app/client/platforms/tencent.ts
@@ -9,6 +9,7 @@ import {
LLMModel,
MultimodalContent,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import {
@@ -93,6 +94,10 @@ export class HunyuanApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines 96 to +99
Contributor

🛠️ Refactor suggestion

Consider implementing both speech and transcription methods

Both the speech and transcription methods are currently unimplemented. To fully realize the speech-to-text and text-to-speech functionality mentioned in the PR summary, both methods should be implemented.

Would you like assistance in drafting implementations for both methods? This would ensure the full functionality of the new feature.


async chat(options: ChatOptions) {
const visionModel = isVisionModel(options.config.model);
const messages = options.messages.map((v, index) => ({
8 changes: 8 additions & 0 deletions app/components/chat.module.scss
@@ -75,6 +75,14 @@
pointer-events: none;
}

&.listening {
width: var(--full-width);
.text {
opacity: 1;
transform: translate(0);
}
}

&:hover {
--delay: 0.5s;
width: var(--full-width);
79 changes: 77 additions & 2 deletions app/components/chat.tsx
@@ -10,6 +10,8 @@ import React, {
} from "react";

import SendWhiteIcon from "../icons/send-white.svg";
import VoiceOpenIcon from "../icons/vioce-open.svg";
import VoiceCloseIcon from "../icons/vioce-close.svg";
Comment on lines +13 to +14
Contributor

⚠️ Potential issue

Fix typo in icon file paths

There's a typo in the file paths for the newly imported voice icons. "vioce" should be "voice".

Please apply the following changes:

-import VoiceOpenIcon from "../icons/vioce-open.svg";
-import VoiceCloseIcon from "../icons/vioce-close.svg";
+import VoiceOpenIcon from "../icons/voice-open.svg";
+import VoiceCloseIcon from "../icons/voice-close.svg";

import BrainIcon from "../icons/brain.svg";
import RenameIcon from "../icons/rename.svg";
import ExportIcon from "../icons/share.svg";
@@ -72,6 +74,7 @@ import {
isDalle3,
showPlugins,
safeLocalStorage,
isFirefox,
} from "../utils";

import { uploadImage as uploadImageRemote } from "@/app/utils/chat";
@@ -98,7 +101,9 @@ import {
import { useNavigate } from "react-router-dom";
import {
CHAT_PAGE_SIZE,
DEFAULT_STT_ENGINE,
DEFAULT_TTS_ENGINE,
FIREFOX_DEFAULT_STT_ENGINE,
ModelProvider,
Path,
REQUEST_TIMEOUT_MS,
@@ -117,6 +122,7 @@ import { MultimodalContent } from "../client/api";

import { ClientApi } from "../client/api";
import { createTTSPlayer } from "../utils/audio";
import { OpenAITranscriptionApi, WebTranscriptionApi } from "../utils/speech";
import { MsEdgeTTS, OUTPUT_FORMAT } from "../utils/ms_edge_tts";

import { isEmpty } from "lodash-es";
@@ -367,6 +373,7 @@ export function ChatAction(props: {
text: string;
icon: JSX.Element;
onClick: () => void;
isListening?: boolean;
}) {
const iconRef = useRef<HTMLDivElement>(null);
const textRef = useRef<HTMLDivElement>(null);
@@ -388,7 +395,9 @@ export function ChatAction(props: {

return (
<div
-className={`${styles["chat-input-action"]} clickable`}
+className={`${styles["chat-input-action"]} clickable ${
+  props.isListening ? styles["listening"] : ""
+}`}
onClick={() => {
props.onClick();
setTimeout(updateWidth, 1);
@@ -549,6 +558,61 @@ export function ChatActions(props: {
}
}, [chatStore, currentModel, models]);

const [isListening, setIsListening] = useState(false);
const [isTranscription, setIsTranscription] = useState(false);
const [speechApi, setSpeechApi] = useState<any>(null);
Contributor

🛠️ Refactor suggestion

Consider using a more specific type for speechApi state

The speechApi state is initialized with any type. Consider using a more specific type to improve type safety.

- const [speechApi, setSpeechApi] = useState<any>(null);
+ const [speechApi, setSpeechApi] = useState<WebTranscriptionApi | OpenAITranscriptionApi | null>(null);
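
The union type works as long as both classes expose the same methods. An alternative sketch is a shared interface that the state is typed against. The `SpeechApi` interface and the two classes below are hypothetical stand-ins whose shapes are inferred only from the `start()` and `stop()` calls in this diff, not from the real `../utils/speech` module.

```typescript
// Hypothetical shared contract, inferred from the start()/stop() calls.
interface SpeechApi {
  start(): Promise<void>;
  stop(): Promise<void>;
}

type OnTranscription = (transcript: string) => void;

// Stand-in for a browser-based recognizer; the real class is not shown here.
class FakeWebTranscriptionApi implements SpeechApi {
  private readonly onTranscription: OnTranscription;
  private readonly lang: string;
  private listening = false;

  constructor(onTranscription: OnTranscription, lang: string) {
    this.onTranscription = onTranscription;
    this.lang = lang;
  }

  async start(): Promise<void> {
    this.listening = true;
  }

  async stop(): Promise<void> {
    if (!this.listening) return;
    this.listening = false;
    this.onTranscription(`web transcript (${this.lang})`);
  }
}

// Stand-in for a server-backed recognizer.
class FakeRemoteTranscriptionApi implements SpeechApi {
  private readonly onTranscription: OnTranscription;
  private listening = false;

  constructor(onTranscription: OnTranscription) {
    this.onTranscription = onTranscription;
  }

  async start(): Promise<void> {
    this.listening = true;
  }

  async stop(): Promise<void> {
    if (!this.listening) return;
    this.listening = false;
    this.onTranscription("remote transcript");
  }
}

// Callers depend only on the interface, not on the concrete class.
function createSpeechApi(
  useWebApi: boolean,
  onTranscription: OnTranscription,
  lang: string,
): SpeechApi {
  return useWebApi
    ? new FakeWebTranscriptionApi(onTranscription, lang)
    : new FakeRemoteTranscriptionApi(onTranscription);
}
```

With such an interface the state could be declared as `useState<SpeechApi | null>(null)`, and adding a third engine would not require touching the state's type.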


useEffect(() => {
if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
const lang = config.sttConfig.lang;
setSpeechApi(
config.sttConfig.engine !== DEFAULT_STT_ENGINE
? new WebTranscriptionApi(
(transcription) => onRecognitionEnd(transcription),
lang,
)
: new OpenAITranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
),
);
}, []);
Comment on lines +565 to +578
Contributor

⚠️ Potential issue

Consider adding dependencies to useEffect

The useEffect hook initializing the speech API has an empty dependency array. This means it will only run once when the component mounts. If config.sttConfig.engine or config.sttConfig.lang changes, the effect won't re-run.

Consider adding the necessary dependencies:

  useEffect(() => {
    if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
    const lang = config.sttConfig.lang;
    setSpeechApi(
      config.sttConfig.engine !== DEFAULT_STT_ENGINE
        ? new WebTranscriptionApi(
            (transcription) => onRecognitionEnd(transcription),
            lang,
          )
        : new OpenAITranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          ),
    );
- }, []);
+ }, [config.sttConfig.engine, config.sttConfig.lang]);

Also, consider memoizing the onRecognitionEnd function with useCallback to prevent unnecessary re-creations of the speech API.
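
Related to the dependency list: the effect assigns to `config.sttConfig.engine` directly, which appears to mutate store state outside the store's own update function. A sketch of deriving the effective engine as a pure value instead, so it can be computed during render and listed as an effect dependency. `resolveSttEngine` is a hypothetical helper, and the two engine names are placeholders for `DEFAULT_STT_ENGINE` and `FIREFOX_DEFAULT_STT_ENGINE`, whose real values are not shown in this diff.

```typescript
// Placeholder values; the real constants live in the app's constant module.
const PLACEHOLDER_DEFAULT_ENGINE = "engine-default";
const PLACEHOLDER_FIREFOX_ENGINE = "engine-firefox";

// Returns the engine to use without modifying the configured value.
function resolveSttEngine(
  configuredEngine: string,
  runningInFirefox: boolean,
  firefoxEngine: string = PLACEHOLDER_FIREFOX_ENGINE,
): string {
  return runningInFirefox ? firefoxEngine : configuredEngine;
}

// Mirrors the branch in the effect: anything other than the default engine
// selects the web-based API in the code under review.
function usesWebApi(
  engine: string,
  defaultEngine: string = PLACEHOLDER_DEFAULT_ENGINE,
): boolean {
  return engine !== defaultEngine;
}
```

The effect could then depend on the resolved engine and `config.sttConfig.lang`, leaving the stored configuration untouched.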



function playSound(fileName: string) {
const audio = new Audio(fileName);
audio.play().catch((error) => {
console.error("error:", error);
});
}
Comment on lines +580 to +585
Contributor

🛠️ Refactor suggestion

Improve error handling in playSound function

The playSound function catches errors but only logs them to the console. Consider adding user-facing error handling:

 function playSound(fileName: string) {
   const audio = new Audio(fileName);
   audio.play().catch((error) => {
     console.error("error:", error);
+    showToast(Locale.Chat.AudioPlayError);
   });
 }

Also, consider adding a check to ensure the audio file exists before attempting to play it.

Committable suggestion was skipped due to low confidence.
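
A sketch of how the error path could be surfaced and tested, assuming the audio constructor and the notification function are passed in as parameters. `PlayableAudio`, `createAudio`, and `notify` are hypothetical names introduced for illustration; in the component they would correspond to `new Audio(src)` and `showToast`.

```typescript
// Minimal structural view of an HTMLAudioElement for this purpose.
interface PlayableAudio {
  play(): Promise<void>;
}

// Plays a sound and reports failure to the user instead of only logging it.
// Returns true when playback started, false otherwise.
async function playSound(
  fileName: string,
  createAudio: (src: string) => PlayableAudio,
  notify: (message: string) => void,
): Promise<boolean> {
  try {
    await createAudio(fileName).play();
    return true;
  } catch (error) {
    console.error("[Audio] failed to play", fileName, error);
    notify(`Could not play ${fileName}`);
    return false;
  }
}
```

In the browser this might be called as `playSound("/Recordingstart.mp3", (src) => new Audio(src), showToast)`.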


const startListening = async () => {
playSound("/Recordingstart.mp3");
showToast(Locale.Chat.StartSpeak);
if (speechApi) {
await speechApi.start();
setIsListening(true);
document.getElementById("chat-input")?.focus();
}
};
const stopListening = async () => {
showToast(Locale.Chat.CloseSpeak);
if (speechApi) {
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(true);
await speechApi.stop();
setIsListening(false);
}
playSound("/Recordingdone.mp3");
document.getElementById("chat-input")?.focus();
};
Comment on lines +587 to +606
Contributor

🛠️ Refactor suggestion

Improve focus management in startListening and stopListening

Both startListening and stopListening functions set focus to the chat input. This might not be ideal for all users, especially those using screen readers. Consider making this behavior configurable:

+ const [autoFocusAfterListening, setAutoFocusAfterListening] = useState(true);

 const startListening = async () => {
   // ... existing code ...
-  document.getElementById("chat-input")?.focus();
+  if (autoFocusAfterListening) {
+    document.getElementById("chat-input")?.focus();
+  }
 };

 const stopListening = async () => {
   // ... existing code ...
-  document.getElementById("chat-input")?.focus();
+  if (autoFocusAfterListening) {
+    document.getElementById("chat-input")?.focus();
+  }
 };

Add a setting in the user preferences to control this behavior.

Committable suggestion was skipped due to low confidence.

const onRecognitionEnd = (finalTranscript: string) => {
console.log(finalTranscript);
if (finalTranscript) {
props.setUserInput((prevInput) => prevInput + finalTranscript);
}
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(false);
};
Comment on lines +607 to +614
Contributor

⚠️ Potential issue

Remove console.log in production code

There's a console.log statement in the onRecognitionEnd function. Consider removing it or replacing it with a more appropriate logging mechanism for production code.

 const onRecognitionEnd = (finalTranscript: string) => {
-  console.log(finalTranscript);
   if (finalTranscript) {
     props.setUserInput((prevInput) => prevInput + finalTranscript);
   }
   if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
     setIsTranscription(false);
 };


return (
<div className={styles["chat-input-actions"]}>
{couldStop && (
@@ -783,6 +847,17 @@ export function ChatActions(props: {
icon={<ShortcutkeyIcon />}
/>
)}

{config.sttConfig.enable && (
<ChatAction
onClick={async () =>
isListening ? await stopListening() : await startListening()
}
text={isListening ? Locale.Chat.StopSpeak : Locale.Chat.StartSpeak}
icon={isListening ? <VoiceOpenIcon /> : <VoiceCloseIcon />}
isListening={isListening}
/>
)}
</div>
);
}
@@ -1508,7 +1583,7 @@ function _Chat() {
setAttachImages(images);
}

-// 快捷键 shortcut keys
+// 快捷键
const [showShortcutKeyModal, setShowShortcutKeyModal] = useState(false);

useEffect(() => {