Fix bug where image captioning is always invoked once an image caption provider is configured #3295

railgun19457 · 2025-11-03T16:31:06Z

fixes #3198

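Motivation

When an image caption provider is configured, images are sent through image captioning even if the current chat provider supports image input.
Image captioning, however, is meant for models that lack the image modality.
With this change, when an image message arrives, the code first checks whether the current chat provider supports images, and calls image captioning only when it does not.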

Modifications

packages/astrbot/process_llm_request.py

Added a check of the current provider's capabilities before image captioning is called.
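The decision itself can be sketched as a small pure function. This is an illustration only, not the code in this PR: `provider_supports_image` and `should_caption` are hypothetical names, and the config shape (a list of provider entries with `id` and `modalities` keys) is assumed from the debug logs below and from the snippet in the Sourcery review.

```python
from typing import Any, Mapping, Sequence


def provider_supports_image(
    providers: Sequence[Mapping[str, Any]], provider_id: str
) -> bool:
    """Return True if the entry with this ID lists the "image" modality."""
    for entry in providers:
        if entry.get("id") == provider_id:
            return "image" in entry.get("modalities", [])
    # Unknown provider: treat it as text-only so captioning still runs.
    return False


def should_caption(
    providers: Sequence[Mapping[str, Any]],
    provider_id: str,
    caption_provider_id: str,
    image_urls: Sequence[str],
) -> bool:
    """Caption only if a caption provider is set, the request has images,
    and the current chat provider cannot read images itself."""
    if not caption_provider_id or not image_urls:
        return False
    return not provider_supports_image(providers, provider_id)


# Two config snapshots mirroring the modalities shown in the logs.
with_image = [{"id": "NewAPI", "modalities": ["text", "tool_use", "image"]}]
text_only = [{"id": "NewAPI", "modalities": ["text", "tool_use"]}]

print(should_caption(with_image, "NewAPI", "NewApi-VL", ["img.jpg"]))  # False
print(should_caption(text_only, "NewAPI", "NewApi-VL", ["img.jpg"]))   # True
```

Treating an unknown provider as text-only is a design choice in this sketch: it falls back to the pre-change behaviour (always caption) rather than silently dropping the image.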

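Verification Steps

Set the chat provider to one that supports images, then to one that does not.
Image captioning is used or skipped depending on the provider's capabilities.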

Screenshots or Test Results

Case: the current provider supports image input

[00:14:16] [Core] [INFO] [core.event_bus:54]: [default] [webchat(webchat)] misaka/misaka: 她是谁 [图片]  
 [00:14:16] [Core] [DBUG] [waking_check.stage:128]: enabled_plugins_name: ['*']
 [00:14:16] [Core] [DBUG] [method.star_request:45]: plugin -> session_controller - handle_session_control_agent 
 [00:14:16] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_shutup - handle_message
 [00:14:16] [Core] [DBUG] [method.star_request:45]: plugin -> session_controller - handle_empty_mention
 [00:14:16] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_persona_plus - on_quick_switch_command
 [00:14:16] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_persona_plus - on_message
 [00:14:16] [Core] [DBUG] [process_stage.utils:59]: [知识库] 使用全局配置，知识库数量: 0
 [00:14:16] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMRequestEvent) -> astrbot - decorate_llm_req
 [00:14:16] [Core] [DBUG] [astrbot.process_llm_request:63]: Tool set for persona default: ['reminder', 'web_search_tavily', 'tavily_extract_web_page']
 [00:14:16] [Core] [DBUG] [astrbot.process_llm_request:165]: [IMG Caption] 当前 provider: <astrbot.core.provider.sources.openai_source.ProviderOpenAIOfficial object at 0x0000022E3E7A20B0>
 [00:14:16] [Core] [DBUG] [astrbot.process_llm_request:170]: [IMG Caption] Provider ID: NewAPI
 [00:14:16] [Core] [DBUG] [astrbot.process_llm_request:175]: [IMG Caption] 配置中的 provider 数量: 2
 [00:14:16] [Core] [DBUG] [astrbot.process_llm_request:178]: [IMG Caption] 所有 provider IDs: ['NewAPI', 'NewApi-VL'] 
 [00:14:16] [Core] [DBUG] [astrbot.process_llm_request:185]: [IMG Caption] 找到 provider 配置，modalities: ['text', 'tool_use', 'image']
 [00:14:16] [Core] [DBUG] [astrbot.process_llm_request:190]: [IMG Caption] Provider 支持图像能力，跳过图像转述
 [00:14:16] [Core] [DBUG] [astrbot.process_llm_request:207]: [IMG Caption] 当前 provider 支持图像，直接传递图片 URL
 [00:14:16] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMRequestEvent) -> astrbot-web-searcher - edit_web_search_tools
 [00:14:16] [Core] [DBUG] [method.llm_request:544]: handle provider[id: NewAPI] request: ProviderRequest(prompt=她是谁, session_id=webchat:FriendMessage:webchat!misaka!146cdcee-6e0b-4cda-88ea-36c675747670, image_count=1, func_tool=ToolSet(tools=[FuncTool(name=reminder, parameters={'type': 'object', 'properties': {'text': {'type': 'string', 'description': 'Must Required. The content of the reminder.'}, 'datetime_str': {'type': 'string', 'description': "Required when user's reminder is a single reminder. The datetime string of the reminder, Must format with %Y-%m-%d %H:%M"}, 'cron_expression': {'type': 'string', 'description': "Required when user's reminder is a repeated reminder. The cron expression of the reminder. Monday is 0 and Sunday is 6."}, 'human_readable_cron': {'type': 'string', 'description': 'Optional. The human readable cron expression of the reminder.'}}}, description=Call this function when user is asking for setting a reminder.)]), contexts=['user: 你好', 'assistant: 您好！有什么我可以帮助您的吗？'], system_prompt=
Current datetime: 2025-11-04 00:14 (中国标准时间)
, conversation_id=c87fc880-ba27-4266-b518-b9ab47eb1af9,
 [00:14:16] [Core] [DBUG] [runners.tool_loop_agent_runner:61]: Agent state transition: AgentState.IDLE -> AgentState.RUNNING
 [00:14:25] [Core] [DBUG] [sources.openai_source:148]: completion: ChatCompletion(id='chatcmpl-202511040014173765128pfKcyHH4', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='根据您提供的图片，这位动漫角色是**御坂美琴（Misaka Mikoto）**。\n\n她 是《魔法禁书目录》系列及其外传《某科学的超电磁炮》中的主要角色之一，以其“超电磁炮”的能力而闻名。她通常被描绘成棕色短发，头戴两个小花发饰，并且能够操 纵电磁力。', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content='**The Deduction**\n\nOkay, so I\'ve got this image of an anime girl, brown hair, white shirt, and those distinctive white flowers in her hair. And she\'s clearly manipulating electricity, likely projecting it from a coin. Hmm, this is practically a textbook case. I\'m seeing Misaka Mikoto, no doubt about it. The electromagnetism, the railgun signature move...it\'s all there. Definitely from "A Certain Magical Index" and "A Certain Scientific Railgun". Those flowers in her hair are the giveaway. Easy.\n'))], created=1762186465, model='gemini-2.5-flash', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=228, prompt_tokens=483, total_tokens=711, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=0, reasoning_tokens=140, rejected_prediction_tokens=None, text_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=225, image_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))
 [00:14:25] [Core] [DBUG] [runners.tool_loop_agent_runner:61]: Agent state transition: AgentState.RUNNING -> AgentState.DONE 
 [00:14:25] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMResponseEvent) -> thinking_filter - resp
 [00:14:25] [Core] [INFO] [respond.stage:163]: Prepare to send - misaka/misaka: 根据您提供的图片，这位动漫角色是**御坂美琴（Misaka Mikoto）**。      

她是《魔法禁书目录》系列及其外传《某科学的超电磁炮》中的主要角色之一，以其“超电磁炮”的能力而闻名。她通常被描绘成棕色短发，头戴两个小花发饰，并且能够 操纵电磁力。  
 [00:14:25] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnAfterMessageSentEvent) -> astrbot - after_llm_req
 [00:14:25] [Core] [DBUG] [pipeline.scheduler:84]: pipeline 执行完毕。

Case: the current provider does not support image input

 [00:15:34] [Core] [INFO] [core.event_bus:54]: [default] [webchat(webchat)] misaka/misaka: 她是谁 [图片]  
 [00:15:34] [Core] [DBUG] [waking_check.stage:128]: enabled_plugins_name: ['*']
 [00:15:34] [Core] [DBUG] [method.star_request:45]: plugin -> session_controller - handle_session_control_agent 
 [00:15:34] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_shutup - handle_message
 [00:15:34] [Core] [DBUG] [method.star_request:45]: plugin -> session_controller - handle_empty_mention
 [00:15:34] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_persona_plus - on_quick_switch_command
 [00:15:34] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_persona_plus - on_message
 [00:15:34] [Core] [DBUG] [process_stage.utils:59]: [知识库] 使用全局配置，知识库数量: 0
 [00:15:34] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMRequestEvent) -> astrbot - decorate_llm_req
 [00:15:34] [Core] [DBUG] [astrbot.process_llm_request:63]: Tool set for persona default: ['reminder', 'web_search_tavily', 'tavily_extract_web_page']
 [00:15:34] [Core] [DBUG] [astrbot.process_llm_request:165]: [IMG Caption] 当前 provider: <astrbot.core.provider.sources.openai_source.ProviderOpenAIOfficial object at 0x0000022E3E7A20B0>
 [00:15:34] [Core] [DBUG] [astrbot.process_llm_request:170]: [IMG Caption] Provider ID: NewAPI
 [00:15:34] [Core] [DBUG] [astrbot.process_llm_request:175]: [IMG Caption] 配置中的 provider 数量: 2
 [00:15:34] [Core] [DBUG] [astrbot.process_llm_request:178]: [IMG Caption] 所有 provider IDs: ['NewAPI', 'NewApi-VL']
 [00:15:34] [Core] [DBUG] [astrbot.process_llm_request:185]: [IMG Caption] 找到 provider 配置，modalities: ['text', 'tool_use']
 [00:15:34] [Core] [DBUG] [astrbot.process_llm_request:204]: [IMG Caption] 当前 provider 不支持图像，启用图像转述
 [00:15:34] [Core] [DBUG] [astrbot.process_llm_request:95]: Processing image caption with provider: NewApi-VL
 [00:16:00] [Core] [DBUG] [sources.openai_source:148]: completion: ChatCompletion(id='chatcmpl-20251104001535631465453rpCSXS4p', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='这是一张动漫风格的插画，描绘了一位年轻的女性角色，特写了她的上半身。\n\n画面中的女孩留着棕色的齐肩短发，左侧头发上别着两朵白色的小花。她的眼睛是棕色的，眼神锐利而坚定，眉毛紧蹙，嘴巴微微张开，似乎在喊叫或屏息凝神，表 情显得非常专注和充满力量，甚至带有一丝愤怒。\n\n她穿着一件白色的短袖衬衫，领口处隐约可见一件深色的背心或毛衣。她的右臂向前伸出，手掌朝向画面前方，食 指和拇指之间夹着一枚银色的硬币。硬币周围和她的手臂上环绕着明亮的蓝色电弧或闪电，暗示着她正在积蓄或释放强大的电能。\n\n背景是深色的，有蓝色的闪电图案 和光效，进一步烘托出紧张和充满能量的氛围。整个画面充满了动感和冲击力，角色似乎正准备将硬币以极快的速度发射出去。', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content='**Internal Monologue: Constructing the Chinese Description**\n\nOkay, so the task is to translate this image into Chinese, capturing all the nuances. First, I need to break it down. *Initial Scan:* Anime style, female, determined expression, action pose, electricity, dark background. Got it.\n\n*Detailed Breakdown - Character:* Young female, brown hair, flowers, intense eyes. She\'s shouting or gritting her teeth; focused, maybe a little angry. White shirt, possibly a school uniform, classic anime look. Right arm extended, holding something... that\'s the key.\n\n*Detailed Breakdown - Action/Object:* The object is a coin, held between thumb and index finger. She\'s about to flick it or launch it. And there\'s electricity! Blue arcs, lightning bolts. This is definitely hinting at a character who controls electricity - "Railgun", if I\'m familiar with the source.\n\n*Detailed Breakdown - Background/Atmosphere:* Dark background with blue lightning. Intense, dynamic, powerful. This sets the tone.\n\n*Synthesis:* Let\'s put this into a rough structure: Anime girl, brown hair, flowers. Angry/determined, open mouth. Reaching out, holding coin, electricity. Dark background, lightning. Like she\'s launching the coin.\n\n*Refine and Elaborate:* Now, let\'s make it flow in Chinese. 
Start with the style and subject, then build the description. *"This is an anime-style illustration of a young female character."* Describe her appearance: *"...brown, shoulder-length hair, with white flowers on the left."* *"...sharp, determined eyes."* Focus on the expression, the key element: *"...furrowed eyebrows, very focused and powerful, even with a hint of anger."* Clothing: "...white short-sleeved shirt..."\n\nNow, the action and the key element! *"...silver coin held between her index finger and thumb."* That\'s the visual trigger. And the electricity! *"...bright blue electric arcs... suggesting she\'s releasing powerful electrical energy."* The background builds the atmosphere: *"...dark...blue lightning patterns...enhancing the tense and energetic atmosphere."*\n\nFinally, the overall impression: *"...dynamism and impact...ready to launch the coin..."*\n\n*Final Review:* Read it again. Does it sound good? Accurate? No redundancies? Yep, it should work. This translation should capture the essence and the dynamic nature of the image.\n'))], created=1762186560, model='gemini-2.5-flash', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=1519, prompt_tokens=264, total_tokens=1783, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=0, reasoning_tokens=1287, rejected_prediction_tokens=None, text_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=6, image_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))
 [00:16:00] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMRequestEvent) -> astrbot-web-searcher - edit_web_search_tools 
 [00:16:00] [Core] [DBUG] [method.llm_request:544]: handle provider[id: NewAPI] request: ProviderRequest(prompt=(Image Caption: 这是一张动漫风格的插 画，描绘了一位年轻的女性角色，特写了她的上半身。

画面中的女孩留着棕色的齐肩短发，左侧头发上别着两朵白色的小花。她的眼睛是棕色的，眼神锐利而坚定，眉毛紧蹙，嘴巴微微张开，似乎在喊叫或屏息凝神，表情显 得非常专注和充满力量，甚至带有一丝愤怒。

她穿着一件白色的短袖衬衫，领口处隐约可见一件深色的背心或毛衣。她的右臂向前伸出，手掌朝向画面前方，食指和拇指之间夹着一枚银色的硬币。硬币周围和她的手 臂上环绕着明亮的蓝色电弧或闪电，暗示着她正在积蓄或释放强大的电能。

背景是深色的，有蓝色的闪电图案和光效，进一步烘托出紧张和充满能量的氛围。整个画面充满了动感和冲击力，角色似乎正准备将硬币以极快的速度发射出去。)      

她是谁, session_id=webchat:FriendMessage:webchat!misaka!79ae8263-ce8c-44d0-afeb-53d1d968ad0a, image_count=0, func_tool=ToolSet(tools=[FuncTool(name=reminder, parameters={'type': 'object', 'properties': {'text': {'type': 'string', 'description': 'Must Required. The content of the reminder.'}, 'datetime_str': {'type': 'string', 'description': "Required when user's reminder is a single reminder. The datetime string of the reminder, Must format with %Y-%m-%d %H:%M"}, 'cron_expression': {'type': 'string', 'description': "Required when user's reminder is a repeated reminder. The cron expression of the reminder. Monday is 0 and Sunday is 6."}, 'human_readable_cron': {'type': 'string', 'description': 'Optional. The human readable cron expression of the reminder.'}}}, description=Call this function when user is asking for setting a reminder.)]), contexts=['user: 你好', 'assistant: 您好！有 什么我可以帮助您的吗？'], system_prompt=
Current datetime: 2025-11-04 00:15 (中国标准时间)
, conversation_id=30c897c9-660c-4425-a0a3-7e629be8507b,
 [00:16:00] [Core] [DBUG] [runners.tool_loop_agent_runner:61]: Agent state transition: AgentState.IDLE -> AgentState.RUNNING
 [00:16:03] [Core] [DBUG] [sources.openai_source:148]: completion: ChatCompletion(id='chatcmpl-202511040016012517943743IjN3qVw', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='抱歉，我无法通过图像识别来判断她是谁。我是一个语言模型，无法处理图片信息。', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1762186563, model='gemini-2.5-flash', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=23, prompt_tokens=462, total_tokens=485, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=462, image_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))
 [00:16:03] [Core] [DBUG] [runners.tool_loop_agent_runner:61]: Agent state transition: AgentState.RUNNING -> AgentState.DONE 
 [00:16:03] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMResponseEvent) -> thinking_filter - resp
 [00:16:03] [Core] [INFO] [respond.stage:163]: Prepare to send - misaka/misaka: 抱歉，我无法通过图像识别来判断她是谁。我是一个语言模型，无法处理图片 信息。  
 [00:16:03] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnAfterMessageSentEvent) -> astrbot - after_llm_req
 [00:16:03] [Core] [DBUG] [pipeline.scheduler:84]: pipeline 执行完毕。
 [00:16:03] [Core] [DBUG] [method.llm_request:648]: WebChat 对话标题生成请求，清理后的文本: User: (Image Caption: 这是一张动漫风格的插画，描绘了一位 年轻的女性角色，特写了她的上半身。

画面中的女孩留着棕色的齐肩短发，左侧头发上别着两朵白色的小花。她的眼睛是棕色的，眼神锐利而坚定，眉毛紧蹙，嘴巴微微张开，似乎在喊叫或屏息凝神，表情显 得非常专注和充满力量，甚至带有一丝愤怒。

她穿着一件白色的短袖衬衫，领口处隐约可见一件深色的背心或毛衣。她的右臂向前伸出，手掌朝向画面前方，食指和拇指之间夹着一枚银色的硬币。硬币周围和她的手 臂上环绕着明亮的蓝色电弧或闪电，暗示着她正在积蓄或释放强大的电能。

背景是深色的，有蓝色的闪电图案和光效，进一步烘托出紧张和充满能量的氛围。整个画面充满了动感和冲击力，角色似乎正准备将硬币以极快的速度发射出去。)      

她是谁
[2025-11-04 00:16:03 +0800] [80268] [INFO] 127.0.0.1:63448 POST /api/chat/send 1.1 200 - 29049271
[2025-11-04 00:16:03 +0800] [80268] [INFO] 127.0.0.1:63448 GET /api/chat/conversations 1.1 200 564 5325
 [00:16:12] [Core] [DBUG] [sources.openai_source:148]: completion: ChatCompletion(id='chatcmpl-20251104001603984526579tThopuo1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='询问描述中动漫角色的身份', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content='**Identifying Anime Character from Description**\n\nMy thought process is focused on summarizing the user\'s intent. The user presented an image description and then asked, "她是谁?" (Who is she?). My task is to capture the essence of this question within a ten-word Chinese summary. Therefore, the core query is asking for character *identification* based on a visual description. After considering different phrasings, I\'ve arrived at "询问描述中动漫角色的身份" (Asking for the identity of the anime character in the description). This directly addresses the user\'s need, fits the word count, and accurately reflects the context.\n'))], created=1762186572, model='gemini-2.5-flash', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=645, prompt_tokens=310, total_tokens=955, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=0, reasoning_tokens=637, rejected_prediction_tokens=None, text_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=310, image_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))
 [00:16:12] [Core] [DBUG] [method.llm_request:660]: WebChat 对话标题生成响应: 询问描述中动漫角色的身份 
[2025-11-04 00:16:46 +0800] [80268] [INFO] 127.0.0.1:49668 POST /api/chat/post_image 1.1 200 94 9915
[2025-11-04 00:16:55 +0800] [80268] [INFO] 127.0.0.1:52283 GET /api/chat/get_file 1.1 200 481877 10022
 [00:16:55] [Core] [DBUG] [webchat.webchat_adapter:136]: WebChatAdapter: [Plain(type=<ComponentType.Plain: 'Plain'>, text='描述一下这张图片', convert=True), Image(type=<ComponentType.Image: 'Image'>, file='file:///E:\\Code\\Python\\AstrBot\\data\\webchat\\imgs\\eb621021-a698-4848-8e04-6ba50255a61f.jpg', subType=0, url='', cache=True, id=40000, c=2, path='E:\\Code\\Python\\AstrBot\\data\\webchat\\imgs\\eb621021-a698-4848-8e04-6ba50255a61f.jpg', file_unique='')]
 [00:16:55] [Core] [INFO] [core.event_bus:54]: [default] [webchat(webchat)] misaka/misaka: 描述一下这张图片 [图片]  
 [00:16:55] [Core] [DBUG] [waking_check.stage:128]: enabled_plugins_name: ['*']
 [00:16:55] [Core] [DBUG] [method.star_request:45]: plugin -> session_controller - handle_session_control_agent 
 [00:16:55] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_shutup - handle_message
 [00:16:55] [Core] [DBUG] [method.star_request:45]: plugin -> session_controller - handle_empty_mention
 [00:16:55] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_persona_plus - on_quick_switch_command
 [00:16:55] [Core] [DBUG] [method.star_request:45]: plugin -> astrbot_plugin_persona_plus - on_message
 [00:16:55] [Core] [DBUG] [process_stage.utils:59]: [知识库] 使用全局配置，知识库数量: 0
 [00:16:55] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMRequestEvent) -> astrbot - decorate_llm_req
 [00:16:55] [Core] [DBUG] [astrbot.process_llm_request:63]: Tool set for persona default: ['reminder', 'web_search_tavily', 'tavily_extract_web_page']
 [00:16:55] [Core] [DBUG] [astrbot.process_llm_request:165]: [IMG Caption] 当前 provider: <astrbot.core.provider.sources.openai_source.ProviderOpenAIOfficial object at 0x0000022E3E7A20B0>
 [00:16:55] [Core] [DBUG] [astrbot.process_llm_request:170]: [IMG Caption] Provider ID: NewAPI
 [00:16:55] [Core] [DBUG] [astrbot.process_llm_request:175]: [IMG Caption] 配置中的 provider 数量: 2
 [00:16:55] [Core] [DBUG] [astrbot.process_llm_request:178]: [IMG Caption] 所有 provider IDs: ['NewAPI', 'NewApi-VL']
 [00:16:55] [Core] [DBUG] [astrbot.process_llm_request:185]: [IMG Caption] 找到 provider 配置，modalities: ['text', 'tool_use']
 [00:16:55] [Core] [DBUG] [astrbot.process_llm_request:204]: [IMG Caption] 当前 provider 不支持图像，启用图像转述
 [00:16:55] [Core] [DBUG] [astrbot.process_llm_request:95]: Processing image caption with provider: NewApi-VL
 [00:17:17] [Core] [DBUG] [sources.openai_source:148]: completion: ChatCompletion(id='chatcmpl-20251104001655823595319QZMgjMVW', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='这张图片是一幅动漫风格的插画，描绘了一位年轻的女孩的特写。\n\n**主要人物：**\n*   **发型与发色：** 女孩留着棕色短发，发型略显蓬松，刘海向一侧梳理。左侧的头发上别着两朵白色的小花。\n*   **面部特征：** 她的眼睛大而有神 ，呈棕色，眼神坚定而锐利，眉毛微蹙，透露出专注、决心或一丝愤怒。嘴巴微张，仿佛在呐喊、用力或准备说些什么。\n*   **衣着：** 她穿着一件白色的短袖衬衫，领口敞开，里面似乎还有一件深色的马甲或制服的一部分。\n*   **动作与姿态：** 女孩的身体略微前倾，右臂向前伸出，指向画面前方（观众方向）。她的右手握着一枚硬币，硬币和她的手臂周围环绕着蓝色的电流或电弧，这些电光向前方延伸，充满动感。\n\n**背景与氛围：**\n*   背景是深色调，呈现出深蓝色或黑色的渐变。\n*   背景中散布着一些蓝色的闪电状光芒或裂纹，与女孩手中的电流相呼应，增强了画面的能量感和紧张气氛。\n\n**整体印象：**\n整幅画面充满力量感和动态，女孩的表情和动作都展现出她强大的能力和坚定的决心，仿佛正在施展某种电击或投掷硬币的特殊技能。', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content='**Breaking Down the Anime Image for a Chinese Description**\n\nAlright, let\'s get down to it. The user wants a Chinese description of this image, so I need to translate the visual data into words, ensuring it\'s comprehensive and accurate. First, I need to get a clear picture of the scene.\n\nMy initial impression? Anime style, young female character, looks determined or angry, and something electrical is happening. Dark background. That\'s the baseline. Now, I need to zoom in and dissect the details.\n\nThe character is key. Young female, probably a teenager, based on anime tropes. Short, brown, slightly messy hair. Two white flowers - possibly sakura - are pinned in her hair. Her eyes are large, brown, intense, and focused. Her expression is fierce, mouth open, like she\'s yelling or exerting effort. She’s wearing what looks like a white short-sleeved shirt, possibly part of a school uniform. Her pose is dynamic: leaning forward with her right arm extended towards the viewer. The object in her hand is a coin - maybe a 100-yen coin? 
Crucially, there are blue electrical arcs or lightning emanating from the coin and her hand, reaching towards us.\n\nThe background\'s important too. Dark blue or black with jagged blue lines, reinforcing the electric theme.\n\nNow, I\'m going to structure the description. Start with the overall style. Next, introduce the main character. Describe her appearance – hair, eyes, expression, and clothing. Then, focus on her action, emphasizing the coin and electrical powers. Finally, mention the background details and the overall mood.\n\nNow, it\'s time to translate. "Anime illustration" becomes *dòngmàn fēnggé de chāhuà*. "Young girl" is *yī wèi niánqīng de nǚhái*. I\'m thinking *tèxiě* for a close-up. I\'ll describe her hair as *zōngsè duǎnfà*, and add in the flowers: *Zuǒcè tóufà shàng biézhe liǎng duǒ báisè xiǎohuā*. For her expression, I have a few options: *Yǎnjīng dà ér yǒu shén, chéng zōngsè, yǎnshén jiāndìng ér ruìlì* or even *yǎnshén línglì*. The open mouth is *Zuǐbā wēi zhāng, fǎngfú zài nàhǎn huò yònglì*. The white shirt is *Chuānzhe báisè duǎnxiù chènshān*. I\'ll describe her pose as *Shēntǐ qiánqīng, yòubì xiàng qián shēnchū*. The coin in her hand is *Shǒuzhōng wòzhe yī méi yìngbì*. The crucial detail, the electric arcs, is *Yìngbì hé tā de shǒubì zhōuwéi huánràozhe lánsè de diànliú huò diànhú*.  The dark background with lightning becomes *Bèijǐng shì shēnsè diào, yǒu lánsè de shǎndiàn zhuàng guāngmáng huò lièwén*. The overall feeling needs to capture the image\'s dynamic energy: *Zhěngtǐ huàmiàn chōngmǎn dònggǎn hé lìliàng gǎn, zhǎnxiàn chū juésè qiángdà de nénglì hé jiāndìng de juéxīn*.\n\nOkay, time to review. Does the Chinese flow? Is everything accurate? I\'ve been careful to be specific; for instance, "electrical arcs" instead of just "lightning" is more precise. Describing the "school uniform style" adds extra context.  The flowers are a vital detail. And, of course, the coin needs to be mentioned. 
This systematic approach hopefully ensures a complete and descriptive translation.\n'))], created=1762186637, model='gemini-2.5-flash', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=1636, prompt_tokens=264, total_tokens=1900, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=0, reasoning_tokens=1287, rejected_prediction_tokens=None, text_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=6, image_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))
 [00:17:17] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMRequestEvent) -> astrbot-web-searcher - edit_web_search_tools 
 [00:17:17] [Core] [DBUG] [method.llm_request:544]: handle provider[id: NewAPI] request: ProviderRequest(prompt=(Image Caption: 这张图片是一幅动漫风 格的插画，描绘了一位年轻的女孩的特写。

**主要人物：**
*   **发型与发色：** 女孩留着棕色短发，发型略显蓬松，刘海向一侧梳理。左侧的头发上别着两朵白色的小花。
*   **面部特征：** 她的眼睛大而有神，呈棕色，眼神坚定而锐利，眉毛微蹙，透露出专注、决心或一丝愤怒。嘴巴微张，仿佛在呐喊、用力或准备说些什么。        
*   **衣着：** 她穿着一件白色的短袖衬衫，领口敞开，里面似乎还有一件深色的马甲或制服的一部分。
*   **动作与姿态：** 女孩的身体略微前倾，右臂向前伸出，指向画面前方（观众方向）。她的右手握着一枚硬币，硬币和她的手臂周围环绕着蓝色的电流或电弧，这些电光向前方延伸，充满动感。

**背景与氛围：**
*   背景是深色调，呈现出深蓝色或黑色的渐变。
*   背景中散布着一些蓝色的闪电状光芒或裂纹，与女孩手中的电流相呼应，增强了画面的能量感和紧张气氛。

**整体印象：**
整幅画面充满力量感和动态，女孩的表情和动作都展现出她强大的能力和坚定的决心，仿佛正在施展某种电击或投掷硬币的特殊技能。)

描述一下这张图片, session_id=webchat:FriendMessage:webchat!misaka!79ae8263-ce8c-44d0-afeb-53d1d968ad0a, image_count=0, func_tool=ToolSet(tools=[FuncTool(name=reminder, parameters={'type': 'object', 'properties': {'text': {'type': 'string', 'description': 'Must Required. The content of the reminder.'}, 'datetime_str': {'type': 'string', 'description': "Required when user's reminder is a single reminder. The datetime string of the reminder, Must format with %Y-%m-%d %H:%M"}, 'cron_expression': {'type': 'string', 'description': "Required when user's reminder is a repeated reminder. The cron expression of the reminder. Monday is 0 and Sunday is 6."}, 'human_readable_cron': {'type': 'string', 'description': 'Optional. The human readable cron expression of the reminder.'}}}, description=Call this function when user is asking for setting a reminder.)]), contexts=['user: 你好', 'assistant: 您好！有什么我可以帮助您的吗？', 'user: (Image Caption: 这是一张动漫风格的插画，描绘了一位年轻的女性角色，特写了她的上半身。\n\n画面中的女孩留着棕色的齐肩短发，左侧头发上别着两朵白色的小花。她的眼睛是棕色的，眼神锐利而坚定，眉毛紧蹙，嘴巴微微张开，似乎在喊叫或屏息凝神，表情显得非常专注和充满力量 ，甚至带有一丝愤怒。\n\n她穿着一件白色的短袖衬衫，领口处隐约可见一件深色的背心或毛衣。她的右臂向前伸出，手掌朝向画面前方，食指和拇指之间夹着一枚银色 的硬币。硬币周围和她的手臂上环绕着明亮的蓝色电弧或闪电，暗示着她正在积蓄或释放强大的电能。\n\n背景是深色的，有蓝色的闪电图案和光效，进一步烘托出紧张 和充满能量的氛围。整个画面充满了动感和冲击力，角色似乎正准备将硬币以极快的速度发射出去。)\n\n她是谁', 'assistant: 抱歉，我无法通过图像识别来判断她是 谁。我是一个语言模型，无法处理图片信息。'], system_prompt=
Current datetime: 2025-11-04 00:16 (中国标准时间)
, conversation_id=30c897c9-660c-4425-a0a3-7e629be8507b,
 [00:17:17] [Core] [DBUG] [runners.tool_loop_agent_runner:61]: Agent state transition: AgentState.IDLE -> AgentState.RUNNING
 [00:17:23] [Core] [DBUG] [sources.openai_source:148]: completion: ChatCompletion(id='chatcmpl-20251104001717786019589sDGwNLil', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='这张图片是一幅动漫风格的插画，特写了一位年轻的女性角色。她留着棕色的齐肩短发，左侧头发上别着两朵白色的小花。女孩的眼睛是棕色的，眼神锐利而坚定，眉毛紧蹙，嘴巴微微张开，似乎在喊叫或屏息凝神，表情显得非常专注、充满力量 ，甚至带有一丝愤怒。\n\n她穿着一件白色的短袖衬衫，领口处隐约可见一件深色的背心或马甲。她的右臂向前伸出，手掌朝向画面前方，食指和拇指之间夹着一枚银色 的硬币。硬币周围和她的手臂上环绕着明亮的蓝色电弧或闪电，暗示着她正在积蓄或释放强大的电能，仿佛正在施展某种电击或投掷硬币的特殊技能。\n\n背景是深色的 ，有蓝色的闪电图案和光效，进一步烘托出紧张和充满能量的氛围。整个画面充满了动感和冲击力，角色似乎正准备将硬币以极快的速度发射出去，展现出她强大的能力 和坚定的决心。', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content='**Here\'s how I\'m approaching this:**\n\nOkay, so I\'ve got a challenge. I\'m given these two detailed descriptions and I need to essentially "re-create" the image in my mind, even though I can\'t *see* it. My brain is going to have to do some serious work here, combining the information. I\'ll need to look for common threads, contradictions, and try to build a cohesive picture from these pieces. It\'s like having two expert witnesses, but needing to sift through their testimonies to get to the truth. Let\'s see...I need to be accurate and comprehensive, but also concise.  It\'s like distilling a complex chemical reaction into a precise equation. I can do this.\n'))], created=1762186643, model='gemini-2.5-flash', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=307, prompt_tokens=845, total_tokens=1152, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=0, reasoning_tokens=56, rejected_prediction_tokens=None, text_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0, text_tokens=845, image_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))
 [00:17:23] [Core] [DBUG] [runners.tool_loop_agent_runner:61]: Agent state transition: AgentState.RUNNING -> AgentState.DONE 
 [00:17:23] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnLLMResponseEvent) -> thinking_filter - resp
 [00:17:23] [Core] [INFO] [respond.stage:163]: Prepare to send - misaka/misaka: 这张图片是一幅动漫风格的插画，特写了一位年轻的女性角色。她留着棕色的 齐肩短发，左侧头发上别着两朵白色的小花。女孩的眼睛是棕色的，眼神锐利而坚定，眉毛紧蹙，嘴巴微微张开，似乎在喊叫或屏息凝神，表情显得非常专注、充满力量 ，甚至带有一丝愤怒。

她穿着一件白色的短袖衬衫，领口处隐约可见一件深色的背心或马甲。她的右臂向前伸出，手掌朝向画面前方，食指和拇指之间夹着一枚银色的硬币。硬币周围和她的手 臂上环绕着明亮的蓝色电弧或闪电，暗示着她正在积蓄或释放强大的电能，仿佛正在施展某种电击或投掷硬币的特殊技能。

背景是深色的，有蓝色的闪电图案和光效，进一步烘托出紧张和充满能量的氛围。整个画面充满了动感和冲击力，角色似乎正准备将硬币以极快的速度发射出去，展现出 她强大的能力和坚定的决心。  
 [00:17:23] [Core] [DBUG] [pipeline.context_utils:96]: hook(OnAfterMessageSentEvent) -> astrbot - after_llm_req
 [00:17:23] [Core] [DBUG] [pipeline.scheduler:84]: pipeline 执行完毕。

The logs also show that, for the same image, captioning gives far worse results than the model's native image recognition.
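To make the captioning path in the second log concrete: the text-only provider receives a request whose prompt is prefixed with `(Image Caption: ...)` and whose `image_count` is 0. The sketch below reproduces that rewrite. `Request` and `apply_caption` are hypothetical names rather than AstrBot's API, and the prompt format is inferred only from the logged `ProviderRequest`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    """Hypothetical stand-in for the request object seen in the logs."""

    prompt: str
    image_urls: List[str] = field(default_factory=list)


def apply_caption(req: Request, caption: str) -> Request:
    """Fold the caption into the prompt and drop the images, so a text-only
    model sees a description instead of the picture."""
    return Request(
        prompt=f"(Image Caption: {caption})\n\n{req.prompt}",
        image_urls=[],
    )


original = Request(prompt="她是谁", image_urls=["file:///tmp/example.jpg"])
rewritten = apply_caption(original, "An anime-style illustration of a girl.")
print(rewritten.prompt)
print(len(rewritten.image_urls))  # 0
```

Because the chat model only ever sees the caption text, anything the caption omits (here, the character's name) cannot be recovered downstream, which matches the refusal in the second log.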

Compatibility & Breaking Changes

This is a breaking change.
This is NOT a breaking change.

Checklist

😊 If there are new features added in the PR, I have discussed them with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
😮 My changes do not introduce malicious code.

Summary by Sourcery

Check the current LLM provider's image support before invoking image captioning and only call it when the provider lacks image capability.

Bug Fixes:

Invoke image captioning only when the current provider does not support image modality

Enhancements:

Add debug logging for current provider ID, configured modalities, and image caption decision

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `packages/astrbot/process_llm_request.py:159` </location>
<code_context>

             # image caption
+            # 只有当前 provider 不支持图像时才使用图像转述
             if img_cap_prov_id and req.image_urls:
-                await self._ensure_img_caption(req, cfg, img_cap_prov_id)
+                current_provider = self.ctx.get_using_provider(
</code_context>

<issue_to_address>
**issue (complexity):** Consider refactoring the provider image support check into a helper method to simplify the main logic flow.

```markdown
Consider pulling all of that nesting, config‐lookup and logging into one small helper, e.g. `_provider_supports_image()`. Then your main flow is just one `if`/`await` and you remove a ton of boilerplate.

Example:

```python
def _provider_supports_image(self, event) -> bool:
    provider = self.ctx.get_using_provider(umo=event.unified_msg_origin)
    if not provider:
        logger.debug("[IMG Caption] no provider → will caption")
        return False

    try:
        pid = provider.meta().id
        cfg = self.ctx.get_config(umo=event.unified_msg_origin)
        for p in cfg.get("provider", []):
            if p.get("id") == pid and "image" in p.get("modalities", []):
                logger.debug(f"[IMG Caption] {pid} supports image → skip caption")
                return True
        logger.debug(f"[IMG Caption] {pid} does not support image")
    except Exception as e:
        logger.warning(f"[IMG Caption] failed to check provider support: {e}")

    return False
```

Then replace the big block with:

```python
if img_cap_prov_id and req.image_urls:
    if not self._provider_supports_image(event):
        await self._ensure_img_caption(req, cfg, img_cap_prov_id)
```

This keeps exactly the same logic and logging but collapses all branching into one focused helper.
</issue_to_address>

sourcery-ai · 2025-11-03T17:03:28Z

packages/astrbot/process_llm_request.py


            # image caption
+            # 只有当前 provider 不支持图像时才使用图像转述
            if img_cap_prov_id and req.image_urls:


issue (complexity): Consider refactoring the provider image support check into a helper method to simplify the main logic flow.

Consider pulling all of that nesting, config-lookup and logging into one small helper, e.g. `_provider_supports_image()`. Then your main flow is just one `if`/`await` and you remove a ton of boilerplate.

Example:

```python
def _provider_supports_image(self, event) -> bool:
    provider = self.ctx.get_using_provider(umo=event.unified_msg_origin)
    if not provider:
        logger.debug("[IMG Caption] no provider → will caption")
        return False

    try:
        pid = provider.meta().id
        cfg = self.ctx.get_config(umo=event.unified_msg_origin)
        for p in cfg.get("provider", []):
            if p.get("id") == pid and "image" in p.get("modalities", []):
                logger.debug(f"[IMG Caption] {pid} supports image → skip caption")
                return True
        logger.debug(f"[IMG Caption] {pid} does not support image")
    except Exception as e:
        logger.warning(f"[IMG Caption] failed to check provider support: {e}")

    return False
```

Then replace the big block with:

```python
if img_cap_prov_id and req.image_urls:
    if not self._provider_supports_image(event):
        await self._ensure_img_caption(req, cfg, img_cap_prov_id)
```

This keeps exactly the same logic and logging but collapses all branching into one focused helper.
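
To make the suggested control flow easier to try out in isolation, here is a minimal, self-contained sketch. `FakeProvider`, `Handler`, the `captioned` flag, and `run` are hypothetical stand-ins introduced only for illustration; they are not AstrBot classes, and only the shape of the `if`/`await` guard mirrors the suggestion above.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeProvider:
    """Hypothetical stand-in for a chat provider; only `modalities` matters here."""
    modalities: list = field(default_factory=list)


class Handler:
    """Hypothetical stand-in for the class that owns the caption logic."""

    def __init__(self, provider):
        self.provider = provider
        self.captioned = False  # records whether captioning ran

    def _provider_supports_image(self) -> bool:
        # No provider at all: fall back to captioning.
        if self.provider is None:
            return False
        return "image" in self.provider.modalities

    async def _ensure_img_caption(self) -> None:
        # Placeholder for the real captioning call.
        self.captioned = True

    async def handle(self, img_cap_prov_id: str, image_urls: list) -> None:
        # Caption only if a caption provider is configured, images are present,
        # and the current provider cannot take images directly.
        if img_cap_prov_id and image_urls:
            if not self._provider_supports_image():
                await self._ensure_img_caption()


def run(provider, img_cap_prov_id="cap", image_urls=("u",)) -> bool:
    """Drive one request through the handler and report whether captioning ran."""
    handler = Handler(provider)
    asyncio.run(handler.handle(img_cap_prov_id, list(image_urls)))
    return handler.captioned
```

In this sketch, captioning runs for a text-only provider or when no provider is available, and is skipped when the provider lists `image` or when no caption provider ID is configured.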

Copilot

Pull Request Overview

This PR improves the image caption functionality by making it conditional on the provider's image support capabilities. Instead of unconditionally using image captioning when configured, the code now checks whether the current provider supports images natively before falling back to image captioning.

Key changes:

- Added logic to check if the current provider supports image modality by inspecting its configuration
- Image captioning is now only used when the provider doesn't support images natively
- Added extensive debug logging to trace the provider capability detection process

Copilot · 2025-11-03T17:47:51Z

packages/astrbot/process_llm_request.py

+                        # 从配置中查找当前 provider 的配置
+                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
+                        providers_list = full_cfg.get("provider", [])
+                        logger.debug(
+                            f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}"
+                        )
+                        logger.debug(
+                            f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}"
+                        )
+
+                        for provider_cfg in providers_list:
+                            if provider_cfg.get("id") == provider_id:
+                                modalities = provider_cfg.get("modalities", [])
+                                logger.debug(
+                                    f"[IMG Caption] 找到 provider 配置，modalities: {modalities}"
+                                )
+                                if "image" in modalities:
+                                    provider_supports_image = True
+                                    logger.debug(
+                                        "[IMG Caption] Provider 支持图像能力，跳过图像转述"
+                                    )
+                                break
+                        else:
+                            logger.debug(
+                                f"[IMG Caption] 未找到 provider_id={provider_id} 的配置"


This duplicates existing provider modality checking logic. The same functionality exists in `astrbot/core/pipeline/process_stage/method/llm_request.py` (lines 510-516), where `provider.provider_config.get('modalities', ['image'])` is used. Consider using `current_provider.provider_config.get('modalities', [])` directly instead of searching through the config list, which would simplify this code significantly.

Suggested change

-                        # 从配置中查找当前 provider 的配置
-                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
-                        providers_list = full_cfg.get("provider", [])
-                        logger.debug(
-                            f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}"
-                        )
-                        logger.debug(
-                            f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}"
-                        )
-
-                        for provider_cfg in providers_list:
-                            if provider_cfg.get("id") == provider_id:
-                                modalities = provider_cfg.get("modalities", [])
-                                logger.debug(
-                                    f"[IMG Caption] 找到 provider 配置，modalities: {modalities}"
-                                )
-                                if "image" in modalities:
-                                    provider_supports_image = True
-                                    logger.debug(
-                                        "[IMG Caption] Provider 支持图像能力，跳过图像转述"
-                                    )
-                                break
-                        else:
-                            logger.debug(
-                                f"[IMG Caption] 未找到 provider_id={provider_id} 的配置"
+                        # 直接从当前 provider 的配置获取 modalities
+                        modalities = current_provider.provider_config.get("modalities", [])
+                        logger.debug(
+                            f"[IMG Caption] 当前 provider 配置 modalities: {modalities}"
+                        )
+                        if "image" in modalities:
+                            provider_supports_image = True
+                            logger.debug(
+                                "[IMG Caption] Provider 支持图像能力，跳过图像转述"
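
The comment above quotes two different fallbacks for a missing `modalities` key: `['image']` in the existing pipeline code and `[]` in the suggested change. The sketch below shows how that default decides the outcome for a provider whose config has no `modalities` entry. `supports_image` is a hypothetical standalone function and the config dicts are illustrative; the `NewAPI` modalities list is taken from the log output in the PR description.

```python
def supports_image(provider_config: dict, default_modalities=()) -> bool:
    """Return True if the provider config lists the "image" modality.

    `default_modalities` is consulted only when the "modalities" key is absent.
    """
    modalities = provider_config.get("modalities", list(default_modalities))
    return "image" in modalities


# Config that declares its modalities explicitly (values from the PR log).
new_api = {"id": "NewAPI", "modalities": ["text", "tool_use", "image"]}
# Hypothetical config that omits the key entirely.
legacy = {"id": "legacy"}

explicit_result = supports_image(new_api)
# With an empty default, a missing key means "no image support" (captioning is used).
legacy_with_empty_default = supports_image(legacy)
# With an ["image"] default, the same config is treated as image-capable.
legacy_with_image_default = supports_image(legacy, default_modalities=("image",))
```

Whichever default is chosen, using the same one in both places would keep the caption decision consistent with the existing modality check.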

Copilot · 2025-11-03T17:47:52Z

packages/astrbot/process_llm_request.py

+                logger.debug(f"[IMG Caption] 当前 provider: {current_provider}")
+
+                if current_provider:
+                    try:
+                        provider_id = current_provider.meta().id
+                        logger.debug(f"[IMG Caption] Provider ID: {provider_id}")
+
+                        # 从配置中查找当前 provider 的配置
+                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
+                        providers_list = full_cfg.get("provider", [])
+                        logger.debug(
+                            f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}"
+                        )
+                        logger.debug(
+                            f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}"
+                        )
+
+                        for provider_cfg in providers_list:
+                            if provider_cfg.get("id") == provider_id:
+                                modalities = provider_cfg.get("modalities", [])
+                                logger.debug(
+                                    f"[IMG Caption] 找到 provider 配置，modalities: {modalities}"
+                                )
+                                if "image" in modalities:
+                                    provider_supports_image = True
+                                    logger.debug(
+                                        "[IMG Caption] Provider 支持图像能力，跳过图像转述"
+                                    )
+                                break
+                        else:
+                            logger.debug(
+                                f"[IMG Caption] 未找到 provider_id={provider_id} 的配置"
+                            )
+                    except Exception as e:
+                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
+                else:
+                    logger.debug("[IMG Caption] 当前没有 provider")


[nitpick] Excessive debug logging clutters the code and may impact performance. Consider reducing the number of debug statements to only the most critical decision points (e.g., lines 204 and 207-209), or consolidate multiple debug logs into fewer, more informative messages. Debug logs at lines 165, 175-180, 185-187, 195-197, and 201 could be removed or reduced.

Suggested change

-                logger.debug(f"[IMG Caption] 当前 provider: {current_provider}")
-
-                if current_provider:
-                    try:
-                        provider_id = current_provider.meta().id
-                        logger.debug(f"[IMG Caption] Provider ID: {provider_id}")
-
-                        # 从配置中查找当前 provider 的配置
-                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
-                        providers_list = full_cfg.get("provider", [])
-                        logger.debug(
-                            f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}"
-                        )
-                        logger.debug(
-                            f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}"
-                        )
-
-                        for provider_cfg in providers_list:
-                            if provider_cfg.get("id") == provider_id:
-                                modalities = provider_cfg.get("modalities", [])
-                                logger.debug(
-                                    f"[IMG Caption] 找到 provider 配置，modalities: {modalities}"
-                                )
-                                if "image" in modalities:
-                                    provider_supports_image = True
-                                    logger.debug(
-                                        "[IMG Caption] Provider 支持图像能力，跳过图像转述"
-                                    )
-                                break
-                        else:
-                            logger.debug(
-                                f"[IMG Caption] 未找到 provider_id={provider_id} 的配置"
-                            )
-                    except Exception as e:
-                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
-                else:
-                    logger.debug("[IMG Caption] 当前没有 provider")
+                # Removed excessive debug logging; retain only critical decision point logs below.
+                if current_provider:
+                    try:
+                        provider_id = current_provider.meta().id
+                        # Removed: logger.debug(f"[IMG Caption] Provider ID: {provider_id}")
+                        # 从配置中查找当前 provider 的配置
+                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
+                        providers_list = full_cfg.get("provider", [])
+                        # Removed: logger.debug(f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}")
+                        # Removed: logger.debug(f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}")
+                        for provider_cfg in providers_list:
+                            if provider_cfg.get("id") == provider_id:
+                                modalities = provider_cfg.get("modalities", [])
+                                # Removed: logger.debug(f"[IMG Caption] 找到 provider 配置，modalities: {modalities}")
+                                if "image" in modalities:
+                                    provider_supports_image = True
+                                    # Removed: logger.debug("[IMG Caption] Provider 支持图像能力，跳过图像转述")
+                                break
+                        else:
+                            # Removed: logger.debug(f"[IMG Caption] 未找到 provider_id={provider_id} 的配置")
+                            pass  # keeps the now-empty else branch syntactically valid
+                    except Exception as e:
+                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
+                else:
+                    # Removed: logger.debug("[IMG Caption] 当前没有 provider")
+                    pass  # keeps the now-empty else branch syntactically valid

Copilot · 2025-11-03T17:47:52Z

packages/astrbot/process_llm_request.py

+                logger.debug(f"[IMG Caption] 当前 provider: {current_provider}")
+
+                if current_provider:
+                    try:
+                        provider_id = current_provider.meta().id
+                        logger.debug(f"[IMG Caption] Provider ID: {provider_id}")
+
+                        # 从配置中查找当前 provider 的配置
+                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
+                        providers_list = full_cfg.get("provider", [])
+                        logger.debug(
+                            f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}"
+                        )
+                        logger.debug(
+                            f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}"
+                        )
+
+                        for provider_cfg in providers_list:
+                            if provider_cfg.get("id") == provider_id:
+                                modalities = provider_cfg.get("modalities", [])
+                                logger.debug(
+                                    f"[IMG Caption] 找到 provider 配置，modalities: {modalities}"
+                                )
+                                if "image" in modalities:
+                                    provider_supports_image = True
+                                    logger.debug(
+                                        "[IMG Caption] Provider 支持图像能力，跳过图像转述"
+                                    )
+                                break
+                        else:
+                            logger.debug(
+                                f"[IMG Caption] 未找到 provider_id={provider_id} 的配置"
+                            )
+                    except Exception as e:
+                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
+                else:
+                    logger.debug("[IMG Caption] 当前没有 provider")


[nitpick] Excessive debug logging clutters the code and may impact performance. Consider reducing the number of debug statements to only the most critical decision points (e.g., lines 204 and 207-209), or consolidate multiple debug logs into fewer, more informative messages. Debug logs at lines 165, 175-180, 185-187, 195-197, and 201 could be removed or reduced.

Suggested change

-                logger.debug(f"[IMG Caption] 当前 provider: {current_provider}")
-
-                if current_provider:
-                    try:
-                        provider_id = current_provider.meta().id
-                        logger.debug(f"[IMG Caption] Provider ID: {provider_id}")
-
-                        # 从配置中查找当前 provider 的配置
-                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
-                        providers_list = full_cfg.get("provider", [])
-                        logger.debug(
-                            f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}"
-                        )
-                        logger.debug(
-                            f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}"
-                        )
-
-                        for provider_cfg in providers_list:
-                            if provider_cfg.get("id") == provider_id:
-                                modalities = provider_cfg.get("modalities", [])
-                                logger.debug(
-                                    f"[IMG Caption] 找到 provider 配置，modalities: {modalities}"
-                                )
-                                if "image" in modalities:
-                                    provider_supports_image = True
-                                    logger.debug(
-                                        "[IMG Caption] Provider 支持图像能力，跳过图像转述"
-                                    )
-                                break
-                        else:
-                            logger.debug(
-                                f"[IMG Caption] 未找到 provider_id={provider_id} 的配置"
-                            )
-                    except Exception as e:
-                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
-                else:
-                    logger.debug("[IMG Caption] 当前没有 provider")
+                # Removed excessive debug logs; only log at decision points below
+                if current_provider:
+                    try:
+                        provider_id = current_provider.meta().id
+                        # (Optional) Could log provider_id here if needed for troubleshooting
+                        # 从配置中查找当前 provider 的配置
+                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
+                        providers_list = full_cfg.get("provider", [])
+                        # Removed debug logs for provider list details
+                        for provider_cfg in providers_list:
+                            if provider_cfg.get("id") == provider_id:
+                                modalities = provider_cfg.get("modalities", [])
+                                # Removed debug log for modalities
+                                if "image" in modalities:
+                                    provider_supports_image = True
+                                    # Removed debug log for image capability
+                                break
+                        else:
+                            # Removed debug log for missing provider config
+                            pass  # keeps the now-empty else branch syntactically valid
+                    except Exception as e:
+                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
+                else:
+                    # Removed debug log for missing provider
+                    pass  # keeps the now-empty else branch syntactically valid

Copilot · 2025-11-03T17:47:53Z

packages/astrbot/process_llm_request.py

+                logger.debug(f"[IMG Caption] 当前 provider: {current_provider}")
+
+                if current_provider:
+                    try:
+                        provider_id = current_provider.meta().id
+                        logger.debug(f"[IMG Caption] Provider ID: {provider_id}")
+
+                        # 从配置中查找当前 provider 的配置
+                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
+                        providers_list = full_cfg.get("provider", [])
+                        logger.debug(
+                            f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}"
+                        )
+                        logger.debug(
+                            f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}"
+                        )
+
+                        for provider_cfg in providers_list:
+                            if provider_cfg.get("id") == provider_id:
+                                modalities = provider_cfg.get("modalities", [])
+                                logger.debug(
+                                    f"[IMG Caption] 找到 provider 配置，modalities: {modalities}"
+                                )
+                                if "image" in modalities:
+                                    provider_supports_image = True
+                                    logger.debug(
+                                        "[IMG Caption] Provider 支持图像能力，跳过图像转述"
+                                    )
+                                break
+                        else:
+                            logger.debug(
+                                f"[IMG Caption] 未找到 provider_id={provider_id} 的配置"
+                            )
+                    except Exception as e:
+                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
+                else:
+                    logger.debug("[IMG Caption] 当前没有 provider")


[nitpick] Excessive debug logging clutters the code and may impact performance. Consider reducing the number of debug statements to only the most critical decision points (e.g., lines 204 and 207-209), or consolidate multiple debug logs into fewer, more informative messages. Debug logs at lines 165, 175-180, 185-187, 195-197, and 201 could be removed or reduced.

Suggested change

-                logger.debug(f"[IMG Caption] 当前 provider: {current_provider}")
-
-                if current_provider:
-                    try:
-                        provider_id = current_provider.meta().id
-                        logger.debug(f"[IMG Caption] Provider ID: {provider_id}")
-
-                        # 从配置中查找当前 provider 的配置
-                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
-                        providers_list = full_cfg.get("provider", [])
-                        logger.debug(
-                            f"[IMG Caption] 配置中的 provider 数量: {len(providers_list)}"
-                        )
-                        logger.debug(
-                            f"[IMG Caption] 所有 provider IDs: {[p.get('id') for p in providers_list]}"
-                        )
-
-                        for provider_cfg in providers_list:
-                            if provider_cfg.get("id") == provider_id:
-                                modalities = provider_cfg.get("modalities", [])
-                                logger.debug(
-                                    f"[IMG Caption] 找到 provider 配置，modalities: {modalities}"
-                                )
-                                if "image" in modalities:
-                                    provider_supports_image = True
-                                    logger.debug(
-                                        "[IMG Caption] Provider 支持图像能力，跳过图像转述"
-                                    )
-                                break
-                        else:
-                            logger.debug(
-                                f"[IMG Caption] 未找到 provider_id={provider_id} 的配置"
-                            )
-                    except Exception as e:
-                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
-                else:
-                    logger.debug("[IMG Caption] 当前没有 provider")
+                # Removed excessive debug logs; only log at decision points below
+                if current_provider:
+                    try:
+                        provider_id = current_provider.meta().id
+                        # 从配置中查找当前 provider 的配置
+                        full_cfg = self.ctx.get_config(umo=event.unified_msg_origin)
+                        providers_list = full_cfg.get("provider", [])
+                        for provider_cfg in providers_list:
+                            if provider_cfg.get("id") == provider_id:
+                                modalities = provider_cfg.get("modalities", [])
+                                if "image" in modalities:
+                                    provider_supports_image = True
+                                break
+                    except Exception as e:
+                        logger.warning(f"[IMG Caption] 获取 provider 信息失败: {e}")
+                # No need to log if no provider; decision point below will log outcome

Copilot · 2025-11-03T17:47:53Z

packages/astrbot/process_llm_request.py

+                    logger.debug("[IMG Caption] 当前没有 provider")
+
+                if not provider_supports_image:
+                    logger.debug("[IMG Caption] 当前 provider 不支持图像，启用图像转述")
+                    await self._ensure_img_caption(req, cfg, img_cap_prov_id)
+                else:
+                    logger.debug(
+                        "[IMG Caption] 当前 provider 支持图像，直接传递图片 URL"
+                    )


[nitpick] Excessive debug logging clutters the code and may impact performance. Consider reducing the number of debug statements to only the most critical decision points (e.g., lines 204 and 207-209), or consolidate multiple debug logs into fewer, more informative messages. Debug logs at lines 165, 175-180, 185-187, 195-197, and 201 could be removed or reduced.
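
One way to act on the "consolidate multiple debug logs" advice is to emit a single message at the decision point. The sketch below is only an illustration using Python's standard `logging` module; `decide_caption` and the logger name `img_caption_demo` are hypothetical and do not appear in the PR.

```python
import logging

logger = logging.getLogger("img_caption_demo")  # hypothetical logger name


def decide_caption(provider_id, modalities) -> bool:
    """Return True when captioning should run, logging the decision once."""
    # A missing provider or missing modalities list falls back to captioning.
    use_caption = "image" not in (modalities or [])
    # One message at the decision point. Passing values as %-style arguments
    # defers string formatting until the record is actually emitted, so the
    # call stays cheap when DEBUG is disabled.
    logger.debug(
        "[IMG Caption] provider=%s modalities=%s -> %s",
        provider_id,
        modalities,
        "caption" if use_caption else "pass images through",
    )
    return use_caption
```

This keeps the provider ID, the configured modalities, and the outcome in one line, which covers the information spread across the separate debug statements in the hunk above.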

Fix bug where normal conversations automatically invoked image captioning

219a385

railgun19457 marked this pull request as ready for review November 3, 2025 17:02

auto-assign bot requested review from LIghtJUNction and Soulter November 3, 2025 17:02

sourcery-ai bot reviewed Nov 3, 2025

LIghtJUNction requested a review from Copilot November 3, 2025 17:44

Copilot AI reviewed Nov 3, 2025
