When the knowledge base performs segmentation analysis, pictures in the document are lost #11134

zlm0001 · 2024-11-26T09:46:47Z

Self Checks

This is only for bug reports; if you would like to ask a question, please head to Discussions.
I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

Dify version

0.12.1

Cloud or Self Hosted

Cloud

Steps to reproduce

When the knowledge base performs segmentation analysis, pictures in the document are lost

✔️ Expected Behavior

When the knowledge base performs segmentation parsing, the pictures in the document should be attached to the context at the same position where they appear in the original text. That way, the pictures can also be retrieved as knowledge during knowledge retrieval.

❌ Actual Behavior

When the knowledge base performed segmentation analysis, the pictures in the document were lost and were not stored in the knowledge base.

dosubot · 2024-11-26T09:49:06Z

I found a similar unsolved discussion titled "When processing documents, you can save the images to segments" ^[1]. This might be related to the issue you're experiencing with images being lost during segmentation analysis.

In the Dify codebase, images in DOCX documents are extracted and processed using the WordExtractor class. This class extracts images, saves them separately, and embeds references in the processed document content ^[2]^[3]. However, it seems there might be an issue with how these images are being handled or stored in the knowledge base during segmentation.
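As a rough illustration of the "extract, save separately, embed a reference" pattern described above, here is a minimal sketch. It is not Dify's actual `WordExtractor` code: the `Block` type, the `render_with_image_refs` function, the file naming scheme, and the Markdown `![image](...)` placeholder format are all assumptions made for illustration.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Block:
    """Hypothetical unit of a parsed document: either text or raw image bytes."""
    text: Optional[str] = None
    image: Optional[bytes] = None


def render_with_image_refs(blocks, image_dir):
    """Save each image to image_dir and leave a Markdown reference in its place."""
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    count = 0
    for block in blocks:
        if block.image is not None:
            count += 1
            path = image_dir / f"image_{count}.png"  # assumed naming scheme
            path.write_bytes(block.image)
            # The reference sits exactly where the image appeared in the document.
            parts.append(f"![image]({path.as_posix()})")
        elif block.text:
            parts.append(block.text)
    return "\n\n".join(parts)


if __name__ == "__main__":
    doc = [
        Block(text="Step 1: open Settings."),
        Block(image=b"placeholder image bytes"),
        Block(text="Step 2: click Save."),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(render_with_image_refs(doc, tmp))
```

If the references survive into the stored segments, the images stay recoverable at retrieval time; if a later step strips or ignores them, the pictures are effectively lost, which would match the behavior reported here.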

Additionally, there is a test named test_invoke_chat_model_with_vision that verifies the handling of images during segmentation analysis, which might be useful for debugging this issue ^[4].

To continue talking to Dosu, mention @dosu.

crazywoola · 2024-11-26T11:00:33Z

Linked to #11063. Currently, we do not parse images in our system.

zlm0001 · 2024-11-27T02:20:37Z

[11063](url) is about analyzing the content of the picture. I do not want to parse the content of the picture; I want the picture to be kept intact and to stay with its surrounding context in the original text.

We are building knowledge-base Q&A. When users ask questions, they want the large model to return not only text but also the pictures related to the answer.

Example: I created a knowledge base from a user manual, which contains many descriptions of operation steps along with screenshots that illustrate them. When users ask questions against the user manual knowledge base, the large model should return not only the text of the operation steps but also the relevant screenshots.
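The requested behavior could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Dify segments documents: it assumes the extracted text already carries Markdown-style image references, and `segment_keep_images`, `images_in`, and the `max_chars` parameter are hypothetical names.

```python
import re

# Matches a Markdown image reference such as ![alt](path/to/file.png)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def segment_keep_images(text, max_chars=200):
    """Split text on blank lines into chunks of roughly max_chars,
    keeping each image-only paragraph in the same chunk as the text before it."""
    chunks = []
    current = []
    size = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        is_image = IMAGE_REF.fullmatch(para) is not None
        # Start a new chunk only before a text paragraph, so an image
        # is never separated from the paragraph it illustrates.
        if current and not is_image and size + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        if not is_image:
            size += len(para)  # image references do not count toward the limit
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def images_in(chunk):
    """Return the image references contained in a retrieved chunk."""
    return IMAGE_REF.findall(chunk)


if __name__ == "__main__":
    manual = "Step one text.\n\n![shot](a.png)\n\nStep two text."
    for chunk in segment_keep_images(manual, max_chars=15):
        print(repr(chunk), images_in(chunk))
```

With segments built this way, a retrieved chunk about an operation step would carry its screenshot reference along, so the answer could include the picture next to the text.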

dosubot bot added the labels "cloud" (When the version is cloud and it is a bug report) and "🐞 bug" (Something isn't working) on Nov 26, 2024

crazywoola removed the labels "🐞 bug" (Something isn't working) and "cloud" (When the version is cloud and it is a bug report) on Nov 26, 2024

crazywoola closed this as completed Nov 26, 2024
