Help: Developing a Business Chatbot with Dify and Challenges in Optimizing Knowledge Base Chunking #13130

numeyume · 2025-02-01T05:25:38Z


Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

I am developing a chatbot using Dify that identifies and presents relevant sections of manuals (created in HTML format) to answer business-related inquiries.

When creating the knowledge base, I converted 700 manuals written in HTML into PDF files via web rendering (each page's footer includes a URL) and loaded these files into the system.
In this approach, each chunk also included a corresponding URL.
As a result, the chatbot successfully provided both a summary of the answer and a relevant URL, yielding highly satisfactory outcomes.

Looking ahead, the cloud version of Dify has a 1,000-page limit, and the manuals I need to process clearly exceed it. I therefore used Python to merge the 700 HTML files (including their URLs) into a single Markdown file and created the knowledge base in parent-child split mode.
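To make the merge step concrete, here is a minimal sketch of one way such a Markdown file could be assembled. It is not the exact script used: the `Article` record, the `build_markdown` function, and the `=====` separator are hypothetical names chosen for illustration, and it assumes the title, URL, and body text have already been extracted from each HTML file. It also assumes the knowledge base settings allow a custom parent delimiter, which should be verified in the Dify UI.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """One manual page: hypothetical record, fields assumed already extracted."""
    title: str
    url: str
    body: str


# Hypothetical separator between articles; pick one that never occurs in the text.
SECTION_DELIMITER = "\n\n=====\n\n"


def build_markdown(articles):
    """Join articles into one Markdown string, one '# ' section per article."""
    sections = []
    for article in articles:
        # Drop blank lines inside the body so that a blank-line based
        # splitter cannot cut an article in the middle.
        body_lines = [ln.strip() for ln in article.body.splitlines() if ln.strip()]
        # Keep the URL directly under the title so both stay in the same section.
        section = f"# {article.title}\nURL: {article.url}\n" + "\n".join(body_lines)
        sections.append(section)
    return SECTION_DELIMITER.join(sections) + "\n"
```

Each article then occupies exactly one delimiter-bounded section that starts with its own title and URL, which is the layout the questions below refer to.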

However, this approach did not generate the expected chunk structure.
Ideally, I expected parent chunks to contain only titles and URLs, while child chunks would contain the article's main content.
Instead, some chunks mixed URLs from different pages under a single title.
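The expected structure (one title and one URL per section) can be checked locally before uploading. The following is a rough sketch under stated assumptions: it does not reproduce Dify's splitter, it only splits the merged file on top-level `# ` headers and reports sections that do not contain exactly one distinct URL. The function names `split_sections` and `find_mixed_sections` are hypothetical.

```python
import re

# Simple URL matcher: everything from http(s):// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def split_sections(markdown):
    """Split a Markdown string into sections that start at top-level '# ' headers."""
    # The lookahead keeps each header line attached to the section that follows it.
    parts = re.split(r"(?m)^(?=# )", markdown)
    return [p.strip() for p in parts if p.strip()]


def find_mixed_sections(markdown):
    """Return (title, urls) for each section that lacks exactly one distinct URL."""
    problems = []
    for section in split_sections(markdown):
        title = section.splitlines()[0].lstrip("# ").strip()
        urls = sorted(set(URL_PATTERN.findall(section)))
        if len(urls) != 1:
            problems.append((title, urls))
    return problems
```

An empty result means every header-delimited section carries a single URL in the source file, so any mixing seen afterwards would come from the chunking step rather than from the merge. Note that sections whose body text links to other pages will also be reported.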

Therefore, I would like to ask:
Is there a way to process a Markdown file (with titles, URLs, and content separated by # headers) so that it produces well-structured chunks, similar to the successful PDF-based approach?

Any guidance would be greatly appreciated.

2. Additional context or comments

No response


Replies: 0 comments
