Markdown file parsing error, chunk is not good. #5569

Closed
4 tasks done
blankbro opened this issue Jun 25, 2024 · 8 comments · May be fixed by #5855
Labels
🐞 bug Something isn't working

Comments

@blankbro

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.11

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

import https://github.com/blankbro/mqtt-storm/blob/master/README.md

image

✔️ Expected Behavior

Markdown screenshot

The image links in the Markdown file should be kept.
The Markdown formatting should be preserved.

❌ Actual Behavior

image
image

@dosubot dosubot bot added the 🐞 bug Something isn't working label Jun 25, 2024

dosubot bot commented Jun 25, 2024

Hello, @blankbro! I'm here to help you with any bugs, questions, or contributions!

To resolve the "chunk is not good" Markdown file parsing error in Dify version 0.6.11 when importing the README.md file from the mqtt-storm GitHub repository, follow these steps:

  1. Check for Unclosed Tags: Ensure that all HTML tags within the Markdown file are properly closed.
  2. Validate Markdown Syntax: Use a Markdown validator to check for any syntax errors.
  3. Simplify Content: Temporarily remove complex elements like tables, images, and HTML to identify if any specific part is causing the issue.

Here is an example of a simplified and properly structured Markdown file:

# Project Title

![cover-image](https://example.com/image.png)

<p align="center">
  <a href="https://example.com">Link 1</a> ·
  <a href="https://example.com">Link 2</a> ·
  <a href="https://example.com">Link 3</a>
</p>

## Features

- Feature 1
- Feature 2
- Feature 3

## Usage

Instructions on how to use the project.

## Contributing

Guidelines for contributing to the project.

## License

Information about the project's license.

If the error persists, it may be necessary to debug further by isolating specific sections of the Markdown file to identify the problematic chunk [1].

Additionally, Dify uses the react-markdown library for Markdown parsing. Known issues with react-markdown version 8.0.6 include:

  1. HTML Parsing: By default, react-markdown does not support parsing HTML within Markdown.
  2. Plugins Compatibility: Some plugins or extensions might not be fully compatible.
  3. Syntax Highlighting: Additional configuration or plugins like react-syntax-highlighter might be needed [2].

Dify also provides built-in tools for validating or sanitizing Markdown files before parsing them, such as replacing continuous spaces, newlines, and tabs, and deleting all URLs and email addresses [3].
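To make the effect of those preprocessing rules concrete, here is a minimal sketch of what "replace continuous whitespace" and "delete URLs and email addresses" could look like. The function names and regular expressions are assumptions for illustration only and are not taken from Dify's source; the point is that a plain URL-deletion rule also empties the target of a Markdown image link.

```python
import re

# Hypothetical helpers: names and patterns are illustrative, not Dify's actual rules.
def collapse_whitespace(text: str) -> str:
    """Replace runs of spaces/tabs with one space and 3+ newlines with two."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text

def remove_urls_and_emails(text: str) -> str:
    """Delete email addresses first, then http(s) URLs."""
    text = re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "", text)
    text = re.sub(r"https?://[^\s)]+", "", text)
    return text

sample = (
    "See   the\tdocs:\n\n\n\n"
    "![diagram](https://example.com/a.png) or mail dev@example.com"
)
cleaned = remove_urls_and_emails(collapse_whitespace(sample))
print(repr(cleaned))
```

With this sketch the image syntax survives only as an empty `![diagram]()`, so the picture can no longer be rendered from the chunk.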

For more detailed error messages, check the Logs section in the Dify application, accessible via the left-side navigation [4].

To continue talking to Dosu, mention @dosu.

@crazywoola
Member

Please use a custom segment identifier to do it. :)

@blankbro
Author

blankbro commented Jun 25, 2024

Please use a custom segment identifier to do it. :)

@crazywoola

image
image

Is that it? But my problem still exists.

@crazywoola
Member

Yes, you need to choose which identifier you would like to use. For example, you could fill in ###

@blankbro
Author

Yes, you need to choose which identifier you would like to use. For example, you could fill in ###

image

The problem still exists.

@crazywoola
Member

crazywoola commented Jun 30, 2024

image

Your doc is being segmented by `##`. `###` was only an example; you should change it according to your needs.
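As a rough illustration of how the choice of identifier changes the resulting chunks, the sketch below splits a document at lines that start with a given heading marker. `split_by_identifier` is a hypothetical helper, and Dify's real splitter may treat the identifier differently (for instance as a plain substring), so this only shows the general idea.

```python
def split_by_identifier(text: str, identifier: str) -> list[str]:
    """Start a new chunk at every line beginning with `identifier` plus a space.

    The heading line is kept at the start of its chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith(identifier + " ") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]

doc = "# Title\nintro\n## Usage\nrun it\n### Options\n-v\n## License\nMIT"
by_h2 = split_by_identifier(doc, "##")   # splits at "## Usage" and "## License"
by_h3 = split_by_identifier(doc, "###")  # splits only at "### Options"
print(len(by_h2), len(by_h3))
```

Splitting on `##` keeps each second-level section together with its `###` subsections, while splitting on `###` produces fewer, larger chunks for this sample because only one third-level heading exists.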

@blankbro
Author

image

Your doc is being segmented by ##. ### was only an example; you should change it according to your needs.

image

The problem still exists.

@tangyoha

tangyoha commented Jul 2, 2024

@crazywoola Hi, after investigating, I found that this is caused by commit #2528. When the Markdown file is processed, the images and links are automatically removed.
image
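One way a cleaning step could avoid stripping images and links is to set Markdown link syntax aside before deleting bare URLs. The following is only a sketch of that idea: `remove_bare_urls` and both patterns are hypothetical, and this is not the change made in #2528 or in the linked pull request.

```python
import re

URL = re.compile(r"https?://[^\s)]+")
# Matches Markdown images ![alt](url) and links [text](url).
MD_LINK = re.compile(r"!?\[[^\]]*\]\([^)\s]+\)")

def remove_bare_urls(text: str) -> str:
    """Delete bare URLs but leave Markdown image and link syntax untouched."""
    protected: list[str] = []

    def stash(match: re.Match) -> str:
        # Swap the Markdown link for a NUL-delimited placeholder.
        protected.append(match.group(0))
        return f"\x00{len(protected) - 1}\x00"

    masked = MD_LINK.sub(stash, text)
    masked = URL.sub("", masked)
    # Put the original Markdown links back in place of the placeholders.
    return re.sub(r"\x00(\d+)\x00", lambda m: protected[int(m.group(1))], masked)

text = (
    "![arch](https://example.com/arch.png) see https://example.com/raw "
    "and [docs](https://example.com/docs)"
)
result = remove_bare_urls(text)
print(result)
```

Here the bare URL is removed while the image and the link keep their targets. The placeholder scheme assumes the input contains no NUL characters.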

3 participants