Add new tool: Doc2X #5753

Menghuan1918 · 2024-06-29T14:36:09Z

Description

As mentioned in #5675, this PR adds an OCR tool for images whose output preserves the text in the format of the source image. It is more accurate than using a vision LLM to extract text in the same format, with significantly fewer hallucinations.

Example1:

Extract the loss values from a screenshot of a machine learning training run and plot them as a line graph

2024-06-29.22-09-40.mp4
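
For readers without access to the video, the post-OCR step in this example can be approximated as below. This is only a rough sketch, not the workflow used in the recording: the sample `ocr_text`, the `loss: <number>` log format, and the `extract_losses` helper are all assumptions made for illustration, and the real tool output depends on the screenshot.

```python
import re

# Hypothetical OCR output; the real text depends on the screenshot.
ocr_text = """\
Epoch 1/3 - loss: 0.9321 - acc: 0.61
Epoch 2/3 - loss: 0.5478 - acc: 0.78
Epoch 3/3 - loss: 0.3310 - acc: 0.86
"""

# Assumed log format: the word "loss", then ":" or "=", then a number.
# The leading \b keeps names such as "val_loss" from matching.
LOSS_PATTERN = re.compile(
    r"\bloss\s*[:=]\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def extract_losses(text: str) -> list[float]:
    """Return every loss value found in the text, in order of appearance."""
    return [float(value) for value in LOSS_PATTERN.findall(text)]


losses = extract_losses(ocr_text)
```

The resulting list can then be handed to whatever plotting step the workflow uses to draw the line graph.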

Example2:

Extract maths questions, give them to multiple LLMs (even non-vision ones) to answer, and then aggregate the answers

2024-06-29.21-27-47.mp4

Type of Change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update, included: Dify Document
Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement
Dependency upgrade

How Has This Been Tested?

Created several new workflows that use this tool and ran each of them with the supported image formats (png/jpg)
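
A workflow that wants to reject other file types before calling the tool could use a check along these lines. This is a minimal sketch, not code from this PR: `SUPPORTED_SUFFIXES` and `is_supported_image` are hypothetical names, and the set contains only the two formats named above.

```python
from pathlib import Path

# Assumption: only the two formats named in the PR description are accepted.
SUPPORTED_SUFFIXES = {".png", ".jpg"}


def is_supported_image(filename: str) -> bool:
    """Check the file extension, ignoring case, against the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES
```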

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods
(optional) I have made corresponding changes to the documentation
(optional) I have added tests that prove my fix is effective or that my feature works
(optional) New and existing unit tests pass locally with my changes

laipz8200 · 2024-07-02T05:46:06Z

We need to wait for the service provider to support internationalization.

Menghuan1918 and others added 9 commits June 29, 2024 11:13

Init the tool info

c276958

Finished the OCR processing part

0b85089

formatting code

88eb792

Some bug fix

5debe90

Try to change the way to get img file

bd8afb3

Merge branch 'langgenius:main' into main

b463b58

Get the file to upload to doc2x

9360403

Merge branch 'langgenius:main' into main

451c6c4

Merge branch 'langgenius:main' into main

5bb8221

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and 🔨 feat:tools (Tools for agent, function call related stuff.) labels Jun 29, 2024

takatost requested a review from laipz8200 June 30, 2024 02:41

Fix formatting problems in yaml file

42f790a
