Frequently occurring bug: Missing the letter i and l in words #1702

Open
LaguePesikin opened this issue Feb 18, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@LaguePesikin

Description of the bug

PDF paragraph screenshot:
Image
MinerU output:

{
        "type": "text",
        "text": "For the string elicitation task, we are given a target response $y$ , and our goal is to generate a prefx $x$ such that m’s output on $x$ is an exact match for $y$ ; in other words, we want to solve arg $\\mathrm{max}_{x}$ $p_{m}(y\\mid x)$ . ",
        "page_idx": 1
}

(prefix -> prefx)
Similar errors occur in words like 'fluency' (fuency) and 'benefit' (beneft), and they happen almost every time.
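For anyone triaging a similar report: the affected letter pairs (fi, fl) are the ones many fonts render as a single ligature glyph, so one thing worth checking is whether the raw text pulled from the PDF contains Unicode ligature code points such as U+FB01 and U+FB02. That is an assumption about the cause and is not confirmed anywhere in this thread. Below is a minimal sketch using only the Python standard library; `find_ligatures` and `expand_ligatures` are hypothetical helper names and are not part of MinerU.

```python
import unicodedata

# Unicode "presentation form" ligatures commonly used for f-pairs.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def find_ligatures(text: str) -> list[tuple[int, str]]:
    """Return (index, character) for every ligature code point in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in LIGATURES]


def expand_ligatures(text: str) -> str:
    """Expand ligature code points into plain letters via NFKC normalization.

    Note: NFKC also rewrites other compatibility characters, not only ligatures.
    """
    return unicodedata.normalize("NFKC", text)


sample = "a pre\ufb01x with \ufb02uency"
print(find_ligatures(sample))    # positions of the ligature characters
print(expand_ligatures(sample))  # a prefix with fluency
```

This only helps when the ligature characters are still present in the extracted text. If the letters are already gone, as in the JSON output above, the loss happened earlier in the pipeline and normalizing the output will not bring them back.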

How to reproduce the bug

See above.

Operating system

macOS

Python version

3.11

Software version (magic-pdf --version)

1.0.x

Device mode

cpu

LaguePesikin added the bug label Feb 18, 2025
@myhloli
Collaborator

myhloli commented Feb 18, 2025

Can you try this PDF on the Hugging Face demo https://huggingface.co/spaces/opendatalab/MinerU and report the result?

@LaguePesikin
Author

> Can you try this PDF on the Hugging Face demo https://huggingface.co/spaces/opendatalab/MinerU and report the result?

It seems to work well: all fi and fl pairs are recognized accurately.
So why does the web demo behave differently from my local client?

@myhloli
Collaborator

myhloli commented Feb 18, 2025

The online demo runs the dev branch, which includes some unreleased fixes, so its behavior may differ from the release version.
