update countDownloads for PaddleOCR #1812

TingquanGao · 2025-10-18T18:36:46Z

No description provided.

pcuenca · 2025-10-20T06:49:04Z

packages/tasks/src/model-libraries.ts

+		docsUrl: "https://www.paddleocr.ai/",
 		snippets: snippets.paddleocr,
 		filter: true,
+		countDownloads: `path_extension:"safetensors" OR path_extension:"pdiparams"`,


Just for info, if the user runs the https://huggingface.co/PaddlePaddle/PaddleOCR-VL/tree/main/PaddleOCR-VL-0.9B model this will work fine. However, if they run the two-stage pipeline, this rule will count one download for https://huggingface.co/PaddlePaddle/PaddleOCR-VL/blob/main/PaddleOCR-VL-0.9B/model.safetensors, and another one for https://huggingface.co/PaddlePaddle/PaddleOCR-VL/blob/main/PP-DocLayoutV2/inference.pdiparams.

Thank you for the reminder. We will reconsider how to implement.

Hi, I have updated the codes. It now only counts file whose full path is exactly inference.pdiparams, instead of ending with the pdiparams suffix.
Could you please review it again? If there are any issues or suggestions, please feel free to let me know and I will make further improvements.

Thank you!

Thank you for iterating! But I think this change will still result in the same situation described in my previous comment. Sorry my message was not clear, let me try to summarize:

If the user only downloads PaddleOCR-VL-0.9B and runs it, then the download count will be incremented once.

If they use the full pipeline to run both PP-DocLayoutV2 and PaddleOCR-VL-0.9B one after the other, then the download count will be incremented twice: one for each model. This is because files from both directories will match the rule.

If PP-DocLayoutV2 is never intended to be used without PaddleOCR-VL-0.9B, then perhaps it would make sense to add a rule for just PaddleOCR-VL-0.9B. But I don't know if that's the case, or if PP-DocLayoutV2 may be used independently. I'm pointing this out just to give you information about how to interpret the download counts; depending on the use cases you support, you may want to restrict counts to one of the model, or accept duplicate counts when both are used as part of the same pipeline.

As discussed offline, we think that using one repo per model is usually clearer and provides a lot more benefits in addition to cleaner download counts. For example, this repo is a copy of your PaddleOCR-VL-0.9B directory, and it can easily be used with the from_pretrained API as shown in the snippet I added to the model card. It also conveys useful metadata: it's an image-text-to-text model, it works with transformers, but it requires custom code to do so (as indicated by the custom_code tag, which was automatically inferred). It provides some starter code snippets when users click on the "Use this model" button, which are also inferred automatically based on the model characteristics.

If you need more time to work on a single model-per-repo structure, that's totally ok - we can merge this PR meanwhile, but keep in mind what the download rule means, and if possible try to make it match the use cases your models support.

Thank you for your comments. Please allow me to clarify our plan:

Both the PaddleOCR-VL-0.9B and PP-DocLayoutV2 models can be used independently. Therefore, we agree with your suggestion that the two models should be stored in separate repositories.

The PaddleOCR-VL-0.9B model will continue to be hosted in the PaddlePaddle/PaddleOCR-VL repository. However, we will move the relevant files to the root directory of the repository and remove the PaddleOCR-VL-0.9B subdirectory. This change will be completed as soon as possible, most likely by tomorrow.

As for the PP-DocLayoutV2 model currently present in the PaddlePaddle/PaddleOCR-VL repository, we will remove it at an appropriate time in the future. After that, the PP-DocLayoutV2 model will only be available in the PaddlePaddle/PP-DocLayoutV2 repository.

Given the plan described above, would it be possible to proceed with merging the current code?

Thank you very much for your patience, guidance, and suggestions. If there is anything inappropriate in our approach, please feel free to point it out.

pcuenca

Fine for me, cc @Wauplin in case there's more feedback

Wauplin

The logic seems good to me! Thanks a lot @TingquanGao @pcuenca for the discussions. Also agree splitting the models in 2 repos is the right move.

TingquanGao requested review from SBrandeis, Wauplin, gary149, julien-c, ngxson and pcuenca as code owners October 18, 2025 18:36

TingquanGao closed this Oct 18, 2025

TingquanGao reopened this Oct 18, 2025

pcuenca reviewed Oct 20, 2025

View reviewed changes

TingquanGao changed the title ~~update countDownloads for PaddleOCR~~ [WIP] update countDownloads for PaddleOCR Oct 20, 2025

update countDownloads for PaddleOCR

4725e1e

TingquanGao force-pushed the fix_ppocr branch from b16e36d to 4725e1e Compare October 20, 2025 11:54

TingquanGao changed the title ~~[WIP] update countDownloads for PaddleOCR~~ update countDownloads for PaddleOCR Oct 20, 2025

pcuenca approved these changes Oct 20, 2025

View reviewed changes

Wauplin approved these changes Oct 21, 2025

View reviewed changes

Wauplin merged commit 1237054 into huggingface:main Oct 21, 2025
2 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

update countDownloads for PaddleOCR #1812

update countDownloads for PaddleOCR #1812

Uh oh!

TingquanGao commented Oct 18, 2025

Uh oh!

pcuenca Oct 20, 2025

Uh oh!

TingquanGao Oct 20, 2025

Uh oh!

TingquanGao Oct 20, 2025

Uh oh!

pcuenca Oct 20, 2025

Uh oh!

TingquanGao Oct 20, 2025

Uh oh!

pcuenca left a comment

Uh oh!

Wauplin left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

update countDownloads for PaddleOCR #1812

update countDownloads for PaddleOCR #1812

Uh oh!

Conversation

TingquanGao commented Oct 18, 2025

Uh oh!

pcuenca Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

TingquanGao Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

TingquanGao Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

pcuenca Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

TingquanGao Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

pcuenca left a comment

Choose a reason for hiding this comment

Uh oh!

Wauplin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants