Parallelize the serial page-by-page processing in doc_analyze to improve speed #1566

Open
LCorleone opened this issue Jan 17, 2025 · 1 comment
Labels
enhancement New feature or request

Comments


for index in range(len(dataset)):
    page_data = dataset.get_page(index)
    img_dict = page_data.get_image()
    img = img_dict['img']
    page_width = img_dict['width']
    page_height = img_dict['height']
    if start_page_id <= index <= end_page_id:
        page_start = time.time()
        result = custom_model(img)
        logger.info(f'-----page_id : {index}, page total time: {round(time.time() - page_start, 2)}-----')
    else:
        result = []

    page_info = {'page_no': index, 'height': page_height, 'width': page_width}
    page_dict = {'layout_dets': result, 'page_info': page_info}
    model_json.append(page_dict)

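I looked at the code: each page is processed independently, so a file with many pages takes a long time. If resources allow, processing the pages in parallel and then merging the results should speed this up considerably. Would that be feasible?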
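
To make the proposal concrete, here is a minimal sketch of what "process pages in parallel, then merge" could look like. It is not the project's implementation. `analyze_pages_parallel`, the `pages` list (a stand-in for what `dataset.get_page(index).get_image()` returns) and `max_workers` are hypothetical names introduced for illustration, and it assumes the model callable is safe to call from several threads. Threads only help if the model releases the GIL during inference; that is an assumption, not something confirmed in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_pages_parallel(pages, model, start_page_id, end_page_id, max_workers=4):
    """Run `model` on each in-range page concurrently and return results in page order.

    `pages` is a hypothetical list of dicts with 'img', 'width' and 'height' keys,
    standing in for dataset.get_page(index).get_image() from the snippet above.
    """

    def run_one(index):
        img_dict = pages[index]
        # Same range check as the serial loop: pages outside the range get no detections.
        if start_page_id <= index <= end_page_id:
            result = model(img_dict['img'])
        else:
            result = []
        page_info = {'page_no': index, 'height': img_dict['height'], 'width': img_dict['width']}
        return {'layout_dets': result, 'page_info': page_info}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the "merge" step is just list().
        return list(pool.map(run_one, range(len(pages))))


if __name__ == '__main__':
    # Dummy data and a dummy model, only to show the call shape.
    demo_pages = [{'img': i, 'width': 100, 'height': 200} for i in range(6)]
    model_json = analyze_pages_parallel(demo_pages, lambda img: [img], 0, 5)
    print([page['page_info']['page_no'] for page in model_json])
```

Because `Executor.map` preserves input order, `model_json` has the same layout as in the serial loop and downstream code would not need to change.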

LCorleone added the enhancement label Jan 17, 2025

myhloli (Collaborator) commented Jan 17, 2025

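Performance optimization is already planned. Based on our investigation, parallel processing consumes a lot of I/O resources, so the current direction is to saturate single-GPU performance as far as possible by running inference with larger batches.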
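
For comparison, the batching direction described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation. `batch_model` is a hypothetical callable that takes a list of images and returns one result per image (the `custom_model` in the issue takes a single image), and `analyze_pages_batched`, `chunked` and `batch_size` are names made up for this example.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def analyze_pages_batched(pages, batch_model, start_page_id, end_page_id, batch_size=8):
    """Run a hypothetical batch-capable model over the in-range pages, a batch at a time.

    `pages` is a hypothetical list of dicts with 'img', 'width' and 'height' keys.
    """
    # Only pages inside the requested range are sent to the model.
    selected = [i for i in range(len(pages)) if start_page_id <= i <= end_page_id]

    results = {}
    for batch_ids in chunked(selected, batch_size):
        outputs = batch_model([pages[i]['img'] for i in batch_ids])
        # Map each output back to the page index it came from.
        results.update(zip(batch_ids, outputs))

    model_json = []
    for index, img_dict in enumerate(pages):
        page_info = {'page_no': index, 'height': img_dict['height'], 'width': img_dict['width']}
        # Pages outside the range keep an empty detection list, as in the serial loop.
        model_json.append({'layout_dets': results.get(index, []), 'page_info': page_info})
    return model_json
```

The trade-off is that one process keeps a single device busy with larger inputs instead of several workers competing for I/O. A suitable `batch_size` would depend on available GPU memory.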
