[WIP] add local evaluation for MMBench dev split #45

Luodian · 2024-04-04T17:15:41Z

LLaVA-v1.5-7B eval results

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit 4011e6c
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (#45)
  commit 16a6c1f
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (#46)
  commit 515a7c4
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (#44)
    Co-authored-by: ygjin11
  commit b3a013c
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit 1b4a477
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (#42)
    * refactor vizwizvqa task
    * Merge commit '41d044cd287adcbcf095afb1a0ef5a96c88c3d9d'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit 5a44010
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (#45)
  commit cf10a45
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (#46)
  commit caaad1d
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (#44)
    Co-authored-by: ygjin11
  commit cfa11b6
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit 4d42aa8
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (#42)
    * refactor vizwizvqa task
    * Merge commit '0cf06439d3c85aee8783034b226f1badd3a08608'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit c35da5e
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (EvolvingLMMs-Lab#45)
  commit 0175674
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (EvolvingLMMs-Lab#46)
  commit 25f7a96
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (EvolvingLMMs-Lab#44)
    Co-authored-by: ygjin11
  commit 631891b
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (EvolvingLMMs-Lab#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit 210d779
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (EvolvingLMMs-Lab#42)
    * refactor vizwizvqa task
    * Merge commit '5b0d7aaac69663d1fedc531b75644ebe1bdb867e'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit 21dea7b
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (EvolvingLMMs-Lab#45)
  commit 12144a6
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (EvolvingLMMs-Lab#46)
  commit aca1e6d
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (EvolvingLMMs-Lab#44)
    Co-authored-by: ygjin11
  commit 0925443
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (EvolvingLMMs-Lab#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit 16f1cf2
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (EvolvingLMMs-Lab#42)
    * refactor vizwizvqa task
    * Merge commit '9bbbad51a77051fcf676438f81e81f723c1b438b'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit 9cb2f41
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (EvolvingLMMs-Lab#45)
  commit 8154867
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (EvolvingLMMs-Lab#46)
  commit 2078e19
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (EvolvingLMMs-Lab#44)
    Co-authored-by: ygjin11
  commit 81b2181
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (EvolvingLMMs-Lab#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit b22bced
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (EvolvingLMMs-Lab#42)
    * refactor vizwizvqa task
    * Merge commit '59c7d67077c315657a02bdee2eace0e64c1ee0d4'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit e2686e8
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (EvolvingLMMs-Lab#45)
  commit bf93c62
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (EvolvingLMMs-Lab#46)
  commit 3a6b334
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (EvolvingLMMs-Lab#44)
    Co-authored-by: ygjin11
  commit 568a358
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (EvolvingLMMs-Lab#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit 966c56f
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (EvolvingLMMs-Lab#42)
    * refactor vizwizvqa task
    * Merge commit '41ceea1413ea03f0089bcc346d9187060dc228df'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit 5598ac0
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (EvolvingLMMs-Lab#45)
  commit 015a8d2
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (EvolvingLMMs-Lab#46)
  commit ee5b446
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (EvolvingLMMs-Lab#44)
    Co-authored-by: ygjin11
  commit 7c11ba4
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (EvolvingLMMs-Lab#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit d18d66d
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (EvolvingLMMs-Lab#42)
    * refactor vizwizvqa task
    * Merge commit '780af491d66291bd0780d5426295a4c7dfe385e2'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit 11c9464
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (EvolvingLMMs-Lab#45)
  commit 1cbc746
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (EvolvingLMMs-Lab#46)
  commit 7c4d14b
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (EvolvingLMMs-Lab#44)
    Co-authored-by: ygjin11
  commit 801829a
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (EvolvingLMMs-Lab#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit 2bb8fd6
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (EvolvingLMMs-Lab#42)
    * refactor vizwizvqa task
    * Merge commit '9bbbad51a77051fcf676438f81e81f723c1b438b'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit ca0c734
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (EvolvingLMMs-Lab#45)
  commit c6d4d44
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (EvolvingLMMs-Lab#46)
  commit b5204d4
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (EvolvingLMMs-Lab#44)
    Co-authored-by: ygjin11
  commit 3dd77b9
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (EvolvingLMMs-Lab#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit 058a7d4
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (EvolvingLMMs-Lab#42)
    * refactor vizwizvqa task
    * Merge commit '59c7d67077c315657a02bdee2eace0e64c1ee0d4'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu

* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py
  This commit refactors the `logging_utils.py` and `utils.py` files. It removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10.
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  commit f77ff8a
  Author: kcz358
  Date: Tue Feb 13 18:50:37 2024 +0800
    Fix seedbench choices bugs (EvolvingLMMs-Lab#45)
  commit 23294e3
  Author: XinrunDu
  Date: Tue Feb 13 18:50:23 2024 +0800
    add stvqa and multidocvqa (EvolvingLMMs-Lab#46)
  commit e60daa7
  Author: XinrunDu
  Date: Sun Feb 11 00:54:39 2024 +0800
    add cmmmu (EvolvingLMMs-Lab#44)
    Co-authored-by: ygjin11
  commit d95e7ff
  Author: kcz358
  Date: Sun Feb 11 00:54:23 2024 +0800
    [Feat] Add qwen loglikelihood (EvolvingLMMs-Lab#43)
    * Add qwen loglikelihood
    * Revise the pyproject dependency. Move tiktoken out from optional-dependencies
    * Add ferret-bench
    * Add seedbench 2, test on llava
  commit 7a005aa
  Author: JvThunder
  Date: Wed Feb 7 00:08:22 2024 +0800
    Joshua/vizwizvqa refactor (EvolvingLMMs-Lab#42)
    * refactor vizwizvqa task
    * Merge commit 'cfdce77dad7c0ae328f60712c6dd5ba1bc75cc1d'
    * Fix exact_match accuracy calculation in vizwiz_vqa_process_results
    * Update vizwiz_vqa tasks
    ---------
    Co-authored-by: Fanyi Pu
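
The commit messages above mention adding a wait time and error handling to `get_chat_response`, with `wait_time` raised from 5 to 10. The actual function body is not shown in this thread, so the following is only a rough sketch of that kind of retry loop. The helper name `call_with_retry`, its parameters, and the `max_retries` default are assumptions made for illustration, not the repository's code.

```python
import time
from typing import Callable, Optional


def call_with_retry(
    request_fn: Callable[[], str],
    max_retries: int = 3,          # hypothetical default, not from the PR
    wait_time: float = 10.0,       # the commit message raises this from 5 to 10
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call request_fn, pausing wait_time seconds between failed attempts.

    Illustrative helper only: the real get_chat_response signature is not
    shown in this thread.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return request_fn()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            # Only wait if another attempt is still to come.
            if attempt < max_retries:
                sleep(wait_time)
    raise RuntimeError(f"request failed after {max_retries} attempts") from last_error
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets a test substitute a recorder for `time.sleep` so the retry behaviour can be checked without actually waiting.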

Luodian force-pushed the dev/public/mmbench branch 2 times, most recently from 729587b to 8157676, on April 4, 2024 17:28

Luodian closed this Apr 4, 2024

Luodian force-pushed the dev/public/mmbench branch from 8157676 to 70cc773 on April 4, 2024 17:30

Luodian pushed a commit that referenced this pull request Apr 4, 2024

Fix seedbench choices bugs (#45)

4011e6c

Luodian pushed a commit that referenced this pull request Apr 4, 2024

Fix seedbench choices bugs (#45)

5a44010

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Fix seedbench choices bugs (EvolvingLMMs-Lab#45)

c35da5e

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Fix seedbench choices bugs (EvolvingLMMs-Lab#45)

21dea7b

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Fix seedbench choices bugs (EvolvingLMMs-Lab#45)

9cb2f41

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Fix seedbench choices bugs (EvolvingLMMs-Lab#45)

e2686e8

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Fix seedbench choices bugs (EvolvingLMMs-Lab#45)

5598ac0

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Fix seedbench choices bugs (EvolvingLMMs-Lab#45)

11c9464

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Fix seedbench choices bugs (EvolvingLMMs-Lab#45)

ca0c734

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Fix seedbench choices bugs (EvolvingLMMs-Lab#45)

f77ff8a
