I recently fine-tuned a Donut model for DocVQA. The fine-tuning process completed successfully, but I encountered an issue during inference. When I ask a question that should correspond to a specific answer in the ground truth, the model often returns a different answer.
For example, my dataset contains the following entry:
Image: upload.wikimedia.org/wikipedia/commons/f/f5/Florida_Driver_License.png (it's just an example)
JSON:
"gt_parses": [ {"question": "What is the Driver's License Number?", "answer": "DL1234567890"}, {"question": "What is the Full Name?", "answer": "John Doe"}, {"question": "What is the Date of Birth?", "answer": "March 10, 1985"}, {"question": "What is the Address?", "answer": "123 Elm Street, Springfield, IL 62704, United States"}, {"question": "What is the Expiration Date?", "answer": "March 10, 2025"} ]
However, when I query "What is the Full Name?", the model incorrectly responds with the driver's license number instead of the name.
I have tried the following:
Moving the full name entry to the first position in the array, but the answer is still the driver's license number.
Changing metadata.csv to JSONL (this results in an out-of-memory error).
Keeping only one question-answer pair in the JSON array, which works. But when I fine-tune on another question, it breaks. For example:
In the 1st fine-tuning run I used only driver's license number questions; when I asked for the number on the card, it answered correctly.
In the 2nd fine-tuning run I added the full name, but when I ask for it, the model returns the wrong answer. The answer is always the driver's license number.
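For reference, the JSONL attempt mentioned above might be produced along these lines. This is only a minimal sketch: it assumes the `file_name` / `ground_truth` layout described in the Donut README, where `ground_truth` is itself a JSON string wrapping `gt_parses`. The helper name `make_metadata_line` and the file name `license_001.png` are hypothetical.

```python
import json


def make_metadata_line(file_name, gt_parses):
    """Build one line of metadata.jsonl for a single image.

    Assumption: the loader expects "ground_truth" to be a JSON *string*
    (not a nested object) that contains the "gt_parses" list.
    """
    ground_truth = json.dumps({"gt_parses": gt_parses})
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth})


# Hypothetical example entry, reusing two of the pairs from the dataset above.
example_line = make_metadata_line(
    "license_001.png",
    [
        {"question": "What is the Driver's License Number?", "answer": "DL1234567890"},
        {"question": "What is the Full Name?", "answer": "John Doe"},
    ],
)
print(example_line)
```

Each image gets exactly one such line, and all of its question-answer pairs live inside that single `gt_parses` list.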
Here is how I build the sequences:
def _prepare_gt_sequence(self, gt_parses):
    sequences = []
    for parse in gt_parses:
        question = parse.get("question", "")
        answer = parse.get("answer", "")
        sequence = f"<s>{question}</s> {answer}<eos>"
        sequences.append(sequence)
    return sequences[0] if sequences else "<s><eos>"
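One thing that stands out in the function above is that it returns only `sequences[0]`, so every pair after the first is dropped before training. That may explain why the model keeps giving the same answer. Below is a minimal sketch of an alternative that keeps every pair and samples one per training step. It assumes the `<s_docvqa>`, `<s_question>` and `<s_answer>` tokens shown in the Donut README and `</s>` as the end-of-sequence token; the function names are hypothetical, and the special tokens would still need to be registered with the tokenizer.

```python
import random

TASK_START = "<s_docvqa>"  # assumed task start token for DocVQA
EOS = "</s>"               # assumed end-of-sequence token of the tokenizer


def build_gt_sequences(gt_parses):
    """Return one target sequence per question-answer pair (none are dropped)."""
    sequences = []
    for parse in gt_parses:
        question = parse.get("question", "")
        answer = parse.get("answer", "")
        sequences.append(
            f"{TASK_START}<s_question>{question}</s_question>"
            f"<s_answer>{answer}</s_answer>{EOS}"
        )
    return sequences


def sample_training_sequence(gt_parses, rng=random):
    """Pick one pair at random so all questions are seen over many steps."""
    sequences = build_gt_sequences(gt_parses)
    return rng.choice(sequences) if sequences else f"{TASK_START}{EOS}"


def build_inference_prompt(question):
    """Prompt that ends right where the model should start generating the answer."""
    return f"{TASK_START}<s_question>{question}</s_question><s_answer>"
```

The inference prompt is a prefix of the matching training sequence, so the question asked at inference time conditions the answer in the same way as during fine-tuning.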
Could you please provide guidance on why this might be happening and how I can resolve it? Any suggestions on improving the model's accuracy for this kind of task would be greatly appreciated.
Thank you for your assistance!
Hi, could I see the specific format of the metadata.jsonl file for your training dataset? Or could you send it to my email address? When I was fine-tuning, I found that the model's input contained the answer part, and I'm not sure what the problem is: https://github.com/clovaai/donut/issues/312#issue-2501078667