Use LLM to revise back translations #616

Open
johnml1135 opened this issue Jan 3, 2025 · 4 comments
Labels
research Research topics

Comments

@johnml1135
Collaborator

johnml1135 commented Jan 3, 2025

Use an LLM to revise a back translation to mimic the style of the source translation. The revised back translation might provide better training data for mixed source configurations.

@ddaspit
Collaborator

ddaspit commented Jan 3, 2025

What do you mean by "rearrange back translations"?

@Enkidu93
Collaborator

Enkidu93 commented Jan 3, 2025

I think John was referring to this Slack conversation: https://sil-language-software.slack.com/archives/C8022MJ7M/p1735864021033209.

@ddaspit
Collaborator

ddaspit commented Jan 3, 2025

It probably makes sense to move this to silnlp, since it is a research topic.

@johnml1135 johnml1135 transferred this issue from sillsdev/serval Jan 3, 2025
@woodwardmw

In case it's helpful, here's the notebook I used to clean up a back translation using a very simple prompt with GPT-4o-mini: https://github.com/sil-ai/madlad-finetuning/blob/main/llm_BT_cleanup_async.ipynb

It took about 20 minutes and, I think, cost about $1.

Example before and after texts:
https://github.com/sil-ai/madlad-finetuning/blob/main/data/eng-krrBT.txt
https://github.com/sil-ai/madlad-finetuning/blob/main/data/eng-krrBT-clean.txt
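
For readers who want a feel for the general shape of such a cleanup pass, here is a minimal sketch. It is not the code from the linked notebook. The prompt wording, the function names, and the `complete` callable are all assumptions made for illustration. `complete` stands in for whatever async chat-completion call you use (for example, a wrapper around a GPT-4o-mini request). The sketch also assumes the back translation file has one segment per line and that the output must stay line-aligned with the input.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical prompt; the wording used in the linked notebook may differ.
PROMPT_TEMPLATE = (
    "Rewrite the following back translation in clear, natural English. "
    "Keep the meaning unchanged and return only the rewritten text.\n\n{line}"
)


def build_prompt(line: str) -> str:
    """Wrap one back-translation line in the cleanup prompt."""
    return PROMPT_TEMPLATE.format(line=line.strip())


async def revise_lines(
    lines: List[str],
    complete: Callable[[str], Awaitable[str]],
    max_concurrency: int = 8,
) -> List[str]:
    """Revise each line concurrently, returning one output line per input line.

    `complete` is an assumed async function that sends a prompt to an LLM
    and returns the model's text response.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def revise_one(line: str) -> str:
        # Keep empty lines empty so the file stays aligned with the source.
        if not line.strip():
            return ""
        async with semaphore:
            revised = await complete(build_prompt(line))
        # Fall back to the original text if the model returns nothing.
        return revised.strip() or line.strip()

    # gather preserves input order, so line alignment is maintained.
    return list(await asyncio.gather(*(revise_one(line) for line in lines)))


async def fake_complete(prompt: str) -> str:
    """Offline stand-in for an LLM call: upper-cases the text after the prompt."""
    return prompt.rsplit("\n\n", 1)[-1].upper()
```

To try it without network access, run `asyncio.run(revise_lines(lines, fake_complete))`. Swapping in a real client only requires replacing `fake_complete` with an async function that takes a prompt string and returns the model's reply. The semaphore caps the number of in-flight requests, which is one simple way to stay under a provider's rate limits.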

@ddaspit ddaspit changed the title Use ChatGPT to rearrange back translations Use LLM to revise back translations Jan 3, 2025
@ddaspit ddaspit added the research Research topics label Jan 3, 2025
@ddaspit ddaspit removed this from Serval Jan 3, 2025
@ddaspit ddaspit moved this from 🆕 New to 📋 Backlog in SIL-NLP Research Jan 3, 2025