
LLM transliteration #6


Open
theRealProHacker opened this issue Jan 17, 2025 · 3 comments · May be fixed by #7
Assignees: theRealProHacker
Labels: enhancement (New feature or request)

Comments

@theRealProHacker
Owner

theRealProHacker commented Jan 17, 2025

Use https://github.com/mlc-ai/web-llm for web-based LLM inference (see #6 (comment))

Use https://github.com/linuxscout/yaziji to generate random phrases, transliterate and correct them to get a ton of data.
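
A rough sketch of what that data-generation loop could look like. The yaziji call and the `transliterate` helper below are assumptions for illustration only, not verified APIs; check https://github.com/linuxscout/yaziji for the real entry point.

import csv

import yaziji  # hypothetical import/API, see note above

def build_dataset(n: int = 1000, out_path: str = "pairs.csv") -> None:
    """Generate n random Arabic phrases and pair them with draft transliterations."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["arabic", "draft_transliteration"])
        for _ in range(n):
            phrase = yaziji.generate_phrase()   # hypothetical yaziji entry point
            draft = transliterate(phrase)       # placeholder for the current pipeline
            writer.writerow([phrase, draft])
    # The drafts are then corrected by hand to get gold training pairs.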

@theRealProHacker theRealProHacker self-assigned this Jan 17, 2025
@theRealProHacker theRealProHacker added the enhancement New feature or request label Jan 17, 2025
@theRealProHacker
Owner Author

Web-based LLM inference doesn't work on many browsers and takes too long to load.

What I tried:

import * as webllm from "https://esm.run/@mlc-ai/web-llm";

const initProgressCallback = (initProgress) => {
  console.log(initProgress);
};

// 8B parameters, the main reason loading takes so long.
const selectedModel = "Llama-3.1-8B-Instruct-q4f32_1-MLC";

const engine = await webllm.CreateMLCEngine(
  selectedModel,
  { initProgressCallback: initProgressCallback },
);

// `text` is the Arabic input to transliterate.
const messages = [
  { role: "system", content: "Transliterating according to IJMES ..." },
  { role: "user", content: text },
];

const reply = await engine.chat.completions.create({ messages });

Maybe a model with fewer parameters would load in an acceptable time, but it makes sense to hold off on web-based inference until WebGPU is fully adopted by all major browsers.

@theRealProHacker theRealProHacker linked a pull request Jan 22, 2025 that will close this issue
@theRealProHacker
Owner Author

#7 instead uses the HuggingFace inference API from the backend.

The following aspects are very important:

  1. Instruct the model to answer succinctly. In the tested prompt, this instruction is even stated twice.
  2. Transliteration requires no experimentation or creativity; there is really only one correct answer, so the temperature should be set to 0.
  3. Qwen 2.5 performs very well on Arabic, as the Open Arabic LLM Leaderboard shows, and 72B parameters almost guarantee solid performance in most cases.

from huggingface_hub import InferenceClient

client = InferenceClient(api_key="...")

# Illustrative wrapper; the actual function in #7 may be named differently.
def transliterate(text: str) -> str:
    messages = [
        {
            "role": "system",
            # Succinctness is requested twice on purpose (see point 1 above).
            "content": (
                "You are a transliterator that transliterates according to the "
                "IJMES standard. Your task is to transliterate as quickly and "
                "succinctly as possible. Don't explain anything, keep your "
                "answers as short as possible."
            ),
        },
        {
            "role": "user",
            "content": text,
        },
    ]

    completion = client.chat.completions.create(
        model="Qwen/Qwen2.5-72B-Instruct",
        messages=messages,
        max_tokens=500,
        temperature=0,  # only one correct answer (see point 2 above)
    )

    return completion.choices[0].message.content
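
Calling it from the backend is then a one-liner (the `transliterate` name above is only illustrative):

print(transliterate("مرحبا بالعالم"))  # prints the model's IJMES transliteration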

@theRealProHacker
Owner Author

For now, fine-tuning and gathering more data seem unnecessarily complicated. Or, to put it differently, the costs outweigh the benefits given the quite good performance of the current approach, at least for IJMES.
