Improving Gemma 2 for a Specific Language on a Budget: Post-Training Recipe
Additional resources for Gemma Neogenesis, a Kaggle notebook on improving Gemma 2 for a specific language on a budget. The notebook is an entry in the Kaggle competition "Google - Unlock Global Communication with Gemma".
The notebook presents a case study on improving the Italian performance of Gemma 2 2B through Post-Training, combining Supervised Fine-Tuning and Preference Tuning. The process uses both existing datasets and synthetic data generated specifically for this competition. Although the focus is Italian, the same cost-effective methods can inspire similar fine-tuning approaches for other languages.
- Evaluation Prompts: prompts for using an LLM as a Judge to evaluate the quality of translated instructions and responses in LLM-aided translation.
- Qualitative Evaluation/Vibe Checking: qualitative evaluation of the model, compared with gemma-2-2b-it on about 10 varied questions/tasks.
- ⚙️ Scale Translation: code for scaling up LLM-aided translation.
- 🎯 Spectrum results: results of the signal-to-noise ratio (SNR) analysis performed with Spectrum.
- References: a curated collection of resources and references used in the notebook.
- 🖼️ Images.