update fine-tuning in/thru img link for rag app comm sys article
robertdhayanturner committed Aug 23, 2024
1 parent 834fa59 commit 850e479
Showing 1 changed file with 1 addition and 1 deletion.
docs/articles/rag-application-communication-system.md (2 changes: 1 addition & 1 deletion)
@@ -162,7 +162,7 @@ A good fine-tuning dataset, though it requires a significant amount of careful m

Preparation of the instruction dataset and base model improvement should be your main focus; these have the most impact on performance. I don't spend much time optimizing the training design beyond a few hyperparameters (learning rate, batch size, etc.). I've also largely stopped looking into preference fine-tuning (like DPO); the time spent wasn't worth the marginal gains.
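
To make this concrete, here is a minimal sketch of a supervised fine-tuning run on an instruction dataset using the trl library's SFTTrainer; the base model, dataset path, and hyperparameter values are illustrative placeholders rather than the exact setup described in the article.

```python
# Minimal SFT sketch (illustrative, not the article's actual configuration).
# Assumes instruction_dataset.jsonl has a "text" or "messages" column in the
# format SFTTrainer expects.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="instruction_dataset.jsonl", split="train")

config = SFTConfig(
    output_dir="rag-sft",
    learning_rate=2e-5,               # one of the few hyperparameters worth tuning
    per_device_train_batch_size=4,    # batch size, the other main knob
    num_train_epochs=3,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder base model
    train_dataset=dataset,
    args=config,
)
trainer.train()
```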

-![Fine-tuning for/through RAG](../assets/use_cases/rag_application_communication/fine-tuning-for-rag.png)
+![Fine-tuning for/through RAG](../assets/use_cases/rag_application_communication/fine-tuning-thru-rag.png)

While it's far less common, you can also apply this approach (i.e., fine-tuning your instruction dataset using RAG-generated synthetic data) [to embedding models](https://huggingface.co/blog/davanstrien/synthetic-similarity-datasets). Synthetic data makes it considerably easier to create an instruction dataset that maps to the expected format of the similarity dataset (including queries and “hard negatives”). Fine-tuning your embedding models with synthetic data will confer the same benefits as LLM fine-tuning: cost savings (a much smaller model that demonstrates the same level of performance as a big one) and appropriateness, by bringing the “similarity” score closer to the expectations of your retrieval system.
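
For illustration, a minimal sketch of fine-tuning an embedding model on such a synthetic similarity dataset with sentence-transformers and MultipleNegativesRankingLoss; the base model, file path, and column names are assumed placeholders, not the pipeline from the linked post.

```python
# Sketch of embedding-model fine-tuning on synthetic (query, positive, hard negative)
# triples. Column names are arbitrary; MultipleNegativesRankingLoss uses column order.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # placeholder base model

# Expected columns, in order: anchor (query), positive, negative (hard negative)
train_dataset = load_dataset("json", data_files="synthetic_triplets.jsonl", split="train")

loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save("bge-small-rag-tuned")
```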

