The strength of RAG lies in giving models external knowledge. But its weakness is that retrieved content can be unreliable, while current LLMs treat all context as equally valid.
With Finetune-RAG, we train models to reason selectively, identifying trustworthy context so that their responses avoid factual errors even in the presence of misleading input.
We release:
- A dataset of 1,600+ dual-context examples
- Fine-tuned checkpoints for LLaMA 3.1-8B-Instruct
- Bench-RAG: a GPT-4o evaluation framework scoring accuracy, helpfulness, relevance, and depth
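To make the dual-context idea concrete, here is a minimal sketch of how one such training record could be assembled: the prompt presents both a reliable passage and a fabricated one, while the target answer is grounded only in the reliable passage. The field names, prompt wording, and JSON layout are illustrative assumptions, not the actual Finetune-RAG schema.

```python
import json

def build_dual_context_example(question, true_context, fake_context, answer):
    """Assemble one dual-context training record.

    The prompt interleaves a trustworthy passage with a fabricated one;
    the target response is grounded only in the trustworthy passage, so
    fine-tuning rewards ignoring the misleading context.
    """
    prompt = (
        "Answer the question using only the trustworthy context below.\n\n"
        f"Context 1: {true_context}\n\n"
        f"Context 2: {fake_context}\n\n"
        f"Question: {question}"
    )
    return {"prompt": prompt, "response": answer}

# Hypothetical example: Context 2 contradicts Context 1 with a fabricated date.
example = build_dual_context_example(
    question="When was the Eiffel Tower completed?",
    true_context="The Eiffel Tower was completed in 1889 for the World's Fair.",
    fake_context="The Eiffel Tower was completed in 1921 after wartime delays.",
    answer="The Eiffel Tower was completed in 1889.",
)
print(json.dumps(example, indent=2))
```

In practice the order of the true and fabricated passages would be randomized so the model cannot learn a positional shortcut.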
Our resources:
- Codebase: https://github.com/Pints-AI/Finetune-Bench-RAG
- Dataset: https://huggingface.co/datasets/pints-ai/Finetune-RAG
- Paper: https://arxiv.org/abs/2505.10792v2
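The LLM-as-judge scoring behind an evaluator like Bench-RAG can be sketched as a rubric prompt sent to the judge model. The four criteria come from the post above; the prompt wording, 1-10 scale, and JSON reply format are assumptions for illustration, and the actual grading prompt lives in the linked codebase.

```python
# Criteria named in the post; everything else here is an assumed sketch.
CRITERIA = ["accuracy", "helpfulness", "relevance", "depth"]

def build_judge_prompt(question, reference_context, model_answer):
    """Build a rubric prompt asking a judge model (e.g. GPT-4o) to score
    a RAG answer on each criterion. The reply format is hypothetical."""
    rubric = "\n".join(f"- {c}: rate 1-10" for c in CRITERIA)
    return (
        "You are grading a RAG system's answer.\n\n"
        f"Question: {question}\n"
        f"Reference context: {reference_context}\n"
        f"Answer under evaluation: {model_answer}\n\n"
        "Score the answer on each criterion:\n"
        f"{rubric}\n"
        'Reply as JSON, e.g. {"accuracy": 8, ...}.'
    )

prompt = build_judge_prompt(
    "When was the Eiffel Tower completed?",
    "The Eiffel Tower was completed in 1889.",
    "It was completed in 1889.",
)
print(prompt)
```

The returned prompt would then be sent to the judge model via its chat API, and the JSON scores parsed and averaged per criterion across the evaluation set.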
submitted by /u/zpdeaccount to r/learnmachinelearning