Confab + evals
rmusser01 committed Nov 18, 2024
1 parent 2b733b3 commit 2af9491
Showing 2 changed files with 36 additions and 10 deletions.
41 changes: 33 additions & 8 deletions Docs/Citations_and_Confabulations.md
@@ -5,10 +5,6 @@
2. [Confabulations](#confabulations)
3. [References](#references)





RAG
https://www.lycee.ai/blog/rag-ragallucinations-and-how-to-fight-them

@@ -18,26 +14,42 @@ Attributions
Benchmarks
https://github.com/lechmazur/confabulations/
https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard
https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard
https://osu-nlp-group.github.io/AttributionBench/


Research
https://github.com/EdinburghNLP/awesome-hallucination-detection
https://arxiv.org/abs/2407.13481
https://arxiv.org/abs/2408.06195
https://arxiv.org/abs/2407.19813
https://arxiv.org/abs/2407.16557
https://arxiv.org/abs/2407.16604
https://thetechoasis.beehiiv.com/p/eliminating-hallucinations-robots-imitate-us
https://arxiv.org/abs/2407.19825
https://arxiv.org/pdf/2406.02543
https://arxiv.org/abs/2406.10279
https://arxiv.org/pdf/2409.18475
https://llm-editing.github.io/
https://arxiv.org/pdf/2407.03651
https://cleanlab.ai/blog/trustworthy-language-model/
https://arxiv.org/abs/2408.07852
Detecting Hallucinations
https://arxiv.org/abs/2410.22071
https://arxiv.org/abs/2410.02707

Reflective thinking
https://arxiv.org/html/2404.09129v1
https://github.com/yanhong-lbh/LLM-SelfReflection-Eval
Semantic Entropy
https://www.nature.com/articles/s41586-024-07421-0
https://arxiv.org/abs/2406.15927
HALVA
https://research.google/blog/halva-hallucination-attenuated-language-and-vision-assistant/


Finetuning:
- https://eugeneyan.com/writing/finetuning/
-

Evals:
- https://github.com/yanhong-lbh/LLM-SelfReflection-Eval
@@ -49,7 +61,21 @@ LLM As Judge:
https://arxiv.org/pdf/2404.12272
https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
https://huggingface.co/vectara/hallucination_evaluation_model


Long context generation
https://arxiv.org/pdf/2408.15518
https://arxiv.org/pdf/2408.14906
https://arxiv.org/pdf/2408.15496
https://arxiv.org/pdf/2408.11745
https://arxiv.org/pdf/2407.14482
https://arxiv.org/pdf/2407.09450
https://arxiv.org/pdf/2407.14057
https://www.turingpost.com/p/longrag
https://www.turingpost.com/p/deepseek
https://arxiv.org/pdf/2408.07055

- Detecting Hallucinations using Semantic Entropy:
- https://www.nature.com/articles/s41586-024-07421-0
@@ -66,8 +92,7 @@ Lynx/patronus
- https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/examples/configs/patronusai/prompts.yml
- https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/docs/user_guides/community/patronus-lynx.md
- https://huggingface.co/PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-Q4_K_M-GGUF
- https://www.patronus.ai/blog/lynx-state-of-the-art-open-source-hallucination-detection-model

- https://arxiv.org/abs/2407.08488

----------------------------------------------------------------------------------------------------------------
### <a name="citations"></a> Citations
5 changes: 3 additions & 2 deletions Docs/Evaluation_Plans.md
@@ -11,7 +11,7 @@
- [VLM Evaluations](#vlm-evals)
----------------------------------------------------------------------------------------------------------------


https://eugeneyan.com/writing/evals/
Benchmarking with distilabel
https://distilabel.argilla.io/latest/sections/pipeline_samples/examples/benchmarking_with_distilabel/

@@ -250,6 +250,7 @@ Finetuning
- https://stackoverflow.com/questions/9879276/how-do-i-evaluate-a-text-summarization-tool
- https://github.com/confident-ai/deepeval/tree/99aae8ebc09093b8691c7bd6791f6927385cafa8/deepeval/metrics/summarization
- https://www.confident-ai.com/blog/a-step-by-step-guide-to-evaluating-an-llm-text-summarization-task
- https://arxiv.org/abs/2009.01325
- https://arxiv.org/abs/2407.01370v1
- https://arxiv.org/html/2403.19889v1
- https://github.com/salesforce/summary-of-a-haystack
@@ -334,7 +335,7 @@ Retrieval Granularity

----------------------------------------------------------------------------------------------------------------
### <a name="rag-eval"></a> RAG Evaluation

https://blog.streamlit.io/ai21_grounded_multi_doc_q-a/
https://archive.is/OtPVh
https://towardsdatascience.com/how-to-create-a-rag-evaluation-dataset-from-documents-140daa3cbe71
- **101**