Merge pull request #433 from rmusser01/dev
Eval Plans writeup
rmusser01 authored Nov 18, 2024
2 parents c9ede3f + 2af9491 commit a8ff2b5
Showing 3 changed files with 711 additions and 52 deletions.
119 changes: 119 additions & 0 deletions Docs/Citations_and_Confabulations.md
@@ -0,0 +1,119 @@
# Citations & Confabulations

## Table of Contents
1. [Citations](#citations)
2. [Confabulations](#confabulations)
3. [References](#references)

RAG
https://www.lycee.ai/blog/rag-ragallucinations-and-how-to-fight-them

Attributions
https://github.com/aws-samples/llm-based-advanced-summarization/blob/main/detect_attribution.ipynb

Benchmarks
https://github.com/lechmazur/confabulations/
https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard
https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard
https://osu-nlp-group.github.io/AttributionBench/


Research
https://github.com/EdinburghNLP/awesome-hallucination-detection
https://arxiv.org/abs/2407.13481
https://arxiv.org/abs/2408.06195
https://arxiv.org/abs/2407.19813
https://arxiv.org/abs/2407.16557
https://arxiv.org/abs/2407.16604
https://thetechoasis.beehiiv.com/p/eliminating-hallucinations-robots-imitate-us
https://arxiv.org/abs/2407.19825
https://arxiv.org/pdf/2406.02543
https://arxiv.org/abs/2406.10279
https://arxiv.org/pdf/2409.18475
https://llm-editing.github.io/
https://arxiv.org/pdf/2407.03651
https://cleanlab.ai/blog/trustworthy-language-model/
https://arxiv.org/abs/2408.07852
Detecting Hallucinations
https://arxiv.org/abs/2410.22071
https://arxiv.org/abs/2410.02707
Reflective thinking
https://arxiv.org/html/2404.09129v1
https://github.com/yanhong-lbh/LLM-SelfReflection-Eval
Semantic Entropy
https://www.nature.com/articles/s41586-024-07421-0
https://arxiv.org/abs/2406.15927
HALVA
https://research.google/blog/halva-hallucination-attenuated-language-and-vision-assistant/


Finetuning:
- https://eugeneyan.com/writing/finetuning/

Evals:
- https://github.com/yanhong-lbh/LLM-SelfReflection-Eval
- https://eugeneyan.com/writing/evals/
- https://github.com/confident-ai/deepeval/tree/99aae8ebc09093b8691c7bd6791f6927385cafa8/deepeval/metrics/hallucination


LLM As Judge:
https://arxiv.org/pdf/2404.12272
https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
https://huggingface.co/vectara/hallucination_evaluation_model


Long context generation
https://arxiv.org/pdf/2408.15518
https://arxiv.org/pdf/2408.14906
https://arxiv.org/pdf/2408.15496
https://arxiv.org/pdf/2408.11745
https://arxiv.org/pdf/2407.14482
https://arxiv.org/pdf/2407.09450
https://arxiv.org/pdf/2407.14057
https://www.turingpost.com/p/longrag
https://www.turingpost.com/p/deepseek
https://arxiv.org/pdf/2408.07055

Detecting Hallucinations using Semantic Entropy
- https://www.nature.com/articles/s41586-024-07421-0
- https://github.com/jlko/semantic_uncertainty
- https://github.com/jlko/long_hallucinations
- https://arxiv.org/abs/2406.15927

Lynx/patronus
- https://www.patronus.ai/blog/lynx-state-of-the-art-open-source-hallucination-detection-model
- https://huggingface.co/PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-Q4_K_M-GGUF
- https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/docs/user_guides/community/patronus-lynx.md
- https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/examples/configs/patronusai/prompts.yml
- https://arxiv.org/html/2407.08488v1
- https://arxiv.org/abs/2407.08488

----------------------------------------------------------------------------------------------------------------
### <a name="citations"></a> Citations
- **101**
- Unsorted
- https://mattyyeung.github.io/deterministic-quoting#7-conclusion-is-this-really-ready-for-healthcare

----------------------------------------------------------------------------------------------------------------



----------------------------------------------------------------------------------------------------------------
### <a name="confabulations"></a> Confabulations


----------------------------------------------------------------------------------------------------------------



----------------------------------------------------------------------------------------------------------------
### <a name="references"></a> References


----------------------------------------------------------------------------------------------------------------