Merge pull request #433 from rmusser01/dev

Eval Plans writeup
rmusser01 · Nov 18, 2024 · a8ff2b5
2 parents c9ede3f + 2af9491

Showing 3 changed files with 711 additions and 52 deletions.
diff --git a/Docs/Citations_and_Confabulations.md b/Docs/Citations_and_Confabulations.md
@@ -0,0 +1,119 @@
+# Citations & Confabulations
+
+## Table of Contents
+1. [Citations](#citations)
+2. [Confabulations](#confabulations)
+3. [References](#references)
+
+RAG
+  https://www.lycee.ai/blog/rag-ragallucinations-and-how-to-fight-them
+
+Attributions
+  https://github.com/aws-samples/llm-based-advanced-summarization/blob/main/detect_attribution.ipynb
+
+Benchmarks
+  https://github.com/lechmazur/confabulations/
+  https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard
+  https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard
+  https://osu-nlp-group.github.io/AttributionBench/
+
+
+Research
+  https://github.com/EdinburghNLP/awesome-hallucination-detection
+  https://arxiv.org/abs/2407.13481
+  https://arxiv.org/abs/2408.06195
+  https://arxiv.org/abs/2407.19813
+  https://arxiv.org/abs/2407.16557
+  https://arxiv.org/abs/2407.16604
+  https://thetechoasis.beehiiv.com/p/eliminating-hallucinations-robots-imitate-us
+  https://arxiv.org/abs/2407.19825
+  https://arxiv.org/pdf/2406.02543
+  https://arxiv.org/abs/2406.10279
+  https://arxiv.org/pdf/2409.18475
+  https://llm-editing.github.io/
+  https://arxiv.org/pdf/2407.03651
+  https://cleanlab.ai/blog/trustworthy-language-model/
+  https://arxiv.org/abs/2408.07852
+  Detecting Hallucinations
+    https://arxiv.org/abs/2410.22071
+    https://arxiv.org/abs/2410.02707
+  Reflective thinking
+    https://arxiv.org/html/2404.09129v1
+    https://github.com/yanhong-lbh/LLM-SelfReflection-Eval
+  Semantic Entropy
+    https://www.nature.com/articles/s41586-024-07421-0
+    https://arxiv.org/abs/2406.15927
+  HALVA
+    https://research.google/blog/halva-hallucination-attenuated-language-and-vision-assistant/
+
+
+Finetuning:
+- https://eugeneyan.com/writing/finetuning/
+
+
+Evals:
+- https://github.com/yanhong-lbh/LLM-SelfReflection-Eval
+- https://eugeneyan.com/writing/evals/
+- https://github.com/confident-ai/deepeval/tree/99aae8ebc09093b8691c7bd6791f6927385cafa8/deepeval/metrics/hallucination
+
+
+LLM As Judge:
+  https://arxiv.org/pdf/2404.12272
+  https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
+  https://huggingface.co/vectara/hallucination_evaluation_model
+  https://arxiv.org/pdf/2404.12272
+  https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
+
+
+Long context generation
+  https://arxiv.org/pdf/2408.15518
+  https://arxiv.org/pdf/2408.14906
+  https://arxiv.org/pdf/2408.15496
+  https://arxiv.org/pdf/2408.11745
+  https://arxiv.org/pdf/2407.14482
+  https://arxiv.org/pdf/2407.09450
+  https://arxiv.org/pdf/2407.14057
+  https://www.turingpost.com/p/longrag
+  https://www.turingpost.com/p/deepseek
+  https://arxiv.org/pdf/2408.07055
+
+- Detecting Hallucinations using Semantic Entropy:
+- https://www.nature.com/articles/s41586-024-07421-0
+- https://github.com/jlko/semantic_uncertainty
+- https://github.com/jlko/long_hallucinations
+- https://arxiv.org/abs/2406.15927
+
+Lynx/Patronus
+- https://www.patronus.ai/blog/lynx-state-of-the-art-open-source-hallucination-detection-model
+- https://huggingface.co/PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-Q4_K_M-GGUF
+- https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/docs/user_guides/community/patronus-lynx.md
+- https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/examples/configs/patronusai/prompts.yml
+- https://arxiv.org/html/2407.08488v1
+- https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/examples/configs/patronusai/prompts.yml
+- https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/docs/user_guides/community/patronus-lynx.md
+- https://huggingface.co/PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-Q4_K_M-GGUF
+- https://arxiv.org/abs/2407.08488
+
+----------------------------------------------------------------------------------------------------------------
+### <a name="citations"></a> Citations
+- **101**
+- Unsorted
+    - https://mattyyeung.github.io/deterministic-quoting#7-conclusion-is-this-really-ready-for-healthcare
+
+----------------------------------------------------------------------------------------------------------------
+
+
+
+----------------------------------------------------------------------------------------------------------------
+### <a name="confabulations"></a> Confabulations
+
+
+----------------------------------------------------------------------------------------------------------------
+
+
+
+----------------------------------------------------------------------------------------------------------------
+### <a name="references"></a> References
+
+
+----------------------------------------------------------------------------------------------------------------