I am a Ph.D. student at Penn State University, advised by Dr. Rui Zhang. I’m interested in building reliable and trustworthy NLP systems.
[Personal Website] [Google Scholar] [Semantic Scholar]
- VisOnlyQA [huggingface dataset] [code]
  - Paper: VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
  - Dataset for evaluating the visual perception capabilities of LVLMs on geometric and numerical information in scientific figures
- ReaLMistake [huggingface dataset] [code]
  - Paper: Evaluating LLMs at Detecting Errors in LLM Responses (COLM 2024)
  - Benchmark for evaluating methods that detect errors in LLM responses
  - Expert error annotations on responses from GPT-4 and Llama 2 70B on three tasks
- WiCE [dataset and code]
  - Paper: WiCE: Real-World Entailment for Claims in Wikipedia (EMNLP 2023)
  - Dataset for document-level NLI
  - Fine-grained textual entailment dataset built on pairs of natural claims and evidence extracted from Wikipedia
- When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
  - Paper list on self-correction of LLMs: https://github.com/ryokamoi/llm-self-correction-papers
- Shortcomings of Question Answering Based Factuality Frameworks for Error Localization [human annotation]