Ryo Kamoi ryokamoi

PhD Student at Penn State University. Building trustworthy and reliable NLP systems.

Penn State University
State College
https://ryokamoi.github.io
@ryokamoi
https://scholar.google.com/citations?user=4OWTLKAAAAAJ&hl=en

ryokamoi/README.md

I am a Ph.D. student at Penn State University, advised by Dr. Rui Zhang. I’m interested in building reliable and trustworthy NLP systems.

[Personal Website] [Google Scholar] [Semantic Scholar]

Datasets

VisOnlyQA [huggingface dataset] [code]
- Paper: VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
- Dataset for evaluating how well LVLMs perceive geometric and numerical information in scientific figures
ReaLMistake [huggingface dataset] [code]
- Paper: Evaluating LLMs at Detecting Errors in LLM Responses (COLM 2024)
- Benchmark for evaluating methods that detect mistakes in LLM responses
- Expert error annotations on responses from GPT-4 and Llama 2 70B on three tasks
WiCE [dataset and code]
- Paper: WiCE: Real-World Entailment for Claims in Wikipedia (EMNLP 2023)
- Dataset for document-level NLI
- Fine-grained textual entailment dataset built on pairs of natural claims and evidence extracted from Wikipedia

Survey

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
- Paper list on self-correction of LLMs: https://github.com/ryokamoi/llm-self-correction-papers

Other Resources

Shortcomings of Question Answering Based Factuality Frameworks for Error Localization [human annotation]
- Paper: Shortcomings of Question Answering Based Factuality Frameworks for Error Localization (EACL 2023)

Pinned

psunlpgroup/VisOnlyQA Public

This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information"

Python 13
psunlpgroup/ReaLMistake Public

This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".

Python 27 3
wice Public

This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.

Python 40 1
llm-self-correction-papers Public

List of papers on Self-Correction of LLMs.

68 2
QA-metrics-human-annotation Public

Human-generated questions from "Shortcomings of Question Answering Based Factuality Frameworks for Error Localization" (EACL 2023)

1
llm_models_bib Public

BibTeX files for LLMs and LVLMs

TeX