UL-FRI-NLP-2023-2024/ul-fri-nlp-course-project-processingbit

Natural language processing course 2023/24: ProcessingBIT

Qualitative discourse analysis is crucial for social scientists studying human interaction. This project uses large language models (LLMs) to support qualitative discourse analysis, a task that traditionally depends on multiple human coders reaching high inter-rater reliability. The coding is extremely labor-intensive: coders must understand the full discussion context, consider each participant's perspective, and relate each sentence both to the preceding discussion and to shared general knowledge.

The goal is to develop a model that categorizes postings in online discussions, such as those in a corpus about the short story "The Lady, or the Tiger?", while still generalizing to other discussions.

Our approach incorporates multiple features to identify topic shifts driven by individual users. We fine-tuned several LLMs, such as LLaMA and Mistral, with LoRA (Low-Rank Adaptation) to reduce training cost. We also defined a generic prompt, usable with both models, that includes the chat history, context from relevant articles or stories, and a codebook of labels with examples.
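
A minimal sketch of how such a generic prompt could be assembled, written in Python as an assumption (the README does not show the project's code). The section headings, the `CodebookEntry` and `build_prompt` names, the `max_turns` cutoff on the chat history, and the example labels are all hypothetical; only the three ingredients (chat history, story or article context, and a codebook of labels with examples) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    """One label of the coding scheme (hypothetical structure)."""
    label: str
    description: str
    examples: list[str] = field(default_factory=list)


def build_prompt(
    context: str,
    codebook: list[CodebookEntry],
    history: list[tuple[str, str]],
    message: tuple[str, str],
    max_turns: int = 10,  # hypothetical cutoff, not from the project
) -> str:
    """Assemble a model-agnostic prompt from context, codebook and chat history."""
    lines = ["You are annotating messages from an online discussion.", ""]

    # 1. Context: the story or article the discussion is about.
    lines += ["### Context", context, ""]

    # 2. Codebook: every label with its description and example messages.
    lines.append("### Codebook")
    for entry in codebook:
        lines.append(f"- {entry.label}: {entry.description}")
        for example in entry.examples:
            lines.append(f'    Example: "{example}"')
    lines.append("")

    # 3. Chat history: only the most recent turns, to keep the prompt short.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines.append("### Chat history")
    for user, text in recent:
        lines.append(f"{user}: {text}")
    lines.append("")

    # 4. The message to classify and the expected answer format.
    user, text = message
    lines += ["### Message to classify", f"{user}: {text}", ""]
    lines.append("Answer with exactly one label from the codebook.")
    return "\n".join(lines)


# Usage with made-up labels and messages (hypothetical, not the real codebook).
codebook = [
    CodebookEntry("Story", "Discusses the plot or characters.",
                  ["I think the princess chose the tiger."]),
    CodebookEntry("Social", "Greetings or off-topic talk.", ["Hi everyone!"]),
]
prompt = build_prompt(
    context="Short summary of 'The Lady, or the Tiger?'",
    codebook=codebook,
    history=[("user1", "Hi everyone!")],
    message=("user2", "She must have sent him to the lady."),
)
print(prompt)
```

Because the result is a plain string, the same prompt can be reused across models; with Hugging Face `transformers`, for instance, it could be passed as the user turn to each model's `tokenizer.apply_chat_template`, which adds the model-specific chat markup.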

Finally, an ensemble combined the predictions of multiple models, with a final model using few-shot learning to select the best one. For explainability, we generated textual explanations with LLaMA that make the model's decisions accessible to non-expert users and are designed to avoid hallucinations.
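
The ensemble step could look roughly like the sketch below, again under assumptions. The final few-shot model is represented by a hypothetical `selector` callable that takes a prompt and returns a label; the function names, the prompt wording, the shortcut for unanimous predictions, and the majority-vote fallback (used when the selector answers with a label no model proposed) are choices made for this example, not details stated in the project description.

```python
from collections import Counter
from typing import Callable

# (message, candidate labels, correct label) triples shown to the selector.
FewShotExample = tuple[str, list[str], str]


def majority_vote(predictions: list[str]) -> str:
    """Most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def build_selection_prompt(
    examples: list[FewShotExample], message: str, candidates: list[str]
) -> str:
    """Few-shot prompt asking the final model to pick one candidate label."""
    parts = ["Choose the most appropriate label for the last message.", ""]
    for ex_message, ex_candidates, ex_answer in examples:
        parts += [
            f"Message: {ex_message}",
            f"Candidates: {', '.join(ex_candidates)}",
            f"Answer: {ex_answer}",
            "",
        ]
    parts += [
        f"Message: {message}",
        f"Candidates: {', '.join(sorted(set(candidates)))}",
        "Answer:",
    ]
    return "\n".join(parts)


def ensemble_predict(
    message: str,
    model_predictions: dict[str, str],
    examples: list[FewShotExample],
    selector: Callable[[str], str],
) -> str:
    """Combine per-model labels; ask the selector only when the models disagree."""
    if not model_predictions:
        raise ValueError("need at least one model prediction")
    candidates = list(model_predictions.values())
    unique = set(candidates)
    if len(unique) == 1:
        return candidates[0]  # unanimous: no need to query the selector
    answer = selector(build_selection_prompt(examples, message, candidates)).strip()
    # Fall back to a majority vote if the selector returns an unproposed label.
    return answer if answer in unique else majority_vote(candidates)


# Usage: a stub stands in for the few-shot selector model (hypothetical).
examples = [("Hi all!", ["Social", "Story"], "Social")]
preds = {"llama": "Story", "mistral": "Social"}
print(ensemble_predict("Why would she pick the tiger?", preds, examples,
                       selector=lambda prompt: "Story"))
```

Passing the selector in as a plain callable keeps the sketch independent of any particular inference backend, and restricting its answer to labels the ensemble actually proposed is one simple guard against the final model inventing a label.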

About

ul-fri-nlp-course-project-processingbit created by GitHub Classroom
