Welcome to the repository for the Natural Language Processing course offered at the University of Tehran. It contains code for the assignments and projects completed during the course. The course by:
This NLP course offers a comprehensive curriculum covering various essential topics. Students will learn the basics of text processing, including Regular Expressions and Text Normalization for pattern matching and text preparation. They will explore Morphology to understand word structure, Tokenization for breaking text into meaningful units, and Edit Distance and Spell Correction techniques for identifying and correcting spelling errors.
The course includes practical applications such as Language Modeling with N-Grams, Naive Bayes Classification, and Sentiment Analysis. Students will also study Logistic Regression and its use as a classifier in NLP tasks.
The course then moves on to Lexical and Vector Semantics, helping students understand word meaning and the relationships between words. Advanced topics like Neural Nets and Neural Language Models, Sequence Labeling for Parts of Speech and Named Entities, and Deep Learning Architectures for Sequence Processing will equip students with modern NLP techniques.
Word Senses and WordNet will enable students to work with word sense disambiguation, and Encoder-Decoder models, attention, and LSTMs will be taught for sequence-to-sequence tasks. Transformers and Contextual Word Embeddings will be covered, along with Transfer Learning using models like mBERT, XLM-R, GPT, T5, and mT5.
The course will also touch on Statistical Machine Translation and Neural Machine Translation, as well as Constituency Grammars, Parsing, and Dialogue Systems including chatbots. Additionally, Information Extraction (named entity recognition and relation extraction), Question Answering, and Logical Representations of Sentence Meaning will be explored.
Please find below a brief overview of the contents of this repository:
HW1/: This directory contains code for Assignment 1, which focuses on n-grams and different methods of tokenization.
HW2/: This directory contains code for Assignment 2, which focuses on sentiment analysis using Naive Bayes and Logistic Regression, and on training word2vec.
HW3/: This directory contains code for Assignment 3, which focuses on sentiment analysis using LSTM, RNN, and GRU.
HW4/: This directory contains code for Assignment 4, which focuses on zero-shot learning and fine-tuning ParsBERT for the task of natural language inference on FarsTail.
HW5/: This directory contains code for Assignment 5, which focuses on training a machine translation model using Fairseq.
HW6/: This directory contains code for Assignment 6, which focuses on training a chatbot. In this task, we train a Rasa bot to answer FAQ questions for a ticket-selling company.
This repository is for archival and reference purposes only. The code here might not be updated or maintained. Use it at your own discretion.