This repository contains a collection of small assignments designed to introduce core concepts and techniques in Natural Language Processing (NLP). Each notebook showcases a specific NLP task, and the projects are structured to build a foundational understanding of the field.
1. N_grams.ipynb
Implements trigram models using the corpus_for_language_model.txt file.
Trains n-gram models and computes sentence probabilities using several estimation techniques (a minimal sketch follows the list):
-- Maximum Likelihood Estimation (MLE)
-- Laplace Smoothing
-- Katz Backoff
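A minimal sketch of the first two estimators, assuming the corpus has already been tokenized into a flat list of strings (function names here are illustrative, not the notebook's own). Katz backoff additionally requires discounted counts and a bigram fallback, omitted for brevity:

```python
from collections import Counter

def train_trigrams(tokens):
    """Count trigrams and their bigram contexts over a flat token list."""
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return trigram_counts, bigram_counts

def mle_prob(w1, w2, w3, trigram_counts, bigram_counts):
    """P(w3 | w1, w2) by Maximum Likelihood Estimation."""
    context = bigram_counts[(w1, w2)]
    return trigram_counts[(w1, w2, w3)] / context if context else 0.0

def laplace_prob(w1, w2, w3, trigram_counts, bigram_counts, vocab_size):
    """Add-one (Laplace) smoothing: every trigram gets a pseudo-count of 1."""
    return (trigram_counts[(w1, w2, w3)] + 1) / (bigram_counts[(w1, w2)] + vocab_size)
```

A sentence probability is then the product of these conditional probabilities over its trigrams (usually summed in log space to avoid underflow).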
Computes Positive Pointwise Mutual Information (PPMI) for word pairs based on the corpus.
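A compact PPMI sketch under the simplifying assumptions of a symmetric context window and corpus-level unigram probabilities (the notebook's exact windowing may differ):

```python
import math
from collections import Counter

def ppmi(tokens, window=2):
    """Positive PMI for word pairs co-occurring within a context window."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Neighbors within `window` positions on either side of w.
        for c in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pair_counts[(w, c)] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (w, c), n in pair_counts.items():
        p_wc = n / total_pairs
        p_w = word_counts[w] / total_words
        p_c = word_counts[c] / total_words
        # PMI clipped at zero gives Positive PMI.
        scores[(w, c)] = max(0.0, math.log2(p_wc / (p_w * p_c)))
    return scores
```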
Leverages GloVe embeddings to find the words most similar to a query word. The embeddings are downloaded and extracted with the following commands (a loading and similarity sketch follows):
!wget http://nlp.stanford.edu/data/glove.42B.300d.zip
!unzip glove.42B.300d.zip
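The zip extracts to glove.42B.300d.txt, one word per line followed by 300 floats. A hedged sketch of loading the vectors and ranking neighbors by cosine similarity (the `limit` cap is only to keep memory manageable; the notebook may load the full file):

```python
import numpy as np

def load_glove(path, limit=100_000):
    """Load GloVe vectors from the extracted text file (word, then floats)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def most_similar(word, vectors, k=5):
    """Rank vocabulary words by cosine similarity to the query word."""
    v = vectors[word]  # raises KeyError if the word isn't in the loaded slice
    v = v / np.linalg.norm(v)
    sims = {w: float(u @ v / np.linalg.norm(u))
            for w, u in vectors.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

glove = load_glove("glove.42B.300d.txt")
print(most_similar("language", glove))
```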
2. Emotions_Sentiment_Analysis.ipynb
This notebook focuses on a variety of NLP tasks using both classical and neural approaches:
(i) Classical Part-of-Speech Tagging
Uses the spaCy library to perform statistical part-of-speech (POS) tagging. Install the English model first: python -m spacy download en_core_web_sm
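A minimal example of the spaCy tagging step:

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")
for token in doc:
    # pos_ is the coarse universal tag; tag_ is the fine-grained one.
    print(token.text, token.pos_, token.tag_)
```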
(ii) Neural Part-of-Speech Tagging
Utilizes the Transformers library for neural POS tagging. Model: QCRI/bert-base-multilingual-cased-pos-english (a multilingual BERT model trained on the Penn Treebank dataset).
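One way to run this model is through the Transformers token-classification pipeline; the output keys shown below assume the default, non-aggregated pipeline settings:

```python
from transformers import pipeline

tagger = pipeline(
    "token-classification",
    model="QCRI/bert-base-multilingual-cased-pos-english",
)
for tok in tagger("The quick brown fox jumps over the lazy dog."):
    # Each entry carries the subword token and its predicted POS label.
    print(tok["word"], tok["entity"])
```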
(iii) Neural Sentiment Analysis
Uses the Transformers library for sentiment analysis on tweets. Model: cardiffnlp/twitter-xlm-roberta-base-sentiment (an XLM-R model fine-tuned for multilingual sentiment analysis).
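A minimal usage sketch; the example tweet and the printed label format are illustrative:

```python
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)
# Returns a label (negative / neutral / positive) with a confidence score.
print(sentiment("I love this new phone!"))
```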
(iv) Neural Emotion Detection
Uses the Transformers library for emotion detection on tweets. Model: mrm8488/t5-base-finetuned-emotion (a T5 model fine-tuned for emotion recognition). Predicts the emotion expressed in each tweet using Google's T5 text-to-text architecture.
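Because this is a text-to-text model, the emotion label is generated as a string rather than returned as a classification score. A sketch along the lines of the model card (the generation settings here are assumptions):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "mrm8488/t5-base-finetuned-emotion"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def get_emotion(text):
    # The label (e.g. "joy", "sadness") is generated as text, not as a class score.
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=3)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(get_emotion("I can't wait to see you this weekend!"))
```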
3. Neural_QA,_Summarization_and_Dependency_Parsing.ipynb
(i) Neural Question Answering: Using the Hugging Face Transformers library to load and implement various question-answering models, we answer specific questions based on provided contexts. Each model is assessed for its ability to generate accurate answers, and the code is structured to allow easy experimentation with different models.
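A representative QA call via the Transformers pipeline; the model name here is one plausible choice, not necessarily one the notebook uses:

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="What library is used to load the models?",
    context="The notebook uses the Hugging Face Transformers library to load "
            "several question-answering models and compare their answers.",
)
# The pipeline returns the answer span and a confidence score.
print(result["answer"], round(result["score"], 3))
```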
(ii) Neural Summarization: Utilizing Transformers for abstractive summarization, we apply multiple models to condense long passages into concise summaries. Each model is evaluated for effectiveness in capturing the main points of the text while maintaining coherence and readability.
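A hedged summarization example; the model and length limits are illustrative choices:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
passage = (
    "Natural language processing covers a broad range of tasks, from "
    "part-of-speech tagging and parsing to question answering and "
    "summarization. Modern systems typically rely on pretrained "
    "transformer models that are fine-tuned for each downstream task."
)
# max_length / min_length bound the generated summary in tokens.
print(summarizer(passage, max_length=40, min_length=10)[0]["summary_text"])
```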
(iii) Dependency Parsing with spaCy: Using the spaCy library, we explore syntactic relationships within sentences by performing dependency parsing with different spaCy models. This analysis helps reveal the grammatical structure of sentences and the relationships between words.
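A minimal dependency-parsing example with the small English model:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The committee approved the proposal after a long debate.")
for token in doc:
    # dep_ is the dependency label; head is the governing word.
    print(f"{token.text:<10} {token.dep_:<10} head={token.head.text}")
```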