Experiments with web crawling, scraping, and indexing a collection of web documents. Clustering the indexed data with k-means algorithm. Each resulting cluster is assigned a sentiment score using AFINN - a sentiment analysis script.

chihiroanihr/COMP479-Fall2022-P4

COMP479-Fall2022-P4

Information Retrieval and Web Search course project at Concordia University - assigned by Dr. Sabine Bergler.

Overview

COMP479 Project 4 (P4) experiments with web crawling, web scraping, and indexing a collection of web documents. The indexed data is then clustered using the k-means algorithm, and each resulting cluster is assigned a sentiment score using AFINN, a word-list-based sentiment analysis tool.
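
The clustering and scoring steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The sample `documents`, the `TOY_LEXICON` values, and the helper names `score_text` and `cluster_and_score` are all hypothetical. The toy lexicon stands in for the real AFINN word list so the sketch needs only scikit-learn. With the `afinn` package installed, `Afinn().score(text)` could replace `score_text`.

```python
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for scraped page text; the real project crawls the web.
documents = [
    "great research and excellent teaching awards",
    "excellent awards for great student research",
    "campus closed after terrible storm damage",
    "storm damage causes terrible campus delays",
]

# Tiny AFINN-style valence lexicon (illustrative values, not the real word list).
TOY_LEXICON = {"great": 3, "excellent": 3, "terrible": -3, "damage": -3}


def score_text(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word, in the style of AFINN scoring."""
    return sum(lexicon.get(word, 0) for word in text.lower().split())


def cluster_and_score(docs, k=2, seed=0):
    """Cluster docs with k-means on TF-IDF vectors, then score each cluster."""
    vectors = TfidfVectorizer().fit_transform(docs)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(vectors)

    # A cluster's score here is the sum of its documents' scores (an assumption;
    # an average or other aggregate would work equally well).
    cluster_scores = defaultdict(float)
    for doc, label in zip(docs, labels):
        cluster_scores[int(label)] += score_text(doc)
    return [int(label) for label in labels], dict(cluster_scores)


labels, scores = cluster_and_score(documents)
print(labels, scores)
```

Cluster ids are arbitrary, so only the grouping and the per-cluster totals are meaningful, not which cluster is numbered 0 or 1.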

For the original project outline (assigned by Dr. Sabine Bergler), click here.

Resources

Programming Language

Built with Python

This project uses Python 3.8 or later, chosen for its support for natural language processing tasks through the NLTK package.

Dependencies

  • beautifulsoup4
  • scipy
  • afinn
  • scikit-learn (provides the TfidfVectorizer and KMeans classes)
  • reppy
  • urllib3

Crawling Tool
