Web-Scraping-and-Text-Analysis-Project

Objective

This project extracts the text of articles from a provided list of URLs and runs text analysis on each article to compute a set of metrics.

Approach

Web Scraping:
- Used the BeautifulSoup library for web scraping.
- Extracted data from the first URL and converted it into a string, then a list of words.
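
The scraping step above might look roughly like the following sketch. The function names `split_words` and `fetch_article_words`, the 10-second timeout, and the choice to read text from `<p>` tags are assumptions for illustration; the selectors the project actually uses are not described here.

```python
import re
from typing import List


def split_words(text: str) -> List[str]:
    """Split raw text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def fetch_article_words(url: str) -> List[str]:
    """Download one page and return its paragraph text as a list of words."""
    # Third-party imports are local so split_words can be used without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page

    soup = BeautifulSoup(response.text, "html.parser")
    # Assumption: the article body lives in <p> tags.
    text = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return split_words(text)
```

Keeping the tokenizer separate from the download makes it possible to check the word-splitting logic without network access.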
Data Manipulation:
- Converted the extracted data into a pandas DataFrame and further into a NumPy array for manipulation.
Text Analysis:
- Created a TextAnalysis class inside the TextAnalysis.py file.
- Defined class attributes for StopWords, PositiveWords, and NegativeWords to use across all URL data.
- Methods were created to:
  - Load stop words, positive words, and negative words from files.
  - Extract, clean, and analyze the text data.
- Employed exception handling during data extraction.
- Ordered the method calls so that values other metrics depend on, such as the word count, are computed first.
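
A minimal sketch of how such a class could be laid out is shown below. Only the class name and the `StopWords`, `PositiveWords`, and `NegativeWords` attributes come from the description above; the method names (`load_word_lists`, `clean`, `scores`), the one-word-per-line file format, and the metric names are assumptions, not the contents of TextAnalysis.py.

```python
import re
from pathlib import Path


class TextAnalysis:
    # Class attributes: one copy of each word list, shared by every instance.
    StopWords = set()
    PositiveWords = set()
    NegativeWords = set()

    @classmethod
    def load_word_lists(cls, stop_path, positive_path, negative_path):
        """Read each file (assumed one word per line) into the shared sets."""
        cls.StopWords = cls._read_words(stop_path)
        cls.PositiveWords = cls._read_words(positive_path)
        cls.NegativeWords = cls._read_words(negative_path)

    @staticmethod
    def _read_words(path):
        # errors="ignore" is an assumption, to tolerate stray bytes in word lists.
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        return {line.strip().lower() for line in text.splitlines() if line.strip()}

    def __init__(self, text):
        self.text = text
        self.words = []
        self.word_count = 0

    def clean(self):
        """Lowercase, tokenize, and drop stop words; also sets word_count."""
        tokens = re.findall(r"[a-z']+", self.text.lower())
        self.words = [t for t in tokens if t not in self.StopWords]
        self.word_count = len(self.words)

    def scores(self):
        """Return illustrative metrics; clean() must have run first."""
        if self.word_count == 0:
            raise ValueError("call clean() on non-empty text before scores()")
        positive = sum(w in self.PositiveWords for w in self.words)
        negative = sum(w in self.NegativeWords for w in self.words)
        return {
            "word_count": self.word_count,
            "positive_score": positive,
            "negative_score": negative,
        }
```

The guard in `scores()` is one way to enforce the ordering requirement: metrics that depend on the word count fail loudly if `clean()` has not been called.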
Automation:
- Created a Main.py file that imports the TextAnalysis class and performs analysis on each URL iteratively.
- Results are stored in a dictionary and exported to an Output.xlsx file.
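
The per-URL loop and export could be sketched as follows. The helper names `analyze_urls` and `export_results`, the `analyze` callback, and the `"error"` key are hypothetical; the sketch assumes the results dictionary is keyed by URL, which may differ from how Main.py organizes it.

```python
def analyze_urls(urls, analyze):
    """Call analyze(url) for each URL and collect the metrics in a dict."""
    results = {}
    for url in urls:
        try:
            results[url] = analyze(url)
        except Exception as exc:
            # Record the failure and continue with the next URL.
            results[url] = {"error": str(exc)}
    return results


def export_results(results, path="Output.xlsx"):
    """Write one row per URL to an Excel file (needs pandas and openpyxl)."""
    import pandas as pd  # local import: only needed for the export step

    frame = pd.DataFrame.from_dict(results, orient="index")
    frame.to_excel(path, index_label="URL")
```

Catching the exception inside the loop means a single unreachable page does not stop the remaining URLs from being processed.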

Setup

Requirements

Ensure all required libraries are installed by running:
```
pip3 install -r requirements.txt
```

Libraries Used

- requests
- bs4 (BeautifulSoup)
- pandas
- NLTK
- openpyxl

Operating System

macOS

How to Run the Project

Clone the repository:

```
git clone https://github.com/samarth-jain28/Web-Scraping-and-Text-Analysis-Project/
```

Navigate to the project directory:

```
cd Web-Scraping-and-Text-Analysis-Project
```

Install the dependencies:
```
pip3 install -r requirements.txt
```
Run the Main.py file to start the analysis:
```
python3 Main.py
```

Output

The results will be saved in Output.xlsx within the project directory.

Future Enhancements

- Add support for additional languages in text analysis.
- Implement sentiment analysis using machine learning models.
