A Python script for web scraping that extracts various types of data (links, email addresses, social media links, author names, and phone numbers) from a given URL. It provides both terminal and file-based output options.
Before running the script, you need to have Python 3.x installed on your system. If you don't have it, you can download it from the official Python website.
- Clone this repository to your local machine using Git, or download it as a ZIP archive and extract it.
- Navigate to the project directory:
  cd web-scraper-project
- Install the required Python dependencies using pip:
  pip install -r requirements.txt
- Run the urlScrapper.py script:
  python urlScrapper.py
- Enter the URL you want to scrape when prompted.
- Choose the output format:
  - Enter 1 for terminal output.
  - Enter 2 to save the data to a file.
If you choose terminal output (1), the script will display the following information in the terminal:
- Extracted Links
- Extracted Email Addresses
- Extracted Facebook Links
- Extracted Author Names
- Extracted Phone Numbers
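To give a feel for how these five categories can be pulled out of a page, here is a minimal sketch. It is not the code in urlScrapper.py: it uses Python's built-in html.parser and re modules instead of beautifulsoup4 so that it runs with no dependencies, and it works on an HTML string rather than fetching a URL. The names PageParser and extract_all, the two regular expressions, and the idea that author names come from a `<meta name="author">` tag are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Deliberately simple, illustrative patterns (assumptions, not the script's own).
# Real-world email addresses and phone numbers vary far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class PageParser(HTMLParser):
    """Collects href values, author meta tags, and text content."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.authors = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        # Assumption: author names are taken from <meta name="author" content="...">.
        if tag == "meta" and (attrs.get("name") or "").lower() == "author":
            if attrs.get("content"):
                self.authors.append(attrs["content"])

    def handle_data(self, data):
        self.text_parts.append(data)


def extract_all(html, base_url):
    """Return a dict with one list per category shown in the terminal output."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    # Resolve relative hrefs against the page URL and keep only web links.
    links = [urljoin(base_url, href) for href in parser.hrefs]
    links = [link for link in links if link.startswith(("http://", "https://"))]
    return {
        "Links": links,
        # Search the raw HTML so addresses inside mailto: hrefs are found too.
        "Email Addresses": sorted(set(EMAIL_RE.findall(html))),
        "Facebook Links": [link for link in links if "facebook.com" in link],
        "Author Names": parser.authors,
        "Phone Numbers": [match.strip() for match in PHONE_RE.findall(text)],
    }
```

Calling `extract_all(html, "https://example.com")` on a downloaded page returns a dictionary keyed by category, which can then be printed section by section. Expect the phone pattern in particular to produce false positives on pages with long digit sequences.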
If you choose file output (2), you will be prompted to enter a filename. The script will save the extracted data to a file with the following structure:
- Scraped data from [URL]
- Links
- Email Addresses
- Facebook Links
- Author Names
- Phone Numbers
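As a rough illustration of that layout, the sketch below builds a report with a title line followed by one section per data type and writes it to a file. The function names format_report and save_report, the indentation of items, and the blank lines between sections are assumptions; the file produced by urlScrapper.py may be formatted differently.

```python
def format_report(url, data):
    """Build the report text: a title line, then one section per data type."""
    lines = [f"Scraped data from {url}", ""]
    for section, items in data.items():
        lines.append(f"{section}:")
        lines.extend(f"  {item}" for item in items)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def save_report(filename, url, data):
    """Write the formatted report to the given file as UTF-8 text."""
    with open(filename, "w", encoding="utf-8") as fh:
        fh.write(format_report(url, data))


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = {
        "Links": ["https://example.com/about"],
        "Email Addresses": ["info@example.com"],
        "Facebook Links": [],
        "Author Names": ["Jane Doe"],
        "Phone Numbers": [],
    }
    print(format_report("https://example.com", sample))
```

Keeping the formatting in its own function means the same text could be reused for both the terminal and the file output.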
This project relies on the following Python libraries, which are listed in the requirements.txt file:
- beautifulsoup4: Used for parsing HTML content.
- requests: Used for making HTTP requests to fetch HTML content.
- colorama: Used for terminal text color formatting.
You can install these dependencies using the pip install -r requirements.txt command, as mentioned in the installation instructions.
Contributions are welcome! If you have any improvements, bug fixes, or new features to add, please open an issue or create a pull request. See CONTRIBUTING.md for more details.
This project is licensed under the MIT License - see the LICENSE file for details.
- This project was created by Togeee12.
- Special thanks to the developers of the Python libraries used in this project.