[Code Addition Request]: Automate Workflows through Web Scraping (#738)
Fixes #736

## Pull Request for PyVerse 💡

### Requesting to submit a pull request to the PyVerse repository.

---

#### Issue Title

*Add Web Scraping Workflow Automation*

- [YES] I have provided the issue title.

---

#### Name

*Sanchit Chauhan*

- [YES] I have provided my name.

---

#### GitHub ID

*sanchitc05*

- [YES] I have provided my GitHub ID.

---

#### Email ID

*[email protected]*

- [YES] I have provided my email ID.

---

#### Identify Yourself

**Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).**

*GSSOC, HACKTOBERFEST*

- [YES] I have mentioned my participant role.

---

#### Closes

*Closes: #736*

- [YES] I have provided the issue number.

---

#### Describe the Add-ons or Changes You've Made

### **Description**

This PR introduces an automated web scraping workflow that extracts data from both static and dynamic web pages. It uses `requests` and `BeautifulSoup` for static pages and `Selenium` for dynamic content. All scraping activity is logged for easy tracking and error management. The feature streamlines repetitive data-collection tasks and supports scheduled, recurring scraping runs.

### **Technical Implementation**

- **Libraries used**:
  - `requests`: fetches static web pages.
  - `BeautifulSoup`: parses HTML and extracts the relevant data.
  - `Selenium`: automates browser interaction for dynamic content.
- **Logging module**: records activities and errors in `scraper.log`.
- **Project structure**:
  - `scraper.py`: main script containing the scraping logic.
  - `requirements.txt`: dependency list for easy setup.

### **Usage**

1. Clone the repository and install dependencies:
   ```bash
   git clone https://github.com/yourusername/web_scraper.git
   cd web_scraper
   pip install -r requirements.txt
   ```
2. Update the `static_url` and `dynamic_url` variables in `scraper.py`.
3. Run the scraper:
   ```bash
   python scraper.py
   ```
4. Check `scraper.log` for activity status.

### **Benefits**

- **Automates data collection**, saving time and effort.
- **Handles dynamic content**, making it adaptable to complex websites.
- **Error tracking** ensures smooth, continuous scraping.

### **Testing**

- Successfully tested scraping of both static and dynamic pages.
- Verified proper logging of activities and error handling.

- [YES] I have described my changes.

---

#### Type of Change

**Select the type of change:**

- [YES] Bug fix (non-breaking change which fixes an issue)
- [YES] New feature (non-breaking change which adds functionality)
- [YES] Code style update (formatting, local variables)
- [YES] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [YES] This change requires a documentation update

---

#### How Has This Been Tested?

**Describe how your changes have been tested.**

*Describe your testing process here.*

- [YES] I have described my testing process.

---

#### Checklist

**Please confirm the following:**

- [YES] My code follows the guidelines of this project.
- [YES] I have performed a self-review of my code.
- [YES] I have commented on my code, particularly wherever it was hard to understand.
- [YES] I have made corresponding changes to the documentation.
- [YES] My changes generate no new warnings.
- [YES] I have added tests that prove my fix is effective or that my feature works.
- [NO] Any dependent changes have been merged and published in downstream modules.
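The PR body does not include the `scraper.py` source, so here is a minimal sketch of the static-page half of the workflow described above (`requests` + `BeautifulSoup`, logging to `scraper.log`). The function names `fetch_static` and `extract_headings` are illustrative assumptions, not the PR's actual API, and the Selenium branch for dynamic pages is omitted because it requires a browser driver.

```python
import logging

import requests
from bs4 import BeautifulSoup

# Matches the PR's described behavior: activity and errors go to scraper.log.
logging.basicConfig(
    filename="scraper.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)


def fetch_static(url, timeout=10):
    """Fetch a static page; log the outcome and re-raise on failure."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        logging.info("Fetched %s (%d bytes)", url, len(resp.content))
        return resp.text
    except requests.RequestException:
        logging.exception("Failed to fetch %s", url)
        raise


def extract_headings(html):
    """Parse HTML and return the stripped text of every <h2> element.

    The <h2> selector is a placeholder; the real scraper would target
    whatever elements hold the data of interest.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h2")]
```

A scheduled run would then be as simple as calling `extract_headings(fetch_static(static_url))` from cron or a task scheduler, with failures traceable in `scraper.log`.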