Mechanism Institute Library Scraper

This Python script scrapes data from the Mechanism Institute's library. It uses Selenium to drive a Brave browser instance, extracts data from the loaded web pages, and saves the results in both CSV and JSON format.

Features

- Automatically scrolls to the bottom of the page so that all elements are loaded before scraping.
- Scrapes data and saves it to both CSV and JSON files.
- Configurable via environment variables, allowing easy setup on different machines.
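A minimal sketch of the scroll-to-bottom step, assuming a Selenium WebDriver and a page whose `document.body.scrollHeight` grows as new items load. The function name `scroll_to_bottom` and the `pause`/`max_rounds` defaults are hypothetical and are not taken from `mechanism_scrape.py`.

```python
import time
from typing import Any


def scroll_to_bottom(driver: Any, pause: float = 1.0, max_rounds: int = 50) -> int:
    """Scroll until the page height stops growing; return the number of scrolls made.

    `driver` is assumed to be a Selenium WebDriver (anything with execute_script).
    `pause` and `max_rounds` are illustrative defaults, not values from the script.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom so lazily loaded content is requested.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        # Give the page time to fetch and render new items.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height unchanged: assume nothing more is loading
        last_height = new_height
    return rounds
```

The `max_rounds` cap keeps a page that never stops growing from looping forever.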

Prerequisites

Before running the script, ensure you have the following installed:

- Python 3.8 or higher
- Brave Browser
- ChromeDriver (matching the Chromium version your Brave build is based on)
- pip

Installation

Clone the repository:
```
git clone https://github.com/your-username/mechanism-scraper.git
cd mechanism-scraper
```

Create and activate a virtual environment:
```
python3 -m venv myenv
source myenv/bin/activate
```

Install the required Python packages:
```
pip install -r requirements.txt
```
If `requirements.txt` is missing, install the dependencies manually:
```
pip install selenium webdriver_manager python-dotenv pandas
```

Setup

Environment Variables

Create a `.env` file in the project root to store your machine-specific settings:
```
touch .env
```

Add the following lines to `.env`, replacing the placeholder paths with the actual paths on your machine (the Brave path shown is the default macOS install location):
```
CHROMEDRIVER_PATH=/path/to/your/chromedriver
BRAVE_BROWSER_PATH=/Applications/Brave Browser.app/Contents/MacOS/Brave Browser
```

Example `.env` File

```
CHROMEDRIVER_PATH=/Users/your-username/Downloads/chromedriver
BRAVE_BROWSER_PATH=/Applications/Brave Browser.app/Contents/MacOS/Brave Browser
```
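One way the script could read these values, sketched under the assumption that it uses python-dotenv (listed among the dependencies above) together with `os.getenv`. The helper name `load_browser_config` is hypothetical.

```python
import os

try:
    # python-dotenv is listed in the install step; fall back to plain env vars if absent.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def load_browser_config(env_file: str = ".env") -> dict:
    """Return both paths from .env or the environment, failing loudly if either is missing."""
    if load_dotenv is not None:
        # By default this does not override variables already set in the environment.
        load_dotenv(env_file)
    config = {}
    for name in ("CHROMEDRIVER_PATH", "BRAVE_BROWSER_PATH"):
        value = os.getenv(name)
        if not value:
            raise RuntimeError(f"{name} is not set; add it to {env_file}")
        config[name] = value
    return config
```

Failing early with the variable's name is easier to diagnose than a later Selenium error about a missing binary.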

Note

Ensure that the ChromeDriver version matches the Chromium version your installed Brave browser is built on (shown at `brave://version`); the major version numbers must agree.
Make the `chromedriver` binary executable if needed:
```
chmod +x /path/to/your/chromedriver
```

Usage

To run the script, use the following command:
```
python mechanism_scrape.py
```

The script will launch a headless Brave browser, navigate to the specified page, and scrape the data. The scraped data will be saved as both CSV and JSON files in the current directory.
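A sketch of launching headless Brave with the Selenium 4 API, assuming the two paths come from `.env`. ChromeDriver can drive Brave because Brave is Chromium-based; setting `options.binary_location` redirects it from Chrome to the Brave executable. The helper names `brave_arguments` and `launch_brave` and the `--window-size` switch are assumptions, not the script's confirmed settings.

```python
from typing import List


def brave_arguments(headless: bool = True) -> List[str]:
    """Command-line switches passed to Brave; the exact set is an assumption."""
    args = ["--window-size=1920,1080"]
    if headless:
        # "--headless=new" selects Chromium's newer headless mode.
        args.append("--headless=new")
    return args


def launch_brave(chromedriver_path: str, brave_path: str, headless: bool = True):
    """Start Brave through ChromeDriver (Selenium 4 API) and return the WebDriver."""
    # Imported here so brave_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    # Point ChromeDriver at the Brave binary instead of Google Chrome.
    options.binary_location = brave_path
    for arg in brave_arguments(headless):
        options.add_argument(arg)
    service = Service(executable_path=chromedriver_path)
    return webdriver.Chrome(service=service, options=options)
```

After `driver.get(...)` loads the target page (its URL is not given in this README), call `driver.quit()` in a `finally` block so a failed run does not leave browser processes behind.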

Output

The script generates two output files:

- `mechanism_institute_library.csv`: a CSV file containing the scraped data.
- `mechanism_institute_library.json`: a JSON file containing the scraped data.
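A sketch of writing the same records to both files. It uses the standard-library `csv` and `json` modules rather than pandas (which the script lists as a dependency and may use instead); the helper name `save_records` is hypothetical, and so is any field name you pass in, since the README does not list the scraped columns.

```python
import csv
import json
from typing import Dict, List


def save_records(records: List[Dict[str, str]], stem: str = "mechanism_institute_library") -> None:
    """Write the same records to <stem>.csv and <stem>.json."""
    if not records:
        return  # nothing scraped; leave any previous output untouched
    # Union of keys in first-seen order, so rows with missing fields still fit.
    fieldnames = list(dict.fromkeys(key for row in records for key in row))
    with open(f"{stem}.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # missing keys are written as empty cells
    with open(f"{stem}.json", "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
```

Writing both files from one list of dicts keeps the CSV and JSON outputs in sync.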

Troubleshooting

- `SessionNotCreatedException`: Ensure the ChromeDriver version matches the installed Brave browser version.
- No Chrome Binary Error: Check `BRAVE_BROWSER_PATH` in the `.env` file to ensure it points to the correct location of the Brave binary.
- Permission Denied Error: Make sure that `chromedriver` is executable. Use `chmod +x /path/to/your/chromedriver`.
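The path-related errors above can be caught before launching the browser with a small preflight check. This sketch (the name `preflight` is hypothetical) only inspects the filesystem; it cannot detect a ChromeDriver/Brave version mismatch, which still surfaces as `SessionNotCreatedException` at launch.

```python
import os
from typing import List


def preflight(chromedriver_path: str, brave_path: str) -> List[str]:
    """Return human-readable problems with the configured paths (empty list means OK)."""
    problems = []
    if not os.path.isfile(brave_path):
        # ChromeDriver's "cannot find Chrome binary" error usually traces back to this.
        problems.append(f"BRAVE_BROWSER_PATH does not point to a file: {brave_path}")
    if not os.path.isfile(chromedriver_path):
        problems.append(f"CHROMEDRIVER_PATH does not point to a file: {chromedriver_path}")
    elif not os.access(chromedriver_path, os.X_OK):
        problems.append(f"chromedriver is not executable; run: chmod +x {chromedriver_path}")
    return problems
```

Printing these messages and exiting before Selenium starts gives a clearer error than a stack trace from `webdriver.Chrome`.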

Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Make your changes.
4. Commit your changes (`git commit -m 'Add some feature'`).
5. Push to the branch (`git push origin feature-branch`).
6. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
