View the Live Project
This project is Capstone I of the Software Engineering Career Track course. It involves the development of a custom web scraper that extracts data from a website and presents it as an API. This README provides an overview of the project, its functionality, and how to use it.
For this project I wanted to try my hand at creating my own API from an external website, ProDrivers.
API
https://www.prodrivers.com/jobs/?_city={city_param}&_state={state_param}&_title={key_param}
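As a sketch of how a request to that URL might be issued from Python (the `_city`, `_state`, and `_title` parameter names come from the template above; the function name and error handling are illustrative, not the project's actual code):

```python
import requests

def fetch_jobs_page(city, state, title):
    """Fetch the raw ProDrivers job-listing HTML for the given filters.

    Minimal sketch only: the query-parameter names match the URL
    template above, but everything else is an assumption.
    """
    params = {"_city": city, "_state": state, "_title": title}
    response = requests.get("https://www.prodrivers.com/jobs/",
                            params=params, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text

# Example: driver jobs in Atlanta, GA
html = fetch_jobs_page("Atlanta", "GA", "driver")
```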
You can view the deployed application here: Prodrivers - Webcrawl
The Capstone Part I project focuses on building a custom API using web scraping techniques. The API is designed to retrieve data from a specific website and format it in a way that's easily accessible for other applications.
- Web scraping using BeautifulSoup4 and Requests libraries.
- Conversion of scraped data into a custom API.
- Clean and structured API responses.
- Easy installation via pip or using a requirements.txt file.
- Create a user (with various roles and permissions)
- Login/Logout
- Create additional jobs that are not a part of the API
- You may search for jobs without logging in
- If you 'Sign Up', you can choose a role: 'Driver', 'Dispatcher', or 'Client'
- As far as the relations go:
- Python
- BeautifulSoup4
- Requests
- Flask
- Jinja2
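To illustrate how these pieces fit together, here is a minimal, hypothetical sketch of a Flask route that scrapes the site and serves the results as JSON. The route path, query-parameter names, and CSS selector are assumptions for illustration, not the project's actual code:

```python
from flask import Flask, jsonify, request
import requests
from bs4 import BeautifulSoup

app = Flask(__name__)

@app.route("/api/jobs")
def jobs():
    # Forward the client's filters to the source site's query string.
    params = {
        "_city": request.args.get("city", ""),
        "_state": request.args.get("state", ""),
        "_title": request.args.get("title", ""),
    }
    page = requests.get("https://www.prodrivers.com/jobs/",
                        params=params, timeout=10)
    soup = BeautifulSoup(page.text, "html.parser")
    # ".job-listing" is an illustrative selector, not the site's real markup.
    listings = [tag.get_text(strip=True) for tag in soup.select(".job-listing")]
    return jsonify(jobs=listings)
```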
You can install the required libraries for this project using pip. Simply run the following commands inside the local directory of the project:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
flask run
```
If for any reason `pip install -r requirements.txt` fails, you may copy the entire contents of the post_requirements_manual.md file into your terminal.
To run the web scraper and obtain API responses, execute the following command:
```bash
python webcrawl.py
```
This command will initiate the scraping process, and you should see a list of retrieved data items presented as a custom API JSON object.
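The exact fields depend on what the scraper extracts; a hypothetical response might look like the following (the field names are illustrative, not the project's actual schema):

```json
{
  "jobs": [
    {"title": "CDL-A Driver", "city": "Atlanta", "state": "GA"},
    {"title": "Local Delivery Driver", "city": "Marietta", "state": "GA"}
  ]
}
```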
- Install the required dependencies using the installation instructions provided above.
- Run the web scraper using the command `python webcrawl.py`.
- Access the custom API generated by the scraper to retrieve data (see the usage sketch after this list).
- Use the obtained data in your applications as needed.
- After installing any additional packages, make sure to update the `requirements.txt` file with the command `pip freeze > requirements.txt`.
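For instance, once the Flask app is running locally, a consumer could fetch data like this (the endpoint path, port, and parameter names are assumptions; adjust them to the routes actually defined in the code):

```python
import requests

# Hypothetical endpoint; check the project's routes for the real path.
response = requests.get(
    "http://127.0.0.1:5000/api/jobs",
    params={"city": "Atlanta", "state": "GA", "title": "driver"},
    timeout=10,
)
print(response.json())
```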
- The database is hosted at ElephantSQL.
- For reference: to upload the local database to the cloud database, run this command locally:
  `pg_dump -O driver_jobs_db | psql postgres://ruvsrxcf:[email protected]/ruvsrxcf`
- To view the database:
  `psql postgres://ruvsrxcf:[email protected]/ruvsrxcf`
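As a sketch, application code could read that connection string from an environment variable instead of hard-coding credentials (this assumes the psycopg2 driver and a `jobs` table, neither of which is confirmed by the project; it may use a different driver or an ORM):

```python
import os
import psycopg2

# ELEPHANTSQL_URL is an assumed variable name; set it to the full
# postgres:// connection string shown above (with the real password).
conn = psycopg2.connect(os.environ["ELEPHANTSQL_URL"])
with conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM jobs;")  # "jobs" table is illustrative
    print(cur.fetchone()[0])
conn.close()
```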
- The project's GitHub repository contains all the code and documentation.
- For details on the specific API endpoints and data structure, refer to the API documentation in the code.
This project is a demonstration of web scraping techniques and creating custom APIs. It can be extended and modified to suit various data extraction needs.