PSZ Project

This project implements a simple web scraper for second-hand car listings, processes and analyzes the scraped data, and finally implements KNN and L2-regularized linear regression (ridge regression).

The project description can be found in the PDF `PSZ_Projekat_Jun_Jul_2022_v1.0.pdf`.

Prerequisites

You will need Python 3.10, Node.js (with npm), and Angular (the frontend is started with `ng serve`, which comes from the Angular CLI). For the database, you will need PostgreSQL installed, with a database named `polovniautomobili`.

Project-specific Python dependencies are listed in `requirements.txt`.
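The prerequisites above can be checked with a short script before running anything. This is a minimal sketch, not part of the project: the function names are hypothetical, the executable names (`node`, `npm`, `ng`, `psql`) are the usual ones for these tools and are an assumption about your setup, and treating Python versions newer than 3.10 as acceptable is also an assumption. It only checks that the tools are on your `PATH`; it does not verify that the `polovniautomobili` database exists.

```python
import shutil
import sys

# Assumed executable names for Node.js, npm, the Angular CLI and the Postgres client.
REQUIRED_TOOLS = ("node", "npm", "ng", "psql")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter is at least `minimum` (assumed: 3.10 or newer)."""
    return tuple(version_info[:2]) >= minimum
```

For example, `print(python_ok(), missing_tools())` reports whether the interpreter is recent enough and lists any tool that was not found.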

Usage

This project is implemented as a collection of smaller "projects", which should be run in the order given in this README.

The initial raw data can be found in `polovni_automobili_database_.csv`. You can regenerate (and overwrite) this file by running `main.py` in ./scrapy/.

NOTE:

Please note that the Scrapy project does not implement rotating proxies! This is because of the project's execution-time requirement: the proxies I had the opportunity to use were unacceptably slow. Be mindful when you run the scraper!
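If you rerun the scraper without the execution-time constraint, one way to be mindful of the target site is to throttle requests. The fragment below is a sketch of values you could add to the Scrapy project's `settings.py`; the setting names are standard Scrapy settings, but the values are illustrative assumptions and are not taken from this project's configuration.

```python
# Hypothetical additions to the Scrapy project's settings.py.
# The values are illustrative, not the ones used by this project.

ROBOTSTXT_OBEY = True                # respect the site's robots.txt
DOWNLOAD_DELAY = 1.0                 # seconds to wait between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 2   # limit parallel requests per domain

# Let Scrapy adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
```

Throttling like this makes a full scrape considerably slower, which is the trade-off the note above describes.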

I highly suggest you run the project from PyCharm!

1. [Optional] Run the Scrapy project from within `./scraper` by running `main.py`.
2. Run `trim_database.py` from `./scraper/data_formatting/`. This generates the `polovni_automobili_database_removed_newlines.csv` file.
3. Run `formatter.py` from `./scraper/data_formatting/`. This generates the `polovni_automobili_database_formatted_data.csv` file.
4. Load the dump file in `./scraper/db_migrating/pg_dump` into your database.
5. [Optional] If you already have data in your DB, run `db_truncate.py` from `./scraper/db_migrating/`.
6. Run `db_migrating.py` from `./scraper/db_migrating/`.
7. Run `generate_document.py` from `./scraper/db_task_scripts/`.
   1. Optionally open `rezultati.docx` after the execution finishes.
8. Run `generate_graphs.py` from `./scraper/data_visualization/`.
   1. Optionally check `./scraper/data_visualization/graphs`.
   2. Optionally check `./scraper/data_visualization/scripts`.
   3. Optionally visit `./scraper/data_visualization/web_view`:
      1. Start the server `flask_main.py`.
      2. Open `main.html`.
9. [Caution! Takes a long time to run] Run `exploratory_testing.py` in `./scraper/ml/backend`.
   1. The results can be found in `./scraper/ml/backend/exploratory_testing_results/`.
10. Run `prepare_lin_regression_for_app.py` in `./scraper/ml/backend`.
11. Start the server `server.py` in `./scraper/ml/backend`.
12. Navigate to `./scraper/ml/frontend` and start the Angular application with `ng serve --open`.
    1. Note that you will most likely need to run `npm install` first.
13. In the browser, select data and execute.
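The script-running steps above can be sketched as a small driver that runs them in order. This is not part of the project: `STEPS`, `select_steps` and `run_pipeline` are hypothetical names, and the sketch assumes each script can be started with `python <script>` from its own directory, which the README does not confirm (it recommends running from PyCharm). The manual steps (loading the `pg_dump` file, starting `server.py` and `flask_main.py`, running `ng serve`) are deliberately left out.

```python
import subprocess
import sys
from pathlib import Path

# (directory, script, tag) in the order given by the README.
# "optional" and "slow" mirror the [Optional] and [Caution] markers in the step list.
STEPS = [
    ("scraper", "main.py", "optional"),
    ("scraper/data_formatting", "trim_database.py", "required"),
    ("scraper/data_formatting", "formatter.py", "required"),
    ("scraper/db_migrating", "db_truncate.py", "optional"),
    ("scraper/db_migrating", "db_migrating.py", "required"),
    ("scraper/db_task_scripts", "generate_document.py", "required"),
    ("scraper/data_visualization", "generate_graphs.py", "required"),
    ("scraper/ml/backend", "exploratory_testing.py", "slow"),
    ("scraper/ml/backend", "prepare_lin_regression_for_app.py", "required"),
]


def select_steps(skip=("optional",)):
    """Return (directory, script) pairs, leaving out steps whose tag is in `skip`."""
    return [(cwd, script) for cwd, script, tag in STEPS if tag not in skip]


def run_pipeline(repo_root, skip=("optional",), runner=subprocess.run):
    """Run each selected script from its own directory and stop on the first failure."""
    for cwd, script in select_steps(skip):
        print(f"Running {script} in {cwd}")
        # Assumption: the scripts resolve their input files relative to their own directory.
        runner([sys.executable, script], cwd=Path(repo_root) / cwd, check=True)
```

Calling `run_pipeline(".")` from the repository root would run the non-optional scripts, including the long-running `exploratory_testing.py`. Whether later steps depend on its results is not stated in the README, so it is not skipped by default; pass `skip=("optional", "slow")` to leave it out at your own risk.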
