This project implements a simple web scraper for second-hand cars, processes and analyzes the scraped data, and finally implements KNN and L2-regularized linear regression (ridge regression).
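For readers unfamiliar with the two models named above, here is a minimal pure-Python sketch of each. It is not the project's implementation: the function names `knn_predict` and `ridge_fit_1d` are hypothetical, KNN is shown as a regressor (the project may use it differently), and ridge regression is reduced to a single feature so the closed-form solution stays readable.

```python
def knn_predict(train, query, k=3):
    """Average the targets of the k training points closest to `query`.

    `train` is a list of (feature_vector, target) pairs.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda row: squared_distance(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def ridge_fit_1d(xs, ys, lam=1.0):
    """Fit y ~ w * x + b for one feature, minimizing
    sum((y - w*x - b)**2) + lam * w**2 (the intercept b is not penalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)  # lam > 0 shrinks the slope toward zero
    b = y_mean - w * x_mean
    return w, b
```

With `lam=0` the second function reduces to ordinary least squares; larger values of `lam` trade a little bias for a more stable slope.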
You will need Python 3.10 installed, as well as Node.js (with npm) and Angular with the Angular CLI (`ng`). Regarding the database requirement,
you will need Postgres installed, with a database named `polovniautomobili`.
Project-specific Python dependencies are listed in `requirements.txt`.
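A quick way to confirm these prerequisites before starting is a small check script. The sketch below is not part of the project; `check_environment` is a hypothetical helper, `psql` is assumed to be the name of the Postgres client on your PATH, and treating 3.10 as a minimum version (rather than an exact one) is also an assumption.

```python
import shutil
import sys

# Command-line tools the README relies on; "psql" is an assumed client name.
REQUIRED_TOOLS = ["node", "npm", "ng", "psql"]


def check_environment(min_python=(3, 10), tools=REQUIRED_TOOLS, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    found = sys.version_info[:2]
    if found < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]} or newer is required, "
            f"found {found[0]}.{found[1]}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

This only checks that the tools are reachable; it does not verify that the `polovniautomobili` database exists.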
The project is organized as a collection of smaller "projects"; run them in the order given in this README.
The initial raw data can be found in `polovni_automobili_database_.csv`. This file can be overwritten by running `main.py` in `./scrapy/`.
Please note that the Scrapy project does not implement rotating proxies! This is due to the project requirement regarding
execution time: the proxies I had the opportunity to use were unacceptably slow. Be mindful if you run the scraper!
I highly suggest running the project from PyCharm!
- [Optional] Run the Scrapy project from within `./scraper` by running `main.py`
- Run `trim_database.py` from `./scraper/data_formatting/`; this will generate the `polovni_automobili_database_removed_newlines.csv` file
- Run `formatter.py` from `./scraper/data_formatting/`; this will generate the `polovni_automobili_database_formatted_data.csv` file
- Load the dump file in `./scraper/db_migrating/pg_dump` to your database
- [Optional] If you have data in your DB, run `db_truncate.py` from `./scraper/db_migrating/`
- Run `db_migrating.py` from `./scraper/db_migrating/`
- Run `generate_document.py` from `./scraper/db_task_scripts/`
  - Optionally open `rezultati.docx` after the execution finishes
- Run `generate_graphs.py` from `./scraper/data_visualization/`
  - Optionally check `./scraper/data_visualization/graphs`
  - Optionally check `./scraper/data_visualization/scripts`
  - Optionally visit `./scraper/data_visualization/web_view`
    - Start the server `flask_main.py`
    - Open `main.html`
- [Caution! Runs for a long time] Run `exploratory_testing.py` in `./scraper/ml/backend`
  - The results can be found in `./scraper/ml/backend/exploratory_testing_results/`
- Run `prepare_lin_regression_for_app.py` in `./scraper/ml/backend`
- Start the server `server.py` in `./scraper/ml/backend`
- Navigate to `./scraper/ml/frontend` and start the Angular application with `ng serve --open`
  - Note that you will most likely need to run the `npm install` command first!
- In the browser, select data and execute
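
The required script steps above can also be chained from a single runner. The following is a minimal sketch under stated assumptions, not a script shipped with the project: `REQUIRED_STEPS`, `build_commands`, and `run_pipeline` are hypothetical names, and it assumes each script expects to be launched from its own directory. The optional steps, the manual database dump load, the two servers, and the Angular frontend are deliberately left out.

```python
import subprocess
import sys
from pathlib import Path

# (working directory, script) pairs copied from the required steps above.
# Reminder: the pg_dump file must be loaded manually before db_migrating.py.
REQUIRED_STEPS = [
    ("./scraper/data_formatting", "trim_database.py"),
    ("./scraper/data_formatting", "formatter.py"),
    ("./scraper/db_migrating", "db_migrating.py"),
    ("./scraper/db_task_scripts", "generate_document.py"),
    ("./scraper/data_visualization", "generate_graphs.py"),
    ("./scraper/ml/backend", "prepare_lin_regression_for_app.py"),
]


def build_commands(root=".", python=sys.executable, steps=REQUIRED_STEPS):
    """Return (cwd, argv) pairs for every step, in README order."""
    return [(Path(root) / cwd, [python, script]) for cwd, script in steps]


def run_pipeline(root=".", dry_run=True):
    """Print each step; also execute it unless dry_run is set.

    With check=True, the first failing script raises and stops the run.
    """
    for cwd, argv in build_commands(root):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

By default the runner only prints what it would execute; call `run_pipeline(dry_run=False)` from the repository root once the database is prepared.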