We set out to mimic the database of a large e-commerce site such as Amazon. We identified how such a database could be broken down into two parts, operations and insights, and chose to focus on insights because it leaves more room to interpret results. The database therefore gives both customers and producers insights into products. Our final product is the set of query outputs. This is an academic project for ST207.
The entire project was written in SQLite using DB Browser for SQLite. We connected to the database from our ST207Database.ipynb notebook. The dataset was cleaned before being inserted into the database. You can find the original dataset here: https://app.datastock.shop/?site_name=Amazon.com_Product_Reviews (you will need an account to download it). We also added data from here: https://www.kaggle.com/promptcloud/amazon-product-reviews-dataset
- Run the code in ST207Database.ipynb on Google Colab
- Before running, import amazon.csv into the environment
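
As a rough illustration of the step above, the sketch below shows one way a CSV file such as amazon.csv could be copied into a SQLite table from a notebook. It is not the code in ST207Database.ipynb: the function name `load_csv_into_sqlite`, the table name `raw_reviews`, and the database file name in the usage comment are all assumptions made for this example, and every column is stored as TEXT for simplicity.

```python
import csv
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so spaces and symbols are accepted by SQLite."""
    return '"{}"'.format(name.replace('"', '""'))


def load_csv_into_sqlite(csv_path, conn, table_name="raw_reviews"):
    """Copy a CSV file into a new SQLite table and return the number of rows stored.

    Hypothetical helper: the real notebook may load the data differently.
    """
    table = quote_identifier(table_name)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # the first row holds the column names
        columns = ", ".join(quote_identifier(col) + " TEXT" for col in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute("CREATE TABLE {} ({})".format(table, columns))
        # Rows whose length does not match the header are skipped.
        conn.executemany(
            "INSERT INTO {} VALUES ({})".format(table, placeholders),
            (row for row in reader if len(row) == len(header)),
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM {}".format(table)).fetchone()[0]


# Example usage in Colab, after uploading amazon.csv (file names are assumptions):
#   conn = sqlite3.connect("st207.db")
#   print(load_csv_into_sqlite("amazon.csv", conn))
```

Loading everything as TEXT keeps the sketch short. A real schema would declare proper column types and keys once the data has been cleaned.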
- Deletion rules: Realistically, the database should have allowed deletions in case we accidentally entered a wrong record. Also, for ethical reasons, customers may not want their data stored indefinitely for others to use
- Extra restrictions: For instance, in the Products table, a constraint to keep Categories and Subcategories consistent would have been useful
- Dropping information: For the database to work, we had to drop a lot of data with null values. In reality, data often has these kinds of gaps, and an improvement would be to find another way to work with the empty values
- Review interpretation: We could have scanned the review text to produce further analyses
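
The first two points could be addressed in SQLite roughly as sketched below. This is only one possible interpretation, not the project's actual schema: the table and column names (`Categories`, `Products`, `Customers`, `Reviews` and their fields) and the sample rows are made up for illustration. Deleting a customer cascades to their reviews, and a product can only use a category and subcategory pair that exists in a lookup table.

```python
import sqlite3

# Hypothetical schema: names and columns are assumptions, not the project's tables.
SCHEMA = """
CREATE TABLE Categories (
    category    TEXT NOT NULL,
    subcategory TEXT NOT NULL,
    PRIMARY KEY (category, subcategory)
);
CREATE TABLE Products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL,
    subcategory TEXT NOT NULL,
    -- Only category/subcategory pairs listed in Categories are accepted.
    FOREIGN KEY (category, subcategory)
        REFERENCES Categories (category, subcategory)
);
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE Reviews (
    review_id   INTEGER PRIMARY KEY,
    -- Deleting a customer also deletes that customer's reviews.
    customer_id INTEGER NOT NULL
        REFERENCES Customers (customer_id) ON DELETE CASCADE,
    product_id  INTEGER NOT NULL REFERENCES Products (product_id),
    rating      INTEGER CHECK (rating BETWEEN 1 AND 5)
);
"""


def build_demo_db():
    """Create an in-memory database with the sketch schema and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Categories VALUES ('Toys', 'Puzzles')")
    conn.execute("INSERT INTO Products VALUES (1, 'Jigsaw', 'Toys', 'Puzzles')")
    conn.execute("INSERT INTO Customers VALUES (1, 'Example Customer')")
    conn.execute("INSERT INTO Reviews VALUES (1, 1, 1, 5)")
    conn.commit()
    return conn
```

With this schema, `DELETE FROM Customers WHERE customer_id = 1` also removes that customer's review, while inserting a product with an unlisted category pair raises `sqlite3.IntegrityError`.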
With our initial goal in mind, our database did a good job of serving customers and producers with insights, as the query outputs show. However, we saw several things that would become problems in a real-life setting.
- Chris Twomey: https://github.com/christwm201914523
- Rachel Soh: https://github.com/RS201918703
- Rafay Butt: https://github.com/raf201920011