Notes: This project scrapes SKUs (products) from the public e-commerce website Carrefour. Its main goal is a schedulable pipeline, which can be deployed on a cloud-based system, that periodically scrapes data from the website.
Ecommerce_brand table: This table contains unique information about the brand to which each SKU belongs. In each upcoming scrape, only new brands are added to this table.
- Dimensions breakdown:
  - id: The unique id for each brand
  - Brand: The actual name of that brand displayed on the E-commerce platform
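
The "only new brands are added" behaviour can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3`, not the project's actual implementation: the database engine, the column types, the `UNIQUE` constraint, and the helper names `create_brand_table` and `add_new_brands` are all assumptions.

```python
import sqlite3


def create_brand_table(conn):
    # Assumed schema: the source only names the columns, not their types.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ecommerce_brand (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            Brand TEXT NOT NULL UNIQUE
        )
        """
    )


def add_new_brands(conn, brands):
    """Insert only brands that are not stored yet; return how many were added."""
    before = conn.total_changes
    # INSERT OR IGNORE skips rows that would violate the UNIQUE constraint,
    # so brands seen in an earlier scrape are left untouched.
    conn.executemany(
        "INSERT OR IGNORE INTO ecommerce_brand (Brand) VALUES (?)",
        [(brand,) for brand in brands],
    )
    conn.commit()
    return conn.total_changes - before
```

Calling `add_new_brands` twice with overlapping brand lists adds each brand once, which matches the insert-only rule described for this table.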
Ecommerce_sku table: This is the main table, containing unique information about the SKUs being scraped. In each upcoming scrape, only new SKUs are added to this table.
- Dimensions breakdown:
  - id: The unique id for each SKU
  - fk_brand_id: The foreign key (brand id) used to get the brand name from the Ecommerce_brand table
  - SKU_ID: The generated marketplace code for each product
  - KeyWord: The keyword the user supplied to launch the scrape in the executed job
  - Product: The actual name of the SKU's product displayed on the E-commerce platform
  - Url: The product's URL, which users can access directly
  - img_url: The product's image URL, which users can access directly
  - Base_size: The quantified base size of each product, if obtainable from the marketplace
  - source: The data source from which the product was created (scraping, in this project)
  - Created: The date and time the product was inserted into this table
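
The relationship between the SKU table and the brand table can be sketched as below. This is an illustrative example using `sqlite3`, under several assumptions: the column types, the `UNIQUE` constraint on `SKU_ID`, the `"scrape"` value for `source`, and the helper names `create_tables`, `add_new_sku`, and `brand_name_for` are all hypothetical.

```python
import sqlite3


def create_tables(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS ecommerce_brand (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            Brand TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS ecommerce_sku (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            fk_brand_id INTEGER REFERENCES ecommerce_brand(id),
            SKU_ID      TEXT NOT NULL UNIQUE,
            KeyWord     TEXT,
            Product     TEXT,
            Url         TEXT,
            img_url     TEXT,
            Base_size   TEXT,
            source      TEXT,
            Created     TEXT
        );
        """
    )


def add_new_sku(conn, brand, sku_id, product, created, keyword=None):
    """Insert the SKU only if its SKU_ID is new; return True when a row was added."""
    # Make sure the brand exists, then look up its id for the foreign key.
    conn.execute("INSERT OR IGNORE INTO ecommerce_brand (Brand) VALUES (?)", (brand,))
    brand_id = conn.execute(
        "SELECT id FROM ecommerce_brand WHERE Brand = ?", (brand,)
    ).fetchone()[0]
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO ecommerce_sku "
        "(fk_brand_id, SKU_ID, KeyWord, Product, source, Created) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (brand_id, sku_id, keyword, product, "scrape", created),
    )
    conn.commit()
    return conn.total_changes > before


def brand_name_for(conn, sku_id):
    """Resolve a SKU's brand name through fk_brand_id."""
    row = conn.execute(
        "SELECT b.Brand FROM ecommerce_sku AS s "
        "JOIN ecommerce_brand AS b ON b.id = s.fk_brand_id "
        "WHERE s.SKU_ID = ?",
        (sku_id,),
    ).fetchone()
    return row[0] if row else None
```

The join in `brand_name_for` shows why the SKU table stores only `fk_brand_id`: the brand name lives in one place and is looked up when needed.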
Ecommerce_sku_prices table: This table contains price information for the SKUs being scraped. In each upcoming scrape, rows are updated for existing SKUs and inserted for new SKUs.
- Dimensions breakdown:
  - id: The unique id for each row
  - SKU_ID: The generated marketplace code for each product
  - Price: The actual price of the product displayed on the E-commerce platform
  - created: The date and time the product was inserted into this table
  - updated: The date and time the product was last updated in this table
  - fk_sku_id: The foreign key used to link with the id primary key in the ecommerce_sku table
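
The "update existing, insert new" rule for prices can be sketched as an update followed by a fallback insert. This is a minimal example using `sqlite3`. The column types and the helper names `create_prices_table` and `upsert_price` are assumptions, and the foreign-key constraint is left out so the example stays self-contained.

```python
import sqlite3


def create_prices_table(conn):
    # Assumed types; fk_sku_id is a plain integer here (no FK constraint)
    # because the ecommerce_sku table is not created in this sketch.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ecommerce_sku_prices (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            SKU_ID    TEXT NOT NULL UNIQUE,
            Price     REAL,
            created   TEXT,
            updated   TEXT,
            fk_sku_id INTEGER
        )
        """
    )


def upsert_price(conn, fk_sku_id, sku_id, price, now):
    """Update the price row of a known SKU, or insert a row for a new one."""
    cur = conn.execute(
        "UPDATE ecommerce_sku_prices SET Price = ?, updated = ? WHERE SKU_ID = ?",
        (price, now, sku_id),
    )
    if cur.rowcount == 0:
        # No existing row was touched, so this SKU has no price yet.
        conn.execute(
            "INSERT INTO ecommerce_sku_prices "
            "(SKU_ID, Price, created, updated, fk_sku_id) VALUES (?, ?, ?, ?, ?)",
            (sku_id, price, now, now, fk_sku_id),
        )
        action = "inserted"
    else:
        action = "updated"
    conn.commit()
    return action
```

With this approach `created` keeps the time of the first scrape while `updated` moves forward on every later scrape. The same pattern would apply to any table that follows the update-or-insert rule.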
Ecommerce_sku_ratings table: This table contains rating information for the SKUs being scraped. In each upcoming scrape, rows are updated for existing SKUs and inserted for new SKUs.
- Dimensions breakdown:
  - id: The unique id for each row
  - SKU_ID: The generated marketplace code for each product
  - Ratings: The actual ratings of the product displayed on the E-commerce platform
  - Stars: The actual stars of the product displayed on the E-commerce platform
  - created: The date and time the product was inserted into this table
  - updated: The date and time the product was last updated in this table
  - fk_sku_id: The foreign key used to link with the id primary key in the ecommerce_sku table
Ecommerce_sku_stamp table: This table contains information on all the SKUs' prices being scraped. Every SKU is appended in each scrape, regardless of whether it already exists. Its main purpose is reference and tracking.
- Dimensions breakdown:
  - id: The unique id for each row
  - SKU_ID: The generated marketplace code for each product
  - fk_sku_id: The foreign key used to link with the id primary key in the ecommerce_sku table
  - Ratings: The actual ratings of the product displayed on the E-commerce platform
  - Stars: The actual stars of the product displayed on the E-commerce platform
  - created: The date and time the product was inserted into this table
  - updated: The date and time the product was last updated in this table
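
The append-only behaviour of the stamp table can be sketched as below. This is an illustrative example using `sqlite3`. The column types (for example `Ratings` as an integer count and `Stars` as a decimal) and the helper names `create_stamp_table` and `append_stamps` are assumptions, not taken from the project.

```python
import sqlite3


def create_stamp_table(conn):
    # No UNIQUE constraint on SKU_ID: the same SKU may appear once per scrape.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ecommerce_sku_stamp (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            SKU_ID    TEXT NOT NULL,
            fk_sku_id INTEGER,
            Ratings   INTEGER,
            Stars     REAL,
            created   TEXT,
            updated   TEXT
        )
        """
    )


def append_stamps(conn, rows, now):
    """Append one row per scraped SKU, whether or not it was seen before."""
    conn.executemany(
        "INSERT INTO ecommerce_sku_stamp "
        "(SKU_ID, fk_sku_id, Ratings, Stars, created, updated) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (r["SKU_ID"], r["fk_sku_id"], r["Ratings"], r["Stars"], now, now)
            for r in rows
        ],
    )
    conn.commit()
```

Because every scrape appends, the history of a single SKU can be read back by filtering on `SKU_ID` and ordering by `created`.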

