PySpark_Project2_Lego

INTRODUCTION


An inventory of Lego-related goods was analyzed using Apache Spark.
Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python and R.
It also supports the pandas API on Spark for pandas workloads.
The analysis set out to determine the quantities of items, parts, sets and themes, their properties (for example colour and transparency), and the stock of these parts in the inventory.

ANALYSIS


The analysis was performed with PySpark, the Python API for Apache Spark.
Advanced functions for ETL/ELT transformations were used to create this project.
Before the analysis, the data was cleaned using PySpark functions and methods.
All steps are described in detail in PySpark_Project2_Lego.ipynb.

MAIN CONCLUSIONS

  1. Minifigure Series 1 [Random Bag] has the greatest quantity of sets (60).
  2. The Town theme that belongs to the Dacta Buildings set has the greatest quantity in the inventory (22).
  3. The Fence 1 x 4 x 1 part has the greatest quantity in the inventory (100).
  4. The Minifigs part category has the greatest quantity in the inventory (24).
  5. The NHL Action Set with Stickers set has the greatest quantity in the inventory (12).
  6. Black is the most common part colour (63).
  7. The inventory holds stock for only 15 parts.
  8. The inventory has only 20 transparent parts.
  9. The oldest set is Bungalow (54 years old).
  10. The part with the greatest quantity is Technic Pin with Friction Ridges Lengthwise and Center Slots (black) (18056). It belongs to the Technic Pins part category.
  11. There are 2180 themes that have no parts.
  12. The Dacta Buildings, Lego Road Safety Kit Poster, Set K1062 Activity Booklet and {Town Vehicles} sets have the greatest quantity of parts (136).


An in-depth analysis with detailed information is included in PySpark_Project2_Lego.ipynb.

The Spark and PySpark sites were used to create this project.