Skip to content

Latest commit

 

History

History
756 lines (524 loc) · 19.5 KB

2023.02.04,SpaceX_Presentation,SQL.md

File metadata and controls

756 lines (524 loc) · 19.5 KB

Skills Network Logo

Assignment: SQL Notebook for Peer Assignment

Estimated time needed: 60 minutes.

Introduction

Using this Python notebook you will:

  1. Understand the Spacex DataSet
  2. Load the dataset into the corresponding table in a Db2 database
  3. Execute SQL queries to answer assignment questions

Overview of the DataSet

SpaceX has gained worldwide attention for a series of historic milestones.

It is the only private company ever to return a spacecraft from low-earth orbit, which it first accomplished in December 2010. SpaceX advertises Falcon 9 rocket launches on its website with a cost of 62 million dollars wheras other providers cost upward of 165 million dollars each, much of the savings is because Space X can reuse the first stage.

Therefore if we can determine if the first stage will land, we can determine the cost of a launch.

This information can be used if an alternate company wants to bid against SpaceX for a rocket launch.

This dataset includes a record for each payload carried during a SpaceX mission into outer space.

Download the datasets

This assignment requires you to load the spacex dataset.

In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. Click on the link below to download and save the dataset (.CSV file):

Spacex DataSet

Store the dataset in database table

it is highly recommended to manually load the table using the database console LOAD tool in DB2.

Now open the Db2 console, open the LOAD tool, Select / Drag the .CSV file for the dataset, Next create a New Table, and then follow the steps on-screen instructions to load the data. Name the new table as follows:

SPACEXDATASET

Follow these steps while using old DB2 UI which is having Open Console Screen

Note:While loading Spacex dataset, ensure that detect datatypes is disabled. Later click on the pencil icon(edit option).

  1. Change the Date Format by manually typing DD-MM-YYYY and timestamp format as DD-MM-YYYY HH:MM:SS

  2. Change the PAYLOAD_MASS__KG_ datatype to INTEGER.

Changes to be considered when having DB2 instance with the new UI having Go to UI screen

  • Refer to this insruction in this link for viewing the new Go to UI screen.

  • Later click on Data link(below SQL) in the Go to UI screen and click on Load Data tab.

  • Later browse for the downloaded spacex file.

  • Once done select the schema andload the file.

!pip install sqlalchemy==1.3.9
Collecting sqlalchemy==1.3.9
  Downloading SQLAlchemy-1.3.9.tar.gz (6.0 MB)
�[2K     �[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━�[0m �[32m6.0/6.0 MB�[0m �[31m83.6 MB/s�[0m eta �[36m0:00:00�[0m:00:01�[0m00:01�[0m
�[?25h  Preparing metadata (setup.py) ... �[?25ldone
�[?25hBuilding wheels for collected packages: sqlalchemy
  Building wheel for sqlalchemy (setup.py) ... �[?25ldone
�[?25h  Created wheel for sqlalchemy: filename=SQLAlchemy-1.3.9-cp37-cp37m-linux_x86_64.whl size=1159122 sha256=4ca84e6075c8a4fc6b1dc82bbfa731f748db43aeefb704e9f6bade028052d2f8
  Stored in directory: /home/jupyterlab/.cache/pip/wheels/ef/95/ac/c232f83b415900c26553c64266e1a2b2863bc63e7a5d606c7e
Successfully built sqlalchemy
Installing collected packages: sqlalchemy
  Attempting uninstall: sqlalchemy
    Found existing installation: SQLAlchemy 1.3.24
    Uninstalling SQLAlchemy-1.3.24:
      Successfully uninstalled SQLAlchemy-1.3.24
Successfully installed sqlalchemy-1.3.9

Connect to the database

Let us first load the SQL extension and establish a connection with the database

%load_ext sql
import csv, sqlite3

con = sqlite3.connect("my_data1.db")
cur = con.cursor()
!pip install -q pandas==1.1.5
%sql sqlite:///my_data1.db
'Connected: @my_data1.db'
import pandas as pd
df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/data/Spacex.csv")
df.to_sql("SPACEXTBL", con, if_exists='replace', index=False,method="multi")
/home/jupyterlab/conda/envs/python/lib/python3.7/site-packages/pandas/core/generic.py:2882: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
  both result in 0.1234 being formatted as 0.12.

Tasks

Now write and execute SQL queries to solve the assignment tasks.

Note: If the column names are in mixed case enclose it in double quotes For Example "Landing_Outcome"

Task 1

Display the names of the unique launch sites in the space mission
%%sql
Select DISTINCT(launch_site) FROM SPACEXTBL
 * sqlite:///my_data1.db
Done.
Launch_Site
CCAFS LC-40
VAFB SLC-4E
KSC LC-39A
CCAFS SLC-40

Task 2

Display 5 records where launch sites begin with the string 'CCA'
%%sql
Select * FROM SPACEXTBL
WHERE launch_site LIKE '%CCA%'
LIMIT 5;
 * sqlite:///my_data1.db
Done.
Date Time (UTC) Booster_Version Launch_Site Payload PAYLOAD_MASS__KG_ Orbit Customer Mission_Outcome Landing _Outcome
04-06-2010 18:45:00 F9 v1.0 B0003 CCAFS LC-40 Dragon Spacecraft Qualification Unit 0 LEO SpaceX Success Failure (parachute)
08-12-2010 15:43:00 F9 v1.0 B0004 CCAFS LC-40 Dragon demo flight C1, two CubeSats, barrel of Brouere cheese 0 LEO (ISS) NASA (COTS) NRO Success Failure (parachute)
22-05-2012 07:44:00 F9 v1.0 B0005 CCAFS LC-40 Dragon demo flight C2 525 LEO (ISS) NASA (COTS) Success No attempt
08-10-2012 00:35:00 F9 v1.0 B0006 CCAFS LC-40 SpaceX CRS-1 500 LEO (ISS) NASA (CRS) Success No attempt
01-03-2013 15:10:00 F9 v1.0 B0007 CCAFS LC-40 SpaceX CRS-2 677 LEO (ISS) NASA (CRS) Success No attempt

Task 3

Display the total payload mass carried by boosters launched by NASA (CRS)
%%sql
SELECT SUM(PAYLOAD_MASS__KG_) FROM SPACEXTBL
WHERE Customer LIKE "%(CRS)%"
 * sqlite:///my_data1.db
Done.
SUM(PAYLOAD_MASS__KG_)
48213

Task 4

Display average payload mass carried by booster version F9 v1.1
%%sql
SELECT AVG(PAYLOAD_MASS__KG_) FROM SPACEXTBL
WHERE Booster_Version LIKE '%F9 v1.1%';
 * sqlite:///my_data1.db
Done.
AVG(PAYLOAD_MASS__KG_)
2534.6666666666665

Task 5

List the date when the first succesful landing outcome in ground pad was acheived.

Hint:Use min function

%%sql
SELECT DATE, [Landing _Outcome] FROM SPACEXTBL
where (substr(DATE,7)||substr(DATE,4,2)||substr(DATE,1,2) 
      between '20100604' and '20171130') AND [Landing _Outcome]  LIKE "%Success (ground pad)%"
      LIMIT 1;
 * sqlite:///my_data1.db
Done.
Date Landing _Outcome
22-12-2015 Success (ground pad)

Task 6

List the names of the boosters which have success in drone ship and have payload mass greater than 4000 but less than 6000
%%sql
SELECT Booster_Version FROM SPACEXTBL
Where PAYLOAD_MASS__KG_ >4000 AND PAYLOAD_MASS__KG_ <6000
AND"Landing _Outcome" LIKE '%Success (drone ship)%';
 * sqlite:///my_data1.db
Done.
Booster_Version
F9 FT B1022
F9 FT B1026
F9 FT B1021.2
F9 FT B1031.2

Task 7

List the total number of successful and failure mission outcomes
%%sql
SELECT Mission_Outcome,COUNT(*) FROM SPACEXTBL
Group by Mission_Outcome;
 * sqlite:///my_data1.db
Done.
Mission_Outcome COUNT(*)
Failure (in flight) 1
Success 98
Success 1
Success (payload status unclear) 1

Task 8

List the names of the booster_versions which have carried the maximum payload mass. Use a subquery
%%sql
Select MAX(PAYLOAD_MASS__KG_) FROM SPACEXTBL
 * sqlite:///my_data1.db
Done.
MAX(PAYLOAD_MASS__KG_)
15600
%%sql
SELECT Distinct(Booster_version),(PAYLOAD_MASS__KG_) as "Payload" FROM SPACEXTBL
Where PAYLOAD_MASS__KG_=(Select MAX(PAYLOAD_MASS__KG_) FROM SPACEXTBL);
 * sqlite:///my_data1.db
Done.
Booster_Version Payload
F9 B5 B1048.4 15600
F9 B5 B1049.4 15600
F9 B5 B1051.3 15600
F9 B5 B1056.4 15600
F9 B5 B1048.5 15600
F9 B5 B1051.4 15600
F9 B5 B1049.5 15600
F9 B5 B1060.2 15600
F9 B5 B1058.3 15600
F9 B5 B1051.6 15600
F9 B5 B1060.3 15600
F9 B5 B1049.7 15600

Task 9

List the records which will display the month names, failure landing_outcomes in drone ship ,booster versions, launch_site for the months in year 2015.

Note: SQLLite does not support monthnames. So you need to use substr(Date, 4, 2) as month to get the months and substr(Date,7,4)='2015' for year.

%%sql

SELECT CASE cast(SUBSTR(DATE,4,2) as integer)
        WHEN "01" THEN "January" 
        WHEN "02" THEN "February"
        WHEN "03" THEN "March"
        WHEN "04" THEN "April"
        WHEN "05" THEN "May"
        WHEN "06" THEN "June"
        WHEN "07" THEN "July"
        WHEN "08" THEN "August"
        WHEN "09" THEN "September"
        WHEN "10" THEN "October"
        WHEN "11" THEN "November"
        WHEN "12" THEN "December"
    END as "month", 
    Date,Launch_site, Booster_Version,[Landing _Outcome]
    FROM SPACEXTBL
    WHERE [Landing _Outcome] = "Failure (drone ship)" AND substr(Date,7,4)='2015'
 * sqlite:///my_data1.db
Done.
month Date Launch_Site Booster_Version Landing _Outcome
January 10-01-2015 CCAFS LC-40 F9 v1.1 B1012 Failure (drone ship)
April 14-04-2015 CCAFS LC-40 F9 v1.1 B1015 Failure (drone ship)

Task 10

Rank the count of successful landing_outcomes between the date 04-06-2010 and 20-03-2017 in descending order.
%%sql
SELECT [Landing _Outcome],COUNT([Landing _Outcome]) "Missions", RANK() OVER( ORDER BY COUNT([Landing _Outcome]) DESC) Rank FROM SPACEXTBL 
where (substr(DATE,7)||substr(DATE,4,2)||substr(DATE,1,2) 
      between '20100604' and '20170320') AND [Landing _Outcome] LIKE "%Success%"
Group by [Landing _Outcome];
 * sqlite:///my_data1.db
Done.
Landing _Outcome Missions Rank
Success (drone ship) 5 1
Success (ground pad) 3 2

Reference Links

Author(s)

Lakshmi Holla

Other Contributors

Rav Ahuja

Change log

Date Version Changed by Change Description
2021-07-09 0.2 Lakshmi Holla Changes made in magic sql
2021-05-20 0.1 Lakshmi Holla Created Initial Version

© IBM Corporation 2021. All rights reserved.