Skip to content

Latest commit

 

History

History
189 lines (143 loc) · 6.48 KB

File metadata and controls

189 lines (143 loc) · 6.48 KB

Lecture 7 (Basics of SQL)

Resources

You can find examples and motivation in the resources.

Summary

In this lecture we will go through the basics of SQL. This includes the basic syntax for making queries to the database. This includes both retrieving and storing data in the database.

SQL

What is SQL? Structured Query Language (SQL) is a computer language that enables us to communicate and manipulate SQL databases(db) and it is one of the most widely used languages for db management.

There are many flavours (versions) of SQL. However, they all share the basic syntax as below and often only differ in the way the basic syntax is extended.

SQL is a Relational Database Management System (RDBMS), meaning that the data is stored in tables that are seperated based on the scope of the database. A nice illustration of how data is stored can be seen in the figure below.

drawing

Image borrowed from:https://r4ds.hadley.nz/

Here, the tables are seperated to reflect differring objects. We also see that the tables are linked or related to each other through specific variables (keys). This is where the name relational database comes from.

Basic Syntax

Writing an SQL query (a question or a requests) is similiar to speaking with the database. Once we get a grasp of the basic syntax we will be able to query the database for data in whatever form that is.

We use SELECT to determine the columns we want and FROM to specify the table we want it from. Following the image above, we can make the following query.

SELECT carrier, dest, tailnum
FROM flights;

Make sure to end the query with ;

We can use SELELCT * to get all columns in the table. The query above is the simplest query one can make in SQL.

We can add a layer of complexity by filtering the table. We can do this by using the WHERE command. Continuing the example, we can make the following query.

SELECT carrier, dest, tailnum
FROM flights
WHERE dest='sweden';

Furthermore, we can use the logical operators AND, OR, NOT to perform logical chaining of conditions. The basic syntax is the following.

SELECT variable1, variable2, ... -- * if you want all variables
FROM table
WHERE condition1 AND/OR/NOT condition2 AND/OR/NOT condition3 ...;

To aggregate multiple tables into a single table we can use ... JOIN. The syntax is similar to the previous lecture. ON specifies the key(s) to join on. Continuing the example above the syntax is the following.

SELECT carrier, dest, tailnum, airlines.names
FROM flights
LEFT JOIN airlines ON flights.carrier = airlines.carrier;

Using the syntax described above, we can chain multiple operations to generate the desired table. You can see more operations on W3.

Since the SQL syntax has been adopted by both pandas and tidyverse, it will not be difficult to learn the content above.

Storing Data

To start storing data, we need to create a table to store it in. We can do this using the CREATE command. The query is as follows.

CREATE TABLE table_name (
  ...
);

To our table, we need to add columns (variable), we can do this by specifying the name of the column and datatype. We can also specify constraints on the variable. The query is as follows.

CREATE TABLE table_name (
  Variable1 datatype constraint,
  Variable2 datatype constraint,
  ...
);

Common constrains are NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY. There are more constraints that can be added. They can be found on W3.

The unique identifier (id) of a row should be constrained with UNIQUE and it must also be filled (NOT NULL). The PRIMARY KEY incorporates both constraints. Therefore, it is good practice to specify the row id using the PRIMARY KEY constraint.

An example of creating a table can look something like the following query.

CREATE TABLE Persons (
  name VARCHAR(255) NOT NULL,
  age INT,  -- It does not need to be specified
  id_number VARCHAR PRIMARY KEY
);

Specific datatypes can be looked up on W3.

To change a table that already exists we can use the ALTER command. Specific ways to alter tables can be found on W3.

Putting it all together, the SQL query looks as follows.

CREATE TABLE Persons (
  Name VARCHAR(255) NOT NULL,
  Age INT,  -- It does not need to be specified
  ID_number VARCHAR(255) PRIMARY KEY
);

CREATE TABLE Computers (
  Computer_id INT PRIMARY KEY,
  Model VARCHAR(255),
  Brand VARCHAR(255),
  Owner VARCHAR(255) FOREIGN KEY REFERENCES Persons(ID_number) 
)

Here, we create two tables and use FOREIGN KEY to link the owner of the computer to a specific person using the id-number. If we later realise that we want a person to have a variable specifying occupation, we can use the following command.

ALTER TABLE Persons
ADD Occupation VARCHAR(255);

With a table created, we can add entries to the table. This can be done using the INSERT command. The full syntax looks as follows.

INSERT INTO table_name (variable1, variable2, ...)
VALUES (value1, value2, ...)

Note that the order of the values should follow the order of the variables. Furthermore, if the constraints are violated we will get an error.

In the example above the query may look like

INSERT INTO Persons (Name, ID_number)
VALUES ('Taariq', '19000101-xxxx');

Since, the Age variable does not need to be filled in, the query will populate the Persons table and have an empty cell (containing NULL) for the Age column.

We are only scratching the surface of SQL, there is a lot more to learn. However, with this basic knowledge we will be able to interact with databases. You can find more information in the listed resources (W3).

There are various libraries in Python and R to interact with SQL databases. We will primarily be working with SQLite databases since they are more accessible. Popular libraries are sqlite3 for Python and dbplyr for R.