# -*- coding: utf-8 -*-
"""Data Loading.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1A2Uw-sUxi6iu-OmeodPQaW5vRTfpMQwS
#**1 DATA LOADING**
##**1.1 LOADING LIBRARIES**
"""
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
# Standard scaler
from sklearn.preprocessing import StandardScaler
# Libraries for hierarchical clustering
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
# Libraries for dimensionality reduction
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
"""##**1.2 LOADING DATASET**"""
# Load data from a local CSV file (Colab's /content directory)
csv_path = '/content/russian_alcohol_consumption.csv'
df = pd.read_csv(csv_path)
df.head()
"""We have access to the per capita sales of wine, beer, vodka, champagne and brandy in Russia between 1998 and 2016."""
# Number of rows and columns
df.shape
# Basic info about our dataset
df.info()
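Beyond eyeballing `df.shape` and `df.info()`, a few explicit checks can confirm that a loaded frame matches the seven columns and the 1998-2016 range documented for this dataset. The sketch below is only an illustration: `check_frame`, `missing_counts`, and `EXPECTED_COLUMNS` are hypothetical helper names, not part of the original notebook, and the `toy` frame holds synthetic placeholder values rather than real sales figures.

```python
import numpy as np
import pandas as pd

# Column names taken from the dataset description; the constant itself is hypothetical.
EXPECTED_COLUMNS = ["year", "region", "wine", "beer", "vodka", "champagne", "brandy"]


def check_frame(frame, first_year=1998, last_year=2016):
    """Return a list of human-readable problems; an empty list means none were found."""
    missing_cols = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing_cols:
        # Without the expected columns the remaining checks cannot run.
        return [f"missing columns: {missing_cols}"]

    problems = []
    if frame["year"].min() < first_year or frame["year"].max() > last_year:
        problems.append(f"years outside {first_year}-{last_year}")

    # Each (year, region) pair is assumed to appear at most once.
    n_dupes = int(frame.duplicated(subset=["year", "region"]).sum())
    if n_dupes:
        problems.append(f"{n_dupes} duplicated (year, region) rows")
    return problems


def missing_counts(frame):
    """Count missing values per column."""
    return frame.isna().sum()


# Synthetic placeholder rows, NOT real data from the CSV.
toy = pd.DataFrame({
    "year": [1998, 1998, 1999, 1999],
    "region": ["Region A", "Region B", "Region A", "Region B"],
    "wine": [1.0, 2.0, np.nan, 2.5],
    "beer": [10.0, 11.0, 12.0, 13.0],
    "vodka": [5.0, np.nan, np.nan, 6.0],
    "champagne": [0.5, 0.6, 0.7, 0.8],
    "brandy": [0.1, 0.2, 0.3, 0.4],
})

print(check_frame(toy))
print(missing_counts(toy))
```

On the real data, the same helpers could be called as `check_frame(df)` and `missing_counts(df)`. As a further arithmetic check, 1615 rows would be consistent with 19 years times 85 regions, although the region count is not stated in the source.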
"""The data has 1615 entries and 7 different features. Based on the provider's instructions, these features include:
1. **"year" -** year (1998-2016)
2. **"region" -** name of a federal subject of Russia. This can be an oblast, republic, krai, autonomous okrug, federal city, or the single autonomous oblast
3. **"wine" -** wine sales in litres per capita per year
4. **"beer" -** beer sales in litres per capita per year
5. **"vodka" -** vodka sales in litres per capita per year
6. **"champagne" -** champagne sales in litres per capita per year
7. **"brandy" -** brandy sales in litres per capita per year
"""