Everything was polished and prepared for the release
README.md:
 - two more references were added
 - English spelling and grammar were checked

data-visualization-scripts-only.R
 - useless library was removed
 - 3d plot axis labels were corrected
 - misspelled plot title was fixed

data-visualization.Rmd & data-visualization.pdf
 - legend of the plots was corrected
 - misspelled plot title was fixed
 - conclusions were added

data-visualization.ipynb
 - header and tail posters were added

modules/modules-demo.ipynb
 - URL parts naming scheme was added

project_diary_Vitalii.md was removed since now it is fully covered by project_diary.md
FrightenedFox committed Jun 7, 2021
1 parent 528bdeb commit b48ae9a
Showing 10 changed files with 7,234 additions and 94 deletions.
18 changes: 10 additions & 8 deletions README.md
@@ -6,17 +6,17 @@

*By Vitalii Morskyi & Julia Makarska*

-The phenomenon of phishing has been around for many years. However, the last year has shown how important internet security is among other things. Over a year ago the world stopped: everybody and everything was moved to the Internet. That motivated us to analyse the topic of Phishing. Usually phishers use email or SMS messages to deceive us and force us to act according to their expectations. The key points we want to emphasise in our research are how easy it is to get tricked and what are the common properties of malicious URLs. The aspects we analysed cover only a small piece of this cheating method, however we found the results to be interesting and hope you will as well. This file, however, is more about recreating the steps of our analysis, not reporting the final results. However, if you are interested in the latter one, please checkout the [`demonstration`](https://github.com/FrightenedFox/r-lab-project/tree/main/demonstration) folder or the [`data-visualization.\*`](https://github.com/FrightenedFox/r-lab-project/blob/main/data-visualization.pdf) files.
+The phenomenon of phishing has been around for many years. However, the last year has shown, among other things, how important internet security is. Over a year ago, the world stopped: everybody and everything moved to the Internet. That motivated us to analyze the topic of phishing. Phishers usually use email or SMS messages to deceive users and force them to act according to their expectations. The key points we want to emphasize in our research are how easy it is to get tricked and what the common properties of malicious URLs are. The aspects we analyzed cover only a small part of this fraud method; however, we found the results interesting and hope you will as well. This file is about recreating the steps of our analysis rather than reporting the final results. If you are interested in the latter, please check out the [`demonstration`](https://github.com/FrightenedFox/r-lab-project/tree/main/demonstration) folder or the [`data-visualization.\*`](https://github.com/FrightenedFox/r-lab-project/blob/main/data-visualization.pdf) files.


## Setting up the environment

-The main analysis is made using **Jupyter Notebook** which is usually used with Python, but also supports R.
+The main analysis is conducted in **Jupyter Notebook**, which is usually used with Python but also supports R.
So, to get things working properly, you will have to install some R and Python packages.

### Python modules

-First of all you need [Python](https://www.python.org/downloads/) 3.5 or greater. Next you are expected to install `JupyterLab` and `r-essentials` modules.
+First of all, you need [Python](https://www.python.org/downloads/) 3.5 or greater. Next, you are expected to install the `JupyterLab` and `r-essentials` modules.

#### Using [`conda`](https://docs.conda.io/en/latest/miniconda.html)

@@ -45,7 +45,7 @@ For more ways of installing `JupyterLab` please checkout [this page](https://jup

### Running the JupyterLab environment

-Assuming R-essentials are installed you can use one of the following commands to open JupyterLab environment:
+Assuming R-essentials are installed, you can use one of the following commands to open the JupyterLab environment:

```bash
jupyter-lab
@@ -57,15 +57,15 @@ or
python -m jupyterlab
```

-If everything was installed correctly then webpage similar to the one shown on the image below should appear in your default browser.
+If everything has been installed correctly, a webpage similar to the one shown in the image below should appear in your default browser.

<p align="center">
<img src="./images/README_JupyterLab_450pdi.png" alt="Example of the JupyterLab environment">
</p>

### R packages

-To install all required packages please open `R Console` in the JupyterLab tab and execute the following piece of code:
+To install all required packages, please open `R Console` in the JupyterLab tab and execute the following code:

```R
install.packages("stringi")
@@ -78,9 +78,9 @@ install.packages("rgl")
install.packages("GGally")
```
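
Running every `install.packages()` call again reinstalls packages that are already present. A minimal sketch of installing only what is missing, assuming just the package names visible in this diff (the README's full list is longer); `missing_packages` and `install_missing` are hypothetical helpers, not part of the project:

```R
# Hypothetical helper: return the names in `required` that are not installed yet.
missing_packages <- function(required) {
  installed <- rownames(installed.packages())
  setdiff(required, installed)
}

# Hypothetical helper: install only the missing packages and return their names.
install_missing <- function(required) {
  to_install <- missing_packages(required)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Package names taken from this diff; the README's full list may contain more.
required <- c("stringi", "ggplot2", "ggExtra", "hrbrthemes", "rgl", "GGally")

# Usage (left commented out so that sourcing this sketch has no side effects):
# install_missing(required)
```

Calling `missing_packages(required)` on its own shows what would be installed without changing anything.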

-Note: if any problems occur while installing those packages try creating a separate `Conda Environment` specially for this project. To do so you can use `conda create --name EnvironmentName jupyterlab r-essentials` command. To activate your environment use the following command: `conda activate EnvironmentName`. Now you can continue from the step <a href="#running-the-jupyterlab-environment"><em>Running the JupyterLab environment</em></a>. [Does that work better?](https://github.com/FrightenedFox/r-lab-project#running-the-jupyterlab-environmenta) You can find out more about `Conda Environments` on their [official documentation page](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
+Note: if any problems occur while installing those packages, try creating a separate `Conda Environment` specifically for this project. To do so, you can use the `conda create --name EnvironmentName jupyterlab r-essentials` command. To activate your environment, use the following command: `conda activate EnvironmentName`. Now you can continue from the step <a href="#running-the-jupyterlab-environment"><em>Running the JupyterLab environment</em></a>. You can find out more about `Conda Environments` on the [official documentation page](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

-If there were no problems with installing modules you are ready to go. You can start from opening the file `data-visualization.ipynb` by clicking on it's icon on the side bar.
+If there were no problems with installing the modules, you are ready to go. You can start by opening the file `data-visualization.ipynb` by clicking on its icon in the sidebar.

## References

@@ -95,6 +95,8 @@ If there were no problems with installing modules you are ready to go. You can s
### Theory behind the scenes:
- [CERT Polska : Lista ostrzeżeń przed niebezpiecznymi stronami](https://cert.pl/posts/2020/03/ostrzezenia_phishing/)
- [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier)
+- [Czym jest PHISHING](https://www.gov.pl/web/baza-wiedzy/czym-jest-phishing-i-jak-nie-dac-sie-nabrac-na-podejrzane-widomosci-e-mail-oraz-sms-y)
+- [Kontrolowany atak PHISHINGOWY](https://phishing.opcja.pl/)

### Documentations and code examples:
- [R documentation](https://www.rdocumentation.org/)
6 changes: 2 additions & 4 deletions data-visualization-scripts-only.R
@@ -4,7 +4,6 @@
# install.packages("ggplot2")
# install.packages("ggExtra")
# install.packages("hrbrthemes")
-# install.packages("tidyverse")
# install.packages("rgl")
# install.packages("GGally")

@@ -15,7 +14,6 @@ library("ggplot2")
library("ggExtra")
library("hrbrthemes")
library("rgl")
-library("tidyverse")
library("GGally")

source("modules/split-url.r")
@@ -149,7 +147,7 @@ ggplot(fdfp[fdfp$url_l < 500, ], aes(x = label, y = url_l, group = label, fill =
splom(~data.frame(xyx_host, lett_host, dig_host, symb_host),
data = fdfp[sample(nrow(fdfp), 1000),],
pch = 1,
-main = "Rozkład symboli w hoscie adresu URL.",
+main = "Rozkład symboli w hoście adresu URL.",
groups = label,
# xlab = c("A", "B", "C", "D"),
# xlab = "", # something like this removes the "Scatter Plot Matrix" label
@@ -307,7 +305,7 @@ plot3d(
col = fdfm$color,
type = 's',
radius = 30,
-xlab = "JS", ylab = "JS obf ", zlab = "URL")
+xlab = "URL", ylab = "JS ", zlab = "JS obf")
rgl.bg( sphere = FALSE, fogtype = "none", color = c("#d8d7c4", "black"),
back = "lines", fogScale = 1)

44 changes: 21 additions & 23 deletions data-visualization.Rmd
@@ -9,8 +9,6 @@ output:
The phenomenon of phishing has been around for many years. However, the last year, among other things, has shown us how important internet security is. A year ago the world stopped and moved everything to the Internet. For that reason we found the topic of phishing very timely. We want to show how easy it is to get robbed. The project we present covers only a small part of this fraud method, but we found the topic interesting.
Phishing is an attack based on email or SMS messages. Cybercriminals try to deceive you and force you to act according to their expectations.

-# Importing data, R packages, and modules

---

```{r message=FALSE, warning=FALSE}
@@ -21,7 +19,6 @@ library("ggplot2")
library("ggExtra")
library("hrbrthemes")
library("rgl")
-library("tidyverse")
library("GGally")
source("modules/split-url.r")
@@ -158,7 +155,6 @@ ldsc_res_2 <- lett_dig_symb_count(split_res_2)

First we build a single matrix with the results of all the calculations, which we then convert into a data frame.


```{r}
params_df <- as.data.frame(cbind(
lengths_res,
@@ -269,9 +265,9 @@ ggplot(fdfp[fdfp$url_l < 500, ], aes(x = label, y = url_l, group = label, fill =
theme(plot.title = element_text(family = "",
face = 'bold',
colour = 'black',
-size = 12),
-# panel.background = element_rect(fill = "#f0bc5e", colour = "black")
-# rect = element_rect(fill = "#d8d7c4")
+size = 12)
+# panel.background = element_rect(fill = "#f0bc5e", colour = "black")
+# rect = element_rect(fill = "#d8d7c4")
) +
scale_fill_manual(values = c("#6dd38c", "#f3aca7")) +
labs(color = "Etykieta")
@@ -298,21 +294,21 @@ splom(~data.frame(xyx_host, lett_host, dig_host, symb_host),
\newpage

```{r}
-ggpairs(fdfp[fdfp$url_l < 500 & sample(nrow(fdfp), 500), ],
+ggpairs(fdfp[fdfp$url_l < 500 & sample(nrow(fdfp), 1000), ],
aes(color = color,
alpha = .5),
columns = c("xyx_host", "lett_host", "dig_host", "symb_host"),
columnLabels = c("Ilość ciągów\npostaci XYX",
"Ilość liter",
"Ilość cyfr",
"Liczba znaków\ninterpunkcyjnych")) +
-ggtitle("Rozkład symboli w hoscie adresu URL.") +
+ggtitle("Rozkład symboli w hoście adresu URL.") +
theme(plot.title = element_text(family = "",
face = 'bold',
colour = 'black',
size = 12)
-#panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-#rect = element_rect(fill = "#d8d7c4")
+# panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+# rect = element_rect(fill = "#d8d7c4")
) +
labs(color = "Etykieta")
```
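
In the `ggpairs` call above, `fdfp$url_l < 500 & sample(nrow(fdfp), 1000)` combines a logical vector with the integers returned by `sample()`. Every sampled index is nonzero, so R coerces them all to `TRUE` (recycling them along the rows). The expression therefore only applies the length filter and does not draw a random subset. A minimal base-R sketch of one way to filter first and then sample rows follows; `sample_filtered` and the toy data frame are illustrative assumptions, not part of the project:

```R
# Hypothetical helper: keep rows with url_l below `max_len`,
# then draw at most `n` of them at random.
sample_filtered <- function(df, max_len, n) {
  keep <- which(df$url_l < max_len)            # row indices passing the filter
  if (length(keep) > n) {
    keep <- keep[sample.int(length(keep), n)]  # sample positions within `keep`
  }
  df[keep, , drop = FALSE]
}

# Toy data standing in for `fdfp` (assumed column name: url_l).
set.seed(1)
toy <- data.frame(url_l = sample(10:900, 200, replace = TRUE))

# Filter, then sample: at most 50 rows, all shorter than 500.
subset_ok <- sample_filtered(toy, max_len = 500, n = 50)

# The `&` form from the diff keeps every row under the limit instead of sampling.
subset_all <- toy[toy$url_l < 500 & sample(nrow(toy), 50), , drop = FALSE]
```

With the real data, passing `sample_filtered(fdfp, 500, 1000)` as the first argument of `ggpairs` would plot at most 1000 randomly chosen rows.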
@@ -322,8 +318,8 @@ ggpairs(fdfp[fdfp$url_l < 500 & sample(nrow(fdfp), 500), ],
```{r}
histogram(~ symb_url | label ,
data = fdfp[sample(nrow(fdfp), 2000),],
-main = "Porównanie liczba znaków interpunkcyjnych\nw dobrych i złych domenach",
-xlab = "Ilość symboli w URL",
+main = "Porównanie ilości znaków interpunkcyjnych\nw dobrych i złych domenach.",
+xlab = "Ilość symboli w adresie URL",
ylab = "Procent całości",
layout = c(1, 2),
nint = 20,
@@ -343,8 +339,8 @@ ggplot(fdfp[fdfp$url_l < 500, ], aes(x = symb_url, fill = label)) +
face = 'bold',
colour = 'black',
size = 12)
-# panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-# rect = element_rect(fill = "#d8d7c4")
+# panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+# rect = element_rect(fill = "#d8d7c4")
) +
scale_fill_manual(values = c("#6dd38c", "#f3aca7")) +
labs(fill = "Etykieta", color = "Etykieta")
@@ -397,8 +393,8 @@ ggplot(data = fdfp[fdfp$url_l < 200, ], aes(x = symb_url, group = label, fill =
face = 'bold',
colour = 'black',
size = 12)
-#panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-#rect = element_rect(fill = "#d8d7c4")
+# panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+# rect = element_rect(fill = "#d8d7c4")
) +
scale_fill_manual(values = c("#6dd38c", "#f3aca7")) +
labs(fill = "Etykieta", color = "Etykieta")
@@ -416,8 +412,8 @@ ggplot(data = fdfm, aes(x = js_len, group = label, fill = label)) +
face = 'bold',
colour = 'black',
size = 12)
-#panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-#rect = element_rect(fill = "#d8d7c4")
+# panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+# rect = element_rect(fill = "#d8d7c4")
) +
scale_fill_manual(values = c("#f3aca7", "#6dd38c")) +
labs(fill = "Etykieta", color = "Etykieta")
@@ -435,17 +431,19 @@ ggplot(data = fdfm, aes(x = js_len, y = js_obf_len, color = label) ) +
face = 'bold',
colour = 'black',
size = 12)
-#panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-#rect = element_rect(fill = "#d8d7c4")
+# panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+# rect = element_rect(fill = "#d8d7c4")
) +
scale_fill_manual(values = c("#6dd38c", "#f3aca7")) +
labs(fill = "Etykieta", color = "Etykieta")
```

\newpage
# Conclusions

-# Some conclusions should be added here
+---
+
+We learned a lot while carrying out the analysis above. We noticed which lexical patterns occur in malicious domains. We know which character "substitutions" occur most often. When looking at links, it is not always possible to spot all of this right away. Therefore, the most important, though not the only, conclusion from our project is that you need to be careful about which links you open.
+**Received a suspicious email? Do not open any links!**
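
The lexical features plotted above (`lett_host`, `dig_host`, `symb_host`) count letters, digits, and other symbols in the host part of a URL. The project computes them in its own `lett_dig_symb_count` function, whose code is not shown in this diff. The following is only a base-R sketch of the idea, with `count_char_classes` as a hypothetical name and ASCII-only hosts assumed:

```R
# Hypothetical helper: count ASCII letters, digits, and remaining symbols in one host string.
count_char_classes <- function(host) {
  chars <- strsplit(host, "")[[1]]                 # split into single characters
  letters_n <- sum(chars %in% c(letters, LETTERS)) # ASCII letters only
  digits_n <- sum(chars %in% as.character(0:9))
  c(letters = letters_n,
    digits = digits_n,
    symbols = length(chars) - letters_n - digits_n)
}

# A made-up host that swaps the letter "l" for the digit "1".
counts <- count_char_classes("paypa1-login.example.com")
# counts: letters = 20, digits = 1, symbols = 3
```

A digit inside an otherwise alphabetic label, as in this made-up host, is the kind of character substitution the conclusions refer to.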



24 changes: 19 additions & 5 deletions data-visualization.ipynb

Large diffs are not rendered by default.

Binary file modified data-visualization.pdf
Binary file not shown.
Binary file modified images/README_poster_450pdi.png
