Everything was polished and prepared for the release

README.md:
- two more references were added
- English spelling and grammar was checked

data-visualization-scripts-only.R
- useless library was removed
- 3d plot axis labels were corrected
- misspelled plot title was fixed

data-visualization.Rmd & data-visualization.pdf
- legend of the plots was corrected
- misspelled plot title was fixed
- conclusions were added

data-visualization.ipynb
- header and tail posters were added

modules/modules-demo.ipynb
- URL parts naming scheme was added

project_diary_Vitalii.md was removed since now it is fully covered by project_diary.md
FrightenedFox · Jun 7, 2021 · b48ae9a
1 parent 528bdeb
Showing 10 changed files with 7,234 additions and 94 deletions.
diff --git a/README.md b/README.md
@@ -6,17 +6,17 @@
 
 *By Vitalii Morskyi & Julia Makarska*
 
-The phenomenon of phishing has been around for many years. However, the last year has shown how important internet security is among other things. Over a year ago the world stopped: everybody and everything was moved to the Internet. That motivated us to analyse the topic of Phishing. Usually phishers use email or SMS messages to deceive us and force us to act according to their expectations. The key points we want to emphasise in our research are how easy it is to get tricked and what are the common properties of malicious URLs. The aspects we analysed cover only a small piece of this cheating method, however we found the results to be interesting and hope you will as well. This file, however, is more about recreating the steps of our analysis, not reporting the final results. However, if you are interested in the latter one, please checkout the [`demonstration`](https://github.com/FrightenedFox/r-lab-project/tree/main/demonstration) folder or the [`data-visualization.\*`](https://github.com/FrightenedFox/r-lab-project/blob/main/data-visualization.pdf) files. 
+The phenomenon of phishing has been around for many years. However, the last year has shown how important internet security is, among other things. Over a year ago, the world stopped: everybody and everything was moved to the Internet. That motivated us to analyze the topic of phishing. Phishers usually use email or SMS messages to deceive users and force them to act according to their expectations. The key points we want to emphasize in our research are how easy it is to get tricked and what the common properties of malicious URLs are. The aspects we analyzed cover only a small piece of this cheating method; however, we found the results interesting and hope you will as well. At the same time, this file is more about recreating the steps of our analysis than reporting the final results. If you are interested in the latter, please check out the [`demonstration`](https://github.com/FrightenedFox/r-lab-project/tree/main/demonstration) folder or the [`data-visualization.\*`](https://github.com/FrightenedFox/r-lab-project/blob/main/data-visualization.pdf) files. 
 
 
 ## Setting up the environment
 
-The main analysis is made using **Jupyter Notebook** which is usually used with Python, but also supports R.  
+The main analysis is conducted in **Jupyter Notebook**, which is usually used with Python but also supports R.  
 So, to get things working properly, you will have to install some R and Python packages.
 
 ### Python modules
 
-First of all you need [Python](https://www.python.org/downloads/) 3.5 or greater. Next you are expected to install `JupyterLab` and `r-essentials` modules. 
+First of all, you need [Python](https://www.python.org/downloads/) 3.5 or greater. Next, you are expected to install the `JupyterLab` and `r-essentials` modules. 
 
 #### Using [`conda`](https://docs.conda.io/en/latest/miniconda.html)
 
@@ -45,7 +45,7 @@ For more ways of installing `JupyterLab` please checkout [this page](https://jup
 
 ### Running the JupyterLab environment
 
-Assuming R-essentials are installed you can use one of the following commands to open JupyterLab environment:
+Assuming `r-essentials` is installed, you can use one of the following commands to open the JupyterLab environment:
 
 ```bash
 jupyter-lab
@@ -57,15 +57,15 @@ or
 python -m jupyterlab
 ```
 
-If everything was installed correctly then webpage similar to the one shown on the image below should appear in your default browser. 
+If everything has been installed correctly, a webpage similar to the one shown in the image below should appear in your default browser. 
 
 <p align="center">
   <img src="./images/README_JupyterLab_450pdi.png" alt="Example of the JupyterLab environment">
 </p>
 
 ### R packages
 
-To install all required packages please open `R Console` in the JupyterLab tab and execute the following piece of code:
+To install all required packages, please open `R Console` in the JupyterLab tab and execute the following piece of code:
 
 ```R
 install.packages("stringi")
@@ -78,9 +78,9 @@ install.packages("rgl")
 install.packages("GGally")
 ```
 
-Note: if any problems occur while installing those packages try creating a separate `Conda Environment` specially for this project. To do so you can use `conda create --name EnvironmentName jupyterlab r-essentials` command. To activate your environment use the following command: `conda activate EnvironmentName`. Now you can continue from the step <a href="#running-the-jupyterlab-environment"><em>Running the JupyterLab environment</em></a>. [Does that work better?](https://github.com/FrightenedFox/r-lab-project#running-the-jupyterlab-environmenta) You can find out more about `Conda Environments` on their [official documentation page](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
+Note: if any problems occur while installing those packages, try creating a separate `Conda Environment` specifically for this project. To do so, you can use the `conda create --name EnvironmentName jupyterlab r-essentials` command. To activate your environment, use the following command: `conda activate EnvironmentName`. Now you can continue from the step <a href="#running-the-jupyterlab-environment"><em>Running the JupyterLab environment</em></a>. You can find out more about `Conda Environments` on their [official documentation page](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
 
-If there were no problems with installing modules you are ready to go. You can start from opening the file `data-visualization.ipynb` by clicking on it's icon on the side bar. 
+If there were no problems with installing the modules, you are ready to go. You can start by opening the file `data-visualization.ipynb`: click on its icon in the sidebar. 
 
 ## References
 
@@ -95,6 +95,8 @@ If there were no problems with installing modules you are ready to go. You can s
 ### Theory behind the scenes: 
 - [CERT Polska : Lista ostrzeżeń przed niebezpiecznymi stronami](https://cert.pl/posts/2020/03/ostrzezenia_phishing/)
 - [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier)
+- [Czym jest PHISHING](https://www.gov.pl/web/baza-wiedzy/czym-jest-phishing-i-jak-nie-dac-sie-nabrac-na-podejrzane-widomosci-e-mail-oraz-sms-y)
+- [Kontrolowany atak PHISHINGOWY](https://phishing.opcja.pl/)
 
 ### Documentations and code examples:
 - [R documentation](https://www.rdocumentation.org/)
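
The README's *R packages* section installs every package unconditionally. A minimal sketch of installing only what is missing could look like the following. The helper name `missing_packages` is hypothetical, and the package vector contains only the names visible in this diff, so it is probably incomplete.

```r
# Package names visible in this diff; the README's full list is partly
# hidden by the hunk boundary, so treat this vector as incomplete.
required <- c("stringi", "ggplot2", "ggExtra", "hrbrthemes", "rgl", "GGally")

# Hypothetical helper: return the names that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)

# Only contact CRAN when something is missing and a user is at the console.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running this in the JupyterLab `R Console` a second time should do nothing, since `to_install` is then empty.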

diff --git a/data-visualization-scripts-only.R b/data-visualization-scripts-only.R
@@ -4,7 +4,6 @@
 # install.packages("ggplot2")
 # install.packages("ggExtra")
 # install.packages("hrbrthemes")
-# install.packages("tidyverse")
 # install.packages("rgl")
 # install.packages("GGally")
 
@@ -15,7 +14,6 @@ library("ggplot2")
 library("ggExtra")
 library("hrbrthemes")
 library("rgl")
-library("tidyverse")
 library("GGally")
 
 source("modules/split-url.r")
@@ -149,7 +147,7 @@ ggplot(fdfp[fdfp$url_l < 500, ], aes(x = label, y = url_l, group = label, fill =
 splom(~data.frame(xyx_host, lett_host, dig_host, symb_host), 
       data = fdfp[sample(nrow(fdfp), 1000),],
       pch = 1,
-      main = "Rozkład symboli w hoscie adresu URL.",
+      main = "Rozkład symboli w hoście adresu URL.",
       groups = label,
       #       xlab = c("A", "B", "C", "D"),
       #       xlab = "", # something like this removes the "Scatter Plot Matrix" label
@@ -307,7 +305,7 @@ plot3d(
   col = fdfm$color, 
   type = 's', 
   radius = 30,
-  xlab = "JS", ylab = "JS obf ", zlab = "URL")
+  xlab = "URL", ylab = "JS ", zlab = "JS obf")
 rgl.bg( sphere = FALSE, fogtype = "none", color = c("#d8d7c4", "black"), 
         back = "lines", fogScale = 1)
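
The `plot3d` hunk above fixes axis labels that were out of step with the plotted columns. One way to make that kind of mismatch harder to reintroduce is to keep the column-to-label mapping in a single named vector. This is only a sketch: the helper `plot3d_args` is hypothetical, and the assumption that the plot uses the columns `url_l`, `js_len` and `js_obf_len` (names that appear elsewhere in this diff) is not confirmed by the visible lines.

```r
# Column names taken from elsewhere in this diff; whether plot3d() is called
# with exactly these columns is an assumption.
axis_labels <- c(url_l = "URL", js_len = "JS", js_obf_len = "JS obf")

# Hypothetical helper: build x/y/z data and the matching labels in one step,
# so a label cannot drift away from the column it describes.
plot3d_args <- function(df, labels) {
  cols <- names(labels)
  stopifnot(length(cols) == 3, all(cols %in% names(df)))
  list(x = df[[cols[1]]], y = df[[cols[2]]], z = df[[cols[3]]],
       xlab = labels[[1]], ylab = labels[[2]], zlab = labels[[3]])
}

# Toy data standing in for fdfm; the values are made up.
toy <- data.frame(url_l = c(20, 35), js_len = c(100, 0), js_obf_len = c(0, 50))
call_args <- plot3d_args(toy, axis_labels)

# With rgl loaded, the plot call would then be:
# do.call(rgl::plot3d, c(call_args, list(type = "s", radius = 30)))
```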
 
diff --git a/data-visualization.Rmd b/data-visualization.Rmd
@@ -9,8 +9,6 @@ output:
 The phenomenon of phishing has been around for many years. However, the last year in particular has shown us how important internet security is. For a year now the world has stood still and moved everything to the Internet. Given this, we found the topic of phishing very timely. We want to show how easy it is to get robbed. The project we present covers only a small piece of this fraud method, but we found the topic interesting.
 Phishing is an attack based on email or SMS messages. Cybercriminals try to deceive you and force you to act according to their expectations.
 
-# Importing data, R packages and modules
-
 ---
 
 ```{r message=FALSE, warning=FALSE}
@@ -21,7 +19,6 @@ library("ggplot2")
 library("ggExtra")
 library("hrbrthemes")
 library("rgl")
-library("tidyverse")
 library("GGally")
 
 source("modules/split-url.r")
@@ -158,7 +155,6 @@ ldsc_res_2 <- lett_dig_symb_count(split_res_2)
 
 First we build a single matrix with the results of all computations, which we then convert into a data frame.
 
-
 ```{r}
 params_df <- as.data.frame(cbind(
     lengths_res,
@@ -269,9 +265,9 @@ ggplot(fdfp[fdfp$url_l < 500, ], aes(x = label, y = url_l, group = label, fill =
   theme(plot.title = element_text(family = "", 
                                   face = 'bold', 
                                   colour = 'black', 
-                                  size = 12),
-        # panel.background = element_rect(fill = "#f0bc5e", colour = "black")
-        # rect = element_rect(fill = "#d8d7c4") 
+                                  size = 12)
+#         panel.background = element_rect(fill = "#f0bc5e", colour = "black")
+#         rect = element_rect(fill = "#d8d7c4") 
   ) + 
  scale_fill_manual(values = c("#6dd38c", "#f3aca7")) +
   labs(color = "Etykieta")
@@ -298,21 +294,21 @@ splom(~data.frame(xyx_host, lett_host, dig_host, symb_host),
 \newpage
 
 ```{r}
-ggpairs(fdfp[fdfp$url_l < 500 & sample(nrow(fdfp), 500), ], 
+ggpairs(fdfp[fdfp$url_l < 500 & sample(nrow(fdfp), 1000), ], 
         aes(color = color,
             alpha = .5),
         columns = c("xyx_host", "lett_host", "dig_host", "symb_host"),
         columnLabels = c("Ilość ciągów\npostaci XYX", 
                          "Ilość liter", 
                          "Ilość cyfr",
                          "Liczba znaków\ninterpunkcyjnych")) +
-  ggtitle("Rozkład symboli w hoscie adresu URL.") +
+  ggtitle("Rozkład symboli w hoście adresu URL.") +
   theme(plot.title = element_text(family = "", 
                                   face = 'bold', 
                                   colour = 'black', 
                                   size = 12)
-        #panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-        #rect = element_rect(fill = "#d8d7c4")
+#         panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+#         rect = element_rect(fill = "#d8d7c4")
   ) + 
   labs(color = "Etykieta")
 ```
@@ -322,8 +318,8 @@ ggpairs(fdfp[fdfp$url_l < 500 & sample(nrow(fdfp), 500), ],
 ```{r}
 histogram(~ symb_url  | label  , 
           data = fdfp[sample(nrow(fdfp), 2000),],
-          main = "Porównanie liczba znaków interpunkcyjnych\nw dobrych i złych domenach",
-          xlab = "Ilość symboli w URL",
+          main = "Porównanie ilości znaków interpunkcyjnych\nw dobrych i złych domenach.",
+          xlab = "Ilość symboli w adresie URL",
           ylab = "Procent całości",
           layout = c(1, 2),
           nint = 20,
@@ -343,8 +339,8 @@ ggplot(fdfp[fdfp$url_l < 500, ], aes(x = symb_url, fill = label)) +
                                   face = 'bold', 
                                   colour = 'black', 
                                   size = 12)
-        # panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-        # rect = element_rect(fill = "#d8d7c4")
+#         panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+#         rect = element_rect(fill = "#d8d7c4")
   ) + 
   scale_fill_manual(values = c("#6dd38c", "#f3aca7")) +
   labs(fill = "Etykieta", color = "Etykieta")
@@ -397,8 +393,8 @@ ggplot(data = fdfp[fdfp$url_l < 200, ], aes(x = symb_url, group = label, fill =
                                   face = 'bold', 
                                   colour = 'black', 
                                   size = 12)
-        #panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-        #rect = element_rect(fill = "#d8d7c4")
+#         panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+#         rect = element_rect(fill = "#d8d7c4")
   ) + 
   scale_fill_manual(values = c("#6dd38c", "#f3aca7")) +
   labs(fill = "Etykieta", color = "Etykieta")
@@ -416,8 +412,8 @@ ggplot(data = fdfm, aes(x = js_len, group = label, fill = label)) +
                                   face = 'bold', 
                                   colour = 'black', 
                                   size = 12)
-        #panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-        #rect = element_rect(fill = "#d8d7c4")
+#         panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+#         rect = element_rect(fill = "#d8d7c4")
   ) + 
   scale_fill_manual(values = c("#f3aca7", "#6dd38c")) +
   labs(fill = "Etykieta", color = "Etykieta")
@@ -435,17 +431,19 @@ ggplot(data = fdfm, aes(x = js_len, y = js_obf_len, color = label) ) +
                                   face = 'bold', 
                                   colour = 'black', 
                                   size = 12)
-        #panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
-        #rect = element_rect(fill = "#d8d7c4")
+#         panel.background = element_rect(fill = "#f0bc5e", colour = "black"),
+#         rect = element_rect(fill = "#d8d7c4")
   ) + 
   scale_fill_manual(values = c("#6dd38c", "#f3aca7")) +
   labs(fill = "Etykieta", color = "Etykieta")
 ```
 
-\newpage
+# Conclusions
 
-# Some conclusions should be added here
+---
 
+We learned a lot while carrying out the analysis above. We noticed which lexical patterns occur in malicious domains. We know which "substitutions" occur most often. When looking at links, it is not always possible to catch all of this right away. Therefore the most important, though not the only, conclusion of our project is that you have to be careful about which links you open.
+**Received a suspicious email? Do not open any links!**
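
A note on the `ggpairs` call changed in this file: the row selector `fdfp$url_l < 500 & sample(nrow(fdfp), 1000)` combines a logical vector with integer indices. In R, `&` coerces the non-zero integers to `TRUE` and recycles them, so the expression appears to reduce to the length filter alone and no subsample is drawn. The sketch below demonstrates this on made-up data and shows one possible alternative, filtering first and sampling afterwards. Only the column name `url_l` comes from the diff; the toy data frame and variable names are assumptions.

```r
# Toy stand-in for the fdfp data frame; only url_l is taken from the diff,
# the values are made up for illustration.
set.seed(42)
toy <- data.frame(url_l = sample(10:900, 5000, replace = TRUE))

# Pattern from the diff: the integer indices are coerced to TRUE and recycled,
# so the mask is identical to the length filter on its own.
mask <- toy$url_l < 500 & sample(nrow(toy), 1000)
same_as_filter <- identical(mask, toy$url_l < 500)

# Alternative: filter first, then draw row indices from the filtered rows.
short <- toy[toy$url_l < 500, , drop = FALSE]
n_draw <- min(1000, nrow(short))
picked <- short[sample(nrow(short), n_draw), , drop = FALSE]
```

Here `same_as_filter` comes out `TRUE`, while `picked` holds at most 1000 rows, all with `url_l` below 500.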
 
 
 
diff --git a/data-visualization.ipynb b/data-visualization.ipynb
diff --git a/data-visualization.pdf b/data-visualization.pdf
diff --git a/images/README_poster_450pdi.png b/images/README_poster_450pdi.png