This repository introduces the NICT Darknet Dataset 2022 (https://csdataset.nict.go.jp/darknet-2022/). The dataset and its ground truth have been used in studies on malware detection in the darknet [1] [2] [3] [4]. This document presents the observation scale of the darknet sensors and the total number of observed IPv4 addresses, the malware activity threats and the characteristics of their network activity, and the goal of detecting the TCP ports on which malware activities were observed.
The data are collected by eight darknet sensors, each monitoring a different IP address space with a different subnet size. The observation scale of a sensor ranges from 2048 to 32768 IP addresses, and the eight sensors together monitor 86016 IP addresses.
To emphasize unknown scanning activities, the well-known ports 22, 23, 80, 81, 445, 1433, 2323, 3389, 5555, 8080, and 52869 are excluded when constructing observation matrices in the data processing step [1]. Port 22 is the SSH service. Ports 23 and 2323 are Telnet and its alternative port. Ports 80, 81, and 8080 are web services. Port 445 is used for Server Message Block (SMB) file sharing in Windows. Port 1433 is Microsoft SQL Server. Port 3389 is Microsoft Remote Desktop. Port 5555 is SoftEther VPN (Ethernet over HTTPS). Port 52869 is Xsan, the file sharing system for macOS.
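The exclusion step can be sketched as follows. This is a minimal illustration, not the processing code used in [1]: the `(unix_timestamp, src_ip, dst_port)` record layout, the function name `build_observation_matrix`, and the choice of time slots as rows and destination ports as columns are all assumptions made for the example.

```python
from collections import defaultdict

# Well-known ports excluded from the observation matrices (list taken from the text above).
EXCLUDED_PORTS = frozenset({22, 23, 80, 81, 445, 1433, 2323, 3389, 5555, 8080, 52869})


def build_observation_matrix(packets, slot_seconds=1800):
    """Count packets per (time slot, destination port), skipping excluded ports.

    `packets` is an iterable of (unix_timestamp, src_ip, dst_port) tuples.
    This record layout is hypothetical, not the dataset's actual format.
    """
    counts = defaultdict(int)
    for ts, _src, dst_port in packets:
        if dst_port in EXCLUDED_PORTS:
            continue  # drop traffic to the well-known ports
        counts[(int(ts) // slot_seconds, dst_port)] += 1

    slots = sorted({slot for slot, _ in counts})
    ports = sorted({port for _, port in counts})
    # Rows are time slots, columns are destination ports.
    matrix = [[counts.get((slot, port), 0) for port in ports] for slot in slots]
    return slots, ports, matrix
```

Keeping the excluded ports in a single constant makes it easy to check that the filter matches the list documented above.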
For darknet sensors B and D, intersecting columns of source hosts and destination ports are observed throughout October 2018. On average, there are 3742 intersecting source hosts and 1172 destination ports.
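One way to read "intersecting columns" is the set of source hosts, and the set of destination ports, that appear at both sensors within the same time window. The sketch below follows that reading, which is an assumption rather than the definition used in [1]. The function name and the `(src_ip, dst_port)` record layout are hypothetical.

```python
def intersecting_columns(records_b, records_d):
    """Return (source hosts, destination ports) seen by both sensors.

    Each argument is an iterable of (src_ip, dst_port) pairs observed by one
    sensor within one time window. This layout is assumed for illustration.
    """
    records_b, records_d = list(records_b), list(records_d)
    hosts = {src for src, _ in records_b} & {src for src, _ in records_d}
    ports = {port for _, port in records_b} & {port for _, port in records_d}
    return hosts, ports
```

Averaging the sizes of the two returned sets over all time windows in the month would give figures comparable to the averages quoted above.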
The ground truth consists of the malware activities in October 2018, labeled by human experts among the operators of the National Institute of Information and Communications Technology (NICT) [1]. It comprises 35 TCP destination ports: 21, 25, 82, 83, 84, 85, 88, 110, 443, 444, 1701, 2004, 2480, 5358, 5379, 5431, 5900, 5984, 6379, 7379, 7547, 8000, 8001, 8010, 8081, 8088, 8181, 8291, 8443, 8888, 9000, 23023, 37215, 49152, and 65000. The ports are grouped into IoT malware, router vulnerability, and application vulnerability.
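Since the goal is to detect the TCP ports listed above, a detector's output can be compared against the ground truth as a set of ports. The following is a minimal sketch. The function name `score_detection` and the use of precision and recall are assumptions for illustration and are not necessarily the evaluation protocol of [1].

```python
# The 35 ground-truth TCP destination ports listed in the text above.
GROUND_TRUTH_PORTS = frozenset({
    21, 25, 82, 83, 84, 85, 88, 110, 443, 444, 1701, 2004, 2480, 5358,
    5379, 5431, 5900, 5984, 6379, 7379, 7547, 8000, 8001, 8010, 8081,
    8088, 8181, 8291, 8443, 8888, 9000, 23023, 37215, 49152, 65000,
})


def score_detection(detected_ports):
    """Compare a set of detected ports with the ground truth (hypothetical helper)."""
    detected = set(detected_ports)
    tp = len(detected & GROUND_TRUTH_PORTS)   # correctly detected ports
    fp = len(detected - GROUND_TRUTH_PORTS)   # detected but not labeled
    fn = len(GROUND_TRUTH_PORTS - detected)   # labeled but missed
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / len(GROUND_TRUTH_PORTS)
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

For example, `score_detection({88, 8081, 12345})` counts two correct ports and one port outside the ground truth.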
A spike in the number of source hosts for Mirai type I can be observed on October 20. The following figures show the number of source hosts and the number of packets per half hour in October 2018 at the TCP ports where Mirai type I was observed by sensors B and D.
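Half-hourly series like those described above can be derived by binning packets into 30-minute windows and counting unique source hosts and packets per window. The sketch below assumes a hypothetical `(unix_timestamp, src_ip, dst_port)` record layout. The `ports` argument stands for whichever TCP ports belong to the threat of interest, which the caller must supply.

```python
from collections import defaultdict

HALF_HOUR = 1800  # seconds


def half_hour_series(packets, ports):
    """Return {bin_start: (unique source hosts, packet count)} per 30-minute bin.

    `packets` is an iterable of (unix_timestamp, src_ip, dst_port) tuples and
    `ports` is the set of TCP destination ports to keep. Both are assumptions
    made for illustration, not the dataset's actual schema.
    """
    ports = set(ports)
    hosts = defaultdict(set)
    packet_counts = defaultdict(int)
    for ts, src, dst_port in packets:
        if dst_port not in ports:
            continue
        bin_start = int(ts) - int(ts) % HALF_HOUR  # floor to the bin boundary
        hosts[bin_start].add(src)
        packet_counts[bin_start] += 1
    return {b: (len(hosts[b]), packet_counts[b]) for b in sorted(packet_counts)}
```

A spike such as the one on October 20 would show up as a bin whose unique-host count is far above that of its neighbors.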
We can observe a surge in the number of source hosts targeting ports 88 and 8081 for Mirai type II on October 20.
We can observe a surge in the number of source hosts on October 16 and 19 for Mirai type III.
For the MikroTik-related router vulnerability threat, spikes in the number of source hosts occur on October 13, 16, and 19.
A surge in the number of source hosts can be observed on October 12 for application vulnerability threats part 1.
A surge in the number of source hosts can be observed on October 14 and October 30 for application vulnerability threats part 2.
- Han, C., Takeuchi, J. I., Takahashi, T., & Inoue, D. (2021, October). Automated detection of malware activities using nonnegative matrix factorization. In 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (pp. 548-556). IEEE.
- Han, C., Takeuchi, J. I., Takahashi, T., & Inoue, D. (2022). Dark-TRACER: Early detection framework for malware activity based on anomalous spatiotemporal patterns. IEEE Access, 10, 13038-13058.
- Han, C., Shimamura, J., Takahashi, T., Inoue, D., Takeuchi, J. I., & Nakao, K. (2020). Real-time detection of global cyberthreat based on darknet by estimating anomalous synchronization using graphical lasso. IEICE Transactions on Information and Systems, 103(10), 2113-2124.
- Chang, Y. W., Chen, H. Y., Han, C., Morikawa, T., Takahashi, T., & Lin, T. N. (2023). FINISH: Efficient and Scalable NMF-Based Federated Learning for Detecting Malware Activities. IEEE Transactions on Emerging Topics in Computing.