CSE 543, Information Assurance and Security, Fall 2023, Group 1-5
A project report presented by
- Gautham Vijayaraj, [email protected]
- Krupaben Kothadia, [email protected]
- Avani Mundra, [email protected]
- Justin Young, [email protected]
- Anuranjan Dubey, [email protected]
- Rahul Nayak, [email protected]
- Sangeeth Santhosh, [email protected]
- Yeshwanth Reddy Chennur, [email protected]
The project will focus on the following areas:
- Data Exploration: A wide variety of suspicious activity takes place on social media, so the project begins by studying the different types of suspicious activity.
- Secure Data Collection Techniques: When working with real data, it is crucial to balance monitoring social media for suspicious activity or threats against respecting user privacy, so that ethical standards, data confidentiality, and data integrity are maintained.
- Data Preprocessing and Feature Engineering: Discussing the preprocessing and NLP techniques needed to produce a robust dataset for detecting suspicious activity on social media.
- Data Mining Techniques to Handle Multimodal Data: Data obtained from social media can be unstructured and varied, containing text, images, videos, and more. We therefore need to find data mining techniques that can process this data and support the analysis of suspicious activity.
- Leveraging Machine Learning Models: Previous research has employed specific machine learning algorithms to improve the accuracy and efficiency of tools that classify and categorize security-related content from Twitter. Research will be performed on whether these algorithms can be extended beyond their initial implementation on Twitter and adapted for use on other social media platforms with minimal modifications.
- Model Security Assurance: Discussing techniques for producing a secure machine learning detection model that resists tampering and unauthorized modification by attackers.
- Model Evaluation: Suspicious social media accounts need to be identified correctly, without incorrectly grouping non-suspicious accounts as suspicious. Such a misclassification would call the integrity of an honest user into question. Research will be performed on whether an approach can be found that classifies social media accounts as suspicious or non-suspicious without such misclassification.
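To make the evaluation concern above concrete, the following is a minimal sketch of how a binary suspicious/non-suspicious classifier could be scored. It is an illustration under stated assumptions, not the project's chosen method: the label encoding (1 for suspicious, 0 for non-suspicious) and the function names `confusion_counts` and `evaluate` are hypothetical. The false positive rate is the quantity that captures honest accounts being wrongly flagged.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # suspicious accounts correctly flagged
    fp: int  # honest accounts wrongly flagged as suspicious
    tn: int  # honest accounts correctly left alone
    fn: int  # suspicious accounts that were missed


def confusion_counts(y_true, y_pred):
    """Count outcomes, assuming 1 = suspicious and 0 = non-suspicious."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _safe_div(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall, F1, and false positive rate."""
    c = confusion_counts(y_true, y_pred)
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "false_positive_rate": _safe_div(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Toy labels invented for illustration only; not project data.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    for name, value in evaluate(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Because suspicious accounts are usually a small minority, accuracy alone can look high even when many honest users are flagged, which is why the sketch reports precision and the false positive rate alongside it.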
The project is expected to achieve the following outcomes:
- Machine Learning Models: Finding reliable machine learning models that can recognize different kinds of suspicious activity on social media networks. Because suspicious activity can take the form of fraudulent accounts, hate speech, cyberbullying, or misinformation, multiple models may be created.
- Accurate Results: Researching machine learning models that achieve high accuracy, precision, recall, and F1-score when detecting instances of suspicious activity.
- Efficient Data Processing: Finding efficient techniques to process large volumes of data without compromising performance. Conducting an in-depth analysis to determine which suspicious activities can be identified using specific features and indicators.
- Adaptive Learning: A system designed to adapt and evolve as new forms of threats and suspicious activity emerge. The machine learning models would not only make accurate predictions but also provide insight into why a particular activity is deemed suspicious.
- Privacy-Preserving Techniques: A system that protects the integrity and confidentiality of user data while monitoring social media for suspicious activity and threats, maintaining ethical standards for the handling of user data and ensuring data privacy.
- Documentation and Reporting Findings of the Project: A thorough report describing the project's methodology, algorithms, implementation steps, and outcomes will be prepared.
- Basis for Further Research: The models and conclusions will serve as a starting point for additional study in the area of detecting suspicious behavior on social media.
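As one possible illustration of the privacy-preserving outcome listed above, the sketch below replaces account identifiers with keyed hashes (HMAC-SHA256) before any analysis takes place. This is an assumption about how such a step might look, not a technique the project has committed to. The record fields `user_id` and `text` and the function names `pseudonymize` and `strip_identifiers` are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Map a user ID to a stable pseudonym using HMAC-SHA256.

    The same ID and key always give the same pseudonym, so activity can
    still be grouped per account without storing the real identifier.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def strip_identifiers(record: dict, secret_key: bytes) -> dict:
    """Keep only the fields needed for analysis, with the ID pseudonymized.

    Assumes a hypothetical record shape with 'user_id' and 'text' keys;
    any other fields are dropped.
    """
    return {
        "user": pseudonymize(record["user_id"], secret_key),
        "text": record["text"],
    }


if __name__ == "__main__":
    # The key must be kept secret and stored separately from the data.
    key = secrets.token_bytes(32)
    raw = {"user_id": "example_account", "text": "sample post", "email": "x"}
    print(strip_identifiers(raw, key))
```

A keyed hash is used rather than a plain hash so that someone without the key cannot recompute pseudonyms from a list of known account names. Pseudonymization of this kind reduces exposure of identities but does not by itself make the data anonymous, since the post text can still identify a user.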
The following are some of the risks associated with the project:
- Incomplete or Inaccurate Information: The literature on data mining techniques, machine learning models, or security measures may be insufficient or inaccurate.
- Dependency on External Sources: The literature study depends on the accessibility and dependability of outside sources, which may be modified or withdrawn altogether.
- Complexity of Model Evaluation: It can be difficult to evaluate machine learning models and categorize social media accounts, and there may not be a single set of accepted standards.
- Security Risks: When discussing machine learning model security, it's possible to unintentionally reveal information that could be used by malicious individuals.
- Zero-Day Detection Challenges: Zero-day detection techniques might not always work, or might have restrictions on their practical application.
- Data Mining Challenges: It may be difficult to find appropriate data mining methods to handle the variety of unstructured social media data, and some methods might be less efficient than others.
- Resource Constraints: The quality and breadth of the literature review may suffer from a lack of time and resources.