URL Embedding Using MIL Algorithm

Paper: Bag-of-Characters: A Multiple Instance Learning Framework for URL Embedding in Web Security

Project Overview

This project implements a novel approach to URL embedding based on Multiple Instance Learning (MIL), in which each URL is treated as a bag of characters. The goal is to improve the detection of malicious web activity by transforming each URL into a structured vector representation that captures both its semantic and structural characteristics.
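The bag-of-characters idea can be illustrated with a minimal sketch that turns one URL into a bag of instances. It assumes one instance per character, built from a one-hot vector over a hypothetical ALPHABET concatenated with a Transformer-style sinusoidal position encoding. The project's actual vocabulary and position-encoding scheme (in feature_extraction.py) may differ, and url_to_bag and sinusoidal_position are illustrative names, not functions from the repository.

```python
import numpy as np

# Hypothetical character vocabulary (assumption); the repository's real alphabet may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def sinusoidal_position(pos: int, dim: int) -> np.ndarray:
    """Transformer-style sinusoidal encoding of a character position (assumed scheme)."""
    if dim % 2:
        raise ValueError("dim must be even")
    i = np.arange(dim // 2)
    angles = pos / np.power(10000.0, 2 * i / dim)
    enc = np.empty(dim)
    enc[0::2] = np.sin(angles)  # even slots: sine
    enc[1::2] = np.cos(angles)  # odd slots: cosine
    return enc


def url_to_bag(url: str, pos_dim: int = 8) -> np.ndarray:
    """Turn one URL into a bag: one instance (row) per known character.

    Each instance is a one-hot character vector concatenated with a
    positional encoding, so the same character at different offsets
    yields different instances. Characters outside ALPHABET are skipped.
    """
    instances = []
    for pos, ch in enumerate(url.lower()):
        idx = CHAR_INDEX.get(ch)
        if idx is None:
            continue
        one_hot = np.zeros(len(ALPHABET))
        one_hot[idx] = 1.0
        instances.append(np.concatenate([one_hot, sinusoidal_position(pos, pos_dim)]))
    # reshape keeps a 2-D (0, d) shape even for an empty bag
    return np.array(instances).reshape(-1, len(ALPHABET) + pos_dim)


bag = url_to_bag("http://example.com/login")
print(bag.shape)
```

Because the bag size grows with the URL length, a later aggregation step is needed to obtain a fixed-length embedding per URL.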

Modules

The project is divided into three main modules:

data_preprocessing.py - Handles the loading and initial processing of URL data from CSV files.
feature_extraction.py - Manages the transformation of URLs to vector representations using position encoding and normalizes these vectors using a MIL-based strategy.
main.py - Orchestrates the training process, applies KMeans clustering, computes miVLAD vectors, and saves the results.
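The KMeans and miVLAD steps orchestrated by main.py can be sketched as follows, assuming each URL has already been turned into a bag (an n_instances x d array). A codebook of k centers is learned from all instances of all bags; each bag is then encoded by summing, per center, the residuals of the instances assigned to that center, concatenating the k blocks, and applying signed square-root and L2 normalization, the usual VLAD recipe. fit_centers is a hypothetical stand-in for KMeans (a plain Lloyd's loop so the sketch needs only NumPy; scikit-learn's KMeans(...).fit(X).cluster_centers_ would play the same role), and the exact normalization used in the repository is an assumption.

```python
import numpy as np


def fit_centers(instances: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for KMeans: plain Lloyd's iterations over all instances."""
    rng = np.random.default_rng(seed)
    centers = instances[rng.choice(len(instances), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((instances[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = instances[labels == j]
            if len(members):  # keep the previous center if its cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def mivlad(bag: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Encode one bag (n_instances x d) as a fixed-length k*d VLAD-style vector."""
    k, d = centers.shape
    dists = ((bag[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for j in range(k):
        # Sum of residuals of the instances assigned to center j.
        v[j] = (bag[nearest == j] - centers[j]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))  # power (signed square-root) normalization (assumed)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v  # L2 normalization


# Toy usage: three random bags of different sizes, 4-dimensional instances.
rng = np.random.default_rng(42)
bags = [rng.normal(size=(n, 4)) for n in (5, 8, 3)]
centers = fit_centers(np.vstack(bags), k=3)
embeddings = np.array([mivlad(b, centers) for b in bags])
print(embeddings.shape)  # (3, 12)
```

Since every bag maps to a k*d vector regardless of how many characters the URL has, the resulting embeddings can be passed directly to a standard classifier.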

Getting Started

To get started with this project, clone the repository and install the required dependencies:

git clone https://github.com/chiachen-chang/mil_urlembedding
cd mil_urlembedding
pip install -r requirements.txt

Usage

Run main.py to start the process:

python main.py

Contributing

We welcome contributions from the community, whether they are feature requests, improvements, or bug fixes. Please fork the repository and submit your pull requests for review.

Discussion and Learning

We encourage everyone to participate in discussions and learning around this project. If you have questions, suggestions, or insights, please feel free to open an issue for discussion.

Let's collaborate to make URL embedding even more effective and secure!

