Develop Machine Learning Model for Personally Identifiable Information (PII) Detection #10

akshit-g · 2023-07-03T17:13:36Z

We need to implement a machine learning model capable of identifying regions within documents or images containing Personally Identifiable Information (PII). PII, including names, addresses, social security numbers, and email addresses, must be accurately detected to enhance user privacy and data security.

Data Collection and Annotation:
Collect a diverse dataset containing examples of documents or images with annotated regions of PII.
Ensure accurate and consistent annotations, marking the exact boundaries of PII regions.
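
One way to check annotation consistency is to have two annotators mark the same document and compare their boxes with intersection over union (IoU). The sketch below is only an illustration: the `(x_min, y_min, x_max, y_max)` box format, the `label`/`box` dictionary keys, and the 0.8 agreement threshold are assumptions, not something this issue specifies.

```python
from typing import Dict, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def annotations_agree(a: Dict, b: Dict, threshold: float = 0.8) -> bool:
    """True if two annotations share a label and overlap enough.

    The threshold is a hypothetical default; tune it for the dataset.
    """
    return a["label"] == b["label"] and iou(a["box"], b["box"]) >= threshold
```

Annotation pairs that fail the check can be sent back for review before they enter the training set.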

Model Selection:
Choose or design a suitable object detection model architecture (e.g., Faster R-CNN, SSD, YOLO) for accurate and efficient region detection.

Data Preprocessing:
Preprocess the dataset, including resizing, normalization, and data augmentation, to prepare it for model training.
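
When images are resized, the annotated PII boxes have to be scaled by the same factors, or the labels will no longer line up with the pixels. The following is a minimal sketch of that step plus pixel normalization. The function names, the `(width, height)` size convention, and the mean/std values are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Size = Tuple[int, int]  # assumed (width, height)


def scale_box(box: Box, src_size: Size, dst_size: Size) -> Box:
    """Rescale a box from the original image size to the resized one."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


def normalize_pixels(pixels: List[int], mean: float, std: float) -> List[float]:
    """Map 8-bit pixel values to [0, 1], then standardize with mean and std."""
    return [(p / 255.0 - mean) / std for p in pixels]
```

For example, `scale_box((10, 20, 30, 40), (100, 200), (50, 100))` halves every coordinate. Geometric augmentations need the same care: any transform applied to the image has to be applied to its boxes as well.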

Model Training and Evaluation:
Split the annotated dataset into training and testing sets.
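
A reproducible split could look like the sketch below. The 80/20 ratio, the fixed seed, and the function name are assumptions chosen for illustration; the issue does not prescribe them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    items: Sequence[T], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle items with a fixed seed and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one item in the test set.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

If several images come from the same source document, it is probably safer to split by document rather than by image, so that near-duplicates do not end up on both sides of the split.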

shradiphylleia · 2023-07-07T16:02:33Z

This is my first contribution to an ML model project, so please review my approach before I proceed.
Approach for the training part:
1. Collect the data: I am searching for public, pre-existing labeled datasets. If the dataset is not diverse enough, then use data augmentation techniques.
2. Clean the data.
3. OCR: use an OCR technique to extract text from the images. The extracted information can then help us label which data should be considered sensitive.
4. Train the model: I will then use the collected data and the OCR-extracted data to train the ML model, using a CNN to recognize and classify sensitive information.
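
Step 3 could be bootstrapped with simple pattern matching on the OCR output. The sketch below assumes the OCR step yields a list of tokens, each a dictionary with `text` and `box` keys. That format is hypothetical and not tied to any particular OCR library. The regular expressions only cover email addresses and US-style social security numbers. Names and addresses generally need a learned model or manual labeling.

```python
import re
from typing import Dict, List

# Deliberately simple patterns; they will miss unusual formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")


def label_tokens(tokens: List[Dict]) -> List[Dict]:
    """Return candidate PII regions for tokens that match a known pattern.

    Each token is assumed to look like {"text": str, "box": (x0, y0, x1, y1)}.
    """
    labeled = []
    for token in tokens:
        text = token["text"]
        if EMAIL_RE.fullmatch(text):
            label = "EMAIL"
        elif SSN_RE.fullmatch(text):
            label = "SSN"
        else:
            continue
        labeled.append({"label": label, "box": token["box"]})
    return labeled
```

The resulting boxes are only candidate labels and would still need human review before being used as training data.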

akshit-g · 2023-07-07T16:09:39Z

Hey!
The approach checks out. However, you also need to check whether there is PII in the image itself, such as a group picture or a screenshot of a chat where my profile picture is visible.

Other than that, the approach looks good.

akshit-g added OSoC’23 Advanced labels Jul 3, 2023

akshit-g assigned shradiphylleia Jul 7, 2023

akshit-g added the backend label Jul 13, 2023

akshit-g changed the title ~~Develop ML model~~ Develop ML model to identify PII regions Jul 14, 2023

akshit-g changed the title ~~Develop ML model to identify PII regions~~ Develop Machine Learning Model for Personally Identifiable Information (PII) Detection Sep 29, 2023

akshit-g added hacktoberfest and removed OSoC’23 labels Sep 29, 2023
