We need to implement a machine learning model capable of identifying regions within documents or images containing Personally Identifiable Information (PII). PII, including names, addresses, social security numbers, and email addresses, must be accurately detected to enhance user privacy and data security.
Data Collection and Annotation:
Collect a diverse dataset containing examples of documents or images with annotated regions of PII.
Ensure accurate and consistent annotations, marking the exact boundaries of PII regions.
Model Selection:
Choose or design a suitable object detection model architecture (e.g., Faster R-CNN, SSD, YOLO) for accurate and efficient region detection.
Data Preprocessing:
Preprocess the dataset, including resizing, normalization, and data augmentation, to prepare it for model training.
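One detail worth keeping in mind for the preprocessing step: when an image is resized, its annotated PII boxes have to be rescaled by the same factors, or the labels stop lining up with the pixels. The following is a minimal sketch of that idea, not a settled design. The `Box` format, the function names, and the choice of scaling pixels to [0, 1] are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned PII region in pixel coordinates (hypothetical format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str


def resize_box(box, old_size, new_size):
    """Rescale a box when its image goes from old_size to new_size.

    Both sizes are (width, height) tuples in pixels.
    """
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    return Box(box.x_min * sx, box.y_min * sy,
               box.x_max * sx, box.y_max * sy, box.label)


def normalize_pixels(pixels):
    """Map 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]
```

Any geometric augmentation (crops, rotations, and so on) would need the same treatment: whatever transform is applied to the image has to be applied to the boxes as well.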
Model Training and Evaluation:
Split the annotated dataset into training and testing sets.
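As a rough illustration of the split and of one common way to score region predictions, here is a small standard-library sketch. The 80/20 ratio, the fixed seed, the `(x_min, y_min, x_max, y_max)` box layout, and the use of intersection over union (IoU) as the metric are assumptions; the issue does not prescribe any of them.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of samples and split it into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

If several images come from the same source document, it is probably safer to split by document rather than by image, so that near-identical pages do not end up on both sides of the split.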
This is my first contribution to an ML project, so please review my approach before I start.
Approach for the training part:
1. Collect the data: I am searching for public, pre-labeled datasets. If the data is not diverse enough, I will use data augmentation techniques.
2. Clean the data.
3. OCR: use OCR to extract text from the images. The extracted text can then help us label which data should be considered sensitive.
4. Train the model: I will use the collected data and the OCR-extracted text to train the ML model.
Use a CNN to recognize and classify sensitive information.
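To make step 3 more concrete, here is one possible sketch of how OCR output could be turned into candidate PII labels. It assumes the OCR step yields `(text, box)` pairs, which is a hypothetical format, and it uses two illustrative regular expressions (email addresses and US-style social security numbers). Names and addresses cannot be caught this way and would still need manual annotation or a separate model.

```python
import re

# Illustrative patterns only; real-world formats vary far more than this.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def label_ocr_tokens(tokens):
    """Return (label, box) for every OCR token that fully matches a pattern.

    tokens is assumed to be a list of (text, box) pairs from an OCR step,
    where box is whatever region format the OCR tool reports.
    """
    regions = []
    for text, box in tokens:
        for label, pattern in PII_PATTERNS.items():
            if pattern.fullmatch(text):
                regions.append((label, box))
                break  # one label per token is enough for this sketch
    return regions
```

Labels produced like this are only a starting point for annotation and should be reviewed by a person before they are used for training.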
Hey!
The approach checks out, although you also need to check whether the image itself contains PII, such as a group picture or a screenshot of a chat where my profile picture is visible.
akshit-g changed the title from "Develop ML model" to "Develop ML model to identify PII regions" on Jul 14, 2023
akshit-g changed the title from "Develop ML model to identify PII regions" to "Develop Machine Learning Model for Personally Identifiable Information (PII) Detection" on Sep 29, 2023