Data classification, Sensitive data identification? #392

Open
juju4 opened this issue Oct 7, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@juju4
Contributor

juju4 commented Oct 7, 2023

It would be good if pandora could

Note: this could be useful for both file and text input. For example, a user could use an internal pandora instance to validate a text before sending it to an external LLM as a prompt, or to an online tool (spell checker, translator, or similar).

@adulau adulau added the enhancement New feature or request label Oct 7, 2023
@Rafiot
Contributor

Rafiot commented Oct 25, 2023

Regarding data classification, can you explain more what you mean? It might be possible if the classification is in the metadata, but I'm not sure how to do that efficiently in any other situation.

I'll look at the tools you mentioned, especially the Yelp one as it is already a Python module. If you're already working on a module, please let me know so I don't reinvent the wheel.

Just a note regarding the LLM part, and sharing with third parties in general: I'd not trust anything automated to properly detect PII/secrets before sending them to a third-party black box, so this is never going to be supported officially by pandora. A human will always have to take responsibility for that kind of behavior.

@juju4
Contributor Author

juju4 commented Oct 28, 2023

I'm not looking to remove the human from the decision, just trying to help them make it.
The idea was that if you have an internal pandora instance where, in the best case, people get used to submitting their office files, then having in the same place a reminder that the file or content carries a classification banner or classification metadata, or has been identified as containing sensitive data, would be a nice helper.

The classification identification outside of metadata would just be a text pattern match with some example scales (BAIL/BAF from https://help.libreoffice.org/latest/en-US/text/shared/guide/classification.html and TLP from https://www.first.org/tlp/) that could be customized to match internal naming.
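
To make the pattern-match idea concrete, here is a minimal sketch, not an existing pandora module. The function name `find_classification_markers`, the `DEFAULT_MARKERS` table, and the `banner` entries are all hypothetical. The TLP labels follow the FIRST definitions (with `WHITE` kept for TLP 1.0 documents), and the banner wording is only a placeholder that would be replaced by internal naming.

```python
import re

# Hypothetical marker table: scale name -> regex. Intended to be customized
# to match internal naming; the "banner" entries are placeholder examples.
DEFAULT_MARKERS = {
    # AMBER+STRICT is listed before AMBER so the longer label wins.
    "TLP": r"\bTLP:\s?(RED|AMBER\+STRICT|AMBER|GREEN|CLEAR|WHITE)\b",
    "banner": r"\b(CONFIDENTIAL|INTERNAL USE ONLY)\b",
}


def find_classification_markers(text, markers=DEFAULT_MARKERS):
    """Return a list of (scale, matched_label) tuples found in text."""
    hits = []
    for scale, pattern in markers.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Upper-case the label so reports are consistent.
            hits.append((scale, match.group(0).upper()))
    return hits


if __name__ == "__main__":
    sample = "Quarterly report. TLP:AMBER+STRICT. For internal use only."
    for scale, label in find_classification_markers(sample):
        print(f"{scale}: {label}")
```

A match here would only be a hint shown to the person submitting the file; it says nothing about content that carries no marker at all.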

Not working on a module.


3 participants