Data classification, Sensitive data identification? #392

juju4 · 2023-10-07T19:31:26Z

It would be good if pandora could:

- extract a data classification if one is present in the document
- highlight sensitive data that matches known patterns: typically credentials, but also PII and PHI. A few tools that could be used:

Note: this could be useful for both file and text input. For example, a user could use the internal pandora instance to validate a text before sending it to an external LLM as a prompt, or to an online tool (spell checker, translator, or similar).
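To make the pattern-matching idea concrete, here is a minimal sketch using only Python's standard `re` module. The pattern set, the `SENSITIVE_PATTERNS` name, and the `find_sensitive` helper are all hypothetical illustrations, not part of pandora or of any tool mentioned in this thread, and a real rule set would need to be far more thorough.

```python
import re

# Hypothetical, illustrative patterns only; not a vetted rule set.
SENSITIVE_PATTERNS = {
    # Shape of an AWS access key ID: a known prefix plus 16 uppercase/digit chars.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # PEM header of a private key, with or without an algorithm name.
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    # Very loose email matcher, as a stand-in for a PII rule.
    "email_address": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def find_sensitive(text):
    """Return (pattern_name, start, end) tuples, ordered by position in text.

    Only offsets are returned, so the matched secret itself is not echoed back.
    """
    hits = []
    for name, pattern in SENSITIVE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "mail alice@example.com key AKIAIOSFODNN7EXAMPLE"
    for name, start, end in find_sensitive(sample):
        print(f"{name} at {start}-{end}")
```

Regexes like these produce both false positives and false negatives, so the output is best treated as a hint shown to the submitter rather than a verdict.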

Rafiot · 2023-10-25T09:33:02Z

Regarding data classification, can you explain more what you mean? It might be possible if the classification is in the metadata, but I'm not sure how to do that efficiently in any other situation.

I'll look at the tools you mentioned, especially the Yelp one as it is already a Python module. If you're already working on a module, please let me know so I don't reinvent the wheel.

Just a note regarding the LLM part, and sharing with 3rd parties in general: I wouldn't trust anything automated to properly detect PII/secrets before sending them to a 3rd-party black box, so this is never going to be supported officially by pandora. A human will always have to take responsibility for that kind of behavior.

juju4 · 2023-10-28T16:30:23Z

I'm not looking to remove the human from the decision, just trying to help them make it.
The idea is that if you have an internal pandora instance where, in the best case, people get used to submitting their office files, it would be a nice helper to have a reminder in the same place that the file/content has a classification banner or classification metadata, or has been identified as containing sensitive data.

Classification identification outside of metadata would just be a text pattern match against some example scales (BAILS/BAF from https://help.libreoffice.org/latest/en-US/text/shared/guide/classification.html and TLP from https://www.first.org/tlp/) that could be customized to match internal naming.
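As a rough illustration of that text pattern match, the sketch below looks for classification labels from configurable scales. The TLP labels follow the FIRST definitions (including the older `TLP:WHITE`); the `internal` scale, the `DEFAULT_SCALES` name, and the `find_classifications` helper are hypothetical and only show how custom naming could be plugged in.

```python
import re

# Scale name -> labels to look for. The "internal" entries are made-up examples
# of organisation-specific naming.
DEFAULT_SCALES = {
    "TLP": [
        "TLP:RED", "TLP:AMBER+STRICT", "TLP:AMBER",
        "TLP:GREEN", "TLP:CLEAR", "TLP:WHITE",
    ],
    "internal": ["CONFIDENTIAL", "INTERNAL USE ONLY", "PUBLIC"],
}


def find_classifications(text, scales=None):
    """Return (scale, label) pairs for every classification label found in text."""
    scales = scales or DEFAULT_SCALES
    found = []
    for scale, labels in scales.items():
        # Try longer labels first so "TLP:AMBER+STRICT" wins over "TLP:AMBER".
        ordered = sorted(labels, key=len, reverse=True)
        alternation = "|".join(re.escape(label) for label in ordered)
        # Lookarounds keep a label from matching inside a longer word.
        pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
        for match in pattern.finditer(text):
            found.append((scale, match.group(0).upper()))
    return found


if __name__ == "__main__":
    banner = "TLP:AMBER+STRICT report, Internal use only"
    print(find_classifications(banner))
```

Passing a different `scales` dictionary is enough to match internal naming. Generic words such as "PUBLIC" will match ordinary prose, so a real implementation would probably restrict the search to headers, footers, or the first lines of a document.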

Not working on a module.

adulau added the enhancement (New feature or request) label Oct 7, 2023
