
Enhance ChnPiiGenerator with Support for Additional Chinese PII Data Types #224

Open
MooooCat opened this issue Oct 17, 2024 · 0 comments
Labels
difficulty-medium enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed


@MooooCat
Contributor

🚅Search before asking

I have searched for issues similar to this one.

🚅Description

We would like to enhance the existing ChnPiiGenerator by adding support for more data types. Currently, the generator handles a small set of Chinese personally identifiable information (PII) types: Chinese names, mainland mobile phone numbers, mainland ID numbers, and company names. Supporting additional types would let it cover more of the PII commonly found in Chinese datasets.

PR #191 introduced the ChnPiiGenerator class for handling Chinese PII data, including its fitting, conversion, and reverse-conversion steps.

🏕Solution(optional)

We suggest adding support for the following data types to ChnPiiGenerator:

  1. Chinese addresses
  2. Unified Social Credit Codes
  3. Other common Chinese PII data types
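As a reference point for the Unified Social Credit Code item, here is a sketch that generates and validates codes using the check-character rule from the national standard GB 32100-2015: a code has 18 characters, and the last one is a mod-31 check character computed over the first 17. This is not part of ChnPiiGenerator; the helper names (`uscc_check_char`, `is_valid_uscc`, `fake_uscc`) and the sample division codes are hypothetical. For brevity, the embedded 9-character organization code is random and does not carry its own check digit, so generated codes pass only the outer USCC check.

```python
import random
from typing import Optional

# Character set defined by GB 32100-2015: digits plus 21 uppercase letters
# (I, O, S, V and Z are excluded). A character's value is its index here.
USCC_CHARSET = "0123456789ABCDEFGHJKLMNPQRTUWXY"

# Weight for 0-based position i is 3**i mod 31, for the first 17 positions.
USCC_WEIGHTS = [pow(3, i, 31) for i in range(17)]

# Hypothetical sample of 6-digit administrative division codes; a real
# generator would draw from the full division-code table.
SAMPLE_DIVISION_CODES = ["110000", "310000", "440300", "350100"]


def uscc_check_char(body: str) -> str:
    """Return the 18th (check) character for a 17-character USCC body."""
    if len(body) != 17:
        raise ValueError("USCC body must be exactly 17 characters")
    total = sum(USCC_CHARSET.index(c) * w for c, w in zip(body, USCC_WEIGHTS))
    return USCC_CHARSET[(31 - total % 31) % 31]


def is_valid_uscc(code: str) -> bool:
    """Check the length, character set and check character of a USCC."""
    if len(code) != 18 or any(c not in USCC_CHARSET for c in code):
        return False
    return uscc_check_char(code[:17]) == code[17]


def fake_uscc(rng: Optional[random.Random] = None) -> str:
    """Generate a synthetic USCC whose 18th character is a valid check char."""
    rng = rng or random.Random()
    registration_dept = "9"  # '9' = industry and commerce registration
    org_category = "1"       # '1' = enterprise under that department
    division = rng.choice(SAMPLE_DIVISION_CODES)
    # Simplification: random organization code without its own check digit.
    org_code = "".join(rng.choice(USCC_CHARSET) for _ in range(9))
    body = registration_dept + org_category + division + org_code
    return body + uscc_check_char(body)


if __name__ == "__main__":
    print(is_valid_uscc("91350100M000100Y43"))  # True
    print(is_valid_uscc(fake_uscc(random.Random(0))))  # True by construction
```

Passing a seeded `random.Random` keeps the output reproducible, which may be useful if the generator's results need to be deterministic in tests.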

🍰Detail(optional)

The fit method of ChnPiiGenerator can be extended to identify and process the new data types. The following snippet from the current fit method shows how existing data types are handled:

    def fit(self, metadata: Metadata | None = None, **kwargs: dict[str, Any]):
        # Note: metadata must be provided; a None value would fail below.
        # Record which columns hold each supported PII type, based on the
        # data type that metadata assigns to each column.
        for each_col in metadata.column_list:
            data_type = metadata.get_column_data_type(each_col)
            if data_type == "chinese_name":
                self.chn_name_columns_list.append(each_col)
                continue
            if data_type == "china_mainland_mobile_phone":
                self.chn_phone_columns_list.append(each_col)
                continue
            if data_type == "china_mainland_id":
                self.chn_id_columns_list.append(each_col)
                continue
            if data_type == "chinese_company_name":
                self.chn_company_name_list.append(each_col)
        # Columns of any other data type are left untouched.
        self.fitted = True
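One possible way to extend fit is to replace the chain of `if` statements with a mapping from data type to column list, so each new type needs only one new entry. The sketch below is self-contained and hypothetical: `FakeMetadata` is a minimal stand-in for the project's `Metadata`, exposing only `column_list` and `get_column_data_type` as used above. The data-type strings `chinese_address` and `unified_social_credit_code` and the attributes `chn_address_columns_list` and `chn_uscc_columns_list` are assumed names, not existing project identifiers. Unlike the current snippet, it also raises an error when `metadata` is `None`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for Metadata, exposing only what fit() uses."""
    column_types: Dict[str, str]

    @property
    def column_list(self) -> List[str]:
        return list(self.column_types)

    def get_column_data_type(self, col: str) -> str:
        return self.column_types[col]


class ChnPiiGeneratorSketch:
    # Maps a metadata data type to the attribute collecting its columns.
    # The last two entries are hypothetical names for the proposed types.
    TYPE_TO_ATTR = {
        "chinese_name": "chn_name_columns_list",
        "china_mainland_mobile_phone": "chn_phone_columns_list",
        "china_mainland_id": "chn_id_columns_list",
        "chinese_company_name": "chn_company_name_list",
        "chinese_address": "chn_address_columns_list",
        "unified_social_credit_code": "chn_uscc_columns_list",
    }

    def __init__(self) -> None:
        # One empty column list per supported data type.
        for attr in self.TYPE_TO_ATTR.values():
            setattr(self, attr, [])
        self.fitted = False

    def fit(self, metadata: Optional[FakeMetadata] = None, **kwargs: Any) -> None:
        if metadata is None:
            raise ValueError("fit() needs metadata to locate PII columns")
        for each_col in metadata.column_list:
            attr = self.TYPE_TO_ATTR.get(metadata.get_column_data_type(each_col))
            if attr is not None:  # other data types are ignored
                getattr(self, attr).append(each_col)
        self.fitted = True


if __name__ == "__main__":
    meta = FakeMetadata({"name": "chinese_name", "addr": "chinese_address", "age": "int"})
    gen = ChnPiiGeneratorSketch()
    gen.fit(meta)
    print(gen.chn_address_columns_list)  # ['addr']
```

With this layout, columns of unrecognized types (such as `age` above) are skipped, mirroring the current behavior, and adding a type touches only the mapping.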