Enhance ChnPiiGenerator with Support for Additional Chinese PII Data Types #224

MooooCat · 2024-10-17T12:39:58Z

🚅Search before asking

I have searched for issues similar to this one.

🚅Description

We would like to enhance the existing ChnPiiGenerator by adding support for more data types. Currently, the generator handles a small set of Chinese personally identifiable information (PII) types: names, mainland mobile phone numbers, mainland ID numbers, and company names. Supporting additional types would make it applicable to more datasets.

The generator was introduced in PR #191, which added the ChnPiiGenerator class for handling Chinese PII data, including the fit, convert, and reverse-convert steps.

🏕Solution(optional)

We suggest adding support for the following data types to ChnPiiGenerator:

Chinese addresses
Unified Social Credit Codes
Other common Chinese PII data types
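
For the Unified Social Credit Code in particular, a generator would most likely need to emit values with a valid check character. Here is a minimal sketch of that step, assuming the 31-symbol alphabet and the 3^i mod 31 weights of GB 32100-2015 as I understand them. The function names are hypothetical and not part of the project, and the random body only satisfies the checksum; it does not model the real internal fields (registration department, region code, organization code).

```python
import random

# Assumed from GB 32100-2015: digits plus capital letters without I, O, S, V, Z.
USCC_ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
# Position weights for the first 17 characters: 3**i mod 31.
USCC_WEIGHTS = [pow(3, i, 31) for i in range(17)]


def uscc_check_char(body: str) -> str:
    """Return the check character for a 17-character code body."""
    if len(body) != 17:
        raise ValueError("body must be exactly 17 characters")
    total = sum(USCC_ALPHABET.index(ch) * w for ch, w in zip(body, USCC_WEIGHTS))
    # A remainder of 0 maps to check value 0 rather than 31.
    return USCC_ALPHABET[(31 - total % 31) % 31]


def is_valid_uscc(code: str) -> bool:
    """Check length, alphabet, and check character of an 18-character code."""
    if len(code) != 18 or any(ch not in USCC_ALPHABET for ch in code):
        return False
    return uscc_check_char(code[:17]) == code[17]


def random_uscc(rng: random.Random) -> str:
    """Build a synthetic code whose only guarantee is a matching check character."""
    body = "".join(rng.choice(USCC_ALPHABET) for _ in range(17))
    return body + uscc_check_char(body)
```

A real implementation would probably want to sample the region and organization fields from realistic distributions instead of uniformly at random.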

🍰Detail(optional)

The fit method of ChnPiiGenerator can be extended to identify and process the new data types. The following snippet from fit shows how the existing data types are handled:

    def fit(self, metadata: Metadata | None = None, **kwargs: dict[str, Any]):

        for each_col in metadata.column_list:
            data_type = metadata.get_column_data_type(each_col)
            if data_type == "chinese_name":
                self.chn_name_columns_list.append(each_col)
                continue
            if data_type == "china_mainland_mobile_phone":
                self.chn_phone_columns_list.append(each_col)
                continue
            if data_type == "china_mainland_id":
                self.chn_id_columns_list.append(each_col)
                continue
            if data_type == "chinese_company_name":
                self.chn_company_name_list.append(each_col)

        self.fitted = True
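
One possible way to make this extensible is to replace the if-chain with a lookup table, so that a new data type becomes a single new entry. This is only a sketch: the last two data type names and their list attributes are hypothetical placeholders for the proposed additions, and FakeMetadata is a stand-in that mimics just the two Metadata members used in the snippet above.

```python
# Maps a metadata data type to the attribute that collects matching columns.
# The first four entries mirror the existing fit method; the last two are
# hypothetical names for the proposed new types.
DATA_TYPE_TO_ATTR = {
    "chinese_name": "chn_name_columns_list",
    "china_mainland_mobile_phone": "chn_phone_columns_list",
    "china_mainland_id": "chn_id_columns_list",
    "chinese_company_name": "chn_company_name_list",
    "chinese_address": "chn_address_columns_list",  # hypothetical
    "unified_social_credit_code": "chn_uscc_columns_list",  # hypothetical
}


class FakeMetadata:
    """Stand-in exposing only column_list and get_column_data_type."""

    def __init__(self, column_types: dict[str, str]):
        self._column_types = dict(column_types)

    @property
    def column_list(self) -> list[str]:
        return list(self._column_types)

    def get_column_data_type(self, column: str) -> str:
        return self._column_types[column]


class ChnPiiGeneratorSketch:
    """Illustrative only; not the project's ChnPiiGenerator."""

    def __init__(self):
        self.fitted = False
        for attr in DATA_TYPE_TO_ATTR.values():
            setattr(self, attr, [])

    def fit(self, metadata):
        for each_col in metadata.column_list:
            attr = DATA_TYPE_TO_ATTR.get(metadata.get_column_data_type(each_col))
            if attr is not None:  # columns with other data types are skipped
                getattr(self, attr).append(each_col)
        self.fitted = True
```

With this layout, the convert and reverse-convert steps would still need a per-type handler, but fit itself no longer changes when a type is added.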

MooooCat added the good first issue, difficulty-medium, enhancement, and help wanted labels on Oct 17, 2024

Wh1isper mentioned this issue Nov 22, 2024

Performance: reduce for cycles when handling dataframe #245

Open
