[BUG] concatenate_columns Ignoring null values is not available #1164

Open
Fu-Jie opened this issue Sep 8, 2022 · 3 comments · May be fixed by #1166

Comments

@Fu-Jie

Fu-Jie commented Sep 8, 2022

Source Code

```python
from typing import Hashable, List

import pandas as pd
import pandas_flavor as pf

from janitor.errors import JanitorError
from janitor.utils import deprecated_alias


@pf.register_dataframe_method
@deprecated_alias(columns="column_names")
def concatenate_columns(
    df: pd.DataFrame,
    column_names: List[Hashable],
    new_column_name: Hashable,
    sep: str = "-",
    ignore_empty: bool = True,
) -> pd.DataFrame:

    if len(column_names) < 2:
        raise JanitorError("At least two columns must be specified")

    # Current code (needs modifying): astype(str) turns np.nan into the
    # literal string 'nan', so fillna("") has no nulls left to fill.
    # df[new_column_name] = (
    #     df[column_names].astype(str).fillna("").agg(sep.join, axis=1)
    # )

    # Proposed change: the nullable "string" dtype keeps nulls as <NA>,
    # so fillna("") turns them into empty strings.
    df[new_column_name] = (
        df[column_names].astype("string").fillna("").agg(sep.join, axis=1)
    )

    if ignore_empty:

        def remove_empty_string(x):
            """Remove empty strings (former nulls) from the concatenated output."""
            return sep.join(part for part in x.split(sep) if part)

        df[new_column_name] = df[new_column_name].transform(
            remove_empty_string
        )

    return df
```

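With the current code, `astype(str)` turns `np.nan` into the string `'nan'`, so `fillna("")` has nothing to fill and `'nan'` ends up in the concatenated column.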
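A minimal reproduction, as a sketch. It assumes pandas 1.x behaviour (current when this issue was filed), where `astype(str)` on an object column calls `str()` on every element; the code mimics that with `.map(str)` so the result does not depend on the installed pandas version, since newer releases may handle `astype(str)` differently. The sample DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["x", np.nan], "b": ["y", "z"]}, dtype=object)

# Mimic pandas 1.x astype(str): call str() on every element.
# str(np.nan) is the literal text 'nan', not a missing value.
as_str = df.apply(lambda col: col.map(str))

# No nulls remain, so fillna("") is a no-op ...
print(as_str.isna().any().any())  # False

# ... and 'nan' leaks into the concatenated column.
joined = as_str.fillna("").agg("-".join, axis=1)
print(joined.tolist())  # ['x-y', 'nan-z']
```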


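If I change it to `astype('string')`, there is no problem: nulls stay missing until `fillna("")` turns them into empty strings, which `ignore_empty` then drops.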
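A standalone sketch of the proposed `astype('string')` approach. `concat_ignoring_nulls` is a hypothetical helper written for illustration, not part of pyjanitor; it reproduces the fixed assignment plus the `remove_empty_string` step on a made-up DataFrame.

```python
import numpy as np
import pandas as pd


def concat_ignoring_nulls(df, column_names, sep="-"):
    """Hypothetical stand-in for the proposed fix (not pyjanitor's API)."""
    # The nullable "string" dtype stores NaN and None as <NA> instead of
    # text, so fillna("") can turn them into empty strings.
    joined = df[column_names].astype("string").fillna("").agg(sep.join, axis=1)
    # Same idea as remove_empty_string: drop the empty pieces.
    return joined.map(lambda s: sep.join(p for p in s.split(sep) if p))


df = pd.DataFrame(
    {"a": ["x", np.nan, None], "b": ["y", "z", "w"]}, dtype=object
)
result = concat_ignoring_nulls(df, ["a", "b"])
print(result.tolist())  # ['x-y', 'z', 'w']
```

Because the nulls become `""` before joining, they show up as empty pieces between separators (for example `'-z'`), which the split-and-rejoin step removes.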


@samukweku
Collaborator

samukweku commented Sep 8, 2022

Thanks @Fu-Jie, do you mind raising a PR for this? It might be better to add an `na` argument to decide what happens if nulls are present.
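One way an `na` argument like the one suggested here might behave, as a hypothetical sketch: `concatenate_with_na` and its `na` parameter are illustrative names, not pyjanitor's API. In this sketch `na=None` skips null cells and a string value is substituted for them. Checking `pd.isna` per cell before converting to text avoids both the `'nan'` problem and the split-and-rejoin step.

```python
import numpy as np
import pandas as pd


def concatenate_with_na(df, column_names, new_column_name, sep="-", na=None):
    """Hypothetical sketch of an `na` argument (not pyjanitor's API).

    na=None  -> skip null cells when joining
    na="..." -> replace null cells with that string before joining
    """

    def join_row(row):
        parts = []
        for value in row:
            if pd.isna(value):
                if na is None:
                    continue  # ignore the null entirely
                value = na  # substitute the caller's placeholder
            parts.append(str(value))
        return sep.join(parts)

    out = df.copy()
    out[new_column_name] = df[column_names].apply(join_row, axis=1)
    return out


df = pd.DataFrame({"a": ["x", np.nan], "b": [1, 2]})
skipped = concatenate_with_na(df, ["a", "b"], "ab")["ab"].tolist()
filled = concatenate_with_na(df, ["a", "b"], "ab", na="?")["ab"].tolist()
print(skipped)  # ['x-1', '2']
print(filled)   # ['x-1', '?-2']
```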

@Fu-Jie
Author

Fu-Jie commented Sep 8, 2022

> Thanks @Fu-Jie, do you mind raising a PR for this? It might be better to add an `na` argument to decide what happens if nulls are present.

I haven't opened a PR before, but I can try. This is my solution; what do you think?

```python
# Strings that astype(str) produces for NaT, NaN and pd.NA
na = ["NaT", "nan", "<NA>"]
df[new_column_name] = (
    df[column_names].astype(str).replace(na, "").agg(sep.join, axis=1)
)
```
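A sketch of a caveat with the sentinel-string approach, assuming pandas 1.x `astype(str)` semantics (mimicked with `.map(str)` so the result does not depend on the pandas version). The sample data is invented: it shows that the three strings in `na` are exactly what `str()` produces for `np.nan`, `pd.NaT` and `pd.NA`, but that `None` becomes `'None'`, and a genuine text value `'nan'` is blanked along with the real null.

```python
import numpy as np
import pandas as pd

# The sentinel list from the comment above
na = ["NaT", "nan", "<NA>"]

# str() renders each pandas/NumPy null marker as one of those strings ...
markers = [str(v) for v in (np.nan, pd.NaT, pd.NA)]
print(markers)  # ['nan', 'NaT', '<NA>']
# ... but None becomes 'None', which the list does not cover.
print(repr(str(None)))  # 'None'

# After stringifying, a genuine text value 'nan' looks exactly like a real
# null, so replace(na, "") blanks both.
df = pd.DataFrame(
    {"a": ["nan", np.nan, None], "b": ["y", "z", "w"]}, dtype=object
)
as_str = df.apply(lambda col: col.map(str))  # mimics pandas 1.x astype(str)
result = as_str.replace(na, "").agg("-".join, axis=1).tolist()
print(result)  # ['-y', '-z', 'None-w']
```

Keeping nulls as real missing values until `fillna` (as the `astype('string')` change does) avoids having to guess which strings stand for nulls.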


@samukweku
Collaborator

Give it a go, @Fu-Jie; the @pyjanitor-devs/core-devs team is welcoming and will gladly guide you through the process. Have a look at the guide on how to submit a PR.

Fu-Jie linked a pull request on Sep 8, 2022 that will close this issue
Zeroto521 changed the title from "concatenate_columns Ignoring null values is not available" to "[BUG] concatenate_columns Ignoring null values is not available" on Sep 11, 2022