BoostDM | encode_consequence_type(data) returns inconsistent column types when a consequence type is missing #1

Open
FedericaBrando opened this issue Nov 30, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@FedericaBrando
Member

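```python
import pandas as pd

# csqn_type_list is a module-level list of consequence types defined elsewhere in BoostDM (not shown here).
def encode_consequence_type(data):
    # Map any consequence type not in csqn_type_list to 'none' (modifies `data` in place).
    data.loc[~data['csqn_type'].isin(csqn_type_list), 'csqn_type'] = 'none'
    # One-hot encode csqn_type; with pandas >= 2.0 the dummy columns are bool by default.
    one_hot = pd.get_dummies(data, columns=['csqn_type'], prefix_sep='_')
    one_hot.drop(columns=['csqn_type_none'], inplace=True, errors='ignore')
    # Add a column for every consequence type absent from `data`, filled with integer 0.
    for c in csqn_type_list:
        col = f'csqn_type_{c}'
        if col not in list(one_hot.columns):
            one_hot[col] = 0
    return one_hot
```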

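By default, pd.get_dummies() returns boolean (True/False) dummy columns (this is the default since pandas 2.0; earlier versions returned uint8 0/1). However, if a consequence type does not occur in the input, the for-loop adds its column filled with the integer 0, so the output mixes boolean and integer columns.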
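A minimal reproduction of the mixed dtypes, assuming pandas >= 2.0. The values in `csqn_type_list` and the toy input are hypothetical (BoostDM defines its own list), and the function body follows the snippet in this issue except that it works on a copy of the input instead of modifying it in place.

```python
import pandas as pd

# Hypothetical consequence types; BoostDM defines its own csqn_type_list.
csqn_type_list = ["missense", "nonsense", "synonymous", "splicing"]


def encode_consequence_type(data):
    # Same logic as the function in this issue, applied to a copy of the input.
    data = data.copy()
    data.loc[~data["csqn_type"].isin(csqn_type_list), "csqn_type"] = "none"
    one_hot = pd.get_dummies(data, columns=["csqn_type"], prefix_sep="_")
    one_hot.drop(columns=["csqn_type_none"], inplace=True, errors="ignore")
    for c in csqn_type_list:
        col = f"csqn_type_{c}"
        if col not in list(one_hot.columns):
            one_hot[col] = 0  # integer fill, unlike the bool dummies above
    return one_hot


# Toy input with no splicing variants (hypothetical data).
df = pd.DataFrame({
    "pos": [1, 2, 3, 4],
    "csqn_type": ["nonsense", "nonsense", "missense", "synonymous"],
})
encoded = encode_consequence_type(df)

# The get_dummies columns are bool, while csqn_type_splicing is an integer column.
print(encoded.dtypes)
```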

For example, encoding SOCS1.DLBCLNOS.annotated.tsv.gz, which has no splicing consequences, gives:

csqn_type_missense	csqn_type_nonsense	csqn_type_synonymous	csqn_type_splicing
False			True			False			0
False			True			False			0
False			True			False			0
False			True			False			0
False			True			False			0
False			True			False			0
False			True			False			0
False			True			False			0
True			False			False			0

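This inconsistency can cause problems downstream. In my case, I filtered rows with a condition like df[df[csqn_col]], expecting a boolean column, and it failed on the 0-filled column: pandas treats an integer Series as a list of column labels rather than as a boolean mask.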
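A small sketch of the failure, using a hypothetical two-column frame that mimics the encoded output (one bool dummy from pd.get_dummies() and one column the loop filled with integer 0):

```python
import pandas as pd

# Hypothetical frame mimicking the encoded output.
df = pd.DataFrame({
    "csqn_type_nonsense": [True, True, False],  # bool, from pd.get_dummies()
    "csqn_type_splicing": [0, 0, 0],            # int, added by the for-loop
})

# A bool column works as a row mask.
print(df[df["csqn_type_nonsense"]])

# An int column is not a mask: pandas looks up columns labelled 0, 0, 0,
# which do not exist, so the lookup raises KeyError.
try:
    df[df["csqn_type_splicing"]]
    int_mask_failed = False
except KeyError as err:
    int_mask_failed = True
    print("KeyError:", err)

# Casting to bool makes the same filter work for either column type.
print(df[df["csqn_type_splicing"].astype(bool)])
```

Casting with .astype(bool) is a caller-side workaround; it avoids the error but does not fix the inconsistent output of encode_consequence_type.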

Possible solutions:

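  1. Make all columns 0/1
    • add dtype=float to the pd.get_dummies() call (and fill the missing columns with 0.0 so they share the same dtype)
  2. Make all columns boolean
    • fill the missing consequence columns with False instead of 0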
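Both options can be sketched in one function, assuming pandas >= 0.23 (for the dtype argument of pd.get_dummies()). The dtype parameter of encode_consequence_type and the values in csqn_type_list are hypothetical illustrations, not BoostDM's API: dtype=bool corresponds to option 2, and dtype=int or dtype=float to option 1. Missing columns are created with the same dtype, so every csqn_type_* column matches. Unlike the original, this version works on a copy of the input.

```python
import pandas as pd

# Hypothetical consequence types; BoostDM defines its own csqn_type_list.
csqn_type_list = ["missense", "nonsense", "synonymous", "splicing"]


def encode_consequence_type(data, dtype=bool):
    """One-hot encode 'csqn_type' so every csqn_type_* column shares `dtype`."""
    data = data.copy()  # avoid mutating the caller's DataFrame
    data.loc[~data["csqn_type"].isin(csqn_type_list), "csqn_type"] = "none"
    one_hot = pd.get_dummies(data, columns=["csqn_type"], prefix_sep="_", dtype=dtype)
    one_hot = one_hot.drop(columns=["csqn_type_none"], errors="ignore")
    for c in csqn_type_list:
        col = f"csqn_type_{c}"
        if col not in one_hot.columns:
            # Absent consequence type: fill with 0 cast to `dtype` (False for bool).
            one_hot[col] = pd.Series(0, index=one_hot.index).astype(dtype)
    return one_hot


# Toy input (hypothetical) with no splicing variants.
df = pd.DataFrame({"csqn_type": ["nonsense", "nonsense", "missense", "synonymous"]})
encoded = encode_consequence_type(df)
print(encoded.dtypes)  # every csqn_type_* column is bool
print(encoded[encoded["csqn_type_nonsense"]])  # boolean filtering works
```

Keeping bool as the default matches what pd.get_dummies() already returns in pandas >= 2.0, so existing code that expects True/False columns keeps working.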
FedericaBrando added the bug label on Nov 30, 2023