Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hail backend - SV WES #3575

Merged
merged 20 commits into from
Sep 8, 2023
Merged

Hail backend - SV WES #3575

merged 20 commits into from
Sep 8, 2023

Conversation

hanars
Copy link
Collaborator

@hanars hanars commented Aug 23, 2023

Getting this up early for review, still needs unit tests

Copy link
Contributor

@ShifaSZ ShifaSZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly, it looks good to me.

Comment on lines -1161 to +1157
QUERY_CLASS_MAP = {cls.DATA_TYPE: cls for cls in [VariantHailTableQuery, SvHailTableQuery]}
class GcnvHailTableQuery(SvHailTableQuery):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the file is getting bigger, how about splitting them into per-class files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, Ben suggested the same thing on the other PR. I agree, but to keep the diff sane I will do a follow up PR to move everything into its own file

@hanars hanars changed the base branch from hail-backend-sv-wgs to dev September 5, 2023 18:12
@hanars hanars requested review from bpblanken and removed request for bpblanken September 5, 2023 20:00

def get_allowed_sv_type_ids(self, sv_types):
return super().get_allowed_sv_type_ids([
type.replace('gCNV_', '') for type in sv_types if type.startswith('gCNV_')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case for doing this replacement in the pipeline? So that instead of passing the enum ids for gCNV_DEL and gCNV_DUP we instead pass the ids for DEL and DUP? Or are gCNV_DEL and gCNV_DUP used elsewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I had thought we already did, because gCNV_DUP and gCNV_DEL are in the enum, but then after I saw that they weren't I though this approach was easier. We need to return them to the client without the prefix so it would be doing an additional annotation to format them and remove the prefix after the query, and doing a one time string replace in python before doing the query seemed more efficient than having to modify every row in the table

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree the string replace in python is better. I guess, if we'd like to remove gCNV_DUP and gCNV_DEL from the enum it's no problem on my end.

@@ -8,6 +8,8 @@
def _hl_json_default(o):
if isinstance(o, hl.Struct) or isinstance(o, hl.utils.frozendict):
return dict(o)
elif isinstance(o, set):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be hl.set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it was an issue with a regular python set, sets are not actually json serializable and this solved the particular issue. But I can also add a case for a hail set

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, since we are only json serializing collected results we do not run into this case, as hail will have already converted all hail setexpressions to python sets. The only cases where we end up with hail-specific data types are those that hail provides which can exist as raw dat atypes and not expressions, which is just Interval, Struct, and frozendict - https://hail.is/docs/0.2/utils/index.html#utils

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, I think that's right.

@hanars hanars merged commit e944207 into dev Sep 8, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants