Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement node assembly and standardize identifiers #24

Merged
merged 34 commits into from
May 17, 2021
Merged

Implement node assembly and standardize identifiers #24

merged 34 commits into from
May 17, 2021

Conversation

bgyori
Copy link
Member

@bgyori bgyori commented May 13, 2021

This PR makes the following changes:

@bgyori
Copy link
Member Author

bgyori commented May 13, 2021

I will test this now in practice to try to make a new build and will push fixes here. Until then, feel free to review/comment @cthoyt .


if __name__ == "__main__":
ReactomeProcessor.cli()
WikipathwaysProcessor.cli()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way cli is implemented somehow the second call here to WikipathwaysProcessor is never reached. @cthoyt if you know why, please push a fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I don't know why this isn't working

@bgyori
Copy link
Member Author

bgyori commented May 14, 2021

This is basically working now. One thing I didn't figure out how to refactor best is the integration of node assembly with cached node generation from individual processors. Currently, indra_cogex.representation.Node objects use INDRA-standardized namespace and ID pairs whereas the dumped tables use a single CURIE string with a different namespace and ID standard. Assembly works directly on Node objects which means that assembly can only happen if nodes are collected from processors. In principle we could re-read the dumped node tables into Node objects, then assemble them, and dump them again into an assembled table but that seems a bit "dirty" to me (instead of just working with the original Node objects that processors generate).

@cthoyt
Copy link
Member

cthoyt commented May 14, 2021

Overall looks good. I think we should have started by keeping the prefix/id separate for each node. I made a comment on the code, but the operation of validating prefix/id pairs and canonicalizing them should be operations implemented in INDRA, right?

Also, now we have the nice CI tests for code docs, we have to do all of them ;)

@bgyori
Copy link
Member Author

bgyori commented May 16, 2021

The tests are erroring on GHA with ModuleNotFoundError: No module named 'indra_cogex', with this setup I'm not sure what to change for the package to be visible.

@bgyori
Copy link
Member Author

bgyori commented May 16, 2021

I'm honestly not sure this much strict linting and mypy is productive, we need to be able to move on and focus on functionality.

@bgyori
Copy link
Member Author

bgyori commented May 16, 2021

The tests are erroring on GHA with ModuleNotFoundError: No module named 'indra_cogex', with this setup I'm not sure what to change for the package to be visible.

@cthoyt I'm not sure why this happens, I haven't used tox with pytest in this way before so am a bit lost. What would we need to change to avoid the "ModuleNotFoundError: No module named 'indra_cogex'" errors on GHA? See e.g., https://github.com/bgyori/indra_cogex/runs/2593007487?check_suite_focus=true.

cthoyt and others added 5 commits May 16, 2021 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants