RWR and TieDIE Integration #92
…tart repo; with the modified output file, implemented the parsing function; met one error about Snakemake checking pathway.txt; need to write the tests after
…ate the README; more things to consider: how to select edges for pathway file.
… fixing bugs about running pagerank in TieDIE)
…an utility function in src/utils.py (add rank column to a dataframe); updating README for RWR and TieDIE
I didn't review everything and am only leaving a few comments about things I noticed during an initial quick pass.
.github/workflows/test-spras.yml
Outdated
docker pull erikliu24/rwwr:latest
docker pull erikliu24/tiedie:latest
Adding a reminder to change these to reedcompbio after @annaritz pushes the containers to the organization account. Same goes for the steps below.
Yes, I will make changes to this file when Anna pushes the image to reedcompbio. The same changes will also be made in src/random_walk.py and src/tiedie.py, as the run functions in these two files are also pulling images from my personal account.
I built and pushed both of these to DockerHub so you can make the test workflow changes now:
src/rwr.py
Outdated
Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type
@return:
empty return comment
I left this empty because all the other algorithms implemented in SPRAS also have an empty return comment. I would appreciate your recommendation on what should be included here. If you have any suggestions, I can update the return comment for this algorithm as well as for the other algorithms in SPRAS.
Because generate_inputs doesn't return anything, I suggest deleting the @return: line from the docstring. We'll have to formalize this once we finally start automatically generating documentation from docstrings with something like Sphinx.
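As a rough illustration of that suggestion, the docstring could look like the sketch below. Only the docstring text comes from the diff above; the function body and the input type names in it are hypothetical placeholders, not the actual SPRAS implementation.

```python
def generate_inputs(data, filename_map):
    """
    Access fields from the dataset and write the required input files
    @param data: dataset
    @param filename_map: a dict mapping file types in the required_inputs to the filename for that type
    """
    # Hypothetical body for illustration: it only validates the keys and writes nothing.
    required_inputs = ['network', 'nodes']  # assumed input types, not taken from the PR
    for input_type in required_inputs:
        if input_type not in filename_map:
            raise ValueError(f'{input_type} filename is missing')
    # The function ends without a return statement, so the docstring has no @return: line.
```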
src/rwr.py
Outdated
@param raw_pathway_file: pathway file produced by an algorithm's run function
@param standardized_pathway_file: the same pathway written in the universal format
"""
print('Parsing random-walk-with-restart output')
is this needed?
I had initially considered printing this line to the console to provide users with additional information about the progress of the program. However, I can remove this line if it would be more in line with the unified coding style of SPRAS.
We can remove this parsing output message. We do some logging when it is helpful for debugging, like the Python commands that are going to be run inside the Docker containers. Otherwise for general information about workflow progress we rely on the Snakemake output.
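One way to picture the distinction drawn here is the sketch below, which logs the command that would be run in a container at debug level instead of printing progress messages. It uses Python's standard logging module; the logger name, the build_command helper, the script name, and the flags are all hypothetical and are not taken from SPRAS.

```python
import logging

# Hypothetical logger name, chosen for this sketch only
logger = logging.getLogger('spras_sketch')


def build_command(network_file, output_file):
    """Assemble a hypothetical command list for an algorithm run inside a container."""
    # The script name and flags are placeholders, not a real algorithm's CLI
    command = ['python', 'algorithm.py', '--network', network_file, '--output', output_file]
    # Debug-level logging of the exact command helps when a container run fails
    logger.debug('Running command: %s', ' '.join(command))
    return command
```

With this arrangement the parsing step stays silent and general progress reporting is left to the Snakemake output.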
I will do a second pass, but this is what I caught quickly on my first pass.
…ytest -k; Tab issue of RWR); updated parameters for RWR for newer version of RWR
I built and pushed the Docker images so that you can keep making progress on this. I still haven't looked at the core Python code.
… Images and made suitable changes based on this)
… RWR_and_TieDIE
…R, and updated the config.yaml to add more tests
We're reviewing this now, specifically the RWR code. It currently outputs a single file with both edge and node information (for example, the node visitation probabilities). This reminded us of #88, or is there a way to keep intermediate files besides a |
If I understand correctly, it is awkward to stick the node information in Because writing this additional output file is atypical SPRAS behavior, I suggest clearly documenting it in the |
For the testing of TieDIE and RWR, can we add a test to check for bidirectional edges in the input pathway/edges?
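A check along these lines might look like the following sketch. It assumes the edges are loaded into a pandas dataframe with columns named Node1 and Node2; those column names and the find_bidirectional_edges helper are assumptions made for illustration and do not come from the pull request.

```python
import pandas as pd


def find_bidirectional_edges(edges: pd.DataFrame) -> pd.DataFrame:
    """Return the rows (u, v) whose reverse edge (v, u) also appears in the dataframe."""
    # Column names Node1 and Node2 are assumed for this sketch
    pairs = set(zip(edges['Node1'], edges['Node2']))
    # Self-loops are skipped because their reverse is the same edge
    mask = [(v, u) in pairs and u != v for u, v in zip(edges['Node1'], edges['Node2'])]
    return edges[pd.Series(mask, index=edges.index, dtype=bool)]


def test_no_bidirectional_edges():
    # A small directed cycle contains no edge listed in both directions
    edges = pd.DataFrame({'Node1': ['A', 'B', 'C'], 'Node2': ['B', 'C', 'A']})
    assert find_bidirectional_edges(edges).empty
```

A pytest-style test could call the helper on the input edge file of each algorithm and assert that the result is empty, or assert the expected pairs if bidirectional edges are allowed.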
Add a column of 1s to the dataframe
@param df: the dataframe to add the rank column of 1s to
"""
df['rank'] = 1
If we are going to show the headers in the future (because we have a bunch of files that look the same but have different meanings), note that most of the other algorithms use 'Rank' as the name of the added rank column. It's a minor thing, but I think this can be changed to also use 'Rank' for uniformity. Also, since this function is being added, we should update the rest of the algorithms that manually add a rank column of 1s to use it.
In #120 I updated the other algorithms to use the util function to add the rank column, because you already modified them to add the directionality column to the standardized pathway output.
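With the column name suggested in the review, the utility function might look like the sketch below. The docstring and the assignment follow the diff above with 'rank' changed to 'Rank'; returning the dataframe is an assumption made here for convenience, since the diff does not show whether the function returns anything.

```python
import pandas as pd


def add_rank_column(df: pd.DataFrame) -> pd.DataFrame:
    """
    Add a column of 1s to the dataframe
    @param df: the dataframe to add the rank column of 1s to
    """
    # 'Rank' matches the column name the review says most other algorithms use
    df['Rank'] = 1
    # Returning df is an assumption of this sketch; the column is also added in place
    return df
```

An algorithm's output parser could then call add_rank_column on its edge dataframe instead of assigning the column manually.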
#120 adding directionality has been merged, so we can work on incorporating those changes here. It also introduced some merge conflicts we'll have to resolve.
I'm closing this pull request and creating separate pull requests for RWR and TieDIE with the directionality included.
Sounds good @Lyce24. When you create the new pull requests, please include a comment like |
Summary:
This pull request introduces Docker images for RWR (Random Walk with Restart) and TieDIE algorithms, along with the necessary code implementation and tests. Additionally, it includes a new utility function and updates the GitHub Actions workflow.
Changes Made:
… docker-wrapper/RandomWalk/Dockerfile and docker-wrapper/TieDIE/Dockerfile respectively.
… src/random-walk.py and src/tiedie.py.
… test/RandomWalk and test/TieDIE directories.
… .github/workflows/test-spras.yml.
… add_rank_column function to src/utils.py for adding an extra column to a dataframe.
Next Steps:
Transfer the Docker images from the personal DockerHub account to the official DockerHub account for the project once access is granted.