Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RWR and TieDIE Integration #92

Closed
wants to merge 26 commits into from
Closed

Conversation

Lyce24
Copy link

@Lyce24 Lyce24 commented Jun 16, 2023

Summary:

This pull request introduces Docker images for RWR (Random Walk with Restart) and TieDIE algorithms, along with the necessary code implementation and tests. Additionally, it includes a new utility function and updates the GitHub Actions workflow.

Changes Made:

  • Created Docker images for RWR and TieDIE in docker-wrapper/RandomWalk/Dockerfile and docker-wrapper/TieDIE/Dockerfile respectively.
  • Implemented the algorithm execution code in src/random-walk.py and src/tiedie.py.
  • Added tests for RWR and TieDIE in the test/RandomWalk and test/TieDIE directories.
  • Updated the GitHub Actions workflow to include the new tests in .github/workflows/test-spras.yml.
  • Added the add_rank_column function to src/utils.py for adding an extra column to a dataframe.

Next Steps:

Transfer the Docker images from the personal DockerHub account to the official DockerHub account for the project once access is granted.

Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review everything and am only leaving a few comments about things I noticed during an initial quick pass.

Comment on lines 84 to 85
docker pull erikliu24/rwwr:latest
docker pull erikliu24/tiedie:latest
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a reminder to change these to reedcompbio after @annaritz pushes the containers to the organization account. Same goes for the steps below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will make changes to this file when Anna pushes the image to reedcompbio. And the changes will be made also in src/random_walk.py and src/tiedie.py as the run functions in these two files are also pulling images from my personal account.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I built and pushed both of these to DockerHub so you can make the test workflow changes now:

config/config.yaml Outdated Show resolved Hide resolved
config/config.yaml Outdated Show resolved Hide resolved
docker-wrappers/TieDIE/Tutorial.pdf Outdated Show resolved Hide resolved
src/util.py Outdated Show resolved Hide resolved
src/util.py Outdated Show resolved Hide resolved
test/TieDIE/test_tiedie.py Outdated Show resolved Hide resolved
config/config.yaml Outdated Show resolved Hide resolved
config/config.yaml Outdated Show resolved Hide resolved
config/config.yaml Outdated Show resolved Hide resolved
config/config.yaml Outdated Show resolved Hide resolved
config/config.yaml Outdated Show resolved Hide resolved
docker-wrappers/TieDIE/README.md Outdated Show resolved Hide resolved
src/rwr.py Outdated
Access fields from the dataset and write the required input files
@param data: dataset
@param filename_map: a dict mapping file types in the required_inputs to the filename for that type
@return:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty return comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this empty because all the other algorithms implemented in SPRAS also have an empty return comment. I would appreciate your recommendation on what should be included here. If you have any suggestions, I can update the return comment for this algorithm as well as for the other algorithms in SPRAS.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because generate_inputs doesn't return anything I suggest deleting the @return: line from the docstring. We'll have to formalize this once we finally start automatically generating documentation from docstrings with something Sphinx.

src/rwr.py Outdated
@param raw_pathway_file: pathway file produced by an algorithm's run function
@param standardized_pathway_file: the same pathway written in the universal format
"""
print('Parsing random-walk-with-restart output')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially considered printing this line to the console to provide users with additional information about the progress of the program. However, I can remove this line if it would be more in line with the unified coding style of SPRAS.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this parsing output message. We do some logging when it is helpful for debugging, like the Python commands that are going to be run inside the Docker containers. Otherwise for general information about workflow progress we rely on the Snakemake output.

src/rwr.py Outdated Show resolved Hide resolved
src/tiedie.py Outdated Show resolved Hide resolved
@ntalluri
Copy link
Collaborator

I will do a second pass through, but this is what I caught fast through my first pass through

…ytest -k; Tab issue of RWR); updated parameters for RWR for newer version of RWR
Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I build and pushed the Docker images so that you can keep making progress on this. I still haven't looked at the core Python code.

docker-wrappers/RWR/README.md Outdated Show resolved Hide resolved
docker-wrappers/RWR/README.md Outdated Show resolved Hide resolved
docker-wrappers/RWR/README.md Outdated Show resolved Hide resolved
docker-wrappers/RWR/Dockerfile Outdated Show resolved Hide resolved
docker-wrappers/RWR/README.md Outdated Show resolved Hide resolved
src/rwr.py Outdated
@param raw_pathway_file: pathway file produced by an algorithm's run function
@param standardized_pathway_file: the same pathway written in the universal format
"""
print('Parsing random-walk-with-restart output')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this parsing output message. We do some logging when it is helpful for debugging, like the Python commands that are going to be run inside the Docker containers. Otherwise for general information about workflow progress we rely on the Snakemake output.

test/RWR/test_rwr.py Outdated Show resolved Hide resolved
test/RWR/test_rwr.py Outdated Show resolved Hide resolved
test/RWR/test_rwr.py Outdated Show resolved Hide resolved
test/TieDIE/test_tiedie.py Outdated Show resolved Hide resolved
@annaritz
Copy link
Contributor

We're reviewing this now, specifically the RWR code. It currently outputs a single file with both edge and node information (for example, the node visitation probabilities). This reminded us of #88, or is there a way to keep intermediate files besides a raw-pathway.txt file? It would help to have this information about the nodes, but at the same time we don't want to introduce a change that requires adjusting ALL algorithms (unless we really vet that change).

@agitter
Copy link
Collaborator

agitter commented Jul 25, 2023

If I understand correctly, it is awkward to stick the node information in raw-pathway.txt because that file is edge-based but the node scores are node-based. Snakemake should allow us to write additional output files per rule as long as the rule writes the expected output file(s). Would it work if we use the output_file argument to the run function to get the output directory and create a node-based file of your choice to write a node score file? It would be a one-off output file that isn't currently used by any downstream steps, but it would be there for each RWR run and available for inspection or manual visualization.

Because writing this additional output file is atypical SPRAS behavior, I suggest clearly documenting it in the run function docstring and perhaps the RWR docker wrapper readme as well.

@ntalluri
Copy link
Collaborator

ntalluri commented Aug 16, 2023

for the testing of tiedie and RWR, can we add a test to check for bidirectional edges in the input pathway/edges?

@agitter agitter mentioned this pull request Aug 18, 2023
Add a column of 1s to the dataframe
@param df: the dataframe to add the rank column of 1s to
"""
df['rank'] = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are going to show the headers in the future (because we have a bunch of files that look the same but have different meanings), most of the other algorithms have 'Rank' for the column name for the rank column added. It's a minor thing, but I think this can be changed to also use 'Rank' for uniformity. Also since this is being added, we should update the rest of the algorithms that are manually adding a rank column of 1s to use this function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #120 I updated the other algorithms to use the util function to add the rank column because you already modified them to add the directionality column to the standardized pathway output

@agitter
Copy link
Collaborator

agitter commented Dec 3, 2023

#120 adding directionality has been merged, so we can work on incorporating those changes here. It also introduced some merge conflicts we'll have to resolve.

@Lyce24
Copy link
Author

Lyce24 commented Feb 11, 2024

I'm closing this pull request and creating separate pull requests for RWR and TieDIE with the directionality included.

@Lyce24 Lyce24 closed this Feb 11, 2024
@agitter
Copy link
Collaborator

agitter commented Feb 11, 2024

Sounds good @Lyce24. When you create the new pull requests, please include a comment like Replaces #92 so we can refer back to the old comments here if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants