-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RWR pathway reconstruction algorithm #148
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some clarifications will be helpful, especially about:
- The logic of sources/targets/prize files
- What happens if there are no sources (or if there are no targets in the source-target mode)
- Whether you want to have separate output files or keep all outputs in a single file.
I also will test these changes locally.
__all__ = ['RWR'] | ||
|
||
""" | ||
RWR will construct a directed graph from the provided input file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to note that RWR assumes a directed graph. "From the provided input file" is vague, especially if the user can input a network file and a node prizes file.
- an edge is represented with a head and tail node, which represents the direction of the interation between two nodes | ||
- uses networkx Digraph() object | ||
|
||
Expected raw input format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is edge flux for the input file requirements? And this is the network file, correct?
- the 'type' column should be 1 for edges, 2 for nodes, and 3 for pathways as we want to keep information about nodes, edges, and pathways. | ||
- it can include repeated and bidirectional edges | ||
|
||
Expected raw input format for prizes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the prizes file optional, or is it required and your code will generate a stub file with all nodes set to 1.0? Also, I think the description here is for the network file, not the node file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should be renamed to focus on sources/targets, with prizes as a secondary attribute. RWR requires at least sources to run, even if there are no targets or prizes, right?
sources_targets = data.request_node_columns(["sources", "targets"]) | ||
if sources_targets is None: | ||
if data.contains_node_columns('prize'): | ||
sources_targets = data.request_node_columns(['prize']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you are assuming that if there is NO sources file but there's a prize file, all the nodes with a prize are sources, right? Add comments to clearly descibe the logic here.
raise ValueError(f"{input_type} filename is missing") | ||
|
||
sources_targets = data.request_node_columns(["sources", "targets"]) | ||
if sources_targets is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there are targets but no sources?
node_df = data.request_node_columns(['prize']) | ||
input_df = pd.merge(input_df, node_df, on='NODEID') | ||
else: | ||
#If there aren't prizes but are sources and targets, make prizes based on them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor - rephrase to be "If there aren't prizes but there are sources and targets, set their prize to be 1.0"
|
||
# Skips parameter validation step | ||
@staticmethod | ||
def run(edges=None, prizes = None, output_file = None, single_source = None, df = None, w = None, f = None, threshold = None, container_framework="docker"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should single_source be a Boolean?
df_node = df_node.rename(columns={'Node1': 'Node', 'Node2': 'Pr', 'Edge Flux': 'R_Pr', 'Weight': 'Final_Pr', 'InNetwork' : 'InNetwork'}) | ||
df_node.to_csv(node_output_file, sep="\t", index=False, header=True) | ||
|
||
df_pathway = df.loc[df['Type'] == 3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about that Type 3 is. Is this the subgraph after filtering applying the selection function and threshold? A few more comments here (or at the top) about these output types would be helpful.
|
||
TEST_DIR = 'test/RWR/' | ||
OUT_FILE_DEFAULT = TEST_DIR+'output/rwr-edges.txt' | ||
OUT_FILE_OPTIONAL = TEST_DIR+'output/rwr-edges-optional.txt' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename OPTIONAL to OPTIONS or OPTIONAL_ARGS - I think that's what you mean (the optional arguments)
|
||
|
||
@staticmethod | ||
def parse_output(raw_pathway_file, standardized_pathway_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now remember that we talked about how your RWR outputs everything as a single file, so you can parse pieces of it in parse_output()
. Would you rather output three separate output files? That would be cleaner conceptually. We can ask the group about how to parse multiple output files.
This pull request replaces the RWR part of pull request #92. Changes are made to adapt the updated version of SPRAS with directionality.