Adding Directionality #120

Merged on Dec 3, 2023 (50 commits; the diff below shows changes from 49 commits)

Commits
af6a590
documented research per algorithm as docstrings per class for the alg…
ntalluri Aug 23, 2023
3f173b5
added directionality for generate inputs
ntalluri Aug 29, 2023
a214cc8
Merge branch 'master' of github.com:ntalluri/spras into direction
ntalluri Aug 30, 2023
1e7fef9
added in parse_output directionality
ntalluri Aug 30, 2023
4fdaadb
removed directed from config file, testing with analysis all false
ntalluri Aug 30, 2023
cae057a
made updates to code and attempted to add testing for interactome
ntalluri Sep 4, 2023
319a5d3
precommit formatting
ntalluri Sep 5, 2023
bb55ee0
cleaned up code and finished interactome test
ntalluri Sep 5, 2023
7aed4ee
updated util to deal with the idea if someone is using an old config …
ntalluri Sep 5, 2023
0e15534
current ml repairs
ntalluri Sep 5, 2023
75be8b8
ml post processing pre-commit and config file
ntalluri Sep 6, 2023
550614a
fixed testing
ntalluri Sep 12, 2023
760b1b7
Merge branch 'master' into direction
ntalluri Sep 18, 2023
9e29ceb
updated summary.py/associated files and tests. updated interactome.py
ntalluri Sep 18, 2023
0b47bd9
Merge branch 'direction' of github.com:ntalluri/spras into direction
ntalluri Sep 18, 2023
713279a
pre-commit
ntalluri Sep 18, 2023
1198c9a
added generate inputs test, cleaned up code
ntalluri Sep 19, 2023
39b4748
Resolve merge conflicts
agitter Sep 22, 2023
af1b849
Update EGFR network with edge directions
agitter Sep 22, 2023
a34f832
added back graphspace to work for directed and undirected graphs only
ntalluri Sep 26, 2023
9b551d6
precommit
ntalluri Sep 26, 2023
b314751
automate test_prepare_inputs
ntalluri Sep 29, 2023
32be496
renamed the tests for creating the inputs
ntalluri Oct 4, 2023
57599f7
fix break in test
ntalluri Oct 4, 2023
e991e0a
added parse_output tests and still fixing generate inputs
ntalluri Oct 4, 2023
dd6c899
precommit
ntalluri Oct 4, 2023
ae276f5
added more information to step 5 of contributing guide
ntalluri Oct 4, 2023
df7da32
add cytoscape into workflow
ntalluri Oct 4, 2023
e5be5a0
cleaning up generate inputs/parse outputs test suites
ntalluri Oct 17, 2023
09c2913
clean up gen inputs and parse outputs
ntalluri Oct 17, 2023
8dd0990
Merge with master
agitter Oct 18, 2023
d9b0f0b
Fix ruff errors on GitHub actions
agitter Oct 18, 2023
6f631ef
made changes based on review
ntalluri Oct 26, 2023
f651376
made more changes based on review
ntalluri Oct 26, 2023
b764129
fixed error
ntalluri Oct 27, 2023
8eb9ebb
some of the comments
ntalluri Nov 27, 2023
5c0939f
more comments
ntalluri Nov 27, 2023
884fe4f
precommit
ntalluri Nov 27, 2023
19a4ad4
more comments
ntalluri Nov 27, 2023
99e6e6f
more comments resolved
ntalluri Nov 28, 2023
e8735ea
resolving parse output comments
ntalluri Nov 29, 2023
96c0482
precommit
ntalluri Nov 29, 2023
83262c7
add check to dataset.py
ntalluri Dec 1, 2023
7bc9e3e
Rename .csv to .txt in test directory
agitter Dec 3, 2023
d918c00
Add tests for invalid 4th edge column
agitter Dec 3, 2023
cd625d4
Remove self-edges from EGFR data
agitter Dec 3, 2023
ee58069
Update Cytoscape wrapper for directed edges
agitter Dec 3, 2023
da655b3
Remove directed from EGFR config and run more algs
agitter Dec 3, 2023
b4d6e1c
Systematic proofreading and formatting
agitter Dec 3, 2023
cccf5ce
Bump Cytoscape image version in workflow
agitter Dec 3, 2023
11 changes: 10 additions & 1 deletion CONTRIBUTING.md
@@ -188,6 +188,15 @@ Follow the example for any of the other pathway reconstruction algorithm.
First pull the image `<username>/local-neighborhood` from Docker Hub.
Then build the Docker image using the `Dockerfile` that was completed in Step 2.

Modify generate inputs:
1. Include a key-value pair in the `algo_exp_file` dictionary that links the specific algorithm to its expected network file.
2. Obtain the expected network file from the workflow, manually confirm it is correct, and save it to `test/generate-inputs/expected`. Name it as `{algorithm_name}-{network_file_name}-expected.txt`.

Modify parse outputs:
1. Obtain the raw-pathway output (e.g. from the run function in your wrapper by running the Snakemake workflow) and save it to `test/parse-outputs/input`. Name it as `{algorithm_name}-raw-pathway.txt`.
2. Obtain the expected universal output from the workflow, manually confirm it is correct, and save it to the `test/parse-outputs/expected` directory. Name it as `{algorithm_name}-pathway-expected.txt`.
3. Add the new algorithm's name to the algorithms list in `test/parse-outputs/test_parse_outputs.py`.
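
The file naming conventions in the two lists above can be sketched in Python. This is only an illustration: the helper function names and the example algorithm name are hypothetical, and only the directory names and file name patterns come from the guide text.

```python
from pathlib import Path

# Directories named in the contributing guide text above.
GENERATE_INPUTS_EXPECTED = Path("test", "generate-inputs", "expected")
PARSE_OUTPUTS_INPUT = Path("test", "parse-outputs", "input")
PARSE_OUTPUTS_EXPECTED = Path("test", "parse-outputs", "expected")


# The helper names below are hypothetical and are not part of SPRAS.
def expected_network_file(algorithm_name: str, network_file_name: str) -> Path:
    """Expected network file used by the generate inputs test."""
    return GENERATE_INPUTS_EXPECTED / f"{algorithm_name}-{network_file_name}-expected.txt"


def raw_pathway_file(algorithm_name: str) -> Path:
    """Raw pathway file that the parse outputs test reads."""
    return PARSE_OUTPUTS_INPUT / f"{algorithm_name}-raw-pathway.txt"


def expected_pathway_file(algorithm_name: str) -> Path:
    """Expected universal output that the parse outputs test compares against."""
    return PARSE_OUTPUTS_EXPECTED / f"{algorithm_name}-pathway-expected.txt"


if __name__ == "__main__":
    # "myalgorithm" and "network" are placeholder names for illustration.
    print(expected_network_file("myalgorithm", "network").as_posix())
    print(raw_pathway_file("myalgorithm").as_posix())
    print(expected_pathway_file("myalgorithm").as_posix())
```

Building the paths in one place like this keeps the three file names consistent with each other, which is the main point of the naming rules.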

### Step 6: Work with SPRAS maintainers to revise the pull request
Step 0 previously described how to create a `local-neighborhood` branch and create a pull request.
Make sure to commit all of the new and modified files and push them to the `local-neighborhood` branch on your fork.
@@ -205,7 +214,7 @@ The pull request will be closed so that the `master` branch of the fork stays sy
1. Import the new class in `src/runner.py` so the wrapper functions can be accessed
1. Document the usage of the Docker wrapper and the assumptions made when implementing the wrapper
1. Add example usage for the new algorithm and its parameters to the template config file
1. Write test functions and provide example input data in a new test subdirectory `test/<algorithm>`
1. Write test functions and provide example input data in a new test subdirectory `test/<algorithm>`. Add the example data and the algorithm and expected file names to the lists or dicts in `test/generate-inputs` and `test/parse-outputs`. Use the full path with the names of the test files.
1. Extend `.github/workflows/test-spras.yml` to pull and build the new Docker image

When adding new algorithms, there are many other considerations that are not relevant with the simple Local Neighborhood example.
22 changes: 12 additions & 10 deletions Snakefile
@@ -57,10 +57,11 @@ def write_dataset_log(dataset, logfile):
def make_final_input(wildcards):
final_input = []

#TODO analysis could be parsed in the parse_config() function.
# TODO analysis could be parsed in the parse_config() function.
if config["analysis"]["summary"]["include"]:
# add summary output file for each pathway
final_input.extend(expand('{out_dir}{sep}{dataset}-{algorithm_params}{sep}summary.txt',out_dir=out_dir,sep=SEP,dataset=dataset_labels,algorithm_params=algorithms_with_params))
# TODO: reuse in the future once we make summary work for mixed graphs. See https://github.com/Reed-CompBio/spras/issues/128
# final_input.extend(expand('{out_dir}{sep}{dataset}-{algorithm_params}{sep}summary.txt',out_dir=out_dir,sep=SEP,dataset=dataset_labels,algorithm_params=algorithms_with_params))
# add table summarizing all pathways for each dataset
final_input.extend(expand('{out_dir}{sep}{dataset}-pathway-summary.txt',out_dir=out_dir,sep=SEP,dataset=dataset_labels))

@@ -219,14 +220,15 @@ rule parse_output:
run:
runner.parse_output(wildcards.algorithm, input.raw_file, output.standardized_file)

# TODO: reuse in the future once we make summary work for mixed graphs. See https://github.com/Reed-CompBio/spras/issues/128
# Collect summary statistics for a single pathway
rule summarize_pathway:
input:
standardized_file = SEP.join([out_dir, '{dataset}-{algorithm}-{params}', 'pathway.txt'])
output:
summary_file = SEP.join([out_dir, '{dataset}-{algorithm}-{params}', 'summary.txt'])
run:
summary.run(input.standardized_file,output.summary_file,directed=algorithm_directed[wildcards.algorithm])
# rule summarize_pathway:
# input:
# standardized_file = SEP.join([out_dir, '{dataset}-{algorithm}-{params}', 'pathway.txt'])
# output:
# summary_file = SEP.join([out_dir, '{dataset}-{algorithm}-{params}', 'summary.txt'])
# run:
# summary.run(input.standardized_file,output.summary_file)

# Write GraphSpace JSON graphs
rule viz_graphspace:
@@ -235,7 +237,7 @@ rule viz_graphspace:
graph_json = SEP.join([out_dir, '{dataset}-{algorithm}-{params}', 'gs.json']),
style_json = SEP.join([out_dir, '{dataset}-{algorithm}-{params}', 'gsstyle.json'])
run:
graphspace.write_json(input.standardized_file,output.graph_json,output.style_json,directed=algorithm_directed[wildcards.algorithm])
graphspace.write_json(input.standardized_file,output.graph_json,output.style_json)


# Write a Cytoscape session file with all pathways for each dataset
7 changes: 0 additions & 7 deletions config/config.yaml
@@ -29,14 +29,12 @@
- name: "pathlinker"
params:
include: true
directed: true
run1:
k: range(100,201,100)

- name: "omicsintegrator1"
params:
include: true
directed: false
run1:
r: [5]
b: [5, 6]
@@ -47,7 +45,6 @@
- name: "omicsintegrator2"
params:
include: true
directed: false
run1:
b: [4]
g: [0]
@@ -58,7 +55,6 @@
- name: "meo"
params:
include: true
directed: true
run1:
max_path_length: [3]
local_search: ["Yes"]
@@ -67,20 +63,17 @@
- name: "mincostflow"
params:
include: true
directed: false
run1:
flow: [1] # The flow must be an int
capacity: [1]

- name: "allpairs"
params:
include: true
directed: false

- name: "domino"
params:
include: true
directed: false
run1:
slice_threshold: [0.3]
module_threshold: [0.05]
9 changes: 2 additions & 7 deletions config/egfr.yaml
@@ -9,7 +9,6 @@ algorithms:
-
name: pathlinker
params:
directed: true
include: true
run1:
k:
@@ -18,7 +17,6 @@ algorithms:
-
name: omicsintegrator1
params:
directed: false
include: true
run1:
b:
@@ -38,8 +36,7 @@ algorithms:
-
name: omicsintegrator2
params:
directed: false
include: false
include: true
run1:
b:
- 4
@@ -53,8 +50,7 @@ algorithms:
-
name: meo
params:
directed: true
include: false
include: true
run1:
local_search:
- "Yes"
@@ -65,7 +61,6 @@ algorithms:
-
name: domino
params:
directed: false
include: true
run1:
slice_threshold:
2 changes: 2 additions & 0 deletions docker-wrappers/Cytoscape/README.md
@@ -19,7 +19,9 @@ The Docker wrapper can be tested with `pytest`.

## Versions:
- v1: Use supervisord to launch Cytoscape from a Python subprocess, then connect to Cytoscape with py4cytoscape. Only loads undirected pathways. Compatible with Singularity in local testing (Apptainer version 1.2.2-1.el7) but fails in GitHub Actions.
- v2: Add support for edge direction column.

## TODO
- Add an auth file for `xvfb-run`
- Java initial heap size, maximum Java heap size, and thread stack size are hard-coded in `Cytoscape.vmoptions` file
- Resolve issues with `Cytoscape.vmoptions` line endings being reset to Windows-style. They must be reset periodically, and the image will fail if they are not Unix-style.
2 changes: 1 addition & 1 deletion docker-wrappers/Cytoscape/cytoscape_util.py
@@ -115,7 +115,7 @@ def load_pathways(pathways: List[str], output: str) -> None:
path, name = parse_name(pathway)
suid = p4c.networks.import_network_from_tabular_file(
file=path,
column_type_list='s,t,x',
column_type_list='s,t,x,ea',
delimiters='\t'
)
p4c.networks.rename_network(name, network=suid)
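
The change from `'s,t,x'` to `'s,t,x,ea'` tells the importer how to treat the new fourth column. The sketch below expands the abbreviations without calling Cytoscape. It assumes the abbreviation meanings listed in the py4cytoscape documentation for `import_network_from_tabular_file`; the `describe_columns` helper is hypothetical and not part of the wrapper.

```python
from typing import List

# Abbreviations as described in the py4cytoscape documentation for the
# column_type_list argument (an assumption worth checking against your version).
COLUMN_TYPES = {
    "s": "source",
    "t": "target",
    "i": "interaction",
    "sa": "source attribute",
    "ta": "target attribute",
    "ea": "edge attribute",
    "x": "skip",
}


def describe_columns(column_type_list: str) -> List[str]:
    """Expand a comma-separated column_type_list into readable column roles."""
    return [COLUMN_TYPES[code] for code in column_type_list.split(",")]


if __name__ == "__main__":
    # Old setting: three columns, the third one skipped.
    print(describe_columns("s,t,x"))
    # New setting: the fourth column is loaded as an edge attribute.
    print(describe_columns("s,t,x,ea"))
```

Under that reading, the direction column is imported as an edge attribute while the third column is still skipped.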
17 changes: 12 additions & 5 deletions input/README.md
@@ -23,14 +23,20 @@ This format may be deprecated.

### Edge file
Edge files do not include a header row.
Each row lists the two nodes that are connected with an undirected edge and a weight for that edge.
Directed edges are not currently supported.
Each row lists the two nodes that are connected with an edge, the weight for that edge, and, optionally, a directionality column to indicate whether the edge is directed or undirected.
The directionality values are either a 'U' for an undirected edge or a 'D' for a directed edge.
If the directionality column is not included, SPRAS will assume that the file's edges are entirely undirected.
The weights are typically in the range [0,1] with 1 being the highest confidence for the edge.

For example:
```
A B 0.98 U
B C 0.77 D
```
or
```
A B 0.98
B C 0.77
B C 0.77
```
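
The rules above (an optional fourth column, 'U' or 'D' values, and undirected as the default) can be illustrated with a small reader. This is a minimal sketch and not the SPRAS implementation: the `read_edges` function is hypothetical, and it assumes whitespace-delimited columns and node names without spaces.

```python
import io

VALID_DIRECTIONS = {"U", "D"}


def read_edges(handle):
    """Return a list of (node1, node2, weight, direction) tuples.

    Hypothetical helper: rows with three columns are treated as undirected,
    matching the default described in the README text above.
    """
    edges = []
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) == 3:
            node1, node2, weight = fields
            direction = "U"  # no directionality column, so assume undirected
        elif len(fields) == 4:
            node1, node2, weight, direction = fields
        else:
            raise ValueError(f"line {line_number}: expected 3 or 4 columns, got {len(fields)}")
        if direction not in VALID_DIRECTIONS:
            raise ValueError(f"line {line_number}: direction must be 'U' or 'D', got {direction!r}")
        edges.append((node1, node2, float(weight), direction))
    return edges


if __name__ == "__main__":
    mixed = io.StringIO("A\tB\t0.98\tU\nB\tC\t0.77\tD\n")
    print(read_edges(mixed))
    legacy = io.StringIO("A\tB\t0.98\nB\tC\t0.77\n")
    print(read_edges(legacy))
```

Rejecting any fourth-column value other than 'U' or 'D' keeps a malformed file from being silently read as undirected.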

## Toy datasets
@@ -46,15 +52,15 @@ The following files are very small toy datasets used to illustrate the supported
This dataset represents protein phosphorylation changes in response to epidermal growth factor (EGF) treatment.
The network includes protein-protein interactions from [iRefIndex](http://irefindex.org/) and kinase-substrate interactions from [PhosphoSitePlus](http://www.phosphosite.org/).
The files are originally from the [Temporal Pathway Synthesizer (TPS)](https://github.com/koksal/tps) repository.
They have been lightly modified for SPRAS by lowering one edge weight that was greater than 1, removing a PSEUDONODE prize, adding a prize of 10.0 to EGF_HUMAN, and converting all edges to undirected edges.
They have been lightly modified for SPRAS by lowering one edge weight that was greater than 1, removing 182 self-edges, removing a PSEUDONODE prize, and adding a prize of 10.0 to EGF_HUMAN.
The only source is EGF_HUMAN.
All proteins with phosphorylation-based prizes are also labeled as targets.
All nodes are considered active.

If you use any of the input files `tps-egfr-prizes.txt` or `phosphosite-irefindex13.0-uniprot.txt`, reference the publication

[Synthesizing Signaling Pathways from Temporal Phosphoproteomic Data](https://doi.org/10.1016/j.celrep.2018.08.085).
Ali Sinan Köksal, Kirsten Beck, Dylan R. Cronin, Aaron McKenna, Nathan D. Camp, Saurabh Srivastava, Matthew E. MacGilvray, Rastislav Bodík, Alejandro Wolf-Yadlin, Ernest Fraenkel, Jasmin Fisher, Anthony Gitter.
Ali Sinan Köksal, Kirsten Beck, Dylan R. Cronin, Aaron McKenna, Nathan D. Camp, Saurabh Srivastava, Matthew E. MacGilvray, Rastislav Bodík, Alejandro Wolf-Yadlin, Ernest Fraenkel, Jasmin Fisher, Anthony Gitter.
*Cell Reports* 24(13):3607-3618 2018.

If you use the network file `phosphosite-irefindex13.0-uniprot.txt`, also reference iRefIndex and PhosphoSitePlus.
@@ -68,3 +74,4 @@ Peter V Hornbeck, Bin Zhang, Beth Murray, Jon M Kornhauser, Vaughan Latham, Elzb
*Nucleic Acids Research* 43(D1):D512-520 2015.

The TPS [publication](https://doi.org/10.1016/j.celrep.2018.08.085) describes how the network data and protein prizes were prepared.

18 changes: 9 additions & 9 deletions input/alternative-network.txt
@@ -1,9 +1,9 @@
A B 0.98
B C 0.77
A D 0.12
C D 0.89
C E 0.59
C F 0.50
F G 0.76
G H 0.92
G I 0.66
A B 0.98 U
B C 0.77 U
A D 0.12 U
C D 0.89 U
C E 0.59 U
C F 0.50 U
F G 0.76 U
G H 0.92 U
G I 0.66 U
4 changes: 2 additions & 2 deletions input/network.txt
@@ -1,2 +1,2 @@
A B 0.98
B C 0.77
A B 0.98 U
B C 0.77 U