Dummy node logic #187

Merged: 23 commits merged on Nov 23, 2024
3 changes: 3 additions & 0 deletions config/config.yaml
@@ -56,6 +56,7 @@ algorithms:
b: [5, 6]
w: np.linspace(0,5,2)
d: [10]
dummy_mode: ["file"] # Or "terminals", "all", "others"

- name: "omicsintegrator2"
params:
@@ -101,6 +102,8 @@ datasets:
-
# Labels can only contain letters, numbers, or underscores
label: data0
# To run OmicsIntegrator1 with dummy nodes, add dummy.txt file to node_files
# or a dummy column to the node table
node_files: ["node-prizes.txt", "sources.txt", "targets.txt"]
# DataLoader.py can currently only load a single edge file, which is the primary network
edge_files: ["network.txt"]
1 change: 1 addition & 0 deletions config/egfr.yaml
@@ -33,6 +33,7 @@ algorithms:
- 0.1
mu:
- 0.008
dummy_mode: ["file"]
-
name: omicsintegrator2
params:
10 changes: 9 additions & 1 deletion doc/README.md
@@ -23,14 +23,22 @@ The graph algorithms below have been used (or have the potential to be used) for
- Basha et al., ResponseNet2.0: revealing signaling and regulatory pathways connecting your proteins and genes–now with human data. _Nucleic Acids Research._ 2013. [doi:10.1093/nar/gkt532](https://dx.doi.org/10.1093%2Fnar%2Fgkt532)
- Basha et al. ResponseNet v.3: revealing signaling and regulatory pathways connecting your proteins and genes across human tissues. _Nucleic Acids Research._ 2019. [doi:10.1093/nar/gkz421](https://dx.doi.org/10.1093%2Fnar%2Fgkz421)

## Prize Collecting Steiner Forest (PCSF)
## Prize Collecting Steiner Forest (PCSF): OmicsIntegrator1 and OmicsIntegrator2

**References:**
- Huang and Fraenkel. Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks. _Science Signaling._ 2009. [doi:10.1126/scisignal.2000350](https://doi.org/10.1126/scisignal.2000350)
- Tuncbag et al. Simultaneous reconstruction of multiple signaling pathways via the prize-collecting steiner forest problem. _Journal of Computational Biology._ 2013. [doi:10.1089/cmb.2012.0092](https://doi.org/10.1089/cmb.2012.0092)
- Gitter et al. Sharing information to reconstruct patient-specific pathways in heterogeneous diseases. _Pacific Symposium on Biocomputing._ 2014. [doi:10.1142/9789814583220_0005](https://doi.org/10.1142/9789814583220_0005)
- Tuncbag et al., Network-Based Interpretation of Diverse High-Throughput Datasets through the Omics Integrator Software Package. _PLoS Computational Biology._ 2016. [doi:10.1371/journal.pcbi.1004879](https://doi.org/10.1371/journal.pcbi.1004879)

One of the parameter options for OmicsIntegrator1 is `dummy_mode`.
There are 4 dummy mode possibilities:
1. `terminals`: connect the dummy node to all nodes that have been assigned prizes
2. `all`: connect the dummy node to all nodes in the interactome, i.e., the full set of nodes in the graph
3. `others`: connect the dummy node to all nodes that are not terminal nodes, i.e., nodes without prizes
4. `file`: connect the dummy node to a specific list of nodes provided in a file

To use the `file` dummy mode with OmicsIntegrator1, either add a separate `dummy.txt` file (and list it in the `node_files` argument in `config.yaml`) or add a `dummy` node attribute column to a node file that contains `NODEID`, `prize`, `sources`, etc.

## PathLinker

PathLinker takes as input (1) a weighted, directed PPI network, (2) two sets of nodes: a source set (representing receptors of a pathway of interest) and a target set (representing transcriptional regulators of a pathway of interest), and (3) an integer _k_. PathLinker efficiently computes the _k_-shortest paths from any source to any target and returns the subnetwork of the top _k_ paths as the pathway reconstruction. Later work expanded PathLinker by incorporating protein localization information to re-score tied paths, dubbed Localized PathLinker (LocPL).
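The four `dummy_mode` options in the README change differ only in which nodes the dummy node gets connected to. The sketch below spells that selection out as set operations so the modes can be compared side by side. It is a hypothetical helper (`dummy_neighbors` is not part of SPRAS or Omics Integrator 1, which applies `dummy_mode` internally), and it assumes that a terminal is any graph node with an entry in the prize table and that listed nodes missing from the graph are ignored in `file` mode.

```python
from typing import Iterable, Mapping, Optional, Set


def dummy_neighbors(mode: str,
                    graph_nodes: Iterable[str],
                    prizes: Mapping[str, float],
                    file_nodes: Optional[Iterable[str]] = None) -> Set[str]:
    """Return the nodes a dummy node would be connected to under `mode`.

    Hypothetical illustration only; not Omics Integrator 1's implementation.
    """
    nodes = set(graph_nodes)
    # Assumption: a terminal is any graph node that has an entry in `prizes`
    terminals = nodes & set(prizes)
    if mode == "terminals":
        return terminals
    if mode == "all":
        return nodes
    if mode == "others":
        return nodes - terminals
    if mode == "file":
        if file_nodes is None:
            raise ValueError("a node list is required when mode is 'file'")
        # Assumption: listed nodes that are absent from the graph are ignored
        return nodes & set(file_nodes)
    raise ValueError(f"unknown dummy mode: {mode!r}")


if __name__ == "__main__":
    graph = ["A", "B", "C", "D"]
    prizes = {"A": 2.0, "C": 5.7}
    for mode in ("terminals", "all", "others"):
        print(mode, sorted(dummy_neighbors(mode, graph, prizes)))
    print("file", sorted(dummy_neighbors("file", graph, prizes, ["B"])))
```

With the four-node demo graph, `terminals` selects A and C, `others` selects B and D, and `file` with a list containing only B (like the `oi1-dummy.txt` test input) selects B. Note that `terminals` and `others` always partition `all`.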
4 changes: 2 additions & 2 deletions input/README.md
@@ -11,8 +11,8 @@ All other columns specify additional node attributes such as prizes.
Any nodes that are listed in a node file but are not present in one or more edges in the edge file will be removed.
For example:
```
NODEID prize sources targets active
A 1.0 True True
NODEID prize sources targets active dummy
A 1.0 True True True
B 3.3 True True
C 2.5 True True
D 1.9 True True True
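The new `dummy` column in the node table example is what `generate_inputs` turns into the `dummy_nodes` file: one node ID per line for rows where `dummy` is true, or an empty file when the column is absent. The PR does this with pandas (`node_table['dummy'] == True` and `to_csv`); the sketch below swaps in the standard-library `csv` module to show the same idea without dependencies. It assumes a tab-separated table with a `NODEID` header and treats `true`/`True` (case-insensitive) as true and blank cells as false; both function names are hypothetical.

```python
import csv
import io
from typing import Iterable, List, TextIO


def read_dummy_nodes(node_table: TextIO) -> List[str]:
    """Return the NODEIDs whose 'dummy' column is true.

    Returns an empty list when the table has no 'dummy' column, mirroring the
    PR's fallback of writing an empty dummy_nodes file.
    """
    reader = csv.DictReader(node_table, delimiter="\t")
    if reader.fieldnames is None or "dummy" not in reader.fieldnames:
        return []
    dummies = []
    for row in reader:
        # Missing trailing cells come back as None; treat them as not dummy.
        value = (row.get("dummy") or "").strip().lower()
        if value == "true":
            dummies.append(row["NODEID"])
    return dummies


def write_dummy_nodes(nodes: Iterable[str], out: TextIO) -> None:
    """Write one node ID per line with no header (an empty file if no nodes)."""
    for node in nodes:
        out.write(f"{node}\n")


if __name__ == "__main__":
    table = "NODEID\tprize\tactive\tdummy\nA\t2\ttrue\ttrue\nC\t5.7\ttrue\n"
    buffer = io.StringIO()
    write_dummy_nodes(read_dummy_nodes(io.StringIO(table)), buffer)
    print(repr(buffer.getvalue()))
```

The demo table mirrors the updated `node-prizes.txt`, where only `A` is marked as a dummy neighbor, so the written file contains the single line `A`.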
4 changes: 2 additions & 2 deletions input/node-prizes.txt
@@ -1,3 +1,3 @@
NODEID prize active
A 2 true
NODEID prize active dummy
A 2 true true
C 5.7 true
4 changes: 2 additions & 2 deletions input/tps-egfr-prizes.txt
@@ -1,4 +1,4 @@
NODEID prize sources targets active
NODEID prize sources targets active dummy
1433Z_HUMAN 1.041379133 True True
41_HUMAN 3.389112802 True True
4ET_HUMAN 2.569973509 True True
@@ -181,7 +181,7 @@ EF1A1_HUMAN 3.774750081 True True
EF1B_HUMAN 0.768939794 True True
EF1D_HUMAN 1.240472409 True True
EFNB2_HUMAN 2.222686177 True True
EGF_HUMAN 10 True True
EGF_HUMAN 10 True True True
EGFR_HUMAN 6.787874699 True True
EGLN1_HUMAN 1.876580206 True True
EIF3B_HUMAN 2.048949271 True True
59 changes: 45 additions & 14 deletions spras/omicsintegrator1.py
@@ -35,19 +35,21 @@ def write_conf(filename=Path('config.txt'), w=None, b=None, d=None, mu=None, noi
f.write('processes = 1\n')
f.write('threads = 1\n')

"""
Omics Integrator 1 works with partially directed graphs
- it takes in the universal input directly

Expected raw input format:
Interactor1 Interactor2 Weight Direction
- the expected raw input file should have node pairs in the 1st and 2nd columns, with a weight in the 3rd column and directionality in the 4th column
- it can include repeated and bidirectional edges
- it uses 'U' for undirected edges and 'D' for directed edges

"""
class OmicsIntegrator1(PRM):
required_inputs = ['prizes', 'edges']
"""
Omics Integrator 1 works with partially directed graphs
- it takes in the universal input directly

Expected raw input format:
Interactor1 Interactor2 Weight Direction
- the expected raw input file should have node pairs in the 1st and 2nd columns, with a weight in the 3rd column and
directionality in the 4th column
- it can include repeated and bidirectional edges
- it uses 'U' for undirected edges and 'D' for directed edges

"""
required_inputs = ['prizes', 'edges', 'dummy_nodes']

@staticmethod
def generate_inputs(data, filename_map):
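The class docstring describes the raw edge format Omics Integrator 1 expects: two interactors, a weight, and a direction flag of `U` or `D`, with repeated and bidirectional edges allowed. The sketch below is a hypothetical validator for that format (`read_oi1_edges` and `Edge` are illustrative names, not SPRAS functions). It assumes tab-separated columns and a single header row, such as the `protein1 protein2 weight directionality` header passed in `generate_inputs`, and it deliberately keeps duplicates because the format permits them.

```python
import io
from typing import List, NamedTuple, TextIO


class Edge(NamedTuple):
    node1: str
    node2: str
    weight: float
    directed: bool


def read_oi1_edges(lines: TextIO, skip_header: bool = True) -> List[Edge]:
    """Parse Interactor1, Interactor2, Weight, Direction rows.

    Repeated and bidirectional edges are kept as-is, since the format allows them.
    """
    edges: List[Edge] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if skip_header:
            skip_header = False  # first non-blank line is the header
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 tab-separated columns, got {len(fields)}")
        node1, node2, weight, direction = fields
        if direction not in ("U", "D"):
            raise ValueError(f"line {lineno}: direction must be 'U' or 'D', got {direction!r}")
        edges.append(Edge(node1, node2, float(weight), direction == "D"))
    return edges


if __name__ == "__main__":
    text = ("protein1\tprotein2\tweight\tdirectionality\n"
            "A\tB\t0.5\tU\n"
            "B\tA\t0.5\tU\n"
            "B\tC\t0.9\tD\n")
    for edge in read_oi1_edges(io.StringIO(text)):
        print(edge)
```

Storing the direction as a boolean keeps the `U`/`D` check in one place; a caller that needs the original letters can map `directed` back to `"D"` or `"U"`.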
@@ -83,13 +85,22 @@ def generate_inputs(data, filename_map):
columns=['Interactor1','Interactor2','Weight','Direction'],
header=['protein1','protein2','weight','directionality'])

# creates the dummy_nodes file
if 'dummy' in data.node_table.columns:
dummy_df = data.node_table[data.node_table['dummy'] == True]
# save as list of dummy nodes
dummy_df.to_csv(filename_map['dummy_nodes'], index=False, columns=['NODEID'], header=None)
else:
# create empty dummy file
with open(filename_map['dummy_nodes'], mode='w'):
pass

# TODO add parameter validation
# TODO add support for knockout argument
# TODO add reasonable default values
# TODO document required arguments
@staticmethod
def run(edges=None, prizes=None, dummy_mode=None, mu_squared=None, exclude_terms=None,
def run(edges=None, prizes=None, dummy_nodes=None, dummy_mode=None, mu_squared=None, exclude_terms=None,
output_file=None, noisy_edges=None, shuffled_prizes=None, random_terminals=None,
seed=None, w=None, b=None, d=None, mu=None, noise=None, g=None, r=None, container_framework="docker"):
"""
@@ -118,6 +129,19 @@ def run(edges=None, prizes=None, dummy_mode=None, mu_squared=None, exclude_terms
bind_path, prize_file = prepare_volume(prizes, work_dir)
volumes.append(bind_path)

# 4 dummy mode possibilities:
# 1. terminals -> connect the dummy node to all nodes that have been assigned prizes
# 2. all -> connect the dummy node to all nodes in the interactome i.e. full set of nodes in graph
# 3. others -> connect the dummy node to all nodes that are not terminal nodes i.e. nodes w/o prizes
# 4. file -> connect the dummy node to a specific list of nodes provided in a file

# add the dummy node file to the volume only when dummy_mode is 'file'
if dummy_mode == 'file':
if dummy_nodes is None:
raise ValueError("dummy_nodes file is required when dummy_mode is set to 'file'")
bind_path, dummy_file = prepare_volume(dummy_nodes, work_dir)
volumes.append(bind_path)

out_dir = Path(output_file).parent
# Omics Integrator 1 requires that the output directory exist
out_dir.mkdir(parents=True, exist_ok=True)
@@ -139,9 +163,16 @@ def run(edges=None, prizes=None, dummy_mode=None, mu_squared=None, exclude_terms
'--outpath', mapped_out_dir,
'--outlabel', 'oi1']

# add the dummy mode argument
if dummy_mode is not None and dummy_mode:
# for custom dummy modes, add the file
if dummy_mode == 'file':
command.extend(['--dummy', dummy_file])
# else pass in the dummy_mode and let oi1 handle it
else:
command.extend(['--dummy', dummy_mode])

# Add optional arguments
if dummy_mode is not None:
command.extend(['--dummyMode', str(dummy_mode)])
if mu_squared is not None and mu_squared:
command.extend(['--musquared'])
if exclude_terms is not None and exclude_terms:
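The `dummy_mode` handling added to `run` validates the inputs and then appends a single `--dummy` argument, passing either the mapped dummy file path (for `file`) or the mode name itself. The sketch below isolates that branching in a pure function so the `file`-mode validation can be checked without starting a container; the PR's `test_oi1_missing_dummy` exercises the same `ValueError` through `run`. `build_dummy_args` is a hypothetical name, and in the real `run` the file argument is the container path returned by `prepare_volume`, not the host path.

```python
from typing import List, Optional


def build_dummy_args(dummy_mode: Optional[str],
                     dummy_file: Optional[str] = None) -> List[str]:
    """Hypothetical helper mirroring the dummy_mode branching in OmicsIntegrator1.run."""
    # `not dummy_mode` covers both None and the empty string, which is what
    # `dummy_mode is not None and dummy_mode` checks in the PR.
    if not dummy_mode:
        return []
    if dummy_mode == "file":
        # Same check the PR performs before preparing the volume
        if dummy_file is None:
            raise ValueError("dummy_nodes file is required when dummy_mode is set to 'file'")
        return ["--dummy", dummy_file]
    # terminals, all, others: pass the mode name through unchanged
    return ["--dummy", dummy_mode]


if __name__ == "__main__":
    command = ["--outlabel", "oi1"]  # tail of the real command, for illustration
    command.extend(build_dummy_args("terminals"))
    print(command)
```

Keeping the branching separate from the container call means the `file`, named-mode, and no-mode paths can each be unit tested in isolation.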
1 change: 1 addition & 0 deletions test/OmicsIntegrator1/input/oi1-dummy.txt
@@ -0,0 +1 @@
B
30 changes: 30 additions & 0 deletions test/OmicsIntegrator1/test_oi1.py
@@ -49,6 +49,7 @@ def test_oi1_all_optional(self):
# Include all optional arguments
OmicsIntegrator1.run(edges=TEST_DIR+'input/oi1-edges.txt',
prizes=TEST_DIR+'input/oi1-prizes.txt',
dummy_nodes=None,
dummy_mode='terminals',
mu_squared=True,
exclude_terms=True,
@@ -66,6 +67,23 @@ def test_oi1_all_optional(self):
r=0)
assert out_path.exists()

def test_oi1_dummy_file(self):
out_path = Path(OUT_FILE)
out_path.unlink(missing_ok=True)
# Include optional argument
OmicsIntegrator1.run(edges=TEST_DIR+'input/oi1-edges.txt',
prizes=TEST_DIR+'input/oi1-prizes.txt',
dummy_nodes=TEST_DIR + 'input/oi1-dummy.txt',
dummy_mode='file',
output_file=OUT_FILE,
w=5,
b=1,
d=10,
noise=0.333,
g=0.001,
r=0)
assert out_path.exists()

def test_oi1_missing(self):
# Test the expected error is raised when required arguments are missing
with pytest.raises(ValueError):
@@ -81,6 +99,18 @@ def test_oi1_missing(self):
b=1,
d=10)

def test_oi1_missing_dummy(self):
# Test the expected error is raised when the dummy_nodes file is missing and the dummy_mode is 'file'
with pytest.raises(ValueError):
# No dummy_nodes file
OmicsIntegrator1.run(edges=TEST_DIR+'input/oi1-edges.txt',
prizes=TEST_DIR + 'input/oi1-prizes.txt',
output_file=TEST_DIR+'output/test_optimalForest.sif',
w=5,
b=1,
d=10,
dummy_mode='file')

# Only run Singularity test if the binary is available on the system
# spython is only available on Unix, but do not explicitly skip non-Unix platforms
@pytest.mark.skipif(not shutil.which('singularity'), reason='Singularity not found on system')