Dummy node logic #187

Merged
merged 23 commits into from
Nov 23, 2024
Changes from 15 commits
6 changes: 4 additions & 2 deletions config/config.yaml
@@ -56,6 +56,7 @@ algorithms:
b: [5, 6]
w: np.linspace(0,5,2)
d: [10]
dummy_mode: ["terminals"]

- name: "omicsintegrator2"
params:
@@ -69,7 +70,7 @@ algorithms:

- name: "meo"
params:
-include: true
+include: true
run1:
max_path_length: [3]
local_search: ["Yes"]
@@ -101,6 +102,7 @@ datasets:
-
# Labels can only contain letters, numbers, or underscores
label: data0
# To run OmicsIntegrator1 with dummy nodes, add the dummy.txt file to node_files
node_files: ["node-prizes.txt", "sources.txt", "targets.txt"]
# DataLoader.py can currently only load a single edge file, which is the primary network
edge_files: ["network.txt"]
@@ -173,4 +175,4 @@ analysis:
# 'euclidean', 'manhattan', 'cosine'
metric: 'euclidean'
evaluation:
-include: true
+include: true
25 changes: 25 additions & 0 deletions input/README.md
@@ -18,6 +18,31 @@ C 2.5 True True
D 1.9 True True True
```

##### OmicsIntegrator1: Dummy Nodes
There are 4 dummy mode possibilities:
1. terminals -> connect the dummy node to all nodes that have been assigned prizes
2. all -> connect the dummy node to all nodes in the interactome i.e. full set of nodes in graph
3. others -> connect the dummy node to all nodes that are not terminal nodes i.e. nodes w/o prizes
4. file -> custom nodes - connect the dummy node to a specific list of nodes provided in a file

To support the `file` dummy node logic as part of OmicsIntegrator1, you can either add a separate `dummy.txt` file (and add it to the `node_files` argument in `config.yaml`) or add a `dummy` column node attribute to a file that contains `NODEID`, `prize`, `source`, etc.
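
As an illustration of the four modes, the sketch below computes which nodes the dummy node would be connected to in each case. The helper `dummy_neighbors` and its arguments are hypothetical; they are not part of SPRAS or OmicsIntegrator1, which performs this selection itself.

```python
def dummy_neighbors(mode, all_nodes, prized_nodes, file_nodes=None):
    """Return the set of nodes the dummy node connects to (illustrative only)."""
    all_nodes = set(all_nodes)
    prized = set(prized_nodes)
    if mode == "terminals":
        # nodes that have been assigned prizes
        return prized
    if mode == "all":
        # every node in the interactome
        return all_nodes
    if mode == "others":
        # nodes without prizes
        return all_nodes - prized
    if mode == "file":
        # custom list, e.g. the contents of dummy.txt
        if file_nodes is None:
            raise ValueError("dummy mode 'file' needs a list of nodes")
        return set(file_nodes)
    raise ValueError(f"unknown dummy mode: {mode!r}")


nodes = ["A", "B", "C", "D"]
prized = ["A", "C"]
for mode in ("terminals", "all", "others"):
    print(mode, sorted(dummy_neighbors(mode, nodes, prized)))
print("file", sorted(dummy_neighbors("file", nodes, prized, file_nodes=["B"])))
```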

If adding a separate `dummy.txt` file:
Make a file with the name `dummy.txt` and list the dummy nodes, one per line. Example:
```
A
B
C
```
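
A file in this format can be read back as a plain list of node identifiers. The following is a minimal sketch; `read_dummy_nodes` is a hypothetical helper, not a SPRAS function, and the temporary file only stands in for a real `dummy.txt`.

```python
from pathlib import Path
import tempfile


def read_dummy_nodes(path):
    """Return node IDs from a one-ID-per-line file, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    dummy_path = Path(tmp) / "dummy.txt"
    dummy_path.write_text("A\nB\nC\n")
    dummy_nodes = read_dummy_nodes(dummy_path)
    print(dummy_nodes)
```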

If adding the `dummy` column node attribute, then add the dummy column and specify boolean values for the `dummy` attribute:
```
NODEID prize sources targets dummy
A 1.0 True True True
B 3.3 True True
C 2.5 True True
D 1.9 True True True
```
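
To show how a `dummy` column reduces to a list of dummy nodes, here is a small sketch using only the standard library. It assumes the node file is tab-separated with empty cells for unset attributes; the table contents and the helper `dummy_node_ids` are illustrative and do not come from SPRAS.

```python
import csv
import io

# Illustrative tab-separated node table; empty cells mean the attribute is unset.
NODE_TABLE = (
    "NODEID\tprize\tsources\ttargets\tdummy\n"
    "A\t1.0\tTrue\t\tTrue\n"
    "B\t3.3\t\tTrue\t\n"
    "C\t2.5\tTrue\t\t\n"
    "D\t1.9\t\tTrue\tTrue\n"
)


def dummy_node_ids(table_text):
    """Return the NODEID of every row whose dummy column is true."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        row["NODEID"]
        for row in reader
        # a missing column or an empty cell counts as "not a dummy node"
        if (row.get("dummy") or "").strip().lower() == "true"
    ]


selected = dummy_node_ids(NODE_TABLE)
print(selected)
```

If the table has no `dummy` column at all, the helper returns an empty list, which corresponds to the empty dummy file case.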

A secondary format provides only a list of node identifiers and uses the filename as the node attribute, as in the example `sources.txt`.
This format may be deprecated.

1 change: 1 addition & 0 deletions input/dummy.txt
@@ -0,0 +1 @@
A
2 changes: 1 addition & 1 deletion input/node-prizes.txt
@@ -1,3 +1,3 @@
-NODEID prize active
+NODEID prize active dummy
A 2 true
C 5.7 true
4 changes: 2 additions & 2 deletions input/tps-egfr-prizes.txt
@@ -1,4 +1,4 @@
-NODEID prize sources targets active
+NODEID prize sources targets active dummy
1433Z_HUMAN 1.041379133 True True
41_HUMAN 3.389112802 True True
4ET_HUMAN 2.569973509 True True
@@ -181,7 +181,7 @@ EF1A1_HUMAN 3.774750081 True True
EF1B_HUMAN 0.768939794 True True
EF1D_HUMAN 1.240472409 True True
EFNB2_HUMAN 2.222686177 True True
-EGF_HUMAN 10 True True
+EGF_HUMAN 10 True True True
EGFR_HUMAN 6.787874699 True True
EGLN1_HUMAN 1.876580206 True True
EIF3B_HUMAN 2.048949271 True True
37 changes: 33 additions & 4 deletions spras/omicsintegrator1.py
@@ -47,7 +47,7 @@ def write_conf(filename=Path('config.txt'), w=None, b=None, d=None, mu=None, noi

 """
 class OmicsIntegrator1(PRM):
-    required_inputs = ['prizes', 'edges']
+    required_inputs = ['prizes', 'edges', 'dummy_nodes']

     @staticmethod
     def generate_inputs(data, filename_map):
@@ -83,13 +83,23 @@ def generate_inputs(data, filename_map):
                 columns=['Interactor1','Interactor2','Weight','Direction'],
                 header=['protein1','protein2','weight','directionality'])

+        # creates the dummy_nodes file
+        if 'dummy' in data.node_table.columns:
+            dummy_df = data.node_table[data.node_table['dummy'] == True]
+            # save as list of dummy nodes
+            dummy_df.to_csv(filename_map['dummy_nodes'], index=False, columns=['NODEID'], header=None)
+        else:
+            # create empty dummy file
+            with open(filename_map['dummy_nodes'], mode='w'):
+                pass
+

     # TODO add parameter validation
     # TODO add support for knockout argument
     # TODO add reasonable default values
     # TODO document required arguments
     @staticmethod
-    def run(edges=None, prizes=None, dummy_mode=None, mu_squared=None, exclude_terms=None,
+    def run(edges=None, prizes=None, dummy_nodes=None, dummy_mode=None, mu_squared=None, exclude_terms=None,
             output_file=None, noisy_edges=None, shuffled_prizes=None, random_terminals=None,
             seed=None, w=None, b=None, d=None, mu=None, noise=None, g=None, r=None, container_framework="docker"):
         """
@@ -118,6 +128,18 @@ def run(edges=None, prizes=None, dummy_mode=None, mu_squared=None, exclude_terms
         bind_path, prize_file = prepare_volume(prizes, work_dir)
         volumes.append(bind_path)

+        # 4 dummy mode possibilities:
+        #   1. terminals -> connect the dummy node to all nodes that have been assigned prizes
+        #   2. all -> connect the dummy node to all nodes in the interactome i.e. full set of nodes in graph
+        #   3. others -> connect the dummy node to all nodes that are not terminal nodes i.e. nodes w/o prizes
+        #   4. file -> custom nodes - connect the dummy node to a specific list of nodes provided in a file
+
+        # add dummy node file to the volume if dummy_mode is not None and it is 'file'
+        if dummy_mode is not None and dummy_mode == 'file':
+            # needs to use dummy node file that was put in the dataset
+            bind_path, dummy_file = prepare_volume(dummy_nodes, work_dir)
+            volumes.append(bind_path)
+
         out_dir = Path(output_file).parent
         # Omics Integrator 1 requires that the output directory exist
         out_dir.mkdir(parents=True, exist_ok=True)
@@ -139,9 +161,16 @@ def run(edges=None, prizes=None, dummy_mode=None, mu_squared=None, exclude_terms
                    '--outpath', mapped_out_dir,
                    '--outlabel', 'oi1']

+        # add the dummy mode argument
+        if dummy_mode is not None and dummy_mode:
+            # for custom dummy modes, add the file
+            if dummy_mode == 'file':
+                command.extend(['--dummy', dummy_file])
+            # else pass in the dummy_mode and let oi1 handle it
+            else:
+                command.extend(['--dummy', dummy_mode])
+
         # Add optional arguments
-        if dummy_mode is not None:
-            command.extend(['--dummyMode', str(dummy_mode)])
         if mu_squared is not None and mu_squared:
             command.extend(['--musquared'])
         if exclude_terms is not None and exclude_terms:
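
The argument handling in `run` can be read as a small pure function, which may be easier to test in isolation. This is a sketch under assumptions, not the SPRAS implementation: `dummy_arguments` is a hypothetical helper, the command prefix is a placeholder, and the flag name is left as a parameter because this diff shows both `--dummy` and `--dummyMode`.

```python
def dummy_arguments(dummy_mode, dummy_file=None, flag="--dummy"):
    """Build the dummy-node portion of the command line (illustrative only)."""
    if not dummy_mode:
        # None or an empty string: pass nothing and keep the tool's default
        return []
    if dummy_mode == "file":
        # custom mode: pass the path of the mounted dummy node file
        if dummy_file is None:
            raise ValueError("dummy_mode 'file' requires a dummy node file")
        return [flag, str(dummy_file)]
    # built-in modes are passed through unchanged
    return [flag, str(dummy_mode)]


command = ["oi1-placeholder"]  # hypothetical command prefix
command.extend(dummy_arguments("terminals"))
print(command)
```

Failing early when `file` mode has no file avoids referencing an undefined path later, since the dummy file is only mounted in that mode.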