From 9f74c08f059995c1d5589fc3d861b7718a9cd81c Mon Sep 17 00:00:00 2001
From: Oliver Faulkner Anderson
Date: Mon, 15 Jul 2024 15:59:36 -0700
Subject: [PATCH] Deployed 3916bb0 with MkDocs version: 1.6.0

---
 contributing-guide/index.html | 373 +++++++++++++++++++---------------
 index.html                    |   2 +-
 search/search_index.json      |   2 +-
 setup/index.html              |  18 ++
 sitemap.xml.gz                | Bin 127 -> 127 bytes
 5 files changed, 229 insertions(+), 166 deletions(-)

diff --git a/contributing-guide/index.html b/contributing-guide/index.html
index 3d3021aa..0d4b8ae4 100644
--- a/contributing-guide/index.html
+++ b/contributing-guide/index.html
@@ -151,31 +151,24 @@

Step 2: Data Import

The following section will be using a bash terminal to set up the Dockerized Neo4j environment.

1. Open the Docker Desktop application.

2. Navigate to a terminal window and pull the official Neo4j Docker image with the following command:

    ```bash
    docker pull neo4j
    ```

3. Create a folder in your root directory named `neo4j`.

4. Within the new `~/neo4j` directory create the following directories:
    - `~/neo4j/data/` to allow storage of database state between Docker instances
    - `~/neo4j/logs/` to allow storage of logs between Docker instances
    - `~/neo4j/import/` to store data for import
    - `~/neo4j/plugins/` to store any necessary plugins for production environments

5. Copy over all of the files in the cloned ProteinWeaver `/data/tutorial` directory to `~/neo4j/import/`.

6. Create a Neo4j Docker instance with the GDS and APOC plugins using the following command:

    ```bash
    docker run \
        --name proteinweaver \
        -p7474:7474 -p7687:7687 \
        -v $HOME/neo4j/data:/data \
        -v $HOME/neo4j/logs:/logs \
        -v $HOME/neo4j/import:/import \
        -v $HOME/neo4j/plugins:/plugins \
        --env NEO4J_AUTH=none \
        -e NEO4J_apoc_export_file_enabled=true \
        -e NEO4J_apoc_import_file_enabled=true \
        -e NEO4J_apoc_import_file_use__neo4j__config=true \
        -e NEO4J_PLUGINS='["graph-data-science"]' \
        -e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
        neo4j:5.12.0-community-bullseye
    ```
    This docker instance has no security restrictions. To set a username and password, replace `--env NEO4J_AUTH=none` with `--env NEO4J_AUTH=username/password`.

7. Access the docker image at http://localhost:7474 in your browser.

8. Once in the Neo4j Browser, create constraints before data import. We use NCBI as the source of the unique taxon identifiers.

    Create a constraint for the proteins in the database, requiring that only one instance of each protein exists:

    ```cypher
    CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE;
    ```

    Create a constraint for the GO terms in the database using the following command:

    ```cypher
    CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE;
    ```
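
    To verify that both constraints were registered, you can list them with Neo4j's built-in command:

    ```cypher
    SHOW CONSTRAINTS;
    ```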

9. Import the D. rerio protein interactome with the following command:

    ```cypher
    :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish
    FIELDTERMINATOR '\t'
    CALL {
        WITH zfish
        MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, txid: "txid7955", species: "Danio rerio"})
        MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, txid: "txid7955", species: "Danio rerio"})
        MERGE (a)-[r:ProPro]-(b)
    } IN TRANSACTIONS OF 100 ROWS;
    ```
     
10. Set a relationship property for the evidence:

    ```cypher
    :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish
    FIELDTERMINATOR '\t'
    CALL {
        WITH zfish
        MATCH (s:protein {id: zfish.uniprotID1, txid: "txid7955"})-[r:ProPro]-(t:protein {id: zfish.uniprotID2, txid: "txid7955"})
        SET r.evidence = zfish.evidence
    } IN TRANSACTIONS OF 1000 ROWS;
    ```
     
11. Add GO data to D. rerio nodes:

    ```cypher
    :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo
    FIELDTERMINATOR '\t'
    CALL {
        WITH zfishgo
        MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: "txid7955"})
        MERGE (g:go_term {id: zfishgo.GO_TERM})
        MERGE (n)-[r:ProGo]-(g)
    } IN TRANSACTIONS OF 1000 ROWS;
    ```
     
    -
12. Set the qualifier property for D. rerio:

    ```cypher
    :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo
    FIELDTERMINATOR '\t'
    CALL {
        WITH zfishgo
        MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: "txid7955"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM})
        SET r.relationship = zfishgo.QUALIFIER
    } IN TRANSACTIONS OF 1000 ROWS;
    ```
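
    As a quick sanity check that the import steps above populated the graph, you can count the matched edges and the properties just set (a sketch; note that the undirected pattern matches each ProPro edge from both endpoints, so the counts are doubled):

    ```cypher
    // COUNT(r.evidence) only counts matches where the property is non-null
    MATCH (:protein {txid: 'txid7955'})-[r:ProPro]-()
    RETURN COUNT(r) AS proProMatches, COUNT(r.evidence) AS withEvidence;
    ```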
     
13. The last step is calling a graph projection for the pathfinding algorithms. We also have to make the ProPro edges undirected so that the pathfinding algorithms are more biologically accurate for protein-protein interaction networks:

    ```cypher
    CALL gds.graph.project('proGoGraph', ['go_term', 'protein'], ['ProGo', 'ProPro']);
    CALL gds.graph.relationships.toUndirected(
        'proGoGraph',
        {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'}
    ) YIELD inputRelationships, relationshipsWritten;
    ```
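
    Once the projection exists, you can sanity-check it with a pathfinding call. Below is a minimal sketch using GDS's source-target Dijkstra; the two protein IDs are placeholders, so substitute IDs that exist in your import:

    ```cypher
    MATCH (source:protein {id: 'P0XXXX1', txid: 'txid7955'})   // placeholder ID
    MATCH (target:protein {id: 'P0XXXX2', txid: 'txid7955'})   // placeholder ID
    CALL gds.shortestPath.dijkstra.stream('proGoGraph', {
        sourceNode: source,
        targetNode: target,
        relationshipTypes: ['ProProUndirected']
    })
    YIELD totalCost, nodeIds
    RETURN totalCost,
           [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS pathProteins;
    ```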

Useful Commands:

1. Drop graph projection: `CALL gds.graph.drop('proGoGraph') YIELD graphName;`

2. Drop constraints: `DROP CONSTRAINT txid_constraint;` and `DROP CONSTRAINT go_constraint;`

3. Delete nodes: `MATCH (n:protein {txid: 'txid7955'}) DETACH DELETE n;`

4. Show database information: `:schema`

Step 3: Create a New Query in Neo4j

Now that you have imported the D. rerio interaction network and annotations, it's time to explore the network and generate a new query that interests you.

First practice with some example commands:

1. Count how many nodes there are in the database: `MATCH (n) RETURN COUNT(n);`

2. Now count how many protein nodes there are: `MATCH (n:protein) RETURN COUNT(n);`

3. Return the first 25 nodes in the zebrafish txid: `MATCH (n:protein {txid: 'txid7955'}) RETURN n LIMIT 25;`

4. Retrieve all the species in the database: `MATCH (n:protein) RETURN COLLECT(DISTINCT n.species);`

5. Find nodes with a ProGo relationship (limit 25): `MATCH (p)-[r:ProGo]->(g) RETURN p, r, g LIMIT 25;`

6. Return the relationship qualifier property for the ProGo relationship (limit 25): `MATCH (p)-[r:ProGo]->(g) RETURN r.relationship LIMIT 25;`

7. Update a property of an existing node (for fun): `MATCH (n:protein {species: 'Danio rerio'}) SET n.species = 'Ranio derio';`

8. Set the species property back to the proper one: `MATCH (n:protein {species: 'Ranio derio'}) SET n.species = 'Danio rerio';`

9. Now it is your turn to devise a new Cypher query. Your query should end in a RETURN statement rather than change a property. We will use this query in the next step to create a new webpage that returns and presents the results of this query on ProteinWeaver's user interface.
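
    If you need inspiration, here is one possible query of the kind the later steps expect (a sketch, not a required answer); it returns the ten zebrafish proteins with the most ProPro interactions. Returning whole nodes, rather than single properties, also lines up with the record.get('n') extraction used later in Step 6:

    ```cypher
    MATCH (n:protein {txid: 'txid7955'})-[r:ProPro]-()
    WITH n, COUNT(r) AS degree
    ORDER BY degree DESC
    LIMIT 10
    RETURN n;
    ```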

Step 4: Setting up Local Development

Now that you have the Neo4j database up and running, and you have a query that you are interested in, we will now set up the frontend and backend for local development.

Backend Server

1. Open up a terminal window and go to the server directory inside the protein-weaver directory.

2. We want to install npm, which is responsible for building the necessary packages of the server. We will use a version manager for node, called nvm. This is helpful as it allows you to install multiple versions of node. More information about nvm can be found here. Enter the following commands in your terminal:

    ```bash
    export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")"
    [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm

    nvm use

    nvm install

    npm install

    npm start # This starts our node.js server for our backend
    ```
3. If everything goes smoothly, you will get a message saying “Server listening on http://localhost:3000/”.

4. If you also want to test that the API functionality is working, you can go to the following URL and it should say that you have successfully connected to the backend API: http://localhost:3000/api/test
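
    The same check can be done from a terminal (assuming the server from the previous step is still running):

    ```bash
    curl http://localhost:3000/api/test
    ```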

Frontend

1. Open up another terminal window, and go to the client directory in the protein-weaver directory.

2. Enter the following commands in the terminal window:

    ```bash
    nvm use

    nvm install

    npm install

    npm run dev # This will start our frontend instance
    ```
     
3. If everything goes smoothly, you should be greeted with a message from VITE as well as a message indicating that it is running on http://localhost:5173/.

To summarize, we have set up Neo4j and populated the database with D. rerio, created a query that we are interested in, and then set up the backend and frontend of ProteinWeaver for local development. The three localhost URLs are found below:

- Neo4j: http://localhost:7474/browser/
- Backend: http://localhost:3000/api/test
- Frontend: http://localhost:5173/

Step 5: Create a New Page with Query

Create New API Call

This section aims to create a new API call in the backend utilizing the Neo4j query you made previously. Before we start implementing a new API call, it is important to have a better understanding of what the backend codebase looks like for ProteinWeaver. We will go through the important files in the backend:

/src

Within the server directory, the src directory contains important files that set up the node.js server. You will generally never need to make changes within this folder. index.js is responsible for initializing the node.js server and the Neo4j driver that will be used to make the connection to the database. The neo4j.js file contains the Neo4j driver. constants.js stores variables including ports, URLs, and Neo4j credentials.

.env

Within the server folder, we also have a file called .env which outlines the Neo4j credentials for authentication with our database.
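
As a rough sketch, the contents might look like the following (these variable names are hypothetical; check constants.js for the names the code actually reads):

```bash
# Hypothetical server/.env contents -- the real key names are defined by constants.js
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neo4j
```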

/routes

The routes folder contains routes.js, which houses all the API calls we use for ProteinWeaver. The router can take in multiple requests, including POST or GET requests. It is helpful to understand the general structure of setting up an API call, and we will use the example below. This API call is responsible for, given a list of nodes, providing us with the average degree value.
```js
// Example of an API call in routes.js

router.post("/getAvgDegree", jsonParser, async (req, res, next) => {
    const data = req.body;
    const nodeList = data.nodeList;
    const species = data.species;

    try {
        const avgDegreeService = new AvgDegreeService(getDriver());
        const avgDegree = await avgDegreeService.getAvgDegree(species, nodeList);
        console.log("Average Degree:");
        console.log(avgDegree);
        res.json(avgDegree);
    } catch (e) {
        next(e);
    }
});
```
- We use the router.post() function to create a new POST API call.
- It takes in three parameters: first the API call’s URL, then the parser we use, and then the request, response, and next variables.
- The req.body holds the information that the API caller has provided. This usually comes in the form of a JSON request body, and in this case it is the following body:
  `{"nodeList": ["FBgn0003731","FBgn0031972","FBgn0264492","FBgn0000499","FBgn0001139"],"species": "txid7227"}`
- The try-catch statement is used to capture potential errors and throw them in an appropriate manner.
    - The try portion of the statement creates a new variable called avgDegreeService by using the class AvgDegreeService. This class is defined in a file called avg.degree.service.js in the /services folder, and it is responsible for utilizing the Neo4j driver, creating a query call with some parameters, and getting the response. The class contains the function getAvgDegree, which takes in two parameters: species and nodeList.
    - We use the await keyword because this is a type of Promise. This essentially tells the program to wait until we get the output from the avgDegreeService.getAvgDegree() function.
- Finally, we set the response in res.json to be the variable avgDegree.
/services

The services folder contains the heart of all the dependent functions the routes.js file needs. This is where you will be adding a new Neo4j query as a function that will then be called into a new route in routes.js. Before that, it is helpful to understand the general structure of what a service file is, and we will use avg.degree.service.js as an example.

```js
// avg.degree.service.js file

export default class AvgDegreeService {
    /**
     * @type {neo4j.Driver}
     */
    driver;

    /**
     * The constructor expects an instance of the Neo4j Driver, which will be
     * used to interact with Neo4j.
     *
     * @param {neo4j.Driver} driver
     */
    constructor(driver) {
        this.driver = driver;
    }

    async getAvgDegree(speciesInput, nodeList) {
        const session = this.driver.session();
        const res = await session.executeRead((tx) =>
            tx.run(
                `
                MATCH (p:protein {txid: $speciesInput})
                WHERE p.id IN toStringList($nodeList)
                WITH p
                MATCH (p)-[r:ProPro]-()
                WITH p, count(r) as degree
                RETURN avg(degree) as averageDegree;
                `,
                {
                    speciesInput: speciesInput,
                    nodeList: nodeList,
                }
            )
        );
        const deg = res.records;
        await session.close();
        return deg;
    }
}
```
- This file creates a class called AvgDegreeService, and requires the Neo4j driver we initialized in src/neo4j.js as a variable in the constructor.
- We create an async method (which is why we need the await keyword when we call the method) called getAvgDegree, which takes in the two parameters.
- You first have to initialize the Neo4j driver session, and then we execute a read on the database with a Neo4j query.
- Everything inside tx.run() is where you place the Neo4j query. Notice that within the query, we use variables for the txid and the nodeList. These variables are paired with their values in the object after the Neo4j query.
- Finally, we close the Neo4j session and return the res.records in a variable.

Testing API using Postman

postman_example

We can test this API call in many ways, but one that is common is using Postman. Postman allows you to create API requests without the need of a frontend server. You can download the app or use the browser. We will test out the getAvgDegree API call with the following steps:

- Create a new workspace in Postman.
- Select POST as the request type, and use http://localhost:3000/api/getAvgDegree as the URL.
- We need to set the body of the request. Navigate to the body tab and set the body as raw and JSON. Now use the following example as the input:
  `{"nodeList": ["FBgn0003731","FBgn0031972","FBgn0264492","FBgn0000499","FBgn0001139"],"species": "txid7227"}`
- When you are ready, click the send button. If it is successful you should get a "200 OK" response and, within the response body, a value of 354.4 for the average node degree.
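
If you prefer the command line, an equivalent request can be sent with curl (assuming the backend from Step 4 is running):

```bash
curl -X POST http://localhost:3000/api/getAvgDegree \
  -H "Content-Type: application/json" \
  -d '{"nodeList": ["FBgn0003731","FBgn0031972","FBgn0264492","FBgn0000499","FBgn0001139"], "species": "txid7227"}'
```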
Below is a visualization that summarises the key parts of the backend server. Now that you have a better understanding of how API calls are made and how to test them, we can now implement a new API call that will use the Neo4j query you made previously.

ProteinWeaver_Backend

Adding new API Call

1. Create a new file in the services directory (a minimal sketch of this file and the matching route follows this list).
    - You can duplicate the avg.degree.service.js file and rename it to something that represents your query.
    - Within the file, rename the class name to something that represents your query.
    - Rename the method “getAvgDegree” to something that represents your query.
    - Change the parameters of the method to include what you need for your query (you may not need any parameters if you are hardcoding a query).
    - Place your Neo4j query inside of tx.run().
    - You can delete the part where speciesInput and nodeList are paired if you do not have any parameters. If you do have parameters, make sure you pair the parameters properly with the Neo4j query.
    - You are now done with setting up your service file for your API call.

2. Create a new API call in routes.js.
    - You can use the /getAvgDegree API call as reference.
    - Set the API URL to a name that represents your query.
    - If your API call will need some parameters, set the correct variables in the request body, just like how getAvgDegree did it with nodeList and species.
    - Create a new instance of the service class you made previously, like AvgDegreeService, with the Neo4j driver.
    - Call your method in the service class, making sure that if you need the parameters, you order them correctly.
    - Finally, make sure the res.json function has the correct variable.

3. Test out your API call using Postman.
    - All API calls in ProteinWeaver go under the following URL; simply add your API call after the last slash: http://localhost:3000/api/
    - Ensure that you are sending the request as a POST request.
    - If you require parameters in your API call, make sure to set the body, configure it as raw and JSON mode, and then ensure the JSON body is in the correct format (see the earlier Postman example).
    - If you get a "200 OK" response and you’ve inspected the response body to be what you expect, then you have completed the backend portion.
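
To make the checklist above concrete, here is a minimal sketch of the two pieces, assuming a hypothetical parameterless query exposed as /newQuery (the file, class, and route names are placeholders; adapt them to your own query):

```js
// new.query.service.js (hypothetical name) -- a service with no parameters
export default class NewQueryService {
    constructor(driver) {
        this.driver = driver; // Neo4j driver instance from src/neo4j.js
    }

    async getNewQuery() {
        const session = this.driver.session();
        const network = await session.executeRead((tx) =>
            tx.run(
                `
                MATCH (n:protein {txid: 'txid7955'})
                RETURN n LIMIT 10;
                `
            )
        );
        await session.close();
        // return only the node data, matching the extraction used in Step 6
        return network.records.map((record) => record.get("n"));
    }
}

// routes.js addition (sketch) -- no request body is needed, so a GET route also works
router.get("/newQuery", async (req, res, next) => {
    try {
        const newQueryService = new NewQueryService(getDriver());
        const result = await newQueryService.getNewQuery();
        res.json(result);
    } catch (e) {
        next(e);
    }
});
```

A plain fetch("/api/newQuery") from the frontend issues a GET request, which is why this sketch uses router.get; if you follow the POST pattern from getAvgDegree instead, remember to pass method: "POST" (and a body, if needed) to fetch.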
Step 6: Add a New Page

Now that we have linked the backend with the Neo4j database through the API call, we will create a React webpage with a button that lets a user execute our new query. Here is a general overview of adding a new page and a new API query: navigate to client/src/pages and create a new page named NewPage.jsx. Examine the other pages in this directory and copy the content from TestingPage.jsx into the blank NewPage.jsx, then replace TestingPage() with the name of the new page you created: NewPage().

Add Button to Execute Query

1. Navigate to client/src/main.jsx and add the NewPage component to the main website by importing it and creating a route. Import the component by adding this below the other import statements: import NewPage from "./pages/NewPage.jsx";. Copy one of the route snippets and replace the path and element with "/newpage" and <NewPage />.

2. Navigate to client/src/components/ and add a new component by creating a page named NewQuery.jsx. This document will be where we add the API query and do other styling. Copy these imports to the top of the page and create the NewQuery component:

    ```js
    import React, { useState, useEffect } from "react";

    // create component
    export default function NewQuery() { };
    ```

3. Now go back to the first page you created, NewPage.jsx. Import the NewQuery component with import NewQuery from "../components/NewQuery.jsx";. Within the central <div></div> add <NewQuery /> to place the component within the NewPage.

4. Go to the previous service that you created with your own Neo4j query from earlier. Modify the return statement within the first try section of your service to return network.records.map((record) => record.get('n')); to extract only the data on the nodes that your query returned.

5. Finally, add a useEffect hook that will execute your API query when you load the page. Inside of the set of "{ }" brackets in NewQuery() { }, copy the following code to execute your query on refresh:
    ```js
    // create empty object to store query results
    const [nodeNames, setNodeNames] = useState([]);

    // execute query on page reload
    useEffect(() => {
        fetch("/api/newQuery")
            .then((res) => res.json())
            .then((data) => {
                const names = data.map((item) => item.properties.name); // extract just names
                setNodeNames(names);
            })
            .catch((error) => {
                console.error("Error fetching network data:", error);
            });
    }, []);

    // display the node names in the console (right click and inspect element)
    console.log(nodeNames);
    ```
     

    You can check the structure of your query response in the running server terminal. Using the object hierarchy displayed there, we extracted just the "name" property in the useEffect hook for displaying. You should now have a blank page at http://localhost:5173/newpage that allows you to see the names of the nodes returned by your Neo4j query in the console when you inspect the page element.
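
For orientation, each element of data here is a Neo4j driver node object, so one record looks roughly like the sketch below (the property values are placeholders; the exact shape depends on what your query RETURNs):

```js
// Rough shape of one element of `data` when the query returns protein nodes
const exampleNode = {
    identity: 42,                // internal Neo4j node id
    labels: ["protein"],
    properties: {
        id: "P0XXXX1",           // placeholder UniProt id
        name: "examplegene",     // placeholder gene name
        txid: "txid7955",
        species: "Danio rerio",
    },
};
```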

Add Button to Execute Query

1. Now we will add the ability for users to execute the query on demand rather than when refreshing the page. To do this, first we will modify the useEffect statement and make it a function:

    ```js
    // Function for submitting the query
    async function handleNewQuery(e) {
        setNodeNames([]); // reset upon execution
        e.preventDefault(); // prevent default form submission

        // copied exactly from the useEffect statement
        fetch("/api/newQuery")
            .then((res) => res.json())
            .then((data) => {
                const names = data.map((item) => item.properties.name);
                setNodeNames(names);
            })
            .catch((error) => {
                console.error("Error fetching network data:", error);
            });

        // functions must return something; since we executed everything and assigned node names already, we just return
        return;
    }
    ```
     
2. Next we will create a New Query button that executes our new function when clicked. Place this inside of the { } brackets of NewQuery() { } after everything else. A React component is like any other function: it must end in a return statement. The return statement holds everything that the user will actually interact with and is where we will style things as well.

    ```js
    return (
        <div>
            <button onClick={handleNewQuery}>New Query</button>
        </div>
    );
    ```
     

Now we should have a button that will log the node results to the console only after we have pressed it.

3. Now let's display the information to the users without having to inspect the element. Copy the following code below the <button></button> inside of the <div></div>:

    ```js
    {nodeNames.map((name, index) => (
        <p key={index}>{index + 1}: {name}</p>
    ))}
    ```
     

    We are now displaying a list of the node names ordered by their index.
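
For reference, pulling the snippets together, the finished component might look roughly like this (a sketch; your route name will differ):

```js
import React, { useState } from "react";

export default function NewQuery() {
    // empty object to store query results
    const [nodeNames, setNodeNames] = useState([]);

    // function for submitting the query
    async function handleNewQuery(e) {
        setNodeNames([]); // reset upon execution
        e.preventDefault(); // prevent default form submission
        fetch("/api/newQuery")
            .then((res) => res.json())
            .then((data) => {
                const names = data.map((item) => item.properties.name);
                setNodeNames(names);
            })
            .catch((error) => {
                console.error("Error fetching network data:", error);
            });
        return;
    }

    // the button plus the numbered list of returned node names
    return (
        <div>
            <button onClick={handleNewQuery}>New Query</button>
            {nodeNames.map((name, index) => (
                <p key={index}>{index + 1}: {name}</p>
            ))}
        </div>
    );
}
```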


    Congratulations, you have now created a new webpage with full connection to the Neo4j database!

Add New Page Icon to NavBar

Let's finish off by doing some styling and adding a new icon to the NavBar.

1. Navigate to client/src/components/NavBar.jsx and copy one of the <li></li> snippets and paste it below another. Create a new link to your page by replacing the old link with <Link to={`/newpage`}>.

2. Now rename the icon by typing "New" within the <div></div>.

3. Next, navigate to https://react-icons.github.io/react-icons/ and choose your favorite icon. I will be using the GiTigerHead icon for mine!

4. Add the relevant import statement to the top of the NavBar page: import { GiTigerHead } from "react-icons/gi";.

5. Finally, replace the icon component in the code that you copied from earlier with the name of the new one. In my case I put <GiTigerHead />.

Congratulations, you have now completed the contributing guide!

    diff --git a/index.html b/index.html index 4f93fd88..7394f44e 100644 --- a/index.html +++ b/index.html @@ -140,5 +140,5 @@

    Website Overview

    diff --git a/search/search_index.json b/search/search_index.json index 1a01e3b4..809e90f6 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to ProteinWeaver Docs ProteinWeaver is a web interface for ontology-based protein network visualization. Background & Motivation Being able to explore how proteins are connected to other proteins with a specific function is a great tool for a biologists, as it allows them to quickly generate hypotheses that seeks to answer the ways that a protein is connected to a pathway or process. ProteinWeaver provides the tools for this type of exploration via an intuitive website that easily lets users query a protein and a specific function or process (as a gene ontology term ). Website Overview ProteinWeaver allows the users to input a protein of their interest, a specific function or process ( gene ontology term ), and the number of paths to output in the network. This generates a subnetwork that connects the protein of interest to the k shortest paths that include a protein labeled with the specific GO term. The network's information is summarised, including GO term description, links to proteins' and GO term AmiGO entry, and GO term qualifiers of the proteins. Exploration is possibly by easily interacting with the graph and setting new nodes as the protein of interest. Queries are easily reproduced through exporting a log history of all queries and explorations done in a session, and exporting networks via images.","title":"Home"},{"location":"#welcome-to-proteinweaver-docs","text":"ProteinWeaver is a web interface for ontology-based protein network visualization.","title":"Welcome to ProteinWeaver Docs"},{"location":"#background-motivation","text":"Being able to explore how proteins are connected to other proteins with a specific function is a great tool for a biologists, as it allows them to quickly generate hypotheses that seeks to answer the ways that a protein is connected to a pathway or process. ProteinWeaver provides the tools for this type of exploration via an intuitive website that easily lets users query a protein and a specific function or process (as a gene ontology term ).","title":"Background & Motivation"},{"location":"#website-overview","text":"ProteinWeaver allows the users to input a protein of their interest, a specific function or process ( gene ontology term ), and the number of paths to output in the network. This generates a subnetwork that connects the protein of interest to the k shortest paths that include a protein labeled with the specific GO term. The network's information is summarised, including GO term description, links to proteins' and GO term AmiGO entry, and GO term qualifiers of the proteins. Exploration is possibly by easily interacting with the graph and setting new nodes as the protein of interest. Queries are easily reproduced through exporting a log history of all queries and explorations done in a session, and exporting networks via images.","title":"Website Overview"},{"location":"contributing-guide/","text":"Contributing Guide This is the guide for getting started with ProteinWeaver and will set you up to contribute to whichever aspects of ProteinWeaver interest you. Step 1: Fork & Installation ProteinWeaver uses a Dockerized version of Neo4j as the database. Follow these instructions to install Docker Desktop. 
We will also be using GitHub to contribute to ProteinWeaver. It is recommended to install GitHub Desktop because of its easy user interface. Then you will need to fork the contributing-guide branch of the ProteinWeaver GitHub repository to get the Zebrafish datasets and the base code for the front and backends in your own repository. Once forked, clone the repository to your local desktop so that you have access to ProteinWeaver locally. Step 2: Data Import The following section will be using a bash terminal to set up the Dockerized Neo4j environment. Open the Docker Desktop application. Navigate to a terminal window and pull the official Neo4j Docker image with the following command: docker pull neo4j Create a folder in your root directory named neo4j : Within the new ~/neo4j directory create the following directories: ~/neo4j/data/ to allow storage of database state between Docker instances ~/neo4j/logs/ to allow storage of logs between Docker instances ~/neo4j/import/ to store data for import ~/neo4j/plugins/ to store any necessary plugins for production environments Copy over all of the files in the cloned ProteinWeaver /data/tutorial directory to ~/neo4j/import/ . Create a Neo4j Docker instance with GDS and APOC plugins using the following command: docker run \\ --name proteinweaver \\ -p7474:7474 -p7687:7687 \\ -v $HOME/neo4j/data:/data \\ -v $HOME/neo4j/logs:/logs \\ -v $HOME/neo4j/import:/import \\ -v $HOME/neo4j/plugins:/plugins \\ --env NEO4J_AUTH=none \\ -e NEO4J_apoc_export_file_enabled=true \\ -e NEO4J_apoc_import_file_enabled=true \\ -e NEO4J_apoc_import_file_use__neo4j__config=true \\ -e NEO4J_PLUGINS='[\"graph-data-science\"]' \\ -e NEO4JLABS_PLUGINS=\\[\\\"apoc\\\"\\] \\ neo4j:5.12.0-community-bullseye This docker instance has no security restrictions, to change username and password edit: --env NEO4J_AUTH=username/password Access the docker image at http://localhost:7474 in your browser. Once in the Neo4j Browser, create constraints before data import. We use NCBI as the source of the unique taxon identifiers. CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE; Create a constraint for the GO terms in the database using the following command: CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE; Import D. rerio protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Set a relationship property for the evidence :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MATCH (s:protein {id: zfish.uniprotID1, txid: \"txid7955\"})-[r:ProPro]-(t:protein {id: zfish.uniprotID2, txid: \"txid7955\"}) SET r.evidence = zfish.evidence } IN TRANSACTIONS OF 1000 ROWS; Add GO data to D. rerio nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"}) MERGE (g:go_term {id: zfishgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for D. rerio . 
:auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM}) SET r.relationship = zfishgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks. CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']); CALL gds.graph.relationships.toUndirected( 'proGoGraph', {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'} ) YIELD inputRelationships, relationshipsWritten; Useful Commands: Drop graph projection: CALL gds.graph.drop('proGoGraph') YIELD graphName; Drop constraints: DROP CONSTRAINT txid_constraint; DROP CONSTRAINT go_constraint; Delete nodes: MATCH (n:protein {txid: 'txid7955'}) DETACH DELETE n; Show database information: :schema Step 3: Create a New Query in Neo4j Now that you have imported the D. rerio interaction network and annotations. It's time to explore the network and generate a new interesting query to you. First practice with some example commands: Count how many nodes there are in the database: MATCH (n) RETURN COUNT(n); Now count how many protein nodes there are: MATCH (n:protein) RETURN COUNT(n); Return the first 25 nodes in the zebrafish txid: MATCH (n:protein {txid: 'txid7955'}) RETURN n LIMIT 25; Retrieve all the species in the database: MATCH (n:protein) RETURN COLLECT(DISTINCT n.species); Find nodes with a ProGo relationship (limit 25): MATCH (p)-[r:ProGo]->(g) RETURN p, r, g LIMIT 25; Return the relationship qualifier property for the ProGo relationship (limit 25): MATCH (p)-[r:ProGo]->(g) RETURN r.relationship LIMIT 25; Update property of existing node (for fun): MATCH (n:protein {species: 'Danio rerio'}) SET n.species = 'Ranio derio'; Set species property back to proper one: MATCH (n:protein {species: 'Ranio derio'}) SET n.species = 'Danio rerio'; Now it is your turn to devise a new Cypher query. Your query should end in a RETURN statement rather than change a property. We will use this query in the next step to create a new webpage that returns and presents the results of this query on ProteinWeaver's user interface. Step 4: Setting up Local Development Now that you have the Neo4j database up and running, and you have a query that you are interested in, we will now set up the frontend and backend for local development Backend Server Open up a terminal window and go to the server director inside the protein-weaver directory. We want to install npm which is responsible for building the necessary packages of the server. We will use a version manager for node, called nvm. This is helpful as it allows you to install multiple versions of node. More information about nvm can be found here . Follow the following commands in your terminal export NVM_DIR=\"$([ -z \"${XDG_CONFIG_HOME-}\" ] && printf %s \"${HOME}/.nvm\" || printf %s \"${XDG_CONFIG_HOME}/nvm\")\" [ -s \"$NVM_DIR/nvm.sh\" ] && \\. 
\"$NVM_DIR/nvm.sh\" # This loads nvm nvm use nvm install npm install npm start # This starts our node.js server for our backend If everything goes smoothly, you will get a message saying \u201cServer listening on http://localhost:3000/\u201d If you also want to test that the API functionality is working, you can go to the following URL and it should say that you have successfully connected to the backend API: http://localhost:3000/api/test Frontend Open up another terminal window, and go to the client directory in the protein-weaver directory. Do the following commands in the terminal window: nvm use nvm install npm install npm run dev # This will start our frontend instance If everything goes smoothly, you should be greeted with a message from VITE, and that it is running on the local host of http://localhost:5173/ To summarize, we have set up neo4j and populated the database with D. rerio, created a query that we are interested in, and then set up the backend and frontend of protein-weaver for local development. The three localhost urls are found below - Neo4j: http://localhost:7474/browser/ - Backend: http://localhost:3000/api/test - Frontend: http://localhost:5173/ Step 5: Create a New Page with Query Create New API Call This section aims to create a new API call in the backend, utilizing the neo4j query you made previously. Before we start implementing a new API call, it is important to have a better understanding of how the backend codebase looks like for proteinweaver. We will go through the important files in the backend: /src Within the server directory, there is another folder called src, which contains important files that sets up the node.js server. You will generally never need to make changes within this folder. index.js is responsible for initializing node.js server, and also the neo4j driver that will be used to make the connection to the database. The neo4j.js file contains the driver. constants.js store variables including ports, url, and neo4j credentials. .env Within the server folder, we also have a file called .env which outlines the neo4j credentials we need. /routes The routes folder contains routes.js which houses all the API calls we use for proteinweaver. The router can take in multiple requests, including POST or GET requests. It is helpful to understand the general structure of setting up an API call, and we will use the example below. This API call is responsible for, given a list of nodes, provide us the average degree value. //Example of API call in routes.js router.post(\"/getAvgDegree\", jsonParser, async (req, res, next) => { const data = req.body; const nodeList = data.nodeList; const species = data.species; try { const avgDegreeService = new AvgDegreeService(getDriver()); const avgDegree = await avgDegreeService.getAvgDegree(species, nodeList); console.log(\"Average Degree:\"); console.log(avgDegree); res.json(avgDegree); } catch (e) { next(e); } }); We use the route.post() function to create a new POST API call. It takes in three parameters, first the API call\u2019s URL, the parser we use, and the request, response and next variables The req.body holds the information that the API caller has provided. This usually comes in the form of a JSON request body, and in this case this if the following body: {\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"} The try-catch statement is used to capture potential errors and throw them in an appropriate manner. 
The try portion of the statement creates a new variable called avgDegreeService by using a class AvgDegreeService. This class is defined in a file called avg.degree.service.js in the service folder, and it is responsible for utilizing the neo4j driver, creating a query call with some parameters, and getting the response. The class contains the function getAvgDegree which takes in two parameters, species and nodeList We use the await key because this is a type of Promise. This essentially tells the program to wait until we get the output from the avgDegreeService.getAvgDegree() function. Finally, we set the response in res.json to be the variable avgDegree /services The service folder contains the heart of all the dependent functions the routes.js file needs. This is where you will be adding a new neo4j query as a function that will then be called into a new route in routes.js. Before that, it is helpful to understand the general structure of what a service file is, and we will use avg.degree.service.js as an example. //avg.degree.service.js file export default class AvgDegreeService { /** * @type {neo4j.Driver} */ driver; /** * The constructor expects an instance of the Neo4j Driver, which will be * used to interact with Neo4j. * * @param {neo4j.Driver} driver */ constructor(driver) { this.driver = driver; } async getAvgDegree(speciesInput, nodeList) { const session = this.driver.session(); const res = await session.executeRead((tx) => tx.run( ` MATCH (p:protein {txid: $speciesInput}) WHERE p.id IN toStringList($nodeList) WITH p MATCH (p)-[r:ProPro]-() WITH p, count(r) as degree RETURN avg(degree) as averageDegree; `, { speciesInput: speciesInput, nodeList: nodeList, } ) ); const deg = res.records; await session.close(); return deg; } } This file creates a call called AvgDegreeService, and requires the neo4j driver we initialized in src/neo4j.js as a variable in the constructor We create an async method (which is why we need the await keyword when we call the method) called getAvgDegree, which takes in the two parameters. You first have to initialize the neo4j driver session, and then we execute a read on the database with a neo4j query. Everything inside tx.run() is where you place the neo4j query. Notice that within the query, we use variables as the txid and the nodelist. These variables are paired in the portion after the neo4j query. Finally we close the neo4j session and return the res.records in a variable. Testing API using Postman We can test this API call in many ways but one that is common is using Postman . Postman allows you to create API requests without the need of a frontend server. You can download the app or use the browser. We will test out the getAvgDegree API Call with the following steps: Create a new workspace in Postman. Select POST as the request type, and use http://localhost:3000/api/getAvgDegree as the URL We need to set the body of the request. Navigate to the body tab and set the body as raw and JSON. Now use the following example as the input: {\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"} When you are ready, click the send button. If it is successful you should get a 200 OK response and within the response body a value of 354.4 for the average node degree. Below includes a visualization that summarises the key parts of the backend server. 
Now that you have a better understanding about how API calls are made and how to test them, we can now implement a new API call that will use the neo4j query you made previously. Adding new API Call Create a new file in the service directory. You can duplicate the avg.degree.service.js file and rename it to something that represents your query. Within the file, rename the class name to something that represents your query. Rename the method \u201cgetAvgDegree\u201d to something that represents your query. Change the parameters of the method to include what you need for your query. (You may not need any in your parameters if you are hardcoding a query) Place your neo4j query inside of tx.run() You can delete the part where speciesInput and nodeList are paired if you do not have any parameters. If you do have parameters, make sure you pair the parameters properly with the neo4j query. You are now done with setting up your service file for your API call Create a new API call in router.js. You can use the /getAvgDegree API call as reference. Set the API URL to a name that represents your query If your API call will need some parameters, set the correct variables in the request body, just like how getAvgDegree did it with nodeList and species Create a new instance of the service class you made previously like AvgDegreeService with the neo4j driver Call your method in the service class, and making sure if you need the parameters, you order it correctly Finally make sure the res.json function has the correct variable. Test out your API call using Postman All API calls in proteinweaver goes under the following url. Simply add your API call after the last backslash: http://localhost:3000/api/ Ensure that you are setting the response as a POST response If you require parameters in your API call, make sure to set the body, configure as raw and JSON mode, and then ensure the JSON body is in the correct format (See the example previously when testing out Postman) If you get a 200 OK response and you\u2019ve inspected the response body to what you expect, then you have completed the backend portion. Step: 6 Add a New Page Now that we have linked the backend with the Neo4j database through the API call, we will create a React webpage with a button that lets a user execute our new query. Here is a general overview of adding a new page and a new API query: Navigate to client/src/pages and create a new page named NewPage.jsx . Examine the other pages in this directory and copy the content from TestingPage.jsx into the blank NewPage.jsx . Replace TestingPage() with the name of the new page you created: NewPage() . Add Button to Execute Query Navigate to client/src/main.jsx and add the NewPage component to the main website by importing it and creating a route. Import the component by adding this below the other import statements: import NewPage from \"./pages/NewPage.jsx\"; . Copy one of the route snippets and replace the path and element with \"/newpage\" and . Navigate to client/src/components/ and add a new component by creating a page named NewQuery.jsx . This document will be where we add the API query and do other styling. Copy these imports to the top of the page and create the NewQuery component: import React, { useState, useEffect } from \"react\"; // create component export default function NewQuery() { }; Now go back to the first page you created NewPage.jsx . Import the NewQuery component with import NewQuery from \"../components/NewQuery.jsx\"; . Within the central
add <NewQuery /> to place the component within the NewPage. Go back to the service that you created with your own Neo4j query earlier. Modify the return statement within the first try section of your service to return network.records.map((record) => record.get('n')); to extract only the data on the nodes that your query returned. Finally, add a useEffect hook that will execute your API query when you load the page. Inside of the set of \"{ }\" brackets in NewQuery() { } copy the following code to execute your query on refresh: // create empty object to store query results const [nodeNames, setNodeNames] = useState([]); // execute query on page reload useEffect(() => { fetch(\"/api/newQuery\") .then((res) => res.json()) .then((data) => { const names = data.map((item) => item.properties.name); // extract just names setNodeNames(names); }) .catch((error) => { console.error(\"Error fetching network data:\", error); }); }, []); // display the node names in the console (right click and inspect element) console.log(nodeNames); You can check the structure of your query response in the running server terminal. Using the object hierarchy displayed there, we extracted just the \"name\" property in the useEffect hook for displaying. You should now have a blank page at http://localhost:5173/newpage that allows you to see the names of the nodes returned by your Neo4j query in the console when you inspect the page element. Add Button to Execute Query Now we will add the ability for users to execute the query on demand rather than when refreshing the page. To do this, first we will modify the useEffect statement and make it a function: // Function for submitting the query async function handleNewQuery(e) { setNodeNames([]); // reset upon execution e.preventDefault(); // prevent default form submission // copied exactly from the useEffect statement fetch(\"/api/newQuery\") .then((res) => res.json()) .then((data) => { const names = data.map((item) => item.properties.name); setNodeNames(names); }) .catch((error) => { console.error(\"Error fetching network data:\", error); }); // nothing is left to compute, so we simply return return; } Next we will create a New Query button that executes our new function when clicked. Place this inside of the { } brackets of NewQuery() { } after everything else. A React component is like any other function: it must end in a return statement. The return statement holds everything that the user will actually interact with and is where we will style things as well. return (
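{/* The button markup was stripped from this page when it was exported; a minimal
sketch of what belongs here, assuming a plain, unstyled button wired to the
handleNewQuery function defined above: */}
<div>
  <button onClick={handleNewQuery}>New Query</button>
</div>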
); Now we should have a button that sets the node results in the console only after we have pressed it. Now let's display the information to the users without them having to inspect the element. Copy the following code below the button, inside of the <div>
    : {nodeNames.map((name, index) => (

    {index + 1}: {name}

    ))} We are now displaying a list of the node names ordered by their index. Congratulations, you have now created a new webpage with full connection to the Neo4j database! Add New Page Icon to NavBar Let's finish off by doing some styling and adding a new icon to the NavBar. Navigate to client/src/components/NavBar.jsx and copy one of the
<li> snippets and paste it below another. Create a new link to your page by replacing the old link with \"/newpage\". Now rename the icon by typing \"New\" within the <div>
. Finally navigate to https://react-icons.github.io/react-icons/ and choose your favorite icon. I will be using the GiTigerHead icon for mine! Add the relevant import statement to the top of the NavBar page: import { GiTigerHead } from \"react-icons/gi\"; . Next replace the icon component in the code that you copied from earlier with the name of the new one. In my case I put <GiTigerHead /> . Congratulations, you have now completed the contributing guide!","title":"Contributing Guide"},{"location":"contributing-guide/#contributing-guide","text":"This is the guide for getting started with ProteinWeaver and will set you up to contribute to whichever aspects of ProteinWeaver interest you.","title":"Contributing Guide"},{"location":"contributing-guide/#step-1-fork-installation","text":"ProteinWeaver uses a Dockerized version of Neo4j as the database. Follow these instructions to install Docker Desktop. We will also be using GitHub to contribute to ProteinWeaver. It is recommended to install GitHub Desktop because of its easy user interface. Then you will need to fork the contributing-guide branch of the ProteinWeaver GitHub repository to get the Zebrafish datasets and the base code for the front and backends in your own repository. Once forked, clone the repository to your local desktop so that you have access to ProteinWeaver locally.","title":"Step 1: Fork & Installation"},{"location":"contributing-guide/#step-2-data-import","text":"The following section will be using a bash terminal to set up the Dockerized Neo4j environment. Open the Docker Desktop application. Navigate to a terminal window and pull the official Neo4j Docker image with the following command: docker pull neo4j Create a folder in your root directory named neo4j . Within the new ~/neo4j directory, create the following directories: ~/neo4j/data/ to allow storage of database state between Docker instances ~/neo4j/logs/ to allow storage of logs between Docker instances ~/neo4j/import/ to store data for import ~/neo4j/plugins/ to store any necessary plugins for production environments Copy over all of the files in the cloned ProteinWeaver /data/tutorial directory to ~/neo4j/import/ . Create a Neo4j Docker instance with GDS and APOC plugins using the following command: docker run \ --name proteinweaver \ -p7474:7474 -p7687:7687 \ -v $HOME/neo4j/data:/data \ -v $HOME/neo4j/logs:/logs \ -v $HOME/neo4j/import:/import \ -v $HOME/neo4j/plugins:/plugins \ --env NEO4J_AUTH=none \ -e NEO4J_apoc_export_file_enabled=true \ -e NEO4J_apoc_import_file_enabled=true \ -e NEO4J_apoc_import_file_use__neo4j__config=true \ -e NEO4J_PLUGINS='[\"graph-data-science\"]' \ -e NEO4JLABS_PLUGINS=\[\\"apoc\\"\] \ neo4j:5.12.0-community-bullseye This Docker instance has no security restrictions; to change the username and password, edit: --env NEO4J_AUTH=username/password Access the docker image at http://localhost:7474 in your browser. Once in the Neo4j Browser, create constraints before data import. We use NCBI as the source of the unique taxon identifiers. CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE; Create a constraint for the GO terms in the database using the following command: CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE; Import D. 
rerio protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Set a relationship property for the evidence :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MATCH (s:protein {id: zfish.uniprotID1, txid: \"txid7955\"})-[r:ProPro]-(t:protein {id: zfish.uniprotID2, txid: \"txid7955\"}) SET r.evidence = zfish.evidence } IN TRANSACTIONS OF 1000 ROWS; Add GO data to D. rerio nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"}) MERGE (g:go_term {id: zfishgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for D. rerio . :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM}) SET r.relationship = zfishgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks. CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']); CALL gds.graph.relationships.toUndirected( 'proGoGraph', {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'} ) YIELD inputRelationships, relationshipsWritten;","title":"Step 2: Data Import"},{"location":"contributing-guide/#useful-commands","text":"Drop graph projection: CALL gds.graph.drop('proGoGraph') YIELD graphName; Drop constraints: DROP CONSTRAINT txid_constraint; DROP CONSTRAINT go_constraint; Delete nodes: MATCH (n:protein {txid: 'txid7955'}) DETACH DELETE n; Show database information: :schema","title":"Useful Commands:"},{"location":"contributing-guide/#step-3-create-a-new-query-in-neo4j","text":"Now that you have imported the D. rerio interaction network and annotations. 
It's time to explore the network and devise a new query that interests you.","title":"Step 3: Create a New Query in Neo4j"},{"location":"contributing-guide/#first-practice-with-some-example-commands","text":"Count how many nodes there are in the database: MATCH (n) RETURN COUNT(n); Now count how many protein nodes there are: MATCH (n:protein) RETURN COUNT(n); Return the first 25 nodes in the zebrafish txid: MATCH (n:protein {txid: 'txid7955'}) RETURN n LIMIT 25; Retrieve all the species in the database: MATCH (n:protein) RETURN COLLECT(DISTINCT n.species); Find nodes with a ProGo relationship (limit 25): MATCH (p)-[r:ProGo]->(g) RETURN p, r, g LIMIT 25; Return the relationship qualifier property for the ProGo relationship (limit 25): MATCH (p)-[r:ProGo]->(g) RETURN r.relationship LIMIT 25; Update property of existing node (for fun): MATCH (n:protein {species: 'Danio rerio'}) SET n.species = 'Ranio derio'; Set species property back to proper one: MATCH (n:protein {species: 'Ranio derio'}) SET n.species = 'Danio rerio'; Now it is your turn to devise a new Cypher query. Your query should end in a RETURN statement rather than change a property. We will use this query in the next step to create a new webpage that returns and presents the results of this query on ProteinWeaver's user interface.","title":"First practice with some example commands:"},{"location":"contributing-guide/#step-4-setting-up-local-development","text":"Now that you have the Neo4j database up and running, and you have a query that you are interested in, we will set up the frontend and backend for local development.","title":"Step 4: Setting up Local Development"},{"location":"contributing-guide/#backend-server","text":"Open up a terminal window and go to the server directory inside the protein-weaver directory. We need to install npm, which is responsible for building the necessary packages for the server. We will use a version manager for node, called nvm. This is helpful as it allows you to install multiple versions of node. More information about nvm can be found here . Run the following commands in your terminal: export NVM_DIR=\"$([ -z \"${XDG_CONFIG_HOME-}\" ] && printf %s \"${HOME}/.nvm\" || printf %s \"${XDG_CONFIG_HOME}/nvm\")\" [ -s \"$NVM_DIR/nvm.sh\" ] && \. \"$NVM_DIR/nvm.sh\" # This loads nvm nvm use nvm install npm install npm start # This starts our node.js server for our backend If everything goes smoothly, you will get a message saying \u201cServer listening on http://localhost:3000/\u201d. If you also want to test that the API functionality is working, you can go to the following URL and it should say that you have successfully connected to the backend API: http://localhost:3000/api/test","title":"Backend Server"},{"location":"contributing-guide/#frontend","text":"Open up another terminal window, and go to the client directory in the protein-weaver directory. Run the following commands in the terminal window: nvm use nvm install npm install npm run dev # This will start our frontend instance If everything goes smoothly, you should be greeted with a message from VITE saying that it is running on http://localhost:5173/ . To summarize, we have set up neo4j and populated the database with D. rerio, created a query that we are interested in, and then set up the backend and frontend of protein-weaver for local development. The three localhost URLs are found below: - Neo4j: http://localhost:7474/browser/ - Backend: http://localhost:3000/api/test - Frontend: http://localhost:5173/
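Before moving on, you can also sanity-check the query you devised in Step 3 from Node rather than the Neo4j Browser. Here is a minimal sketch, assuming the neo4j-driver package (the same driver the server relies on) and the auth-less Docker instance from Step 2; the count query and the file name scratch.mjs are placeholders for your own query:

```js
// scratch.mjs — run a practice Cypher query against the local database.
// Assumes the auth-less Neo4j Docker instance from Step 2 on bolt://localhost:7687.
import neo4j from "neo4j-driver";

const driver = neo4j.driver("bolt://localhost:7687");
const session = driver.session();

const res = await session.executeRead((tx) =>
  tx.run("MATCH (n:protein {txid: $txid}) RETURN COUNT(n) AS proteinCount", {
    txid: "txid7955", // D. rerio, the network imported in Step 2
  })
);
console.log(res.records[0].get("proteinCount").toString()); // Neo4j integers print safely via toString()

await session.close();
await driver.close();
```

Run it with node scratch.mjs from a directory where neo4j-driver is installed (for example, server/).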
","title":"Frontend"},{"location":"contributing-guide/#step-5-create-a-new-page-with-query","text":"","title":"Step 5: Create a New Page with Query"},{"location":"contributing-guide/#create-new-api-call","text":"This section aims to create a new API call in the backend, utilizing the neo4j query you made previously. Before we start implementing a new API call, it is important to have a better understanding of what the backend codebase looks like for ProteinWeaver. We will go through the important files in the backend:","title":"Create New API Call"},{"location":"contributing-guide/#src","text":"Within the server directory, there is another folder called src, which contains important files that set up the node.js server. You will generally never need to make changes within this folder. index.js is responsible for initializing the node.js server, as well as the neo4j driver that will be used to make the connection to the database. The neo4j.js file contains the driver. constants.js stores variables including ports, URLs, and neo4j credentials.","title":"/src"},{"location":"contributing-guide/#env","text":"Within the server folder, we also have a file called .env which outlines the neo4j credentials we need.","title":".env"},{"location":"contributing-guide/#routes","text":"The routes folder contains routes.js which houses all the API calls we use for ProteinWeaver. The router can take in multiple requests, including POST or GET requests. It is helpful to understand the general structure of setting up an API call, and we will use the example below. This API call is responsible for, given a list of nodes, providing us with the average degree value. //Example of API call in routes.js router.post(\"/getAvgDegree\", jsonParser, async (req, res, next) => { const data = req.body; const nodeList = data.nodeList; const species = data.species; try { const avgDegreeService = new AvgDegreeService(getDriver()); const avgDegree = await avgDegreeService.getAvgDegree(species, nodeList); console.log(\"Average Degree:\"); console.log(avgDegree); res.json(avgDegree); } catch (e) { next(e); } }); We use the router.post() function to create a new POST API call. It takes in three parameters: the API call\u2019s URL, the parser we use, and a callback with the request, response, and next variables. The req.body holds the information that the API caller has provided. This usually comes in the form of a JSON request body, and in this case it is the following body: {\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"} The try-catch statement is used to capture potential errors and throw them in an appropriate manner. The try portion of the statement creates a new variable called avgDegreeService by using the class AvgDegreeService. This class is defined in a file called avg.degree.service.js in the service folder, and it is responsible for utilizing the neo4j driver, creating a query call with some parameters, and getting the response. The class contains the function getAvgDegree, which takes in two parameters, species and nodeList. We use the await keyword because this is a type of Promise. This essentially tells the program to wait until we get the output from the avgDegreeService.getAvgDegree() function. 
Finally, we set the response in res.json to be the variable avgDegree.","title":"/routes"},{"location":"contributing-guide/#services","text":"The service folder contains the heart of all the dependent functions the routes.js file needs. This is where you will be adding a new neo4j query as a function that will then be called in a new route in routes.js. Before that, it is helpful to understand the general structure of a service file, and we will use avg.degree.service.js as an example. //avg.degree.service.js file export default class AvgDegreeService { /** * @type {neo4j.Driver} */ driver; /** * The constructor expects an instance of the Neo4j Driver, which will be * used to interact with Neo4j. * * @param {neo4j.Driver} driver */ constructor(driver) { this.driver = driver; } async getAvgDegree(speciesInput, nodeList) { const session = this.driver.session(); const res = await session.executeRead((tx) => tx.run( ` MATCH (p:protein {txid: $speciesInput}) WHERE p.id IN toStringList($nodeList) WITH p MATCH (p)-[r:ProPro]-() WITH p, count(r) as degree RETURN avg(degree) as averageDegree; `, { speciesInput: speciesInput, nodeList: nodeList, } ) ); const deg = res.records; await session.close(); return deg; } } This file creates a class called AvgDegreeService, which requires the neo4j driver we initialized in src/neo4j.js as a variable in the constructor. We create an async method (which is why we need the await keyword when we call the method) called getAvgDegree, which takes in the two parameters. We first initialize a neo4j driver session, and then execute a read on the database with a neo4j query. Everything inside tx.run() is where you place the neo4j query. Notice that within the query, we use parameters for the txid and the node list. These parameters are bound to values in the object that follows the neo4j query. Finally, we close the neo4j session and return res.records in a variable.","title":"/services"},{"location":"contributing-guide/#testing-api-using-postman","text":"We can test this API call in many ways, but a common one is Postman . Postman allows you to create API requests without needing a frontend server. You can download the app or use the browser. We will test out the getAvgDegree API call with the following steps: Create a new workspace in Postman. Select POST as the request type, and use http://localhost:3000/api/getAvgDegree as the URL. We need to set the body of the request. Navigate to the body tab and set the body as raw and JSON. Now use the following example as the input: {\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"} When you are ready, click the send button. If it is successful, you should get a 200 OK response and within the response body a value of 354.4 for the average node degree. Below is a visualization that summarizes the key parts of the backend server. Now that you have a better understanding of how API calls are made and how to test them, we can implement a new API call that will use the neo4j query you made previously.","title":"Testing API using Postman"},{"location":"contributing-guide/#adding-new-api-call","text":"Create a new file in the service directory. You can duplicate the avg.degree.service.js file and rename it to something that represents your query. Within the file, rename the class to something that represents your query. Rename the method \u201cgetAvgDegree\u201d to something that represents your query. 
Change the parameters of the method to include what you need for your query. (You may not need any parameters if you are hardcoding a query.) Place your neo4j query inside of tx.run(). You can delete the part where speciesInput and nodeList are paired if you do not have any parameters. If you do have parameters, make sure you pair them properly with the neo4j query. You are now done with setting up your service file for your API call. Create a new API call in routes.js. You can use the /getAvgDegree API call as a reference. Set the API URL to a name that represents your query. If your API call needs some parameters, set the correct variables in the request body, just like getAvgDegree did with nodeList and species. Create a new instance of the service class you made previously, like AvgDegreeService, with the neo4j driver. Call your method in the service class, making sure that any parameters you need are passed in the correct order. Finally, make sure the res.json function has the correct variable. Test out your API call using Postman. All API calls in ProteinWeaver go under the following URL; simply add your API call after the last forward slash: http://localhost:3000/api/ Ensure that you are setting the request type to POST. If you require parameters in your API call, make sure to set the body, configure it as raw and JSON mode, and then ensure the JSON body is in the correct format (see the earlier Postman example). If you get a 200 OK response and the response body contains what you expect, then you have completed the backend portion.","title":"Adding new API Call"},{"location":"contributing-guide/#step-6-add-a-new-page","text":"Now that we have linked the backend with the Neo4j database through the API call, we will create a React webpage with a button that lets a user execute our new query. Here is a general overview of adding a new page and a new API query: Navigate to client/src/pages and create a new page named NewPage.jsx . Examine the other pages in this directory and copy the content from TestingPage.jsx into the blank NewPage.jsx . Replace TestingPage() with the name of the new page you created: NewPage() .","title":"Step 6: Add a New Page"},{"location":"contributing-guide/#add-button-to-execute-query","text":"Navigate to client/src/main.jsx and add the NewPage component to the main website by importing it and creating a route. Import the component by adding this below the other import statements: import NewPage from \"./pages/NewPage.jsx\"; . Copy one of the route snippets and replace the path and element with \"/newpage\" and <NewPage /> . Navigate to client/src/components/ and add a new component by creating a page named NewQuery.jsx . This file will be where we add the API query and do other styling. Copy these imports to the top of the page and create the NewQuery component: import React, { useState, useEffect } from \"react\"; // create component export default function NewQuery() { }; Now go back to the first page you created, NewPage.jsx . Import the NewQuery component with import NewQuery from \"../components/NewQuery.jsx\"; . Within the central <div>,
add <NewQuery /> to place the component within the NewPage. Go back to the service that you created with your own Neo4j query earlier. Modify the return statement within the first try section of your service to return network.records.map((record) => record.get('n')); to extract only the data on the nodes that your query returned. Finally, add a useEffect hook that will execute your API query when you load the page. Inside of the set of \"{ }\" brackets in NewQuery() { } copy the following code to execute your query on refresh: // create empty object to store query results const [nodeNames, setNodeNames] = useState([]); // execute query on page reload useEffect(() => { fetch(\"/api/newQuery\") .then((res) => res.json()) .then((data) => { const names = data.map((item) => item.properties.name); // extract just names setNodeNames(names); }) .catch((error) => { console.error(\"Error fetching network data:\", error); }); }, []); // display the node names in the console (right click and inspect element) console.log(nodeNames); You can check the structure of your query response in the running server terminal. Using the object hierarchy displayed there, we extracted just the \"name\" property in the useEffect hook for displaying. You should now have a blank page at http://localhost:5173/newpage that allows you to see the names of the nodes returned by your Neo4j query in the console when you inspect the page element.","title":"Add Button to Execute Query"},{"location":"contributing-guide/#add-button-to-execute-query_1","text":"Now we will add the ability for users to execute the query on demand rather than when refreshing the page. To do this, first we will modify the useEffect statement and make it a function: // Function for submitting the query async function handleNewQuery(e) { setNodeNames([]); // reset upon execution e.preventDefault(); // prevent default form submission // copied exactly from the useEffect statement fetch(\"/api/newQuery\") .then((res) => res.json()) .then((data) => { const names = data.map((item) => item.properties.name); setNodeNames(names); }) .catch((error) => { console.error(\"Error fetching network data:\", error); }); // nothing is left to compute, so we simply return return; } Next we will create a New Query button that executes our new function when clicked. Place this inside of the { } brackets of NewQuery() { } after everything else. A React component is like any other function: it must end in a return statement. The return statement holds everything that the user will actually interact with and is where we will style things as well. return (
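{/* As above, the button markup was stripped from this page when it was exported;
a minimal sketch of what belongs here, assuming a plain button wired to the
handleNewQuery function defined above: */}
<div>
  <button onClick={handleNewQuery}>New Query</button>
</div>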
); Now we should have a button that sets the node results in the console only after we have pressed it. Now let's display the information to the users without them having to inspect the element. Copy the following code below the button, inside of the <div>
    : {nodeNames.map((name, index) => (

    {index + 1}: {name}

    ))} We are now displaying a list of the node names ordered by their index. Congratulations, you have now created a new webpage with full connection to the Neo4j database!","title":"Add Button to Execute Query"},{"location":"contributing-guide/#add-new-page-icon-to-navbar","text":"Let's finish off by doing some styling and adding a new icon to the NavBar. Navigate to client/src/components/NavBar.jsx and copy one of the
<li> snippets and paste it below another. Create a new link to your page by replacing the old link with \"/newpage\". Now rename the icon by typing \"New\" within the <div>
. Finally navigate to https://react-icons.github.io/react-icons/ and choose your favorite icon. I will be using the GiTigerHead icon for mine! Add the relevant import statement to the top of the NavBar page: import { GiTigerHead } from \"react-icons/gi\"; . Next replace the icon component in the code that you copied from earlier with the name of the new one. In my case I put <GiTigerHead /> . Congratulations, you have now completed the contributing guide!","title":"Add New Page Icon to NavBar"},{"location":"data-version/","text":"ProteinWeaver Data Log & Version This section of the documentation outlines the data sources, processing steps, and versions of the ProteinWeaver web interface. Drosophila melanogaster Data Sources 2023-09-29 (BETA): Interaction data: interactome-flybase-collapsed-weighted.txt (Source) GO association data: gene_association.fb (Source) 2024-03-18: GO association data: dmel_GO_data_Mar15_24.tsv (Source) Downloaded and merged data together in scripts/SubColNames.R . FlyBase IDs from UniProt IDs for mapping: idmapping_2024_03_18.tsv (Source) Downloaded from UniProt and merged with GO data from QuickGO to match the FlyBase naming convention. Renamed columns to \"GENE_PRODUCT_ID\" and \"FB_ID\" and merged in scripts/SubColNames.R . 2024-04-01: Added 415,493 inferred ProGo edges using a Cypher command. 2024-04-03: GO association data: gene_association_fb_2024-04-03.tsv dmel_GO_data_2024-04-03.tsv Removed qualifiers with \"NOT\" preceding them using scripts/RemoveNotQualifier.R . Reduced inferred ProGo edges to 413,704. Current D. melanogaster Network | Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 11501 | 233054 | 510962 | Bacillus subtilis Data Sources 2023-10-18 (BETA): Interaction data: bsub_interactome.csv Source Exported the \u201cInteraction\u201d set and renamed to bsub_interactome.csv . GO association data: subtiwiki.gene.export.2023-10-18.tsv processed and merged into bsub_GO_data.csv (Source) Exported the \u201cGene\u201d set with all options selected. Processed and merged the file according to scripts/JoinBSUtoUniProt.R . bsub_go_uniprot.tsv (Source) Selected all annotations for B. subtilis and used the following bash command to download: wget 'https://golr-aux.geneontology.io/solr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=source,bioentity_internal_id,bioentity_label,qualifier,annotation_class,reference,evidence_type,evidence_with,aspect,bioentity_name,synonym,type,taxon,date,assigned_by,annotation_extension_class,bioentity_isoform&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&csv.encapsulator=&csv.separator=%09&csv.header=false&csv.mv.separator=%7C&fq=document_category:%22annotation%22&fq=taxon_subset_closure_label:%22Bacillus%20subtilis%20subsp.%20subtilis%20str.%20168%22&facet.field=aspect&facet.field=taxon_subset_closure_label&facet.field=type&facet.field=evidence_subset_closure_label&facet.field=regulates_closure_label&facet.field=isa_partof_closure_label&facet.field=annotation_class_label&facet.field=qualifier&facet.field=annotation_extension_class_closure_label&facet.field=assigned_by&facet.field=panther_family_label&q=*:*' File was renamed to bsub_go_uniprot.tsv , processed and merged into bsub_GO_data.csv according to the scripts/JoinBSUtoUniProt.R file. 
2024-03-18: GO association data: bsub_GO_data_Mar18_24.tsv (Source) Downloaded and merged data together in scripts/SubColNames.R and imported with data/README.md . BSU IDs from UniProt IDs for mapping: subtiwiki.gene.export.2024-03-18.tsv (Source) Selected BSU and UniProt outlinks from menu and exported. Renamed columns to \"GENE_PRODUCT_ID\" and \"BSU_ID\" to remove special characters. Merged in scripts/SubColNames.R . 2024-04-01: Added 39,215 inferred ProGo edges using a Cypher command. 2024-04-03: No \"NOT\" qualifiers were found in the dataset, so there were no changes to the B. subtilis data structure during this update. 2024-06-11: Added new interaction data from STRING-DB . Downloaded physical interactions full 224308.protein.physical.links.full.v12.0.txt and 224308.protein.info.v12.0.txt and merged both into interactome_txid224308_2024-06-06.txt and cleaned according to BsubDataMerging.Rmd . Added updated GO term edges for B. subtilis after new data import. Downloaded all reviewed annotations from QuickGO (Source: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=224308&taxonUsage=descendants&geneProductSubset=Swiss-Prot&geneProductType=protein) and downloaded UniProt and BSU ID mapper subtiwiki.gene.export.2024-06-03.tsv from SubtiWiki . Merged the two into annotations_txid224308_2024-06-03.txt according to BsubDataMerging.Rmd . 2024-06-24: Remove \"self-edges\" from PPI data. Current B. subtilis Network | Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 1933 | 6441 | 65063 | Danio rerio Data Sources 2024-03-18: Interaction data: zfish_string_db_results.csv merged into zfish_interactome_Mar12_2024.txt . (Source) Downloaded file 7955.protein.physical.links.full.v12.0.txt.gz from String-DB and filtered to experimentally validated, database-curated, and textmined interactions according to scripts/ZebrafishDataMerging.Rmd . Mar. 12, 2024. 7955.protein.aliases.v12.0.txt merged into zfish_interactome_Mar12_2024.txt (Source) Downloaded file from String-DB to provide UniProt IDs for STRING-DB aliases. zfish_psicquic_results.csv merged into zfish_interactome_Mar12_2024.txt (Source) Used a Python script scripts/GetXML.ipynb to scrape all entries for \u201c Danio rerio \u201d from the REST API. Removed all tags that were in between the first and last instance. All tags but the first were removed from the file. Got data for interactions and interactors and converted XML format to JSON using scripts/get-interactions.js and scripts/get-interactors.js . Converted the resulting JSON files to CSV using a free online converter . Merged interactions.csv and interactors.csv into zfish_psicquic_results.csv using scripts/ZebrafishDataMerging.Rmd . Some UniProt IDs were found from the IntAct entry using the IntAct ID as documented in the Rmd. zfish_id_mapper.tsv merged into zfish_interactome_Mar12_2024.txt (Source) Retrieved updated UniProt entries and common names for 11,765 entries. 2,781 protein entries were found to be obsolete, thus did not have a name available on UniProt. These were removed and separated into their own dataset. The resulting dataset had 6,438 unique proteins. zfish_gene_names.tsv merged into zfish_interactome_Mar12_2024.txt (Source) Retrieved gene names for 6,438 D. rerio proteins zfish_unique_protein_ids_Mar12_24.txt from UniProt's name mapping service. For entries with a \"gene name\", the gene name was used as the name; for those without a gene name, the first portion of the \"protein name\" was used. 
This was decided to ensure uniqueness for the node names in the user interface. Merged all D. rerio data together into one master file using the instructions in scripts/ZebrafishDataMerging.Rmd . GO Association Data: zfish_GO_data_Mar12_24.tsv (Source) Used QuickGO to get all 65,876 \"Reviewed\" GO annotations for D. rerio . Replaced the \" \" in headers with \"_\" to ease data import. 2024-04-01: Added 86,304 inferred ProGo edges using a Cypher command. 2024-04-03: GO association data: zfish_GO_data_2024-04-03.tsv Removed qualifiers with \"NOT\" preceding them using scripts/RemoveNotQualifier.R . Reduced inferred ProGo edges to 86,216. 2024-06-11: Added alt_name parameter to Neo4j import statement. 2024-06-24: Remove trailing whitespaces from some names according to ZebrafishDataMerging.Rmd . Remove \"self-edges\" from PPI data. Current D. rerio Network | Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 6438 | 45003 | 108758 | Gene Ontology Hierarchy Data Sources 2023-09-29: Common name: go.obo processed into go.txt (Source) Used wget to download the file. Processed the file using scripts/ParseOBOtoTXT.ipynb . Relationships: go.obo processed into is_a_import.tsv Processed the file using scripts/ParseOntologyRelationship.ipynb . go.obo processed into relationship_import.tsv Processed the file using scripts/ParseOntologyRelationship.ipynb . 2024-03-28: goNeverAnnotate.txt joined with go.txt into go_2024-03-28.txt Joined the data together with scripts/GeneOntologyNeverAnnotate.R . gocheck_do_not_annotate.txt parsed from gocheck_do_not_annotate.obo using scripts/ParseOBOtoTXT.ipynb and merged into go_2024-03-28.txt . Gene Ontology Data Structure | GO Terms | \"is_a\" Relationships (GoGo) | | -------- | :-------------------------- | | 42854 | 68308 | Taxon ID source: NCBI taxonomy browser Looked up species name and got taxon ID. Versioning & Dates 2023-09-29 -- 2024-03-17 (BETA): Imported weighted D. melanogaster interactome and FlyBase annotations. Imported raw GO data and \"is_a\" relationships. 2024-03-18: Added D. rerio protein interactome and GO association data. Updated B. subtilis and D. melanogaster GO association networks with QuickGO data. 2024-03-28: Added blacklist indicator to GO term nodes that should never have an annotation. 2024-04-01: Added inferred ProGo edges from descendant ProGo edges. This means that proteins annotated to a specific GO term, such as Mbs to enzyme inhibitor activity, will also be annotated to that GO term's ancestors, such as molecular function inhibitor activity and molecular_function. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 415,493 | | B. subtilis | 39,215 | | D. rerio | 86,304 | | Total | 541,012 | 2024-04-03: Removed \"NOT\" qualifiers (those that should not be explicitly annotated to the GO term due to experimental or other evidence) from all GO association datasets. Repropagated the \"inferred_from_descendant\" edges to ensure no false propagation of \"NOT\" qualifiers. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 413,704 | | B. subtilis | 39,215 | | D. rerio | 86,216 | | Total | 539,135 | 2024-06-11: Added B. subtilis interaction data from STRING-DB and updated QuickGO annotations. Added alt_name parameters to B. subtilis and D. rerio nodes. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 413,704 | | B. subtilis | 54,270 | | D. 
rerio | 86,216 | | Total | 554,190 | 2024-06-24: Removed trailing whitespaces from D. rerio data. Removed \"self-edges\", i.e., interactions between two copies of the same protein, to improve path algorithm performance. 309 \"self-edges\" were removed from the data from B. subtilis and D. rerio .","title":"Data Log & Version"},{"location":"data-version/#proteinweaver-data-log-version","text":"This section of the documentation outlines the data sources, processing steps, and versions of the ProteinWeaver web interface.","title":"ProteinWeaver Data Log & Version"},{"location":"data-version/#drosophila-melanogaster-data-sources","text":"","title":"Drosophila melanogaster Data Sources"},{"location":"data-version/#2023-09-29-beta","text":"","title":"2023-09-29 (BETA):"},{"location":"data-version/#interaction-data","text":"interactome-flybase-collapsed-weighted.txt (Source)","title":"Interaction data:"},{"location":"data-version/#go-association-data","text":"gene_association.fb (Source)","title":"GO association data:"},{"location":"data-version/#2024-03-18","text":"","title":"2024-03-18:"},{"location":"data-version/#go-association-data_1","text":"dmel_GO_data_Mar15_24.tsv (Source) Downloaded and merged data together in scripts/SubColNames.R .","title":"GO association data:"},{"location":"data-version/#flybase-ids-from-uniprot-ids-for-mapping","text":"idmapping_2024_03_18.tsv (Source) Downloaded from UniProt and merged with GO data from QuickGO to match the FlyBase naming convention. Renamed columns to \"GENE_PRODUCT_ID\" and \"FB_ID\" and merged in scripts/SubColNames.R .","title":"FlyBase IDs from UniProt IDs for mapping:"},{"location":"data-version/#2024-04-01","text":"Added 415,493 inferred ProGo edges using a Cypher command.","title":"2024-04-01:"},{"location":"data-version/#2024-04-03","text":"","title":"2024-04-03:"},{"location":"data-version/#go-association-data_2","text":"gene_association_fb_2024-04-03.tsv dmel_GO_data_2024-04-03.tsv Removed qualifiers with \"NOT\" preceding them using scripts/RemoveNotQualifier.R . Reduced inferred ProGo edges to 413,704.","title":"GO association data:"},{"location":"data-version/#current-d-melanogaster-network","text":"| Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 11501 | 233054 | 510962 |","title":"Current D. melanogaster Network"},{"location":"data-version/#bacillus-subtilis-data-sources","text":"","title":"Bacillus subtilis Data Sources"},{"location":"data-version/#2023-10-18-beta","text":"","title":"2023-10-18 (BETA):"},{"location":"data-version/#interaction-data_1","text":"bsub_interactome.csv Source Exported the \u201cInteraction\u201d set and renamed to bsub_interactome.csv .","title":"Interaction data:"},{"location":"data-version/#go-association-data_3","text":"subtiwiki.gene.export.2023-10-18.tsv processed and merged into bsub_GO_data.csv (Source) Exported the \u201cGene\u201d set with all options selected. Processed and merged the file according to scripts/JoinBSUtoUniProt.R . bsub_go_uniprot.tsv (Source) Selected all annotations for B. 
subtilis and used the following bash command to download: wget 'https://golr-aux.geneontology.io/solr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=source,bioentity_internal_id,bioentity_label,qualifier,annotation_class,reference,evidence_type,evidence_with,aspect,bioentity_name,synonym,type,taxon,date,assigned_by,annotation_extension_class,bioentity_isoform&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&csv.encapsulator=&csv.separator=%09&csv.header=false&csv.mv.separator=%7C&fq=document_category:%22annotation%22&fq=taxon_subset_closure_label:%22Bacillus%20subtilis%20subsp.%20subtilis%20str.%20168%22&facet.field=aspect&facet.field=taxon_subset_closure_label&facet.field=type&facet.field=evidence_subset_closure_label&facet.field=regulates_closure_label&facet.field=isa_partof_closure_label&facet.field=annotation_class_label&facet.field=qualifier&facet.field=annotation_extension_class_closure_label&facet.field=assigned_by&facet.field=panther_family_label&q=*:*' File was renamed to bsub_go_uniprot.tsv , processed and merged into bsub_GO_data.csv according to the scripts/JoinBSUtoUniProt.R file.","title":"GO association data:"},{"location":"data-version/#2024-03-18_1","text":"","title":"2024-03-18:"},{"location":"data-version/#go-association-data_4","text":"bsub_GO_data_Mar18_24.tsv (Source) Downloaded and merged data together in scripts/SubColNames.R and imported with data/README.md .","title":"GO association data:"},{"location":"data-version/#bsu-ids-from-uniprot-ids-for-mapping","text":"subtiwiki.gene.export.2024-03-18.tsv (Source) Selected BSU and UniProt outlinks from menu and exported. Renamed columns to \"GENE_PRODUCT_ID\" and \"BSU_ID\" to remove special characters. Merged in scripts/SubColNames.R .","title":"BSU IDs from UniProt IDs for mapping:"},{"location":"data-version/#2024-04-01_1","text":"Added 39,215 inferred ProGo edges using a Cypher command.","title":"2024-04-01:"},{"location":"data-version/#2024-04-03_1","text":"No \"NOT\" qualifiers were found in the dataset, so there were no changes to the B. subtilis data structure during this update.","title":"2024-04-03:"},{"location":"data-version/#2024-06-11","text":"Added new interaction data from STRING-DB . Downloaded physical interactions full 224308.protein.physical.links.full.v12.0.txt and 224308.protein.info.v12.0.txt and merged both into interactome_txid224308_2024-06-06.txt and cleaned according to BsubDataMerging.Rmd . Added updated GO term edges for B. subtilis after new data import. Downloaded all reviewed annotations from QuickGO (Source: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=224308&taxonUsage=descendants&geneProductSubset=Swiss-Prot&geneProductType=protein) and downloaded UniProt and BSU ID mapper subtiwiki.gene.export.2024-06-03.tsv from SubtiWiki . Merged the two into annotations_txid224308_2024-06-03.txt according to BsubDataMerging.Rmd .","title":"2024-06-11:"},{"location":"data-version/#2024-06-24","text":"Remove \"self-edges\" from PPI data.","title":"2024-06-24:"},{"location":"data-version/#current-b-subtilis-network","text":"| Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 1933 | 6441 | 65063 |","title":"Current B. 
subtilis Network"},{"location":"data-version/#danio-rerio-data-sources","text":"","title":"Danio rerio Data Sources"},{"location":"data-version/#2024-03-18_2","text":"","title":"2024-03-18:"},{"location":"data-version/#interaction-data_2","text":"zfish_string_db_results.csv merged into zfish_interactome_Mar12_2024.txt . (Source) Downloaded file 7955.protein.physical.links.full.v12.0.txt.gz from String-DB and filtered to experimentally validated, database-curated, and textmined interactions according to scripts/ZebrafishDataMerging.Rmd . Mar. 12, 2024. 7955.protein.aliases.v12.0.txt merged into zfish_interactome_Mar12_2024.txt (Source) Downloaded file from String-DB to provide UniProt IDs for STRING-DB aliases. zfish_psicquic_results.csv merged into zfish_interactome_Mar12_2024.txt (Source) Used a Python script scripts/GetXML.ipynb to scrape all entries for \u201c Danio rerio \u201d from the REST API. Removed all tags that were in between the first and last instance. All tags but the first were removed from the file. Got data for interactions and interactors and converted XML format to JSON using scripts/get-interactions.js and scripts/get-interactors.js . Converted the resulting JSON files to CSV using a free online converter . Merged interactions.csv and interactors.csv into zfish_psicquic_results.csv using scripts/ZebrafishDataMerging.Rmd . Some UniProt IDs were found from the IntAct entry using the IntAct ID as documented in the Rmd. zfish_id_mapper.tsv merged into zfish_interactome_Mar12_2024.txt (Source) Retrieved updated UniProt entries and common names for 11,765 entries. 2,781 protein entries were found to be obsolete, thus did not have a name available on UniProt. These were removed and separated into their own dataset. The resulting dataset had 6,438 unique proteins. zfish_gene_names.tsv merged into zfish_interactome_Mar12_2024.txt (Source) Retrieved gene names for 6,438 D. rerio proteins zfish_unique_protein_ids_Mar12_24.txt from UniProt's name mapping service. For entries with a \"gene name\", the gene name was used as the name; for those without a gene name, the first portion of the \"protein name\" was used. This was decided to ensure uniqueness for the node names in the user interface. Merged all D. rerio data together into one master file using the instructions in scripts/ZebrafishDataMerging.Rmd .","title":"Interaction data:"},{"location":"data-version/#go-association-data_5","text":"zfish_GO_data_Mar12_24.tsv (Source) Used QuickGO to get all 65,876 \"Reviewed\" GO annotations for D. rerio . Replaced the \" \" in headers with \"_\" to ease data import.","title":"GO Association Data:"},{"location":"data-version/#2024-04-01_2","text":"Added 86,304 inferred ProGo edges using a Cypher command.","title":"2024-04-01:"},{"location":"data-version/#2024-04-03_2","text":"","title":"2024-04-03:"},{"location":"data-version/#go-association-data_6","text":"zfish_GO_data_2024-04-03.tsv Removed qualifiers with \"NOT\" preceding them using scripts/RemoveNotQualifier.R . Reduced inferred ProGo edges to 86,216.","title":"GO association data:"},{"location":"data-version/#2024-06-11_1","text":"Added alt_name parameter to Neo4j import statement.","title":"2024-06-11:"},{"location":"data-version/#2024-06-24_1","text":"Remove trailing whitespaces from some names according to ZebrafishDataMerging.Rmd . 
Remove \"self-edges\" from PPI data.","title":"2024-06-24:"},{"location":"data-version/#current-d-rerio-network","text":"| Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 6438 | 45003 | 108758 |","title":"Current D. rerio Network"},{"location":"data-version/#gene-ontology-hierarchy-data-sources","text":"","title":"Gene Ontology Hierarchy Data Sources"},{"location":"data-version/#2023-09-29","text":"","title":"2023-09-29:"},{"location":"data-version/#common-name","text":"go.obo processed into go.txt (Source) Used wget to download the file. Processed the file using scripts/ParseOBOtoTXT.ipynb .","title":"Common name:"},{"location":"data-version/#relationships","text":"go.obo processed into is_a_import.tsv Processed the file using scripts/ParseOntologyRelationship.ipynb . go.obo processed into relationship_import.tsv Processed the file using scripts/ParseOntologyRelationship.ipynb .","title":"Relationships:"},{"location":"data-version/#2024-03-28","text":"goNeverAnnotate.txt joined with go.txt into go_2024-03-28.txt Joined the data together with scripts/GeneOntologyNeverAnnotate.R . gocheck_do_not_annotate.txt parsed from gocheck_do_not_annotate.obo using scripts/ParseOBOtoTXT.ipynb and merged into go_2024-03-28.txt .","title":"2024-03-28:"},{"location":"data-version/#gene-ontology-data-structure","text":"| GO Terms | \"is_a\" Relationships (GoGo) | | -------- | :-------------------------- | | 42854 | 68308 |","title":"Gene Ontology Data Structure"},{"location":"data-version/#taxon-id-source","text":"NCBI taxonomy browser Looked up species name and got taxon ID.","title":"Taxon ID source:"},{"location":"data-version/#versioning-dates","text":"","title":"Versioning & Dates"},{"location":"data-version/#2023-09-29-2024-03-17-beta","text":"Imported weighted D. melanogaster interactome and FlyBase annotations. Imported raw GO data and \"is_a\" relationships.","title":"2023-09-29 -- 2024-03-17 (BETA):"},{"location":"data-version/#2024-03-18_3","text":"Added D. rerio protein interactome and GO association data. Updated B. subtilis and D. melanogaster GO association networks with QuickGO data.","title":"2024-03-18:"},{"location":"data-version/#2024-03-28_1","text":"Added blacklist indicator to GO term nodes that should never have an annotation.","title":"2024-03-28:"},{"location":"data-version/#2024-04-01_3","text":"Added inferred ProGo edges from descendant ProGo edges. This means that proteins annotated to a specific GO term, such as Mbs to enzyme inhibitor activity, will also be annotated to that GO term's ancestors, such as molecular function inhibitor activity and molecular_function. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 415,493 | | B. subtilis | 39,215 | | D. rerio | 86,304 | | Total | 541,012 |","title":"2024-04-01:"},{"location":"data-version/#2024-04-03_3","text":"Removed \"NOT\" qualifiers (those that should not be explicitly annotated to the GO term due to experimental or other evidence) from all GO assocation datasets. Repropogated the \"inferred_from_descendant\" edges to ensure no false propogation of \"NOT\" qualifiers. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 413,704 | | B. subtilis | 39,215 | | D. rerio | 86,216 | | Total | 539,135 |","title":"2024-04-03:"},{"location":"data-version/#2024-06-11_2","text":"Added B. subtilis interaction data from STRING-DB and updated QuickGO annotations. Added alt_name parameters to B. 
subtilis and D. rerio nodes. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 413,704 | | B. subtilis | 54,270 | | D. rerio | 86,216 | | Total | 554,190 |","title":"2024-06-11:"},{"location":"data-version/#2024-06-24_2","text":"Removed trailing whitespaces from D. rerio data. Removed \"self-edges\", i.e., interactions between two copies of the same protein, to improve path algorithm performance. 309 \"self-edges\" were removed from the data from B. subtilis and D. rerio .","title":"2024-06-24:"},{"location":"setup/","text":"Setup The setup guide will include instructions for creating the front and backend local dev environments (database, server, and client). Backend Database ProteinWeaver uses a Dockerized version of Neo4j as the database. Follow these instructions to install Docker Desktop. Once installed, continue with the following steps: Pull the official Neo4j Docker image. docker pull neo4j Create a directory in your $HOME named neo4j . Within the ~/neo4j directory, create the following directories: ~/neo4j/data/ to allow storage of database state between Docker instances ~/neo4j/logs/ to allow storage of logs between Docker instances ~/neo4j/import/ to store data for import ~/neo4j/plugins/ to store any necessary plugins for production environments Download the most recent datasets from the /import directory on GitHub and place them inside of your ~/neo4j/import/ local directory. These are all the prerequisite files you will need for this tutorial and will be updated as new versions are released. Create a Docker instance with GDS and APOC plugins using the following command: docker run \ --name proteinweaver \ -p7474:7474 -p7687:7687 \ -v $HOME/neo4j/data:/data \ -v $HOME/neo4j/logs:/logs \ -v $HOME/neo4j/import:/import \ -v $HOME/neo4j/plugins:/plugins \ --env NEO4J_AUTH=none \ -e NEO4J_apoc_export_file_enabled=true \ -e NEO4J_apoc_import_file_enabled=true \ -e NEO4J_apoc_import_file_use__neo4j__config=true \ -e NEO4J_PLUGINS='[\"graph-data-science\"]' \ -e NEO4JLABS_PLUGINS=\[\\"apoc\\"\] \ neo4j:5.12.0-community-bullseye This example Docker instance has no security restrictions; to set a username and password, edit this line in the previous command: --env NEO4J_AUTH=username/password Access the Docker image at http://localhost:7474 . You will need to input the username and password you defined in the run command. Create constraints before data import. We use NCBI as the source of the unique taxon identifiers: CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE; CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE; D. melanogaster imports Import D. melanogaster protein interactome using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome-flybase-collapsed-weighted.txt' AS fly FIELDTERMINATOR '\t' CALL { with fly MERGE (a:protein {id: fly.FlyBase1, name: fly.symbol1, txid: \"txid7227\", species: \"Drosophila melanogaster\"}) MERGE (b:protein {id: fly.FlyBase2, name: fly.symbol2, txid: \"txid7227\", species: \"Drosophila melanogaster\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Set the alt_name property to the same value as the name. MATCH (n:protein {txid: \"txid7227\"}) SET n.alt_name = n.name; Import the first batch of D. 
melanogaster GO data from FlyBase into the database using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///gene_association_fb_2024-04-03.tsv' AS flygo FIELDTERMINATOR '\t' CALL { with flygo MATCH (n:protein {id: flygo.db_object_id, txid:\"txid7227\"}) MERGE (g:go_term {id: flygo.go_id}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Import the relationship qualifiers for the first batch of GO terms and D. melanogaster proteins using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///gene_association_fb_2024-04-03.tsv' AS flygo FIELDTERMINATOR '\t' CALL { with flygo MATCH (p:protein {id: flygo.db_object_id, txid:\"txid7227\"})-[r:ProGo]-(g:go_term {id: flygo.go_id}) SET r.relationship = flygo.qualifier } IN TRANSACTIONS OF 1000 ROWS; Import more GO data for D. melanogaster :auto LOAD CSV WITH HEADERS FROM 'file:///dmel_GO_data_2024-04-03.tsv' AS dmelgo FIELDTERMINATOR '\t' CALL { with dmelgo MATCH (n:protein {id: dmelgo.FB_ID, txid: \"txid7227\"}) MERGE (g:go_term {id: dmelgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set second batch of qualifier properties for D. melanogaster . :auto LOAD CSV WITH HEADERS FROM 'file:///dmel_GO_data_2024-04-03.tsv' AS dmelgo FIELDTERMINATOR '\t' CALL { with dmelgo MATCH (p:protein {id: dmelgo.FB_ID, txid: \"txid7227\"})-[r:ProGo]-(g:go_term {id: dmelgo.GO_TERM}) SET r.relationship = dmelgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; B. subtilis imports Import B. subtilis protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome_txid224308_2024-06-06.txt' AS bsub FIELDTERMINATOR '\t' CALL { with bsub MERGE (a:protein {id: bsub.protein_1_locus, name: bsub.protein_1_name, alt_name: bsub.protein_1_alt_name, txid: \"txid224308\", species: \"Bacillus subtilis 168\"}) MERGE (b:protein {id: bsub.protein_2_locus, name: bsub.protein_2_name, alt_name: bsub.protein_2_alt_name, txid: \"txid224308\", species: \"Bacillus subtilis 168\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Add first batch of GO data from SubtiWiki to B. subtilis nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///bsub_GO_data.csv' AS bsubgo CALL { with bsubgo MATCH (n:protein {id: bsubgo.locus, txid: \"txid224308\"}) MERGE (g:go_term {id: bsubgo.go_term}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property from first batch of GO data for B. subtilis . :auto LOAD CSV WITH HEADERS FROM 'file:///bsub_GO_data.csv' AS bsubgo CALL { with bsubgo MATCH (p:protein {id: bsubgo.locus, txid: \"txid224308\"})-[r:ProGo]-(g:go_term {id: bsubgo.go_term}) SET r.relationship = bsubgo.qualifier } IN TRANSACTIONS OF 1000 ROWS; Import more GO data for B. subtilis :auto LOAD CSV WITH HEADERS FROM 'file:///annotations_txid224308_2024-06-03.txt' AS bsubgo FIELDTERMINATOR '\t' CALL { with bsubgo MATCH (n:protein {id: bsubgo.BSU_ID, txid: \"txid224308\"}) MERGE (g:go_term {id: bsubgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for second batch of GO data ( B. subtilis ). :auto LOAD CSV WITH HEADERS FROM 'file:///annotations_txid224308_2024-06-03.txt' AS bsubgo FIELDTERMINATOR '\t' CALL { with bsubgo MATCH (p:protein {id: bsubgo.BSU_ID, txid: \"txid224308\"})-[r:ProGo]-(g:go_term {id: bsubgo.GO_TERM}) SET r.relationship = bsubgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; D. rerio imports Import D. 
rerio protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome_txid7955_2024-06-24.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, alt_name: zfish.alt_name1, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, alt_name: zfish.alt_name2, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Add GO data to D. rerio nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_2024-04-03.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"}) MERGE (g:go_term {id: zfishgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for D. rerio . :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_2024-04-03.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM}) SET r.relationship = zfishgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; Gene Ontology hierarchy imports Import the GO hierarchy with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///is_a_import.tsv' AS go FIELDTERMINATOR '\\t' CALL { with go MERGE (a:go_term {id: go.id}) MERGE (b:go_term {id: go.id2}) MERGE (a)-[r:GoGo]->(b) SET r.relationship = go.is_a } IN TRANSACTIONS OF 100 ROWS; Import the GO term common names and descriptions with the following Cypher command: :auto LOAD CSV WITH HEADERS FROM 'file:///go_2024-03-28.txt' AS go FIELDTERMINATOR '\\t' CALL { with go MATCH (n:go_term {id: go.id}) SET n.name = go.name, n.namespace = go.namespace, n.def = go.def } IN TRANSACTIONS OF 1000 ROWS; Add blacklist indicator to GO term nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///go_2024-03-28.txt' AS go FIELDTERMINATOR '\\t' CALL { with go MATCH (n:go_term {id: go.id}) SET n.never_annotate = go.never_annotate } IN TRANSACTIONS OF 1000 ROWS; Propagation of ancestral ProGo edges Add ancestral edges for D. rerio . MATCH (p:protein {txid: 'txid7955'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add ancestral edges for B. subtilis . MATCH (p:protein {txid: 'txid224308'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add ancestral edges for D. melanogaster . MATCH (p:protein {txid: 'txid7227'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add qualifiers for new ProGo edges for each species.
MATCH (p:protein {txid: 'txid7227'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" MATCH (p:protein {txid: 'txid224308'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" MATCH (p:protein {txid: 'txid7955'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" Now remove all the Protein-Protein edges from the same protein to itself with the following command (these edges may cause issues with our path algorithms). MATCH (p:protein)-[rel:ProPro]-(p) DETACH DELETE rel; The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks. CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']); CALL gds.graph.relationships.toUndirected( 'proGoGraph', {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'} ) YIELD inputRelationships, relationshipsWritten; Backend Server The backend server is run using Express.js. To set up the server, continue with the following steps: Open a new terminal window and clone the ProteinWeaver GitHub repository. Locate the server directory: cd server Next we need to install node.js , and the recommended way is to use a Node Version Manager. Follow the NVM GitHub instructions before proceeding. The correct version is outlined in the .nvmrc file in both of the client and server directories. Follow the command below to use the correct version. nvm use If you do not have the correct version, install it with the following command: nvm install Then install the project dependencies with the following command: npm install You can verify your node version is now correct with the following command: node -v Finally, to start the server enter: npm start The server should be running on http://localhost:3000/ . There are several APIs, and you can verify it works by using http://localhost:3000/api/test which should output a JSON object. Please keep the terminal window open. Frontend Client The client uses the React.js framework, and uses Vite.js as a bundler. Open a new terminal window and navigate to the cloned ProteinWeaver GitHub repository. Locate the client directory with the following bash command: cd client Similar to the backend server setup, we need to use and install the correct node.js version. Follow the command below to use the correct version. nvm use If you do not have the correct version, install it with the following command: nvm install Then install the project dependencies with the following command: npm install You can verify your node version is now correct with the following command: node -v Lastly, start the client with the following command: npm run dev ProteinWeaver should now be up and running on http://localhost:5173/ !
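With the database, server, and client all running, a quick end-to-end check can be made from a spare terminal before moving on to the full verification query below. This is a minimal sketch using curl (any HTTP client works) against the /api/test route mentioned above: ```bash # Should print a small JSON object if the Express backend is reachable curl http://localhost:3000/api/test ```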
Verify Guide Once you have completed the guide, you can use the following query to verify that the database matches the most updated version (AS OF 2024-05-06): match (fly:protein {txid :\"txid7227\"}) WITH COUNT(fly) AS flyCount match (bsub:protein {txid :\"txid224308\"}) WITH flyCount, COUNT(bsub) AS bsubCount match (drerio:protein {txid :\"txid7955\"}) WITH flyCount, bsubCount, COUNT(drerio) AS drerioCount match (go:go_term) WITH flyCount, bsubCount, drerioCount, COUNT(go) AS goCount match (fly1 {txid :\"txid7227\"}) -[flyProPro:ProPro]- (fly2 {txid :\"txid7227\"}) WITH flyCount, bsubCount, drerioCount, goCount, COUNT(flyProPro)/2 AS flyProProCount match (bsub1 {txid :\"txid224308\"}) -[bsubProPro:ProPro]- (bsub2 {txid :\"txid224308\"}) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, COUNT(bsubProPro)/2 AS bsubProProCount match (drerio1 {txid :\"txid7955\"}) -[drerioProPro:ProPro]- (drerio2 {txid :\"txid7955\"}) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, COUNT(drerioProPro)/2 AS drerioProProCount match (go1:go_term) -[goGoGo:GoGo]- (go2:go_term) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, COUNT(goGoGo)/2 AS goGoGoCount match (fly:protein {txid :\"txid7227\"}) -[flyProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount, COUNT(flyProGo) AS flyProGoCount match (bsub:protein {txid :\"txid224308\"}) -[bsubProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount,flyProGoCount, COUNT(bsubProGo) AS bsubProGoCount match (drerio:protein {txid :\"txid7955\"}) -[drerioProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount,flyProGoCount, bsubProGoCount, COUNT(drerioProGo) AS drerioProGoCount RETURN flyCount, flyProProCount, flyProGoCount, bsubCount, bsubProProCount, bsubProGoCount, drerioCount, drerioProProCount, drerioProGoCount, goCount, goGoGoCount You should get the following output: \u2552\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2555 \u2502flyCount\u2502flyProProCount\u2502flyProGoCount\u2502bsubCount\u2502bsubProProCount\u2502bsubProGoCount\u2502drerioCount\u2502drerioProProCount\u2502drerioProGoCount\u2502goCount\u2502goGoGoCount\u2502 
\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561 \u250211501 \u2502233054 \u2502510962 \u25021933 \u25026441 \u250265063 \u25026438 \u250245003 \u2502108758 \u250242861 \u250268308 \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 Useful Commands Delete nodes: MATCH (n:protein {txid: \"example\", species: \"example\"}) DETACH DELETE n Drop constraints: DROP CONSTRAINT constraint Drop graph projection: CALL gds.graph.drop('proGoGraph') YIELD graphName Show database information: :schema","title":"Setup"},{"location":"setup/#setup","text":"The setup guide will include instructions for creating the front and backend local dev environments (database, server, and client).","title":"Setup"},{"location":"setup/#backend-database","text":"ProteinWeaver uses a Dockerized version of Neo4j as the database. Follow these instructions to install Docker Desktop. Once installed, continue with the following steps: Pull the official Neo4j Docker image. docker pull neo4j Create a directory in your $HOME named neo4j Within the ~/neo4j directory create the following directories: ~/neo4j/data/ to allow storage of database state between Docker instances ~/neo4j/logs/ to allow storage of logs between Docker instances ~/neo4j/import/ to store data for import ~/neo4j/plugins/ to store any necessary plugins for production environments Download the most recent datasets from the /import directory on GitHub and place them inside of your ~/neo4j/import/ local directory. These are all the prerequisite files you will need for this tutorial and will be updated as new versions are released.
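The four subdirectories described above can be created in a single command. A minimal sketch, assuming a Unix-like shell with brace expansion (bash or zsh): ```bash # Creates ~/neo4j along with its data, logs, import, and plugins subdirectories mkdir -p ~/neo4j/{data,logs,import,plugins} ```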
Create a Docker instance with GDS and APOC plugins using the following command: docker run \\ --name proteinweaver \\ -p7474:7474 -p7687:7687 \\ -v $HOME/neo4j/data:/data \\ -v $HOME/neo4j/logs:/logs \\ -v $HOME/neo4j/import:/import \\ -v $HOME/neo4j/plugins:/plugins \\ --env NEO4J_AUTH=none \\ -e NEO4J_apoc_export_file_enabled=true \\ -e NEO4J_apoc_import_file_enabled=true \\ -e NEO4J_apoc_import_file_use__neo4j__config=true \\ -e NEO4J_PLUGINS='[\"graph-data-science\"]' \\ -e NEO4JLABS_PLUGINS=\\[\\\"apoc\\\"\\] \\ neo4j:5.12.0-community-bullseye This example Docker instance has no security restrictions; to set a username and password, edit this line in the previous command: --env NEO4J_AUTH=username/password Access the Docker container at http://localhost:7474 . You will need to input the username and password you defined in the run command. Create constraints before data import. We use NCBI as the source of the unique taxon identifiers: CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE; CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE;","title":"Backend Database"},{"location":"setup/#d-melanogaster-imports","text":"Import D. melanogaster protein interactome using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome-flybase-collapsed-weighted.txt' AS fly FIELDTERMINATOR '\\t' CALL { with fly MERGE (a:protein {id: fly.FlyBase1, name: fly.symbol1, txid: \"txid7227\", species: \"Drosophila melanogaster\"}) MERGE (b:protein {id: fly.FlyBase2, name: fly.symbol2, txid: \"txid7227\", species: \"Drosophila melanogaster\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Set the alt_name property to the same value as the name. MATCH (n:protein {txid: \"txid7227\"}) SET n.alt_name = n.name; Import the first batch of D. melanogaster GO data from FlyBase into the database using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///gene_association_fb_2024-04-03.tsv' AS flygo FIELDTERMINATOR '\\t' CALL { with flygo MATCH (n:protein {id: flygo.db_object_id, txid:\"txid7227\"}) MERGE (g:go_term {id: flygo.go_id}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Import the relationship qualifiers for the first batch of GO terms and D. melanogaster proteins using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///gene_association_fb_2024-04-03.tsv' AS flygo FIELDTERMINATOR '\\t' CALL { with flygo MATCH (p:protein {id: flygo.db_object_id, txid:\"txid7227\"})-[r:ProGo]-(g:go_term {id: flygo.go_id}) SET r.relationship = flygo.qualifier } IN TRANSACTIONS OF 1000 ROWS; Import more GO data for D. melanogaster :auto LOAD CSV WITH HEADERS FROM 'file:///dmel_GO_data_2024-04-03.tsv' AS dmelgo FIELDTERMINATOR '\\t' CALL { with dmelgo MATCH (n:protein {id: dmelgo.FB_ID, txid: \"txid7227\"}) MERGE (g:go_term {id: dmelgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set second batch of qualifier properties for D. melanogaster . :auto LOAD CSV WITH HEADERS FROM 'file:///dmel_GO_data_2024-04-03.tsv' AS dmelgo FIELDTERMINATOR '\\t' CALL { with dmelgo MATCH (p:protein {id: dmelgo.FB_ID, txid: \"txid7227\"})-[r:ProGo]-(g:go_term {id: dmelgo.GO_TERM}) SET r.relationship = dmelgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS;","title":"D. melanogaster imports"},{"location":"setup/#b-subtilis-imports","text":"Import B.
subtilis protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome_txid224308_2024-06-06.txt' AS bsub FIELDTERMINATOR '\\t' CALL { with bsub MERGE (a:protein {id: bsub.protein_1_locus, name: bsub.protein_1_name, alt_name: bsub.protein_1_alt_name, txid: \"txid224308\", species: \"Bacillus subtilis 168\"}) MERGE (b:protein {id: bsub.protein_2_locus, name: bsub.protein_2_name, alt_name: bsub.protein_2_alt_name, txid: \"txid224308\", species: \"Bacillus subtilis 168\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Add first batch of GO data from SubtiWiki to B. subtilis nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///bsub_GO_data.csv' AS bsubgo CALL { with bsubgo MATCH (n:protein {id: bsubgo.locus, txid: \"txid224308\"}) MERGE (g:go_term {id: bsubgo.go_term}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property from first batch of GO data for B. subtilis . :auto LOAD CSV WITH HEADERS FROM 'file:///bsub_GO_data.csv' AS bsubgo CALL { with bsubgo MATCH (p:protein {id: bsubgo.locus, txid: \"txid224308\"})-[r:ProGo]-(g:go_term {id: bsubgo.go_term}) SET r.relationship = bsubgo.qualifier } IN TRANSACTIONS OF 1000 ROWS; Import more GO data for B. subtilis :auto LOAD CSV WITH HEADERS FROM 'file:///annotations_txid224308_2024-06-03.txt' AS bsubgo FIELDTERMINATOR '\\t' CALL { with bsubgo MATCH (n:protein {id: bsubgo.BSU_ID, txid: \"txid224308\"}) MERGE (g:go_term {id: bsubgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for second batch of GO data ( B. subtilis ). :auto LOAD CSV WITH HEADERS FROM 'file:///annotations_txid224308_2024-06-03.txt' AS bsubgo FIELDTERMINATOR '\\t' CALL { with bsubgo MATCH (p:protein {id: bsubgo.BSU_ID, txid: \"txid224308\"})-[r:ProGo]-(g:go_term {id: bsubgo.GO_TERM}) SET r.relationship = bsubgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS;","title":"B. subtilis imports"},{"location":"setup/#d-rerio-imports","text":"Import D. rerio protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome_txid7955_2024-06-24.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, alt_name: zfish.alt_name1, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, alt_name: zfish.alt_name2, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Add GO data to D. rerio nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_2024-04-03.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"}) MERGE (g:go_term {id: zfishgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for D. rerio . :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_2024-04-03.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM}) SET r.relationship = zfishgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS;","title":"D. 
rerio imports"},{"location":"setup/#gene-ontology-hierarchy-imports","text":"Import the GO hierarchy with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///is_a_import.tsv' AS go FIELDTERMINATOR '\\t' CALL { with go MERGE (a:go_term {id: go.id}) MERGE (b:go_term {id: go.id2}) MERGE (a)-[r:GoGo]->(b) SET r.relationship = go.is_a } IN TRANSACTIONS OF 100 ROWS; Import the GO term common names and descriptions with the following Cypher command: :auto LOAD CSV WITH HEADERS FROM 'file:///go_2024-03-28.txt' AS go FIELDTERMINATOR '\\t' CALL { with go MATCH (n:go_term {id: go.id}) SET n.name = go.name, n.namespace = go.namespace, n.def = go.def } IN TRANSACTIONS OF 1000 ROWS; Add blacklist indicator to GO term nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///go_2024-03-28.txt' AS go FIELDTERMINATOR '\\t' CALL { with go MATCH (n:go_term {id: go.id}) SET n.never_annotate = go.never_annotate } IN TRANSACTIONS OF 1000 ROWS;","title":"Gene Ontology hierarchy imports"},{"location":"setup/#propogation-of-ancestral-progo-edges","text":"Add ancestral edges for D. rerio . MATCH (p:protein {txid: 'txid7955'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add ancestral edges for B. subtilis . MATCH (p:protein {txid: 'txid224308'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add ancestral edges for D. melanogaster . MATCH (p:protein {txid: 'txid7227'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add qualifiers for new ProGo edges for each species. MATCH (p:protein {txid: 'txid7227'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" MATCH (p:protein {txid: 'txid224308'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" MATCH (p:protein {txid: 'txid7955'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" Now remove all the Protein-Protein edges from the same protein to itself with the following command (these edges may causes issues with our path algorithms). MATCH (p:protein)-[rel:ProPro]-(p) DETACH DELETE rel; The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks. CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']); CALL gds.graph.relationships.toUndirected( 'proGoGraph', {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'} ) YIELD inputRelationships, relationshipsWritten;","title":"Propogation of ancestral ProGo edges"},{"location":"setup/#backend-server","text":"The backend server is run using Express.js. To setup the server continue with the following steps: Open a new terminal window and clone the ProteinWeaver GitHub repository. 
Locate the server directory: cd server Next we need to install node.js , and the recommended way is to use a Node Version Manager. Follow the NVM GitHub instructions before proceeding. The correct version is outlined in the .nvmrc file in both of the client and server directories. Follow the command below to use the correct version. nvm use If you do not have the correct version, install it with the following command: nvm install Then install the project dependencies with the following command: npm install You can verify your node version is now correct with the following command: node -v Finally, to start the server enter: npm start The server should be running on http://localhost:3000/ . There are several APIs, and you can verify it works by using http://localhost:3000/api/test which should output a JSON object. Please keep the terminal window open.","title":"Backend Server"},{"location":"setup/#frontend-client","text":"The client uses the React.js framework, and uses Vite.js as a bundler. Open a new terminal window and navigate to the cloned ProteinWeaver GitHub repository. Locate the client directory with the following bash command: cd client Similar to the backend server setup, we need to use and install the correct node.js version. Follow the command below to use the correct version. nvm use If you do not have the correct version, install it with the following command: nvm install Then install the project dependencies with the following command: npm install You can verify your node version is now correct with the following command: node -v Lastly, start the client with the following command: npm run dev ProteinWeaver should now be up and running on http://localhost:5173/ !","title":"Frontend Client"},{"location":"setup/#verify-guide","text":"Once you have completed the guide, you can use the following query to verify that the database matches the most updated version (AS OF 2024-05-06): match (fly:protein {txid :\"txid7227\"}) WITH COUNT(fly) AS flyCount match (bsub:protein {txid :\"txid224308\"}) WITH flyCount, COUNT(bsub) AS bsubCount match (drerio:protein {txid :\"txid7955\"}) WITH flyCount, bsubCount, COUNT(drerio) AS drerioCount match (go:go_term) WITH flyCount, bsubCount, drerioCount, COUNT(go) AS goCount match (fly1 {txid :\"txid7227\"}) -[flyProPro:ProPro]- (fly2 {txid :\"txid7227\"}) WITH flyCount, bsubCount, drerioCount, goCount, COUNT(flyProPro)/2 AS flyProProCount match (bsub1 {txid :\"txid224308\"}) -[bsubProPro:ProPro]- (bsub2 {txid :\"txid224308\"}) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, COUNT(bsubProPro)/2 AS bsubProProCount match (drerio1 {txid :\"txid7955\"}) -[drerioProPro:ProPro]- (drerio2 {txid :\"txid7955\"}) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, COUNT(drerioProPro)/2 AS drerioProProCount match (go1:go_term) -[goGoGo:GoGo]- (go2:go_term) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, COUNT(goGoGo)/2 AS goGoGoCount match (fly:protein {txid :\"txid7227\"}) -[flyProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount, COUNT(flyProGo) AS flyProGoCount match (bsub:protein {txid :\"txid224308\"}) -[bsubProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount,flyProGoCount, COUNT(bsubProGo) AS bsubProGoCount match (drerio:protein {txid :\"txid7955\"}) -[drerioProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount,flyProGoCount, bsubProGoCount, COUNT(drerioProGo) AS drerioProGoCount
RETURN flyCount, flyProProCount, flyProGoCount, bsubCount, bsubProProCount, bsubProGoCount, drerioCount, drerioProProCount, drerioProGoCount, goCount, goGoGoCount You should get the following output: \u2552\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2555 \u2502flyCount\u2502flyProProCount\u2502flyProGoCount\u2502bsubCount\u2502bsubProProCount\u2502bsubProGoCount\u2502drerioCount\u2502drerioProProCount\u2502drerioProGoCount\u2502goCount\u2502goGoGoCount\u2502 \u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561 \u250211501 \u2502233054 \u2502510962 \u25021933 \u25026441 \u250265063 \u25026438 \u250245003 \u2502108758 \u250242861 \u250268308 \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518","title":"Verify Guide"},{"location":"setup/#useful-commands","text":"Delete nodes: MATCH (n:protein {txid: \"example\", species: \"example\"}) DETACH DELETE n Drop constraints: DROP CONSTRAINT constraint Drop graph projection: CALL gds.graph.drop('proGoGraph') YIELD graphName Show database information: :schema","title":"Useful 
Commands"},{"location":"tech-stack/","text":"Tech Stack Frontend This section documents the structure of the frontend and outlines the important interactions. /client \u251c\u2500\u2500 public/ \u251c\u2500\u2500 src/ \u2502 \u251c\u2500\u2500 assets/ \u2502 \u251c\u2500\u2500 components/ \u2502 \u251c\u2500\u2500 layout/ \u2502 \u251c\u2500\u2500 pages/ \u2502 \u251c\u2500\u2500 App.css \u2502 \u251c\u2500\u2500 App.jsx \u2502 \u251c\u2500\u2500 index.css \u2502 \u251c\u2500\u2500 main.jsx \u251c\u2500\u2500 index.html \u251c\u2500\u2500 package.json \u251c\u2500\u2500 vite.config.js Important Files & Directories index.html serves as a way to connect our React framework to standard HTML format. package.json is where all the dependancies of our node.js config lives main.jsx is where we inject the jsx code into the root div in the index.html. This is also where the website routing is structured App.jsx can be thought of as the \"home\" page index.css provides the style of our website layout directory structures main website so that it can be browsed through dynamically pages directory populates the page using the layout components directory the bread and butter of react lives here. React follows a composable model, where we build smaller components and are able to dynamically and efficiently call them whenever they are needed. Concepts A vague list of core concepts to learn HTML CSS node.js React.js React Components react-dom-router useState useEffect Resources Full Stack Development Explained 100+ Web Development Things you Should Know How to OVER Engineer a Website // What is a Tech Stack? How to Create a Express/Node + React Project | Node Backend + React Frontend Scrimba: Learn React Backend Server This section outlines the structure of the backend server, and important concepts to understand the structure. Structure /server \u251c\u2500\u2500 routes/ \u251c\u2500\u2500 services/ \u251c\u2500\u2500 tests/ \u251c\u2500\u2500 src/ \u2502 \u251c\u2500\u2500 constants.js \u2502 \u251c\u2500\u2500 index.js \u2502 \u251c\u2500\u2500 neo4j.js \u251c\u2500\u2500 .env \u251c\u2500\u2500 package.json Important Files & Directories index.js initializes the neo4j database connection and api routing using Express.js as the server .env config file which contains information necessary to connect to the neo4j database constants.js contain config information in the form of js neo4j.js initializes a singleton instance of the neo4j driver, which is used to make API calls to the database routes.js is where API calls are created which utilizes the neo4j driver services directory contains a list of classes which contains the methods to build the API calls in routes. Concepts A Vague list of concepts that are useful to understand API calls server routing middleware backend frameworks Resources Backend web development - a complete overview","title":"Tech Stack"},{"location":"tech-stack/#tech-stack","text":"","title":"Tech Stack"},{"location":"tech-stack/#frontend","text":"This section documents the structure of the frontend and outlines the important interactions. 
/client \u251c\u2500\u2500 public/ \u251c\u2500\u2500 src/ \u2502 \u251c\u2500\u2500 assets/ \u2502 \u251c\u2500\u2500 components/ \u2502 \u251c\u2500\u2500 layout/ \u2502 \u251c\u2500\u2500 pages/ \u2502 \u251c\u2500\u2500 App.css \u2502 \u251c\u2500\u2500 App.jsx \u2502 \u251c\u2500\u2500 index.css \u2502 \u251c\u2500\u2500 main.jsx \u251c\u2500\u2500 index.html \u251c\u2500\u2500 package.json \u251c\u2500\u2500 vite.config.js","title":"Frontend"},{"location":"tech-stack/#important-files-directories","text":"index.html serves as a way to connect our React framework to standard HTML format. package.json is where all the dependencies of our node.js config live main.jsx is where we inject the jsx code into the root div in the index.html. This is also where the website routing is structured App.jsx can be thought of as the \"home\" page index.css provides the style of our website layout directory structures main website so that it can be browsed through dynamically pages directory populates the page using the layout components directory the bread and butter of React lives here. React follows a composable model, where we build smaller components and are able to dynamically and efficiently call them whenever they are needed.","title":"Important Files & Directories"},{"location":"tech-stack/#concepts","text":"A vague list of core concepts to learn HTML CSS node.js React.js React Components react-router-dom useState useEffect","title":"Concepts"},{"location":"tech-stack/#resources","text":"Full Stack Development Explained 100+ Web Development Things you Should Know How to OVER Engineer a Website // What is a Tech Stack? How to Create a Express/Node + React Project | Node Backend + React Frontend Scrimba: Learn React","title":"Resources"},{"location":"tech-stack/#backend-server","text":"This section outlines the structure of the backend server, and important concepts to understand the structure.","title":"Backend Server"},{"location":"tech-stack/#structure","text":"/server \u251c\u2500\u2500 routes/ \u251c\u2500\u2500 services/ \u251c\u2500\u2500 tests/ \u251c\u2500\u2500 src/ \u2502 \u251c\u2500\u2500 constants.js \u2502 \u251c\u2500\u2500 index.js \u2502 \u251c\u2500\u2500 neo4j.js \u251c\u2500\u2500 .env \u251c\u2500\u2500 package.json","title":"Structure"},{"location":"tech-stack/#important-files-directories_1","text":"index.js initializes the neo4j database connection and api routing using Express.js as the server .env config file which contains information necessary to connect to the neo4j database constants.js contains config information in the form of js neo4j.js initializes a singleton instance of the neo4j driver, which is used to make API calls to the database routes.js is where API calls are created, which utilize the neo4j driver services directory contains a list of classes which contain the methods to build the API calls in routes.","title":"Important Files & Directories"},{"location":"tech-stack/#concepts_1","text":"A vague list of concepts that are useful to understand API calls server routing middleware backend frameworks","title":"Concepts"},{"location":"tech-stack/#resources_1","text":"Backend web development - a complete overview","title":"Resources"}]} \ No newline at end of file +{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to ProteinWeaver Docs ProteinWeaver is a web interface for ontology-based protein network visualization.
Background & Motivation Being able to explore how proteins are connected to other proteins with a specific function is a great tool for a biologists, as it allows them to quickly generate hypotheses that seeks to answer the ways that a protein is connected to a pathway or process. ProteinWeaver provides the tools for this type of exploration via an intuitive website that easily lets users query a protein and a specific function or process (as a gene ontology term ). Website Overview ProteinWeaver allows the users to input a protein of their interest, a specific function or process ( gene ontology term ), and the number of paths to output in the network. This generates a subnetwork that connects the protein of interest to the k shortest paths that include a protein labeled with the specific GO term. The network's information is summarised, including GO term description, links to proteins' and GO term AmiGO entry, and GO term qualifiers of the proteins. Exploration is possibly by easily interacting with the graph and setting new nodes as the protein of interest. Queries are easily reproduced through exporting a log history of all queries and explorations done in a session, and exporting networks via images.","title":"Home"},{"location":"#welcome-to-proteinweaver-docs","text":"ProteinWeaver is a web interface for ontology-based protein network visualization.","title":"Welcome to ProteinWeaver Docs"},{"location":"#background-motivation","text":"Being able to explore how proteins are connected to other proteins with a specific function is a great tool for a biologists, as it allows them to quickly generate hypotheses that seeks to answer the ways that a protein is connected to a pathway or process. ProteinWeaver provides the tools for this type of exploration via an intuitive website that easily lets users query a protein and a specific function or process (as a gene ontology term ).","title":"Background & Motivation"},{"location":"#website-overview","text":"ProteinWeaver allows the users to input a protein of their interest, a specific function or process ( gene ontology term ), and the number of paths to output in the network. This generates a subnetwork that connects the protein of interest to the k shortest paths that include a protein labeled with the specific GO term. The network's information is summarised, including GO term description, links to proteins' and GO term AmiGO entry, and GO term qualifiers of the proteins. Exploration is possibly by easily interacting with the graph and setting new nodes as the protein of interest. Queries are easily reproduced through exporting a log history of all queries and explorations done in a session, and exporting networks via images.","title":"Website Overview"},{"location":"contributing-guide/","text":"Contributing Guide This is the guide for getting started with ProteinWeaver and will set you up to contribute to whichever aspects of ProteinWeaver interest you. Step 1: Fork & Installation ProteinWeaver uses a Dockerized version of Neo4j as the database. Follow these instructions to install Docker Desktop. We will also be using GitHub to contribute to ProteinWeaver. It is recommended to install GitHub Desktop because of its easy user interface. Then you will need to fork the contributing-guide branch of the ProteinWeaver GitHub repository to get the Zebrafish datasets and the base code for the front and backends in your own repository. Once forked, clone the repository to your local desktop so that you have access to ProteinWeaver locally. 
Step 2: Data Import The following section will be using a bash terminal to set up the Dockerized Neo4j environment. Open the Docker Desktop application. Navigate to a terminal window and pull the official Neo4j Docker image with the following command: docker pull neo4j Create a folder in your home directory named neo4j : Within the new ~/neo4j directory create the following directories: ~/neo4j/data/ to allow storage of database state between Docker instances ~/neo4j/logs/ to allow storage of logs between Docker instances ~/neo4j/import/ to store data for import ~/neo4j/plugins/ to store any necessary plugins for production environments Copy over all of the files in the cloned ProteinWeaver /data/tutorial directory to ~/neo4j/import/ . Create a Neo4j Docker instance with GDS and APOC plugins using the following command: ```bash docker run \\ --name proteinweaver \\ -p7474:7474 -p7687:7687 \\ -v $HOME/neo4j/data:/data \\ -v $HOME/neo4j/logs:/logs \\ -v $HOME/neo4j/import:/import \\ -v $HOME/neo4j/plugins:/plugins \\ --env NEO4J_AUTH=none \\ -e NEO4J_apoc_export_file_enabled=true \\ -e NEO4J_apoc_import_file_enabled=true \\ -e NEO4J_apoc_import_file_use__neo4j__config=true \\ -e NEO4J_PLUGINS='[\"graph-data-science\"]' \\ -e NEO4JLABS_PLUGINS=\\[\\\"apoc\\\"\\] \\ neo4j:5.12.0-community-bullseye ``` This Docker instance has no security restrictions; to change the username and password, edit: --env NEO4J_AUTH=username/password Access the Docker container at http://localhost:7474 in your browser. Once in the Neo4j Browser, create constraints before data import. We use NCBI as the source of the unique taxon identifiers. Create a constraint for the proteins in the database, requiring that only one instance of each protein exists: CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE; Create a constraint for the GO terms in the database using the following command: CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE; Import D. rerio protein interactome with the following command: ```cypher :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; ``` Set a relationship property for the evidence: ```cypher :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MATCH (s:protein {id: zfish.uniprotID1, txid: \"txid7955\"})-[r:ProPro]-(t:protein {id: zfish.uniprotID2, txid: \"txid7955\"}) SET r.evidence = zfish.evidence } IN TRANSACTIONS OF 1000 ROWS; ``` Add GO data to D. rerio nodes: ```cypher :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"}) MERGE (g:go_term {id: zfishgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; ``` Set qualifier property for D. rerio .
```cypher :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM}) SET r.relationship = zfishgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; ``` The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks. ```cypher CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']); CALL gds.graph.relationships.toUndirected( 'proGoGraph', {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'} ) YIELD inputRelationships, relationshipsWritten; ``` Useful Commands: Drop graph projection: CALL gds.graph.drop('proGoGraph') YIELD graphName; Drop constraints: DROP CONSTRAINT txid_constraint; DROP CONSTRAINT go_constraint; Delete nodes: MATCH (n:protein {txid: 'txid7955'}) DETACH DELETE n; Show database information: :schema Step 3: Create a New Query in Neo4j Now that you have imported the D. rerio interaction network and annotations, it's time to explore the network and come up with a new query that interests you. First practice with some example commands: Count how many nodes there are in the database: MATCH (n) RETURN COUNT(n); Now count how many protein nodes there are: MATCH (n:protein) RETURN COUNT(n); Return the first 25 nodes in the zebrafish txid: MATCH (n:protein {txid: 'txid7955'}) RETURN n LIMIT 25; Retrieve all the species in the database: MATCH (n:protein) RETURN COLLECT(DISTINCT n.species); Find nodes with a ProGo relationship (limit 25): MATCH (p)-[r:ProGo]->(g) RETURN p, r, g LIMIT 25; Return the relationship qualifier property for the ProGo relationship (limit 25): MATCH (p)-[r:ProGo]->(g) RETURN r.relationship LIMIT 25; Update property of existing node (for fun): MATCH (n:protein {species: 'Danio rerio'}) SET n.species = 'Ranio derio'; Set the species property back to the proper one: MATCH (n:protein {species: 'Ranio derio'}) SET n.species = 'Danio rerio'; Now it is your turn to devise a new Cypher query. Your query should end in a RETURN statement rather than change a property. We will use this query in the next step to create a new webpage that returns and presents the results of this query on ProteinWeaver's user interface. Step 4: Setting up Local Development Now that you have the Neo4j database up and running, and you have a query that you are interested in, we will now set up the frontend and backend for local development. Backend Server Open up a terminal window and go to the server directory inside the protein-weaver directory. We want to install npm , which is responsible for installing the necessary packages for the server. We will use a version manager for node, called nvm . This is helpful as it allows you to install multiple versions of node. More information about nvm can be found here . Enter the following commands in your terminal: ```bash export NVM_DIR=\"$([ -z \"${XDG_CONFIG_HOME-}\" ] && printf %s \"${HOME}/.nvm\" || printf %s \"${XDG_CONFIG_HOME}/nvm\")\" [ -s \"$NVM_DIR/nvm.sh\" ] && \\.
\"$NVM_DIR/nvm.sh\" # This loads nvm nvm use nvm install npm install npm start # This starts our node.js server for our backend ``` If everything goes smoothly, you will get a message saying \u201cServer listening on http://localhost:3000/ \u201d If you also want to test that the API functionality is working, you can go to the following URL and it should say that you have successfully connected to the backend API: http://localhost:3000/api/test Frontend Open up another terminal window, and go to the client directory in the protein-weaver directory. Enter the following commands in the terminal window: ```bash nvm use nvm install npm install npm run dev # This will start our frontend instance ``` If everything goes smoothly, you should be greeted with a message from VITE as well as a message indicating that it is running on http://localhost:5173/ . To summarize, we have set up Neo4j and populated the database with D. rerio , created a query that we are interested in, and then set up the backend and frontend of ProteinWeaver for local development. The three localhost URLs are found below: Neo4j: http://localhost:7474/browser/ Backend: http://localhost:3000/api/test Frontend: http://localhost:5173/ Step 5: Create a New Page with Query Create New API Call This section aims to create a new API call in the backend utilizing the Neo4j query you made previously. Before we start implementing a new API call, it is important to have a better understanding of what the backend codebase looks like for ProteinWeaver. We will go through the important files in the backend: /src Within the server directory, the src directory contains important files that sets up the node.js server. You will generally never need to make changes within this folder. index.js is responsible for initializing the node.js server and the Neo4j driver that will be used to make the connection to the database. The neo4j.js file contains the Neo4j driver. constants.js stores variables including ports, URLs, and Neo4j credentials. .env Within the server folder, we also have a file called .env which outlines the Neo4j credentials for authentication with our database. /routes The routes folder contains routes.js which houses all the API calls we use for ProteinWeaver. The router can take in multiple requests, including POST or GET requests. It is helpful to understand the general structure of setting up an API call, and we will use the example below. This API call is responsible for, given a list of nodes, providing us with the average degree value. ```js //Example of API call in routes.js router.post(\"/getAvgDegree\", jsonParser, async (req, res, next) => { const data = req.body; const nodeList = data.nodeList; const species = data.species; try { const avgDegreeService = new AvgDegreeService(getDriver()); const avgDegree = await avgDegreeService.getAvgDegree(species, nodeList); console.log(\"Average Degree:\"); console.log(avgDegree) res.json(avgDegree); } catch (e) { next(e); } }); ``` We use the route.post() function to create a new POST API call. It takes in three parameters, first the API call\u2019s URL, the parser we use, and the request, response and next variables The req.body holds the information that the API caller has provided. 
This usually comes in the form of a JSON request body, and in this case it is the following body: {\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"} The \"try-catch\" statement is used to capture potential errors and throw them in an appropriate manner. The try portion of the statement creates a new variable called avgDegreeService by using a class AvgDegreeService . This class is defined in a file called avg.degree.service.js in the /services folder, and it is responsible for utilizing the Neo4j driver, creating a query call with some parameters, and getting the response. The class contains the function getAvgDegree which takes in two parameters: species and nodeList . We use the await keyword because this is a type of Promise. This essentially tells the program to wait until we get the output from the avgDegreeService.getAvgDegree() function. Finally, we set the response in res.json to be the variable avgDegree . /services The services folder contains the heart of all the dependent functions the routes.js file needs. This is where you will be adding a new Neo4j query as a function that will then be called into a new route in routes.js . Before that, it is helpful to understand the general structure of what a service file is, and we will use avg.degree.service.js as an example. //avg.degree.service.js file export default class AvgDegreeService { /** * @type {neo4j.Driver} */ driver; /** * The constructor expects an instance of the Neo4j Driver, which will be * used to interact with Neo4j. * * @param {neo4j.Driver} driver */ constructor(driver) { this.driver = driver; } async getAvgDegree(speciesInput, nodeList) { const session = this.driver.session(); const res = await session.executeRead((tx) => tx.run( ` MATCH (p:protein {txid: $speciesInput}) WHERE p.id IN toStringList($nodeList) WITH p MATCH (p)-[r:ProPro]-() WITH p, count(r) as degree RETURN avg(degree) as averageDegree; `, { speciesInput: speciesInput, nodeList: nodeList, } ) ); const deg = res.records; await session.close(); return deg; } } This file creates a class called AvgDegreeService , and requires the Neo4j driver we initialized in src/neo4j.js as a variable in the constructor. We create an async method (which is why we need the await keyword when we call the method) called getAvgDegree , which takes in the two parameters. You first have to initialize the Neo4j driver session, and then we execute a read on the database with a Neo4j query. Everything inside tx.run() is where you place the Neo4j query. Notice that within the query, we use variables for the txid and the node list. These variables are paired in the portion after the Neo4j query. Finally we close the Neo4j session and return the res.records in a variable. Testing API using Postman We can test this API call in many ways but one that is common is using Postman . Postman allows you to create API requests without the need of a frontend server. You can download the app or use the browser. We will test out the getAvgDegree API call with the following steps: Create a new workspace in Postman. Select POST as the request type, and use http://localhost:3000/api/getAvgDegree as the URL. We need to set the body of the request. Navigate to the body tab and set the body as raw and JSON. Now use the following example as the input: {\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"} When you are ready, click the send button.
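The same request can also be issued from the command line instead of Postman. This sketch reuses the exact URL and JSON body from the steps above: ```bash curl -X POST http://localhost:3000/api/getAvgDegree \\ -H \"Content-Type: application/json\" \\ -d '{\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"}' ```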
If it is successful you should get a \"200 OK\" response and within the response body a value of 354.4 for the average node degree. Below is a visualization that summarises the key parts of the backend server. Now that you have a better understanding about how API calls are made and how to test them, we can now implement a new API call that will use the Neo4j query you made previously. Adding a New API Call Create a new file in the service directory. You can duplicate the avg.degree.service.js file and rename it to something that represents your query. Within the file, rename the class name to something that represents your query. Rename the method \u201cgetAvgDegree\u201d to something that represents your query. Change the parameters of the method to include what you need for your query (you may not need any in your parameters if you are hardcoding a query). Place your Neo4j query inside of tx.run() . You can delete the part where speciesInput and nodeList are paired if you do not have any parameters. If you do have parameters, make sure you pair the parameters properly with the Neo4j query. You are now done with setting up your service file for your API call. Create a new API call in routes.js . You can use the /getAvgDegree API call as reference. Set the API URL to a name that represents your query. If your API call will need some parameters, set the correct variables in the request body, just like how getAvgDegree did it with nodeList and species . Create a new instance of the service class you made previously, like AvgDegreeService , with the Neo4j driver. Call your method in the service class, making sure that any parameters are passed in the correct order. Finally make sure the res.json function has the correct variable. Test out your API call using Postman All API calls in ProteinWeaver go under the following URL. Simply add your API call after the last slash: http://localhost:3000/api/. Ensure that you are setting the request as a POST request. If you require parameters in your API call, make sure to set the body, configure as raw and JSON mode, and then ensure the JSON body is in the correct format (See the example previously when testing out Postman). If you get a \"200 OK\" response and the response body matches what you expect, then you have completed the backend portion. Step 6: Add a New Page Now that we have linked the backend with the Neo4j database through the API call, we will create a React webpage with a button that lets a user execute our new query. Here is a general overview of adding a new page and a new API query: Navigate to client/src/pages and create a new page named NewPage.jsx . Examine the other pages in this directory and copy the content from TestingPage.jsx into the blank NewPage.jsx . Replace TestingPage() with the name of the new page you created: NewPage() . Add the New Page to the Router Navigate to client/src/main.jsx and add the NewPage component to the main website by importing it and creating a route. Import the component by adding this below the other import statements: import NewPage from \"./pages/NewPage.jsx\"; . Copy one of the route snippets and replace the path and element with \"/newpage\" and <NewPage /> . Navigate to client/src/components/ and add a new component by creating a page named NewQuery.jsx . This document will be where we add the API query and do other styling.
Copy these imports to the top of the page and create the NewQuery component: ```js import React, { useState, useEffect } from \"react\"; // create component export default function NewQuery() { }; ``` Now go back to the first page you created, NewPage.jsx . Import the NewQuery component with import NewQuery from \"../components/NewQuery.jsx\"; . Within the central
<div> add <NewQuery /> to place the component within the NewPage. Go to the Service that you created with your own Neo4j query earlier. Modify the return statement within the first try section of your service to return network.records.map((record) => record.get('n')); to extract only the data on the nodes that your query returned. Finally, add a useEffect hook that will execute your API query when you load the page. Inside of the set of \"{ }\" brackets in NewQuery() { } copy the following code to execute your query on refresh: ```js // create empty object to store query results const [nodeNames, setNodeNames] = useState([]); // execute query on page reload useEffect(() => { fetch(\"/api/newQuery\") .then((res) => res.json()) .then((data) => { const names = data.map((item) => item.properties.name); // extract just names setNodeNames(names); }) .catch((error) => { console.error(\"Error fetching network data:\", error); }); }, []); // display the node names in the console (right click and inspect element) console.log(nodeNames); ``` You can check the structure of your query response in the running server terminal. Using the object hierarchy displayed there, we extracted just the \"name\" property in the useEffect hook for displaying. You should now have a blank page at http://localhost:5173/newpage that allows you to see the names of the nodes returned by your Neo4j query in the console when you inspect the page element. Add Button to Execute Query Now we will add the ability for users to execute the query on demand rather than when refreshing the page. To do this, first we will modify the useEffect statement and make it a function: ```js // Function for submitting the query async function handleNewQuery(e) { setNodeNames([]); // reset upon execution e.preventDefault(); // prevent default form submission // copied exactly from the useEffect statement fetch(\"/api/newQuery\") .then((res) => res.json()) .then((data) => { const names = data.map((item) => item.properties.name); setNodeNames(names); }) .catch((error) => { console.error(\"Error fetching network data:\", error); }); // functions must return something, since we executed everything and assigned node names already we just return return; } ``` Next we will create a New Query button that executes our new function when clicked. Place this inside of the { } brackets of NewQuery() { } after everything else. A React component is like any other function: it must end in a return statement. The return statement holds everything that the user will actually interact with and is where we will style things as well. ```js return (
```js return ( <div> <button onClick={handleNewQuery}>New Query</button> </div> ); ``` Now we should have a button that puts the node results in the console only after we have pressed it. Now let's display the information to users without having to inspect the element. Copy the following code below the button, inside of the <div>:
```js {nodeNames.map((name, index) => ( <p key={index}> {index + 1}: {name} </p> ))} ``` We are now displaying a list of the node names, ordered by their index. Congratulations, you have now created a new webpage with a full connection to the Neo4j database!
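For reference, the assembled NewQuery.jsx from the steps above might look roughly like this; the /api/newQuery endpoint is the placeholder name used throughout this guide, and the styling is deliberately left bare:

```js
import React, { useState } from \"react\";

export default function NewQuery() {
  // query results to display
  const [nodeNames, setNodeNames] = useState([]);

  // submit the query on demand
  async function handleNewQuery(e) {
    setNodeNames([]); // reset upon execution
    e.preventDefault();
    fetch(\"/api/newQuery\")
      .then((res) => res.json())
      .then((data) => {
        const names = data.map((item) => item.properties.name);
        setNodeNames(names);
      })
      .catch((error) => {
        console.error(\"Error fetching network data:\", error);
      });
  }

  return (
    <div>
      <button onClick={handleNewQuery}>New Query</button>
      {nodeNames.map((name, index) => (
        <p key={index}>
          {index + 1}: {name}
        </p>
      ))}
    </div>
  );
}
```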
Add New Page Icon to NavBar Let's finish off by doing some styling and adding a new icon to the NavBar. Navigate to client/src/components/NavBar.jsx , copy one of the <li> snippets, and paste it below another. Create a new link to your page by replacing the old link with <Link to=\"/newpage\"> . Now rename the icon by typing \"New\" within the <p></p> tags
    . Next, navigate to https://react-icons.github.io/react-icons/ and choose your favorite icon. I will be using the GiTigerHead icon for mine! Add the relevant import statement to the top of the NavBar page: import { GiTigerHead } from \"react-icons/gi\"; . Finally, replace the icon component in the code that you copied from earlier with the name of the new one. In my case I put . Congratulations, you have now completed the contributing guide!","title":"Contributing Guide"},{"location":"contributing-guide/#contributing-guide","text":"This is the guide for getting started with ProteinWeaver and will set you up to contribute to whichever aspects of ProteinWeaver interest you.","title":"Contributing Guide"},{"location":"contributing-guide/#step-1-fork-installation","text":"ProteinWeaver uses a Dockerized version of Neo4j as the database. Follow these instructions to install Docker Desktop. We will also be using GitHub to contribute to ProteinWeaver. It is recommended to install GitHub Desktop because of its easy user interface. Then you will need to fork the contributing-guide branch of the ProteinWeaver GitHub repository to get the Zebrafish datasets and the base code for the front and backends in your own repository. Once forked, clone the repository to your local desktop so that you have access to ProteinWeaver locally.","title":"Step 1: Fork & Installation"},{"location":"contributing-guide/#step-2-data-import","text":"The following section will be using a bash terminal to set up the Dockerized Neo4j environment. Open the Docker Desktop application. Navigate to a terminal window and pull the official Neo4j Docker image with the following command: docker pull neo4j Create a folder in your root directory named neo4j : - Within the new `~/neo4j` directory create the following directories: - `~/neo4j/data/` to allow storage of database state between Docker instances - `~/neo4j/logs/` to allow storage of logs between Docker instances - `~/neo4j/import/` to store data for import - `~/neo4j/plugins/` to store any necessary plugins for production environments Copy over all of the files in the cloned ProteinWeaver /data/tutorial directory to ~/neo4j/import/ . Create a Neo4j Docker instance with GDS and APOC plugins using the following command: ```bash docker run \\ --name proteinweaver \\ -p7474:7474 -p7687:7687 \\ -v $HOME/neo4j/data:/data \\ -v $HOME/neo4j/logs:/logs \\ -v $HOME/neo4j/import:/import \\ -v $HOME/neo4j/plugins:/plugins \\ --env NEO4J_AUTH=none \\ -e NEO4J_apoc_export_file_enabled=true \\ -e NEO4J_apoc_import_file_enabled=true \\ -e NEO4J_apoc_import_file_use__neo4j__config=true \\ -e NEO4J_PLUGINS='[\"graph-data-science\"]' \\ -e NEO4JLABS_PLUGINS=\\[\\\"apoc\\\"\\] \\ neo4j:5.12.0-community-bullseye ``` This docker instance has no security restrictions, to change username and password edit: --env NEO4J_AUTH=username/password Access the docker image at http://localhost:7474 in your browser. Once in the Neo4j Browser, create constraints before data import. We use NCBI as the source of the unique taxon identifiers. Create a constraint for the proteins in the database, requiring that only one instance of each protein exists: CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE; Create a constraint for the GO terms in the database using the following command: CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE; Import D. 
rerio protein interactome with the following command: ```cypher :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; ``` Set a relationship property for the evidence ```cypher :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_interactome_Mar12_2024.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MATCH (s:protein {id: zfish.uniprotID1, txid: \"txid7955\"})-[r:ProPro]-(t:protein {id: zfish.uniprotID2, txid: \"txid7955\"}) SET r.evidence = zfish.evidence } IN TRANSACTIONS OF 1000 ROWS; ``` Add GO data to D. rerio nodes: ```cypher :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"}) MERGE (g:go_term {id: zfishgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; ``` Set qualifier property for D. rerio . ```cypher :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_Mar12_24.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM}) SET r.relationship = zfishgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; ``` The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks. ```cypher CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']); CALL gds.graph.relationships.toUndirected( 'proGoGraph', {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'} ) YIELD inputRelationships, relationshipsWritten; ```","title":"Step 2: Data Import"},{"location":"contributing-guide/#useful-commands","text":"Drop graph projection: CALL gds.graph.drop('proGoGraph') YIELD graphName; Drop constraints: DROP CONSTRAINT txid_constraint; DROP CONSTRAINT go_constraint; Delete nodes: MATCH (n:protein {txid: 'txid7955'}) DETACH DELETE n; Show database information: :schema","title":"Useful Commands:"},{"location":"contributing-guide/#step-3-create-a-new-query-in-neo4j","text":"Now that you have imported the D. rerio interaction network and annotations. 
It's time to explore the network and devise a new query that interests you.","title":"Step 3: Create a New Query in Neo4j"},{"location":"contributing-guide/#first-practice-with-some-example-commands","text":"Count how many nodes there are in the database: MATCH (n) RETURN COUNT(n); Now count how many protein nodes there are: MATCH (n:protein) RETURN COUNT(n); Return the first 25 nodes in the zebrafish txid: MATCH (n:protein {txid: 'txid7955'}) RETURN n LIMIT 25; Retrieve all the species in the database: MATCH (n:protein) RETURN COLLECT(DISTINCT n.species); Find nodes with a ProGo relationship (limit 25): MATCH (p)-[r:ProGo]->(g) RETURN p, r, g LIMIT 25; Return the relationship qualifier property for the ProGo relationship (limit 25): MATCH (p)-[r:ProGo]->(g) RETURN r.relationship LIMIT 25; Update the property of an existing node (for fun): MATCH (n:protein {species: 'Danio rerio'}) SET n.species = 'Ranio derio'; Set the species property back to the proper one: MATCH (n:protein {species: 'Ranio derio'}) SET n.species = 'Danio rerio'; Now it is your turn to devise a new Cypher query. Your query should end in a RETURN statement rather than change a property. We will use this query in the next step to create a new webpage that returns and presents the results of this query on ProteinWeaver's user interface.","title":"First practice with some example commands:"},{"location":"contributing-guide/#step-4-setting-up-local-development","text":"Now that you have the Neo4j database up and running and a query that you are interested in, we will set up the frontend and backend for local development.","title":"Step 4: Setting up Local Development"},{"location":"contributing-guide/#backend-server","text":"Open up a terminal window and go to the server directory inside the protein-weaver directory. We will use npm to install and build the necessary packages for the server. To manage node versions we use nvm , a version manager for node. This is helpful as it allows you to install multiple versions of node. More information about nvm can be found here . Enter the following commands in your terminal: ```bash export NVM_DIR=\"$([ -z \"${XDG_CONFIG_HOME-}\" ] && printf %s \"${HOME}/.nvm\" || printf %s \"${XDG_CONFIG_HOME}/nvm\")\" [ -s \"$NVM_DIR/nvm.sh\" ] && \\. \"$NVM_DIR/nvm.sh\" # This loads nvm nvm use nvm install npm install npm start # This starts our node.js server for our backend ``` If everything goes smoothly, you will get a message saying \"Server listening on http://localhost:3000/ \". If you also want to test that the API functionality is working, you can go to the following URL and it should say that you have successfully connected to the backend API: http://localhost:3000/api/test","title":"Backend Server"},{"location":"contributing-guide/#frontend","text":"Open up another terminal window and go to the client directory in the protein-weaver directory. Enter the following commands in the terminal window: ```bash nvm use nvm install npm install npm run dev # This will start our frontend instance ``` If everything goes smoothly, you should be greeted with a message from VITE as well as a message indicating that it is running on http://localhost:5173/ . To summarize, we have set up Neo4j and populated the database with D. rerio , created a query that we are interested in, and then set up the backend and frontend of ProteinWeaver for local development. The three localhost URLs are found below: Neo4j: http://localhost:7474/browser/ Backend: http://localhost:3000/api/test Frontend: http://localhost:5173/
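As an optional sanity check that the pieces are talking to each other, a two-line script (assuming Node 18+, which ships a built-in fetch ) can hit the backend test endpoint from a terminal:

```js
// sanity-check.mjs: run with `node sanity-check.mjs` while the backend is up
const res = await fetch(\"http://localhost:3000/api/test\");
console.log(res.status, await res.json()); // expect 200 and the test message
```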
","title":"Frontend"},{"location":"contributing-guide/#step-5-create-a-new-page-with-query","text":"","title":"Step 5: Create a New Page with Query"},{"location":"contributing-guide/#create-new-api-call","text":"This section aims to create a new API call in the backend utilizing the Neo4j query you made previously. Before we start implementing a new API call, it is important to have a better understanding of what the backend codebase looks like for ProteinWeaver. We will go through the important files in the backend:","title":"Create New API Call"},{"location":"contributing-guide/#src","text":"Within the server directory, the src directory contains the important files that set up the node.js server. You will generally never need to make changes within this folder. index.js is responsible for initializing the node.js server and the Neo4j driver that will be used to make the connection to the database. The neo4j.js file contains the Neo4j driver. constants.js stores variables including ports, URLs, and Neo4j credentials.","title":"/src"},{"location":"contributing-guide/#env","text":"Within the server folder, we also have a file called .env which outlines the Neo4j credentials for authentication with our database.","title":".env"},{"location":"contributing-guide/#routes","text":"The routes folder contains routes.js , which houses all the API calls we use for ProteinWeaver. The router can take in multiple kinds of requests, including POST or GET requests. It is helpful to understand the general structure of setting up an API call, and we will use the example below. This API call is responsible for providing the average degree value for a given list of nodes. ```js //Example of API call in routes.js router.post(\"/getAvgDegree\", jsonParser, async (req, res, next) => { const data = req.body; const nodeList = data.nodeList; const species = data.species; try { const avgDegreeService = new AvgDegreeService(getDriver()); const avgDegree = await avgDegreeService.getAvgDegree(species, nodeList); console.log(\"Average Degree:\"); console.log(avgDegree); res.json(avgDegree); } catch (e) { next(e); } }); ``` We use the router.post() function to create a new POST API call. It takes three arguments: the API call's URL, the parser we use, and a handler function with the request, response, and next variables. The req.body holds the information that the API caller has provided. This usually comes in the form of a JSON request body, and in this case it is the following body: {\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"} The \"try-catch\" statement is used to capture potential errors and throw them in an appropriate manner. The try portion of the statement creates a new variable called avgDegreeService as an instance of the class AvgDegreeService . This class is defined in a file called avg.degree.service.js in the /services folder, and it is responsible for utilizing the Neo4j driver, creating a query call with some parameters, and getting the response. The class contains the function getAvgDegree which takes in two parameters: species and nodeList . We use the await keyword because this is a type of Promise. This essentially tells the program to wait until we get the output from the avgDegreeService.getAvgDegree() function. Finally, we set the response in res.json to be the variable avgDegree .
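To see this route in action without Postman or a frontend, a minimal sketch using Node 18+'s built-in fetch works as well; the endpoint and JSON body are taken directly from the example above:

```js
// test-avg-degree.mjs: assumes the backend server from this guide is running on port 3000
const res = await fetch(\"http://localhost:3000/api/getAvgDegree\", {
  method: \"POST\",
  headers: { \"Content-Type\": \"application/json\" },
  body: JSON.stringify({
    nodeList: [\"FBgn0003731\", \"FBgn0031972\", \"FBgn0264492\", \"FBgn0000499\", \"FBgn0001139\"],
    species: \"txid7227\",
  }),
});
console.log(res.status, await res.json()); // expect 200 and an average degree of 354.4
```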
","title":"/routes"},{"location":"contributing-guide/#services","text":"The services folder contains the heart of all the dependent functions the routes.js file needs. This is where you will be adding a new Neo4j query as a function that will then be called in a new route in routes.js . Before that, it is helpful to understand the general structure of a service file, and we will use avg.degree.service.js as an example. //avg.degree.service.js file export default class AvgDegreeService { /** * @type {neo4j.Driver} */ driver; /** * The constructor expects an instance of the Neo4j Driver, which will be * used to interact with Neo4j. * * @param {neo4j.Driver} driver */ constructor(driver) { this.driver = driver; } async getAvgDegree(speciesInput, nodeList) { const session = this.driver.session(); const res = await session.executeRead((tx) => tx.run( ` MATCH (p:protein {txid: $speciesInput}) WHERE p.id IN toStringList($nodeList) WITH p MATCH (p)-[r:ProPro]-() WITH p, count(r) as degree RETURN avg(degree) as averageDegree; `, { speciesInput: speciesInput, nodeList: nodeList, } ) ); const deg = res.records; await session.close(); return deg; } } This file creates a class called AvgDegreeService , which requires the Neo4j driver we initialized in src/neo4j.js as a variable in the constructor. We create an async method (which is why we need the await keyword when we call the method) called getAvgDegree , which takes in the two parameters. You first have to initialize a Neo4j driver session, and then we execute a read on the database with a Neo4j query. Everything inside tx.run() is where you place the Neo4j query. Notice that within the query, we use variables for the txid and the node list. These variables are paired with values in the portion after the Neo4j query. Finally, we close the Neo4j session and return the res.records in a variable.","title":"/services"},{"location":"contributing-guide/#testing-api-using-postman","text":"We can test this API call in many ways, but one common way is using Postman . Postman allows you to create API requests without the need of a frontend server. You can download the app or use the browser. We will test out the getAvgDegree API call with the following steps: Create a new workspace in Postman. Select POST as the request type, and use http://localhost:3000/api/getAvgDegree as the URL. We need to set the body of the request: navigate to the body tab and set the body as raw and JSON. Now use the following example as the input: {\"nodeList\": [\"FBgn0003731\",\"FBgn0031972\",\"FBgn0264492\",\"FBgn0000499\",\"FBgn0001139\"],\"species\": \"txid7227\"} When you are ready, click the send button. If it is successful, you should get a \"200 OK\" response and, within the response body, a value of 354.4 for the average node degree. Below is a visualization that summarizes the key parts of the backend server. Now that you have a better understanding of how API calls are made and how to test them, we can implement a new API call that will use the Neo4j query you made previously.","title":"Testing API using Postman"},{"location":"contributing-guide/#adding-new-api-call","text":"Create a new file in the services directory. You can duplicate the avg.degree.service.js file and rename it to something that represents your query. Within the file, rename the class to something that represents your query. Rename the method \"getAvgDegree\" to something that represents your query.
Change the parameters of the method to include what you need for your query (you may not need any parameters if you are hardcoding a query). Place your Neo4j query inside of tx.run() . You can delete the part where speciesInput and nodeList are paired if you do not have any parameters. If you do have parameters, make sure they are paired properly with the Neo4j query. You are now done setting up the service file for your API call. Create a new API call in routes.js . You can use the /getAvgDegree API call as a reference. Set the API URL to a name that represents your query. If your API call needs parameters, read the correct variables from the request body, just like getAvgDegree does with nodeList and species . Create a new instance of the service class you made previously, like AvgDegreeService , with the Neo4j driver. Call your method on the service class, making sure any parameters you need are passed in the correct order. Finally, make sure the res.json function receives the correct variable. Test out your API call using Postman. All API calls in ProteinWeaver live under the following URL; simply add your API call after the last slash: http://localhost:3000/api/. Ensure that you are setting the request as a POST request. If you require parameters in your API call, make sure to set the body, configure it as raw and JSON mode, and then ensure the JSON body is in the correct format (see the example from testing out Postman earlier). If you get a \"200 OK\" response and the response body matches what you expect, then you have completed the backend portion.","title":"Adding new API Call"},{"location":"contributing-guide/#step-6-add-a-new-page","text":"Now that we have linked the backend with the Neo4j database through the API call, we will create a React webpage with a button that lets a user execute our new query. Here is a general overview of adding a new page and a new API query: Navigate to client/src/pages and create a new page named NewPage.jsx . Examine the other pages in this directory and copy the content from TestingPage.jsx into the blank NewPage.jsx . Replace TestingPage() with the name of the new page you created: NewPage() .","title":"Step 6: Add a New Page"},{"location":"contributing-guide/#add-button-to-execute-query","text":"Navigate to client/src/main.jsx and add the NewPage component to the main website by importing it and creating a route. Import the component by adding this below the other import statements: import NewPage from \"./pages/NewPage.jsx\"; . Copy one of the route snippets and replace the path and element with \"/newpage\" and <NewPage /> . Navigate to client/src/components/ and add a new component by creating a page named NewQuery.jsx . This document will be where we add the API query and do other styling. Copy these imports to the top of the page and create the NewQuery component: ```js import React, { useState, useEffect } from \"react\"; // create component export default function NewQuery() { }; ``` Now go back to the first page you created, NewPage.jsx . Import the NewQuery component with import NewQuery from \"../components/NewQuery.jsx\"; . Within the central <div>,
add <NewQuery /> to place the component within the NewPage. Go back to the Service that you created with your own Neo4j query earlier. Modify the return statement of your service to return network.records.map((record) => record.get('n')); so that it extracts only the data on the nodes that your query returned. Finally, add a useEffect hook that will execute your API query when you load the page. Inside of the set of \"{ }\" brackets in NewQuery() { } copy the following code to execute your query on refresh: ```js // create empty object to store query results const [nodeNames, setNodeNames] = useState([]); // execute query on page reload useEffect(() => { fetch(\"/api/newQuery\") .then((res) => res.json()) .then((data) => { const names = data.map((item) => item.properties.name); // extract just the names setNodeNames(names); }) .catch((error) => { console.error(\"Error fetching network data:\", error); }); }, []); // display the node names in the console (right click and inspect element) console.log(nodeNames); ``` You can check the structure of your query response in the running server terminal. Using the object hierarchy displayed there, we extracted just the \"name\" property in the useEffect hook for displaying. You should now have a blank page at http://localhost:5173/newpage that lets you see the names of the nodes returned by your Neo4j query in the console when you inspect the page element.","title":"Add Button to Execute Query"},{"location":"contributing-guide/#add-button-to-execute-query_1","text":"Now we will add the ability for users to execute the query on demand rather than when refreshing the page. To do this, we will first modify the useEffect statement and make it a function: ```js // Function for submitting the query async function handleNewQuery(e) { setNodeNames([]); // reset upon execution e.preventDefault(); // prevent default form submission // copied exactly from the useEffect statement fetch(\"/api/newQuery\") .then((res) => res.json()) .then((data) => { const names = data.map((item) => item.properties.name); setNodeNames(names); }) .catch((error) => { console.error(\"Error fetching network data:\", error); }); // nothing left to do here: setNodeNames has already stored the results return; } ``` Next we will create a New Query button that executes our new function when clicked. Place this inside of the { } brackets of NewQuery() { } after everything else. A React component is like any other function: it must end in a return statement. The return statement holds everything that the user will actually interact with, and it is where we will style things as well.
```js return ( <div> <button onClick={handleNewQuery}>New Query</button> </div> ); ``` Now we should have a button that puts the node results in the console only after we have pressed it. Now let's display the information to users without having to inspect the element. Copy the following code below the button, inside of the <div>:
```js {nodeNames.map((name, index) => ( <p key={index}> {index + 1}: {name} </p> ))} ``` We are now displaying a list of the node names, ordered by their index. Congratulations, you have now created a new webpage with a full connection to the Neo4j database!","title":"Add Button to Execute Query"},{"location":"contributing-guide/#add-new-page-icon-to-navbar","text":"Let's finish off by doing some styling and adding a new icon to the NavBar. Navigate to client/src/components/NavBar.jsx and copy one of the
<li> snippets and paste it below another. Create a new link to your page by replacing the old link with <Link to=\"/newpage\"> . Now rename the icon by typing \"New\" within the <p></p> tags
    . Next, navigate to https://react-icons.github.io/react-icons/ and choose your favorite icon. I will be using the GiTigerHead icon for mine! Add the relevant import statement to the top of the NavBar page: import { GiTigerHead } from \"react-icons/gi\"; . Finally, replace the icon component in the code that you copied from earlier with the name of the new one. In my case I put . Congratulations, you have now completed the contributing guide!","title":"Add New Page Icon to NavBar"},{"location":"data-version/","text":"ProteinWeaver Data Log & Version This section of the documentation outlines the data sources, processing steps and versions of the ProteinWeaver web interface. Drosophila melanogaster Data Sources 2023-09-29 (BETA): Interaction data: interactome-flybase-collapsed-weighted.txt (Source) GO association data: gene_association.fb (Source) 2024-03-18: GO association data: dmel_GO_data_Mar15_24.tsv (Source) Downloaded and merged data together in scripts/SubColNames.R . FlyBase IDs from UniProt IDs for mapping: idmapping_2024_03_18.tsv (Source) Downloaded from UniProt and merged with GO data from QuickGO to match the FlyBase naming convention. Renamed columns to \"GENE_PRODUCT_ID\" and \"FB_ID\" and merged in scripts/SubColNames.R . 2024-04-01: Added 415,493 inferred ProGo edges using a Cypher command. 2024-04-03: GO association data: gene_association_fb_2024-04-03.tsv dmel_GO_data_2024-04-03.tsv Removed qualifiers with \"NOT\" preceding them using `scripts/RemoveNotQualifier.R Reduced inferred ProGo edges to 413,704. Current D. melanogaster Network | Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 11501 | 233054 | 510962 | Bacillus subtilis Data Sources 2023-10-18 (BETA): Interaction data: bsub_interactome.csv Source Exported the \u201cInteraction\u201d set and renamed to bsub_interactome.csv . GO association data: subtiwiki.gene.export.2023-10-18.tsv processed and merged into bsub_GO_data.csv (Source) Exported the \u201cGene\u201d set with all options selected. Processed and merged the file according to scripts/JoinBSUtoUniProt.R . bsub_go_uniprot.tsv (Source) Selected all annotations for B. subtilis and used the following bash command to download: wget 'https://golr-aux.geneontology.io/solr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=source,bioentity_internal_id,bioentity_label,qualifier,annotation_class,reference,evidence_type,evidence_with,aspect,bioentity_name,synonym,type,taxon,date,assigned_by,annotation_extension_class,bioentity_isoform&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&csv.encapsulator=&csv.separator=%09&csv.header=false&csv.mv.separator=%7C&fq=document_category:%22annotation%22&fq=taxon_subset_closure_label:%22Bacillus%20subtilis%20subsp.%20subtilis%20str.%20168%22&facet.field=aspect&facet.field=taxon_subset_closure_label&facet.field=type&facet.field=evidence_subset_closure_label&facet.field=regulates_closure_label&facet.field=isa_partof_closure_label&facet.field=annotation_class_label&facet.field=qualifier&facet.field=annotation_extension_class_closure_label&facet.field=assigned_by&facet.field=panther_family_label&q=*:*' File was renamed to bsub_go_uniprot.tsv , processed and merged into bsub_GO_data.csv according to the scripts/JoinBSUtoUniProt.R file. 
2024-03-18: GO association data: bsub_GO_data_Mar18_24.tsv (Source) Downloaded and merged data together in scripts/SubColNames.R and imported with data/README.md . BSU IDs from UniProt IDs for mapping: subtiwiki.gene.export.2024-03-18.tsv (Source) Selected BSU and UniProt outlinks from menu and exported. Renamed columns to \"GENE_PRODUCT_ID\" and \"BSU_ID\" to remove special characters. Merged in scripts/SubColNames.R . 2024-04-01: Added 39,215 inferred ProGo edges using a Cypher command. 2024-04-03: No \"NOT\" qualifiers were found in the dataset so there were no changes to the B. subtilis data structure during this update. 2024-06-11: Added new interaction data from STRING-DB . Downloaded physical interactions full 224308.protein.physical.links.full.v12.0.txt and 224308.protein.info.v12.0.txt and merged both into interactome_txid224308_2024-06-06.txt and cleaned according to BsubDataMerging.Rmd . Added updated GO term edges for B. subtilis after new data import. Downloaded all reviewed annotations from QuickGO ([Source])(https://www.ebi.ac.uk/QuickGO/annotations?taxonId=224308&taxonUsage=descendants&geneProductSubset=Swiss-Prot&geneProductType=protein) and downloaded UniProt and BSU ID mapper subtiwiki.gene.export.2024-06-03.tsv from SubtiWiki . Merged the two into annotations_txid224308_2024-06-03.txt according to BsubDataMerging.Rmd . 2024-06-24: Remove \"self-edges\" from PPI data. Current B. subtilis Network | Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 1933 | 6441 | 65063 | Danio rerio Data Sources 2024-03-18: Interaction data: zfish_string_db_results.csv merged into zfish_interactome_Mar12_2024.txt . (Source) Downloaded file 7955.protein.physical.links.full.v12.0.txt.gz from String-DB and filtered to experimentally validated, database-curated, and textmined interactions according to scripts/ZebrafishDataMerging.Rmd . Mar. 12, 2024. 7955.protein.aliases.v12.0.txt merged into zfish_interactome_Mar12_2024.txt (Source) Downloaded file from String-DB to provide UniProt IDs for STRING-DB aliases. zfish_psicquic_results.csv merged into zfish_interactome_Mar12_2024.txt (Source) Used a Python script scripts/GetXML.ipynb to scrape all entries for \u201c Danio rerio \u201d from the REST API. Removed all tags that were in between the first and last instance. All tags but the first were removed from the file. Got data for interactions and interactors and converted XML format to JSON using scripts/get-interactions.js and scripts/get-interactors.js . Converted the resulting JSON files to CSV using a free online convertor . Merged interactions.csv and interactors.csv into zfish_psicquic_results.csv using scripts/ZebrafishDataMerging.Rmd . Some UniProt IDs were found from the IntAct entry using the IntAct ID as documented in the Rmd. zfish_id_mapper.tsv merged into zfish_interactome_Mar12_2024.txt (Source) Retrieved updated UniProt entries and common names for 11,765 entries. 2781 protein entries were found to be obsolete, thus did not have a name available on UniProt. These were removed and separated into their own dataset. The resulting dataset had 6,438 unique proteins. zfish_gene_names.tsv merged into zfish_interactome_Mar12_2024.txt (Source) Retrieved gene names for 6,438 D. rerio proteins zfish_unique_protein_ids_Mar12_24.txt from UniProt's name mapping service. For entries with a \"gene name\", the gene name was used as the name, for those without a gene name, the first portion of the \"protein name\" was used. 
This was decided to ensure uniqueness for the node names in the user interface. Merged all D. rerio data together into one master file using the instructions in scripts/ZebrafishDataMerging.Rmd . GO Association Data: zfish_GO_data_Mar12_24.tsv (Source) Used QuickGO to get all 65,876 \"Reviewed\" GO annotations for D. rerio . Replaced the \" \" in headers with \"_\" to ease data import. 2024-04-01: Added 86,304 inferred ProGo edges using a Cypher command. 2024-04-03: GO association data: zfish_GO_data_2024-04-03.tsv Removed qualifiers with \"NOT\" preceding them using `scripts/RemoveNotQualifier.R Reduced inferred ProGo edges to 86,216. 2024-06-11: Added alt_name parameter to Neo4j import statement. 2024-06-24: Remove trailing whitespaces from some names according to ZebrafishDataMerging.Rmd . Remove \"self-edges\" from PPI data. Current D. rerio Network | Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 6438 | 45003 | 108758 | Gene Ontology Hierarchy Data Sources 2023-09-29: Common name: go.obo processed into go.txt (Source) Used wget to download the file. Processed the file using scripts/ParseOBOtoTXT.ipynb . Relationships: go.obo processed into is_a_import.tsv Processed the file using scripts/ParseOntologyRelationship.ipynb . go.obo processed into relationship_import.tsv Processed the file using scripts/ParseOntologyRelationship.ipynb . 2024-03-28: goNeverAnnotate.txt joined with go.txt into go_2024-03-28.txt Joined the data together with scripts/GeneOntologyNeverAnnotate.R . gocheck_do_not_annotate.txt parsed from gocheck_do_not_annotate.obo using scripts/ParseOBOtoTXT.ipynb and merged into go_2024-03-28.txt . Gene Ontology Data Structure | GO Terms | \"is_a\" Relationships (GoGo) | | -------- | :-------------------------- | | 42854 | 68308 | Taxon ID source: NCBI taxonomy browser Looked up species name and got taxon ID. Versioning & Dates 2023-09-29 -- 2024-03-17 (BETA): Imported weighted D. melanogaster interactome and FlyBase annotations. Imported raw GO data and \"is_a\" relationships. 2024-03-18: Added D. rerio protein interactome and GO association data. Updated B. subtilis and D. melanogaster GO association networks with QuickGO data. 2024-03-28: Added blacklist indicator to GO term nodes that should never have an annotation. 2024-04-01: Added inferred ProGo edges from descendant ProGo edges. This means that proteins annotated to a specific GO term, such as Mbs to enzyme inhibitor activity, will also be annotated to that GO term's ancestors, such as molecular function inhibitor activity and molecular_function. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 415,493 | | B. subtilis | 39,215 | | D. rerio | 86,304 | | Total | 541,012 | 2024-04-03: Removed \"NOT\" qualifiers (those that should not be explicitly annotated to the GO term due to experimental or other evidence) from all GO assocation datasets. Repropogated the \"inferred_from_descendant\" edges to ensure no false propogation of \"NOT\" qualifiers. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 413,704 | | B. subtilis | 39,215 | | D. rerio | 86,216 | | Total | 539,135 | 2024-06-11: Added B. subtilis interaction data from STRING-DB and updated QuickGO annotations. Added alt_name parameters to B. subtilis and D. rerio nodes. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 413,704 | | B. subtilis | 54,270 | | D. 
rerio | 86,216 | | Total | 554,190 | 2024-06-24: Removed trailing whitespaces from D. rerio data. Removed \"self-edges\" i.e., interactions between two copies of the same protein to improve path algorithm performance. 309 \"self-edges\" were removed from the data from B. subtilis and D. rerio .","title":"Data Log & Version"},{"location":"data-version/#proteinweaver-data-log-version","text":"This section of the documentation outlines the data sources, processing steps and versions of the ProteinWeaver web interface.","title":"ProteinWeaver Data Log & Version"},{"location":"data-version/#drosophila-melanogaster-data-sources","text":"","title":"Drosophila melanogaster Data Sources"},{"location":"data-version/#2023-09-29-beta","text":"","title":"2023-09-29 (BETA):"},{"location":"data-version/#interaction-data","text":"interactome-flybase-collapsed-weighted.txt (Source)","title":"Interaction data:"},{"location":"data-version/#go-association-data","text":"gene_association.fb (Source)","title":"GO association data:"},{"location":"data-version/#2024-03-18","text":"","title":"2024-03-18:"},{"location":"data-version/#go-association-data_1","text":"dmel_GO_data_Mar15_24.tsv (Source) Downloaded and merged data together in scripts/SubColNames.R .","title":"GO association data:"},{"location":"data-version/#flybase-ids-from-uniprot-ids-for-mapping","text":"idmapping_2024_03_18.tsv (Source) Downloaded from UniProt and merged with GO data from QuickGO to match the FlyBase naming convention. Renamed columns to \"GENE_PRODUCT_ID\" and \"FB_ID\" and merged in scripts/SubColNames.R .","title":"FlyBase IDs from UniProt IDs for mapping:"},{"location":"data-version/#2024-04-01","text":"Added 415,493 inferred ProGo edges using a Cypher command.","title":"2024-04-01:"},{"location":"data-version/#2024-04-03","text":"","title":"2024-04-03:"},{"location":"data-version/#go-association-data_2","text":"gene_association_fb_2024-04-03.tsv dmel_GO_data_2024-04-03.tsv Removed qualifiers with \"NOT\" preceding them using `scripts/RemoveNotQualifier.R Reduced inferred ProGo edges to 413,704.","title":"GO association data:"},{"location":"data-version/#current-d-melanogaster-network","text":"| Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 11501 | 233054 | 510962 |","title":"Current D. melanogaster Network"},{"location":"data-version/#bacillus-subtilis-data-sources","text":"","title":"Bacillus subtilis Data Sources"},{"location":"data-version/#2023-10-18-beta","text":"","title":"2023-10-18 (BETA):"},{"location":"data-version/#interaction-data_1","text":"bsub_interactome.csv Source Exported the \u201cInteraction\u201d set and renamed to bsub_interactome.csv .","title":"Interaction data:"},{"location":"data-version/#go-association-data_3","text":"subtiwiki.gene.export.2023-10-18.tsv processed and merged into bsub_GO_data.csv (Source) Exported the \u201cGene\u201d set with all options selected. Processed and merged the file according to scripts/JoinBSUtoUniProt.R . bsub_go_uniprot.tsv (Source) Selected all annotations for B. 
subtilis and used the following bash command to download: wget 'https://golr-aux.geneontology.io/solr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=source,bioentity_internal_id,bioentity_label,qualifier,annotation_class,reference,evidence_type,evidence_with,aspect,bioentity_name,synonym,type,taxon,date,assigned_by,annotation_extension_class,bioentity_isoform&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&csv.encapsulator=&csv.separator=%09&csv.header=false&csv.mv.separator=%7C&fq=document_category:%22annotation%22&fq=taxon_subset_closure_label:%22Bacillus%20subtilis%20subsp.%20subtilis%20str.%20168%22&facet.field=aspect&facet.field=taxon_subset_closure_label&facet.field=type&facet.field=evidence_subset_closure_label&facet.field=regulates_closure_label&facet.field=isa_partof_closure_label&facet.field=annotation_class_label&facet.field=qualifier&facet.field=annotation_extension_class_closure_label&facet.field=assigned_by&facet.field=panther_family_label&q=*:*' File was renamed to bsub_go_uniprot.tsv , processed and merged into bsub_GO_data.csv according to the scripts/JoinBSUtoUniProt.R file.","title":"GO association data:"},{"location":"data-version/#2024-03-18_1","text":"","title":"2024-03-18:"},{"location":"data-version/#go-association-data_4","text":"bsub_GO_data_Mar18_24.tsv (Source) Downloaded and merged data together in scripts/SubColNames.R and imported with data/README.md .","title":"GO association data:"},{"location":"data-version/#bsu-ids-from-uniprot-ids-for-mapping","text":"subtiwiki.gene.export.2024-03-18.tsv (Source) Selected BSU and UniProt outlinks from menu and exported. Renamed columns to \"GENE_PRODUCT_ID\" and \"BSU_ID\" to remove special characters. Merged in scripts/SubColNames.R .","title":"BSU IDs from UniProt IDs for mapping:"},{"location":"data-version/#2024-04-01_1","text":"Added 39,215 inferred ProGo edges using a Cypher command.","title":"2024-04-01:"},{"location":"data-version/#2024-04-03_1","text":"No \"NOT\" qualifiers were found in the dataset so there were no changes to the B. subtilis data structure during this update.","title":"2024-04-03:"},{"location":"data-version/#2024-06-11","text":"Added new interaction data from STRING-DB . Downloaded physical interactions full 224308.protein.physical.links.full.v12.0.txt and 224308.protein.info.v12.0.txt and merged both into interactome_txid224308_2024-06-06.txt and cleaned according to BsubDataMerging.Rmd . Added updated GO term edges for B. subtilis after new data import. Downloaded all reviewed annotations from QuickGO ([Source])(https://www.ebi.ac.uk/QuickGO/annotations?taxonId=224308&taxonUsage=descendants&geneProductSubset=Swiss-Prot&geneProductType=protein) and downloaded UniProt and BSU ID mapper subtiwiki.gene.export.2024-06-03.tsv from SubtiWiki . Merged the two into annotations_txid224308_2024-06-03.txt according to BsubDataMerging.Rmd .","title":"2024-06-11:"},{"location":"data-version/#2024-06-24","text":"Remove \"self-edges\" from PPI data.","title":"2024-06-24:"},{"location":"data-version/#current-b-subtilis-network","text":"| Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 1933 | 6441 | 65063 |","title":"Current B. 
subtilis Network"},{"location":"data-version/#danio-rerio-data-sources","text":"","title":"Danio rerio Data Sources"},{"location":"data-version/#2024-03-18_2","text":"","title":"2024-03-18:"},{"location":"data-version/#interaction-data_2","text":"zfish_string_db_results.csv merged into zfish_interactome_Mar12_2024.txt . (Source) Downloaded file 7955.protein.physical.links.full.v12.0.txt.gz from String-DB and filtered to experimentally validated, database-curated, and textmined interactions according to scripts/ZebrafishDataMerging.Rmd . Mar. 12, 2024. 7955.protein.aliases.v12.0.txt merged into zfish_interactome_Mar12_2024.txt (Source) Downloaded file from String-DB to provide UniProt IDs for STRING-DB aliases. zfish_psicquic_results.csv merged into zfish_interactome_Mar12_2024.txt (Source) Used a Python script scripts/GetXML.ipynb to scrape all entries for \u201c Danio rerio \u201d from the REST API. Removed all tags that were in between the first and last instance. All tags but the first were removed from the file. Got data for interactions and interactors and converted XML format to JSON using scripts/get-interactions.js and scripts/get-interactors.js . Converted the resulting JSON files to CSV using a free online convertor . Merged interactions.csv and interactors.csv into zfish_psicquic_results.csv using scripts/ZebrafishDataMerging.Rmd . Some UniProt IDs were found from the IntAct entry using the IntAct ID as documented in the Rmd. zfish_id_mapper.tsv merged into zfish_interactome_Mar12_2024.txt (Source) Retrieved updated UniProt entries and common names for 11,765 entries. 2781 protein entries were found to be obsolete, thus did not have a name available on UniProt. These were removed and separated into their own dataset. The resulting dataset had 6,438 unique proteins. zfish_gene_names.tsv merged into zfish_interactome_Mar12_2024.txt (Source) Retrieved gene names for 6,438 D. rerio proteins zfish_unique_protein_ids_Mar12_24.txt from UniProt's name mapping service. For entries with a \"gene name\", the gene name was used as the name, for those without a gene name, the first portion of the \"protein name\" was used. This was decided to ensure uniqueness for the node names in the user interface. Merged all D. rerio data together into one master file using the instructions in scripts/ZebrafishDataMerging.Rmd .","title":"Interaction data:"},{"location":"data-version/#go-association-data_5","text":"zfish_GO_data_Mar12_24.tsv (Source) Used QuickGO to get all 65,876 \"Reviewed\" GO annotations for D. rerio . Replaced the \" \" in headers with \"_\" to ease data import.","title":"GO Association Data:"},{"location":"data-version/#2024-04-01_2","text":"Added 86,304 inferred ProGo edges using a Cypher command.","title":"2024-04-01:"},{"location":"data-version/#2024-04-03_2","text":"","title":"2024-04-03:"},{"location":"data-version/#go-association-data_6","text":"zfish_GO_data_2024-04-03.tsv Removed qualifiers with \"NOT\" preceding them using `scripts/RemoveNotQualifier.R Reduced inferred ProGo edges to 86,216.","title":"GO association data:"},{"location":"data-version/#2024-06-11_1","text":"Added alt_name parameter to Neo4j import statement.","title":"2024-06-11:"},{"location":"data-version/#2024-06-24_1","text":"Remove trailing whitespaces from some names according to ZebrafishDataMerging.Rmd . 
Remove \"self-edges\" from PPI data.","title":"2024-06-24:"},{"location":"data-version/#current-d-rerio-network","text":"| Proteins | Interactions (ProPro) | Annotations (ProGo) | | -------- | --------------------- | :------------------ | | 6438 | 45003 | 108758 |","title":"Current D. rerio Network"},{"location":"data-version/#gene-ontology-hierarchy-data-sources","text":"","title":"Gene Ontology Hierarchy Data Sources"},{"location":"data-version/#2023-09-29","text":"","title":"2023-09-29:"},{"location":"data-version/#common-name","text":"go.obo processed into go.txt (Source) Used wget to download the file. Processed the file using scripts/ParseOBOtoTXT.ipynb .","title":"Common name:"},{"location":"data-version/#relationships","text":"go.obo processed into is_a_import.tsv Processed the file using scripts/ParseOntologyRelationship.ipynb . go.obo processed into relationship_import.tsv Processed the file using scripts/ParseOntologyRelationship.ipynb .","title":"Relationships:"},{"location":"data-version/#2024-03-28","text":"goNeverAnnotate.txt joined with go.txt into go_2024-03-28.txt Joined the data together with scripts/GeneOntologyNeverAnnotate.R . gocheck_do_not_annotate.txt parsed from gocheck_do_not_annotate.obo using scripts/ParseOBOtoTXT.ipynb and merged into go_2024-03-28.txt .","title":"2024-03-28:"},{"location":"data-version/#gene-ontology-data-structure","text":"| GO Terms | \"is_a\" Relationships (GoGo) | | -------- | :-------------------------- | | 42854 | 68308 |","title":"Gene Ontology Data Structure"},{"location":"data-version/#taxon-id-source","text":"NCBI taxonomy browser Looked up species name and got taxon ID.","title":"Taxon ID source:"},{"location":"data-version/#versioning-dates","text":"","title":"Versioning & Dates"},{"location":"data-version/#2023-09-29-2024-03-17-beta","text":"Imported weighted D. melanogaster interactome and FlyBase annotations. Imported raw GO data and \"is_a\" relationships.","title":"2023-09-29 -- 2024-03-17 (BETA):"},{"location":"data-version/#2024-03-18_3","text":"Added D. rerio protein interactome and GO association data. Updated B. subtilis and D. melanogaster GO association networks with QuickGO data.","title":"2024-03-18:"},{"location":"data-version/#2024-03-28_1","text":"Added blacklist indicator to GO term nodes that should never have an annotation.","title":"2024-03-28:"},{"location":"data-version/#2024-04-01_3","text":"Added inferred ProGo edges from descendant ProGo edges. This means that proteins annotated to a specific GO term, such as Mbs to enzyme inhibitor activity, will also be annotated to that GO term's ancestors, such as molecular function inhibitor activity and molecular_function. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 415,493 | | B. subtilis | 39,215 | | D. rerio | 86,304 | | Total | 541,012 |","title":"2024-04-01:"},{"location":"data-version/#2024-04-03_3","text":"Removed \"NOT\" qualifiers (those that should not be explicitly annotated to the GO term due to experimental or other evidence) from all GO assocation datasets. Repropogated the \"inferred_from_descendant\" edges to ensure no false propogation of \"NOT\" qualifiers. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 413,704 | | B. subtilis | 39,215 | | D. rerio | 86,216 | | Total | 539,135 |","title":"2024-04-03:"},{"location":"data-version/#2024-06-11_2","text":"Added B. subtilis interaction data from STRING-DB and updated QuickGO annotations. Added alt_name parameters to B. 
subtilis and D. rerio nodes. | Species | Inferred Edges | | --------------- | :------------- | | D. melanogaster | 413,704 | | B. subtilis | 54,270 | | D. rerio | 86,216 | | Total | 554,190 |","title":"2024-06-11:"},{"location":"data-version/#2024-06-24_2","text":"Removed trailing whitespaces from D. rerio data. Removed \"self-edges\", i.e., interactions between two copies of the same protein, to improve path algorithm performance. 309 \"self-edges\" were removed from the B. subtilis and D. rerio data.","title":"2024-06-24:"},{"location":"setup/","text":"Setup The setup guide includes instructions for creating the frontend and backend local dev environments (database, server, and client). Backend Database ProteinWeaver uses a Dockerized version of Neo4j as the database. Follow these instructions to install Docker Desktop. Once installed, continue with the following steps: Pull the official Neo4j Docker image: docker pull neo4j Create a directory in your $HOME named neo4j . Within the ~/neo4j directory create the following directories: ~/neo4j/data/ to allow storage of database state between Docker instances ~/neo4j/logs/ to allow storage of logs between Docker instances ~/neo4j/import/ to store data for import ~/neo4j/plugins/ to store any necessary plugins for production environments Download the most recent datasets from the /import directory on GitHub and place them inside of your ~/neo4j/import/ local directory. These are all the prerequisite files you will need for this tutorial and will be updated as new versions are released. Create a Docker instance with GDS and APOC plugins using the following command: docker run \\ --name proteinweaver \\ -p7474:7474 -p7687:7687 \\ -v $HOME/neo4j/data:/data \\ -v $HOME/neo4j/logs:/logs \\ -v $HOME/neo4j/import:/import \\ -v $HOME/neo4j/plugins:/plugins \\ --env NEO4J_AUTH=none \\ -e NEO4J_apoc_export_file_enabled=true \\ -e NEO4J_apoc_import_file_enabled=true \\ -e NEO4J_apoc_import_file_use__neo4j__config=true \\ -e NEO4J_PLUGINS='[\"graph-data-science\"]' \\ -e NEO4JLABS_PLUGINS=\\[\\\"apoc\\\"\\] \\ neo4j:5.12.0-community-bullseye This example Docker instance has no security restrictions; to set a username and password, edit this line in the previous command: --env NEO4J_AUTH=username/password Access the Docker image at http://localhost:7474 . You will need to input the username and password you defined in the run command. Create constraints before data import. We use NCBI as the source of the unique taxon identifiers: CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE; CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE; D. melanogaster imports Import the D. melanogaster protein interactome using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome-flybase-collapsed-weighted.txt' AS fly FIELDTERMINATOR '\\t' CALL { with fly MERGE (a:protein {id: fly.FlyBase1, name: fly.symbol1, txid: \"txid7227\", species: \"Drosophila melanogaster\"}) MERGE (b:protein {id: fly.FlyBase2, name: fly.symbol2, txid: \"txid7227\", species: \"Drosophila melanogaster\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Set the alt_name parameter to be the same as the name: MATCH (n:protein {txid: \"txid7227\"}) SET n.alt_name = n.name; Import the first batch of D.
melanogaster GO data from FlyBase into the database using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///gene_association_fb_2024-04-03.tsv' AS flygo FIELDTERMINATOR '\\t' CALL { with flygo MATCH (n:protein {id: flygo.db_object_id, txid:\"txid7227\"}) MERGE (g:go_term {id: flygo.go_id}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Import the relationship qualifiers for the first batch of GO terms and D. melanogaster proteins using the following commands: :auto LOAD CSV WITH HEADERS FROM 'file:///gene_association_fb_2024-04-03.tsv' AS flygo FIELDTERMINATOR '\\t' CALL { with flygo MATCH (p:protein {id: flygo.db_object_id, txid:\"txid7227\"})-[r:ProGo]-(g:go_term {id: flygo.go_id}) SET r.relationship = flygo.qualifier } IN TRANSACTIONS OF 1000 ROWS; Import more GO data for D. melanogaster :auto LOAD CSV WITH HEADERS FROM 'file:///dmel_GO_data_2024-04-03.tsv' AS dmelgo FIELDTERMINATOR '\\t' CALL { with dmelgo MATCH (n:protein {id: dmelgo.FB_ID, txid: \"txid7227\"}) MERGE (g:go_term {id: dmelgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set second batch of qualifier properties for D. melanogaster . :auto LOAD CSV WITH HEADERS FROM 'file:///dmel_GO_data_2024-04-03.tsv' AS dmelgo FIELDTERMINATOR '\\t' CALL { with dmelgo MATCH (p:protein {id: dmelgo.FB_ID, txid: \"txid7227\"})-[r:ProGo]-(g:go_term {id: dmelgo.GO_TERM}) SET r.relationship = dmelgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; B. subtilis imports Import B. subtilis protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome_txid224308_2024-06-06.txt' AS bsub FIELDTERMINATOR '\\t' CALL { with bsub MERGE (a:protein {id: bsub.protein_1_locus, name: bsub.protein_1_name, alt_name: bsub.protein_1_alt_name, txid: \"txid224308\", species: \"Bacillus subtilis 168\"}) MERGE (b:protein {id: bsub.protein_2_locus, name: bsub.protein_2_name, alt_name: bsub.protein_2_alt_name, txid: \"txid224308\", species: \"Bacillus subtilis 168\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Add first batch of GO data from SubtiWiki to B. subtilis nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///bsub_GO_data.csv' AS bsubgo CALL { with bsubgo MATCH (n:protein {id: bsubgo.locus, txid: \"txid224308\"}) MERGE (g:go_term {id: bsubgo.go_term}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property from first batch of GO data for B. subtilis . :auto LOAD CSV WITH HEADERS FROM 'file:///bsub_GO_data.csv' AS bsubgo CALL { with bsubgo MATCH (p:protein {id: bsubgo.locus, txid: \"txid224308\"})-[r:ProGo]-(g:go_term {id: bsubgo.go_term}) SET r.relationship = bsubgo.qualifier } IN TRANSACTIONS OF 1000 ROWS; Import more GO data for B. subtilis :auto LOAD CSV WITH HEADERS FROM 'file:///annotations_txid224308_2024-06-03.txt' AS bsubgo FIELDTERMINATOR '\\t' CALL { with bsubgo MATCH (n:protein {id: bsubgo.BSU_ID, txid: \"txid224308\"}) MERGE (g:go_term {id: bsubgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for second batch of GO data ( B. subtilis ). :auto LOAD CSV WITH HEADERS FROM 'file:///annotations_txid224308_2024-06-03.txt' AS bsubgo FIELDTERMINATOR '\\t' CALL { with bsubgo MATCH (p:protein {id: bsubgo.BSU_ID, txid: \"txid224308\"})-[r:ProGo]-(g:go_term {id: bsubgo.GO_TERM}) SET r.relationship = bsubgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; D. rerio imports Import D. 
rerio protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome_txid7955_2024-06-24.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, alt_name: zfish.alt_name1, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, alt_name: zfish.alt_name2, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Add GO data to D. rerio nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_2024-04-03.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"}) MERGE (g:go_term {id: zfishgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for D. rerio . :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_2024-04-03.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM}) SET r.relationship = zfishgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS; Gene Ontology hierarchy imports Import the GO hierarchy with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///is_a_import.tsv' AS go FIELDTERMINATOR '\\t' CALL { with go MERGE (a:go_term {id: go.id}) MERGE (b:go_term {id: go.id2}) MERGE (a)-[r:GoGo]->(b) SET r.relationship = go.is_a } IN TRANSACTIONS OF 100 ROWS; Import the GO term common names and descriptions with the following Cypher command: :auto LOAD CSV WITH HEADERS FROM 'file:///go_2024-03-28.txt' AS go FIELDTERMINATOR '\\t' CALL { with go MATCH (n:go_term {id: go.id}) SET n.name = go.name, n.namespace = go.namespace, n.def = go.def } IN TRANSACTIONS OF 1000 ROWS; Add blacklist indicator to GO term nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///go_2024-03-28.txt' AS go FIELDTERMINATOR '\\t' CALL { with go MATCH (n:go_term {id: go.id}) SET n.never_annotate = go.never_annotate } IN TRANSACTIONS OF 1000 ROWS; Propagation of ancestral ProGo edges Add ancestral edges for D. rerio . MATCH (p:protein {txid: 'txid7955'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add ancestral edges for B. subtilis . MATCH (p:protein {txid: 'txid224308'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add ancestral edges for D. melanogaster . MATCH (p:protein {txid: 'txid7227'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add qualifiers for new ProGo edges for each species. 
MATCH (p:protein {txid: 'txid7227'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" MATCH (p:protein {txid: 'txid224308'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" MATCH (p:protein {txid: 'txid7955'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" Now remove all the Protein-Protein edges from the same protein to itself with the following command (these edges may cause issues with our path algorithms). MATCH (p:protein)-[rel:ProPro]-(p) DETACH DELETE rel; Now add the degree for all nodes for each species as a property: MATCH (pr:protein{txid: \"txid224308\"}) SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)} MATCH (pr:protein{txid: \"txid7955\"}) SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)} MATCH (pr:protein{txid: \"txid7227\"}) SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)} Now we want to add the protein degrees as a property in the nodes: MATCH (pr:protein) set pr.degree = count{(pr)-[:ProPro]-(:protein)}; The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks. CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']); CALL gds.graph.relationships.toUndirected( 'proGoGraph', {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'} ) YIELD inputRelationships, relationshipsWritten; Backend Server The backend server is run using Express.js. To set up the server, continue with the following steps: Open a new terminal window and clone the ProteinWeaver GitHub repository. Locate the server directory: cd server Next we need to install node.js , and the recommended way is to use a Node Version Manager. Follow the NVM GitHub instructions before proceeding. The correct version is outlined in the .nvmrc file in both of the client and server directories. Follow the command below to use the correct version. nvm use If you do not have the correct version, install it with the following command: nvm install Then install the project dependencies with the following command: npm install You can verify your node version is now correct with the following command: node -v Finally, to start the server enter: npm start The server should be running on http://localhost:3000/ . There are several API endpoints, and you can verify the server works by visiting http://localhost:3000/api/test , which should output a JSON object. Please keep the terminal window open. Frontend Client The client uses the React.js framework and Vite.js as a bundler. Open a new terminal window and navigate to the cloned ProteinWeaver GitHub repository. Locate the client directory with the following bash command: cd client Similar to the backend server setup, we need to use and install the correct node.js version. Follow the command below to use the correct version. nvm use If you do not have the correct version, install it with the following command: nvm install Then install the project dependencies with the following command: npm install You can verify your node version is now correct with the following command: node -v Lastly, start the client with the following command: npm run dev ProteinWeaver should now be up and running on http://localhost:5173/ ! 
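Before running the full verification query in the next section, a quick spot-check can confirm that the qualifier and propagation steps above took effect. The following query is an illustrative sketch rather than part of the official guide; the exact counts it returns will depend on the dataset versions you imported:

```cypher
// Tally ProGo edges per species and qualifier; after propagation,
// each txid should report some "inferred_from_descendant" edges.
MATCH (p:protein)-[r:ProGo]-(g:go_term)
RETURN p.txid AS species, r.relationship AS qualifier, COUNT(r) AS edges
ORDER BY species, edges DESC;
```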
Verify Guide Once you have completed the guide, you can use the following query to verify that the database matches the most updated version (AS OF 2024-05-06): match (fly:protein {txid :\"txid7227\"}) WITH COUNT(fly) AS flyCount match (bsub:protein {txid :\"txid224308\"}) WITH flyCount, COUNT(bsub) AS bsubCount match (drerio:protein {txid :\"txid7955\"}) WITH flyCount, bsubCount, COUNT(drerio) AS drerioCount match (go:go_term) WITH flyCount, bsubCount, drerioCount, COUNT(go) AS goCount match (fly1 {txid :\"txid7227\"}) -[flyProPro:ProPro]- (fly2 {txid :\"txid7227\"}) WITH flyCount, bsubCount, drerioCount, goCount, COUNT(flyProPro)/2 AS flyProProCount match (bsub1 {txid :\"txid224308\"}) -[bsubProPro:ProPro]- (bsub2 {txid :\"txid224308\"}) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, COUNT(bsubProPro)/2 AS bsubProProCount match (drerio1 {txid :\"txid7955\"}) -[drerioProPro:ProPro]- (drerio2 {txid :\"txid7955\"}) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, COUNT(drerioProPro)/2 AS drerioProProCount match (go1:go_term) -[goGoGo:GoGo]- (go2:go_term) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, COUNT(goGoGo)/2 AS goGoGoCount match (fly:protein {txid :\"txid7227\"}) -[flyProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount, COUNT(flyProGo) AS flyProGoCount match (bsub:protein {txid :\"txid224308\"}) -[bsubProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount,flyProGoCount, COUNT(bsubProGo) AS bsubProGoCount match (drerio:protein {txid :\"txid7955\"}) -[drerioProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount,flyProGoCount, bsubProGoCount, COUNT(drerioProGo) AS drerioProGoCount RETURN flyCount, flyProProCount, flyProGoCount, bsubCount, bsubProProCount, bsubProGoCount, drerioCount, drerioProProCount, drerioProGoCount, goCount, goGoGoCount You should get the following output: \u2552\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2555 \u2502flyCount\u2502flyProProCount\u2502flyProGoCount\u2502bsubCount\u2502bsubProProCount\u2502bsubProGoCount\u2502drerioCount\u2502drerioProProCount\u2502drerioProGoCount\u2502goCount\u2502goGoGoCount\u2502 
\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561 \u250211501 \u2502233054 \u2502510962 \u25021933 \u25026441 \u250265063 \u25026438 \u250245003 \u2502108758 \u250242861 \u250268308 \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 Useful Commands Delete nodes: MATCH (n:protein {txid: \"example\", species: \"example\"}) DETACH DELETE n Drop constraints: DROP CONSTRAINT constraint Drop graph projection: CALL gds.graph.drop('proGoGraph') YIELD graphName Show database information: :schema","title":"Setup"},{"location":"setup/#setup","text":"The setup guide will include instructions for creating the front and backend local dev environments (database, server, and client).","title":"Setup"},{"location":"setup/#backend-database","text":"ProteinWeaver uses a Dockerized version of Neo4j as the database. Follow these instructions to install Docker Desktop. Once installed, continue with the following steps: Pull the official Neo4j Docker image. docker pull neo4j Create a directory in your $HOME named neo4j Within the ~/neo4j directory create the following directories: ~/neo4j/data/ to allow storage of database state between Docker instances ~/neo4j/logs/ to allow storage of logs between Docker instances ~/neo4j/import/ to store data for import ~/neo4j/plugins/ to store any necessary plugins for production environments Download the most recent datasets from the /import directory on GitHub and place them inside of your ~/neo4j/import/ local directory. These are all the prerequisite files you will need for this tutorial and will be updated as new versions are released. 
Create a Docker instance with GDS and APOC plugins using the following command: docker run \\ --name proteinweaver \\ -p7474:7474 -p7687:7687 \\ -v $HOME/neo4j/data:/data \\ -v $HOME/neo4j/logs:/logs \\ -v $HOME/neo4j/import:/import \\ -v $HOME/neo4j/plugins:/plugins \\ --env NEO4J_AUTH=none \\ -e NEO4J_apoc_export_file_enabled=true \\ -e NEO4J_apoc_import_file_enabled=true \\ -e NEO4J_apoc_import_file_use__neo4j__config=true \\ -e NEO4J_PLUGINS='[\"graph-data-science\"]' \\ -e NEO4JLABS_PLUGINS=\\[\\\"apoc\\\"\\] \\ neo4j:5.12.0-community-bullseye This example Docker instance has no security restrictions; to set a username and password, edit this line in the previous command: --env NEO4J_AUTH=username/password Access the Neo4j Browser at http://localhost:7474 . You will need to input the username and password you defined in the run command. Create constraints before data import. We use NCBI as the source of the unique taxon identifiers: CREATE CONSTRAINT txid_constraint FOR (n:protein) REQUIRE (n.txid, n.id) IS UNIQUE; CREATE CONSTRAINT go_constraint FOR (n:go_term) REQUIRE n.id IS UNIQUE;","title":"Backend Database"},{"location":"setup/#d-melanogaster-imports","text":"Import D. melanogaster protein interactome using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome-flybase-collapsed-weighted.txt' AS fly FIELDTERMINATOR '\\t' CALL { with fly MERGE (a:protein {id: fly.FlyBase1, name: fly.symbol1, txid: \"txid7227\", species: \"Drosophila melanogaster\"}) MERGE (b:protein {id: fly.FlyBase2, name: fly.symbol2, txid: \"txid7227\", species: \"Drosophila melanogaster\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Set the alt_name property to the same value as the name. MATCH (n:protein {txid: \"txid7227\"}) SET n.alt_name = n.name; Import the first batch of D. melanogaster GO data from FlyBase into the database using the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///gene_association_fb_2024-04-03.tsv' AS flygo FIELDTERMINATOR '\\t' CALL { with flygo MATCH (n:protein {id: flygo.db_object_id, txid:\"txid7227\"}) MERGE (g:go_term {id: flygo.go_id}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Import the relationship qualifiers for the first batch of GO terms and D. melanogaster proteins using the following commands: :auto LOAD CSV WITH HEADERS FROM 'file:///gene_association_fb_2024-04-03.tsv' AS flygo FIELDTERMINATOR '\\t' CALL { with flygo MATCH (p:protein {id: flygo.db_object_id, txid:\"txid7227\"})-[r:ProGo]-(g:go_term {id: flygo.go_id}) SET r.relationship = flygo.qualifier } IN TRANSACTIONS OF 1000 ROWS; Import more GO data for D. melanogaster :auto LOAD CSV WITH HEADERS FROM 'file:///dmel_GO_data_2024-04-03.tsv' AS dmelgo FIELDTERMINATOR '\\t' CALL { with dmelgo MATCH (n:protein {id: dmelgo.FB_ID, txid: \"txid7227\"}) MERGE (g:go_term {id: dmelgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set second batch of qualifier properties for D. melanogaster . :auto LOAD CSV WITH HEADERS FROM 'file:///dmel_GO_data_2024-04-03.tsv' AS dmelgo FIELDTERMINATOR '\\t' CALL { with dmelgo MATCH (p:protein {id: dmelgo.FB_ID, txid: \"txid7227\"})-[r:ProGo]-(g:go_term {id: dmelgo.GO_TERM}) SET r.relationship = dmelgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS;","title":"D. melanogaster imports"},{"location":"setup/#b-subtilis-imports","text":"Import B. 
subtilis protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome_txid224308_2024-06-06.txt' AS bsub FIELDTERMINATOR '\\t' CALL { with bsub MERGE (a:protein {id: bsub.protein_1_locus, name: bsub.protein_1_name, alt_name: bsub.protein_1_alt_name, txid: \"txid224308\", species: \"Bacillus subtilis 168\"}) MERGE (b:protein {id: bsub.protein_2_locus, name: bsub.protein_2_name, alt_name: bsub.protein_2_alt_name, txid: \"txid224308\", species: \"Bacillus subtilis 168\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Add first batch of GO data from SubtiWiki to B. subtilis nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///bsub_GO_data.csv' AS bsubgo CALL { with bsubgo MATCH (n:protein {id: bsubgo.locus, txid: \"txid224308\"}) MERGE (g:go_term {id: bsubgo.go_term}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property from first batch of GO data for B. subtilis . :auto LOAD CSV WITH HEADERS FROM 'file:///bsub_GO_data.csv' AS bsubgo CALL { with bsubgo MATCH (p:protein {id: bsubgo.locus, txid: \"txid224308\"})-[r:ProGo]-(g:go_term {id: bsubgo.go_term}) SET r.relationship = bsubgo.qualifier } IN TRANSACTIONS OF 1000 ROWS; Import more GO data for B. subtilis :auto LOAD CSV WITH HEADERS FROM 'file:///annotations_txid224308_2024-06-03.txt' AS bsubgo FIELDTERMINATOR '\\t' CALL { with bsubgo MATCH (n:protein {id: bsubgo.BSU_ID, txid: \"txid224308\"}) MERGE (g:go_term {id: bsubgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for second batch of GO data ( B. subtilis ). :auto LOAD CSV WITH HEADERS FROM 'file:///annotations_txid224308_2024-06-03.txt' AS bsubgo FIELDTERMINATOR '\\t' CALL { with bsubgo MATCH (p:protein {id: bsubgo.BSU_ID, txid: \"txid224308\"})-[r:ProGo]-(g:go_term {id: bsubgo.GO_TERM}) SET r.relationship = bsubgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS;","title":"B. subtilis imports"},{"location":"setup/#d-rerio-imports","text":"Import D. rerio protein interactome with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///interactome_txid7955_2024-06-24.txt' AS zfish FIELDTERMINATOR '\\t' CALL { with zfish MERGE (a:protein {id: zfish.uniprotID1, name: zfish.name1, alt_name: zfish.alt_name1, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (b:protein {id: zfish.uniprotID2, name: zfish.name2, alt_name: zfish.alt_name2, txid: \"txid7955\", species: \"Danio rerio\"}) MERGE (a)-[r:ProPro]-(b) } IN TRANSACTIONS OF 100 ROWS; Add GO data to D. rerio nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_2024-04-03.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (n:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"}) MERGE (g:go_term {id: zfishgo.GO_TERM}) MERGE (n)-[r:ProGo]-(g) } IN TRANSACTIONS OF 1000 ROWS; Set qualifier property for D. rerio . :auto LOAD CSV WITH HEADERS FROM 'file:///zfish_GO_data_2024-04-03.tsv' AS zfishgo FIELDTERMINATOR '\\t' CALL { with zfishgo MATCH (p:protein {id: zfishgo.GENE_PRODUCT_ID, txid: \"txid7955\"})-[r:ProGo]-(g:go_term {id: zfishgo.GO_TERM}) SET r.relationship = zfishgo.QUALIFIER } IN TRANSACTIONS OF 1000 ROWS;","title":"D. 
rerio imports"},{"location":"setup/#gene-ontology-hierarchy-imports","text":"Import the GO hierarchy with the following command: :auto LOAD CSV WITH HEADERS FROM 'file:///is_a_import.tsv' AS go FIELDTERMINATOR '\\t' CALL { with go MERGE (a:go_term {id: go.id}) MERGE (b:go_term {id: go.id2}) MERGE (a)-[r:GoGo]->(b) SET r.relationship = go.is_a } IN TRANSACTIONS OF 100 ROWS; Import the GO term common names and descriptions with the following Cypher command: :auto LOAD CSV WITH HEADERS FROM 'file:///go_2024-03-28.txt' AS go FIELDTERMINATOR '\\t' CALL { with go MATCH (n:go_term {id: go.id}) SET n.name = go.name, n.namespace = go.namespace, n.def = go.def } IN TRANSACTIONS OF 1000 ROWS; Add blacklist indicator to GO term nodes: :auto LOAD CSV WITH HEADERS FROM 'file:///go_2024-03-28.txt' AS go FIELDTERMINATOR '\\t' CALL { with go MATCH (n:go_term {id: go.id}) SET n.never_annotate = go.never_annotate } IN TRANSACTIONS OF 1000 ROWS;","title":"Gene Ontology hierarchy imports"},{"location":"setup/#propogation-of-ancestral-progo-edges","text":"Add ancestral edges for D. rerio . MATCH (p:protein {txid: 'txid7955'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add ancestral edges for B. subtilis . MATCH (p:protein {txid: 'txid224308'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add ancestral edges for D. melanogaster . MATCH (p:protein {txid: 'txid7227'})-[:ProGo]-(g:go_term) WITH p, collect(g) AS go_terms UNWIND go_terms as go_input MATCH (p)-[:ProGo]-(g:go_term {id: go_input.id})-[:GoGo*]->(g2) WITH p, collect(distinct g2) AS parent_terms UNWIND parent_terms AS parent_term MERGE (p)-[r:ProGo]-(parent_term) Add qualifiers for new ProGo edges for each species. MATCH (p:protein {txid: 'txid7227'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" MATCH (p:protein {txid: 'txid224308'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" MATCH (p:protein {txid: 'txid7955'})-[r:ProGo]-(g:go_term) WHERE r.relationship IS NULL SET r.relationship = \"inferred_from_descendant\" Now remove all the Protein-Protein edges from the same protein to itself with the following command (these edges may causes issues with our path algorithms). MATCH (p:protein)-[rel:ProPro]-(p) DETACH DELETE rel; Now add the degree for all nodes for each species as a property: MATCH (pr:protein{txid: \"txid224308\"}) SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)} MATCH (pr:protein{txid: \"txid7955\"}) SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)} MATCH (pr:protein{txid: \"txid7227\"}) SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)} Now we want to add the protein degrees as a property in the nodes: MATCH (pr:protein) set pr.degree = count{(pr)-[:ProPro]-(:protein)}; The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks. 
CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']); CALL gds.graph.relationships.toUndirected( 'proGoGraph', {relationshipType: 'ProPro', mutateRelationshipType: 'ProProUndirected'} ) YIELD inputRelationships, relationshipsWritten;","title":"Propagation of ancestral ProGo edges"},{"location":"setup/#backend-server","text":"The backend server is run using Express.js. To set up the server, continue with the following steps: Open a new terminal window and clone the ProteinWeaver GitHub repository. Locate the server directory: cd server Next we need to install node.js , and the recommended way is to use a Node Version Manager. Follow the NVM GitHub instructions before proceeding. The correct version is outlined in the .nvmrc file in both of the client and server directories. Follow the command below to use the correct version. nvm use If you do not have the correct version, install it with the following command: nvm install Then install the project dependencies with the following command: npm install You can verify your node version is now correct with the following command: node -v Finally, to start the server enter: npm start The server should be running on http://localhost:3000/ . There are several API endpoints, and you can verify the server works by visiting http://localhost:3000/api/test , which should output a JSON object. Please keep the terminal window open.","title":"Backend Server"},{"location":"setup/#frontend-client","text":"The client uses the React.js framework and Vite.js as a bundler. Open a new terminal window and navigate to the cloned ProteinWeaver GitHub repository. Locate the client directory with the following bash command: cd client Similar to the backend server setup, we need to use and install the correct node.js version. Follow the command below to use the correct version. nvm use If you do not have the correct version, install it with the following command: nvm install Then install the project dependencies with the following command: npm install You can verify your node version is now correct with the following command: node -v Lastly, start the client with the following command: npm run dev ProteinWeaver should now be up and running on http://localhost:5173/ !","title":"Frontend Client"},{"location":"setup/#verify-guide","text":"Once you have completed the guide, you can use the following query to verify that the database matches the most updated version (AS OF 2024-05-06): match (fly:protein {txid :\"txid7227\"}) WITH COUNT(fly) AS flyCount match (bsub:protein {txid :\"txid224308\"}) WITH flyCount, COUNT(bsub) AS bsubCount match (drerio:protein {txid :\"txid7955\"}) WITH flyCount, bsubCount, COUNT(drerio) AS drerioCount match (go:go_term) WITH flyCount, bsubCount, drerioCount, COUNT(go) AS goCount match (fly1 {txid :\"txid7227\"}) -[flyProPro:ProPro]- (fly2 {txid :\"txid7227\"}) WITH flyCount, bsubCount, drerioCount, goCount, COUNT(flyProPro)/2 AS flyProProCount match (bsub1 {txid :\"txid224308\"}) -[bsubProPro:ProPro]- (bsub2 {txid :\"txid224308\"}) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, COUNT(bsubProPro)/2 AS bsubProProCount match (drerio1 {txid :\"txid7955\"}) -[drerioProPro:ProPro]- (drerio2 {txid :\"txid7955\"}) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, COUNT(drerioProPro)/2 AS drerioProProCount match (go1:go_term) -[goGoGo:GoGo]- (go2:go_term) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, COUNT(goGoGo)/2 AS goGoGoCount match (fly:protein {txid :\"txid7227\"}) -[flyProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, 
goGoGoCount, COUNT(flyProGo) AS flyProGoCount match (bsub:protein {txid :\"txid224308\"}) -[bsubProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount,flyProGoCount, COUNT(bsubProGo) AS bsubProGoCount match (drerio:protein {txid :\"txid7955\"}) -[drerioProGo:ProGo]- (go) WITH flyCount, bsubCount, drerioCount, goCount, flyProProCount, bsubProProCount, drerioProProCount, goGoGoCount,flyProGoCount, bsubProGoCount, COUNT(drerioProGo) AS drerioProGoCount RETURN flyCount, flyProProCount, flyProGoCount, bsubCount, bsubProProCount, bsubProGoCount, drerioCount, drerioProProCount, drerioProGoCount, goCount, goGoGoCount You should get the following output: \u2552\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2555 \u2502flyCount\u2502flyProProCount\u2502flyProGoCount\u2502bsubCount\u2502bsubProProCount\u2502bsubProGoCount\u2502drerioCount\u2502drerioProProCount\u2502drerioProGoCount\u2502goCount\u2502goGoGoCount\u2502 \u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561 \u250211501 \u2502233054 \u2502510962 \u25021933 \u25026441 \u250265063 \u25026438 \u250245003 \u2502108758 \u250242861 \u250268308 \u2502 
\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518","title":"Verify Guide"},{"location":"setup/#useful-commands","text":"Delete nodes: MATCH (n:protein {txid: \"example\", species: \"example\"}) DETACH DELETE n Drop constraints: DROP CONSTRAINT constraint Drop graph projection: CALL gds.graph.drop('proGoGraph') YIELD graphName Show database information: :schema","title":"Useful Commands"},{"location":"tech-stack/","text":"Tech Stack Frontend This section documents the structure of the frontend and outlines the important interactions. /client \u251c\u2500\u2500 public/ \u251c\u2500\u2500 src/ \u2502 \u251c\u2500\u2500 assets/ \u2502 \u251c\u2500\u2500 components/ \u2502 \u251c\u2500\u2500 layout/ \u2502 \u251c\u2500\u2500 pages/ \u2502 \u251c\u2500\u2500 App.css \u2502 \u251c\u2500\u2500 App.jsx \u2502 \u251c\u2500\u2500 index.css \u2502 \u251c\u2500\u2500 main.jsx \u251c\u2500\u2500 index.html \u251c\u2500\u2500 package.json \u251c\u2500\u2500 vite.config.js Important Files & Directories index.html serves as a way to connect our React framework to standard HTML format. package.json is where all the dependencies of our node.js config live main.jsx is where we inject the jsx code into the root div in the index.html. This is also where the website routing is structured App.jsx can be thought of as the \"home\" page index.css provides the style of our website layout directory structures the main website so that it can be browsed dynamically pages directory populates the page using the layout components directory the bread and butter of React lives here. React follows a composable model, where we build smaller components and are able to dynamically and efficiently call them whenever they are needed. Concepts A vague list of core concepts to learn HTML CSS node.js React.js React Components react-router-dom useState useEffect Resources Full Stack Development Explained 100+ Web Development Things you Should Know How to OVER Engineer a Website // What is a Tech Stack? How to Create a Express/Node + React Project | Node Backend + React Frontend Scrimba: Learn React Backend Server This section outlines the structure of the backend server, and important concepts to understand the structure. 
Structure /server \u251c\u2500\u2500 routes/ \u251c\u2500\u2500 services/ \u251c\u2500\u2500 tests/ \u251c\u2500\u2500 src/ \u2502 \u251c\u2500\u2500 constants.js \u2502 \u251c\u2500\u2500 index.js \u2502 \u251c\u2500\u2500 neo4j.js \u251c\u2500\u2500 .env \u251c\u2500\u2500 package.json Important Files & Directories index.js initializes the neo4j database connection and API routing using Express.js as the server .env config file which contains information necessary to connect to the neo4j database constants.js contains config information in the form of JS neo4j.js initializes a singleton instance of the neo4j driver, which is used to make API calls to the database routes.js is where API calls are created which utilize the neo4j driver services directory contains a list of classes which contain the methods to build the API calls in routes. Concepts A vague list of concepts that are useful to understand API calls server routing middleware backend frameworks Resources Backend web development - a complete overview","title":"Tech Stack"},{"location":"tech-stack/#tech-stack","text":"","title":"Tech Stack"},{"location":"tech-stack/#frontend","text":"This section documents the structure of the frontend and outlines the important interactions. /client \u251c\u2500\u2500 public/ \u251c\u2500\u2500 src/ \u2502 \u251c\u2500\u2500 assets/ \u2502 \u251c\u2500\u2500 components/ \u2502 \u251c\u2500\u2500 layout/ \u2502 \u251c\u2500\u2500 pages/ \u2502 \u251c\u2500\u2500 App.css \u2502 \u251c\u2500\u2500 App.jsx \u2502 \u251c\u2500\u2500 index.css \u2502 \u251c\u2500\u2500 main.jsx \u251c\u2500\u2500 index.html \u251c\u2500\u2500 package.json \u251c\u2500\u2500 vite.config.js","title":"Frontend"},{"location":"tech-stack/#important-files-directories","text":"index.html serves as a way to connect our React framework to standard HTML format. package.json is where all the dependencies of our node.js config live main.jsx is where we inject the jsx code into the root div in the index.html. This is also where the website routing is structured App.jsx can be thought of as the \"home\" page index.css provides the style of our website layout directory structures the main website so that it can be browsed dynamically pages directory populates the page using the layout components directory the bread and butter of React lives here. React follows a composable model, where we build smaller components and are able to dynamically and efficiently call them whenever they are needed.","title":"Important Files & Directories"},{"location":"tech-stack/#concepts","text":"A vague list of core concepts to learn HTML CSS node.js React.js React Components react-router-dom useState useEffect","title":"Concepts"},{"location":"tech-stack/#resources","text":"Full Stack Development Explained 100+ Web Development Things you Should Know How to OVER Engineer a Website // What is a Tech Stack? 
How to Create a Express/Node + React Project | Node Backend + React Frontend Scrimba: Learn React","title":"Resources"},{"location":"tech-stack/#backend-server","text":"This section outlines the structure of the backend server, and important concepts to understand the structure.","title":"Backend Server"},{"location":"tech-stack/#structure","text":"/server \u251c\u2500\u2500 routes/ \u251c\u2500\u2500 services/ \u251c\u2500\u2500 tests/ \u251c\u2500\u2500 src/ \u2502 \u251c\u2500\u2500 constants.js \u2502 \u251c\u2500\u2500 index.js \u2502 \u251c\u2500\u2500 neo4j.js \u251c\u2500\u2500 .env \u251c\u2500\u2500 package.json","title":"Structure"},{"location":"tech-stack/#important-files-directories_1","text":"index.js initializes the neo4j database connection and API routing using Express.js as the server .env config file which contains information necessary to connect to the neo4j database constants.js contains config information in the form of JS neo4j.js initializes a singleton instance of the neo4j driver, which is used to make API calls to the database routes.js is where API calls are created which utilize the neo4j driver services directory contains a list of classes which contain the methods to build the API calls in routes.","title":"Important Files & Directories"},{"location":"tech-stack/#concepts_1","text":"A vague list of concepts that are useful to understand API calls server routing middleware backend frameworks","title":"Concepts"},{"location":"tech-stack/#resources_1","text":"Backend web development - a complete overview","title":"Resources"}]}
\ No newline at end of file
diff --git a/setup/index.html b/setup/index.html
index b67f653f..a57be066 100644
--- a/setup/index.html
+++ b/setup/index.html
@@ -421,6 +421,24 @@
    Propogation of ancestral ProGo edg
    MATCH (p:protein)-[rel:ProPro]-(p) DETACH DELETE rel;
     
+1. Now add the degree for all nodes for each species as a property:
+
+   MATCH (pr:protein{txid: "txid224308"})
+   SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)}
+
+   MATCH (pr:protein{txid: "txid7955"})
+   SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)}
+
+   MATCH (pr:protein{txid: "txid7227"})
+   SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)}
+
+1. Now we want to add the protein degrees as a property in the nodes:
+
+   MATCH (pr:protein)
+   SET pr.degree = COUNT{(pr)-[:ProPro]-(:protein)};
+
    1. The last step is calling a graph projection for pathfinding algorithms. We also have to change the ProPro edges to be undirected for the pathfinding algorithms in order to be more biologically accurate for protein-protein interaction networks.
    CALL gds.graph.project('proGoGraph',['go_term', 'protein'],['ProGo', 'ProPro']);
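With the projection in place, GDS pathfinding procedures can be run against the undirected ProProUndirected relationships. As an illustration only (the protein ids below are placeholders, and ProteinWeaver's own services may use a different algorithm or configuration), a Dijkstra shortest path between two proteins could look like this:

```cypher
// Hypothetical example: stream a shortest ProPro path between two
// D. melanogaster proteins. The ids are placeholders, not real inputs.
MATCH (source:protein {id: 'FBgn0000001', txid: 'txid7227'})
MATCH (target:protein {id: 'FBgn0000002', txid: 'txid7227'})
CALL gds.shortestPath.dijkstra.stream('proGoGraph', {
  sourceNode: source,
  targetNode: target,
  relationshipTypes: ['ProProUndirected']
})
YIELD nodeIds, totalCost
RETURN [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS pathProteins, totalCost;
```

Running the algorithm over ProProUndirected rather than the raw ProPro relationships is what makes the undirected conversion above worthwhile: interaction edges carry no inherent direction, so paths should be traversable both ways.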
    diff --git a/sitemap.xml.gz b/sitemap.xml.gz
    index e06a0bbb06b294dcae85ec5efa1948adc25dd2e7..e797f2b4da7000db907f9b16bd0b76b79edda3c1 100644
    GIT binary patch
    delta 13
    Ucmb=gXP58h;P`N2%0%`G03l-q8~^|S
    
    delta 13
    Ucmb=gXP58h;3$x+oXB1Q02+(~(f|Me