Adding your own data to the graph
Alvin B edited this page Nov 11, 2020 · 13 revisions
Thank you for contributing to the CovidGraph! This page provides some guidelines on adding data to the project. Feel free to use the CovidGraph Data Sources room in Matrix for any further questions.
- Check the definitive list of data sources in GitHub
- If it doesn't already exist, create an issue in ZenHub in the 'covidgraph/documentation' repo (this is the default) and select the 'Data Source Template'
- If the data source is numeric, also add the 'numeric' label.
- Assign the issue to yourself and anyone else you know will be interested in or working on the data source.
- If your loading script depends on other data, note the dependency in the issue. If you are linking your newly added data to existing nodes, check the sourceguid properties on these nodes to determine the dependencies.
2. Follow the guidance in the Data Loading Template
- Make sure your data source is registered in covidgraph/motherlode. Contact Tim Bleimehl @tim.bleimehl:meet.dzd-ev.de
- Follow the Neo4j naming conventions, see https://neo4j.com/docs/cypher-manual/current/syntax/naming/
- Node labels in camel case with an uppercase first letter (GeneSymbol)
- Relationships in all caps with underscores (HAS_BODY)
- Properties in camel case with a lowercase first letter (numberOfInfections)
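The three naming conventions above can be checked mechanically before loading. The following is a minimal sketch, not part of the project tooling: the function name `check_name` and the regular expressions are assumptions that encode one reading of the conventions.

```python
import re

# Rough patterns for the three conventions (illustrative, not official).
NODE_LABEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")                   # e.g. GeneSymbol
RELATIONSHIP_TYPE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")  # e.g. HAS_BODY
PROPERTY_KEY = re.compile(r"^[a-z][A-Za-z0-9]*$")                 # e.g. numberOfInfections

PATTERNS = {
    "label": NODE_LABEL,
    "relationship": RELATIONSHIP_TYPE,
    "property": PROPERTY_KEY,
}


def check_name(kind: str, name: str) -> bool:
    """Return True if `name` matches the convention for `kind`.

    `kind` is one of "label", "relationship" or "property".
    This is a syntactic check only: it cannot tell whether an
    all-caps label such as GENE was meant to be camel case.
    """
    return bool(PATTERNS[kind].match(name))
```

For example, `check_name("relationship", "hasBody")` is rejected, while `check_name("relationship", "HAS_BODY")` is accepted.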
- Make sure that each node/relationship you import has these two properties:
- Add a unique sourceguid property to the nodes/relationships you create, so we can trace back where data comes from.
- If possible, add a sourceroutine property that refers to the script/query that imported the data.
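One way to make sure these two properties are never forgotten is to add them in a single place in the loading script. The sketch below is only an illustration: the helper `with_provenance`, the example values, the `name` property and the Cypher statement are all hypothetical, and it assumes the sourceguid is a string identifying the data source.

```python
from typing import Any, Dict, Optional


def with_provenance(
    properties: Dict[str, Any],
    sourceguid: str,
    sourceroutine: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of `properties` with the provenance fields added."""
    if not sourceguid:
        raise ValueError("sourceguid is required")
    tagged = dict(properties)  # do not mutate the caller's dict
    tagged["sourceguid"] = sourceguid
    if sourceroutine is not None:
        tagged["sourceroutine"] = sourceroutine
    return tagged


# Parameterized Cypher (illustrative); pass `name` and `props` as
# query parameters through whichever Neo4j driver the script uses.
MERGE_GENE_SYMBOL = "MERGE (g:GeneSymbol {name: $name}) SET g += $props"

props = with_provenance(
    {"name": "ACE2"},                      # illustrative value
    sourceguid="example-source-guid",      # hypothetical identifier
    sourceroutine="load_gene_symbols.py",  # hypothetical script name
)
```

Keeping the provenance in one helper means every node and relationship created by the script carries the same sourceguid, which makes later dependency checks simpler.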
- Check the existing model for the graph.
- Re-use existing node labels and relationship types if possible. If not possible, document the usecase for your node label and relationship.
- Create indexes/constraints if necessary. This makes the data faster to search and helps ensure quality (e.g. by requiring that a given property exists).
- New to graph data modeling? There’s some great material out there on which pitfalls to avoid: https://neo4j.com/blog/data-modeling-pitfalls/
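As an illustration of the index and constraint step, the sketch below builds the corresponding Cypher statements as strings. It assumes the syntax of Neo4j 4.4 or later (older versions use a different form for constraints, so check the Cypher manual for the version in use). The function names, the naming scheme for indexes and constraints, and the `sid` property are hypothetical.

```python
def unique_constraint(label: str, prop: str) -> str:
    """Cypher for a uniqueness constraint (assumes Neo4j 4.4+ syntax)."""
    name = f"{label}_{prop}_unique"
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    )


def property_index(label: str, prop: str) -> str:
    """Cypher for a single-property index on nodes with `label`."""
    name = f"{label}_{prop}_index"
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"


# Statements a loading script could run once before importing data.
statements = [
    unique_constraint("GeneSymbol", "sid"),        # `sid` is a hypothetical key
    property_index("GeneSymbol", "sourceguid"),
]
```

The `IF NOT EXISTS` clause lets the script be re-run without failing when the index or constraint is already in place.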
- The new database schema should be added to the GraphQL API schema here: https://github.com/Covid19-GraphQL/covid-graph-graphql
- Add "descriptions" to the schema (spec here) to elaborate on the entities. Template:
"""
The first sentence should be a brief standalone summary as it may
be pulled out as a short description.
Explain what this node represents, why it is in the covid graph
(if that makes sense in this context).
More important nodes will likely require more information. Aim to
adequately describe the entities to a complete newcomer to the project.
Note the important relationships, though details of each relation can
be stated in the description for the relationship fields.
The comments are in markdown so you can include _formatting_ or
[links](#)
"""
type NodeLabelInPascalCase {
  "Single line description"
  primaryId: String!
  """
  What is this property? Maybe an example of it.
  e.g.: ABC123
  """
  propertyInCamelCase: String
  """
  This is a relation field that links to another node. Mention cardinality
  (e.g. one to many) and what it looks like in the real data: e.g.
  if it's "1 to many", how many is usual? 2? 2000?
  """
  myFirstRelationship: [AnotherNode]
  thisFieldSpeaksForItself: String
}