Adding your own data to the graph

Alvin B edited this page Nov 11, 2020 · 13 revisions

Thank you for contributing to the CovidGraph! This page provides some guidelines on adding data to the project. If you have further questions, feel free to ask in the CovidGraph Data Sources room in Matrix.

1. Check which data is already in the knowledge graph

  • Check the definitive list of data sources in GitHub
  • If your data source is not listed yet, create an issue in ZenHub in the 'covidgraph/documentation' repo (this is the default) and select the 'Data Source Template'.
  • If the data source is numeric, also add the numeric label.
  • Assign the issue to yourself and to anyone else you know is interested in or working on the data source.
  • If your loading script depends on other data, note that dependency. If you are linking your newly added data to existing nodes, check the sourceguid properties on these nodes to determine the dependencies.

2. Follow the guidance in the Data Loading Template

  • Make sure your data source is registered in covidgraph/motherlode. For registration, contact Tim Bleimehl (@tim.bleimehl:meet.dzd-ev.de).

3. Follow the suggested naming conventions for the nodes and relationships you create.

4. Ensure data traceability

  • Make sure that each node/relationship you import has these two properties:
  • sourceguid: a unique identifier added to the nodes/relationships you create, so we can trace where the data came from.
  • sourceroutine: if possible, a reference to the script/query that imported the data.
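
The two traceability properties can be attached to every record before it is written to the graph. The following is a minimal sketch, not the project's loader: it assumes that a sourceguid identifies the data source (one value shared by everything a loader imports), and the source URL, script path, and Cypher query are hypothetical placeholders. If the project expects a different sourceguid per node, generate one per record instead.

```python
import uuid

# Hypothetical values for illustration; neither comes from the CovidGraph project.
# uuid5 is deterministic, so re-running the loader yields the same GUID.
SOURCE_GUID = str(uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/my-data-source"))
SOURCE_ROUTINE = "my_loader/import_nodes.py"


def with_traceability(properties, source_guid=SOURCE_GUID, source_routine=SOURCE_ROUTINE):
    """Return a copy of a property map with sourceguid and sourceroutine added."""
    traced = dict(properties)  # copy, so the caller's dict is left unchanged
    traced["sourceguid"] = source_guid
    traced["sourceroutine"] = source_routine
    return traced


# Illustrative Cypher (not executed here): pass the rows as the $rows parameter.
# The label and the primaryId key are placeholders borrowed from the GraphQL template.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (n:NodeLabelInPascalCase {primaryId: row.primaryId})
SET n += row
"""

rows = [with_traceability({"primaryId": "ABC123"})]
```

Keeping the GUID and routine name as module-level constants means every node and relationship written by one script carries the same values, which makes it easy to find or remove everything that script imported.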

5. Think about data modelling

  • Check the existing model for the graph.
  • Re-use existing node labels and relationship types if possible. If that is not possible, document the use case for your new node label or relationship type.
  • Create indexes/constraints if necessary. They make the data faster to search and also help ensure quality (e.g. by requiring that a given property exists).
  • New to graph data modeling? There’s some great material out there on which pitfalls to avoid: https://neo4j.com/blog/data-modeling-pitfalls/
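
As a rough illustration of the index/constraint step, the sketch below builds the Cypher statements a loading script might run once before importing. It assumes Neo4j 4.4 or later, where constraints use the `FOR ... REQUIRE` form (older versions use `ON ... ASSERT` instead). The helper names and the label and property passed in are hypothetical.

```python
def unique_constraint(label, prop):
    """Build a Cypher statement enforcing unique values of `prop` on `label` nodes."""
    name = f"{label}_{prop}_unique"
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    )


def property_index(label, prop):
    """Build a Cypher statement creating an index on `prop` of `label` nodes."""
    name = f"{label}_{prop}_index"
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"


# Label and property names are interpolated directly into the statement,
# so only pass names you control, never user input.
statements = [
    unique_constraint("NodeLabelInPascalCase", "primaryId"),
    property_index("NodeLabelInPascalCase", "propertyInCamelCase"),
]
```

A uniqueness constraint is already backed by an index, so a separate index is only needed for properties that are searched but not unique.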

6. Add to GraphQL API

"""
The first sentence should be a brief standalone summary as it may 
be pulled out as a short description.

Explain what this node represents, why it is in the covid graph 
(if that makes sense in this context). 

More important nodes will likely require more information. Aim to
adequately describe the entities to a complete newcomer to the project.

Note the important relationships, though details of each relation can 
be stated in the description for the relationship fields.

The comments are in markdown so you can include _formatting_ or 
[links](#)
"""
type NodeLabelInPascalCase {
  "Single line description"
  primaryId: String!

  """
  What is this property? Maybe an example of it.
  e.g.: ABC123
  """
  propertyInCamelCase: String

  """
  This is a relation field that links to another node. Mention cardinality 
  (e.g. one to many) and what it looks like in the real data: e.g.
  if it's "1 to many", how many is usual? 2? 2000?
  """
  myFirstRelationship: [AnotherNode]

  thisFieldSpeaksForItself: String
}
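
The placeholder names in the template above suggest PascalCase for type names and camelCase for field names. A small check like the following could catch slips before a schema change is submitted. It is only a sketch: the regular expressions are an assumption drawn from the placeholder names, not the project's official naming rules, and the function names are hypothetical.

```python
import re

# Assumed patterns: letters and digits only, distinguished by the first character.
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")
CAMEL_CASE = re.compile(r"[a-z][A-Za-z0-9]*")


def is_pascal_case(name):
    """True if `name` looks like a PascalCase type name, e.g. AnotherNode."""
    return PASCAL_CASE.fullmatch(name) is not None


def is_camel_case(name):
    """True if `name` looks like a camelCase field name, e.g. primaryId."""
    return CAMEL_CASE.fullmatch(name) is not None


def naming_problems(type_name, field_names):
    """Return a list of human-readable problems; an empty list means all names pass."""
    problems = []
    if not is_pascal_case(type_name):
        problems.append(f"type name {type_name!r} is not PascalCase")
    for field in field_names:
        if not is_camel_case(field):
            problems.append(f"field name {field!r} is not camelCase")
    return problems
```

For example, `naming_problems("NodeLabelInPascalCase", ["primaryId", "myFirstRelationship"])` returns an empty list, while a field named `Primary_id` would be reported.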