molgenis-emx2 preview

This is a reference implementation of MOLGENIS/EMX2 data service.

Demo server: https://emx2.test.molgenis.org/

How to run

using docker compose
using java + postgresql
using kubernetes

1. Using docker compose

Install Docker compose.
Download molgenis-emx2 docker-compose.yml file
To start, in directory with docker-compose.yml run:

docker-compose up

To update to latest, run:

docker-compose pull

Stop by typing ctrl+c.

N.B.

because postgres starts slow, emx2 will restart 2-4 times because of 'ConnectException: Connection refused'. This is normal.
the data of postgresql will be stored in 'psql_data' folder. Remove this folder you want a clean start.
if you want particular molgenis-emx2 version then add version in docker-compose.yml file 'mswertz/emx2:version'

2. Using java and your own postgresql

Download molgenis-emx2-version-all.jar from releases.
Download and install Postgresql

Create postgresql database with name 'molgenis' and with superadmin user/pass 'molgenis'. On Linux/Mac commandline:

sudo -u postgres psql
postgres=# create database molgenis;
postgres=# create user molgenis with superuser encrypted password 'molgenis';
postgres=# grant all privileges on database molgenis to molgenis;

Start molgenis-emx2; will run on 8080 (requires java >=11)
```
java -jar molgenis-emx2-<version>-all.jar
```

Optionally, you can change defaults using either java properties or using env variables:

MOLGENIS_POSTGRES_URI
MOLGENIS_POSTGRES_USER
MOLGENIS_POSTGRES_PASS
MOLGENIS_HTTP_PORT

For example:

java -DMOLGENIS_POSTGRES_URI=jdbc:postgresql:mydatabase -DMOLGENIS_HTTP_PORT=9090 -jar molgenis-emx2-<version>-all.jar

3. Using Helm on Kubernetes

If you have Kubernetes server then you can install using Helm.

Add helm chart repository (once)

helm repo add emx2 https://mswertz.github.io/molgenis-emx2/helm-charts

Run the latest release (see Helm docs)

helm install emx2/emx2

Update helm repository to get newest release

helm repo update

Alternatively, download latest helm chart

How to develop

Basics

We use the following:

monorepo, i.e., all code is in this repository (it is not a monolith).
gradle for build (with yarn 'workspaces' for web app)
- gradle build => builds all
- gradle clean => removes all build artifacts
- gradle run => launches the app
- gradle test => runs the tests
Semantic Release where commit message determines major.minor.patch release
- fix(component): message => results in patch+1 release
- feat(component): message => results in minor+1 release
- BREAKING CHANGE: message => results in major+1 release
- chore(component): message => relates to build process, does not result in release.
- Other non-release commands: perf, refactor, test, style, docs.
github flow which means every pull/merge to master may result in release depending on commit message
Travis to actually execute build+test(+release) for each commit. See .travis.yml file.
Sonar for static quality code checks Major thanks to all these companies!

N.B. snapshot docker images can be found at Docker hub

Software we use to develop

postresql 13
java 13
yarn (on mac, brew install yarn)
gradle (on mac, brew install gradle)
git (on mac, install xcode dev tools)
IntelliJ

Code organisation

[apps]          # contains javascript apps, one folder per app.
[backend]       # contains java modules, one folder per module. 
[helm-chart]    # contains sources for helm chart
[data]          # contains data model modules, one folder per module
[docs]          # published within EMX2 app
[gradle]        # contains source for gradle
build.gradle    # master build file, typically don't need to edit
settings.gradle # listing of all subprojects for gradle build, edit when adding
docker-compose  # master docker file
gradlew         # platform independent build file

To build all from commandline

Requires local postgresql installation, see above under 'How to run' option 2.

git clone https://github.com/molgenis/molgenis-emx2.git
./gradlew build

Takes long first time because download of all dependencies (5 mins on my machine), but much less later (few seconds if you don't change anything) thanks to caches.

To develop java/backend 'service'

Backend is developed using Java. We typically use IntelliJ IDEA for this. Clone the repo, then open IntelliJ and then 'import' and select the git clone folder. IntelliJ will recognize gradle and build all. First time that takes a few minutes.

To develop javascript/frontend 'apps'

Frontend apps are developed using vuejs and vue-cli. To develop, first cd into to folder 'apps' and install all dependencies for all apps. This also automatically links local dependencies using yarn workspaces.

cd apps
yarn install

There is one central frontend library called 'styleguide' that contains all shared components (using bootstrap for styling). To view this run:

cd apps
yarn styleguide

All other folders contain apps created using vue-cli. In order to develop you need to start a molgenis-exm2 backend as described above, e.g. docker-compose up. The /api and /graphql path is then proxied, see vue.config.js In order to preview individual apps using yarn serve.

cd apps/schema
yarn serve

To create a new app

use vue create [name]
add to apps/package.json 'workspaces' so it can be used as dependency
copy a vue.config.js from another app to have the proxy.
add to settings.gradle so it is added to the build process

Guiding principles and features

Below summary of directions that have guided development.

starting point

EMX2 simplified metadata format: only one sheet 'molgenis' needed
Organize data in schemas; each schema functions as permission group and as scope of multi-tenancy if desired
GraphQL endpoint for each schema, as well as 1 overall
Uses PostgreSQL for all heavy lifting (incl search, permissions, JSON generation, file storage)
Can be packaged as one artifact to ease sharing
Well isolated components (microfrontend using little spa, we envision microservice for server side add-ons)

dependencies

Jooq for injection safe database interaction
Sparkjava for lightweight webservice
Jackson for json and csv parsing
POI for Excel parsing
graphql-java for graphql api
OpenApi for web service spec (for file based services that don't use graphql) To minimize dependencies, no Spring stuff, no Elasticsearch, just the least that can work. Outside scope: file service (simple file column type inside postgresql), script service, authentication (asumed all to be other services used as dependency) Most core ideas where already described in https://docs.google.com/document/d/19YEGG8OGgtjCu5WlJKmHbHvosJzw3Mp6e6e7B8EioPY/edit#

Backend modules

emx2: interface and base classes
emx2-sql: implementation into postgresql
emx2-io: emx2 format, csv import/export of data, legacy import
emx2-graphql: all for generating the graphql on top of sql
emx2-semantics: endpoint for linked data serving in json-ld and ttl
emx2-webapi: ties it all together onto SparkJava embedded web server
emx2-exampledata: test data models and data, used in various test
emx2-run: packages all into one fat jar Work in progress
emx2-job: toward asynchronous calls for long running transactions/queries

Feature list (mostly in POC or 'walking skeleton' state)

simplified EMX '2.0' format
support for multiple schemas
- schemas probably should be called 'databases'
- each project/group can get their own schema
- each schema will have roles with basic permissions (viewer, editor, creator (for rls), manager, admin)
- envisioned is that each table will also have these roles, so you can define advanced roles on top
- row level permission where a 'role' can get edit permission
permission systems implemented purely using postgresql permission system
- role based permission system from postresql (view, edit, manage)
- users are also roles;
- permissions from molgenis perspective are implemented as default roles on schema, table, row level
- users can adopt these roles
extended data definition capabilities
- simple columnTypes uuid, string, int, decimal, date, datetime, text
- can create multi-column primary keys and secondary keys ('uniques')
- can create columns of columnType 'array'
- can create foreign keys (standard in postgresql)
- can create arrays of foreign keys (uses triggers)
- foreign keys can be made to all unique fields, not only primary key (so no mapping between keys needed during import)
  - use cascade updates to circument need for meaningless keys
  - checking of foreign keys is defered to end of transaction to ease consistent batch imports
- can create multi-column foreign keys (discuss if that is useful)
- many-to-many relationship produce real tables that can be queried/interacted with
reduced frills and limitations in the metadata
- there is a metadata schema 'molgenis' with schema, table, column metadata tables
- no advanced columnTypes; envisioned is that those will be defined as property extensions
- no UI options are known inside data service; again envisioned to be property extensionos
- no feature for 'labels', items can only have names for schemas, tables, columns
- freedom in schema, table and column names; they contain spaces and other charachters beyond a-zA-Z09
rudimentary import/export for files
- including molgenis.csv metadata
- simplified interface to CSV and Excel files (called 'row stores' for now)
extended query capabilities
- can query in joins accross tables by following foreign key references
- query model similar to trac i.e.
  - each condition can have multiple values, i.e. a or b or c
  - multiple conditions are assumed to be combined with 'and' into one clause
  - multiple clauses can be provided assuming 'or' between them
basic search capability using Postgresql full text search capability
- will be very interesting to see how this compares in cost/features to our needs, and elastic.
- see https://www.postgresql.org/docs/11/textsearch.html
programmer friendly API (or at least, that is what I think)
- Rows class enables columnType safe code, with magic columnType conversions
- assume always sets / arrays / batches
- reflection based POJO to table mapping, without heavy metadata / mapping code
- no magic with lazy loading and stuff
simple web service
- somewhat REST like but tuned to our needs
- make it easy for non REST specialists to use
- aim to minimize the number of calls

For developers, what I have that works

last updated 15 nov 2020

IntelliJ 20202 with
- vue plugin
- google-java-format plugin
- prettier plugin, set run for files to include '.vue' and 'on save'
- auto save and auto format using 'save actions' plugin
hook in .git/hooks/pre-push

./gradlew test --info
RESULTS=$?
if[$RESULTS -ne 0]; then
    exit 1
fi
exit 0

P.S. sometimes it help to reset gradle cache and stop the gradle daemon

./gradlew --stop rm -rf $HOME/.gradle/

If you want to delete all schemas in database use molgenis-emx2-sql/test/java/.../AToolToCleanDatabase

Name		Name	Last commit message	Last commit date
Latest commit History 1,723 Commits
apps		apps
backend		backend
data/datacatalogue		data/datacatalogue
docs		docs
gradle/wrapper		gradle/wrapper
helm-chart		helm-chart
.gitignore		.gitignore
.sonarcloud.properties		.sonarcloud.properties
.travis.yml		.travis.yml
Jenkinsfile		Jenkinsfile
README.md		README.md
ROADMAP.md		ROADMAP.md
build.gradle		build.gradle
docker-compose.yml		docker-compose.yml
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
renovate.json		renovate.json
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

molgenis-emx2 preview

How to run

1. Using docker compose

2. Using java and your own postgresql

3. Using Helm on Kubernetes

How to develop

Basics

Software we use to develop

Code organisation

To build all from commandline

To develop java/backend 'service'

To develop javascript/frontend 'apps'

Guiding principles and features

starting point

dependencies

Backend modules

Feature list (mostly in POC or 'walking skeleton' state)

For developers, what I have that works

About

Releases

Packages

Languages

BrendaHijmans/molgenis-emx2

Folders and files

Latest commit

History

Repository files navigation

molgenis-emx2 preview

How to run

1. Using docker compose

2. Using java and your own postgresql

3. Using Helm on Kubernetes

How to develop

Basics

Software we use to develop

Code organisation

To build all from commandline

To develop java/backend 'service'

To develop javascript/frontend 'apps'

Guiding principles and features

starting point

dependencies

Backend modules

Feature list (mostly in POC or 'walking skeleton' state)

For developers, what I have that works

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages