BIBSYSDEV/nva-data-report-api

NVA Data Report API

This repository contains the NVA data report API.

How to run a bulk upload

The steps below can be outlined briefly as:

  • Pre-run
    • Stop incoming live-update events
    • Delete data from previous runs
    • Delete all data in the database
  • Bulk upload
    • Generate batches of document keys for upload
    • Transform the data to a format compatible with the bulk-upload action
    • Initiate bulk upload
    • Verify data integrity
  • Post-run
    • Start incoming live-update events

Pre-run steps

  1. Remove all objects from the S3 bucket loader-input-files-{accountName}.
  2. Turn off S3 event notifications for the bucket persisted-resources-{accountName}. In the AWS Console, go to
    S3 -> persisted-resources-{accountName} -> Properties -> Amazon EventBridge -> Edit -> Off
  3. Press ResetDatabaseButton (triggers DatabaseResetHandler). This may take around a minute to complete.
  4. Verify that the database is empty. You can use a SageMaker notebook to query the database*. Example SPARQL queries:
    SELECT (COUNT(DISTINCT ?g) as ?gCount) WHERE {GRAPH ?g {?s ?p ?o}}
    
    or
    SELECT ?g ?s ?p ?o WHERE {GRAPH ?g {?s ?p ?o}} LIMIT 100
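
The first two pre-run steps can also be done from the command line. The sketch below uses the AWS CLI and assumes configured credentials; the account name is a hypothetical placeholder. An empty notification configuration is what disables EventBridge notifications on a bucket.

```shell
# Sketch of the pre-run cleanup (assumes AWS CLI v2 and valid credentials).
ACCOUNT_NAME="example-account"  # hypothetical value; substitute your own
LOADER_BUCKET="loader-input-files-${ACCOUNT_NAME}"
PERSISTED_BUCKET="persisted-resources-${ACCOUNT_NAME}"

# 1. Remove all objects from the loader input bucket
echo "Emptying ${LOADER_BUCKET}"
aws s3 rm "s3://${LOADER_BUCKET}" --recursive

# 2. Turn off EventBridge notifications on the persisted-resources bucket
#    (an empty notification configuration removes the EventBridge setting)
echo "Disabling EventBridge notifications on ${PERSISTED_BUCKET}"
aws s3api put-bucket-notification-configuration \
  --bucket "${PERSISTED_BUCKET}" \
  --notification-configuration '{}'
```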
    

Bulk upload steps

  1. Generate key batches for both locations: resources and nvi-candidates. Manually trigger GenerateKeyBatchesHandler with the following input:
    {
       "detail": {
          "location": "resources|nvi-candidates"
       }
    }
  2. Verify that GenerateKeyBatchesHandler is done processing (i.e., check the logs) and that key batches have been generated in the S3 bucket data-report-key-batches-{accountName}.
  3. Trigger BulkTransformerHandler.
  4. Verify that BulkTransformerHandler is done processing (i.e., check the logs) and that N-Quads have been generated in the S3 bucket loader-input-files-{accountName}.
  5. Trigger BulkDataLoader.
  6. To check the progress of the bulk upload to Neptune, trigger BulkDataLoader with the following input:
    {
     "loadId": "{copy loadId UUID from test log}"
    }
  7. Verify that the expected count is in the database. Query for counting distinct named graphs:
    SELECT (COUNT(DISTINCT ?g) as ?gCount) WHERE {GRAPH ?g {?s ?p ?o}}
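
The triggering steps above can be sketched with the AWS CLI. This is a sketch only: it assumes the deployed Lambda functions are reachable under the handler names used in this document (real deployments often add a stack prefix), and the loadId payload keeps the placeholder from step 6, which must be replaced with the UUID copied from the loader's log.

```shell
# Step 1: generate key batches for both locations
for LOCATION in resources nvi-candidates; do
  PAYLOAD=$(printf '{"detail": {"location": "%s"}}' "$LOCATION")
  echo "Invoking GenerateKeyBatchesHandler for ${LOCATION}"
  aws lambda invoke \
    --function-name GenerateKeyBatchesHandler \
    --payload "$PAYLOAD" \
    --cli-binary-format raw-in-base64-out \
    /dev/stdout
done

# Steps 3 and 5: after verifying the key batches, run transformer and loader
aws lambda invoke --function-name BulkTransformerHandler /dev/stdout
aws lambda invoke --function-name BulkDataLoader /dev/stdout

# Step 6: check load progress; replace the placeholder with the loadId
# UUID copied from the loader's log
aws lambda invoke \
  --function-name BulkDataLoader \
  --payload '{"loadId": "{copy loadId UUID from test log}"}' \
  --cli-binary-format raw-in-base64-out \
  /dev/stdout
```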
    

Post-run steps

  1. Turn on S3 event notifications for the bucket persisted-resources-{accountName}. In the AWS Console, go to
    S3 -> persisted-resources-{accountName} -> Properties -> Amazon EventBridge -> Edit -> On
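
This can also be done with the AWS CLI; the account name below is a hypothetical placeholder. The presence of an EventBridgeConfiguration entry is what turns EventBridge notifications back on.

```shell
# Re-enable EventBridge notifications on the persisted-resources bucket
# (assumes AWS CLI v2 and valid credentials; substitute your account name)
ACCOUNT_NAME="example-account"  # hypothetical value
aws s3api put-bucket-notification-configuration \
  --bucket "persisted-resources-${ACCOUNT_NAME}" \
  --notification-configuration '{"EventBridgeConfiguration": {}}'
```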

*Note: You can use a SageMaker notebook to query the database. The notebook can be opened from the AWS Console through SageMaker -> Notebooks -> Notebook instances -> Open JupyterLab
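
If you prefer not to use the notebook UI, the same queries can be sent to Neptune's SPARQL HTTP endpoint, e.g. with curl from a host inside the cluster's VPC (such as the notebook instance's terminal). The endpoint hostname below is a placeholder, not the real cluster address.

```shell
# Run the named-graph count query against the Neptune SPARQL endpoint.
# NEPTUNE_ENDPOINT is a placeholder; use your cluster endpoint, and note
# that the call only works from inside the cluster's VPC.
NEPTUNE_ENDPOINT="your-cluster.cluster-xxxx.eu-west-1.neptune.amazonaws.com"
QUERY='SELECT (COUNT(DISTINCT ?g) as ?gCount) WHERE {GRAPH ?g {?s ?p ?o}}'
curl -s -X POST "https://${NEPTUNE_ENDPOINT}:8182/sparql" \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode "query=${QUERY}"
```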
