mod-erm-usage-harvester

Copyright (C) 2018-2024 The Open Library Foundation

This software is distributed under the terms of the Apache License, Version 2.0. See the file "LICENSE" for more information.

Development funded by the European Regional Development Fund (EFRE).

Introduction

Module for harvesting COUNTER reports.

Requirements

  • The module needs to know the Okapi URL (see Setting the Okapi URL below).
  • Scheduled harvesting requires user credentials (see Periodic harvesting below).
  • Environment variables for database connectivity need to be provided.
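The database variables typically follow the FOLIO/RMB naming convention. The exact set is defined by the storage backend, so treat the names below as an illustration, not a contract:

```shell
# Typical RMB-style connection settings (illustrative values)
export DB_HOST=localhost
export DB_PORT=5432
export DB_USERNAME=folio_admin
export DB_PASSWORD=folio_admin
export DB_DATABASE=okapi_modules
```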

Installation

$ git clone ...
$ cd mod-erm-usage-harvester
$ mvn clean install

Run plain jar

$ env OKAPI_URL=http://127.0.0.1:9130 java -jar \
  mod-erm-usage-harvester-bundle/target/mod-erm-usage-harvester-bundle-fat.jar

Run via Docker

Build docker image

$ docker build -t mod-erm-usage-harvester .

Run docker image

$ docker run -e OKAPI_URL=http://127.0.0.1:9130 -p 8081:8081 mod-erm-usage-harvester

Configuration

Listening port

The default listening port is 8081. It can be changed with the -Dhttp.port system property when running the jar file, or mapped with the -p flag when using docker run.
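For example, to serve on port 9000 (jar path as produced by the build steps above; 9000 is an arbitrary choice):

```shell
# plain jar: change the listening port itself
env OKAPI_URL=http://127.0.0.1:9130 java -Dhttp.port=9000 -jar \
  mod-erm-usage-harvester-bundle/target/mod-erm-usage-harvester-bundle-fat.jar

# Docker: map the container's default port 8081 to 9000 on the host
docker run -e OKAPI_URL=http://127.0.0.1:9130 -p 9000:8081 mod-erm-usage-harvester
```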

Setting the Okapi URL

Use the environment variable named OKAPI_URL to provide the URL to Okapi.

Proxy configuration

Proxy settings are configured via JVM system properties if you are running the plain jar.

  • http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort, http.nonProxyHosts

And via environment variables if you are running the Docker container.

  • HTTP_PROXY, HTTPS_PROXY, NO_PROXY
    These get translated into JVM system properties by the base image.
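A sketch of both variants (proxy host and port values are placeholders):

```shell
# plain jar: proxy settings as JVM system properties
env OKAPI_URL=http://127.0.0.1:9130 java \
  -Dhttp.proxyHost=proxy.example.org -Dhttp.proxyPort=3128 \
  -Dhttps.proxyHost=proxy.example.org -Dhttps.proxyPort=3128 \
  -Dhttp.nonProxyHosts="localhost|127.0.0.1" \
  -jar mod-erm-usage-harvester-bundle/target/mod-erm-usage-harvester-bundle-fat.jar

# Docker: proxy settings as environment variables,
# translated into system properties by the base image
docker run \
  -e OKAPI_URL=http://127.0.0.1:9130 \
  -e HTTP_PROXY=http://proxy.example.org:3128 \
  -e HTTPS_PROXY=http://proxy.example.org:3128 \
  -e NO_PROXY=localhost,127.0.0.1 \
  -p 8081:8081 mod-erm-usage-harvester
```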

Quartz scheduler

Quartz configuration is located in quartz.properties. To use another file, set the system property org.quartz.properties to point to it. Individual Quartz properties can also be set as system properties (e.g. -Dorg.quartz.threadPool.threadCount=8). The org.quartz.threadPool.threadCount property controls how many providers are harvested concurrently.
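For example, to harvest up to eight providers concurrently using a custom properties file (the file path is a placeholder):

```shell
java -Dorg.quartz.properties=/etc/quartz/my-quartz.properties \
     -Dorg.quartz.threadPool.threadCount=8 \
     -jar mod-erm-usage-harvester-bundle/target/mod-erm-usage-harvester-bundle-fat.jar
```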

Hazelcast

The default Quartz configuration uses the HazelcastJobStore for clustering, which relies on Hazelcast. By default, the standard configuration shipped with Hazelcast is used. You can supply your own XML or YAML configuration through the hazelcast.config system property, or simply place it in the working directory. If you use clustering, make sure member discovery works by inspecting the logs. You may want to tailor the Hazelcast configuration to your deployment environment; the Hazelcast documentation describes the available discovery mechanisms.
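As a sketch, a minimal YAML configuration that disables multicast and uses static TCP/IP discovery might look like this (member addresses are placeholders; pass the file via -Dhazelcast.config=/path/to/hazelcast.yaml):

```yaml
# hazelcast.yaml - static member discovery instead of multicast
hazelcast:
  network:
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - 10.0.0.11
          - 10.0.0.12
```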

Periodic harvesting

Periodic harvesting requires the module to log in using user credentials. These credentials are defined separately for each tenant via the environment variables {TENANT}_USER_NAME and {TENANT}_USER_PASS, where {TENANT} serves as a placeholder for the tenant ID and must be in uppercase. The user also needs the ermusageharvester.start-all.get permission.

Example for tenant 'diku':

DIKU_USER_NAME=mod-erm-usage-harvester
DIKU_USER_PASS=password123

Periodic harvesting is set up through the erm-usage-harvester/periodic API. Configuration is done for each tenant separately by using the X-Okapi-Tenant header. See PeriodicConfig and periodic.raml.

Example:

curl --request POST \
  --url http://localhost:9130/erm-usage-harvester/periodic \
  --header 'content-type: application/json' \
  --header 'x-okapi-tenant: diku' \
  --data '{
  "startAt": "2019-01-01T08:00:00.000+0000",
  "periodicInterval": "daily"
}'

This request will create a schedule which triggers harvesting for tenant diku each day at 8am UTC starting on 2019-01-01.

Note: Using "periodicInterval": "monthly" together with a startAt whose day of month is > 28 will result in a 'last day of month' schedule.

Example 2:

{
  "startAt": "2019-01-29T08:00:00.000+0000",
  "periodicInterval": "monthly"
}

This configuration will trigger harvesting on the last day of each month at 8am UTC, starting on 2019-01-31, followed by 2019-02-28, 2019-03-31, 2019-04-30, ... .
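periodic.raml is the authoritative reference for the rest of this API. Assuming the endpoint follows the usual FOLIO convention of exposing GET and DELETE on the same path (an assumption, not confirmed here), inspecting and removing a tenant's schedule would look like:

```shell
# read the current periodic configuration for tenant diku (assumed GET support)
curl --request GET \
  --url http://localhost:9130/erm-usage-harvester/periodic \
  --header 'x-okapi-tenant: diku'

# remove the schedule again (assumed DELETE support)
curl --request DELETE \
  --url http://localhost:9130/erm-usage-harvester/periodic \
  --header 'x-okapi-tenant: diku'
```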

ServiceEndpoint implementations

The ServiceEndpoint implementation defines how reports are fetched for a provider. To provide additional implementations you will need to implement the ServiceEndpointProvider interface and make it available on the classpath.

So far, three implementations are provided. The implementations available at runtime can be listed at /erm-usage-harvester/impl:
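For example, using the Okapi URL and tenant from the examples above:

```shell
curl --request GET \
  --url http://localhost:9130/erm-usage-harvester/impl \
  --header 'x-okapi-tenant: diku'
```

The response is a JSON document like the following: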

{
  "implementations": [
    {
      "name": "Counter-Sushi 4.1",
      "description": "SOAP-based implementation for CounterSushi 4.1",
      "type": "cs41",
      "isAggregator": false
    },
    {
      "name": "Counter-Sushi 5.0",
      "description": "Implementation for Counter/Sushi 5",
      "type": "cs50",
      "isAggregator": false
    },
    {
      "name": "Nationaler Statistikserver",
      "description": "Implementation for Germany's National Statistics Server (https://sushi.redi-bw.de).",
      "type": "NSS",
      "isAggregator": true,
      "configurationParameters": [
        "apiKey",
        "requestorId",
        "customerId",
        "reportRelease"
      ]
    }
  ]
}

mod-erm-usage-harvester-cs50

Request parameters

To enable the creation of standard views, master reports are retrieved with the following additional parameters:

Report  Attributes_To_Show                                                                Include_Parent_Details
DR      Data_Type|Access_Method
IR      Authors|Publication_Date|Article_Version|Data_Type|YOP|Access_Type|Access_Method  True
PR      Data_Type|Access_Method
TR      Data_Type|Section_Type|YOP|Access_Type|Access_Method

Example:
/reports/dr?requestor_id=xxx&customer_id=xxx&begin_date=2021-01&end_date=2021-12&attributes_to_show=Data_Type|Access_Method

Additional processing

Because providers respond in various ways, the provider response is intercepted and adjusted before processing.
This is necessary because some providers use 2xx status codes to send SUSHI errors, while the generated client expects 2xx codes to carry COUNTER reports and other status codes to carry SUSHI errors.
So when a response with a 2xx status code is received, it is checked whether the response data structure matches one of the 4 COUNTER master reports (TR, PR, DR and IR). If it matches, the response is left unchanged. If it does not match, the response is transformed into a 400 Bad Request response, preserving the original response body in the cases listed below.
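The check can be sketched as follows. This is a simplification: the module inspects the parsed JSON structure, while this sketch merely probes the raw body for a Report_ID marker.

```shell
#!/bin/sh
# Simplified sketch of the 2xx response check described above.
# A COUNTER master report carries a Report_Header whose Report_ID is
# one of TR, PR, DR, IR; any other 2xx body is rewritten to a 400.
looks_like_master_report() {
  printf '%s' "$1" | grep -Eq '"Report_ID"[[:space:]]*:[[:space:]]*"(TR|PR|DR|IR)"'
}

report='{"Report_Header":{"Report_ID":"TR"},"Report_Items":[]}'
error='[{"Code":2010,"Severity":"Error","Message":"Requestor Not Authorized"}]'

looks_like_master_report "$report" && echo "report: pass through unchanged"
looks_like_master_report "$error" || echo "error: rewrite as 400 Bad Request"
```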

Some observations and how they are handled so far:

  • Providers use 2xx status codes to return SUSHI errors, not reports (routed and handled as 400 with the original response body)
  • Providers return SUSHI errors as an array instead of an object (the array makes it into the response body)
  • Providers return "null" instead of a SUSHI error (results in an InvalidReportException: null)
  • Providers return reports with a Report_Header that contains an Exception object instead of an Exceptions array (not handled; will be interpreted as a report without Exceptions)

Additional information

Issue tracker

See project MODEUSHARV at the FOLIO issue tracker.

Other documentation

Other modules are described, with further FOLIO Developer documentation, at dev.folio.org.