Copyright (C) 2018-2024 The Open Library Foundation
This software is distributed under the terms of the Apache License, Version 2.0. See the file "LICENSE" for more information.
Module for harvesting counter reports.
- The module needs to know about the Okapi URL (see here).
- For scheduled harvesting you need to provide user credentials (see here).
- Environment variables for database connectivity need to be provided (see here).
$ git clone ...
$ cd mod-erm-usage-harvester
$ mvn clean install
$ env OKAPI_URL=http://127.0.0.1:9130 java -jar \
mod-erm-usage-harvester-bundle/target/mod-erm-usage-harvester-bundle-fat.jar
$ docker build -t mod-erm-usage-harvester .
$ docker run -e OKAPI_URL=http://127.0.0.1:9130 -p 8081:8081 mod-erm-usage-harvester
The default listening port is 8081. It can be changed with the -Dhttp.port parameter when running the jar file, or with the -p flag when using docker run.
Use the OKAPI_URL environment variable to provide the URL to Okapi.
Proxy settings are configured via JVM system properties if you are running the plain jar:
http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort, http.nonProxyHosts
If you are running the Docker container, they are configured via environment variables:
HTTP_PROXY, HTTPS_PROXY, NO_PROXY
These get translated into JVM system properties by the base image.
Quartz configuration is located in quartz.properties. If you wish to use another file, you must define the system property org.quartz.properties to point to the file you want. You can also set individual Quartz properties using system properties (e.g. -Dorg.quartz.threadPool.threadCount=8). The org.quartz.threadPool.threadCount property controls how many providers are harvested concurrently.
The default Quartz configuration uses the HazelcastJobStore for clustering, which relies on Hazelcast. By default, the standard configuration shipped with Hazelcast is used. You can supply your own XML or YAML configuration through the hazelcast.config system property, or just put it into the working directory. If you are using clustering, make sure that member discovery is working by inspecting the logs. You might want to tailor the Hazelcast configuration to suit your particular deployment environment. You can read about Hazelcast discovery mechanisms here.
Periodic harvesting requires the module to log in using user credentials. These credentials are defined separately for each tenant via the environment variables {TENANT}_USER_NAME and {TENANT}_USER_PASS, where {TENANT} serves as a placeholder for the tenant ID and must be in uppercase. The user also needs the ermusageharvester.start-all.get permission.
Example for tenant 'diku':
DIKU_USER_NAME=mod-erm-usage-harvester
DIKU_USER_PASS=password123
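The naming rule above can be illustrated with a short Python sketch. It is not part of the module; the function names credential_env_names and read_credentials are hypothetical, and the sketch only assumes that the tenant ID is uppercased as a whole before the suffixes are appended.

```python
import os


def credential_env_names(tenant_id: str) -> tuple[str, str]:
    """Return the (user name, password) variable names for a tenant.

    Hypothetical helper: it only applies the documented rule that the
    {TENANT} placeholder is the tenant ID in uppercase.
    """
    prefix = tenant_id.upper()
    return f"{prefix}_USER_NAME", f"{prefix}_USER_PASS"


def read_credentials(tenant_id: str, environ=os.environ):
    """Look up both variables; a missing variable yields None."""
    name_var, pass_var = credential_env_names(tenant_id)
    return environ.get(name_var), environ.get(pass_var)


print(credential_env_names("diku"))
```

Passing a plain dictionary as environ makes it easy to check a deployment's variable set without touching the real process environment.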
Periodic harvesting is set up through the erm-usage-harvester/periodic API. Configuration is done for each tenant separately by using the X-Okapi-Tenant header.
See PeriodicConfig and periodic.raml.
Example:
curl --request POST \
--url http://localhost:9130/erm-usage-harvester/periodic \
--header 'content-type: application/json' \
--header 'x-okapi-tenant: diku' \
--data '{
"startAt": "2019-01-01T08:00:00.000+0000",
"periodicInterval": "daily"
}'
This request creates a schedule that triggers harvesting for tenant diku each day at 8am UTC, starting on 2019-01-01.
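When the request body is generated by a script, the startAt timestamp has to follow the format used in the example (millisecond precision, numeric UTC offset without a colon). The Python sketch below builds such a payload; the helper names format_start_at and periodic_config are hypothetical, and it assumes the API accepts exactly the format shown in the curl example.

```python
import json
from datetime import datetime, timezone


def format_start_at(dt: datetime) -> str:
    """Format a timezone-aware datetime like 2019-01-01T08:00:00.000+0000."""
    if dt.tzinfo is None:
        raise ValueError("startAt needs a timezone-aware datetime")
    millis = dt.microsecond // 1000
    return f"{dt.strftime('%Y-%m-%dT%H:%M:%S')}.{millis:03d}{dt.strftime('%z')}"


def periodic_config(start_at: datetime, interval: str) -> dict:
    """Build the JSON body for POST /erm-usage-harvester/periodic."""
    return {"startAt": format_start_at(start_at), "periodicInterval": interval}


payload = periodic_config(datetime(2019, 1, 1, 8, 0, tzinfo=timezone.utc), "daily")
print(json.dumps(payload, indent=2))
```

The sketch does not validate the interval value; only "daily" and "monthly" appear in this document.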
Note: Using "periodicInterval": "monthly" together with a startAt whose day of month is greater than 28 will result in a 'last day of month' schedule.
Example 2:
{
"startAt": "2019-01-29T08:00:00.000+0000",
"periodicInterval": "monthly"
}
This configuration will trigger harvesting on the last day of every month at 8am UTC, starting on 2019-01-31, followed by 2019-02-28, 2019-03-31, 2019-04-30, and so on.
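The 'last day of month' rule can be reproduced with a few lines of Python. This is only an illustration of the documented behaviour, not the module's Quartz-based scheduling code; the function name monthly_trigger_dates is hypothetical, and the branch for days up to 28 (same day every month) is an assumption.

```python
import calendar
from datetime import date


def monthly_trigger_dates(start: date, count: int) -> list[date]:
    """List the first `count` trigger dates of a monthly schedule."""
    dates = []
    year, month = start.year, start.month
    for _ in range(count):
        last_day = calendar.monthrange(year, month)[1]
        # Documented rule: a start day > 28 means "last day of month".
        # Assumption: otherwise the schedule fires on the same day each month.
        day = last_day if start.day > 28 else start.day
        dates.append(date(year, month, day))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dates


for d in monthly_trigger_dates(date(2019, 1, 29), 4):
    print(d.isoformat())
```

For the startAt of Example 2 this yields the four dates listed above.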
The ServiceEndpoint implementation defines how reports are fetched for a provider. To provide additional implementations you will need to implement the ServiceEndpointProvider interface and make it available on the classpath.
So far 3 implementations are provided:
- mod-erm-usage-harvester-cs41: Counter Sushi 4.1
- mod-erm-usage-harvester-cs50: Counter Sushi 5.0 API
- mod-erm-usage-harvester-nss: Germany's National Statistics Server
Implementations available at runtime can be listed at /erm-usage-harvester/impl.
{
"implementations": [
{
"name": "Counter-Sushi 4.1",
"description": "SOAP-based implementation for CounterSushi 4.1",
"type": "cs41",
"isAggregator": false
},
{
"name": "Counter-Sushi 5.0",
"description": "Implementation for Counter/Sushi 5",
"type": "cs50",
"isAggregator": false
},
{
"name": "Nationaler Statistikserver",
"description": "Implementation for Germanys National Statistics Server (https://sushi.redi-bw.de).",
"type": "NSS",
"isAggregator": true,
"configurationParameters": [
"apiKey",
"requestorId",
"customerId",
"reportRelease"
]
}
]
}
To enable the creation of standard views, master reports are retrieved with the following additional parameters:
Report | Attributes_To_Show | Include_Parent_Details |
---|---|---|
DR | Data_Type|Access_Method | |
IR | Authors|Publication_Date|Article_Version|Data_Type|YOP|Access_Type|Access_Method | True |
PR | Data_Type|Access_Method | |
TR | Data_Type|Section_Type|YOP|Access_Type|Access_Method | |
Example:
/reports/dr?requestor_id=xxx&customer_id=xxx&begin_date=2021-01&end_date=2021-12&attributes_to_show=Data_Type|Access_Method
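To make the mapping in the table concrete, the following Python sketch assembles the request path for each master report. It is not the module's client code: the function name report_path is hypothetical, and the lowercase query parameter name include_parent_details is an assumption modelled on attributes_to_show in the example above.

```python
from urllib.parse import urlencode

# Values taken from the table above, keyed by lowercase report name.
ATTRIBUTES_TO_SHOW = {
    "dr": "Data_Type|Access_Method",
    "ir": "Authors|Publication_Date|Article_Version|Data_Type|YOP|Access_Type|Access_Method",
    "pr": "Data_Type|Access_Method",
    "tr": "Data_Type|Section_Type|YOP|Access_Type|Access_Method",
}
INCLUDE_PARENT_DETAILS = {"ir": "True"}


def report_path(report, requestor_id, customer_id, begin_date, end_date):
    """Build the path and query string for one master report request."""
    params = {
        "requestor_id": requestor_id,
        "customer_id": customer_id,
        "begin_date": begin_date,
        "end_date": end_date,
        "attributes_to_show": ATTRIBUTES_TO_SHOW[report],
    }
    if report in INCLUDE_PARENT_DETAILS:
        # Assumed parameter name, see the note above.
        params["include_parent_details"] = INCLUDE_PARENT_DETAILS[report]
    # Keep "|" unescaped so the result matches the example in this document.
    return f"/reports/{report}?" + urlencode(params, safe="|")


print(report_path("dr", "xxx", "xxx", "2021-01", "2021-12"))
```

Whether a given provider expects the pipe character escaped as %7C is not covered here; dropping the safe argument produces the escaped form.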
Because providers respond in various ways, the provider response is intercepted and adjusted before processing.
This is necessary because some providers use 2xx status codes to send sushi errors, whereas the generated client expects 2xx codes to carry counter reports and other codes to carry sushi errors.
So if a response with status code 2xx is received, it is checked whether the response data structure matches one of the 4 counter master reports (TR, PR, DR and IR). If it does match, no changes are made to the response. If it does not match, the response is transformed into a 400 - Bad Request response, preserving the original response body in the cases listed below.
Some observations and how they are handled so far:
- Providers use 2xx status codes to return sushi errors, not reports (gets routed and handled as 400 with original response body)
- Providers return sushi errors as array instead of object (array makes it into the response body)
- Providers return "null" instead of sushi error (returns an InvalidReportException: null)
- Providers return reports with a Report_Header that contains an Exception object instead of an Exceptions array (not handled, will be interpreted as report without Exceptions)
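The interception rule can be summarised in a minimal Python sketch. It is a simplified model and not the module's actual code: the function names are hypothetical, and checking Report_Header.Report_ID is only an assumed stand-in for however the module decides that a body matches a master report. It also returns the body unchanged for a "null" response rather than reproducing the InvalidReportException mentioned above.

```python
MASTER_REPORT_IDS = {"TR", "PR", "DR", "IR"}


def looks_like_master_report(body) -> bool:
    """Assumed check: a JSON object whose Report_Header names a master report."""
    if not isinstance(body, dict):
        return False
    header = body.get("Report_Header")
    return isinstance(header, dict) and header.get("Report_ID") in MASTER_REPORT_IDS


def adjust_response(status: int, body):
    """Turn 2xx responses that are not master reports into 400, keeping the body."""
    if 200 <= status < 300 and not looks_like_master_report(body):
        return 400, body
    return status, body


# A sushi error sent with status 200 is rerouted as 400 with its original body.
print(adjust_response(200, {"Code": 1000, "Message": "example error"}))
```

A report whose Report_Header carries an Exception object passes this check unchanged, which mirrors the last observation in the list.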
See project MODEUSHARV at the FOLIO issue tracker.
Other modules are described, with further FOLIO Developer documentation at dev.folio.org