Feature/emm #1516

Merged: 23 commits into develop from feature/emm, Nov 21, 2024
Conversation

jannistsiroyannis (Contributor):
This is not a finalized thing, but enough (in my opinion) to merge and start testing at scale.

sendDumpPageResponse(whelk, apiBaseUrl, dump, dumpFilePath, offsetNumeric, res);
}

private static void sendDumpIndexResponse(String apiBaseUrl, HttpServletResponse res) throws IOException {
Member:

This is a good idea which we need to integrate into the linked data surface.

We need to describe these as JSON-LD. This response has no @context yet, so we don't know what the index is for, nor what the categories mean.

I think EMM is silent about this, but XL can describe and link to them using KBV/platform terminology. These entity sets should be linked to as data dumps of regularly described datasets (to avoid adding yet another notion of dataset; cf. data.kb.se and libris datasets).
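For illustration, a minimal sketch of how such a described index might be built server-side. Everything concrete in it is an assumption rather than a settled decision: the context URL, the @type, and the category ids are placeholders, not agreed KBV/platform terms.

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DumpIndexSketch {
    static Map<String, Object> buildIndex(String apiBaseUrl) {
        Map<String, Object> index = new LinkedHashMap<>(); // keeps a readable key order
        index.put("@context", "https://id.kb.se/context.jsonld"); // assumed context URL
        index.put("@id", apiBaseUrl);
        index.put("@type", "DataCatalog"); // assumed type, to be decided
        index.put("dataset", List.of(
                Map.of("@id", apiBaseUrl + "?category=all"),
                Map.of("@id", apiBaseUrl + "?category=agents"))); // example categories
        return index;
    }
}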

jannistsiroyannis (Contributor, author):
I agree! And I will need your help to get the details right!

Member:

I'll draft something. What do we need?

  1. The shape of the download; e.g. a gzipped archive stream of JSON-LD files, each representing one of our named graphs. That is, the stored data with the @context and the @id of the record added, both alongside the top-level @graph (see the sketch after this list; ping @olovy).
  2. Discovery of the entity sets, possibly based on Dataset and datasetDistribution (ping @olovy and @klngwll). This could be added later, with the initial entity sets just shared on a "need to know" basis (we have a bunch of possible nice-to-haves, e.g. NB and subject headings; see the dump page in the devops repo, which we can make obsolete with this).
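A rough sketch of what option 1 could look like, under stated assumptions: Jackson and Commons Compress on the classpath, a made-up GraphRecord type standing in for whatever XL actually stores, and a placeholder context URL.

import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

class DumpArchiveSketch {
    // Hypothetical stand-in for a stored named graph; not an XL type.
    record GraphRecord(String id, Object graph) {}

    static final ObjectMapper MAPPER = new ObjectMapper();

    static void writeArchive(List<GraphRecord> records, OutputStream out) throws IOException {
        try (TarArchiveOutputStream tar = new TarArchiveOutputStream(new GZIPOutputStream(out))) {
            for (GraphRecord r : records) {
                // One standalone JSON-LD document per record: @context and the
                // record's @id sit beside the stored top-level @graph.
                Map<String, Object> doc = new LinkedHashMap<>();
                doc.put("@context", "https://id.kb.se/context.jsonld"); // assumed context URL
                doc.put("@id", r.id());
                doc.put("@graph", r.graph());
                byte[] bytes = MAPPER.writeValueAsBytes(doc);

                TarArchiveEntry entry = new TarArchiveEntry(r.id() + ".jsonld");
                entry.setSize(bytes.length);
                tar.putArchiveEntry(entry);
                tar.write(bytes);
                tar.closeArchiveEntry();
            }
        }
    }
}

A tar.gz of individual .jsonld files is only one possible shape; a single gzipped stream of newline-delimited documents would work with the same wrapping step.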

jannistsiroyannis (Contributor, author):
Let's talk back-channel, I'll write to you!


public class Dump {
private static final Logger logger = LogManager.getLogger(Dump.class);
private static final String DUMP_END_MARKER = "_DUMP_END_MARKER\n"; // Must be 17 bytes
olovy (Contributor), Nov 20, 2024:

I think it would be nice to let the reader know what all the 17s are about.
Looks like it will work until the year 2593!
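One way to name the magic number, as a sketch. The interpretation here (every line in the dump file is a fixed-width 17-byte record, 16 bytes of payload plus a newline) is inferred from the offset arithmetic in this PR and from the marker's length, so treat it as an assumption:

class DumpConstantsSketch {
    // Every line in the dump file is a fixed-width record: 16 bytes of payload
    // plus a trailing '\n' (inferred from the offset arithmetic; an assumption).
    static final int DUMP_LINE_BYTES = 17;
    // The end marker is deliberately exactly one line wide: 16 chars + '\n'.
    static final String DUMP_END_MARKER = "_DUMP_END_MARKER\n";

    static long offsetBytes(long offsetLines) {
        return DUMP_LINE_BYTES * offsetLines;
    }
}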

return;
}

// THIS SHIT is so painful in Java :(
olovy (Contributor), Nov 20, 2024:

You can use Map.of(...) and List.of(...), or new HashMap<>(Map.of(...)) if mutability is needed.

Still pretty painful.
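For reference, the two suggested variants as a compilable fragment (the names and values are placeholders):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LiteralsSketch {
    static Map<String, Object> immutable() {
        // Compact literals; the result is immutable and iteration order is unspecified.
        return Map.of("name", "all", "ids", List.of("a", "b"));
    }

    static Map<String, Object> mutable() {
        // Copy into a HashMap when the map still needs to be modified afterwards.
        return new HashMap<>(Map.of("name", "all"));
    }
}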

Contributor:

But using put() on a LinkedHashMap, which keeps insertion order, will give a nicer response for humans.
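A minimal illustration of the point (field names are placeholders): Map.of gives no iteration-order guarantee at all, while LinkedHashMap emits keys in the order they were put.

import java.util.LinkedHashMap;
import java.util.Map;

class OrderedResponseSketch {
    static Map<String, Object> index() {
        // Unlike Map.of, LinkedHashMap iterates keys in insertion order,
        // so the JSON a human reads matches the order written here.
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("name", "all");      // emitted first
        m.put("totalEntities", 0); // emitted second, and so on
        return m;
    }
}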

String apiBaseUrl = req.getRequestURL().toString();

res.setCharacterEncoding("utf-8");
res.setContentType("application/json");

// Is there not enough data for a full page yet ?
long offsetBytes = 17 * offsetLines;
while (!dumpFinished && file.length() < offsetBytes + (17 * (long)EmmChangeSet.TARGET_HITS_PER_PAGE)) {
Contributor:

Need some kind of timeout/limit here?
Otherwise it will get stuck in an infinite loop if generateDump fails in the middle for some reason.
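A sketch of the suggested guard, under assumptions: the one-minute limit, the poll interval, and the dumpFinished/file parameters are stand-ins for the real state in Dump.java, not its actual API.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.function.BooleanSupplier;

class WaitForPageSketch {
    // Wait until the dump file holds enough bytes for a full page, but give up
    // after a deadline so a failed generateDump can't hang the request forever.
    static boolean waitForBytes(RandomAccessFile file, long neededBytes, BooleanSupplier dumpFinished)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + 60_000; // assumed one-minute limit
        while (!dumpFinished.getAsBoolean() && file.length() < neededBytes) {
            if (System.currentTimeMillis() > deadline) {
                return false; // caller can answer with an error instead of looping
            }
            Thread.sleep(200); // assumed poll interval
        }
        return true;
    }
}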

olovy (Contributor) left a review:

👍
As discussed offline, we can merge this and then explore the options for full dumps further.

jannistsiroyannis merged commit 6d25b30 into develop on Nov 21, 2024 (1 check passed).
jannistsiroyannis deleted the feature/emm branch on Nov 21, 2024, 12:21.