Feature/emm #1516
sendDumpPageResponse(whelk, apiBaseUrl, dump, dumpFilePath, offsetNumeric, res);
}

private static void sendDumpIndexResponse(String apiBaseUrl, HttpServletResponse res) throws IOException {
This is a good idea which we need to integrate into the linked data surface.
We need to describe these as JSON-LD. This response has no @context yet, so we don't know what the index is for, nor what the categories mean.
I think EMM is silent about this, but XL can describe and link to them using KBV/platform terminology. These entity sets should be linked to as datadumps of regularly described datasets (to avoid adding yet another notion of dataset, cf. data.kb.se and libris datasets).
I agree! And I will need your help to get the details right!
I'll draft something. What do we need?
- The shape of the download; e.g. a gzipped archive stream of JSON-LD files, each representing one of our named graphs? That is, the stored data with added @context and @id of the record, both beside the top @graph (ping @olovy).
- Discovery of the entity sets, possibly based on the Dataset and datasetDistribution (ping @olovy and @klngwll). Could be added later, with initial entity sets just shared on a "need to know" basis (we have a bunch of possible nice-to-haves, e.g. NB and subject headings; see the dump page in the devops repo which we can make obsolete with this).
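To make the first point concrete, here is a rough sketch of one possible reading of that shape. It is not what the PR does: it swaps the "archive of files" for a simpler gzipped stream with one JSON-LD document per line, and the class name, helper names, and URLs are all hypothetical. It also assumes the values need no JSON escaping; real code would use a JSON library.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzippedJsonLdLines {
    /** Puts @context and @id beside the stored @graph (assumes values need no escaping). */
    static String wrapRecord(String contextUrl, String recordId, String graphJson) {
        return "{\"@context\": \"" + contextUrl + "\", \"@id\": \"" + recordId
                + "\", \"@graph\": " + graphJson + "}";
    }

    /** Writes one document per line into a gzip stream and returns the compressed bytes. */
    static byte[] compress(List<String> documents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (String document : documents) {
                out.write(document);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the documents back, one per line. */
    static List<String> decompress(byte[] gzipped) {
        List<String> documents = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                documents.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return documents;
    }

    public static void main(String[] args) {
        // Hypothetical context URL and record id, for illustration only.
        String record = wrapRecord("https://example.org/context.jsonld",
                "https://example.org/record1", "[]");
        System.out.println(decompress(compress(List.of(record))));
    }
}
```

Whether a line-delimited stream or a proper archive fits better depends on how consumers are expected to process the dump, which is still open here.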
(Also relevant: https://www.w3.org/TR/vocab-dcat-3/.)
Let's talk back-channel; I'll write to you!
public class Dump {
private static final Logger logger = LogManager.getLogger(Dump.class);
private static final String DUMP_END_MARKER = "_DUMP_END_MARKER\n"; // Must be 17 bytes
I think it would be nice to let the reader know what all the 17s are about.
Looks like it will work until the year 2593!
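One way to let the reader know is a named constant. A minimal sketch, assuming every line in the dump file is a 16-character ASCII id followed by a newline (which is what the length of the end marker suggests); the class, constant, and method names are hypothetical and not taken from the PR:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class FixedWidthDumpReader {
    // Assumption: each line is a 16-character ASCII id plus '\n', so 17 bytes.
    static final int ID_LENGTH = 16;
    static final int RECORD_LENGTH = ID_LENGTH + 1;
    static final String DUMP_END_MARKER = "_DUMP_END_MARKER\n";

    /** Byte position of a line; only valid because all lines have the same length. */
    static long byteOffsetOfLine(long lineNumber) {
        return RECORD_LENGTH * lineNumber;
    }

    /** Reads up to maxLines ids starting at offsetLines, stopping at the end marker. */
    static List<String> readPage(Path dumpFile, long offsetLines, int maxLines) throws IOException {
        List<String> ids = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(dumpFile.toFile(), "r")) {
            file.seek(byteOffsetOfLine(offsetLines));
            byte[] record = new byte[RECORD_LENGTH];
            while (ids.size() < maxLines
                    && file.getFilePointer() + RECORD_LENGTH <= file.length()) {
                file.readFully(record);
                String line = new String(record, StandardCharsets.US_ASCII);
                if (line.equals(DUMP_END_MARKER)) {
                    break;
                }
                ids.add(line.substring(0, ID_LENGTH));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("dump", ".txt");
        Files.writeString(tmp,
                "a".repeat(ID_LENGTH) + "\n" + "b".repeat(ID_LENGTH) + "\n" + DUMP_END_MARKER,
                StandardCharsets.US_ASCII);
        System.out.println(readPage(tmp, 1, 10));
        Files.delete(tmp);
    }
}
```

With the constant in place, expressions like 17 * offsetLines read as "record length times line number", and the "Must be 17 bytes" comment follows from the definition.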
return;
}

// THIS SHIT is so painful in Java :(
Can use Map.of(...) and List.of(...), or new HashMap<>(Map.of(...)) if mutability is needed.
Still pretty painful.
But using put() on a LinkedHashMap, which keeps insertion order, will give a nicer response for humans.
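For illustration, a small sketch of the LinkedHashMap variant. The keys and values are made up and are not the ones used in the PR. Map.of(...) leaves iteration order unspecified, so the same keys could be serialized in any order, while LinkedHashMap iterates in insertion order:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedResponseExample {
    /** Builds a response body whose keys keep the order they were added in. */
    static Map<String, Object> buildResponse(String id, List<String> items) {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("id", id);          // hypothetical keys, for illustration only
        response.put("type", "example");
        response.put("items", new ArrayList<>(items));
        return response;
    }

    public static void main(String[] args) {
        Map<String, Object> response = buildResponse("https://example.org/dump", List.of("a", "b"));
        System.out.println(response.keySet()); // prints [id, type, items]
    }
}
```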
String apiBaseUrl = req.getRequestURL().toString();

res.setCharacterEncoding("utf-8");
res.setContentType("application/json");
application/activity+json?
See https://www.w3.org/TR/activitystreams-core/#syntaxconventions
// Is there not enough data for a full page yet ?
long offsetBytes = 17 * offsetLines;
while (!dumpFinished && file.length() < offsetBytes + (17 * (long)EmmChangeSet.TARGET_HITS_PER_PAGE)) {
Need some kind of timeout/limit here?
Otherwise it will get stuck in an infinite loop if generateDump fails in the middle for some reason.
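Something along these lines might do. It is only a sketch with hypothetical names: the two suppliers stand in for file.length() and the dumpFinished check from the loop above, and the timeout and poll interval would need tuning against real dump generation times.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

class BoundedWait {
    /**
     * Polls until enough bytes are available or the producer reports it is finished.
     * Returns false if the timeout elapses first or the thread is interrupted.
     */
    static boolean awaitBytes(LongSupplier currentLength, long requiredLength,
                              BooleanSupplier producerFinished,
                              long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!producerFinished.getAsBoolean() && currentLength.getAsLong() < requiredLength) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the producer may have failed midway
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The length never grows and the producer never finishes, so this times out.
        System.out.println(awaitBytes(() -> 0L, 17L, () -> false, 100L, 10L));
    }
}
```

On a false return the caller could respond with an error instead of hanging the request thread.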
👍
As discussed offline we can merge this and then explore the options for full dumps further.
This is not a finalized thing, but enough (in my opinion) to merge and start testing at scale.