Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic collection records endpoint #195

Merged
merged 6 commits into from
Dec 12, 2024
Merged

Conversation

lukavdplas
Copy link
Contributor

This implements a basic endpoint to view the records inside a collection.

The current response will contain the URIs of records in the collection, but does not yet contain the contents of records. It also doesn't include pagination. Unlike with catalogue results, that doesn't seem immediately necessary.

The view will return a graph with an ActivityStreams Collection. Unlike the catalogue search endpoint, this isn't an OrderedCollection, because collections are not ordered, but it should be doable to reuse some of the same views.

@jgonggrijp
Copy link
Member

@lukavdplas is this finished? Shall I review it?

@lukavdplas
Copy link
Contributor Author

Yes, I think it's ready for review.

Copy link
Member

@jgonggrijp jgonggrijp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the collection-records endpoint also produce the contents of the records in the given collection? Otherwise the frontend would have to issue additional requests, potentially one per record, in order to fetch enough data to display the overview table.


items = list(graph.objects(list_node, RDF.first))
rest_nodes = graph.objects(list_node, RDF.rest)
for rest in rest_nodes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is rest a single-element iterator? Otherwise, I find the for loop a little bit alarming in combination with the recursion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, there should be only one rest to a list, but nothing will enforce that on a technical level. (The specification of rest does not mention such a restriction, though.)

Anyway, if that happens, this solution is to concatenate the different lists, but another option would be to pick one and ignore the rest.

Now that you mention it, the recursive definition is a bit problematic since you can easily define circular lists in RDF - which would lead to infinite recursion. This can also happen if you drop all but one rest element.

I don't think this is a huge issue here, though, since edpop collections are created by the application itself. It would be an issue with a generic collection reader, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this algorithm should use a set membership check to prevent an infinite loop. Up to you whether you want to implement that on this branch or defer it to a new issue.

backend/collect/graphs.py Outdated Show resolved Hide resolved
return g


def collection_triples(graph: Graph, list_node: IdentifiedNode) -> Triples:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this function do exactly the same thing as list_from_graph_collection?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see the difference. This function returns full triples, the other only bare objects. Maybe just use the result from list_from_graph_collection? You know that the subject is always list_node, that the first predicate is RDF.first and that the remaining predicates are all RDF.rest. Saves some duplicate logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale here was that the graph may be implemented in a non-standard way, such as having multiple first or rest elements for a list node. If we're deleting or changing a collection, it's relevant to select the triples that are currently stored, rather than reconstruct how they should be stored.

You could compare these results to see the difference:

found_triples = collection_triples(g, node)

items = list_from_graph_collection(g, node)
reconstructed_collection = list_to_graph_collection(items, node)
normalized_triples = all_triples(reconstructed_collection)

The second is essentially what you're proposing, which will yield the same result if the list is stored the way you would expect. (If you use model.save(), it will replace the found_triples with the normalized_triples, by the way.)

So in theory yes, these would be the same, but they might not be and these functions have different uses in that scenario. I'll add a comment to explain this.

Comment on lines 61 to 62
g += collection_obj._class_triples()
g += EDPOPCollection.records._stored_triples(collection_obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention for methods starting with an underscore is that they are not supposed to be called, or even known, outside their own class. Should these methods be private in the first place, and if so, isn't there some kind of public interface that you can use instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I initially envisioned these as "private" methods but they probably should just be public. You don't really need the method here, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I you don't really need it (which of the two, by the way?), then why is it invoked?

@jgonggrijp
Copy link
Member

Heads-up @lukavdplas: I'm going to rebase and force-push this branch before merging it. Contents will not change but your local reference will go stale. You can realign with git reset --hard if desired.

@jgonggrijp jgonggrijp changed the base branch from feature/collection-api to develop December 12, 2024 16:11
@jgonggrijp jgonggrijp force-pushed the feature/collection-records-api branch from 30fe3dd to fbd22f7 Compare December 12, 2024 16:18
@jgonggrijp jgonggrijp merged commit 4427cee into develop Dec 12, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants