-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/collection api #181
Conversation
delete obsolete file scaffold collection model
It does. This happens because migrations use a reconstruction of the model based on the migration history, rather than the actual class. So this migration tried to access the custom method This is caused by a mistake in the source code, I'll update it and let you know. |
Okay, I've updated it and it looks like the migrations run fine now. I'm not very happy with the solution (importing the actual model class during the migration) but it works. |
Getting one for
|
Ah, that should be fixed now! Thanks for trying all this out btw 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit confused by this comment:
Current status: this PR is ready for review; I've made it a draft because it should only be merged in combination with a frontend update.
The pull request compares to feature/rdf-modeling-utils
, not develop
. Surely, you are planning to build the feature/rdf-modeling-utils
branch incrementally, and the current branch does not need to be complete with frontend adaptations before you merge it?
Side note: even on develop
, I wouldn't necessarily be against you merging this without corresponding frontend changes. We just did something like that in #174. As far as I'm concerned, only on master
/main
does everything need to be completely consistent and bug-free. The purpose of release branches is to ensure this.
Anyway, back to the actual changes. I like the cleanness of the code, but I would prefer more docstrings and comments to explain why things are the way they are. I also noticed a bug.
I created a new collection with the slug extra
through the browsable API. The URI of this collection is http://localhost:8000/rdf/collections/extra
. Create, retrieve, update and delete all work. However:
- In the triplestore, the slug of the associated project is duplicated. Its URI is
http://localhost:8000/rdf/project/Dev/Dev
everywhere (so at least the URI is consistent). Note thatproject
is singular as well. By contrast, my project that was migrated from a group has a pluralprojects
and contains the slug only once:http://localhost:8000/rdf/projects/dev_g
. - I updated my newly created collection to associate it with the
dev_g
project instead. However, the corresponding object node in the triplestore now held the URIhttp://localhost:8000/rdf/project/dev_g/dev_g
.
Some more details in the comments. Keep up the sweet refactoring!
|
||
# try to create the same collection again | ||
fail_response = post_collection(client, project.name) | ||
assert is_client_error(fail_response.status_code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once database synchronization is added, I would also want to test here that there is no duplicate entry in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My impression was that if you add triples that already exist in blazegraph (in the same graph), it will have no effect. That is, the same triple won't be stored twice. Since the request to make a collection is idempotent, and this test makes the same request twice, I don't think you could check if the second time was executed?
What does make sense to me is to create a different collection in the second request (e.g. with a different description), to check that the create request doesn't store the new data. (This scenario is also why the api should reject the request.)
Correct me if I'm wrong about blazegraph here, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about Blazegraph, but I meant the representation of the collection in the PostgreSQL database. Hence "once database synchronization is added".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see! Though it was my understanding that we plan to only save collections in blazegraph in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on how we administrate which people have (write) access to which collections.
('Test!!', 'test'), | ||
('Test test test', 'test_test_test'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test cases are quite conservative. Add some Bobby Tables and some chimpanzee speak as well.
from triplestore.utils import Quads | ||
|
||
|
||
class CollectionsField(RDFQuadField): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a docstring to explain the purpose of this class.
def get(self, instance: RDFModel): | ||
return [ | ||
s | ||
for (s, p, o, g) in self._stored_quads(instance) | ||
] | ||
|
||
def _stored_quads(self, instance: RDFModel) -> Quads: | ||
store = settings.RDFLIB_STORE | ||
results = store.query(f''' | ||
SELECT ?col WHERE {{ | ||
?col a edpopcol:Collection ; | ||
as:context <{instance.uri}> . | ||
}} | ||
''', initNs={'as': AS, 'edpopcol': EDPOPCOL}) | ||
|
||
return [ | ||
(result, AS.context, instance.uri, Graph(store, result)) | ||
for (result, ) in results | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, but it looks to me as if the query in _stored_quads
gets the information that get
is looking for. However, _stored_quads
decorates this information with additional data, which get
then peels off again. Wouldn't it be more straightforward and efficient to put the query in get
and then call get
from _stored_quads
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the confusion. The reason is that _stored_quads
is also used to update the triplestore. When you call save()
on the model, the field will compare _stored_quads()
with _quads_to_store()
to update the graph. _stored_quads()
is also used when you call delete()
.
So _stored_quads()
returns the representation that is saved in the triplestore, which is needed to make updates. get()
translates the graph representation to one that is more convenient in the model.
This is implemented on the parent class (RDFQuadField
), but it looks convoluted in isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but you are in full control here because you are overriding both methods. What stops you from writing this?
def get(self, instance: RDFModel): | |
return [ | |
s | |
for (s, p, o, g) in self._stored_quads(instance) | |
] | |
def _stored_quads(self, instance: RDFModel) -> Quads: | |
store = settings.RDFLIB_STORE | |
results = store.query(f''' | |
SELECT ?col WHERE {{ | |
?col a edpopcol:Collection ; | |
as:context <{instance.uri}> . | |
}} | |
''', initNs={'as': AS, 'edpopcol': EDPOPCOL}) | |
return [ | |
(result, AS.context, instance.uri, Graph(store, result)) | |
for (result, ) in results | |
] | |
def get(self, instance: RDFModel): | |
store = settings.RDFLIB_STORE | |
results = store.query(f''' | |
SELECT ?col WHERE {{ | |
?col a edpopcol:Collection ; | |
as:context <{instance.uri}> . | |
}} | |
''', initNs={'as': AS, 'edpopcol': EDPOPCOL}) | |
return [ | |
s | |
for (s, ) in results | |
] | |
def _stored_quads(self, instance: RDFModel) -> Quads: | |
store = settings.RDFLIB_STORE | |
return [ | |
(result, AS.context, instance.uri, Graph(store, result)) | |
for result in self.get(instance) | |
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. Nothing, but to explain why it wasn't written like that: in general, _stored_triples
/_stored_quads
may contain information that is ignored by the model (and thus not retrievable from the output of get()
).
For instance, you could write this field to be agnostic about the graph context. In that case, _stored_quads
should include the graph the as:context
triple was stored in, because that information is necessary for save/delete operations. But get()
can filter it out.
So in general, it makes sense to write _stored_triples
/_stored_quads
first, and then convert that to an internal value in get()
. In this specific case, though, you're right that the dependency can flow either way.
@lukavdplas Is this ready to merge? Should I have another look at it first? |
I think it's ready to merge. Bear in mind that this API isn't compatible with the frontend, so it will make the develop branch not-release-ready for a while. But I think we already discussed that's not an issue. |
As of #181, /api/collections only retrieves the user's own collections by default. The .mine method is retained for easier porting and as an easy way to resume the distinction if it is added again in the future.
The fact that this was not correctly set yet is a historical artifact that apparently has gone unnoticed for a long time. It is unrelated to #181.
As of #181, /api/collections only retrieves the user's own collections by default. The .mine method is retained for easier porting and as an easy way to resume the distinction if it is added again in the future.
The fact that this was not correctly set yet is a historical artifact that apparently has gone unnoticed for a long time. It is unrelated to #181.
This adds an API for collections which are saved in the triplestore. close #159, close #160
It adds list, retrieve, create, update, and delete endpoints for collections. (Which replace the existing API for collections.)
This is implemented as a "regular" JSON API, but collections are identified by an URI instead of an integer.
This update only implements the API on the backend, so it will break the frontend (which expects a slightly different API for collections). I expect adjusting the frontend will go smoothly, but I did not want to bloat this PR.
In the meantime, you can test the browsable API at
/api/collections/
.This update also does not include a data migration or adjust the existing data migration script; I'll add that in a later PR.
Other changes:
vre
app to the project moduleedpop
.