Feature/collection api #181

lukavdplas · 2024-07-16T14:13:33Z

This adds an API for collections which are saved in the triplestore. close #159, close #160

It adds list, retrieve, create, update, and delete endpoints for collections, which replace the existing API for collections.

This is implemented as a "regular" JSON API, but collections are identified by a URI instead of an integer.

This update only implements the API on the backend, so it will break the frontend (which expects a slightly different API for collections). I expect adjusting the frontend will go smoothly, but I did not want to bloat this PR.

In the meantime, you can test the browsable API at /api/collections/.

This update also does not include a data migration or adjust the existing data migration script; I'll add that in a later PR.

Other changes:

Projects now explicitly store their URI in the database. This makes lookups easier.
The core URL configuration is moved from the vre app to the project module edpop.
Adds defined namespaces (implement defined namespaces #180)

lukavdplas · 2024-07-17T15:34:46Z

It does. This happens because migrations use a reconstruction of the model based on the migration history, rather than the actual class. So this migration tried to access the custom method graph(), but that isn't available during the migration.

This is caused by a mistake in the source code; I'll update it and let you know.
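To illustrate the point about reconstructed models, here is a toy sketch in plain Python rather than Django itself. `Project` and `historical_model` are hypothetical stand-ins, and the body of `graph()` is made up. The only thing carried over from Django is the behaviour described above: the class a migration receives knows the fields, but not the custom methods.

```python
class Project:
    """Stand-in for the real model class, which defines a custom method."""

    def __init__(self, name):
        self.name = name

    def graph(self):
        # Custom method: only available on the real class.
        return f'graph-for-{self.name}'


def historical_model(real_class, field_names):
    """Build a bare class that only knows the listed fields.

    Rough analogy for the model that a migration receives: same name and
    fields, but none of the custom methods of the real class.
    """
    def __init__(self, **values):
        for field in field_names:
            setattr(self, field, values.get(field))

    return type(real_class.__name__, (), {'__init__': __init__})


HistoricalProject = historical_model(Project, ['name'])
obj = HistoricalProject(name='dev')

print(obj.name)               # the field is there
print(hasattr(obj, 'graph'))  # the custom method is not
```

Calling `obj.graph()` on the reconstructed class would raise an `AttributeError`, which is the kind of failure a data migration runs into when it relies on a custom method.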

lukavdplas · 2024-07-18T12:15:29Z

Okay, I've updated it and it looks like the migrations run fine now. I'm not very happy with the solution (importing the actual model class during the migration) but it works.

jgonggrijp · 2024-07-18T12:49:36Z

Getting one for Collection as well:

2024-07-18 14:48:00 System check identified no issues (0 silenced).
2024-07-18 14:48:00 Operations to perform:
2024-07-18 14:48:00   Apply all migrations: admin, auth, authtoken, contenttypes, projects, sessions, vre
2024-07-18 14:48:00 Running migrations:
2024-07-18 14:48:01   Applying projects.0002_researchgroups_to_projects... OK
2024-07-18 14:48:01   Applying projects.0003_project_uri... OK
2024-07-18 14:48:01   Applying projects.0004_fill_project_uri... OK
2024-07-18 14:48:01   Applying projects.0005_alter_project_uri... OK
2024-07-18 14:48:01 Traceback (most recent call last):
2024-07-18 14:48:01   File "/usr/src/app/backend/manage.py", line 22, in <module>
2024-07-18 14:48:01     execute_from_command_line(sys.argv)
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
2024-07-18 14:48:01     utility.execute()
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 436, in execute
2024-07-18 14:48:01     self.fetch_command(subcommand).run_from_argv(self.argv)
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 412, in run_from_argv
2024-07-18 14:48:01     self.execute(*args, **cmd_options)
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 458, in execute
2024-07-18 14:48:01     output = self.handle(*args, **options)
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 106, in wrapper
2024-07-18 14:48:01     res = handle_func(*args, **kwargs)
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/core/management/commands/migrate.py", line 356, in handle
2024-07-18 14:48:01     post_migrate_state = executor.migrate(
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/db/migrations/executor.py", line 135, in migrate
2024-07-18 14:48:01     state = self._migrate_all_forwards(
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
2024-07-18 14:48:01     state = self.apply_migration(
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
2024-07-18 14:48:01     state = migration.apply(state, schema_editor)
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/db/migrations/migration.py", line 132, in apply
2024-07-18 14:48:01     operation.database_forwards(
2024-07-18 14:48:01   File "/usr/local/lib/python3.9/site-packages/django/db/migrations/operations/special.py", line 193, in database_forwards
2024-07-18 14:48:01     self.code(from_state.apps, schema_editor)
2024-07-18 14:48:01   File "/usr/src/app/backend/vre/migrations/0006_annotation_context_collection_context.py", line 46, in save_collection_managing_group_as_context
2024-07-18 14:48:01     matching_name = obj.managing_group.filter(name=obj.name)
2024-07-18 14:48:01 AttributeError: 'Collection' object has no attribute 'name'
2024-07-18 14:48:01   Applying vre.0006_annotation_context_collection_context...

lukavdplas · 2024-07-18T13:11:04Z

Ah, that should be fixed now! Thanks for trying all this out btw 😅

jgonggrijp

I was a bit confused by this comment:

Current status: this PR is ready for review; I've made it a draft because it should only be merged in combination with a frontend update.

The pull request compares to feature/rdf-modelling-utils, not develop. Surely, you are planning to build the feature/rdf-modelling-utils branch incrementally, and the current branch does not need to be complete with frontend adaptations before you merge it?

Side note: even on develop, I wouldn't necessarily be against you merging this without corresponding frontend changes. We just did something like that in #174. As far as I'm concerned, only on master/main does everything need to be completely consistent and bug-free. The purpose of release branches is to ensure this.

Anyway, back to the actual changes. I like the cleanness of the code, but I would prefer more docstrings and comments to explain why things are the way they are. I also noticed a bug.

I created a new collection with the slug extra through the browsable API. The URI of this collection is http://localhost:8000/rdf/collections/extra. Create, retrieve, update and delete all work. However:

In the triplestore, the slug of the associated project is duplicated. Its URI is http://localhost:8000/rdf/project/Dev/Dev everywhere (so at least the URI is consistent). Note that project is singular as well. By contrast, my project that was migrated from a group has a plural projects and contains the slug only once: http://localhost:8000/rdf/projects/dev_g.
I updated my newly created collection to associate it with the dev_g project instead. However, the corresponding object node in the triplestore now held the URI http://localhost:8000/rdf/project/dev_g/dev_g.

Some more details in the comments. Keep up the sweet refactoring!

jgonggrijp · 2024-07-18T15:01:22Z

backend/collect/api_test.py

+
+    # try to create the same collection again
+    fail_response = post_collection(client, project.name)
+    assert is_client_error(fail_response.status_code)


Once database synchronization is added, I would also want to test here that there is no duplicate entry in the database.

My impression was that adding triples that already exist in Blazegraph (in the same graph) has no effect. That is, the same triple won't be stored twice. Since the request to make a collection is idempotent, and this test makes the same request twice, I don't think you could check whether the second request was executed.

What does make sense to me is to create a different collection in the second request (e.g. with a different description), to check that the create request doesn't store the new data. (This scenario is also why the API should reject the request.)

Correct me if I'm wrong about Blazegraph here, though!
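A minimal sketch of the scenario described above, with a plain Python set standing in for a graph in the triplestore (like the graph, a set ignores an element that is already present). `create_collection`, `CollectionExists`, the shortened URIs and the `as:summary` predicate are hypothetical and not taken from this PR.

```python
class CollectionExists(Exception):
    """Raised when a collection with the same URI is already stored."""


def create_collection(graph: set, uri: str, summary: str) -> None:
    """Store a collection, refusing to overwrite an existing one."""
    if any(subject == uri for (subject, _, _) in graph):
        raise CollectionExists(uri)
    graph.add((uri, 'rdf:type', 'edpopcol:Collection'))
    graph.add((uri, 'as:summary', summary))


graph = set()
create_collection(graph, 'collections/extra', 'first description')
size_after_first = len(graph)

# Re-adding an identical triple is a no-op, so repeating the exact same
# request cannot be detected by inspecting the graph afterwards.
graph.add(('collections/extra', 'rdf:type', 'edpopcol:Collection'))
assert len(graph) == size_after_first

# A second request with a different description should be rejected, and
# the new description should not end up in the graph.
try:
    create_collection(graph, 'collections/extra', 'second description')
    rejected = False
except CollectionExists:
    rejected = True
```

With a changed description, the test can check both that the request is rejected and that the stored data is untouched.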

You are right about Blazegraph, but I meant the representation of the collection in the PostgreSQL database. Hence "once database synchronization is added".

I see! Though it was my understanding that we plan to only save collections in Blazegraph in the future.

It depends on how we administrate which people have (write) access to which collections.

backend/collect/api_test.py

backend/collect/rdf_models.py

backend/collect/utils.py

jgonggrijp · 2024-07-18T15:43:15Z

backend/collect/utils_test.py

+        ('Test!!', 'test'),
+        ('Test test test', 'test_test_test'),


These test cases are quite conservative. Add some Bobby Tables and some chimpanzee speak as well.
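A sketch of what such extra cases could look like. The `name_to_slug` below is a hypothetical stand-in written only to make the example run (the real function in backend/collect/utils.py is not shown in this thread), so the expected slugs for the new inputs are assumptions as well.

```python
import re


def name_to_slug(name: str) -> str:
    """Hypothetical stand-in: lowercase, collapse other characters to '_'."""
    slug = re.sub(r'[^a-z0-9]+', '_', name.lower())
    return slug.strip('_')


CASES = [
    # the two cases from the diff
    ('Test!!', 'test'),
    ('Test test test', 'test_test_test'),
    # Bobby Tables
    ("Robert'); DROP TABLE Students;--", 'robert_drop_table_students'),
    # chimpanzee speak
    ('Ooh ooh AAH aah!!!', 'ooh_ooh_aah_aah'),
    # path traversal and markup
    ('../../etc/passwd', 'etc_passwd'),
    ('<script>alert(1)</script>', 'script_alert_1_script'),
]

for name, expected in CASES:
    assert name_to_slug(name) == expected, (name, name_to_slug(name))
```

The point of the hostile inputs is that whatever comes out should still be safe to embed in a URI path.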

backend/projects/migrations/0004_fill_project_uri.py

jgonggrijp · 2024-07-19T00:28:05Z

backend/projects/rdf_models.py

+from triplestore.utils import Quads
+
+
+class CollectionsField(RDFQuadField):


Please add a docstring to explain the purpose of this class.

jgonggrijp · 2024-07-19T00:30:30Z

backend/projects/rdf_models.py

+    def get(self, instance: RDFModel):
+        return [
+            s
+            for (s, p, o, g) in self._stored_quads(instance)
+        ]
+
+    def _stored_quads(self, instance: RDFModel) -> Quads:
+        store = settings.RDFLIB_STORE
+        results = store.query(f'''
+        SELECT ?col WHERE {{                              
+                ?col a edpopcol:Collection ;
+                as:context <{instance.uri}> .
+        }}
+        ''', initNs={'as': AS, 'edpopcol': EDPOPCOL})
+
+        return [
+            (result, AS.context, instance.uri, Graph(store, result))
+            for (result, ) in results
+        ]


I am not sure, but it looks to me as if the query in _stored_quads gets the information that get is looking for. However, _stored_quads decorates this information with additional data, which get then peels off again. Wouldn't it be more straightforward and efficient to put the query in get and then call get from _stored_quads?

I get the confusion. The reason is that _stored_quads is also used to update the triplestore. When you call save() on the model, the field will compare _stored_quads() with _quads_to_store() to update the graph. _stored_quads() is also used when you call delete().

So _stored_quads() returns the representation that is saved in the triplestore, which is needed to make updates. get() translates the graph representation to one that is more convenient in the model.

This is implemented on the parent class (RDFQuadField), but it looks convoluted in isolation.
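To make that flow concrete, here is a rough sketch of the comparison that save() performs. This is not the RDFQuadField code, which isn't shown in this thread: quads are plain tuples, a set stands in for the triplestore, and the name `sync_quads` is made up.

```python
from typing import Iterable, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)


def sync_quads(store: Set[Quad], stored: Iterable[Quad],
               to_store: Iterable[Quad]) -> None:
    """Bring the store in line with the model's current value.

    `stored` plays the role of _stored_quads() and `to_store` the role of
    _quads_to_store(): quads found only in the former are removed, quads
    found only in the latter are added.
    """
    stored, to_store = set(stored), set(to_store)
    store.difference_update(stored - to_store)
    store.update(to_store - stored)


project = 'projects/dev_g'
store = {
    ('collections/a', 'as:context', project, 'collections/a'),
    ('collections/b', 'as:context', project, 'collections/b'),
}

# The model now says the project has collections b and c.
wanted = [
    ('collections/b', 'as:context', project, 'collections/b'),
    ('collections/c', 'as:context', project, 'collections/c'),
]
sync_quads(store, stored=list(store), to_store=wanted)
```

In this picture, delete() is the same operation with an empty `to_store`, which is why the stored representation has to be complete.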

Sure, but you are in full control here because you are overriding both methods. What stops you from writing this?

Suggested change

-    def get(self, instance: RDFModel):
-        return [
-            s
-            for (s, p, o, g) in self._stored_quads(instance)
-        ]
-
-    def _stored_quads(self, instance: RDFModel) -> Quads:
-        store = settings.RDFLIB_STORE
-        results = store.query(f'''
-        SELECT ?col WHERE {{
-                ?col a edpopcol:Collection ;
-                as:context <{instance.uri}> .
-        }}
-        ''', initNs={'as': AS, 'edpopcol': EDPOPCOL})
-
-        return [
-            (result, AS.context, instance.uri, Graph(store, result))
-            for (result, ) in results
-        ]
+    def get(self, instance: RDFModel):
+        store = settings.RDFLIB_STORE
+        results = store.query(f'''
+        SELECT ?col WHERE {{
+                ?col a edpopcol:Collection ;
+                as:context <{instance.uri}> .
+        }}
+        ''', initNs={'as': AS, 'edpopcol': EDPOPCOL})
+
+        return [
+            s
+            for (s, ) in results
+        ]
+
+    def _stored_quads(self, instance: RDFModel) -> Quads:
+        store = settings.RDFLIB_STORE
+        return [
+            (result, AS.context, instance.uri, Graph(store, result))
+            for result in self.get(instance)
+        ]

Ah right. Nothing, but to explain why it wasn't written like that: in general, _stored_triples/_stored_quads may contain information that is ignored by the model (and thus not retrievable from the output of get()).

For instance, you could write this field to be agnostic about the graph context. In that case, _stored_quads should include the graph the as:context triple was stored in, because that information is necessary for save/delete operations. But get() can filter it out.

So in general, it makes sense to write _stored_triples/_stored_quads first, and then convert that to an internal value in get(). In this specific case, though, you're right that the dependency can flow either way.
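As an illustration of the graph-agnostic case, here is a small sketch with a list of tuples standing in for the triplestore. The function names mirror the methods discussed here, but the data and shortened URIs are invented for the example.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)

STORE: List[Quad] = [
    # the same kind of triple, stored in different graphs
    ('collections/a', 'as:context', 'projects/dev_g', 'collections/a'),
    ('collections/b', 'as:context', 'projects/dev_g', 'some/other/graph'),
    ('collections/c', 'as:context', 'projects/other', 'collections/c'),
]


def stored_quads(project_uri: str) -> List[Quad]:
    """Everything needed to update or delete, including the graph."""
    return [
        quad for quad in STORE
        if quad[1] == 'as:context' and quad[2] == project_uri
    ]


def get(project_uri: str) -> List[str]:
    """The value the model exposes: only the collection URIs."""
    return [s for (s, p, o, g) in stored_quads(project_uri)]


print(get('projects/dev_g'))
```

Here `stored_quads()` could not be rebuilt from `get()`, because the graph holding the triple for `collections/b` cannot be derived from the collection URI.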

backend/projects/signals.py

jgonggrijp · 2024-10-02T16:17:08Z

@lukavdplas Is this ready to merge? Should I have another look at it first?

lukavdplas · 2024-10-03T11:54:26Z

I think it's ready to merge. Bear in mind that this API isn't compatible with the frontend, so it will make the develop branch not-release-ready for a while. But I think we already discussed that's not an issue.

As of #181, /api/collections only retrieves the user's own collections by default. The .mine method is retained for easier porting and as an easy way to resume the distinction if it is added again in the future.

The fact that this was not correctly set yet is a historical artifact that apparently has gone unnoticed for a long time. It is unrelated to #181.

lukavdplas added 30 commits July 16, 2024 15:08

implement defined namespaces

268ef37

start collect app

f9239fd

draft collection class

9b369aa

delete obsolete file scaffold collection model

add name/summary fields

7d3c641

add projects and records fields

85389d8

add new collectionviewset to router

72c4c97

get project in collection viewset

ec1b944

add name_to_slug function

83f0dc3

create collection

010771a

add collections field to project

5f85a1f

set project in rdf model

f000d65

save project uri in database

24643a6

correct saved uri for collections

4490214

set lookup value regex

3a9e7c2

implement retrieve endpoint

31e5ab8

outfactor creation function

5596d1b

add tests for detail api

a0f0f34

check project permissions

45356fe

implement deleting collections

d5d7b3f

implement update method on collection view

ddc6f15

block creation if collection exists

9cb41be

add collection serializer class

12fdc71

create ProjectField serializer field

b9e3241

use conventional method names get_queryset, get_object

84c2a06

implement permission class for detail views

3f8ff46

add test for project permission

2026e02

add project validation

4201156

use ModelViewset

239984b

update collection models to prev commit

825b84f

store each collection in its own graph

f1d5a39

lukavdplas added 2 commits July 18, 2024 14:05

Merge branch 'feature/rdf-modelling-utils' into feature/collection-api

40b08fd

set project uri during migration

32b217c

Merge branch 'feature/rdf-modelling-utils' into feature/collection-api

0cf37a3

jgonggrijp requested changes Jul 19, 2024

lukavdplas added 5 commits July 19, 2024 13:03

outfactor url function

8394e09

update name_to_slug

f6cee0c

use single source of truth for project uris

e4dcd62

add docstrings

a98d4df

expand test for creating duplicate collection

e3d5a01

lukavdplas mentioned this pull request Jul 22, 2024

migrate ResearchGroup to Project #167

Merged

fix project uri migration

0a76240

Base automatically changed from feature/rdf-modelling-utils to develop August 5, 2024 09:39

lukavdplas marked this pull request as ready for review October 3, 2024 11:54

jgonggrijp merged commit 2868c23 into develop Oct 3, 2024

This was referenced Oct 3, 2024

Rewire frontend with backend collections endpoint #224

Closed

Manage collections from the frontend #225

Closed

Migrate old-style collections to triplestore #226

Open

jgonggrijp added a commit that referenced this pull request Oct 3, 2024

Set the idAttribute of the frontend Project model (#224)

cf5218e

The fact that this was not correctly set yet is a historical artifact that apparently has gone unnoticed for a long time. It is unrelated to #181.

jgonggrijp added a commit that referenced this pull request Oct 8, 2024

Set the idAttribute of the frontend Project model (#224)

163d14a

The fact that this was not correctly set yet is a historical artifact that apparently has gone unnoticed for a long time. It is unrelated to #181.

jgonggrijp mentioned this pull request Oct 9, 2024

Basic frontend collection management #230

Merged

jgonggrijp deleted the feature/collection-api branch December 12, 2024 16:28

jgonggrijp added a commit that referenced this pull request Jan 13, 2025

Add the collect app to the INSTALLED_APPS (#141 #142)

9cb5ddf

This appears to have been an omission from #181.
