Object_ID is not a stable field, choose new field for occurrenceID? #155

PietrH · 2024-01-23T13:37:16Z

The ObjectID generated by ArcGIS is never a stable primary key.
~Sander, GIS consultant RATO

Discussed with Sander and Anke: the proposal is to use Dossier_ID concatenated with Laatst_Bewerkt_Datum as occurrenceID.

Discussed with @damianooldoni and @LienReyserhove:

What we really want is for the dataset to contain a stable identifier, preferably a GUID or UUID. This is the preferred option.

All the other points are what we'd need to do if this is impossible:

- If we introduce a new occurrenceID, we need to archive the data as it is on GBIF and start a new dataset for the new data.
- The cutoff between the datasets is the dataset as it was before the 7000 records were removed and recovered with a different ID.
- If split, changes to older records (before the cutoff) will not be reflected on GBIF.
- If split, there will be one repo per dataset. This repo will be transitioned into the new dataset, which will remain updated. A second repository will be created for the old dataset as it stands, as a fork of this one. The current GBIF dataset (on the IPT) will need to be updated so it refers to the URL of this new repo (with the old data in it).
- All other requested changes will only be applied to the new dataset: "Terms that indicate that a record is an intervention (sampling action, catch)" #134, "samplingEffort requires looking at the previous record, not the current one" #135, "remove/avoid administrative records" #136 and "Remove time information from all date fields" #133. This is possibly a problem for #135.
- Lien will schedule a meeting with RATO to discuss the implications of this change. We'd really prefer a stable identifier to be provided instead.
- If it turns out that Dossier_ID and/or Laatst_Bewerkt_Datum are not stable, all this effort will be for naught. I've been assured that if they change a record, they make a new one instead, but they have changed Dossier_ID in the past.
- The occurrenceID generated from these two fields should really be a hash of the two fields instead.
- Dates are notorious for causing trouble, so this will require extra care when implementing.
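
To make the last two points concrete, here is a minimal sketch of what hashing the two fields could look like. It is not the mapping code of this repo. The function names are hypothetical, SHA-256 is just one reasonable choice of hash, and it assumes Laatst_Bewerkt_Datum arrives as an ISO 8601 string and that timestamps without a timezone are UTC. Neither assumption has been confirmed with RATO.

```python
import hashlib
from datetime import datetime, timezone


def normalize_timestamp(value: str) -> str:
    """Return an ISO 8601 timestamp as UTC with second precision.

    Assumption: timestamps without timezone information are in UTC.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def make_occurrence_id(dossier_id, laatst_bewerkt_datum: str) -> str:
    """Hash Dossier_ID and Laatst_Bewerkt_Datum into one identifier.

    The date is normalized first, so the same moment written in a
    different format or timezone still gives the same hash.
    """
    key = f"{dossier_id}|{normalize_timestamp(laatst_bewerkt_datum)}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Hypothetical example values, not real records.
occurrence_id = make_occurrence_id(12345, "2024-01-23T13:37:16")
```

Normalizing before hashing is the "extra care" part: without it, a change in how the date is exported (a space instead of `T`, a different timezone offset) would silently change every occurrenceID.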


PietrH · 2024-01-23T13:38:00Z

This issue is not actionable until the meeting with RATO. A provided stable identifier is vastly preferred.

PietrH · 2024-02-02T09:59:47Z

I made a mistake in the first post: we need to split off the older records anyway to maintain the old occurrenceIDs. This means we'll need to archive the current dataset in any case.

PietrH · 2024-02-02T15:10:10Z

Had a call with Emiel. He agrees that a stable identifier on the database side is the preferred option. That way we would not need to mint our own as a data processor, using a formula that is bound to go wrong sooner or later.

He'll discuss it with the GIS consultant (Sander), and get back to us.

PietrH · 2024-05-28T08:33:46Z

Now using a GUID

PietrH added the question, automated workflow and mapping labels Jan 23, 2024

PietrH mentioned this issue May 22, 2024

New mapping service #179

Closed

PietrH closed this as completed May 28, 2024
