Object_ID is not a stable field, choose new field for occurrenceID? #155

Closed
PietrH opened this issue Jan 23, 2024 · 4 comments
Labels
automated workflow mapping question Further information is requested

Comments

@PietrH
Member

PietrH commented Jan 23, 2024

The ObjectID generated by ArcGIS is never a stable primary key.
~Sander, GIS consultant RATO

Discussed with Sander and Anke: the proposal is to use Dossier_ID concatenated with Laatst_Bewerkt_Datum as occurrenceID.

Discussed with @damianooldoni and @LienReyserhove:

  • What we really want is for the dataset to simply contain a stable identifier, preferably a GUID or UUID. This is the preferred option.

All the other points are what we'd need to do if this is impossible:

  • If we introduce a new occurrenceID, we need to archive the data as it is on GBIF and start a new dataset for the new data.
  • The cutoff between the datasets is the dataset as it was before the 7000 records were removed and recovered with a different ID.
  • If split, and older records (before the cutoff) change, these changes will not be reflected on GBIF.
  • If split, there will be one repo per dataset. This repo will transition to the new dataset, which will remain updated; a second repository will be created for the old dataset as it stands, from a fork of this one. The current GBIF dataset (on the IPT) will need to be updated so that it refers to the URL of this new repo (with the old data in it).
  • All other requested changes will only be applied to the new dataset: "Terms that indicate that a record is an intervention (sampling action, catch)" (#134), "samplingEffort requires looking at the previous record, not the current one" (#135), "remove/avoid administrative records" (#136) and "Remove time information from all date fields" (#133). This is possibly a problem for #135.
  • Lien will schedule a meeting with RATO to discuss the implications of this change. We'd really prefer a stable identifier to be provided instead.
  • If it turns out that Dossier_ID and/or Laatst_Bewerkt_Datum are not stable (I've been assured that if they change a record, they make a new one instead), all this effort will be for naught. They have changed Dossier_ID in the past.
  • The occurrenceID generated from these two fields should really be a hash of the two fields rather than a plain concatenation.
  • Dates are notorious for causing trouble, so this will require extra care when implementing.
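
The last two points (hash the two fields, be careful with dates) could look roughly like the sketch below. It is written in Python purely for illustration and is not the workflow's actual code. The function names `normalize_timestamp` and `make_occurrence_id`, the list of accepted date formats, the `|` separator and the choice of SHA-256 are all assumptions; the real export format of Laatst_Bewerkt_Datum is not stated in this issue.

```python
import hashlib
from datetime import datetime

# Assumed input formats; adjust to whatever the real export uses.
ASSUMED_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y",
)


def normalize_timestamp(raw: str) -> str:
    """Parse a date string and return one canonical ISO 8601 form.

    Normalizing first means the same moment always hashes to the same
    value, however the source system happened to format it.
    """
    text = raw.strip()
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(text, fmt).isoformat()
        except ValueError:
            continue
    # Fail loudly rather than mint an identifier from an unparsed date.
    raise ValueError(f"Unrecognised date format: {raw!r}")


def make_occurrence_id(dossier_id, laatst_bewerkt_datum: str) -> str:
    """Hash Dossier_ID and Laatst_Bewerkt_Datum into a hex identifier."""
    date_part = normalize_timestamp(laatst_bewerkt_datum)
    # The separator keeps field boundaries unambiguous in the hashed key.
    key = f"{str(dossier_id).strip()}|{date_part}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(make_occurrence_id(123, "2024-01-23 00:00:00"))
```

This sketch keeps the time of day in the hashed key, so two edits to the same dossier on one day still get different identifiers. Whether that is wanted depends on how RATO records edits, which is still an open question here.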
@PietrH
Member Author

PietrH commented Jan 23, 2024

This issue is not actionable until the meeting with RATO. A provided stable identifier is vastly preferred.

@PietrH PietrH added question Further information is requested automated workflow mapping labels Jan 23, 2024
@PietrH
Member Author

PietrH commented Feb 2, 2024

I made a mistake in the first post: we need to split off the older records anyway to maintain the old occurrenceIDs. This means we'll need to archive the current dataset in any case.

@PietrH
Member Author

PietrH commented Feb 2, 2024

Had a call with Emiel; he agrees that a stable identifier on the database side is the preferred option. That way we would not need to mint our own as a data processor, using a formula that is bound to go wrong sooner or later.

He'll discuss it with the GIS consultant (Sander), and get back to us.

@PietrH
Member Author

PietrH commented May 28, 2024

Now using a GUID

@PietrH PietrH closed this as completed May 28, 2024