As mentioned in our meeting, I discovered that some image URLs are being deleted from the GBIF database.
On Baskerville, in the folder `/bask/projects/v/vjgo8416-amber/data/gbif_download_standalone/dwca_files/`, you will find two dwca files for Sesiidae: one downloaded in August 2023 and an updated one downloaded in October 2023.
Those files are also uploaded here on our sharedrive, for those without Baskerville access:
Also attached is a CSV file for one of the UK species, "Pyropteron chrysidiformis", for which I noticed (by chance) that images are no longer downloaded if I point to the October dwca file instead of the August one. I presume a similar issue may have occurred with other species too.
@KatrionaGoldmann the result can be easily reproduced by using the `03_download_images/fetch_images_whole_dwca_wrapper.ipynb` notebook, and changing the `dwca_dir` argument to point to the folder containing extracted files from either the October or the August Sesiidae dwca file. The results will show that when pointing to the August file we get some images downloaded, but not when pointing to the October file.
This cannot be an issue with the URLs being broken, because the URLs in the August dwca file still work. So either the URL entries themselves have been deleted from the October file, or the whole occurrence records have been deleted, URLs included.
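To tell those two cases apart, something like the sketch below could be run on the two extracted archives. It is only an illustration, not part of the existing notebook. It assumes each extracted folder contains tab-separated `multimedia.txt` and `occurrence.txt` files, that both have a `gbifID` column, and that the image URL is in the `identifier` column of `multimedia.txt`. The function names are made up for this example.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Yield the rows of a tab-separated dwca data file as dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def media_pairs(dwca_dir):
    """Set of (gbifID, image URL) pairs; column names are assumptions."""
    return {
        (row["gbifID"], row["identifier"])
        for row in read_rows(Path(dwca_dir) / "multimedia.txt")
        if row.get("identifier")
    }


def occurrence_ids(dwca_dir):
    """Set of gbifID values present in occurrence.txt (assumed file name)."""
    return {row["gbifID"] for row in read_rows(Path(dwca_dir) / "occurrence.txt")}


def compare_archives(old_dir, new_dir):
    """Split the URLs missing from new_dir by what happened to their record."""
    dropped = media_pairs(old_dir) - media_pairs(new_dir)
    surviving = occurrence_ids(new_dir)
    # The whole occurrence record is gone from the newer archive.
    record_deleted = sorted(p for p in dropped if p[0] not in surviving)
    # The record is still there, but this URL is no longer listed for it.
    url_deleted = sorted(p for p in dropped if p[0] in surviving)
    return record_deleted, url_deleted
```

Calling `compare_archives(august_dir, october_dir)`, with the two arguments being the same extracted folders that `dwca_dir` is pointed at, returns two lists of `(gbifID, URL)` pairs: URLs whose occurrence record is gone, and URLs dropped from records that still exist.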
uksi-moths-keys-nodup-small-Pyropteron-chrysidiformis.csv