2013-07-23 Species checking
See issue12 for background.
Computer: cbig-arnold
(For filename inconsistencies, see issue14.)
- (Manually) downloaded the following species layers from P:\h573\bcig\Data\Global_Analyses\Species\:
- Amphibians:
IUCN_AmphibiansCBIGClassification_r15o
IUCN_AmphibiansCBIGClassification_r16
FIXME: should the latter be IUCN_AmphibiansCBIGClassification_r16o?
- Birds:
IUCN_BirdsCBIGClassification_r15o
IUCN_BirdsCBIGClassification_r16o
- Corals:
IUCN_CoralsCBIGClassification_r15o
IUCN_CoralsCBIGClassification_r16o
- Mammals:
IUCN_MammalsCBIGClassification_r15
IUCN_MammalsCBIGClassification_r16
FIXME: should these be IUCN_MammalsCBIGClassification_r15o and IUCN_MammalsCBIGClassification_r16o?
- Mangroves:
IUCN_MangrovesCBIGClassification_r15o
IUCN_MangrovesCBIGClassification_r16o
- MarineFish:
IUCN_MarinefishCBIGClassification_r15o
IUCN_MarinefishCBIGClassification_r16o
- Reptiles:
IUCN_ReptilesCBIGClassification_r15o
IUCN_ReptilesCBIGClassification_r16o
- Seagrasses:
IUCN_SeagrassCBIGClassification_r15o
IUCN_SeagrassCBIGClassification_r16o
- Seasnakes:
IUCN_SeasnakesOrigClassification_r15o
IUCN_SeasnakesCBIGClassification_r16o
FIXME: should the former be IUCN_SeasnakesCBIGClassification_r15o?
- Based on the unzipping benchmarks, Python's zipfile module appears to be the fastest decompression method (at least on cbig-arnold). This might be because the zip files were also created with the zipfile module (unconfirmed).
- Created a new tool, punzip.py, as part of the new gedatools repo.
- Extracting all species zips (R15 and R16, n=18) takes ~24 min on cbig-arnold. The extracted data take up 114.2 GB.
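The extraction step could look roughly like the following. This is a minimal sketch, not punzip.py itself: it assumes each archive is extracted into its own subfolder named after the archive, and it uses a thread pool on the assumption that zlib decompression and file I/O release the GIL. The function names and default worker count are hypothetical.

```python
"""Minimal sketch of parallel zip extraction with Python's zipfile module.

Hypothetical: this is not punzip.py. Assumes every *.zip in a source folder is
extracted into its own subfolder named after the archive.
"""
import sys
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_one(zip_path: Path, out_root: Path) -> Path:
    """Extract a single archive into out_root/<archive stem>/ and return that folder."""
    target = out_root / zip_path.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return target


def extract_all(src_dir: str, out_root: str, workers: int = 4) -> list[Path]:
    """Extract every *.zip in src_dir concurrently, one archive per worker thread."""
    zips = sorted(Path(src_dir).glob("*.zip"))
    out = Path(out_root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order, so results line up with `zips`.
        return list(pool.map(lambda z: extract_one(z, out), zips))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python punzip_sketch.py <folder with zips> <output folder>
    for folder in extract_all(sys.argv[1], sys.argv[2]):
        print("extracted:", folder)
```

Parallelising per archive rather than per member keeps the sketch simple; with 18 archives of very different sizes, the largest archive will likely dominate the total wall-clock time.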
- Modified fcompare.py to match the species names embedded in filenames exactly across different resolutions.
- Running fcompare for the different species groups produced the following results:
Amphibians (2013-07-23): 414 species missing in R16, listed in unmatched_amphibians.csv. See issue15.
./fcompare.py --extension=.tif -o /srv/gedawiki/unmatched_amphibians.csv /media/DATAPART1/GlobalAnalyses/Species/Amphibians/IUCN_AmphibiansCBIGClassification_r15o /media/DATAPART1/GlobalAnalyses/Species/Amphibians/IUCN_Amphibians_r16/
Number of <.tif> files in folder F1: 6265
Matches using feature name:
match: 5851
no match: 414
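A minimal sketch of the feature-name matching described above, assuming filenames of the form IUCN_<species name>_r<NN>[o].tif. The regular expression and function names are hypothetical and not taken from fcompare.py; unlike fcompare, this sketch silently skips filenames that do not fit the pattern.

```python
"""Minimal sketch of matching species names across two resolution folders.

Hypothetical: not the actual fcompare.py. Assumes filenames of the form
IUCN_<species name>_r<NN>[o].tif, e.g. IUCN_Acalyptophis peronii_r15o.tif.
"""
import re
from pathlib import Path
from typing import Optional, Set, Tuple

# The species name is everything between "IUCN_" and the "_r15o"/"_r16" suffix.
NAME_PATTERN = re.compile(r"^IUCN_(?P<name>.+)_r\d+o?$")


def feature_name(path: Path) -> Optional[str]:
    """Return the species name embedded in a filename, or None if it does not fit."""
    m = NAME_PATTERN.match(path.stem)
    return m.group("name") if m else None


def names_in(folder: str, extension: str) -> Set[str]:
    """Collect the species names of all files with the given extension."""
    names = {feature_name(p) for p in Path(folder).glob(f"*{extension}")}
    names.discard(None)  # non-pattern filenames are skipped in this sketch
    return names


def compare_folders(f1: str, f2: str, extension: str = ".tif") -> Tuple[Set[str], Set[str]]:
    """Return (matched, unmatched): names in F1 that are / are not found in F2."""
    names1 = names_in(f1, extension)
    names2 = names_in(f2, extension)
    return names1 & names2, names1 - names2
```

Calling compare_folders(r15_dir, r16_dir) returns two sets whose sizes play the role of the "match" and "no match" counts in the output above.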
Amphibians (2013-08-08): all species (plus 3 extra species) present in R16.
./fcompare.py --extension=.tif -o /srv/gedawiki/2013-08-08-unmatched_amphibians.csv /mnt/biosci-cbig/Data/GlobalData/7_Species/IUCN_Red_list/Amphibians/FINALS/IUCN_AmphibiansCBIGClassification_r15o /mnt/biosci-cbig/Data/GlobalData/7_Species/IUCN_Red_list/Amphibians/FINALS/IUCN_AmphibiansCBIGClassification_r16o/
WARNING: number of files in folders differs (6265 and 6268)
**************************************************
Number of <.tif> files in folder F1: 6265
Matches using feature name:
match: 6265
no match: 0
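The 2013-08-08 run reports every R15 species as matched, while the file counts differ by 6268 - 6265 = 3; the three extra R16 species appear to be inferred from that difference rather than listed by name. A hypothetical two-way comparison that names them explicitly (the function name and result keys are illustrative, not fcompare's):

```python
"""Minimal sketch: compare two sets of species names in both directions.

Hypothetical helper, not part of fcompare.py.
"""
from typing import Dict, Set


def diff_both_ways(names_f1: Set[str], names_f2: Set[str]) -> Dict[str, Set[str]]:
    """Return species missing from F2 and species present only in F2."""
    if len(names_f1) != len(names_f2):
        # Similar in spirit to fcompare's count warning; the wording here is ours.
        print(f"note: name counts differ ({len(names_f1)} and {len(names_f2)})")
    return {
        "missing_in_f2": names_f1 - names_f2,
        "only_in_f2": names_f2 - names_f1,
    }


if __name__ == "__main__":
    # Names chosen for illustration only, not read from the IUCN layers.
    r15 = {"Bufo bufo", "Rana arvalis"}
    r16 = {"Bufo bufo", "Rana arvalis", "Hyla arborea"}
    result = diff_both_ways(r15, r16)
    print(result["missing_in_f2"])  # set()
    print(result["only_in_f2"])     # {'Hyla arborea'}
```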
Birds (2013-07-23): all species present in R16.
./fcompare.py --extension=.tif -v -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_birds.csv /media/DATAPART1/GlobalAnalyses/Species/Birds/IUCN_BirdsCBIGClassification_r15o /media/DATAPART1/GlobalAnalyses/Species/Birds/IUCN_BirdsCBIGClassification_r16o
Number of <.tif> files in folder F1: 10246
Matches using feature name:
match: 10246
no match: 0
Corals: 213 species missing in R16, listed in unmatched_corals.csv. See issue16.
./fcompare.py --extension=.tif -v -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_corals.csv /media/DATAPART1/GlobalAnalyses/Species/Corals/IUCN_CoralsCBIGClassification_r15o /media/DATAPART1/GlobalAnalyses/Species/Corals/IUCN_CoralsCBIGClassification_r16
Number of <.tif> files in folder F1: 843
Matches using feature name:
match: 628
no match: 213
Mammals: all species present in R16.
./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_mammals.csv /media/DATAPART1/GlobalAnalyses/Species/Mammals/IUCN_MammalsCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Mammals/IUCN_MammalsCBIGClassification_r16o/
Number of <.tif> files in folder F1: 5412
Matches using feature name:
match: 5412
no match: 0
Mangroves: all species present in R16.
./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_mangroves.csv /media/DATAPART1/GlobalAnalyses/Species/Mangroves/IUCN_MangrovesCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Mangroves/IUCN_MangrovesCBIGClassification_r16/
Number of <.tif> files in folder F1: 67
Matches using feature name:
match: 67
no match: 0
Marine fish: all species present in R16.
./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_marinefish.csv /media/DATAPART1/GlobalAnalyses/Species/MarineFish/IUCN_MarinefishCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/MarineFish/IUCN_MarinefishCBIGClassification_r16/
Number of <.tif> files in folder F1: 1133
Matches using feature name:
match: 1133
no match: 0
Reptiles (2013-07-23): 1 species missing in R16, listed in unmatched_reptiles.csv. See issue17.
./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_reptiles.csv /media/DATAPART1/GlobalAnalyses/Species/Reptiles/IUCN_ReptilesCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Reptiles/IUCN_ReptilesCBIGClassification_r16oNEW/
Number of <.tif> files in folder F1: 3086
Matches using feature name:
match: 3084
no match: 0
Two files are unmatched because their filenames do not follow the pattern: IUCN_Aipysurus tenuis_r15o.tif and IUCN_PhelsumaVNigra_r15o.tif. The former is genuinely missing from R16; the latter is present in R16, but its filename is irregular.
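Non-pattern filenames like the two above can be flagged before matching. The sketch below assumes, without confirmation from fcompare.py, that regular reptile filenames are written as IUCN_<Genus><Epithet>_r<NN>[o].tif, with each word capitalised and no spaces; EXPECTED and non_pattern are hypothetical names.

```python
"""Minimal sketch: flag filenames that do not follow the expected pattern.

Assumption (not confirmed): regular filenames look like
IUCN_<Genus><Epithet>_r<NN>[o].tif, e.g. IUCN_AcalyptophisPeronii_r16o.tif.
"""
import re
from typing import List

# Exactly two capitalised words, no spaces, then the resolution suffix.
EXPECTED = re.compile(r"^IUCN_[A-Z][a-z]+[A-Z][a-z]+_r\d+o?\.tif$")


def non_pattern(filenames: List[str]) -> List[str]:
    """Return the filenames that do not fit EXPECTED, in sorted order."""
    return sorted(f for f in filenames if not EXPECTED.match(f))


if __name__ == "__main__":
    files = [
        "IUCN_Aipysurus tenuis_r15o.tif",     # space instead of CamelCase
        "IUCN_PhelsumaVNigra_r15o.tif",       # extra capitalised segment
        "IUCN_AcalyptophisPeronii_r16o.tif",  # fits the assumed pattern
    ]
    print(non_pattern(files))
```

Reporting these files separately, instead of counting them as matches or misses, keeps the "match" and "no match" totals honest.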
Reptiles (2013-08-08): all species present in R16.
Seagrasses: all species present in R16.
./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_seagrasses.csv /media/DATAPART1/GlobalAnalyses/Species/Seagrasses/IUCN_SeagrassCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Seagrasses/IUCN_SeagrassCBIGClassification_r16/
Number of <.tif> files in folder F1: 72
Matches using feature name:
match: 72
no match: 0
Seasnakes: filename patterns differ between R15 (IUCN_Acalyptophis peronii_r15o.tif) and R16 (IUCN_AcalyptophisPeronii_r16o.tif), and fcompare does not yet have heuristics to handle this. R16 also seems to have fewer species (106) than R15 (132). See issue18.
./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_seasnakes.csv /media/DATAPART1/GlobalAnalyses/Species/Seasnakes/IUCN_SeasnakesOrigClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Seasnakes/IUCN_SeasnakesCBIGClassification_r16/
Number of <.tif> files in folder F1: 106
Matches using feature name:
match: 0
no match: 106
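One possible heuristic for the seasnake mismatch, which fcompare does not currently implement, is to compare a normalised key: strip spaces, hyphens and underscores from the species name and lowercase it, so that both naming styles collapse to the same string. The function names and the suffix pattern below are hypothetical.

```python
"""Minimal sketch of a name-normalisation heuristic for R15/R16 matching.

Hypothetical, not part of fcompare: makes "Acalyptophis peronii" (R15) and
"AcalyptophisPeronii" (R16) compare equal.
"""
import re
from typing import Dict, List, Optional

SUFFIX = re.compile(r"^IUCN_(?P<name>.+)_r\d+o?\.tif$")


def match_key(filename: str) -> Optional[str]:
    """Case-insensitive key with spaces, hyphens and underscores removed."""
    m = SUFFIX.match(filename)
    if m is None:
        return None
    return re.sub(r"[\s_\-]", "", m.group("name")).lower()


def match_across(r15: List[str], r16: List[str]) -> Dict[str, Optional[str]]:
    """Map each R15 filename to its R16 counterpart (None if not found)."""
    by_key: Dict[str, str] = {}
    for f in r16:
        key = match_key(f)
        if key is not None:  # never index non-pattern files
            by_key[key] = f
    result: Dict[str, Optional[str]] = {}
    for f in r15:
        key = match_key(f)
        result[f] = by_key.get(key) if key is not None else None
    return result
```

For Latin binomials, dropping case and separators should rarely merge two distinct species, but a collision would silently pair the wrong files, so matches found this way are worth spot-checking.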