
2013 07 23 species Checking

Joona Lehtomäki edited this page Aug 8, 2013 · 19 revisions

Species data

See issue12 for background.

Computer: cbig-arnold

Unzipping

(For filenaming inconsistencies, see issue14.)

  1. (Manually) downloaded the following species layers from P:\h573\bcig\Data\Global_Analyses\Species\:
  • Amphibians:
    IUCN_AmphibiansCBIGClassification_r15o
    IUCN_AmphibiansCBIGClassification_r16
    FIXME: should the latter be IUCN_AmphibiansCBIGClassification_r16o?
  • Birds:
    IUCN_BirdsCBIGClassification_r15o
    IUCN_BirdsCBIGClassification_r16o
  • Corals:
    IUCN_CoralsCBIGClassification_r15o
    IUCN_CoralsCBIGClassification_r16o
  • Mammals:
    IUCN_MammalsCBIGClassification_r15
    IUCN_MammalsCBIGClassification_r16
    FIXME: should these be IUCN_MammalsCBIGClassification_r15o and IUCN_MammalsCBIGClassification_r16o?
  • Mangroves:
    IUCN_MangrovesCBIGClassification_r15o
    IUCN_MangrovesCBIGClassification_r16o
  • MarineFish:
    IUCN_MarinefishCBIGClassification_r15o
    IUCN_MarinefishCBIGClassification_r16o
  • Reptiles:
    IUCN_ReptilesCBIGClassification_r15o
    IUCN_ReptilesCBIGClassification_r16o
  • Seagrasses:
    IUCN_SeagrassCBIGClassification_r15o
    IUCN_SeagrassCBIGClassification_r16o
  • Seasnakes:
    IUCN_SeasnakesOrigClassification_r15o
    IUCN_SeasnakesCBIGClassification_r16o
    FIXME: should the former be IUCN_SeasnakesCBIGClassification_r15o?
  2. Based on the unzipping benchmarks, Python's zipfile module appears to be the fastest decompression method (at least on cbig-arnold). This might be because the zip files were themselves created with the zipfile module (unconfirmed).
  3. Created a new tool, punzip.py, as part of the new gedatools repo.
  4. Extracting all species zips (R15 and R16, n=18) takes ~24 min on cbig-arnold. Size after extraction is 114.2 GB.
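
For reference, batch extraction with Python's zipfile module can be sketched roughly as below. This is not the punzip.py source; the function name `extract_all` and the one-subfolder-per-archive layout are assumptions made for illustration only.

```python
import time
import zipfile
from pathlib import Path


def extract_all(zip_dir, out_dir):
    """Extract every .zip found in zip_dir into out_dir.

    Hypothetical helper (not part of punzip.py). Returns a tuple of
    (number of archives extracted, elapsed seconds).
    """
    zip_dir, out_dir = Path(zip_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    start = time.perf_counter()
    count = 0
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Assumption: each archive is unpacked into a subfolder
            # named after the zip file (without the .zip extension).
            zf.extractall(out_dir / archive.stem)
        count += 1
    return count, time.perf_counter() - start
```

Timing the whole loop with `time.perf_counter()` gives a wall-clock figure comparable to the ~24 min reported above, although the actual benchmark setup may have differed.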

Existing file comparison between R15 and R16

  1. Modified the fcompare.py tool to match species names exactly within filenames of different resolutions.
  2. Running fcompare for different species groups produces the following results:
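
As a rough illustration of the name matching described in step 1, here is a minimal sketch. It is not the fcompare.py source: the regular expression, the assumed filename shape `IUCN_<species>_r<NN>[o].tif`, and the names `species_name` and `compare` are all assumptions.

```python
import re
from pathlib import Path

# Assumed filename shape: IUCN_<species>_r<NN>[o].tif
# The resolution tag (e.g. _r15o, _r16) is stripped before comparing.
_NAME = re.compile(r"^IUCN_(?P<species>.+)_r\d+o?$")


def species_name(filename):
    """Return the species part of a filename, or None if it does not fit."""
    m = _NAME.match(Path(filename).stem)
    return m.group("species") if m else None


def compare(files1, files2):
    """Split files1 into (matched, unmatched) by exact species name in files2."""
    names2 = {species_name(f) for f in files2} - {None}
    matched = [f for f in files1 if species_name(f) in names2]
    # Files whose name does not fit the pattern yield None and end up here.
    unmatched = [f for f in files1 if species_name(f) not in names2]
    return matched, unmatched
```

Because the match is exact, a species whose name is spelled differently in the two folders (different case or spacing) is reported as unmatched even though it is present in both.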

Amphibians

2013-07-23

414 species missing in R16, listed in unmatched_amphibians.csv. See issue15.

./fcompare.py --extension=.tif -o /srv/gedawiki/unmatched_amphibians.csv /media/DATAPART1/GlobalAnalyses/Species/Amphibians/IUCN_AmphibiansCBIGClassification_r15o /media/DATAPART1/GlobalAnalyses/Species/Amphibians/IUCN_Amphibians_r16/  
Number of <.tif> files in folder F1: 6265  
Matches using feature name:  
 match: 5851  
 no match: 414

2013-08-08

All species (+ 3 extra species!) present in R16.

./fcompare.py --extension=.tif -o /srv/gedawiki/2013-08-08-unmatched_amphibians.csv /mnt/biosci-cbig/Data/GlobalData/7_Species/IUCN_Red_list/Amphibians/FINALS/IUCN_AmphibiansCBIGClassification_r15o /mnt/biosci-cbig/Data/GlobalData/7_Species/IUCN_Red_list/Amphibians/FINALS/IUCN_AmphibiansCBIGClassification_r16o/
WARNING: number of files in folders differs (6265 and 6268)
**************************************************
Number of <.tif> files in folder F1: 6265
Matches using feature name:
 match: 6265
 no match: 0

Birds

2013-07-23

All species present in R16.

./fcompare.py --extension=.tif -v -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_birds.csv /media/DATAPART1/GlobalAnalyses/Species/Birds/IUCN_BirdsCBIGClassification_r15o /media/DATAPART1/GlobalAnalyses/Species/Birds/IUCN_BirdsCBIGClassification_r16o
 
Number of <.tif> files in folder F1: 10246
Matches using feature name:
 match: 10246
 no match: 0

Corals

213 species missing in R16, listed in unmatched_corals.csv. See issue16.

./fcompare.py --extension=.tif -v -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_corals.csv /media/DATAPART1/GlobalAnalyses/Species/Corals/IUCN_CoralsCBIGClassification_r15o /media/DATAPART1/GlobalAnalyses/Species/Corals/IUCN_CoralsCBIGClassification_r16  
Number of <.tif> files in folder F1: 843
Matches using feature name:
 match: 628
 no match: 213

Mammals

All species present in R16.

./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_mammals.csv /media/DATAPART1/GlobalAnalyses/Species/Mammals/IUCN_MammalsCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Mammals/IUCN_MammalsCBIGClassification_r16o/
Number of <.tif> files in folder F1: 5412
Matches using feature name:
 match: 5412
 no match: 0

Mangroves

All species present in R16.

./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_mangroves.csv /media/DATAPART1/GlobalAnalyses/Species/Mangroves/IUCN_MangrovesCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Mangroves/IUCN_MangrovesCBIGClassification_r16/
Number of <.tif> files in folder F1: 67
Matches using feature name:
 match: 67
 no match: 0

MarineFish

All species present in R16.

./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_marinefish.csv /media/DATAPART1/GlobalAnalyses/Species/MarineFish/IUCN_MarinefishCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/MarineFish/IUCN_MarinefishCBIGClassification_r16/
Number of <.tif> files in folder F1: 1133
Matches using feature name:
 match: 1133
 no match: 0

Reptiles

2013-07-23

1 species missing in R16, listed in unmatched_reptiles.csv. See issue17.

./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_reptiles.csv /media/DATAPART1/GlobalAnalyses/Species/Reptiles/IUCN_ReptilesCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Reptiles/IUCN_ReptilesCBIGClassification_r16oNEW/
Number of <.tif> files in folder F1: 3086
Matches using feature name:
 match: 3084
 no match: 0

2 files are unmatched because their filenames do not follow the expected pattern: IUCN_Aipysurus tenuis_r15o.tif and IUCN_PhelsumaVNigra_r15o.tif. The former is actually missing from R16; the latter is present in R16 but has an irregular filename.

2013-08-08

All species present in R16.

Seagrasses

All species present in R16.

./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_seagrasses.csv /media/DATAPART1/GlobalAnalyses/Species/Seagrasses/IUCN_SeagrassCBIGClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Seagrasses/IUCN_SeagrassCBIGClassification_r16/
Number of <.tif> files in folder F1: 72
Matches using feature name:
 match: 72
 no match: 0

SeaSnakes

Filename patterns differ between R15 (IUCN_Acalyptophis peronii_r15o.tif) and R16 (IUCN_AcalyptophisPeronii_r16o.tif), and fcompare does not yet have heuristics to deal with this. R16 also seems to have fewer species (106) than R15 (132). See issue18.

./fcompare.py --extension=.tif -o /home/jlehtoma/Dev/src/gedawiki/missing_R16/unmatched_seasnakes.csv /media/DATAPART1/GlobalAnalyses/Species/Seasnakes/IUCN_SeasnakesOrigClassification_r15o/ /media/DATAPART1/GlobalAnalyses/Species/Seasnakes/IUCN_SeasnakesCBIGClassification_r16/
Number of <.tif> files in folder F1: 106
Matches using feature name:
 match: 0
 no match: 106
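
One possible heuristic for the sea snake naming mismatch is to reduce both filename styles to a common key before comparing. The sketch below is only a suggestion and is not implemented in fcompare; the function name `normalized_key` and the choice to drop spaces, underscores and hyphens and to ignore case are assumptions.

```python
import re
from pathlib import Path


def normalized_key(filename):
    """Reduce a species filename to a case- and separator-insensitive key.

    Hypothetical helper: strips the "IUCN_" prefix and the resolution
    tag, then removes separators and lowercases what is left.
    """
    stem = Path(filename).stem
    stem = re.sub(r"^IUCN_", "", stem)    # drop the prefix
    stem = re.sub(r"_r\d+o?$", "", stem)  # drop e.g. "_r15o" or "_r16"
    return re.sub(r"[\s_\-]+", "", stem).lower()
```

With this key, IUCN_Acalyptophis peronii_r15o.tif and IUCN_AcalyptophisPeronii_r16o.tif compare as equal. The trade-off is that distinct names differing only in case or separators would also collapse to the same key, so the results would need a manual check.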