
Commit

Merge pull request #70 from nasaharvest/clean-generate
Version 0.1.0
ivanzvonkov authored Jul 19, 2022
2 parents 39f3a33 + 709bdad commit 37be210
Showing 72 changed files with 867 additions and 4,367 deletions.
4 changes: 1 addition & 3 deletions .github/workflows/buildings-example-test.yaml
@@ -30,9 +30,7 @@ jobs:
# https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
run: |
dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
dvc pull $(openmapflow datapath DATASETS) -f
dvc pull $(openmapflow datapath MODELS) -f
- name: Integration test - Project
4 changes: 1 addition & 3 deletions .github/workflows/crop-mask-example-test.yaml
@@ -30,9 +30,7 @@ jobs:
# https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
run: |
dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
dvc pull $(openmapflow datapath DATASETS) -f
dvc pull $(openmapflow datapath MODELS) -f
- name: Integration test - Project
4 changes: 1 addition & 3 deletions .github/workflows/maize-example-test.yaml
@@ -30,9 +30,7 @@ jobs:
# https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
run: |
dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
dvc pull $(openmapflow datapath DATASETS) -f
dvc pull $(openmapflow datapath MODELS) -f
- name: Integration test - Project
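The same two-line change is applied to all three example workflows. As a minimal sketch of the resulting `run` step (taken directly from the lines kept in the hunks above), the data pull reduces to two DVC targets and the tar extraction disappears:

```bash
# Post-change CI data pull: only the DVC-tracked datasets and models are
# fetched; the compressed-features archive and its extraction step are gone.
dvc pull $(openmapflow datapath DATASETS) -f
dvc pull $(openmapflow datapath MODELS) -f
```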
26 changes: 12 additions & 14 deletions README.md
@@ -88,19 +88,22 @@ After all configuration is set, the following project structure will be generated
└─── data
│ raw_labels/ # User added labels
│ processed_labels/ # Labels standardized to common format
│ features/ # Labels combined with satellite data
│ compressed_features.tar.gz # Allows faster features downloads
│ models/ # Models trained using features
│ datasets/ # ML ready datasets (labels + earth observation data)
│ models/ # Models trained using datasets
| raw_labels.dvc # Reference to a version of raw_labels/
| processed_labels.dvc # Reference to a version of processed_labels/
│ compressed_features.tar.gz.dvc # Reference to a version of features/
| datasets.dvc # Reference to a version of datasets/
│ models.dvc # Reference to a version of models/
```

This project contains all the code necessary for: Adding data ➞ Training a model ➞ Creating a map.

**Important:** When code is pushed to the repository, a GitHub Action runs to verify project configuration, data integrity, and script functionality. This action pulls data using dvc and therefore needs access to remote storage (your Google Drive). To allow the GitHub Action to access the data, add a new repository secret ([instructions](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository)).
- In step 5 of the instructions, name the secret: `GDRIVE_CREDENTIALS_DATA`
- In step 6, enter the value found in .dvc/tmp/gdrive-user-credentials.json (in your repository)

After this, the GitHub Action should run successfully.
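A brief sketch of where that secret value comes from and how the workflows consume it, assuming a DVC Google Drive remote is already configured for the project (the credential path is DVC's default cache location):

```bash
# The first dvc pull/push against a Google Drive remote opens a browser OAuth
# flow and caches the resulting token locally:
dvc pull
cat .dvc/tmp/gdrive-user-credentials.json   # paste this JSON as the secret value

# In CI, the secret is handed to dvc through an environment variable
# (as in the workflow hunks above), so no browser is needed:
#   env:
#     GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
```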


## Adding data [![cb]](https://colab.research.google.com/github/nasaharvest/openmapflow/blob/main/openmapflow/notebooks/new_data.ipynb)

@@ -134,25 +137,20 @@ datasets = [
...
]
```
Run feature creation:
Run dataset creation:
```bash
earthengine authenticate # For getting new earth observation data
gcloud auth login # For getting cached earth observation data

openmapflow create-features # Initiates or checks progress of feature creation
openmapflow create-dataset # Initiates or checks progress of dataset creation
openmapflow datasets # Shows the status of datasets

dvc commit && dvc push # Push new data to data version control

git add .
git commit -m'Created new features'
git commit -m'Created new dataset'
git push
```
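One point the command list glosses over is why both a dvc push and a git push are needed: dvc uploads the data itself to the remote, while git versions only the small `.dvc` pointer files. A minimal sketch of that split, reusing the commands above with the pointer path assumed from the project structure shown earlier:

```bash
# The data itself goes to the DVC remote...
dvc commit && dvc push
# ...while git tracks only the refreshed pointer files (e.g. data/datasets.dvc),
# so the repository stays small and each commit still pins a data version.
git add . && git commit -m'Created new dataset' && git push
```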
**Important:** When new data is pushed to the repository, a GitHub Action runs to verify data integrity. This action pulls data using dvc and therefore needs access to remote storage (your Google Drive). To allow the GitHub Action to access the data, add a new repository secret ([instructions](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository)).
- In step 5 of the instructions, name the secret: `GDRIVE_CREDENTIALS_DATA`
- In step 6, enter the value found in .dvc/tmp/gdrive-user-credentials.json (in your repository)

After this, the GitHub Action should run successfully if the data is valid.


## Training a model [![cb]](https://colab.research.google.com/github/nasaharvest/openmapflow/blob/main/openmapflow/notebooks/train.ipynb)
4 changes: 1 addition & 3 deletions buildings-example/data/.gitignore
@@ -1,5 +1,3 @@
/datasets
/raw_labels
/processed_labels
/compressed_features.tar.gz
/models
/features
4 changes: 0 additions & 4 deletions buildings-example/data/compressed_features.tar.gz.dvc

This file was deleted.

5 changes: 5 additions & 0 deletions buildings-example/data/datasets.dvc
@@ -0,0 +1,5 @@
outs:
- md5: db853058c80b597bb44bfc0ecf37866f.dir
  size: 121467360
  nfiles: 2
  path: datasets
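The pointer file above records only a directory hash and summary metadata (~121 MB across 2 files); the data itself lives in the project's DVC remote. A small sketch, assuming the command is run from inside the example project with that remote configured:

```bash
# Restore data/datasets/ to exactly the version recorded by this pointer.
dvc pull data/datasets.dvc -f
```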
4 changes: 0 additions & 4 deletions buildings-example/data/duplicates.txt

This file was deleted.

133 changes: 0 additions & 133 deletions buildings-example/data/missing.txt

This file was deleted.

5 changes: 0 additions & 5 deletions buildings-example/data/processed_labels.dvc

This file was deleted.

@@ -2,18 +2,20 @@ DATASET REPORT (autogenerated, do not edit directly)

Uganda_buildings_2020 (Timesteps: 24)
----------------------------------------------------------------------------
eo_data_complete 8117
eo_data_duplicate 4
✔ training amount: 6445, positive class: 100.0%
✔ testing amount: 848, positive class: 100.0%
✔ validation amount: 824, positive class: 100.0%
✔ testing amount: 848, positive class: 100.0%



geowiki_landcover_2017 (Timesteps: 24)
----------------------------------------------------------------------------
eo_data_complete 13993
eo_data_export_failed 242
eo_data_missing_values 132
✔ training amount: 12582, positive class: 0.0%
✔ validation amount: 743, positive class: 0.0%
✔ testing amount: 668, positive class: 0.0%


All data:
✔ Found no empty features
✔ No duplicates found