Init/runoff #156

Open · wants to merge 773 commits into master
Conversation

@tommylees112 (Contributor) commented Mar 4, 2020

NOTE:

  • EarlyStopping is currently not working because I haven't created a train/validation split yet

Create (x, y) samples dynamically from data loaded into memory

Sorry, this is a huge PR: we have basically re-written the Engineer / DataLoaders / Models to work with data loaded into memory. This is better for hard-disk-constrained modelling problems where seq_length is large (e.g. 365 daily timesteps as input to the LSTM models).

Use the Pipeline for working with runoff data.

  • data is 2D (station_id, time) instead of 3D (time, lat, lon)
  • data is on a finer timestep than monthly (daily)
  • create a DynamicEngineer
  • create a DynamicDataLoader
  • update the EALSTM / neural network models to work with the DynamicDataLoader
  • new model arguments: seq_length, target_var, forecast_horizon (see the sketch below)
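To make the in-memory approach concrete, here is a minimal sketch of how a single (x, y) sample can be cut out of an xarray Dataset held in memory using seq_length, target_var and forecast_horizon. The function name and signature are illustrative only, not the actual DynamicDataLoader API:

```python
import xarray as xr


def make_xy_sample(ds: xr.Dataset, target_var: str, target_time_idx: int,
                   seq_length: int = 365, forecast_horizon: int = 1):
    """Cut a single (x, y) sample out of a Dataset held in memory.

    x = the `seq_length` timesteps ending `forecast_horizon` steps before
    the target time; y = the target variable at the target time.
    """
    x_end = target_time_idx - forecast_horizon + 1
    x_start = x_end - seq_length
    if x_start < 0:
        return None  # not enough history for this target timestep

    x = ds.isel(time=slice(x_start, x_end))        # (variable, seq_length, spatial unit)
    y = ds[target_var].isel(time=target_time_idx)  # (spatial unit,)
    return x.to_array().values, y.values
```

With forecast_horizon=1 the input window ends on the timestep immediately before the target, i.e. a one-timestep-ahead forecast.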

We have created an experiment file for running the OneTimestepForecast runoff modelling:
scripts/experiments/18_runoff_init.py

Analysis updates

We have added some updates to the analysis code:

  • overview: update all rmse/r2 functions to calculate spatial scores (a score for each spatial unit) and temporal scores (a time series of scores) (see the sketch below)
  • add more catching of the inversion problem (it turns out it occurs when the order of lat, lon is reversed to lon, lat)
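For reference, a rough sketch of one way to read the spatial vs. temporal distinction; the actual analysis functions may differ, and the names and signatures here are illustrative:

```python
import numpy as np
import xarray as xr


def spatial_rmse(pred: xr.DataArray, obs: xr.DataArray) -> xr.DataArray:
    # spatial scores: one RMSE per spatial unit (e.g. per station_id),
    # collapsing over the time dimension
    return np.sqrt(((pred - obs) ** 2).mean(dim="time"))


def temporal_rmse(pred: xr.DataArray, obs: xr.DataArray,
                  spatial_dims=("station_id",)) -> xr.DataArray:
    # temporal scores: a time series of RMSE values, collapsing over the
    # spatial dimension(s) at each timestep
    return np.sqrt(((pred - obs) ** 2).mean(dim=list(spatial_dims)))
```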

Engineer updates

  • Create a new engineer, OneTimestepForecast (src/engineer/one_timestep_forecast.py)
  • Create a new DynamicEngineer for use with the DynamicDataLoader
    NOTE: do we want this, or do we ideally want to generalise the one_month_forecast?
  • The major difference is collapsing things not by (lat, lon) but by dimension_name = [c for c in static_ds.coords][0] (see the sketch below)
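As a sketch of what that means in practice (the helper name below is hypothetical), the spatial dimension is read from the data rather than hard-coded as lat/lon:

```python
import xarray as xr


def get_spatial_dim_name(static_ds: xr.Dataset) -> str:
    # hypothetical helper: instead of hard-coding (lat, lon), take the first
    # coordinate of the static dataset as the spatial dimension
    return [c for c in static_ds.coords][0]


# e.g. for runoff data this returns "station_id", and the engineer can
# collapse / index over that dimension instead of a (lat, lon) grid:
# dimension_name = get_spatial_dim_name(static_ds)
# n_spatial_units = static_ds[dimension_name].size
```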

DataLoader Updates

  • self.get_reducing_dims gets the spatial dimensions (lat/lon, area, station_id, or whatever is not time!)
  • aggregations collapse over these reducing dimensions:
    global_mean = x.mean(dim=reducing_dims)
  • build_loc_to_idx_mapping builds a dictionary so we can track which id relates to which spatial unit
  • Various checks like if len(static_np.shape) == 3: to account for 2D spatial information (time, lat, lon) vs. 1D spatial information (time, station_id) (see the sketch below)
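A minimal sketch of the two helpers described above, assuming xarray inputs (the real implementations in the PR may differ):

```python
import xarray as xr


def get_reducing_dims(ds: xr.Dataset) -> list:
    # the spatial dimensions are everything that is not time
    # (["lat", "lon"] for gridded data, ["station_id"] for runoff data)
    return [d for d in ds.dims if d != "time"]


def build_loc_to_idx_mapping(ds: xr.Dataset) -> dict:
    # map an integer index to each spatial unit so that flattened model
    # outputs can be traced back to a pixel / station
    reducing_dims = get_reducing_dims(ds)
    if len(reducing_dims) == 1:  # 1D spatial information, e.g. station_id
        return dict(enumerate(ds[reducing_dims[0]].values))
    # 2D spatial information: enumerate the (lat, lon) pairs
    stacked = ds.stack(loc=reducing_dims)
    return dict(enumerate(stacked["loc"].values))


# aggregations then collapse over the same dimensions:
# global_mean = ds.mean(dim=get_reducing_dims(ds))
```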

TODO:
# TODO: why so many static nones?

  • This is because the standard deviation of some of the values stored in the normalizing_dict becomes 0, so dividing by 0 gives np.nan (see the sketch below)
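As an illustration of the failure mode, and one possible guard (not necessarily what the PR does):

```python
import numpy as np


def normalize(values: np.ndarray, mean: float, std: float) -> np.ndarray:
    # if a variable is constant, its std in the normalizing_dict is 0 and
    # (values - mean) / std produces np.nan (0 / 0) or np.inf (x / 0)
    if std == 0 or np.isnan(std):
        return np.zeros_like(values, dtype=float)
    return (values - mean) / std
```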

Model updates

  • new seq_length / include_timestep_aggs arguments
  • use a dataloader to load the timesteps: for x, y in tqdm.tqdm(train_dataloader): (see the sketch below)
  • include_monthly_aggs -> include_timestep_aggs = spatial aggregation (a map of mean values for each pixel)
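A minimal sketch of the training loop over the dataloader, assuming a PyTorch-style model, optimizer and loss function:

```python
import tqdm


def train_one_epoch(model, optimizer, loss_fn, train_dataloader):
    # the dataloader yields one (x, y) batch of seq_length windows at a
    # time, so the full engineered dataset never has to be written to disk
    model.train()
    for x, y in tqdm.tqdm(train_dataloader):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```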

@tommylees112 added the wip (Work in progress - not ready for merging) label on Mar 4, 2020