What is the meaning of the "arbitrary indexing within dimension is not yet supported" warning? #94

Open
matteodefelice opened this issue Aug 22, 2019 · 6 comments

Comments

@matteodefelice

I apologise for the question, which may look naive, but from reading the code I couldn't work out what this warning means. Since I am going to use tidync heavily to analyse complex NetCDF files (15 dimensions, 27 grids), I would like to have everything under control.
I do something like:

filtered_results = results %>% 
      hyper_filter(n = n %in% selection)

And I get that warning. This does not happen if I use other operators, for example ==.

@mdsumner
Collaborator

Ah, sorry for the terrible wording! It means that we can't slice out arbitrary levels.

I.e. if we have an axis n:

n <- 1:10

n > 2 & n < 8  ## a valid subset: 3, 4, 5, 6, 7

n < 2 | n > 8  ## not valid: 1, 9, 10 (not a contiguous slice)

It has to be a contiguous range. Does that make sense?

(I see it's not in the documentation for hyper_filter)

It's definitely possible to do it, but ensuring efficiency could be pretty tough: either read the whole range and subset, or iterate over elements, without an obvious way to know which is better. This is related to the "chunking" (tiling in other contexts) concept in NetCDF, and to how seeking data in batches is handled.
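To make the contiguity rule concrete, here is a small base-R sketch. It is not tidync's code: `is_contiguous()` and `selection_runs()` are hypothetical helpers that test a logical selection for contiguity and split it into contiguous runs, which is roughly the choice between one large read and several small ones.

```r
# Hypothetical helpers for illustration; these are not part of tidync.

# TRUE when the selected positions form one unbroken run
# (or when nothing is selected).
is_contiguous <- function(sel) {
  idx <- which(sel)
  length(idx) == 0 || all(diff(idx) == 1)
}

# Split a logical selection into contiguous runs, each described by a
# 1-based start index and a count.
selection_runs <- function(sel) {
  r <- rle(sel)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  data.frame(start = starts[r$values], count = r$lengths[r$values])
}

n <- 1:10
is_contiguous(n > 2 & n < 8)   # one run: 3..7
is_contiguous(n < 2 | n > 8)   # two runs: 1 and 9..10
selection_runs(n < 2 | n > 8)  # one (start, count) pair per run
```

Reading each run separately avoids fetching unwanted values but costs one read per run; reading the bounding range costs one read plus an in-memory subset.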

@matteodefelice
Author

But apparently it works, so it is possible but not efficient, is that right?

@mdsumner
Collaborator

Oh right, I forgot it was not an error. It finds the range; the warning is that you will get everything in between, although you asked for specific slices.
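A small base-R illustration of what the warning is meant to signal, under the assumption that the filter is reduced to a single start/count pair spanning the first to the last match. The axis values and the `bounding_slice()` helper are made up for the example and are not tidync internals.

```r
# Hypothetical helper: collapse a logical selection to one start/count pair
# covering everything from the first match to the last match.
bounding_slice <- function(sel) {
  idx <- which(sel)
  list(start = min(idx), count = max(idx) - min(idx) + 1)
}

# Made-up axis values, for illustration only.
n_axis <- c("AT", "BE", "DE", "ES", "FI", "IT", "NL")
sel <- n_axis %in% c("AT", "IT")

s <- bounding_slice(sel)
returned <- n_axis[seq(s$start, length.out = s$count)]

returned                        # every value from the first to the last match
setdiff(returned, n_axis[sel])  # values that were not asked for
```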

@matteodefelice
Author

Ah...and now I realise that it is not getting everything but rather messing up the data :(
Look at my example:

results %>% activate(Location)

Let me show only the dimensions to avoid confusion:

Dimensions 15 (2 active): 
  
  dim   name  length   min   max start count  dmin  dmax unlim coord_dim 
  <chr> <chr>  <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <lgl> <lgl>     
1 D3    n         29    NA    NA     1    29    NA    NA FALSE TRUE      
2 D5    u        347    NA    NA     1   347    NA    NA FALSE TRUE   

So (sorry for the complicated example; trust me on that last filter):

> results %>% activate(Location) %>% hyper_tibble() %>% filter(Location == 1)
# A tibble: 347 x 3
   Location n     u                                  
      <int> <chr> <chr>                              
 1        1 ES    [0] - ES_Hydro reservoir           
 2        1 DE    [100] - DE_CCGT fleet CHP          
 3        1 DE    [101] - DE_CCGT fleet F class      
 4        1 DE    [102] - DE_CCGT fleet G_H class    
 5        1 DE    [103] - DE_CCGT fleet cogen        
 6        1 DE    [104] - DE_Coal fleet IGCC         
 7        1 DE    [105] - DE_Coal fleet Subcritical  
 8        1 DE    [106] - DE_Coal fleet Supercritical
 9        1 DE    [107] - DE_Oil fleet Standard      
10        1 DE    [108] - DE_Oil fleet Subcritical   
# … with 337 more rows

Then:

> subsetted = results %>% hyper_filter(n = n %in% c('AT', 'IT'))
Warning message:
In update_slices(.x) :
  arbitrary indexing within dimension is not yet supported

And then apparently it works:

# A tibble: 233 x 3
   Location n     u                                     
      <int> <chr> <chr>                                 
 1        1 AT    [103] - DE_CCGT fleet cogen           
 2        1 AT    [10] - ES_CCGT fleet G_H class        
 3        1 IT    [117] - FI_Hydro reservoir            
 4        1 AT    [125] - FI_CCGT fleet E class         
 5        1 IT    [132] - GB_Nuclear energy             
 6        1 AT    [140] - GB_CCGT fleet E class         
 7        1 IT    [148] - GR_Hydro RoR fleet            
 8        1 AT    [156] - GR_Lignite fleet Supercritical
 9        1 IT    [163] - HR_Waste fleet                
10        1 AT    [171] - HR_Oil fleet Subcritical      
# … with 223 more rows

As you can see there is something wrong, look:

rbind(
results   %>% activate(Location) %>% hyper_tibble() %>% filter(u == '[117] - FI_Hydro reservoir', Location == 1), 
subsetted %>% activate(Location) %>% hyper_tibble() %>% filter(u == '[117] - FI_Hydro reservoir', Location == 1))
# A tibble: 2 x 3
  Location n     u                         
     <int> <chr> <chr>                     
1        1 FI    [117] - FI_Hydro reservoir
2        1 IT    [117] - FI_Hydro reservoir
Warning messages:
1: In update_slices(.data) :
  arbitrary indexing within dimension is not yet supported
2: In update_slices(.x) :
  arbitrary indexing within dimension is not yet supported

Do you have a workaround in mind?
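One possible workaround, offered as an untested sketch rather than a confirmed fix: skip hyper_filter() for the non-contiguous selection, read the whole grid, and subset the resulting tibble in memory. It assumes the unfiltered hyper_tibble() output is labelled correctly, as the first tibble above suggests, and that the grid fits in memory. `read_then_filter()` is a hypothetical helper name.

```r
# Hypothetical helper; it needs the tidync package and a tidync object
# such as `results` from the example above.
read_then_filter <- function(x, keep) {
  # Read the full Location grid without any hyper_filter() step.
  full <- tidync::hyper_tibble(tidync::activate(x, Location))
  # Subset in memory, where a non-contiguous selection is not a problem.
  full[full$n %in% keep, , drop = FALSE]
}

# Usage, assuming `results` exists:
# subsetted <- read_then_filter(results, c("AT", "IT"))
```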

@mdsumner
Collaborator

Any chance you can share a file? I think it will be a struggle without a reprex

@matteodefelice
Author

You are right, I apologise. The NetCDF is here: https://send.firefox.com/download/e7f5bedfbc665528/#mhoAoncZZBWUSWdasVp7TA

The 'u' field holds the name of a power plant, and the name includes the country where the plant is located. So "FI_Hydro reservoir" should be in Finland (FI), not in Italy (IT) as it appears after the subsetting.
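The mismatch can be checked mechanically. A minimal sketch, assuming every 'u' label follows the "[id] - CC_name" pattern seen in the output above; `country_of()` is a hypothetical helper, and the rows are copied from the subsetted tibble shown earlier.

```r
# Hypothetical helper: pull the two-letter country code out of a label such
# as "[117] - FI_Hydro reservoir". Assumes the "[id] - CC_name" pattern.
country_of <- function(u) sub("^\\[[0-9]+\\] - ([A-Z]{2})_.*$", "\\1", u)

# A few rows copied from the subsetted output above.
subsetted_rows <- data.frame(
  n = c("AT", "AT", "IT", "AT"),
  u = c("[103] - DE_CCGT fleet cogen",
        "[10] - ES_CCGT fleet G_H class",
        "[117] - FI_Hydro reservoir",
        "[125] - FI_CCGT fleet E class"),
  stringsAsFactors = FALSE
)

# Rows where the country in the plant name disagrees with the n coordinate.
subsetted_rows$u_country <- country_of(subsetted_rows$u)
subsetted_rows[subsetted_rows$n != subsetted_rows$u_country, ]
```

Running the same comparison on the unfiltered tibble would show whether the labels only go wrong after hyper_filter().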
