
RuntimeError: invalid 'type' (environment) of argument #1423

Closed
rjpeng98 opened this issue Apr 2, 2024 · 8 comments

Comments


rjpeng98 commented Apr 2, 2024

Hi, I am using keras and tensorflow in R to train a mixture density network. My customized loss function has been tested.
However, when I try to fit the model, I always get this error:

[Screenshot: RuntimeError: invalid 'type' (environment) of argument]

My code is as follows:

library(keras)
library(tensorflow)

num_components = 2  # Number of mixture components

input <- layer_input(shape = c(100))  # a 1-dimensional input

# Define a hidden layer
hidden <- input %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 64, activation = 'relu')

# Output layers for mixture components
mu <- hidden %>%
  layer_dense(units = num_components, name = 'mu')  # Means of the Gaussians

sigma <- hidden %>%
  layer_dense(units = num_components, activation = 'softplus', name = 'sigma')  # Standard deviation of the Gaussians (positive)

p <- hidden %>%
  layer_dense(units = num_components, activation = 'softmax', name = 'p')  # Mixture coefficients (sum to 1)

model <- keras_model(inputs = input, outputs = list(mu, sigma, p))


mdn_loss <- function(y_true, model_output) {
  # Extract components from model output
  mu = model_output[[1]]
  sigma = model_output[[2]] + keras::k_epsilon()
  p = model_output[[3]]
  
  single_gaussian_nll = function(y, mu, sigma, p) {
    return(-log(sum(exp(log(p)+
                          (-log(sigma)-log(2*pi)/2-1/2*((y-mu)/sigma)^2)))))
  }
  
  total_nll <- 0
  for (i in 1:nrow(y_true)) {
    
  total_nll = sum(total_nll, (single_gaussian_nll(y_true[i], mu, sigma, p)))
  
  }
  
  return((total_nll))
}

model %>% compile(
  optimizer = 'adam',
  loss = mdn_loss)


#data simulation 
theta_alpha<- -10
theta_beta<- 10
alpha<- 1/4
sigma_1<- 1
sigma_2<- 0.1
n<-10000 #number of samples from prior distribution

theta_prior<- runif(n, min = theta_alpha, max = theta_beta)
x_simulated<- matrix(nrow = n,ncol = 100)

for (i in 1:n) {
  for (j in 1:100) {
  
  indic<- rbinom(1,1,alpha)
  x_simulated[i,j]<- indic*rnorm(1,mean = theta_prior[i], sd = sigma_1)+
    (1-indic)*rnorm(1, mean = -theta_prior[i], sd = sigma_2)
  }
}

model %>% fit(x_simulated,  matrix(theta_prior, ncol = 1), epochs = 10, batch_size = 100)

Thanks in advance for any comments.

t-kalinowski (Member) commented Apr 2, 2024

When using the functional API in Keras to train a multi-output model, the value supplied to compile(loss = ) is expected to be a list of loss callables with the same length as the model outputs. Each callable is invoked with only one of the outputs. This works great if the loss for each output can be calculated without the values of the other outputs.
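
For reference, here is a minimal sketch of that default per-output API (the 'mse' losses are hypothetical placeholders for illustration, not a fix for this issue):

# One loss per output, matched by the output layer names defined above.
# 'mse' is a placeholder loss, for illustration only.
model %>% compile(
  optimizer = 'adam',
  loss = list(mu = 'mse', sigma = 'mse', p = 'mse')
)
# fit() would then expect one target per output; this does not suit a
# loss that needs all three outputs together, per the discussion below.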

However, this API does not cover the use case where you have multiple outputs and need the values of all of them in one scope to calculate the loss. There are, however, two straightforward ways to do this:

  1. If all the outputs have compatible shapes, you can call layer_concatenate() to combine the outputs along an axis, and then unstack them in the custom loss function.

    Here is your code adapted using this approach. Note, I updated it to use keras3 instead of keras, and I used the new op_* functions where appropriate (e.g., I replaced the for loop with op_vectorized_map()).

    model <- keras_model(inputs = input, outputs = layer_concatenate(mu, sigma, p))
    
    custom_loss_fn <- function(y_true, y_pred) {
      str(y_true)
      str(y_pred)
      ## browser() is safe to use here to be able to work with the `y_true` and
      ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
      ## context by pressing "Continue" (to raise an error) rather than by "Quit". If you
      ## "Quit" the R browser context, it leaves the TensorFlow tracing context
      ## open, and nothing else will work as expected (and it will eventually
      ## segfault).
      # browser()
      c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)
    
      sigma %<>% `+`(config_epsilon())
    
      single_gaussian_nll <- function(.x) {
        c(y, mu, sigma, p) %<-% .x
        -log(sum(exp(
          log(p) +
            (-log(sigma) - log(2 * pi) / 2 - 1 / 2 * ((y - mu) / sigma) ^ 2)
        )))
      }
    
      total_nll <-
        op_sum(op_vectorized_map(list(y_true, mu, sigma, p),
                                 single_gaussian_nll))
    
      total_nll
    }
  2. You can subclass Model and define a custom train_step.
    See https://keras.posit.co/articles/custom_train_step_in_tensorflow.html for examples; a rough sketch also follows after this list.

    Note, I don't think this example requires a custom train_step; that would only be required if the outputs did not share a shape and could not be concatenated. If you still want to have a model with 3 outputs, you can define two models that share weights, one for training and one for inference. E.g.,

    model <- ... # same as before
    training_model <- keras_model(inputs = model$inputs,
                                  outputs = layer_concatenate(model$outputs))
                                  
    # Training 'training_model' will also train 'model', since the two
    # models share weights. 
    training_model |> compile() |> fit() # same as before
    model |> predict()                   # inference from trained model with 3 outputs
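
For completeness, here is a rough sketch of the subclassing route (assuming keras3 with the TensorFlow backend; mdn_loss stands in for a mixture NLL that needs all three outputs in one scope; names are illustrative, not from this issue):

# Sketch only: subclass Model so train_step() sees all three outputs at
# once. Assumes library(keras3) and library(tensorflow) are attached.
MDNModel <- new_model_class(
  classname = "MDNModel",
  train_step = function(data) {
    c(x, y) %<-% data
    with(tf$GradientTape() %as% tape, {
      # self(x) returns list(mu, sigma, p) for a 3-output model
      c(mu, sigma, p) %<-% self(x, training = TRUE)
      loss <- mdn_loss(y, list(mu, sigma, p))  # all outputs in one scope
    })
    gradients <- tape$gradient(loss, self$trainable_variables)
    self$optimizer$apply(gradients, self$trainable_variables)
    list(loss = loss)
  }
)
# Usage (illustrative): build from the same functional graph, compile
# without a loss, and fit as usual.
# m <- MDNModel(inputs = input, outputs = list(mu, sigma, p))
# m |> compile(optimizer = "adam")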

t-kalinowski transferred this issue from rstudio/reticulate Apr 2, 2024
rjpeng98 (Author) commented Apr 2, 2024

Thanks for your prompt reply.

It works, but it returns only NaNs.
[Screenshot: fit() progress showing nan loss values]

Could you please comment further? (In my code, epsilon and log-sum-exp were already used to avoid this issue.)

Thanks in advance.

rjpeng98 closed this as completed Apr 2, 2024
rjpeng98 reopened this Apr 2, 2024
t-kalinowski (Member) commented Apr 2, 2024

You can pass compile(run_eagerly = TRUE), and then insert browser(), print(), str(), and message() calls in your custom loss function as needed to track down where the NaNs are coming from. E.g.,

custom_loss_fn <- function(y_true, y_pred) {
  ... # same as before

  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result <- ... # calculate as before
    if (py_bool(op_isnan(result))) browser()
    result
  }

  total_nll <- ... # same as before

  if (py_bool(op_isnan(total_nll))) browser()
  total_nll
}

model |> compile(run_eagerly = TRUE, loss = custom_loss_fn)

rjpeng98 (Author) commented Apr 2, 2024

Many thanks.

rjpeng98 (Author) commented Apr 3, 2024

Greetings,

I tried to debug my code by adding "if (py_bool(op_isnan(result))) browser()" in "custom_loss_fn".
However, there is an error related to py_bool:
[Screenshot: error raised from the py_bool() call]

I appreciate any comments, as usual. By the way, in browser() it looks like y_pred is NaN after the very first iteration. Does that mean my network is wrong?

The code is as follows:

#install_keras()
library(keras3)
library(tensorflow)
library(reticulate)
num_components = 2  # Number of mixture components

input <- layer_input(shape = c(100))  # a 1-dimensional input

# Define a hidden layer
hidden <- input %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 64, activation = 'relu')

# Output layers for mixture components
mu <- hidden %>%
  layer_dense(units = num_components, name = 'mu')  # Means of the Gaussians

sigma <- hidden %>%
  layer_dense(units = num_components, activation = 'softplus', name = 'sigma')  # Standard deviation of the Gaussians (positive)

p <- hidden %>%
  layer_dense(units = num_components, activation = 'softmax', name = 'p')  # Mixture coefficients (sum to 1)

model <- keras_model(inputs = input, outputs = layer_concatenate(mu, sigma, p))

custom_loss_fn <- function(y_true, y_pred) {
  str(y_true)
  str(y_pred)
  ## browser() is safe to use here to be able to work with the `y_true` and
  ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
  ## context by pressing "Continue" (to raise an error) rather than by "Quit". If you
  ## "Quit" the R browser context, it leaves the TensorFlow tracing context
  ## open, and nothing else will work as expected (and it will eventually
  ## segfault).
  
  c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)
  
  
  sigma %<>% `+`(config_epsilon())
  

  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result<- -log(sum(exp(
      log(p) +
        (-log(sigma) - log(sqrt(2 * pi))  - (1 / 2) * ((y - mu)^2 /sigma^2))
    )))
    if (py_bool(op_isnan(result))) browser()
    result
    
  }

  total_nll <-
    op_sum(op_vectorized_map(list(y_true, mu, sigma, p),
                             single_gaussian_nll))
  
  total_nll
  
}
#debug
model |> compile(run_eagerly = TRUE, loss = custom_loss_fn)
#data simulation 
theta_alpha<- -10
theta_beta<- 10
alpha<- 1/4
sigma_1<- 1
sigma_2<- 0.1
n<-10000 #number of samples from prior distribution

theta_prior<- runif(n, min = theta_alpha, max = theta_beta)
x_simulated<- matrix(nrow = n,ncol = 100)

for (i in 1:n) {
  for (j in 1:100) {
  
  indic<- rbinom(1,1,alpha)
  x_simulated[i,j]<- indic*rnorm(1,mean = theta_prior[i], sd = sigma_1)+
    (1-indic)*rnorm(1, mean = -theta_prior[i], sd = sigma_2)
  }
}
model %>% fit(x_simulated,  matrix(theta_prior, ncol = 1), epochs = 10, batch_size = 100)


Many thanks.

t-kalinowski (Member) commented:

Thanks, I could reproduce. This was slightly harder to track down than I expected, because op_vectorized_map() traces f even in eager mode. I will add an example to the docs of op_vectorized_map(), showing how to implement a debuggable version of it, op_vectorized_map_debug().

The issue is that the custom loss you are calculating sometimes returns inf, which the optimizer then uses to update the weights to nan. The nan values in y_pred are only encountered after the first batch of updates.

Here is your code updated with inserted if(py_bool(op_any(op_isinf(total_nll)))) ... calls, using op_vectorized_map_debug().

#install_keras()
Sys.setenv("CUDA_VISIBLE_DEVICES"="")
library(keras3)
# library(tensorflow, exclude = c("set_random_seed", "shape"))
library(reticulate)

num_components = 2  # Number of mixture components

input <- layer_input(shape = c(100))  # a 1-dimensional input

# Define a hidden layer
hidden <- input %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 64, activation = 'relu')

# Output layers for mixture components
mu <- hidden %>%
  layer_dense(units = num_components,
              name = 'mu')  # Means of the Gaussians

sigma <- hidden %>%
  layer_dense(units = num_components,
              activation = 'softplus',
              name = 'sigma')  # Standard deviation of the Gaussians (positive)

p <- hidden %>%
  layer_dense(units = num_components,
              activation = 'softmax',
              name = 'p')  # Mixture coefficients (sum to 1)

model <-
  keras_model(inputs = input,
              outputs = layer_concatenate(mu, sigma, p))

op_vectorized_map_debug <- function(elements, fn) {

  batch_size <- elements[[1]] |> op_shape() |> _[[1]]

  elements |>
    lapply(\(e) op_split(e, batch_size)) |>
    zip_lists() |>
    lapply(fn) |>
    op_stack()

}


ii <- 0L
custom_loss_fn <- function(y_true, y_pred) {
  ii <<- ii + 1L
  str(keras3:::named_list(ii, y_true, y_pred))
  ## browser() is safe to use here to be able to work with the `y_true` and
  ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
  ## context by pressing "Continue" (to raise an error) rather than by "Quit". If you
  ## "Quit" the R browser context, it leaves the TensorFlow tracing context
  ## open, and nothing else will work as expected (and it will eventually
  ## segfault).

  if(py_bool(op_any(op_isnan(y_pred)))) browser()
  c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)

  sigma %<>% `+`(config_epsilon())

  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result <- -op_log(op_sum(op_exp(
      op_log(p) +
        (
          -op_log(sigma) - op_log(op_sqrt(2 * pi)) - (1 / 2)
          * ((y - mu) ^ 2 / sigma ^ 2)
        )
    )))
    if(py_bool(op_isinf(result))) str(c(.x, result = result))
    result
  }

  total_nll <-
    op_sum(op_vectorized_map_debug(list(y_true, mu, sigma, p),
                                   single_gaussian_nll))

  if(py_bool(op_any(op_isnan(total_nll)))) browser()
  if(py_bool(op_any(op_isinf(total_nll)))) browser()

  str(keras3:::named_list(ii, total_nll))
  print(total_nll)

  total_nll

}

model |> compile(run_eagerly = TRUE,
                 loss = custom_loss_fn)



#data simulation
theta_alpha <- -10
theta_beta <- 10
alpha <- 1 / 4
sigma_1 <- 1
sigma_2 <- 0.1
n <- 10000 #number of samples from prior distribution

theta_prior <- runif(n, min = theta_alpha, max = theta_beta)
x_simulated <- matrix(nrow = n, ncol = 100)

for (i in 1:n) {
  for (j in 1:100) {
    indic <- rbinom(1, 1, alpha)
    x_simulated[i, j] <-
      indic * rnorm(1, mean = theta_prior[i], sd = sigma_1) +
      (1 - indic) * rnorm(1, mean = -theta_prior[i], sd = sigma_2)
  }
}

model %>% fit(x_simulated,
              matrix(theta_prior, ncol = 1),
              epochs = 10,
              batch_size = 100)
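
As a follow-up sketch (an assumption, not stated in the replies above): the inf most likely comes from exp() underflowing to 0 inside -log(sum(exp(...))), so a stabler formulation keeps everything in log space with op_logsumexp():

# Sketch only: a numerically stabler inner NLL, assuming keras3 is
# attached. op_logsumexp(x) computes log(sum(exp(x))) without letting
# exp() underflow to 0 (which is what turns -log(0) into inf).
single_gaussian_nll <- function(.x) {
  c(y, mu, sigma, p) %<-% .x
  log_lik <- op_log(p) - op_log(sigma) -
    0.5 * log(2 * pi) -
    0.5 * op_square((y - mu) / sigma)
  -op_logsumexp(log_lik)  # equals -log(sum(exp(log_lik))), but stable
}

Swapping this in for single_gaussian_nll above should remove the inf values, except when sigma itself degenerates to 0 (which the config_epsilon() addition already guards against).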

rjpeng98 (Author) commented Apr 3, 2024

Many thanks for your prompt reply. I will continue to debug with your generous help.

rjpeng98 (Author) commented Apr 5, 2024

It works. Thanks a lot.

rjpeng98 closed this as completed Apr 5, 2024