
Hyper Parameter Exploration #14

Open
grezesf opened this issue Feb 22, 2017 · 4 comments

grezesf (Contributor) commented Feb 22, 2017

Run experiments constantly, exploring the hyper-parameter space:
parameters and scope to be determined shortly.
(possibly use hyperas: https://github.com/maxpumperla/hyperas)

grezesf self-assigned this Feb 22, 2017
grezesf (Contributor, Author) commented Mar 7, 2017

I've (finally) gotten hyper-parameter search to work. Here is the possible search space. Before I launch it on the server, which should I remove?

hyper-parameter space

    # size of LSTM output: [256, 512, 1024, 2048]
    # make LSTM bidirectional or not
    # number of LSTM layers: [1, 2, 3, more?]
    # merge_mode for bidirectional LSTM: ['sum', 'mul', 'concat', 'ave', None]
    # activation function of output Dense layer: [softmax, softplus, softsign, relu, tanh, sigmoid, hard_sigmoid, linear]
    # loss for whole model: [mean_squared_error (mse), mean_absolute_error (mae),
    #   mean_absolute_percentage_error (mape), mean_squared_logarithmic_error (msle),
    #   squared_hinge, hinge, binary_crossentropy, kullback_leibler_divergence,
    #   poisson, cosine_proximity]
    # optimizer: [SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam]
    # batch_size: [32, 64, 128, 256, 512]
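
For reference, here is a minimal sketch of how part of this search space could be written with hyperas. The synthetic data loader, input shape (50 timesteps, 129 frequency bins), and epoch count are placeholder assumptions rather than the project's actual values, and the layer-count search is omitted for brevity:

    from hyperopt import Trials, STATUS_OK, tpe
    from hyperas import optim
    from hyperas.distributions import choice


    def data():
        # synthetic placeholder data; the real loader would return
        # (mixture features, target masks) for train and validation
        import numpy as np
        X = np.random.rand(100, 50, 129).astype('float32')
        Y = np.random.rand(100, 50, 129).astype('float32')
        return X[:80], Y[:80], X[80:], Y[80:]


    def model(X_train, Y_train, X_val, Y_val):
        from keras.models import Sequential
        from keras.layers import LSTM, Dense, Bidirectional, TimeDistributed

        net = Sequential()
        # hyperas fills each {{choice(...)}} with one sampled value per trial
        net.add(Bidirectional(
            LSTM({{choice([256, 512, 1024, 2048])}}, return_sequences=True),
            merge_mode={{choice(['sum', 'mul', 'concat', 'ave'])}},
            input_shape=(50, 129)))
        net.add(TimeDistributed(
            Dense(129, activation={{choice(['sigmoid', 'hard_sigmoid'])}})))
        net.compile(
            loss='mse',
            optimizer={{choice(['sgd', 'rmsprop', 'adagrad', 'adadelta',
                                'adam', 'adamax', 'nadam'])}})
        # 'epochs' is 'nb_epoch' in Keras 1
        net.fit(X_train, Y_train,
                batch_size={{choice([32, 64, 128, 256, 512])}},
                epochs=2, validation_data=(X_val, Y_val), verbose=0)
        val_loss = net.evaluate(X_val, Y_val, verbose=0)
        return {'loss': val_loss, 'status': STATUS_OK, 'model': net}


    if __name__ == '__main__':
        best_run, best_model = optim.minimize(model=model, data=data,
                                              algo=tpe.suggest, max_evals=10,
                                              trials=Trials())
        print(best_run)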

mim (Contributor) commented Mar 8, 2017

What does no merge (merge_mode=None) for the bidirectional LSTM mean?

For the loss for the whole model, I thought we were either using the mask-aware loss or the phase-aware loss, right?

And for the activation function of the output, if it is predicting a mask, it should be sigmoid.

The other parameters look good for searching.

grezesf (Contributor, Author) commented Mar 8, 2017

  1. "If None, the outputs will not be combined, they will be returned as a list." (I have to admit I'm not 100% clear on how bidirectional networks function)
  2. The model right now is mask-aware, but I guess there is more than 1 way to compute a loss between a predicted and target mask. MSE corresponds to the Erdogan paper.
  3. I'll restrict to sigmoid and hard sigmoid
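
To make the merge_mode options concrete, here is a small sketch of how Keras combines the forward and backward LSTM outputs; the shapes (50 timesteps, 129 bins, 256 units) are placeholder assumptions:

    from keras.models import Sequential
    from keras.layers import LSTM, Bidirectional

    # 'concat' stacks the two directions, so the feature dim doubles
    m = Sequential()
    m.add(Bidirectional(LSTM(256, return_sequences=True),
                        merge_mode='concat', input_shape=(50, 129)))
    print(m.output_shape)  # (None, 50, 512)

    # 'sum', 'mul', and 'ave' combine element-wise, keeping the dim
    m = Sequential()
    m.add(Bidirectional(LSTM(256, return_sequences=True),
                        merge_mode='sum', input_shape=(50, 129)))
    print(m.output_shape)  # (None, 50, 256)

    # merge_mode=None returns the two directions as a list of tensors,
    # which a Sequential stack can't feed into a single Dense layer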

mim (Contributor) commented Mar 8, 2017

  1. Try the other combinations, but not None.
  2. I think the loss between the predicted and target mask should be cross entropy, and the loss between the masked noisy speech and the clean speech should be MSE (see the sketch after this list).
  3. Sounds good.
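
A minimal sketch of those two losses, assuming a Keras backend, a model that outputs a time-frequency mask in [0, 1], and a hypothetical mix_mag tensor holding the noisy-mixture magnitudes (which would have to be routed into the loss, e.g. through the functional API):

    from keras import backend as K

    def mask_cross_entropy(mask_true, mask_pred):
        # cross entropy between the target and predicted masks
        eps = K.epsilon()
        mask_pred = K.clip(mask_pred, eps, 1.0 - eps)
        return -K.mean(mask_true * K.log(mask_pred)
                       + (1.0 - mask_true) * K.log(1.0 - mask_pred), axis=-1)

    def signal_approximation_mse(mix_mag):
        # MSE between the masked noisy speech and the clean speech:
        # mask_pred * mix_mag is the estimate, and mask_true * mix_mag
        # stands in for the clean-speech magnitudes
        def loss(mask_true, mask_pred):
            return K.mean(K.square((mask_pred - mask_true) * mix_mag), axis=-1)
        return loss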
