GPU issues? #8
Comments
When using the version included in pull request #5, I did not run into any issues during hyperparameter optimization on a GPU. Did you use the exact same code on a CPU and on a GPU?
Yes, I used the #5 version. I ran the identical code on CPU and GPU.
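For reference, a minimal sketch of one way to run the identical script on the CPU for such a comparison is to hide the GPU from TensorFlow at the top of the script. This is illustrative only, not the reporter's actual code:

import tensorflow as tf

# Hide all GPUs from TensorFlow so the identical script runs on the CPU.
# Comment this line out to run the same script on the GPU.
# Must be called before any tensors or models are created.
tf.config.set_visible_devices([], "GPU")
print("Visible GPUs:", tf.config.get_visible_devices("GPU"))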
Unfortunately, I cannot reproduce your issue. Could you please provide a hypermodel (e.g. for MNIST) and the HPO and training procedure as a code snippet (e.g. in a gist) that produces the issue?
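For illustration, a minimal sketch of the kind of reproduction snippet requested here, with a placeholder MNIST hypermodel and the inner_cv wrapper as shown in the keras-tuner-cv README. The import path, wrapper signature, and all settings are assumptions for the sketch, not the reporter's code:

import tensorflow as tf
import keras_tuner
from sklearn.model_selection import KFold
from keras_tuner_cv.inner_cv import inner_cv  # assumed import path, per the package README


def build_model(hp):
    # Placeholder MNIST hypermodel, not the reporter's actual model.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

# Wrap a standard KerasTuner tuner with inner cross-validation (assumed usage).
tuner = inner_cv(keras_tuner.RandomSearch)(
    build_model,
    KFold(n_splits=5, shuffle=True, random_state=42),
    objective="val_loss",
    max_trials=3,
    directory="hpo",
    project_name="mnist_cv",
)
tuner.search(x_train, y_train, epochs=5, batch_size=128)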
Using your data, I tried to reproduce your issues again after minor modifications of your code (Gist) and in the fixGPUissues branch. I used the same computer with TensorFlow 2.11.0 and KerasTuner 1.1.3. Again, I could not reproduce your issues (see the out-files in the Gist).
However, I used
Thank you for the help! I will look around in my environment; there must be something disagreeing with your package.
Which version of tensorflow-gpu is recommended to use with keras-tuner-cv?
I have not tested it with any version other than TensorFlow 2.11.
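Given how much the version combination matters in this thread, a quick way to confirm which TensorFlow, KerasTuner, and NumPy versions are actually installed in the environment being debugged (an illustrative snippet, not part of the original discussion):

import numpy as np
import tensorflow as tf
import keras_tuner

# Print the installed versions of the three packages discussed in this issue.
print("TensorFlow:", tf.__version__)
print("KerasTuner:", keras_tuner.__version__)
print("NumPy:", np.__version__)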
So you ran the script on Linux or WSL? Perhaps that's my issue; I am running it on native Windows.
Using WSL2.
It seems that I get the same issue after an update from TensorFlow 2.11 and KerasTuner 1.1.3 to TensorFlow 2.12 and KerasTuner 1.3.5.
I have simply migrated to Linux. I gave up on virtual environments, as I couldn't make the library run after a full day of fiddling. I used the TF and tuner versions you cited in a previous comment (TensorFlow 2.11.0 and KerasTuner 1.1.3) and had no issues on native Linux (Ubuntu) using Anaconda.
@Feheragyar: Thanks for answering so quickly. Most probably, that will be my solution as well... Using TensorFlow 2.12 and KerasTuner 1.3.5,
No worries. I believe that's all I did; let me know if you run into trouble and I'll try to retrace my steps for you. I believe I tested it with the most up-to-date TF while keeping the old tuner version, and it still worked perfectly.
I tried different versions of TensorFlow and KerasTuner. It seems that keras-tuner-cv currently only works with KerasTuner 1.1.3 and numpy 1.20. When using KerasTuner 1.1.3 with TensorFlow >2.11, you will get several deprecation warnings. However, even with TensorFlow 2.11, KerasTuner 1.1.3 and numpy 1.20, you get:
I think I found the issue: in KerasTuner 1.1.3, the status of a trial was set to "completed" by
I fixed the issue for KerasTuner 1.3.5 in https://github.com/VZoche-Golob/keras-tuner-cv
After merging #5, this issue should be fixed.
I have been using your extension on CPUs and it runs perfectly. I recently moved over to using a GPU and the loss calculation looks completely chaotic now. Are there some issues in the implementation that prohibit the use of GPUs?
Here is a snippet for you to see the loss-calculation issues (the best loss is 'None'; the recovered weights result in previously unseen loss values after early stopping; and, due to the 'None' best loss value, the best hyperparameters remain as they were set for the very first trial):
Inner Cross-Validation 5/5
Epoch 1/50
6/6 [==============================] - 5s 575ms/step - loss: 0.5369 - mean_squared_error: 0.5369 - mean_absolute_error: 0.6359 - mean_absolute_percentage_error: 263.9126 - root_mean_squared_error: 0.7327 - val_loss: 0.0721 - val_mean_squared_error: 0.0721 - val_mean_absolute_error: 0.2148 - val_mean_absolute_percentage_error: 22.1264 - val_root_mean_squared_error: 0.2685
Epoch 2/50
6/6 [==============================] - 3s 475ms/step - loss: 0.1652 - mean_squared_error: 0.1652 - mean_absolute_error: 0.3106 - mean_absolute_percentage_error: 323.5719 - root_mean_squared_error: 0.4065 - val_loss: 0.0850 - val_mean_squared_error: 0.0850 - val_mean_absolute_error: 0.2492 - val_mean_absolute_percentage_error: 25.4391 - val_root_mean_squared_error: 0.2915
Epoch 3/50
6/6 [==============================] - 3s 478ms/step - loss: 0.1079 - mean_squared_error: 0.1079 - mean_absolute_error: 0.2405 - mean_absolute_percentage_error: 256.0751 - root_mean_squared_error: 0.3284 - val_loss: 0.0103 - val_mean_squared_error: 0.0103 - val_mean_absolute_error: 0.0714 - val_mean_absolute_percentage_error: 7.3397 - val_root_mean_squared_error: 0.1013
Epoch 4/50
6/6 [==============================] - 3s 478ms/step - loss: 0.1035 - mean_squared_error: 0.1035 - mean_absolute_error: 0.1980 - mean_absolute_percentage_error: 354.6868 - root_mean_squared_error: 0.3217 - val_loss: 0.0538 - val_mean_squared_error: 0.0538 - val_mean_absolute_error: 0.2179 - val_mean_absolute_percentage_error: 22.2260 - val_root_mean_squared_error: 0.2319
Epoch 5/50
6/6 [==============================] - 3s 481ms/step - loss: 0.1149 - mean_squared_error: 0.1149 - mean_absolute_error: 0.2556 - mean_absolute_percentage_error: 254.6845 - root_mean_squared_error: 0.3389 - val_loss: 0.0229 - val_mean_squared_error: 0.0229 - val_mean_absolute_error: 0.1178 - val_mean_absolute_percentage_error: 12.0714 - val_root_mean_squared_error: 0.1513
Epoch 6/50
6/6 [==============================] - 2s 381ms/step - loss: 0.0978 - mean_squared_error: 0.0978 - mean_absolute_error: 0.2223 - mean_absolute_percentage_error: 208.5932 - root_mean_squared_error: 0.3127 - val_loss: 0.0734 - val_mean_squared_error: 0.0734 - val_mean_absolute_error: 0.2140 - val_mean_absolute_percentage_error: 22.2007 - val_root_mean_squared_error: 0.2710
Epoch 7/50
6/6 [==============================] - 1s 225ms/step - loss: 0.0789 - mean_squared_error: 0.0789 - mean_absolute_error: 0.2038 - mean_absolute_percentage_error: 213.5430 - root_mean_squared_error: 0.2808 - val_loss: 0.0186 - val_mean_squared_error: 0.0186 - val_mean_absolute_error: 0.0969 - val_mean_absolute_percentage_error: 10.0373 - val_root_mean_squared_error: 0.1364
Epoch 8/50
6/6 [==============================] - 1s 228ms/step - loss: 0.0708 - mean_squared_error: 0.0708 - mean_absolute_error: 0.1652 - mean_absolute_percentage_error: 276.1188 - root_mean_squared_error: 0.2662 - val_loss: 0.0087 - val_mean_squared_error: 0.0087 - val_mean_absolute_error: 0.0701 - val_mean_absolute_percentage_error: 7.1587 - val_root_mean_squared_error: 0.0935
Epoch 9/50
6/6 [==============================] - 1s 219ms/step - loss: 0.0676 - mean_squared_error: 0.0676 - mean_absolute_error: 0.1503 - mean_absolute_percentage_error: 282.9794 - root_mean_squared_error: 0.2600 - val_loss: 0.0090 - val_mean_squared_error: 0.0090 - val_mean_absolute_error: 0.0536 - val_mean_absolute_percentage_error: 5.5848 - val_root_mean_squared_error: 0.0950
Epoch 10/50
6/6 [==============================] - 2s 409ms/step - loss: 0.0663 - mean_squared_error: 0.0663 - mean_absolute_error: 0.1536 - mean_absolute_percentage_error: 242.2759 - root_mean_squared_error: 0.2574 - val_loss: 0.0151 - val_mean_squared_error: 0.0151 - val_mean_absolute_error: 0.0738 - val_mean_absolute_percentage_error: 7.7006 - val_root_mean_squared_error: 0.1227
Epoch 11/50
6/6 [==============================] - 3s 481ms/step - loss: 0.0696 - mean_squared_error: 0.0696 - mean_absolute_error: 0.1742 - mean_absolute_percentage_error: 183.5706 - root_mean_squared_error: 0.2638 - val_loss: 0.0395 - val_mean_squared_error: 0.0395 - val_mean_absolute_error: 0.1167 - val_mean_absolute_percentage_error: 12.3000 - val_root_mean_squared_error: 0.1986
Epoch 12/50
6/6 [==============================] - 2s 269ms/step - loss: 0.0635 - mean_squared_error: 0.0635 - mean_absolute_error: 0.1620 - mean_absolute_percentage_error: 193.5781 - root_mean_squared_error: 0.2520 - val_loss: 0.0258 - val_mean_squared_error: 0.0258 - val_mean_absolute_error: 0.0838 - val_mean_absolute_percentage_error: 8.8847 - val_root_mean_squared_error: 0.1606
Epoch 13/50
6/6 [==============================] - 2s 409ms/step - loss: 0.0594 - mean_squared_error: 0.0594 - mean_absolute_error: 0.1509 - mean_absolute_percentage_error: 208.7011 - root_mean_squared_error: 0.2438 - val_loss: 0.0404 - val_mean_squared_error: 0.0404 - val_mean_absolute_error: 0.1378 - val_mean_absolute_percentage_error: 14.4424 - val_root_mean_squared_error: 0.2011
Restoring model weights from the end of the best epoch.
Epoch 00013: early stopping
1/1 [==============================] - 1s 579ms/step
1/1 [==============================] - 0s 500ms/step
1/1 [==============================] - 1s 1s/step - loss: 0.0499 - mean_squared_error: 0.0499 - mean_absolute_error: 0.1130 - mean_absolute_percentage_error: 234.8392 - root_mean_squared_error: 0.2234
1/1 [==============================] - 1s 609ms/step - loss: 0.1864 - mean_squared_error: 0.1864 - mean_absolute_error: 0.2046 - mean_absolute_percentage_error: 106.4081 - root_mean_squared_error: 0.4317
Trial 1 Complete [00h 02m 55s]
Best val_loss So Far: None
Total elapsed time: 00h 02m 55s