Classification example - segmentation fault on some systems #136
Answered ✅
Conflicts are avoided by installing the requirements into a separate Python environment. From experience I know it can be really troublesome to work with deep learning repos with loose requirements. Considering this is an application and not so much a library, I think it should be acceptable to have fixed versions? But I'm curious to hear arguments for setting them loose.
With a Python=3.10 conda env + pip installing the requirements.txt (as instructed in the classification demo docs) training works for me out of the box. |
Hi @gemenerik, I still get a segmentation fault, even after creating a conda environment from scratch. Is there anything else I can do to execute the code? |
Can you share some more details? Like what OS you are using? A terminal printout? Anything that helps me reproduce the problem. |
Sure, here it is.
This is the terminal output when I try to run the
Is it a problem if I store and run everything from an external SSD? |
Oof, that is not a very informative error. Can you run any of the official tensorflow examples for this install? |
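When a run dies with a bare "Segmentation fault", Python's standard `faulthandler` module can at least print the Python-level traceback at the moment of the crash. The following is a minimal sketch, not something taken from the repository: the helper names `run_with_faulthandler` and `describe_exit` are hypothetical, and it assumes a POSIX system, where a negative return code means the child process was killed by a signal.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a Python script in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr_text). If the child segfaults, stderr
    contains the Python traceback dumped by faulthandler.
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable message."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, -N means the process was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"
```

Calling `run_with_faulthandler(["train_classifier.py"])` from the directory that contains the script and passing the return code to `describe_exit` would show whether the process was killed by `SIGSEGV`, and the captured stderr would show which Python call was active. The same traceback can be obtained without any wrapper by running `python -X faulthandler train_classifier.py`.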
Some more info,
I tried this quickstart example, and the model is correctly trained (exactly as in here) |
Good news, the |
Good idea to try a docker container. Instead of an nvidia one, I will try to find a EDIT: that will likely be |
If you have a chance to test it, create a file
#!/usr/bin/env bash
set -e
full_path=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
cd "${full_path}"
pip install pillow scipy
python train_classifier.py
From the repository root folder, run:
|
Thanks for your support. However it does not work. This is the output I got:
|
Curious. Do you have an NVIDIA GPU? |
Sorry for the late reply. Yes I have an NVIDIA GPU, this is my
|
Thanks! I think for now we'll leave this issue open and consider the NVIDIA docker a workaround for NVIDIA GPU users that run into the segmentation fault. |
@luigifeola the above might work with the |
Hi @gemenerik sorry for the super late reply. Actually even with
|
Hi! Rik will be back next week so I'll notify him once he is back |
It may be related to how TensorFlow is built, possibly involving the GPU. Works fine on a GTX 1080 system. Haven't been able to reproduce the problem and a workaround was found, so not digging deeper for now. |
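To compare a working and a failing system, it can help to record how the installed TensorFlow was built and whether it sees a GPU. The sketch below is only an illustration under stated assumptions: the function name `tensorflow_report` is hypothetical, and it relies on `tf.test.is_built_with_cuda()` and `tf.config.list_physical_devices("GPU")` from the TensorFlow 2.x API. If the crash happens while importing or initializing TensorFlow, this script will crash too, which is itself useful information.

```python
import importlib.util


def tensorflow_report():
    """Collect basic facts about the local TensorFlow install.

    Returns a dict; if TensorFlow is not installed, only the
    'installed' key is present.
    """
    report = {"installed": importlib.util.find_spec("tensorflow") is not None}
    if not report["installed"]:
        return report

    # Imported lazily so the function still works without TensorFlow.
    import tensorflow as tf

    report["version"] = tf.__version__
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report
```

Printing `tensorflow_report()` on both machines and attaching the output to the issue would make differences in version, CUDA build, or visible GPUs easy to spot.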
Hi @gemenerik, Additionally, the tensorflow Docker container recommends running in non-root mode. To follow this best practice, I created a custom image based on The Lite model works well on my custom dataset, but when deployed, it detects ~90% of the time the Thanks again for your support! |
Related to this, documentation has been updated to include instructions for Docker-based training |
There is a discussion indicating that there are issues running the classification example.
I did a quick test and found some (other) problems:
python train_classifier.py
I get a segmentation fault(!), not sure why.
My conclusion is that we should take a look at this example and make sure it works.