RuntimeError: Caught RuntimeError in pin memory thread for device 0 #642
Comments
Exactly the same conf and command works on
|
Try these arguments: --device cuda:0 --batch-size 12. Also, check GPU usage with this command: watch -n 1 nvidia-smi |
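If watching nvidia-smi by eye is awkward while a training run starts up, the same numbers can be logged from a script. The following is only a rough sketch: it assumes nvidia-smi is on the PATH and that its CSV query output has the form `index, used, total` in MiB. The helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example and are not part of kraken.

```python
import shutil
import subprocess

# nvidia-smi query mode: one CSV line per GPU, values in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines such as '0, 1234, 12288' into a list of dicts (MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Assumes all three fields are integers; a '[N/A]' field would raise.
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append(
            {
                "index": index,
                "used_mib": used,
                "total_mib": total,
                "used_fraction": used / total,
            }
        )
    return rows


def query_gpu_memory():
    """Return per-GPU memory usage, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpu_memory():
        print(
            f"GPU {gpu['index']}: {gpu['used_mib']} / {gpu['total_mib']} MiB "
            f"({gpu['used_fraction']:.0%})"
        )
```

Calling `query_gpu_memory()` in a loop with a short sleep gives roughly what `watch -n 1 nvidia-smi` shows, but in a form that can be written to a log next to the training output.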
Error: No such option: --batch-size (Possible options: --resize, --step-size)
|
I train YOLOv8 bigger models on the same environment without any problems. |
12 GB is fairly close to the 10 GB that is usually required to train a segmentation model, so it is possible that torch is running out of memory. Could you just try training the model in 16-bit mixed precision with the --precision option for a quick fix? 4.3.12 didn't use lightning yet, which was slightly more memory efficient. |
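To get a feel for why 16-bit precision helps, here is a back-of-the-envelope estimate of the memory taken by a single activation tensor at different precisions. It is only a sketch: the tensor shape is a hypothetical example, not the actual layout of kraken's segmentation network, and `activation_bytes` is a name invented for this illustration.

```python
# Bytes per element for common floating point formats.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


def activation_bytes(batch, channels, height, width, dtype="fp32"):
    """Memory needed to store one (batch, channels, height, width) tensor."""
    return batch * channels * height * width * BYTES_PER_ELEMENT[dtype]


def to_mib(num_bytes):
    """Convert bytes to MiB."""
    return num_bytes / (1024 ** 2)


if __name__ == "__main__":
    # Hypothetical feature map: 1 page, 64 channels, 1200 x 1800 pixels.
    shape = (1, 64, 1200, 1800)
    for dtype in ("fp32", "fp16"):
        print(f"{dtype}: {to_mib(activation_bytes(*shape, dtype=dtype)):.1f} MiB")
```

Halving the element size halves the footprint of each stored activation. The saving over a whole training run is usually smaller than a factor of two, because mixed precision keeps the weights and optimizer state in 32 bit.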
Sure, I’ll try that tomorrow when I get home. That said, it never happened on kraken 4.x.
Anyway, I kept an eye on the GPU while starting to train and the memory was far from being exhausted.
…On Wed, 25 Sep 2024 at 01:13, mittagessen ***@***.***> wrote:
12Gb is fairly close to the 10Gb that is usually required to train a
segmentation model so it is possible that torch is running out of memory.
Could you just try training the model in 16bit mixed precision with the
--precision option for a quick fix?
|
Doesn't crash now with
|
Why do you run segtrain with such a small input and an architecture that looks more like a recognition model? |
I'm trying different tests on a small model. Is the arch for recognition? I want to train from scratch. |
Yes, you are trying to train with a recognition architecture. To me that makes no sense. |
I never trained a seg model from scratch. What's the arch for segtrain then? |
Just don't pass -s and it will train with the default architecture. |
ok, found it in the doc: |
The default architecture for segtrain is shown when you run ketos segtrain --help. The one you quote above is outdated. |
Working like this:
|
36 images is very little data if you are not training on top of an existing model. I do not remember whether ketos segtrain automatically loads the blla base model as a starting point, but I believe it does not. |
I pretrain the model because this small dataset was made by hand by me, so I am training a model to help me transcribe more ground truth. |
Specs:
Training on a small amount of data, about 35 pages. It worked on 25 pages.
Error: