Chunks.npy and Dataset.py not being generated #385
Comments
Hi all, I am also trying to troubleshoot this issue with Sam, and it's a bit unclear what files should be generated during this step.
Is there a test dataset available that users can work through from the beginning to see what the expected outputs of calling |
@Sgreenfield9 your reads need to map to your reference for any training data to be created. Can you confirm your reads map? You can check in the Also, note that you seem to be passing |
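For anyone who wants a quick way to confirm whether reads mapped, here is a minimal sketch. It assumes the basecaller output is a plain-text SAM file and relies only on the FLAG column defined by the SAM specification (bit 0x4 marks an unmapped record). The name `count_mapped` is hypothetical and is not part of bonito.

```python
def count_mapped(sam_lines):
    """Return (mapped, unmapped) counts of primary SAM records.

    Hypothetical helper, not part of bonito. Header lines (starting
    with '@') are skipped, as are secondary (0x100) and supplementary
    (0x800) records, so each read is counted at most once.
    """
    mapped = 0
    unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


if __name__ == "__main__":
    import sys

    # Usage: python count_mapped.py basecalls.sam
    with open(sys.argv[1]) as handle:
        n_mapped, n_unmapped = count_mapped(handle)
    print(f"mapped: {n_mapped}, unmapped: {n_unmapped}")
```

If the mapped count comes back as zero, that would be consistent with the point above that no training data can be created from reads that do not map.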
I found that enable Lines 583 to 587 in 0c7fcce
with
without
|
Hello, I'm trying to train a basecaller using DNA that has been run through an RNA pore. When I run the following code:
bonito basecaller [email protected] --min-accuracy-save-ctc 0 --reference /home/remote /data/minknow/PolyA_DNA_SG/PolyA_DNA_SG/20240320_1335_P2S-01618-B_PAU71604_94a542e0/fast5_pass > /home/remote/basecalls.sam
I receive the following output:
> calling: 100%|###########################################9| 8969/8979 [15:08<0
> completed reads: 8979
No errors are thrown, so I assumed everything was fine. The issue arises when I run the subsequent bonito train command:
bonito train --epochs 1 --lr 5e-4 --pretrained [email protected] --directory /home/remote /home/remote/fine-tuned-model
`[loading model]
[using pretrained model [email protected]]
[loading data]
Traceback (most recent call last):
  File "/home/remote/.local/lib/python3.8/site-packages/bonito/cli/train.py", line 58, in main
    train_loader_kwargs, valid_loader_kwargs = load_numpy(
  File "/home/remote/.local/lib/python3.8/site-packages/bonito/data.py", line 40, in load_numpy
    train_data = load_numpy_datasets(limit=limit, directory=directory)
  File "/home/remote/.local/lib/python3.8/site-packages/bonito/data.py", line 66, in load_numpy_datasets
    chunks = np.load(os.path.join(directory, "chunks.npy"), mmap_mode='r')
  File "/home/remote/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/remote/chunks.npy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/remote/.local/bin/bonito", line 8, in <module>
    sys.exit(main())
  File "/home/remote/.local/lib/python3.8/site-packages/bonito/__init__.py", line 34, in main
    args.func(args)
  File "/home/remote/.local/lib/python3.8/site-packages/bonito/cli/train.py", line 62, in main
    train_loader_kwargs, valid_loader_kwargs = load_script(
  File "/home/remote/.local/lib/python3.8/site-packages/bonito/data.py", line 31, in load_script
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 844, in exec_module
  File "<frozen importlib._bootstrap_external>", line 980, in get_code
  File "<frozen importlib._bootstrap_external>", line 1037, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/home/remote/dataset.py'`
When I look at the directory I wrote my files to (/home/remote), I find that only a .sam file has been generated, not chunks.npy. Is chunks.npy not being written, or is it being written to another location? Any help would be greatly appreciated.
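The traceback shows the train command first trying to load chunks.npy from the `--directory` path and, when that fails, falling back to dataset.py in the same directory. A minimal sketch for checking a directory before training follows. It relies only on the two file names that appear in the traceback; the function name `missing_training_files` is hypothetical and not part of bonito, and any further files the loader may need are not covered here.

```python
import os


def missing_training_files(directory, expected=("chunks.npy", "dataset.py")):
    """List which of the expected file names are absent from `directory`.

    Hypothetical helper, not part of bonito. The default names are the
    two paths reported in the FileNotFoundError messages; they are an
    assumption about what the loader looks for, not a complete list.
    """
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_training_dir.py /home/remote
    missing = missing_training_files(sys.argv[1])
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all expected files found")
```

Judging by the traceback, only one of the two files needs to exist for data loading to proceed, so a result listing both names matches the double failure reported above.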