
xron call Produces Empty FASTQ Directories #7

Open
BrendanBeahan opened this issue Nov 28, 2024 · 6 comments
Comments

@BrendanBeahan

Hello,

I am attempting to run xron for basecalling, but I am encountering an issue where the output directories are empty. Below are the details of my setup and the steps I followed.

Built a Docker image using the following Dockerfile, then converted it to a Singularity Image Format (SIF) file:

FROM python:3.8-slim

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    wget \
    curl \
    zlib1g-dev \
    unzip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

RUN pip install xron

RUN xron init

CMD ["bash"]

Converted the Docker image into a SIF file and ran the following command within the Singularity container:

xron call \
  -i /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/pod5_scp/OHMX20240022_001/pod5_pass/ \
  -o /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/xRon/results_2 \
  -m models/RNA004

Observed the following output:

No GPU is detected, the batch_size is setting to default 1200
Construct and load the model.
/usr/local/lib/python3.8/site-packages/xron/xron_eval.py:32: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(os.path.join(model_folder,latest_ckpt),
Begin basecall.
0it [00:00, ?it/s]
NN_time:0.000000,assembly_time:0.000000,writing_time:0.000053

The output directory contains no .fastq files, and nothing appears to have been processed.

Also, the input directory /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/pod5_scp/OHMX20240022_001/pod5_pass/ contains valid .pod5 files.
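For anyone debugging this, a quick way to confirm what the basecaller can actually see is to count the signal files per extension in the input directory; `0it [00:00, ?it/s]` is consistent with the tool iterating over zero matching files. A minimal stdlib sketch (the function name is illustrative, not part of xron):

```python
from pathlib import Path

def count_signal_files(input_dir, extensions=(".pod5", ".fast5")):
    """Count raw-signal files per extension in a basecalling input directory."""
    root = Path(input_dir)
    return {ext: sum(1 for _ in root.glob(f"*{ext}")) for ext in extensions}
```

If this reports only .pod5 files while the tool is effectively looking for fast5, the progress bar will show zero iterations, matching the output above.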

Finally, I noticed a warning about torch.load in the output:

FutureWarning: You are using `torch.load` with `weights_only=False` ...
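That warning is unrelated to the empty output. It exists because `torch.load` uses pickle under the hood, and unpickling untrusted data can execute arbitrary code; `weights_only=True` restricts unpickling to an allowlist of tensor-related types. The same idea, illustrated with only the standard library (this is not xron or torch code, just a sketch of the mechanism):

```python
import io
import pickle

class AllowlistUnpickler(pickle.Unpickler):
    """Block any by-name import during unpickling unless allowlisted.
    Plain containers and scalars use dedicated pickle opcodes and never
    reach find_class, so they always load."""
    ALLOWED = set()  # add (module, name) pairs you explicitly trust

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked: {module}.{name}")
        return super().find_class(module, name)

def restricted_loads(data: bytes):
    """Unpickle bytes while refusing all non-allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

So the warning is only asking you to opt in to the safer mode; it does not stop the basecall.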

Please let me know if there’s any additional information you need to help troubleshoot this issue. Thank you for your assistance!

Best,
Brendan


abcdtree commented Nov 28, 2024

@BrendanBeahan I am having the same issue.
I found this in code

parser.add_argument('--input_format', default = "fast5",help = "The input file format, defautl is pod5 file, can be fast5.")

While the help text says the default is pod5, the actual default is fast5. So I am adding --input_format pod5 for another try. Will let you know whether it works or not.
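For what it's worth, the mismatch is easy to reproduce with a standalone argparse snippet (hypothetical, mirroring the quoted line): the help string is documentation only, so omitting the flag always yields the literal `default` value.

```python
import argparse

parser = argparse.ArgumentParser()
# Mirrors the quoted xron line: the help text claims pod5, the default is fast5.
parser.add_argument('--input_format', default="fast5",
                    help="The input file format, default is pod5 file, can be fast5.")

print(parser.parse_args([]).input_format)                          # -> fast5
print(parser.parse_args(['--input_format', 'pod5']).input_format)  # -> pod5
```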

@BrendanBeahan Just to update: this does not fix the issue.

@BrendanBeahan

@abcdtree Thanks Josh!


abcdtree commented Dec 2, 2024

@BrendanBeahan Just to update: the following command works for me now:

xron call -i ${seq_home}/${sample}/fast5_pass -o ${output_folder} -m ${xron_model}/ENEYFT  --threads 20 --batch_size 2000 --device cuda

I found that:

  1. Even when I added the --input_format pod5 parameter, input_format in the config.toml file was still set to fast5.
  2. If you don't set --batch_size, it defaults to None and the command stops.
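Since the flag apparently does not propagate, one workaround is to patch input_format in config.toml directly before running xron. A hedged stdlib sketch for a flat `key = "value"` style file (the file's location and exact layout are assumptions here, and the function name is illustrative; back up the file first):

```python
from pathlib import Path

def force_input_format(config_path, fmt="pod5"):
    """Rewrite the input_format entry in a flat key = "value" TOML file,
    leaving every other line untouched."""
    path = Path(config_path)
    lines = []
    for line in path.read_text().splitlines():
        if line.split("=", 1)[0].strip() == "input_format":
            line = f'input_format = "{fmt}"'
        lines.append(line)
    path.write_text("\n".join(lines) + "\n")
```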

The command has been running for a while now, and fastqs were created. I will let you know how it ends up.
Let us solve this together. :D

Josh

@BrendanBeahan

@abcdtree

Haha this is almost entirely you my man. But good work! Excited to see the result.


abcdtree commented Dec 4, 2024

@BrendanBeahan The command above works well; it outputs fastqs. I also found that it writes "M" instead of A/T/G/C in the fastq file when there is a possible modification at that position. I mapped the reads to the reference and ran a pileup to compare the proportion of the modified base to the regular one, and found some sites with a high proportion of modification.
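For reference, the per-site computation amounts to the following (a sketch, assuming "M" calls substitute for "A" in the FASTQ output, so the denominator at an adenosine position is M + A reads):

```python
from collections import Counter

def m6a_fraction(column_bases):
    """Modification fraction at one reference position: M / (M + A),
    given the pileup column of base calls covering that position."""
    counts = Counter(b.upper() for b in column_bases)
    total = counts["M"] + counts["A"]
    return counts["M"] / total if total else 0.0
```

For example, a column of "MMAAMAAAAM" (4 M calls out of 10 covering reads) gives 0.4.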

@haotianteng Is this the right application using xron to call methylation?

Josh


haotianteng commented Jan 16, 2025

@abcdtree Yes, that is the expected behavior: "M" means an m6A modification call. You can replace it with A before mapping to the reference. As there is currently no common standard for representing modified bases, Xron has no function yet to generate a BAM file; if you want to generate one in the same format as dorado, you will probably need to do some manual coding.

Thank you for locating the format argument issue.
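For anyone who lands here: the M-to-A replacement described above can be done in a few lines of Python before mapping (a sketch for four-line-per-record FASTQ; note a naive global replace would also corrupt headers and quality strings, so only sequence lines are touched):

```python
def demethylate_fastq(lines):
    """Replace 'M' (m6A) calls with 'A' on sequence lines only, so
    standard aligners accept the reads. Expects a list of FASTQ lines,
    four lines per record."""
    out = []
    for i, line in enumerate(lines):
        if i % 4 == 1:  # line 2 of every 4-line record is the sequence
            line = line.replace("M", "A")
        out.append(line)
    return out
```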
