
xron call Produces Empty FASTQ Directories #7

Open
BrendanBeahan opened this issue Nov 28, 2024 · 6 comments
Comments

@BrendanBeahan

Hello,

I am attempting to run xron for basecalling, but I am encountering an issue where the output directories are empty. Below are the details of my setup and the steps I followed.

Built a Docker image using the following Dockerfile, then converted it to a Singularity Image Format (SIF) file:

FROM python:3.8-slim

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    wget \
    curl \
    zlib1g-dev \
    unzip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

RUN pip install xron

RUN xron init

CMD ["bash"]

Converted the Docker image into a SIF file and ran the following command within the Singularity container:

xron call \
  -i /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/pod5_scp/OHMX20240022_001/pod5_pass/ \
  -o /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/xRon/results_2 \
  -m models/RNA004

Observed the following output:

No GPU is detected, the batch_size is setting to default 1200
Construct and load the model.
/usr/local/lib/python3.8/site-packages/xron/xron_eval.py:32: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(os.path.join(model_folder,latest_ckpt),
Begin basecall.
0it [00:00, ?it/s]
NN_time:0.000000,assembly_time:0.000000,writing_time:0.000053

The output directory contains no .fastq files, and nothing appears to have been processed.

Also, the input directory /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/pod5_scp/OHMX20240022_001/pod5_pass/ contains valid .pod5 files.
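For anyone debugging this, a quick way to confirm what the basecaller can actually see is to count the signal files per extension in the input directory; `0it [00:00, ?it/s]` is consistent with the tool iterating over zero matching files. A minimal stdlib sketch (the function name is illustrative, not part of xron):

```python
from pathlib import Path

def count_signal_files(input_dir, extensions=(".pod5", ".fast5")):
    """Count raw-signal files per extension in a basecalling input directory."""
    root = Path(input_dir)
    return {ext: sum(1 for _ in root.glob(f"*{ext}")) for ext in extensions}
```

If this reports only .pod5 files while the tool is effectively looking for fast5, the progress bar will show zero iterations, matching the output above.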

Finally, I noticed a warning about torch.load in the output:

FutureWarning: You are using `torch.load` with `weights_only=False` ...
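That warning is unrelated to the empty output. It exists because `torch.load` uses pickle under the hood, and unpickling untrusted data can execute arbitrary code; `weights_only=True` restricts unpickling to an allowlist of tensor-related types. The same idea, illustrated with only the standard library (this is not xron or torch code, just a sketch of the mechanism):

```python
import io
import pickle

class AllowlistUnpickler(pickle.Unpickler):
    """Block any by-name import during unpickling unless allowlisted.
    Plain containers and scalars use dedicated pickle opcodes and never
    reach find_class, so they always load."""
    ALLOWED = set()  # add (module, name) pairs you explicitly trust

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked: {module}.{name}")
        return super().find_class(module, name)

def restricted_loads(data: bytes):
    """Unpickle bytes while refusing all non-allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

So the warning is only asking you to opt in to the safer mode; it does not stop the basecall.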

Please let me know if there’s any additional information you need to help troubleshoot this issue. Thank you for your assistance!

Best,
Brendan


abcdtree commented Nov 28, 2024

@BrendanBeahan I am having the same issue.
I found this in code

parser.add_argument('--input_format', default = "fast5",help = "The input file format, defautl is pod5 file, can be fast5.")

While the help text says the default is pod5, the actual default is fast5. So I am adding --input_format pod5 for another try. Will let you know whether it works or not.
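For what it's worth, the mismatch is easy to reproduce with a standalone argparse snippet (hypothetical, mirroring the quoted line): the help string is documentation only, so omitting the flag always yields the literal `default` value.

```python
import argparse

parser = argparse.ArgumentParser()
# Mirrors the quoted xron line: the help text claims pod5, the default is fast5.
parser.add_argument('--input_format', default="fast5",
                    help="The input file format, default is pod5 file, can be fast5.")

print(parser.parse_args([]).input_format)                          # -> fast5
print(parser.parse_args(['--input_format', 'pod5']).input_format)  # -> pod5
```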

@BrendanBeahan Just to update: this does not fix the issue.

@BrendanBeahan

@abcdtree Thanks Josh!


abcdtree commented Dec 2, 2024

@BrendanBeahan Just to update: the following command works for me now:

xron call -i ${seq_home}/${sample}/fast5_pass -o ${output_folder} -m ${xron_model}/ENEYFT  --threads 20 --batch_size 2000 --device cuda

I found that:

  1. Even when I added the --input_format pod5 parameter, input_format in the config.toml file was still set to fast5.
  2. If you don't set --batch_size, it defaults to None and the command stops.
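Since the flag apparently does not propagate, one workaround is to patch input_format in config.toml directly before running xron. A hedged stdlib sketch for a flat `key = "value"` style file (the file's location and exact layout are assumptions here, and the function name is illustrative; back up the file first):

```python
from pathlib import Path

def force_input_format(config_path, fmt="pod5"):
    """Rewrite the input_format entry in a flat key = "value" TOML file,
    leaving every other line untouched."""
    path = Path(config_path)
    lines = []
    for line in path.read_text().splitlines():
        if line.split("=", 1)[0].strip() == "input_format":
            line = f'input_format = "{fmt}"'
        lines.append(line)
    path.write_text("\n".join(lines) + "\n")
```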

The command has been running for a while now, and fastqs were created. I will let you know how it ends up.
Let us solve this together. :D

Josh

@BrendanBeahan

@abcdtree

Haha this is almost entirely you my man. But good work! Excited to see the result.


abcdtree commented Dec 4, 2024

@BrendanBeahan The command above works well; it outputs fastqs. I also found that it writes "M" instead of A/T/G/C in the fastq file when there is a possible modification at that position. I mapped the reads to the reference and ran a pileup to compare the proportion of the modified base to the regular one, and found some sites with a high proportion of modification.
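For reference, the per-site computation amounts to the following (a sketch, assuming "M" calls substitute for "A" in the FASTQ output, so the denominator at an adenosine position is M + A reads):

```python
from collections import Counter

def m6a_fraction(column_bases):
    """Modification fraction at one reference position: M / (M + A),
    given the pileup column of base calls covering that position."""
    counts = Counter(b.upper() for b in column_bases)
    total = counts["M"] + counts["A"]
    return counts["M"] / total if total else 0.0
```

For example, a column of "MMAAMAAAAM" (4 M calls out of 10 covering reads) gives 0.4.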

@haotianteng Is this the right application using xron to call methylation?

Josh


haotianteng commented Jan 16, 2025

@abcdtree Yes, that is the expected behavior: "M" means an m6A modification call. You can replace it with A before mapping to the reference. As there is currently no common standard for representing modified bases, Xron has no function yet to generate a BAM file; if you want to generate one in the same format as dorado, you will probably need to do some manual coding.

Thank you for locating the format argument issue.
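For anyone who lands here: the M-to-A replacement described above can be done in a few lines of Python before mapping (a sketch for four-line-per-record FASTQ; note a naive global replace would also corrupt headers and quality strings, so only sequence lines are touched):

```python
def demethylate_fastq(lines):
    """Replace 'M' (m6A) calls with 'A' on sequence lines only, so
    standard aligners accept the reads. Expects a list of FASTQ lines,
    four lines per record."""
    out = []
    for i, line in enumerate(lines):
        if i % 4 == 1:  # line 2 of every 4-line record is the sequence
            line = line.replace("M", "A")
        out.append(line)
    return out
```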
