
Scratch/tmp pod5 problem #110

Open
Macdot3 opened this issue Feb 7, 2024 · 21 comments

Comments

@Macdot3

Macdot3 commented Feb 7, 2024

Hi everyone,
I installed pod5 from the conda channel by @JannesSP because our corporate permissions don't allow me to install packages with pip. After installing it and loading all the plugins, running `pod5 convert fast5` with `--one-to-one` gives me this error:

```
Converting 22 Fast5s:   0%|          | 0/88000 [00:00<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmpjccgpwm2/lib)
```

I can't find any folder with this name; they appear to be temporary files. Do I need to install something?
I hope you can help me. Thank you

@HalfPhoton
Collaborator

Hi @Macdot3,
Can you share the full command that you ran, please?

Kind regards,
Rich

@Macdot3
Author

Macdot3 commented Feb 7, 2024

```bash
#!/bin/bash
#SBATCH --job-name=fast5_conv
#SBATCH --mem=64GB  # amount of RAM required (and max RAM available)
##SBATCH --mem-per-cpu=5000 # amount of RAM per core (see ntasks)
#SBATCH --time=INFINITE  ## OR #SBATCH --time=10:00 means 10 minutes OR --time=01:00:00 means 1 hour
#SBATCH --cpus-per-task=10  # number of required cores
#SBATCH --nodes=1  # not really useful for non-MPI jobs
##SBATCH --partition=work  ## work is the default and only queue, so it does not need to be specified
#SBATCH --error="/home/barresi.m/Nanopore/Dorado/Dorado_ERR/fast5_conv.err"
#SBATCH --output="/home/barresi.m/Nanopore/Dorado/Dorado_OUT/fast5_conv.out"

source /opt/common/tools/besta/miniconda3/bin/activate
conda activate /home/barresi.m/Nanopore/Dorado/POD5_ENV

pod5 convert fast5 /home/barresi.m/Nanopore/Dorado/FAST5/barcode5/*.fast5 \
     --output /home/barresi.m/Nanopore/Dorado/POD5/POD5_barcode5/ \
     --one-to-one /home/barresi.m/Nanopore/Dorado/FAST5/barcode5/ -t 10
```

@HalfPhoton
Collaborator

I don't see any problem with your command, or any reason it would need to access /tmp: any temporary files created by pod5 are written locally.

Could you please enable debugging by running `POD5_DEBUG=1 pod5 convert ...` and share the log files that are generated?

@Macdot3
Copy link
Author

Macdot3 commented Feb 7, 2024

I added `export POD5_DEBUG=1` before the `pod5 convert` command and saved the output to logfile.txt. This is the result:

```
Converting 22 Fast5s:   0%|          | 0/88000 [00:07<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmp2w9ypgxb/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:07<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmprh5stj_2/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:07<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmp_picg_t3/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:07<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmp0799lli3/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:07<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmp8t2d7mn7/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:07<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmp1at9xm89/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:08<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmp2w9ypgxb/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:08<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmp_dmao20f/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:08<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmprh5stj_2/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:08<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmprm70eucb/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:08<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmpc9nszjth/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:08<?, ?Reads/s]

Can't read data (can't open directory: /scratch/tmp/tmpzz07dc_1/lib)

Converting 22 Fast5s:   0%|          | 0/88000 [00:09<?, ?Reads/s]
Converting 22 Fast5s:   0%|          | 0/88000 [00:09<?, ?Reads/s]
```

@HalfPhoton
Collaborator

Running with POD5_DEBUG should have generated .log files in the working directory. Could you share those, please?

@HalfPhoton
Copy link
Collaborator

This error can occur when HDF5 cannot find the plugin it needs to read the compressed signal in the fast5 files.

Could you please make sure that `vbz_h5py_plugin` is installed in the Python environment?
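A quick way to check both points from inside the environment is sketched below. It is a minimal sketch, assuming h5py >= 2.10 (where `h5py.h5pl` exists); the helper names `plugin_importable` and `h5py_plugin_paths` are hypothetical, not part of pod5 or h5py. `find_spec` locates the package without importing it, so it does not trigger the plugin's own registration.

```python
import importlib.util
import os
from typing import List, Optional


def plugin_importable(name: str = "vbz_h5py_plugin") -> bool:
    # find_spec looks the package up on sys.path without importing it
    return importlib.util.find_spec(name) is not None


def h5py_plugin_paths() -> Optional[List[str]]:
    """List the directories h5py currently searches for HDF5 filter plugins."""
    try:
        from h5py import h5pl  # only available in h5py >= 2.10
    except ImportError:
        return None
    # h5pl.get returns each search path as bytes; decode with the filesystem encoding
    return [os.fsdecode(h5pl.get(i)) for i in range(h5pl.size())]


if __name__ == "__main__":
    print("vbz_h5py_plugin importable:", plugin_importable())
    print("h5py plugin search paths:", h5py_plugin_paths())
```

If the plugin is importable but its `lib` directory never shows up as a real, existing path in the search list after `import vbz_h5py_plugin`, HDF5 will not be able to decode VBZ-compressed reads.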

Kind regards,
Rich

@Macdot3
Author

Macdot3 commented Feb 7, 2024

Sorry, here is the folder with the log files:
Log.zip

@Macdot3
Author

Macdot3 commented Feb 7, 2024

# packages in environment at /home/barresi.m/Nanopore/Dorado/POD5_ENV:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
aws-c-auth                0.7.14               h70caa3e_3    conda-forge
aws-c-cal                 0.6.9                h14ec70c_3    conda-forge
aws-c-common              0.9.12               hd590300_0    conda-forge
aws-c-compression         0.2.17               h572eabf_8    conda-forge
aws-c-event-stream        0.4.1                h17cd1f3_5    conda-forge
aws-c-http                0.8.0                hc6da83f_5    conda-forge
aws-c-io                  0.14.3               h3c8c088_1    conda-forge
aws-c-mqtt                0.10.1               h0ef3971_3    conda-forge
aws-c-s3                  0.5.0                hb337f33_1    conda-forge
aws-c-sdkutils            0.1.14               h572eabf_0    conda-forge
aws-checksums             0.1.17               h572eabf_7    conda-forge
aws-crt-cpp               0.26.1               h0637f07_8    conda-forge
aws-sdk-cpp               1.11.242             h65f022c_0    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.26.0               hd590300_0    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
h5py                      3.9.0           py312h34c39bb_0
hdf5                      1.12.1          nompi_h4df4325_104    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
iso8601                   2.1.0              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lib-pod5                  0.3.6                   py312_0    jannessp
libabseil                 20230802.1      cxx17_h59595ed_0    conda-forge
libarrow                  15.0.0           he2c5238_2_cpu    conda-forge
libarrow-acero            15.0.0           h59595ed_2_cpu    conda-forge
libarrow-dataset          15.0.0           h59595ed_2_cpu    conda-forge
libarrow-flight           15.0.0           hdc44a87_2_cpu    conda-forge
libarrow-flight-sql       15.0.0           hfbc7f12_2_cpu    conda-forge
libarrow-gandiva          15.0.0           hacb8726_2_cpu    conda-forge
libarrow-substrait        15.0.0           hfbc7f12_2_cpu    conda-forge
libblas                   3.9.0           21_linux64_openblas    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcblas                  3.9.0           21_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcurl                   8.5.0                hca28451_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_5    conda-forge
libgfortran-ng            13.2.0               h69a702a_5    conda-forge
libgfortran5              13.2.0               ha4646dd_5    conda-forge
libgomp                   13.2.0               h807b86a_5    conda-forge
libgoogle-cloud           2.12.0               hef10d8f_5    conda-forge
libgrpc                   1.60.0               h74775cd_1    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
liblapack                 3.9.0           21_linux64_openblas    conda-forge
libllvm15                 15.0.7               hb3ce162_4    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnl                     3.9.0                hd590300_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libnuma                   2.0.16               h0b41bf4_1    conda-forge
libopenblas               0.3.26          pthreads_h413a1c8_0    conda-forge
libparquet                15.0.0           h352af49_2_cpu    conda-forge
libprotobuf               4.25.1               hf27288f_1    conda-forge
libre2-11                 2023.06.02           h7a70373_0    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_5    conda-forge
libthrift                 0.19.0               hb90f79a_1    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.5               h232c23b_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
more-itertools            10.2.0             pyhd8ed1ab_0    conda-forge
ncurses                   6.4                  h59595ed_2    conda-forge
numpy                     1.26.3          py312heda63a1_0    conda-forge
ont_vbz_hdf_plugin        1.0.1                hb6da537_4    bioconda
openssl                   3.2.1                hd590300_0    conda-forge
orc                       1.9.2                h7829240_1    conda-forge
packaging                 23.2               pyhd8ed1ab_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
pod5                      0.3.6                   py312_0    jannessp
polars                    0.20.7          py312hfa2e56e_0    conda-forge
pyarrow                   15.0.0          py312h176e3d2_2_cpu    conda-forge
python                    3.12.1          hab00c5b_1_cpython    conda-forge
python_abi                3.12                    4_cp312    conda-forge
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
rdma-core                 50.0                 hd3aeb46_0    conda-forge
re2                       2023.06.02           h2873b5e_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
s2n                       1.4.3                h06160fa_0    conda-forge
setuptools                69.0.3             pyhd8ed1ab_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tqdm                      4.66.1             pyhd8ed1ab_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
ucx                       1.15.0               h75e419f_3    conda-forge
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

@HalfPhoton
Collaborator

@Macdot3,

The log files suggest that this could be a problem with HDF5:

```
  File "/home/barresi.m/Nanopore/Dorado/POD5_ENV/lib/python3.12/site-packages/pod5/tools/pod5_convert_from_fast5.py", line 540, in convert_fast5_read
    signal = raw["Signal"][()]
             ~~~~~~~~~~~~~^^^^
```

As for the plugin:

```
ont_vbz_hdf_plugin        1.0.1                hb6da537_4    bioconda
```

is not the same package as the `vbz_h5py_plugin` listed in the pod5 dependencies.

Could you please run `pip list`?

@Macdot3
Author

Macdot3 commented Feb 7, 2024

```
(/home/barresi.m/Nanopore/Dorado/POD5_ENV) barresi.m@login01:~$ pip list
DEPRECATION: Loading egg at /home/barresi.m/Nanopore/Dorado/POD5_ENV/lib/python3.12/site-packages/vbz_h5py_plugin-1.0.1-py3.12.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation.. Discussion can be found at https://github.com/pypa/pip/issues/12330
Package         Version
--------------- -------
colorama        0.4.6
h5py            3.9.0
iso8601         2.1.0
lib_pod5        0.3.6
more-itertools  10.2.0
numpy           1.26.3
packaging       23.2
pip             24.0
pod5            0.3.6
polars          0.20.7
pyarrow         15.0.0
pytz            2024.1
setuptools      69.0.3
tqdm            4.66.1
vbz_h5py_plugin 1.0.1
vbz_h5py_plugin 1.0.1
vbz_h5py_plugin 1.0.1
wheel           0.42.0
```

@HalfPhoton
Collaborator

Hi @Macdot3,

This could be an issue with `vbz_h5py_plugin`. There is a deprecation warning about .egg files at the top, and the three duplicate lines are also a concern, but the plugin does seem to be installed.

Could you try this script in your environment? It's the `__init__.py` from the `vbz_h5py_plugin` module with some print statements added.

I suspect that `print(f"{lib_path=}")` will print `lib_path=/scratch/tmp/<tmp>/lib`.
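The triple `vbz_h5py_plugin` entry can be reproduced from Python directly. This is a minimal sketch using the standard-library `importlib.metadata`; the helper names are hypothetical. Names are normalised as in PEP 503, so `vbz_h5py_plugin` and `vbz-h5py-plugin` count as the same distribution.

```python
import re
from collections import Counter
from importlib.metadata import distributions
from typing import Dict, Iterable


def normalize(name: str) -> str:
    # PEP 503 normalisation: runs of '-', '_' and '.' collapse to '-', lowercase
    return re.sub(r"[-_.]+", "-", name).lower()


def find_duplicates(names: Iterable[str]) -> Dict[str, int]:
    """Return each normalised name that appears more than once, with its count."""
    counts = Counter(normalize(n) for n in names)
    return {name: count for name, count in counts.items() if count > 1}


def installed_duplicates() -> Dict[str, int]:
    # Skip distributions with broken metadata that have no Name field
    names = [d.metadata.get("Name") for d in distributions()]
    return find_duplicates(n for n in names if n)


if __name__ == "__main__":
    print(installed_duplicates())
```

Each duplicate usually means a separate metadata directory (or .egg) for the same package on `sys.path`, which is worth cleaning up before debugging further.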

```python
import sys


def get_vbz_resource_path() -> str:
    """Get the path to the vbz plugin (lib) resource"""

    vbz_package = "vbz_h5py_plugin"
    vbz_target = "lib"

    # importlib.resources superseded pkg_resources from python3.9+
    if sys.version_info.major == 3 and sys.version_info.minor > 8:
        import importlib.resources

        vbz_lib = importlib.resources.files(vbz_package) / vbz_target
        with importlib.resources.as_file(vbz_lib) as path:
            return str(path.absolute())
    else:
        import pkg_resources

        return pkg_resources.resource_filename(vbz_package, vbz_target)


def register_plugin() -> str:
    """Register the vbz hdf plugins with h5py"""

    lib_path = get_vbz_resource_path()
    try:
        # Add the vbz library path to the h5 plugin search paths
        from h5py import h5pl

        h5pl.prepend(bytes(lib_path, "UTF-8"))
        print(f"{lib_path=}")  # debug: the directory h5py will search for the plugin
    except (ImportError, AttributeError):
        # We don't have the plugin library in h5py<2.10 so we fall
        # back on an environment variable
        import os

        os.environ["HDF5_PLUGIN_PATH"] = lib_path
        # Single quotes inside the f-string keep this valid before Python 3.12
        print(f"{os.environ['HDF5_PLUGIN_PATH']=}")
    return lib_path


register_plugin()
```

@Macdot3
Author

Macdot3 commented Feb 7, 2024

I opened the /POD5_ENV/vbz_h5py_plugin-1.0.1/build/lib/vbz_h5py_plugin/__init__.py script, and the line
`print(f"{os.environ["HDF5_PLUGIN_PATH"]=}")` is missing at the bottom.
What should I replace or add?

Here is the script:

```python
""" vbz_hdf_plugin imported at module import-time"""
# pylint: disable=E1101,C0415

import sys


def get_vbz_resource_path() -> str:
    """Get the path to the vbz plugin (lib) resource"""

    vbz_package = "vbz_h5py_plugin"
    vbz_target = "lib"

    # importlib.resources superseeded pkg_resources from python3.9+
    if sys.version_info.major == 3 and sys.version_info.minor > 8:
        import importlib.resources

        vbz_lib = importlib.resources.files(vbz_package) / vbz_target
        with importlib.resources.as_file(vbz_lib) as path:
            return str(path.absolute())
    else:
        import pkg_resources

        return pkg_resources.resource_filename(vbz_package, vbz_target)


def register_plugin() -> str:
    """Register the vbz hdf plugins with h5py"""

    lib_path = get_vbz_resource_path()
    try:
        # Add the vbz library path to the h5 plugin search paths
        from h5py import h5pl

        h5pl.prepend(bytes(lib_path, "UTF-8"))
    except (ImportError, AttributeError):
        # We don't have the plugin library in h5py<2.10 so we fall
        # back on an environment variable
        import os

        os.environ["HDF5_PLUGIN_PATH"] = lib_path
    return lib_path


register_plugin()
```

@HalfPhoton
Collaborator

I posted a copy of it above to run as a standalone script. That copy should work as-is if you save it to a file, e.g. `test_h5py.py`, and run it.

@Macdot3
Author

Macdot3 commented Feb 7, 2024

You're right. I ran your script and got this result:

```
(/home/barresi.m/Nanopore/Dorado/POD5_ENV) barresi.m@login01:~$ python /home/barresi.m/Nanopore/Dorado/POD5_ENV/vbz_h5py_plugin-1.0.1/test_h5py.py install
lib_path='/scratch/tmp/tmpg90ofdww/lib'
```

Now, should I replace the last path with the one the script gives me?

@HalfPhoton
Copy link
Collaborator

This is telling us that the HDF5 library can't load the vbz decode library in your Slurm cluster environment.

I'm intrigued by this. Maybe it's my lack of experience with conda, but why is the lib being loaded from a temporary directory instead of from where the site-packages are installed in the conda environment?

For example, when I run the script from my Python venv I get:

`/.../venv/lib/python3.10/site-packages/vbz_h5py_plugin/lib`

Does your conda environment work locally i.e. not in the slurm cluster?

You might be able to run the script locally to get the real path, and then set it with:

```bash
export HDF5_PLUGIN_PATH=</path/site-packages/vbz_h5py_plugin/lib>
```
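Setting `HDF5_PLUGIN_PATH` only helps if the plugin's `lib` folder exists as a real directory on disk. The sketch below resolves that directory without importing the package, and prints an `export` line only when it finds one. It is a minimal sketch: `on_disk_lib_dir` is a hypothetical helper, and it assumes the package layout shown above (`vbz_h5py_plugin/lib`).

```python
import importlib.util
from pathlib import Path
from typing import Optional


def on_disk_lib_dir(package: str = "vbz_h5py_plugin", target: str = "lib") -> Optional[Path]:
    """Return the package's `target` folder if it is a real directory on disk.

    Returns None when the package is missing, or when it lives somewhere that is
    not a plain directory (for example inside a zipped .egg), in which case there
    is no stable path to export.
    """
    spec = importlib.util.find_spec(package)  # locates without importing
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / target
        if candidate.is_dir():
            return candidate.resolve()
    return None


if __name__ == "__main__":
    lib_dir = on_disk_lib_dir()
    if lib_dir is None:
        print("vbz_h5py_plugin/lib was not found as a directory on disk")
    else:
        # Candidate line for the Slurm job script, before calling pod5
        print(f"export HDF5_PLUGIN_PATH={lib_dir}")
```

If this prints the "not found" message while `pip list` still shows the package, the plugin is most likely installed in a form (such as a zipped egg) that has no fixed on-disk `lib` folder to point HDF5 at.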

@Macdot3
Author

Macdot3 commented Feb 8, 2024

The cluster administrators have set up a Python environment in which I cannot install any packages, for business reasons; when I tried, they denied me access. So I created a conda environment in my cluster folder to install pod5. However, at first it wouldn't let me run the pod5 commands because the vbz_h5py_plugin-1.0.1 plugin was missing. For the same reason as above, I downloaded the .tar.gz file, installed it into my environment, and ran the script.
The script is now at `/POD5_ENV/vbz_h5py_plugin-1.0.1/build/lib/vbz_h5py_plugin/__init__.py`.
I added `print(f"{os.environ["HDF5_PLUGIN_PATH"]=}")` to it, but I get no output. Do I need to run test_h5py.py instead, and then export the path?

@HalfPhoton
Collaborator

Hi @Macdot3, you don't need to edit `vbz_h5py_plugin/__init__.py`. Just run the `test_h5py.py` script that I sent, which should print the lib location. That tells you where to set the path.

Alternatively, you could ask the administrators to download and install the vbz plugin manually, which should put the plugin in the correct location: https://github.com/nanoporetech/vbz_compression?tab=readme-ov-file
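After a manual install, one way to confirm that HDF5 will see the plugin is to look inside every directory listed in `HDF5_PLUGIN_PATH`. The sketch below is a rough check, not an official tool: `vbz_libraries_on_plugin_path` is a hypothetical helper, it only inspects the environment variable (not any compiled-in default directory), and it assumes the installed plugin file has "vbz" in its name (on Linux the vbz_compression build is expected to produce something like `libvbz_hdf_plugin.so`, but check your own install). Entries are split on `os.pathsep`, i.e. `:` on Linux.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional


def vbz_libraries_on_plugin_path(plugin_path: Optional[str] = None) -> Dict[str, List[str]]:
    """Map each HDF5_PLUGIN_PATH entry to the vbz-looking files it contains.

    Entries that do not exist, or contain no matching file, map to an empty list.
    """
    if plugin_path is None:
        plugin_path = os.environ.get("HDF5_PLUGIN_PATH", "")
    found: Dict[str, List[str]] = {}
    for entry in plugin_path.split(os.pathsep):
        if not entry:
            continue  # ignore empty segments such as a trailing ':'
        directory = Path(entry)
        if directory.is_dir():
            found[entry] = sorted(p.name for p in directory.iterdir() if "vbz" in p.name.lower())
        else:
            found[entry] = []
    return found


if __name__ == "__main__":
    for entry, files in vbz_libraries_on_plugin_path().items():
        print(entry, "->", files or "no vbz plugin found")
```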

@Macdot3
Author

Macdot3 commented Feb 8, 2024

Thanks @HalfPhoton, I followed your instructions. The only problem is that it gives me a different path every time. I think the only option is to ask the administrators, if they reply to me.
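A changing `/scratch/tmp/tmpXXXX/lib` path is consistent with how `importlib.resources.as_file()` behaves when a package is not a plain directory on disk: it extracts the resource into a temporary location (under `tempfile.gettempdir()`, which honours `TMPDIR`) and deletes it when the `with` block exits. `get_vbz_resource_path()` returns from inside that block, so the returned path is already gone by the time HDF5 tries to open it. That the plugin was installed as a zipped .egg is an assumption here, suggested by the .egg deprecation warning in the `pip list` output. The sketch below reproduces the effect with a throwaway zipped package and a single file (version-agnostic for Python 3.9+), and shows one alternative: keeping the extracted copy alive for the whole process with `contextlib.ExitStack`. The function names are hypothetical, not vbz_h5py_plugin's code.

```python
import atexit
import importlib
import importlib.resources
import sys
import tempfile
import zipfile
from contextlib import ExitStack
from pathlib import Path
from typing import Tuple

# Process-wide stack: extracted resources stay on disk until interpreter exit
_keep_alive = ExitStack()
atexit.register(_keep_alive.close)


def dangling_path(package: str, resource: str) -> Path:
    # Same shape as get_vbz_resource_path(): returning inside `with` exits the
    # context manager, which deletes any temporary copy it extracted.
    ref = importlib.resources.files(package) / resource
    with importlib.resources.as_file(ref) as path:
        return path


def persistent_path(package: str, resource: str) -> Path:
    # Alternative: enter the context on the process-wide ExitStack instead
    ref = importlib.resources.files(package) / resource
    return _keep_alive.enter_context(importlib.resources.as_file(ref))


def make_zipped_package(name: str) -> Path:
    # Throwaway zipped package, standing in for a zipped .egg
    archive = Path(tempfile.mkdtemp()) / f"{name}.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(f"{name}/__init__.py", "")
        zf.writestr(f"{name}/data.txt", "payload")
    return archive


def demo(name: str = "demo_zipped_pkg") -> Tuple[Path, bool, Path, bool]:
    archive = make_zipped_package(name)
    sys.path.insert(0, str(archive))
    importlib.invalidate_caches()
    try:
        gone = dangling_path(name, "data.txt")
        kept = persistent_path(name, "data.txt")
        return gone, gone.exists(), kept, kept.exists()
    finally:
        sys.path.remove(str(archive))


if __name__ == "__main__":
    print(demo())
```

Running `demo()` should show the first path under the temp directory but already deleted, and the second still readable, which matches "a different path every time" and "can't open directory". Installing the plugin as an unpacked directory (or system-wide, as suggested above) avoids the extraction altogether.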

@HalfPhoton
Copy link
Collaborator

HalfPhoton commented Feb 8, 2024

@Macdot3,

> The only problem is that I noticed that it gives me a different path every time.

Are you running the script locally, i.e. not on the Slurm cluster?

> Does your conda environment work locally i.e. not in the slurm cluster?
> You might be able to run the script locally get the real path and set with: ...

@Macdot3
Author

Macdot3 commented Feb 8, 2024

Of course

@HalfPhoton
Collaborator

OK, I suggest asking an administrator to install the vbz plugin. Let us know if that helps.


2 participants