An error occurred while generating random regions. #14

jinqiyuan1 · 2025-01-09T09:17:09Z

Does this software have any other requirements for BAM files? I provided the sorted BAM file for running, and the command is as follows:
python /public/home/yjq/tools/cfDNA_GCcorrection/cfDNA_GCcorrection/computeGCBias_background.py
-b /public/home/yjq/projects/PA_projects/data/NBT_WGS/bamfilter/SRR17478154_filter.sorted.bam
-g /public/home/yjq/genome_anno/hg19/hg19_UCSC.2bit
-p 2
-i
--output /public/home/yjq/projects/PA_projects/data/NBT_WGS/GC_correction/background/
--debug
The following error occurs:
Traceback (most recent call last):
File "/public/home/yjq/tools/cfDNA_GCcorrection/cfDNA_GCcorrection/computeGCBias_background.py", line 596, in <module>
main()
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/public/home/yjq/tools/cfDNA_GCcorrection/cfDNA_GCcorrection/computeGCBias_background.py", line 549, in main
regions = get_regions(
File "/public/home/yjq/tools/cfDNA_GCcorrection/cfDNA_GCcorrection/computeGCBias_background.py", line 125, in get_regions
random_regions.to_dataframe(
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pybedtools/bedtool.py", line 3762, in to_dataframe
return pandas.read_csv(self.fn, *args, sep="\t", **kwargs) # type: ignore
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 577, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
self._engine = self._make_engine(f, self.engine)
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
return mapping[engine](f, **self.options)
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 557, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file.
I am sure that I have successfully installed pandas and pybedtools. My deeptools version is 3.5.5, pandas version is 2.0.3, bedtools version is v2.31.1, pybedtools version is 0.11.0, and the Python version is 3.8.19.

sroener · 2025-01-09T10:56:15Z

Hi, from the error traceback it looks like no regions were selected.

Could you please run the command again with the --debug flag and share the resulting output?

Please make sure that you provide the same reference version (e.g. hg19/hg38) that the bam file was mapped to.

Additionally, if possible it would be great if you share your bam file and your whole software environment. This way I could see why no regions were created. One possible cause of this behavior could be that your reference 2bit file and your bam file do not share common chromosome names, which should be handled by an automatically created mapping between these files.
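As a rough way to check for this kind of mismatch by hand, the sketch below compares two lists of chromosome names and reports which ones match exactly, which match only after stripping a `chr` prefix, and which have no counterpart. The function names and the example inputs are hypothetical, and this is not the mapping logic the tool itself uses; in practice the names would come from the BAM header and the 2bit file.

```python
def normalize(name: str) -> str:
    """Strip a leading 'chr' and map 'M' to 'MT' so both naming styles compare equal."""
    core = name[3:] if name.lower().startswith("chr") else name
    return "MT" if core.upper() == "M" else core


def compare_chrom_names(ref_names, bam_names):
    """Return (exact matches, matches found only after normalization, unmatched BAM names)."""
    ref_set = set(ref_names)
    ref_by_norm = {normalize(n): n for n in ref_names}
    exact, renamed, missing = [], {}, []
    for name in bam_names:
        if name in ref_set:
            exact.append(name)
        elif normalize(name) in ref_by_norm:
            renamed[name] = ref_by_norm[normalize(name)]
        else:
            missing.append(name)
    return exact, renamed, missing


# Hypothetical inputs: a UCSC-style reference and an Ensembl-style BAM header.
ref = ["chr1", "chr2", "chrX", "chrM"]
bam = ["1", "2", "X", "MT", "GL000191.1"]
exact, renamed, missing = compare_chrom_names(ref, bam)
print("exact:", exact)
print("renamed:", renamed)
print("missing:", missing)
```

If most names end up in `missing`, the BAM file and the reference were probably built from different genome versions or naming schemes.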

jinqiyuan1 · 2025-01-17T02:30:09Z

Thank you for your enthusiastic response. Your reply provided me with ideas on how to solve the problem, and I successfully ran the computeGCBias_background.py script by replacing the reference .2bit file as you suggested. However, I encountered the following issues in subsequent tests.

When I run the computeGCBias_readlen script with the -i parameter, it executes successfully and produces the result file. Without this parameter, the script fails. According to the --help output, the purpose of this parameter is to reduce precision but improve speed, so omitting it should probably not cause an error. The error is as follows:
`/public/home/yjq/miniconda3/envs/celfeer_env/bin/computeGCBias_readlen:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('cfDNA-GCcorrection==0.1')
Traceback (most recent call last):
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2606, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2630, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 31

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/public/home/yjq/miniconda3/envs/celfeer_env/bin/computeGCBias_readlen", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/public/home/yjq/tools/cfDNA_GCcorrection/bin/computeGCBias_readlen", line 12, in <module>
main(args)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/public/home/yjq/tools/cfDNA_GCcorrection/cfDNA_GCcorrection/computeGCBias_readlen.py", line 966, in main
r_data = get_ratio(data)
File "/public/home/yjq/tools/cfDNA_GCcorrection/cfDNA_GCcorrection/computeGCBias_readlen.py", line 576, in get_ratio
f_tmp = F_GC.loc[i].to_numpy()
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/core/indexing.py", line 1103, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/core/indexing.py", line 1343, in _getitem_axis
return self._get_label(key, axis=axis)
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/core/indexing.py", line 1293, in _get_label
return self.obj.xs(label, axis=axis)
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/core/generic.py", line 4095, in xs
loc = index.get_loc(key)
File "/public/home/yjq/miniconda3/envs/celfeer_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
raise KeyError(key) from err
KeyError: 31`

When I run the computeGCBias_readlen script with the --precomputed_background parameter and pass the file generated by the computeGCBias_background.py script, the following error occurs. A precomputed background file loses much of its value if a mismatch in chromosome names between the background file and the data being processed makes it unusable.
/public/home/yjq/miniconda3/envs/celfeer_env/bin/computeGCBias_readlen:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('cfDNA-GCcorrection==0.1')
Traceback (most recent call last):
File "/public/home/yjq/miniconda3/envs/celfeer_env/bin/computeGCBias_readlen", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/public/home/yjq/tools/cfDNA_GCcorrection/bin/computeGCBias_readlen", line 12, in <module>
main(args)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/public/home/yjq/.local/lib/python3.8/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/public/home/yjq/tools/cfDNA_GCcorrection/cfDNA_GCcorrection/computeGCBias_readlen.py", line 906, in main
chrom_dict = {
File "/public/home/yjq/tools/cfDNA_GCcorrection/cfDNA_GCcorrection/computeGCBias_readlen.py", line 907, in <dictcomp>
precomputed_chrom_mapping[key]: tuple(value)
KeyError: 'chr1_gl000191_random'
When running the correctGCBias_readlen script, I get an error saying that the -w option does not exist. You may want to check whether the Readme file contains an error.
Error: No such option: -w
/public/home/yjq/miniconda3/envs/celfeer_env/bin/correctGCBias_readlen:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('cfDNA-GCcorrection==0.1')
Usage: correctGCBias_readlen [OPTIONS]
Try 'correctGCBias_readlen --help' for help.
If the BAM files that require GC correction are large, the software demands a correspondingly large amount of memory, and it cannot run if memory is insufficient. I suspect that the use of bedtools might be the cause of this issue.
Finally, thank you sincerely for your reply and assistance. Thank you very much～

sroener · 2025-01-23T18:19:09Z

Hi,

thank you for using the software and reporting the issues you ran into. I'll try to answer your questions as well as possible.

Thank you for pointing the error out. This seems to be an edge case that should be fixed. First, I want to describe how you can proceed with your work. The help message for the -i option was outdated and has now been updated. If activated, it uses splines to interpolate missing values and smooths existing values by considering neighbouring bins. I would now recommend using this option to get better correction values. Runtime should not be impacted too much.

Related to the issue, could you please run the script on one of your BAM files with the options -i --MeasurementOutput and send me the resulting table? These flags interpolate the values but also save a copy of the raw measured values, which would be the input for the function causing the error you reported. It is a simple table containing counts for measured and expected reads, binned by their fragment length and GC content.
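To illustrate how a lookup like `F_GC.loc[i]` can end in `KeyError: 31`, here is a minimal sketch in which a plain dict stands in for the pandas table. It assumes the row labels are fragment lengths and that length 31 was simply never observed; the data and the `row_for` helper are hypothetical, and this is not the fix applied in the software.

```python
# Hypothetical measured counts keyed by fragment length; length 31 was never observed.
measured = {30: [4, 7, 2], 32: [5, 9, 3]}


def row_for(table, length, n_bins):
    """Return the row for `length`, or a row of zeros if that length is absent."""
    return table.get(length, [0] * n_bins)


try:
    measured[31]  # mirrors a label lookup on a table that has no row 31
except KeyError as err:
    print("missing row:", err)

# A tolerant lookup avoids the exception by treating absent rows as empty.
print(row_for(measured, 31, n_bins=3))
```

Whether filling with zeros is appropriate depends on how the downstream ratio is computed, so treat this only as a picture of the failure mode.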

I'm sorry to hear that you expected different behavior. The idea of the script is to create a background file that is representative of multiple files aligned to the same reference genome. That way the computation does not have to be repeated. The pitfall is that all files need to have the same chromosomes, which can sometimes be tricky with non-standard chromosomes. If you are only interested in the standard chromosomes (1-22, X and Y), an easy fix for your issue would be the --standard_chroms option.
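To show what restricting to standard chromosomes means for a name such as `chr1_gl000191_random`, here is a small sketch that splits a list of names into standard and non-standard ones. The regular expression is my assumption of what "standard" covers (1-22, X, Y, with or without a `chr` prefix); it is not taken from the implementation behind --standard_chroms.

```python
import re

# Assumed definition of "standard": autosomes 1-22 plus X and Y, optional "chr" prefix.
STANDARD = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$", re.IGNORECASE)


def split_standard(names):
    """Split chromosome names into (standard, non-standard) lists, preserving order."""
    standard = [n for n in names if STANDARD.match(n)]
    other = [n for n in names if not STANDARD.match(n)]
    return standard, other


names = ["chr1", "chr22", "chrX", "chrY", "chrM", "chr1_gl000191_random"]
standard, other = split_standard(names)
print("standard:", standard)
print("other:", other)
```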
You are right. The -w shorthand for --weights was deprecated but was still in the example code. I updated the documentation. In theory, you could just omit the option, because it is now the default.
Could you give me a bit more information? Which script is demanding a significant amount of memory? How much memory would that be?

Otherwise, it's hard to determine what causes the memory demand. I spent some time optimizing the resource requirements and had no problems on reasonable hardware. One possible cause might be the number of cores. You can think of it as spawning workers that read lots of chunks from your BAM file. If many of them are open at the same time, the memory footprint increases. If you are comfortable doing so, you could profile the memory usage with a memory profiler (I have had good experiences with scalene).
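If installing a profiler is not an option, a rough first check is possible with the standard library. The sketch below uses `tracemalloc` (swapped in here for scalene) to report the peak memory of a function call. The `load_chunks` function is a hypothetical stand-in for workers holding several chunks at once, and `tracemalloc` only sees allocations made through Python's allocator, so memory used inside C libraries would not show up.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced Python memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def load_chunks(n_chunks, chunk_size):
    """Hypothetical stand-in for workers that keep several chunks in memory at once."""
    return [bytearray(chunk_size) for _ in range(n_chunks)]


_, few = peak_memory_mib(load_chunks, 2, 1_000_000)
_, many = peak_memory_mib(load_chunks, 8, 1_000_000)
print(f"2 chunks: {few:.1f} MiB, 8 chunks: {many:.1f} MiB")
```

The peak grows with the number of chunks held at the same time, which is the effect described above for a larger number of cores.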

I hope this helps you in any way.

sroener self-assigned this Jan 9, 2025

sroener mentioned this issue Jan 23, 2025

Hotfix1.0.1 #15

Merged
