All notable changes, updates, and fixes to pod5 will be documented here.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Removed use of python build when building wheel in cmake.
- ArrowTableHandle stream member to store the BatchFileReader backend
- ArrowTableHandle options argument to pass in IpcReadOptions
- pod5::default_memory_pool function, which selects an appropriate memory pool even on large page systems.
- Refactored multi-threading in DatasetReader to prevent "too many open files" errors
- Updated dependency to pyarrow~=18.0.0 for python>=3.9
- Relaxed h5py python dependency
- Support for python 3.13.
- Removed use of Boost. This does not affect the C interface, but may require changes to consumers of the C++ headers.
- Refactored directio writing engine to open up async io support.
- Fixed Boost version compatibility checking in Conan packages.
- New end reason for reads terminated due to an analysis configuration change.
- Reduced allocations when compressing signal.
- Crash when searching empty file for reads.
- Ability to disable flushing on batch complete
- Use new LinuxOutputStream to cache allocations and reduce memory when writing many files.
- Move svb headers to correct subdirectory in
- svb16 headers packaged with pod5
- Directio output now writes on batch complete without flushing explicitly.
- Added new end reasons "api_request" and "device_data_error" to allow for new read end reasons that future MinKNOW versions will generate.
- Allow directio to specify the chunk size directly.
- gcc8 builds
- Instability when creating a pod5 writer fails.
- Issue with directio mode where space is over-reserved.
- Fixed issues reading signal from uncompressed pod5 files.
- Type checking on Writer.add_reads to inform users who incorrectly pass ReadRecords
- Compatibility with numpy 2.0.
- DatasetReader correctly handles string paths
- Required pypa project metadata.
- Dropped support for OSX builds with Xcode < 14.2.
- ReadRecord.to_read() missing fields
- Conan Windows upload job failures due to differing line endings.
- CI package uploading to PyPI following API token migration.
- Documentation for some functions.
- Explicitly sized type in pod5_vbz_decompress_signal().
- CI execution of tests.
- Updated pre-commit to clang-format-17.
- Updated Arrow to 12.0.0.
- Polars ColumnNotFoundError: not_set introduced by polars==0.20.0
- Arrow build flags in conanfile are now configured in the configure() function rather than being default options.
- boost_internal_build flag in conanfile.
- CI now builds with the above flag turned on.
- CI for appleclang 14
- cppstd builds
- Support for Python 3.12
- Logging no longer calls basicConfig, which could unintentionally modify users' logging configuration
- Transfer dataframes used in subset / filter now use categorical fields to reduce memory consumption
- Polars version increased to ~=0.19
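The basicConfig entry above reflects a general rule for libraries. Calling logging.basicConfig() attaches a handler to the root logger, and the root logger belongs to the host application. The usual alternative is to log through a named logger that carries a NullHandler and leave handler setup to the caller. The logger name "pod5_example" and the do_work and root_handler_count functions below are hypothetical illustrations, not pod5's actual code.

```python
import logging

# Hypothetical library-side logger; the name is illustrative, not pod5's real logger.
logger = logging.getLogger("pod5_example")
# A NullHandler suppresses "no handler" fallback output without touching the root logger.
logger.addHandler(logging.NullHandler())


def do_work() -> int:
    """Library function that logs without configuring global logging."""
    logger.debug("doing work")
    return 42


def root_handler_count() -> int:
    """Return how many handlers are attached to the application's root logger."""
    return len(logging.getLogger().handlers)
```

An application that wants to see these messages configures logging itself, for example by calling logging.basicConfig(level=logging.DEBUG) in its own entry point.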
- Documentation regarding positional arguments
- Renamed deprecated polars.groupby to polars.group_by
- Fixed a bug in the build scripts that prevented iOS and Windows Conan packages from being uploaded.
- Remove exposed artifactory URL env var from gitlab ci config.
- convert to_fast5 writes byte-encoded read_ids to match MinKNOW (was str)
- Removed python3.7 support
- Corrected the visibility of dependencies when building pod5 as a shared library.
- Added compression status to pod5 inspect summary <file>
- Added environment override "POD5_DISABLE_MMAP_OPEN" to force non-mmapped opening of files.
- Remove exposed artifactory URL env var from gitlab ci config.
- convert to_fast5 writes byte-encoded read_ids to match MinKNOW (was str)
- DatasetReader class for reading collections of pod5 files
- Return index errors when querying invalid errors from APIs
- Recursive search for files now traverses symbolic links and ignores hidden files
- Tweak block size of directio writes to 1MB.
- Write pod5 files using DirectIO on Linux platforms (performance)
- Shared builds to conan
- num_minknow_events field description from int8 to uint64
- ReadRecord.num_minknow_events return type-hint from float to int
- Increased numpy minimum version to >= 1.21.0
- Improved performance of subset, filter and merge tools.
- Repacker.wait and Repacker.waiter parameters
- Some Repacker.wait and Repacker.waiter parameters are deprecated and issue FutureWarning
- Repacker.is_complete returning True when work is queued.
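The FutureWarning deprecation above follows a common Python pattern for retiring a keyword argument while still accepting it. The sketch below assumes a hypothetical wait function with a hypothetical deprecated interval parameter. Neither is the real Repacker.wait signature.

```python
import warnings
from typing import Optional


def wait(timeout: Optional[float] = None, interval: Optional[float] = None) -> Optional[float]:
    """Hypothetical function showing how a deprecated keyword can emit FutureWarning.

    `interval` stands in for a deprecated parameter: it is still accepted but ignored.
    """
    if interval is not None:
        # stacklevel=2 attributes the warning to the caller's line, not this function.
        warnings.warn(
            "'interval' is deprecated and will be removed in a future release",
            FutureWarning,
            stacklevel=2,
        )
    return timeout
```

FutureWarning is shown to end users by default, whereas DeprecationWarning is hidden outside __main__ and test runners. That makes FutureWarning the more visible choice for parameters that users call directly.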
- Add API (pod5_open_file_options) to prevent pod5 from opening a file using mmap, instead using direct file IO.
- Default field values (empty string) when converting fast5 files with missing fields
- Corrected Oxford Nanopore Technologies company name in package metadata to use Public Limited Company (Plc) instead of Limited (Ltd)
- Limited the number of processes created when specifying --threads to the number of CPU cores available (os.cpu_count())
- Reduced the default value for --threads from 8 to 4 to improve stability on resource-constrained systems
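The two --threads entries above amount to clamping the requested worker count to the number of CPUs the system reports, and lowering the default. The sketch below shows that clamping. It assumes a hypothetical effective_threads helper and a hypothetical DEFAULT_THREADS constant, not pod5's actual implementation.

```python
import os


def effective_threads(requested: int) -> int:
    """Clamp a requested worker count to the available CPUs (hypothetical helper).

    os.cpu_count() may return None, so fall back to 1 in that case.
    The result is also never below 1.
    """
    available = os.cpu_count() or 1
    return max(1, min(requested, available))


# Hypothetical constant mirroring the default of 4 noted in this changelog.
DEFAULT_THREADS = 4
```

On an 8-core machine, effective_threads(16) gives 8 and effective_threads(DEFAULT_THREADS) gives 4.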
- Add API error when adding reads with invalid end reason, pore type or run info.
- Update internal arrow lib to not export flatbuffers symbols.
- pod5 view tool to view / inspect pod5 files as tables. Gives a >200x speed improvement compared to pod5 inspect reads
- pod5 recover tool to recover data from corrupted / truncated pod5 files
- pod5 update documentation
- Source distributions to PyPI
- pod5 subset and pod5 filter use polars to parse inputs
- pod5 subset and pod5 filter CSV formatting requirements tightened
- pod5 tools which use multiple pod5 file inputs now accept directories, which can be searched recursively with -r/--recursive
- pod5 subset --read-id-column argument abbreviation changed from -r to -R so that -r/--recursive is consistent across all tools
- pod5 tools use hyphens in all arguments (e.g. --force-overwrite and --read-id-column)
- pod5 merge and pod5 update use a named -o/--output argument instead of a positional output argument to standardise tools
- pod5 update progress bar and better detection of name conflicts
- Minimised number of open file handles in tools to prevent "Too many open files" errors
- Logging added to merge, filter and subset. Enabled with POD5_DEBUG=1
- pod5 inspect reads deprecated in favour of pod5 view
- Exception raised when calling pod5 without any arguments
- Exception raised in pod5 convert fast5 where closed writers were reopened after being closed by a caught exception
- Fixed Gitlab 38: pod5_get_end_reason and pod5_get_pore_type ignoring input string length checks.
- pod5 subset --json mapping arguments
- pod5 merge --chunk-size argument
- ReadTableVersion replaced with an integer value
- Repacker reads_completed value while copying a selection of reads.
- Fixed crash when trying to load files with a bad footer.
- Fixed merging many files exceeding the size limit of dictionary indices.
- pod5 convert fast5 now creates logs when POD5_DEBUG=1 is set
- pod5 convert fast5 checks multi-read fast5s at conversion time
- Fixed memory usage growth over time as signal was loaded with large pod5 files.
- Fixed crash loading malicious files (found via fuzz testing)
- Fixed leaks and UB when running unit tests.
- Fixed run-away memory consumption during fast5 conversion
- Updated internal arrow version to 8.0.0.3
- Fixed issue where pod5 would read out of bounds memory when decompressing some reads.
- Refactored pod5 convert fast5 to use concurrent.futures only.
- Add further info to error message when signal cannot be decompressed by zstd
- Make merge operation not generate multiple identical run infos.
- Fixed closing uninitialised file handles.
- Fixed pod5 inspect reads repeating header
- Fixed a crash with certain pod5 search operations.
- Fix loading large pod5 files on virtual-memory limited systems.
- Added --output argument to pod5 convert fast5 and to_fast5, replacing the positional argument of the same name
- Added --strict argument to pod5 convert fast5 to promptly stop on exceptions
- Added readthedocs documentation links in README.md
- Updated developer installation instructions to use conan<2
- Reworked pod5 convert fast5 to tolerate runtime exceptions
- Use same type run_info_index_t for pod5_get_file_run_info_count and pod5_get_file_run_info.
- Fixed file handle leak in repacker
- Python API supports python 3.11
- Added missing python API wheels on windows
- Changed python API dependency version pyarrow~=11.0.0 from 8.0.0 to support python 3.11
- Changed python API dependency version hdf5~=8.0.0 from v7.0.0 to support python 3.11
- Added pod5_get_read_count to find the count of all reads in file
- Added pod5_get_read_ids to retrieve all read ids in file
- Added pod5_get_file_run_info to retrieve a run info at an absolute index in the file
- Added pod5_free_run_info to free run infos (replaces pod5_release_run_info)
- Added pod5_get_file_run_info_count to find the number of run infos in a file
- Added pod5 filter tool to subset pod5 files with a simple list of read ids
- Added tqdm progress bar to pod5 subset (disable with POD5_PBAR=0)
- Reworked pod5 subset to give better control over resources used
- pod5 subset can now parse csv and tsv tables / summaries
- pod5 repack now repacks all inputs one-to-one
- Deprecated pod5_release_run_info (see pod5_free_run_info)
- Removed filepath header line from pod5 inspect reads
- Added version attributes to lib-pod5
- Versioning now controlled by VCS inspection using setuptools_scm
- Added more read_id getter methods to Reader
- Added support for python 3.8 + 3.10 on windows
- Added gcc7 linux build of pod5
- Update to zlib 1.2.13
- Update to zstd 1.5.4
- Pinned pre-commit=v2.21.0 while supporting python3.7
- Reworked pod5 convert to_fast5 output filenames to allow for 1-1 mapping
- Fixed pod5 inspect read
- Fixed pod5 convert to_fast5 creating an empty fast5 output
- Fixed pod5 convert to_fast5 ignoring the --force_overwrite argument
- Fixed issue where thread_pool.h wasn't shipped.
- Explicitly re-exported lib-pod5 public symbols and added py.typed marker file to support type-checking.
- Fixed issue where closing many pod5 files in sequence is slow.
- Fixed incorrect python types and adopted python type-checking.
- Linux python 3.11 wheels
- ReadTheDocs documentation support
- OSX arm64 wheel naming corrections - works with wider set of python executables
- Added Reader.__iter__ method.
- Renamed EndReason.name to EndReason.reason to access the inner enum and added EndReason.name as a property to return the string representation of this enum value.
- BaseRead, Read, CompressedRead, Calibration and Pore dataclasses are now mutable.
- Removed deprecated Writer functions.
- Fixed osx arm64 wheel compatibility for older python versions.
- Fixed EndReason type errors.
- Fixed EndReason in pod5 to fast5 conversion.
- Optimised the file writing utilities
- Restricted exported boost dependencies of conan package to just the boost::headers component.
- Documentation edits
- Writer.add_reads now handles both Read and CompressedRead.
- Deprecated Writer methods add_read_object and add_read_objects in favour of add_read and add_reads respectively.
- Removed direct pod5 tool scripts.
- Fixed name of internal utils - "pad_file".
- Fixed spelling of various internal variables.
- Fixed pod5 convert to_fast5
- Reformat c++ code with more consistent format file.
- Added pod5 tools entry-point
- Added API to query file version information as written on disk.
- Fixed signal_chunk_size type error in convert-from-fast5
- Replaced ont_fast5_api dependency with vbz_h5py_plugin
- Restructured Python packaging to include lib_pod5_format, which contains the native bindings built with pybind11. pod5_format and pod5_format_tools are now pure python packages which depend on lib_pod5_format
- Python packages pod5_format and pod5_format_tools have been merged into a single pod5 pure-python package.
- pod5-convert-from-fast5 --output-one-to-one reworked so that output files maintain the input structure, making this argument more flexible and avoiding filename clobbering.
- Added missing lib_pod5.update_file function to pyi.
- pod5-convert-from-fast5 output now takes existing directories and writes output.pod5 (current behaviour) or creates a new file with the given name if it doesn't exist.
- Renamed arguments in tools relating to multi-processing / multi-threading from -p/--processes to the more common -t/--threads.
- Fixed pod5-inspect erroring when loading data.
- Fixed issue where some files in between 0.34 - 0.38 wouldn't load correctly.
- Fixed migrating of large files from older versions.
- Fixed building against the c++ api - previously missing include files.
- All data in the read table that was previously contained in dictionaries of structs is now stored in the read table, or a new "run info" table.
This change simplifies data access into the pod5 files, and helps users who want to convert the pod5 data to pandas or other arrow-compatible reader formats.
Old data is migrated on load and will continue to work; data can be permanently migrated using the pod5-migrate tool.
- Support for opening and writing "split" pod5 files. All APIs now expect and return combined pod5 files.
- Updated Conan recipe to support building without specifying C++ standard version.
- Bump the Boost and Arrow versions to pick up latest changes.
- Support C++17 + C++20 with the conan package pod5 generates.
- Modified pod5_format_tools/pod5_convert_to_fast5.py to separate pod5_convert_to_fast5_argparser() and convert_from_fast5() out from pod5_convert_from_fast5.main().
- Added num_samples field to read table, containing the total number of samples a read contains. The field is filled in by the API if it doesn't exist.
- File version is now V2, due to the addition of num_samples.
- Fixed a crash, discovered by Dorado testing, where multi-threaded access to a single batch could fail.
- Fixed help text in convert to fast5 script.