Fix two bugs with split_file_to_collection (#1358)
* fix two bugs

* allow chunksize to be greater than the number of records in the input file without failing
* allow an empty file to be split without an error

* bump version number
simonbray authored Nov 23, 2023
1 parent cffacdd commit 37305ce
Showing 2 changed files with 3 additions and 2 deletions.
```diff
@@ -224,6 +224,7 @@ def split_by_record(args, in_file, out_dir, top, ftype):
         for i in range(top):
             f.readline()
         n_records = 0
+        last_line_matched = False
         for line in f:
             if (num == 0 and re.match(sep, line) is not None) or (
                 num > 0 and n_records % num == 0
```
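The only addition in this hunk initializes `last_line_matched` before the record-counting loop. Assuming the unchanged code after the hunk reads that flag once the loop ends (the hunk does not show this), an empty input file would never enter the loop body, leaving the name unbound and raising `UnboundLocalError`, which would match the commit's "empty file" bug. The sketch below reproduces that pattern with a hypothetical `count_records` helper that models only the separator-regex branch (`num == 0`); the helper name and its `initialize_flag` switch are illustrative, not part of the tool.

```python
import io
import re


def count_records(f, sep, top=0, initialize_flag=True):
    """Hypothetical helper: count lines that start with the separator regex `sep`.

    Only the separator branch (num == 0) of the hunk is modelled here.
    `initialize_flag=False` mimics the assumed pre-fix code, where
    `last_line_matched` was only bound inside the loop.
    """
    for _ in range(top):
        f.readline()  # skip `top` header lines
    n_records = 0
    if initialize_flag:
        last_line_matched = False  # the line added by this commit
    for line in f:
        if re.match(sep, line) is not None:
            n_records += 1
            last_line_matched = True
        else:
            last_line_matched = False
    # Reading the flag after the loop stands in for its (assumed) later use;
    # on an empty file without initialization this raises UnboundLocalError.
    return n_records, last_line_matched


if __name__ == "__main__":
    sep = re.compile(">")  # FASTA-style record separator, chosen for illustration
    print(count_records(io.StringIO(">a\nACGT\n>b\nGGCC\n"), sep))  # (2, False)
    print(count_records(io.StringIO(""), sep))  # (0, False)
```

Calling `count_records(io.StringIO(""), sep, initialize_flag=False)` reproduces the assumed pre-fix behaviour and raises `UnboundLocalError`.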
```diff
@@ -241,7 +242,7 @@ def split_by_record(args, in_file, out_dir, top, ftype):
     if chunksize == 0: # i.e. no chunking
         n_per_file = n_records // numnew
     else:
-        numnew = n_records // chunksize
+        numnew = max(n_records // chunksize, 1) # should not be less than 1
         n_per_file = chunksize
 
     # make new files
```
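In the chunked branch, `n_records // chunksize` is floor division, so before this fix any input with fewer records than `chunksize` (including an empty file) produced `numnew == 0`, presumably leaving no output file to write into. Clamping with `max(..., 1)` guarantees that at least one file is planned. The hypothetical `plan_chunks` helper below isolates that arithmetic; its name and the `clamp` flag that toggles the fix are illustrative, and the unchanged code that later distributes records across files is not reproduced.

```python
def plan_chunks(n_records, chunksize, numnew, clamp=True):
    """Hypothetical helper reproducing the branch in the second hunk.

    Returns (numnew, n_per_file). `clamp=False` mimics the pre-fix line.
    """
    if chunksize == 0:  # i.e. no chunking: split into `numnew` files
        # assumes numnew > 0 in this mode, as the original division requires
        n_per_file = n_records // numnew
    else:
        numnew = n_records // chunksize  # floor division: 0 if n_records < chunksize
        if clamp:
            numnew = max(numnew, 1)  # the fix: should not be less than 1
        n_per_file = chunksize
    return numnew, n_per_file


if __name__ == "__main__":
    print(plan_chunks(5, 10, 0, clamp=False))  # (0, 10): no output file planned
    print(plan_chunks(5, 10, 0))  # (1, 10)
    print(plan_chunks(0, 10, 0))  # (1, 10): an empty input still gets one file
    print(plan_chunks(10, 0, 3))  # (3, 3)
```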
```diff
@@ -1,4 +1,4 @@
-<tool id="split_file_to_collection" name="Split file" version="0.5.0">
+<tool id="split_file_to_collection" name="Split file" version="0.5.1">
     <description>to dataset collection</description>
     <macros>
         <xml name="regex_sanitizer">
```