Write speedups for multi-tiles runs #461
This PR adds a couple of -- simple but efficient -- "tricks" to speed up the writing of the individual `fba-TILEID.fits` files:

- `write_assignment_fits()`: when N>1 tiles are run at once, those are written sequentially in the current main; the trick is to optionally write them in parallel;
- use the `desitarget.geomask.match()` function, instead of building a dictionary with a `TARGETID` for each key, which takes a long time when dealing with 10M+ targets (see the sketch after this list).

Those two "tricks" are controlled by two new arguments in `parse_assign()` in `fiberassign/scripts/assign.py`:

- `--write_fits_numproc` (default=0): number of jobs to be run in parallel when calling `write_assignment_fits()`;
- `--fast_match` (default=False).

The use case is when one runs N>>1 tiles at once with tens of millions of targets (I developed this to explore the expected fiberassign when the GD1 tiles are added to the Bright program).
The default values disable the "tricks", so this is backwards-compatible, and this will change nothing in operations.
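To illustrate the `fast_match` idea, here is a minimal sketch contrasting the dictionary-based lookup with a `desitarget.geomask.match()`-based one. The array names are made up for the example, and I assume the `(ii_A, ii_B)` return convention with `A[ii_A] == B[ii_B]`; this is not the actual fiberassign code:

```python
import numpy as np
from desitarget.geomask import match

# Hypothetical inputs: TARGETIDs assigned on one tile, and the
# TARGETIDs of the full input catalog (both assumed unique).
assigned_tids = np.array([30, 10, 50], dtype=np.int64)
catalog_tids = np.array([10, 20, 30, 40, 50], dtype=np.int64)

# Slow path (schematically what the current main does): build a
# dictionary with one key per TARGETID; costly for 10M+ targets.
row_of = {tid: i for i, tid in enumerate(catalog_tids)}
rows_slow = np.array([row_of[tid] for tid in assigned_tids])

# Fast path: match() returns indices ii_a, ii_c such that
# assigned_tids[ii_a] == catalog_tids[ii_c]; it assumes unique
# TARGETIDs and performs no correctness checks.
ii_a, ii_c = match(assigned_tids, catalog_tids)
rows_fast = np.zeros(assigned_tids.size, dtype=int)
rows_fast[ii_a] = ii_c

assert np.array_equal(rows_slow, rows_fast)
```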
Test case with the GD1 tiles:

This case had ~800 tiles, and 7M targets plus 43M sky targets, so 50M targets in total.
The actual part of running the fiberassign is reasonably fast (1h), but the writing part was not.
Here are a few timings to write the ~800 `fba-TILEID.fits` files (i.e. once the fiberassign is computed):

- `--fast_match True --write_fits_numproc 0`: 1h30
- `--fast_match True --write_fits_numproc 64`: 13min
- `--fast_match True --write_fits_numproc 128`: 10min (no significant improvement w.r.t. 64 processes)
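For context, such a run would be launched along these lines (hedged example: the two new flags are from this PR, but the other arguments are placeholders, not a verified `fba_run` command line):

```
fba_run --targets targets.fits skies.fits \
        --footprint gd1-tiles.fits \
        --dir $OUTDIR \
        --fast_match True --write_fits_numproc 64
```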
Note about the `fast_match` trick:

- It assumes that the input `TARGETID` values are unique. If that is not the case, the results will be wrong (i.e. the column values will be wrongly filled); but that is probably also the case for the current main... (I don't have a clear, in-depth understanding of that process.)
- As the goal is speed, no checks are done (uniqueness of the input `TARGETID` values, correctness of the matching); a sketch of such a check follows this list.
- Note to myself: a sanity check is planned to be implemented in `desitarget.geomask.match()` (desihub/desitarget#811); once that is done, I will have to make sure that such a check is disabled in the call here, to preserve the speed-up.
- For sanity, I've rerun 1% of the main tiles (125 backup, bright, and dark tiles) with the current main and with the `--fast_match` option turned on, and I get no difference in the output data.
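For reference, the kind of check being skipped would look something like this (a minimal sketch, not the planned desihub/desitarget#811 implementation):

```python
import numpy as np

def check_unique_targetid(targetids):
    """Raise if the TARGETID array contains duplicates.

    Deliberately not called in the fast path: it adds an extra
    O(N log N) pass over the 10M+ targets we are trying to speed up.
    """
    targetids = np.asarray(targetids)
    if np.unique(targetids).size != targetids.size:
        raise ValueError("duplicate TARGETIDs: fast_match results would be wrong")
```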
Note about the `write_fits_numproc` trick:

I had to use `multiprocessing.pool.ThreadPool` instead of the "usual" `multiprocessing.Pool`, because the latter was crashing with the `Assignment()` object `asgn` (see the sketch below).
I found this workaround with a quick google search, and it worked.
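Schematically, the parallel writing looks like this; `write_tile()` is a hypothetical per-tile wrapper, not the actual code, and the pickling explanation in the comment is my guess as to why `multiprocessing.Pool` crashed:

```python
from multiprocessing.pool import ThreadPool

def write_tile(args):
    """Hypothetical wrapper: write fba-TILEID.fits for a single tile."""
    asgn, tile_id, out_dir = args
    # ... call the existing single-tile writing machinery here ...

def write_all_tiles(asgn, tile_ids, out_dir, numproc=0):
    tasks = [(asgn, tile_id, out_dir) for tile_id in tile_ids]
    if numproc > 0:
        # ThreadPool shares asgn between worker threads; the "usual"
        # multiprocessing.Pool would try to pickle the (C++-backed)
        # Assignment object to ship it to worker processes, which
        # presumably is what crashed.
        with ThreadPool(numproc) as pool:
            pool.map(write_tile, tasks)
    else:
        # numproc=0 (default): sequential writing, as in the current main.
        for task in tasks:
            write_tile(task)
```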
@dstndstn: in case you have time to have a look, I'd appreciate it! But there is no rush.