-
Notifications
You must be signed in to change notification settings - Fork 1
Add pipeline example for Flares #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
""" WalkthroughA new example pipeline script for the FLARES simulation is introduced. It loads galaxy data from HDF5 files, constructs galaxy objects with computed properties, and distributes processing using MPI and multiprocessing. The script sets up emission models, filters, and instruments, computes synthetic observations including spectra, lines, photometry, and imaging, attaches analysis functions for galaxy metadata and spectral slopes, and saves results to an HDF5 file. Changes
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
FLARES/flares_pipeline_example.py (5)
19-19
: Drop unusedload_FLARES
import
load_FLARES
is never referenced – remove it to keep the import list tidy and avoid pulling in an unnecessary dependency.-from synthesizer.load_data.load_flares import load_FLARES
25-32
: Avoid per-call MPI lookup in_print
Fetching
COMM_WORLD
on every call adds avoidable overhead. Cachecomm
,rank
, and the formatted prefix once at module load.-def _print(*args, **kwargs): - """Overload print with rank info.""" - comm = mpi.COMM_WORLD - rank = comm.Get_rank() - size = comm.Get_size() - print(f"[{str(rank).zfill(len(str(size)) + 1)}]: ", end="") - print(*args, **kwargs) +_COMM = mpi.COMM_WORLD +_RANK = _COMM.Get_rank() +_SIZE = _COMM.Get_size() +_PREFIX = f"[{str(_RANK).zfill(len(str(_SIZE)) + 1)}]: " + +def _print(*args, **kwargs): + """MPI-rank aware print.""" + print(_PREFIX, *args, **kwargs)
196-202
: Update docstring to match signature
filter_collection
&emission_model
are no longer parameters; remove them from the Args section to avoid confusion.
312-331
: Return type & unused loop variable
get_UV_slopes
returns a dict, not afloat
, and the second loop variable is unused.for spec_type, _ in obj.spectra.items(): slopes[spec_type] = ...Also fix the docstring’s “Returns” section to
dict[str, float]
.
334-353
: Mirror the UV-slope cleanup for IRSame unused variable and docstring mismatch apply here; replicate the fix.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
FLARES/flares_pipeline_example.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
FLARES/flares_pipeline_example.py
19-19: synthesizer.load_data.load_flares.load_FLARES
imported but unused
Remove unused import: synthesizer.load_data.load_flares.load_FLARES
(F401)
326-326: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
348-348: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
🪛 Pylint (3.3.7)
FLARES/flares_pipeline_example.py
[refactor] 34-34: Too many local variables (40/15)
(R0914)
[refactor] 184-184: Too many arguments (7/5)
(R0913)
[refactor] 184-184: Too many positional arguments (7/5)
(R0917)
[refactor] 184-184: Too many local variables (24/15)
(R0914)
[error] 259-259: Possibly using variable 'kernel' before assignment
(E0606)
🔇 Additional comments (1)
FLARES/flares_pipeline_example.py (1)
455-460
: Analysis lambdas rely on attributes that may not existOnce the metadata patch above is applied this block is fine; otherwise it will crash. Re-run the pipeline after adding those fields to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
FLARES/flares_pipeline_example.py (1)
257-269
: Function relies on a globalkernel
; pass it explicitly
get_flares_galaxies
dereferenceskernel
, which is injected later in the main-block. Importing this module or re-using the function elsewhere will raiseNameError
. Pass the kernel as a parameter and drop the hidden global dependency.-def get_flares_galaxies(master_file_path, region, snap, nthreads, comm, rank, size): +def get_flares_galaxies(master_file_path, region, snap, nthreads, comm, rank, size, kernel): @@ - for gal in galaxies: + for gal in galaxies: if gal.gas.nparticles > 0: @@ - kernel=kernel, + kernel=kernel,Corresponding call site:
- galaxies = get_flares_galaxies(..., size) + galaxies = get_flares_galaxies(..., size, kernel)
🧹 Nitpick comments (3)
FLARES/flares_pipeline_example.py (3)
19-19
: Remove unusedload_FLARES
import
load_FLARES
is imported but never referenced, triggering a Ruff F401 warning. Dropping it avoids needless namespace pollution.-from synthesizer.load_data.load_flares import load_FLARES
330-334
: Rename throw-away loop variable to_
The loop variabled
is never used, raising Ruff B007. Use_
to signal intent.-for spec_type, d in obj.spectra.items(): +for spec_type, _ in obj.spectra.items():Repeat for the IR-slope loop on lines 352-356.
Also applies to: 352-356
25-32
: Rank-aware_print
produces superfluous leading zeros
zfill(len(str(size)) + 1)
pads one extra digit, so forsize=8
rank 0 prints00
instead of0
.- print(f"[{str(rank).zfill(len(str(size)) + 1)}]: ", end="") + print(f"[{str(rank).zfill(len(str(size)))}]: ", end="")Pure cosmetics, but helps log readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
FLARES/flares_pipeline_example.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
FLARES/flares_pipeline_example.py
19-19: synthesizer.load_data.load_flares.load_FLARES
imported but unused
Remove unused import: synthesizer.load_data.load_flares.load_FLARES
(F401)
330-330: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
352-352: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
🪛 Pylint (3.3.7)
FLARES/flares_pipeline_example.py
[refactor] 34-34: Too many local variables (40/15)
(R0914)
[refactor] 188-188: Too many arguments (7/5)
(R0913)
[refactor] 188-188: Too many positional arguments (7/5)
(R0917)
[refactor] 188-188: Too many local variables (24/15)
(R0914)
[error] 263-263: Possibly using variable 'kernel' before assignment
(E0606)
"--nthreads", | ||
type=int, | ||
help="The number of threads to use.", | ||
) | ||
|
||
# Parse the arguments | ||
args = parser.parse_args() | ||
nthreads = args.nthreads | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nthreads
can be None
→ multiprocessing.Pool()
will raise TypeError
args.nthreads
defaults to None
when the CLI flag is omitted, but mp.Pool(None)
is invalid. Guard or default to os.cpu_count()
.
- nthreads = args.nthreads
+ # Fallback to all logical CPUs when the flag is absent
+ nthreads = args.nthreads or os.cpu_count()
Also consider capping nthreads
to avoid spawning size × nthreads
processes under MPI.
Also applies to: 250-252
🤖 Prompt for AI Agents
In FLARES/flares_pipeline_example.py around lines 250-252 and 368-376, the
nthreads argument can be None if the CLI flag is omitted, causing
multiprocessing.Pool(None) to raise a TypeError. Fix this by setting a default
value for nthreads to os.cpu_count() when args.nthreads is None. Additionally,
add logic to cap nthreads to a reasonable maximum to prevent excessive process
spawning under MPI environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
FLARES/flares_pipeline_example.py (1)
369-378
: Guardnthreads
– None propagates downstream and can oversubscribe MPI ranks
mp.Pool()
acceptsNone
, but later you pass the same value intoPipeline(nthreads=…)
. InPipeline
this is forwarded tonumba.set_num_threads
, which does raise onNone
. Default toos.cpu_count()
and allow users to down-scale explicitly.- args = parser.parse_args() - nthreads = args.nthreads + args = parser.parse_args() + # Fallback to all logical CPUs, but allow explicit down-scaling. + nthreads = args.nthreads or os.cpu_count()
🧹 Nitpick comments (5)
FLARES/flares_pipeline_example.py (5)
19-20
: Remove the dead import to keep the namespace clean
load_FLARES
is never referenced; delete the line to avoid the Ruff F401 warning and unnecessary I/O overhead when the module is imported.-from synthesizer.load_data.load_flares import load_FLARES
235-242
: Variable shadowing hides the outerrank
The dictionary comprehension re-uses the loop variable namerank
, masking the MPI rank received as a function argument and hurting readability.- gals_on_rank = {rank: [] for rank in range(size)} - for i in order: + gals_on_rank = {r: [] for r in range(size)} + for i in order: ... - select = np.argmin(parts_per_rank) - gals_on_rank[select].append(i) - parts_per_rank[select] += s_lens[i] + target = np.argmin(parts_per_rank) + gals_on_rank[target].append(i) + parts_per_rank[target] += s_lens[i]
331-336
: Suppress unused loop variable warnings
d
is unused in both loops; rename to_
(or_d
) to silence Ruff B007 and clarify intent.- for spec_type, d in obj.spectra.items(): + for spec_type, _ in obj.spectra.items():(Apply the same change in
get_IR_slopes
.)
302-317
: Docstring return type is wrong
Both helpers return adict
mapping spectrum types → slope, not a singlefloat
. Adjusting the docstring prevents confusion in interactive use and IDE autocompletion.- float: The UV slope. + dict[str, float]: β slopes keyed by spectrum label.(Replicate for
get_IR_slopes()
.)
250-252
: Pool size might be zero whennthreads
is 0
A user could accidentally pass--nthreads 0
. Guard and fall back to1
.- with mp.Pool(nthreads) as pool: + with mp.Pool(max(1, nthreads)) as pool:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
FLARES/flares_pipeline_example.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
FLARES/flares_pipeline_example.py
19-19: synthesizer.load_data.load_flares.load_FLARES
imported but unused
Remove unused import: synthesizer.load_data.load_flares.load_FLARES
(F401)
332-332: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
354-354: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
🪛 Pylint (3.3.7)
FLARES/flares_pipeline_example.py
[refactor] 34-34: Too many local variables (40/15)
(R0914)
[refactor] 188-188: Too many arguments (7/5)
(R0913)
[refactor] 188-188: Too many positional arguments (7/5)
(R0917)
[refactor] 188-188: Too many local variables (23/15)
(R0914)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
FLARES/flares_pipeline_example.py (1)
377-377
: Handle potential None value for nthreads.The
nthreads
argument can beNone
if the CLI flag is omitted, which will causemultiprocessing.Pool(None)
to raise aTypeError
.- nthreads = args.nthreads + nthreads = args.nthreads or os.cpu_count()
🧹 Nitpick comments (6)
FLARES/flares_pipeline_example.py (6)
19-19
: Remove unused import.The
load_FLARES
import is never used in the code.-from synthesizer.load_data.load_flares import load_FLARES
114-114
: Fix typo in comment."Early exist" should be "Early exit".
- # Early exist if there are fewer than 100 stars + # Early exit if there are fewer than 100 stars
225-225
: Fix typo in comment."Early exist" should be "Early exit".
- # Early exist if there are no galaxies + # Early exit if there are no galaxies
258-274
: Remove commented code or make it conditional.The commented optical depth calculation code should either be removed if not needed or made conditional based on the emission model configuration.
If this code is intended for future use when attenuation is enabled, consider making it conditional:
- # Loop over galaxies and calculate the optical depths - # This is needed if attenuation is included in the emission model - """ - for gal in galaxies: - if gal.gas.nparticles > 0: - # stars - gal.stars.tau_v = gal.get_stellar_los_tau_v( - kappa=0.0795, - kernel=kernel, - ) - # BH - gal.black_holes.tau_v = gal.get_black_hole_los_tau_v( - kappa=0.07, - kernel=kernel, - ) - else: - gal.stars.tau_v = np.zeros(gal.stars.nparticles) - gal.black_holes.tau_v = np.zeros(gal.stars.nparticles) - """ + # Calculate optical depths if attenuation is enabled + # (Currently disabled for intrinsic emission model) + if hasattr(model, 'dust_curve') and model.dust_curve is not None: + for gal in galaxies: + if gal.gas.nparticles > 0: + gal.stars.tau_v = gal.get_stellar_los_tau_v( + kappa=0.0795, + kernel=kernel, + ) + gal.black_holes.tau_v = gal.get_black_hole_los_tau_v( + kappa=0.07, + kernel=kernel, + ) + else: + gal.stars.tau_v = np.zeros(gal.stars.nparticles) + gal.black_holes.tau_v = np.zeros(gal.black_holes.nparticles)
332-332
: Remove unused loop variable.The loop variable
d
is not used within the loop body.- for spec_type, d in obj.spectra.items(): + for spec_type, _d in obj.spectra.items():
354-354
: Remove unused loop variable.The loop variable
d
is not used within the loop body.- for spec_type, d in obj.spectra.items(): + for spec_type, _d in obj.spectra.items():
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
FLARES/flares_pipeline_example.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
FLARES/flares_pipeline_example.py
19-19: synthesizer.load_data.load_flares.load_FLARES
imported but unused
Remove unused import: synthesizer.load_data.load_flares.load_FLARES
(F401)
332-332: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
354-354: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
🪛 Pylint (3.3.7)
FLARES/flares_pipeline_example.py
[refactor] 34-34: Too many local variables (40/15)
(R0914)
[refactor] 188-188: Too many arguments (7/5)
(R0913)
[refactor] 188-188: Too many positional arguments (7/5)
(R0917)
[refactor] 188-188: Too many local variables (23/15)
(R0914)
region = "04" # region | ||
snap = "008_z007p000" # the fourth snapshot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Make region and snapshot configurable.
The region and snapshot should be configurable through command-line arguments rather than hardcoded.
parser.add_argument(
"--region",
type=str,
default="04",
help="FLARES region to process."
)
parser.add_argument(
"--snapshot",
type=str,
default="008_z007p000",
help="Snapshot to process."
)
🤖 Prompt for AI Agents
In FLARES/flares_pipeline_example.py around lines 406 to 407, the region and
snapshot variables are hardcoded. Modify the script to accept these values as
command-line arguments by adding argparse arguments for --region and --snapshot
with the provided defaults. Replace the hardcoded assignments with the parsed
argument values to make region and snapshot configurable at runtime.
grid_dir = '/yourpath/grids/' | ||
grid_name = 'bpass-2.2.1-bin_chabrier03-0.1,100.0_cloudy-c23.01-sps' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Replace hardcoded paths with configurable options.
The hardcoded paths should be made configurable through command-line arguments or configuration files to improve usability.
Consider adding command-line arguments for these paths:
parser.add_argument(
"--grid-dir",
type=str,
default="/yourpath/grids/",
help="Directory containing the grids."
)
parser.add_argument(
"--master-file",
type=str,
default="/cosma7/data/dp004/dc-payy1/my_files/flares_pipeline/data/flares.hdf5",
help="Path to the FLARES master file."
)
parser.add_argument(
"--output-file",
type=str,
default="/yourpath/pipeline_results.hdf5",
help="Output file path for pipeline results."
)
Then use these arguments instead of hardcoded paths.
Also applies to: 394-395, 405-405, 479-479
🤖 Prompt for AI Agents
In FLARES/flares_pipeline_example.py around lines 380-381, replace the hardcoded
grid_dir and grid_name paths with command-line arguments using argparse. Add
arguments for grid-dir, master-file, and output-file with appropriate default
values and help descriptions. Then update the code to use these argument values
instead of the hardcoded strings. Apply similar changes to lines 394-395, 405,
and 479 to ensure all hardcoded paths are configurable via command-line options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (6)
FLARES/flares_pipeline_example.py (6)
380-381
: Make hardcoded paths configurable.These hardcoded paths were flagged in previous reviews but remain unaddressed. Consider adding command-line arguments for better usability.
+ parser.add_argument( + "--grid-dir", + type=str, + default="/yourpath/grids/", + help="Directory containing the grids." + )
394-395
: Make filter file paths configurable.These hardcoded filter file paths should be configurable through command-line arguments as suggested in previous reviews.
405-407
: Make master file path and processing parameters configurable.The master file path, region, and snapshot should be configurable rather than hardcoded, as suggested in previous reviews.
479-479
: Make output path configurable.The output file path should be configurable through command-line arguments as suggested in previous reviews.
250-251
: Fix potential TypeError with None nthreads.When
--nthreads
is not provided,args.nthreads
will beNone
, causingmp.Pool(None)
to raise aTypeError
. This issue was flagged in previous reviews but remains unaddressed.+ # Handle None nthreads case + if nthreads is None: + nthreads = os.cpu_count() # Get all the galaxies using multiprocessing with mp.Pool(nthreads) as pool:
369-377
: Address the nthreads None handling and consider more configurability.Two issues:
args.nthreads
can beNone
, which causes issues downstream- Previous reviews suggested making region, snapshot, and paths configurable
parser.add_argument( "--nthreads", type=int, + default=os.cpu_count(), help="The number of threads to use.", ) + parser.add_argument( + "--region", + type=str, + default="04", + help="FLARES region to process." + ) + parser.add_argument( + "--snapshot", + type=str, + default="008_z007p000", + help="Snapshot to process." + )
🧹 Nitpick comments (3)
FLARES/flares_pipeline_example.py (3)
19-19
: Remove unused import.The
load_FLARES
import is not used in this script and should be removed to keep the code clean.-from synthesizer.load_data.load_flares import load_FLARES
332-332
: Fix unused loop variable.The loop variable
d
is not used within the loop body. Rename it to_
to indicate it's intentionally unused.- for spec_type, d in obj.spectra.items(): + for spec_type, _ in obj.spectra.items():
354-354
: Fix unused loop variable.The loop variable
d
is not used within the loop body. Rename it to_
to indicate it's intentionally unused.- for spec_type, d in obj.spectra.items(): + for spec_type, _ in obj.spectra.items():
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
FLARES/flares_pipeline_example.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
FLARES/flares_pipeline_example.py
19-19: synthesizer.load_data.load_flares.load_FLARES
imported but unused
Remove unused import: synthesizer.load_data.load_flares.load_FLARES
(F401)
332-332: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
354-354: Loop control variable d
not used within loop body
Rename unused d
to _d
(B007)
🪛 Pylint (3.3.7)
FLARES/flares_pipeline_example.py
[refactor] 34-34: Too many local variables (40/15)
(R0914)
[refactor] 188-188: Too many arguments (7/5)
(R0913)
[refactor] 188-188: Too many positional arguments (7/5)
(R0917)
[refactor] 188-188: Too many local variables (23/15)
(R0914)
🔇 Additional comments (4)
FLARES/flares_pipeline_example.py (4)
25-32
: LGTM - Clean MPI-aware print wrapper.The function properly handles MPI rank information for distributed debugging.
118-124
: LGTM - Metadata attributes properly added.The Galaxy object now includes all required metadata attributes (master_index, region, grp_id, subgrp_id) that were missing in previous versions. This addresses the downstream AttributeError issue.
279-316
: LGTM - Well-structured filter loading functions.Both functions properly handle file caching and use appropriate filter configurations for JWST NIRCam and UVJ observations.
440-479
: LGTM - Comprehensive pipeline execution with one dependency.The pipeline setup and execution is well-structured, covering all major synthetic observation types (spectra, lines, photometry, imaging) and including appropriate analysis functions. However, this depends on fixing the
nthreads
None handling issue mentioned earlier.
Here I have added an example showing how to run Pipeline for Flares, using a data loader from @WillJRoper . This example shows how to load info from the Flares Master file on Cosma to then generate spectra, photometry, and imaging.
Summary by CodeRabbit