[BUG] H5 Files with split_output renaming scheme duplicates name #38

Open
bear-is-asleep opened this issue Jan 30, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@bear-is-asleep
Contributor

Describe the bug

base:
  split_output: true

This setting causes the file name to be duplicated, which seems unnecessary. For example, I have a list of files:

larcv_electron.root
larcv_muon.root
larcv_pion.root

And the output is

larcv_electron_spine.h5
larcv_muon_spine.h5
larcv_pion_spine.h5

That is fine, but if I want to run a new post-processor on these files (even after moving them to a new folder), I get:

larcv_electron_larcv_electron_spine.h5
larcv_muon_larcv_muon_spine.h5
larcv_pion_larcv_pion_spine.h5

and so on

To Reproduce
My YAML configuration:

# Load HDF5 files
io:
  reader:
    name: hdf5
  writer:
    name: hdf5
    overwrite: false
    file_name: null
    keys:
      - run_info
      - meta
      - points
      - points_label
      - points_g4
      - depositions
      - depositions_label
      - depositions_q_label
      - depositions_g4
      - sources
      - sources_label
      - reco_particles
      - truth_particles
      - reco_interactions
      - truth_interactions
      - flashes
      - flashes_xa

# Base configuration
base:
  split_output: true
  iterations: -1
  overwrite_log: true

# Build output representations
build:
  mode: both
  units: cm
  fragments: false
  particles: true
  interactions: true

# Run post-processors
post:
  flash_match:
    method: likelihood
    cfg: /sdf/data/neutrino/bearc/spine/prod/pgun_uniform/flashmatch_pgun.cfg
    flash_key: flashes
    volume: tpc
    detector: sbnd
    #Light yield calculation - https://arxiv.org/pdf/1909.07920
    scaling: 1 #1/555 (If reprocessing data that's already been scaled, use 1)
    alpha: 0.21
    #recombination_mip: 0.277 #based on fmatch BNB studies - to move bias to 0
    recombination_mip: 0.6 #default

To run

. /sdf/data/neutrino/bearc/spine/prod/pgun_uniform/configure_myspineprod.sh
bash $MLPROD_BASEDIR/run.sh --config fmatch.cfg -s h5_files.txt --flashmatch --ntasks 3 --time 00:30:00 --partition ampere --account neutrino:icarus-ml

Expected behavior
The input file name should not be repeated in these cases, although I'm honestly not sure how this should be handled.

Code base
Provide the following:

Additional context
It may just be that my SPINE version is old; let me know if this is the case.

@bear-is-asleep bear-is-asleep added the bug Something isn't working label Jan 30, 2025
@francois-drielsma
Contributor

Hi @bear-is-asleep! If I run the above procedure (produce *_spine.h5 files first, then run the exact configuration above in SPINE), I simply get a set of *_spine_spine.h5 files, which is what I would expect (the default suffix is appended when none is specified explicitly). Specifically:

python3 bin/run.py -c fmatch.cfg -s larcv_muon_spine.h5 larcv_electron_spine.h5 larcv_pion_spine.h5

simply adds a _spine suffix to the file names.
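A minimal sketch of the naming rule described here, assuming each run takes the input file's stem and appends `_<suffix>.h5` with the default suffix `spine`. The helper `expected_output_name` is hypothetical and only mirrors the behavior reported in this thread; it is not SPINE's actual implementation.

```python
from pathlib import Path


def expected_output_name(input_path: str, suffix: str = "spine") -> str:
    """Hypothetical helper: append one suffix to the input stem, use .h5.

    Assumption: mirrors the expected behavior described in this thread,
    not the real SPINE/spine_prod code.
    """
    stem = Path(input_path).stem  # e.g. "larcv_muon" from "larcv_muon.root"
    return f"{stem}_{suffix}.h5"


# First pass over the LArCV ROOT files
inputs = ["larcv_electron.root", "larcv_muon.root", "larcv_pion.root"]
first = [expected_output_name(f) for f in inputs]

# Second pass (e.g. a new post-processor) over the HDF5 outputs
second = [expected_output_name(f) for f in first]

print(first)
print(second)
```

Under this assumption, each pass adds exactly one `_spine`, so a second pass yields `*_spine_spine.h5` and the input stem never appears twice.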

Moving on to a test with spine_prod instead, I tested with the command:

. /sdf/data/neutrino/bearc/spine/prod/pgun_uniform/configure_myspineprod.sh
bash $MLPROD_BASEDIR/run.sh --config fmatch.cfg --flashmatch --ntasks 3 --time 00:30:00 --partition ampere --account neutrino:icarus-ml file_list.txt

which is different from the above (I don't believe -s actually works with spine_prod).

I got the following file names:

output_spine/larcv_electron_spine_larcv_electron_spine_spine.h5
output_spine/larcv_muon_spine_larcv_muon_spine_spine.h5
output_spine/larcv_pion_spine_larcv_pion_spine_spine.h5

As a final attempt, I used the latest spine_prod release (instead of the custom install you are using), and the problem went away:

output_spine/larcv_electron_spine_spine.h5
output_spine/larcv_muon_spine_spine.h5
output_spine/larcv_pion_spine_spine.h5

TLDR: I believe your local spine_prod install is outdated. Please try to update it and let me know if it resolves the issue! If it does, I'll close this.
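After updating, a quick way to confirm the fix is to scan the output names for a leading block that repeats. The sketch below is a hypothetical helper (`has_repeated_prefix` is not part of SPINE or spine_prod). It assumes file names use `_` as the separator and treats a stem that starts with the same `_`-separated block twice as a duplicated name.

```python
from pathlib import Path


def has_repeated_prefix(file_name: str) -> bool:
    """Hypothetical check: True if the stem starts with the same block twice.

    Assumption: name parts are separated by "_"; e.g. "a_b_a_b_spine"
    is flagged because the leading block "a_b" is immediately repeated.
    """
    tokens = Path(file_name).stem.split("_")
    for k in range(1, len(tokens) // 2 + 1):
        if tokens[:k] == tokens[k:2 * k]:
            return True
    return False


names = [
    "larcv_electron_spine_spine.h5",                       # latest spine_prod
    "larcv_electron_spine_larcv_electron_spine_spine.h5",  # outdated install
    "larcv_muon_larcv_muon_spine.h5",                      # as first reported
]
for name in names:
    print(name, has_repeated_prefix(name))
```

To check a whole run, the same function could be applied to every entry of `Path("output_spine").glob("*.h5")`. A name consisting solely of a repeated token (e.g. `spine_spine.h5`) would also be flagged, which is acceptable for this quick check.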
