Edge case for processing 1 file when >1 workers are provided #773

Open
pweigel opened this issue Nov 25, 2024 · 0 comments
Labels
bug Something isn't working


@pweigel
Collaborator

pweigel commented Nov 25, 2024

Describe the bug
There seems to be an edge case when a single-file "dataset" is processed with more than one worker. I suspect this is because

```python
n_workers = min(self._num_workers, nb_files)
if n_workers > 1:
    self.info(
        f"Starting pool of {n_workers} workers to process"
        f" {nb_files} {unit}"
    )
    manager = Manager()
    index = Value("i", 0)
    output_files = manager.list()
    pool = Pool(
        processes=n_workers,
        initializer=init_global_index,
        initargs=(index, output_files),
    )
    map_fn = pool.imap
```
sets `n_workers = 1` when there is only one file, so no pool is created and `init_global_index` never runs. However, the following check in `_request_event_nos`
```python
if self._num_workers > 1:
    with global_index.get_lock():  # type: ignore[name-defined]
        start_idx = global_index.value  # type: ignore[name-defined]
        event_nos = np.arange(start_idx, start_idx + n_ids, 1).tolist()
        global_index.value += n_ids  # type: ignore[name-defined]
```
branches on `self._num_workers` instead, so it still tries to access `global_index`, a global that only exists in processes started by the pool.
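The mismatch can be reproduced with the standard library alone. The sketch below is a hypothetical simplification, not graphnet code: `init_global_index` borrows its name from the traceback, while `request_event_nos`, `process_file`, `convert`, the `_local_next` counter and the "three events per file" stand-in are illustrative assumptions. Because the pool initializer is the only thing that binds `global_index`, and it only runs in worker processes, a one-file run that still branches on the configured worker count hits a `NameError`.

```python
from multiprocessing import Pool, Value

_local_next = 0  # hypothetical event-number counter for the single-process path


def init_global_index(index):
    """Pool initializer: binds the shared counter as a module-level global.

    It only runs inside pool worker processes, so the parent process never
    gets a `global_index` name.
    """
    global global_index
    global_index = index


def request_event_nos(configured_workers, n_ids):
    """Hypothetical stand-in for `_request_event_nos`.

    Like the snippet above, it branches on the *configured* worker count
    rather than on whether a pool was actually started.
    """
    global _local_next
    if configured_workers > 1:
        with global_index.get_lock():  # NameError if no pool initializer ran
            start = global_index.value
            global_index.value += n_ids
    else:
        start = _local_next
        _local_next += n_ids
    return list(range(start, start + n_ids))


def process_file(task):
    configured_workers, _path = task
    return request_event_nos(configured_workers, 3)  # pretend 3 events per file


def convert(files, configured_workers):
    # Mirrors `n_workers = min(self._num_workers, nb_files)`.
    n_workers = min(configured_workers, len(files))
    tasks = [(configured_workers, path) for path in files]
    if n_workers > 1:
        index = Value("i", 0)
        with Pool(n_workers, initializer=init_global_index, initargs=(index,)) as pool:
            return list(pool.imap(process_file, tasks))
    # Single-process path: init_global_index never runs in this process.
    return [process_file(task) for task in tasks]


if __name__ == "__main__":
    print(convert(["a.i3", "b.i3"], configured_workers=2))  # pool path works
    try:
        convert(["a.i3"], configured_workers=2)  # one file, two workers
    except NameError as err:
        print("single file:", err)
```

With two files a pool is started and each worker sees the shared counter; with one file the same call raises a `NameError` for `global_index`, matching the traceback below.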

To Reproduce
Steps to reproduce the behavior:

1. Process i3 files with more than one worker and only one file in the input folder.

Expected behavior
It should allocate just one worker and the file should be processed normally.
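One possible fix, sketched under assumptions (this is not graphnet's `DataConverter`, and not necessarily the change that will be merged): record whether a pool was actually started and branch on that in `_request_event_nos`, falling back to an ordinary instance counter otherwise. The `Converter` class, `_use_pool`, `_local_index`, `_process_file` and the "three events per file" stand-in are all hypothetical.

```python
from multiprocessing import Pool, Value


def init_global_index(index):
    # Pool initializer: binds the shared counter inside each worker process.
    global global_index
    global_index = index


class Converter:
    """Hypothetical, heavily simplified converter; not graphnet's API."""

    def __init__(self, num_workers):
        self._num_workers = num_workers
        self._use_pool = False  # set per run from the *effective* worker count
        self._local_index = 0

    def _request_event_nos(self, n_ids):
        # Branch on whether a pool was actually started, not on the configured
        # worker count, so a single-file run never touches `global_index`.
        if self._use_pool:
            with global_index.get_lock():
                start = global_index.value
                global_index.value += n_ids
        else:
            start = self._local_index
            self._local_index += n_ids
        return list(range(start, start + n_ids))

    def _process_file(self, path):
        # Pretend every file contains 3 events.
        return self._request_event_nos(3)

    def __call__(self, files):
        n_workers = min(self._num_workers, len(files))
        self._use_pool = n_workers > 1  # decided once, before any work starts
        if self._use_pool:
            index = Value("i", 0)
            with Pool(n_workers, initializer=init_global_index, initargs=(index,)) as pool:
                return list(pool.imap(self._process_file, files))
        return [self._process_file(path) for path in files]


if __name__ == "__main__":
    print(Converter(num_workers=4)(["a.i3", "b.i3"]))  # pool path
    print(Converter(num_workers=4)(["only.i3"]))  # [[0, 1, 2]]
```

Deciding on the effective worker count in one place means `_request_event_nos` cannot disagree with how the run was actually set up.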

Full traceback

```
  File "<path>/graphnet/src/graphnet/data/dataconverter.py", line 260, in _request_event_nos
    with global_index.get_lock():  # type: ignore[name-defined]
         ^^^^^^^^^^^^
NameError: name 'global_index' is not defined. Did you mean: 'init_global_index'?
```
pweigel added the bug label on Nov 25, 2024
pweigel self-assigned this on Nov 25, 2024