Error when continuing training #5

Open
tgeijten opened this issue Nov 13, 2023 · 7 comments

@tgeijten

When an earlier run is found, I get the following error message:

Found earlier run, continuing training.
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Development\depRL\deprl\main.py", line 151, in <module>
    main()
  File "D:\Development\depRL\deprl\main.py", line 147, in main
    train(config)
  File "D:\Development\depRL\deprl\main.py", line 77, in train
    logger.initialize(
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 272, in initialize
    current_logger = Logger(*args, **kwargs)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 79, in __init__
    create_resumed_results_path(config, env)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\path_utils.py", line 10, in wrapper
    result = func(*args, **kwargs)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 56, in create_resumed_results_path
    folder = get_sorted_folders(folders[0][1])[-1]
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 27, in get_sorted_folders
    sorted_folders = sorted(folders, key=get_datetime_key)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 18, in get_datetime_key
    date_time_str = s.split(".")[0] + s.split(".")[1]
IndexError: list index out of range

@P-Schumacher
Collaborator

Hi tgeijten, what exactly did you run and when did this error occur?

@tgeijten
Author

Hi Pierre, here's an example of steps to reproduce the issue (Windows 10):

  1. Run: python -m deprl.main scone_run_h0918.yaml
  2. Wait until some checkpoints are generated
  3. Cancel the optimization
  4. Run again: python -m deprl.main scone_run_h0918.yaml

@tgeijten
Author

Any idea yet what could be causing this? If you point me at the right bit of code, I can have a look for myself 😁

@P-Schumacher
Collaborator

P-Schumacher commented Jul 10, 2024

Hey Thomas,
a folder path isn't recognised correctly, so the string splitting tries to index an element that doesn't exist.
I believe this error only happens on Windows, because of a difference in how folder paths are handled.

I can take a look at the Linux version to make sure it works. I'll try to get you a more precise update, which should help you fix it for Windows.

I'm relatively certain it's happening in this line:

folder = get_sorted_folders(folders[0][1])[-1]
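
To illustrate the failure mode, here is a minimal sketch built around the expression shown in the traceback (`s.split(".")[0] + s.split(".")[1]`). The folder names below are hypothetical examples; the thread does not show the actual naming format of run folders, so the `231113.101500` style name is an assumption.

```python
def get_datetime_key(s):
    # Same expression as in the traceback (logger.py, line 18).
    return s.split(".")[0] + s.split(".")[1]


# Hypothetical folder name that contains a "." separator: this works.
print(get_datetime_key("231113.101500"))

# Hypothetical folder name without a ".": split() returns a
# single-element list, so index 1 does not exist.
try:
    get_datetime_key("folder_without_dot")
except IndexError as err:
    print("IndexError:", err)
```

Any name without a `.` that reaches this function would raise the same `IndexError` as in the report.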

@tgeijten
Author

Thanks for the update. Let me know if there's anything I can do to help testing on Windows.

@P-Schumacher
Collaborator

Hey Thomas,
I pushed an update to the dev branch of the repo that should print the path being loaded.
Can you try again after installing from the dev branch?

I also ran GitHub Actions on Windows, but I can't run hyfydy there, as I didn't install the license key on the GitHub test cluster.
My tests passed, so the problem might be related to the specific path of the reloaded hyfydy experiment.

When you run it again, can you look for a line like this and report back:
Found earlier run, continuing training: Path is: ./tests/test_DEPRL\myoLeg

@tgeijten
Author

Thanks, this is what I get:

Found earlier run, continuing training: Path is: D:/Dropbox/Documents/SCONE/_output\sconerun_h0918_v1

I suppose this path doesn't include the date/time string, so the subsequent call to get_datetime_key() fails?
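
A quick way to check this hypothesis, and one possible guard against it, is sketched below. This is not the depRL fix: `has_datetime_name` and `sorted_run_folders` are hypothetical helpers, and the `<digits>.<digits>` pattern is an assumption about how date/time folders are named (the thread only shows that the name is split on `.`).

```python
import re
from pathlib import PureWindowsPath

# Assumption: date/time folders are named "<digits>.<digits>".
DATETIME_NAME = re.compile(r"\d+\.\d+")


def has_datetime_name(name):
    """Return True if the folder name looks like a date/time folder."""
    return DATETIME_NAME.fullmatch(name) is not None


def sorted_run_folders(names):
    """Sort only the names that look like date/time folders; skip the rest."""
    valid = [n for n in names if has_datetime_name(n)]
    return sorted(valid, key=lambda n: n.split(".")[0] + n.split(".")[1])


# PureWindowsPath treats both "/" and "\" as separators on any OS,
# so the mixed-separator path from the log can be parsed directly.
path = PureWindowsPath(r"D:/Dropbox/Documents/SCONE/_output\sconerun_h0918_v1")
print(path.name)
print(has_datetime_name(path.name))

# Hypothetical folder listing: the non-matching entry is skipped.
print(sorted_run_folders(["231113.101500", "checkpoints", "231112.235900"]))
```

The last path component, `sconerun_h0918_v1`, contains no `.`, which is consistent with the `IndexError` in the traceback. Whether skipping non-matching folders is the right behaviour for depRL is a design decision for the maintainers.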
