Error in elitist_race -> race_wrapper -> race_wrapper_helper -> set #80

Open
sbomsdorf opened this issue Jan 28, 2025 · 12 comments
Labels
bug Something isn't working

Comments

@sbomsdorf
Contributor

Dear @MLopez-Ibanez,

Both a colleague and I found the same internal error in iracedump.rda files which have the following traceback:

> iracedump[["irace.internal.error(msg)"]][["bt"]]
[1] "7: irace.assert(isTRUE(all.equal(configurations_id, sapply(experiments, " 
[2] "       getElement, \"id_configuration\"))))"                              
[3] "6: execute_evaluator(race_state$target_evaluator, experiments, scenario, "
[4] "       target_output, configurations[[\".ID.\"]])"                        
[5] "5: testConfigurations(configurations, scenario)"                          
[6] "4: testing_common(configurations, scenario, iraceResults)"                
[7] "3: testing_fromlog(logFile = scenario$logFile)"                           
[8] "2: irace_common(scenario = scenario, simple = FALSE)"                     
[9] "1: irace_cmdline()"  
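The failing assertion (frame 7) checks that the configuration IDs handed to execute_evaluator() match, element by element and in order, the id_configuration stored in each experiment. A rough Python analogue of that R expression, assuming experiments are plain dicts and IDs are integers (for illustration only, not irace's code):

```python
def evaluator_ids_consistent(configurations_id, experiments):
    """Python analogue of
    isTRUE(all.equal(configurations_id,
                     sapply(experiments, getElement, "id_configuration"))).
    Any difference in length, order or value makes the check fail."""
    experiment_ids = [exp["id_configuration"] for exp in experiments]
    return list(configurations_id) == experiment_ids


# One experiment per ID, same order: the check passes.
ok = evaluator_ids_consistent([4, 7], [{"id_configuration": 4}, {"id_configuration": 7}])
# More experiments than IDs: the lengths differ, so the check fails.
bad = evaluator_ids_consistent([4, 7], [{"id_configuration": c} for c in (4, 7, 4, 7)])
print(ok, bad)  # True False
```

Comparing the .ID. column of the tested configurations with the id_configuration entries of the experiments stored in iracedump.rda is therefore the data to look at when this assertion fires.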

We each run irace on different code, written in different programming languages, but on the same hardware using slurm, and therefore both use a target_runner and a target_evaluator. Training works perfectly fine, and the output files for the test instances are also there. The iracedump.rda also shows this output, e.g., in iracedump[["testConfigurations(configurations, scenario)"]][["target_output"]][[1]][["outputRaw"]]. The above error did not appear in the terminal output of irace; instead, that output ends with the table of elite configurations that are to be tested. Consistent with this, the irace.Rdata lacks the testing data.

Could you please tell us whether this could be caused by some parameter definition in a scenario.txt file, and/or how we can inspect the actual data that causes the assertion to fail?

Please let us know if you need more information/data to further investigate the issue.

Many thanks in advance!

@MLopez-Ibanez
Owner

MLopez-Ibanez commented Jan 29, 2025 via email

@sbomsdorf
Contributor Author

It happened with irace 4.0.886dd4c. We are currently running everything again using 4.2.0.c9d441b-dirty and will keep you posted on the results with the development version.

@MLopez-Ibanez
Owner

> It happened with irace 4.0.886dd4c. We are currently running everything again using 4.2.0.c9d441b-dirty and will keep you posted on the results with the development version.

Thanks. Please let me know if you detect anything wrong.

@MLopez-Ibanez
Owner

Hi, @sbomsdorf any news about this?

@sbomsdorf
Contributor Author

Hi,

I am personally encountering another issue (irace gets stuck when evaluating the first batch of instances run using the target-evaluator; I suspect a problem with the file system and/or my code rather than with irace, since my colleague's code runs). My colleague is currently performing exactly the same runs as before but has not reached the testing phase yet. We will keep you posted.

Regards,
Stefan

@MLopez-Ibanez
Copy link
Owner

Thanks.

Are you using target-evaluator just because of slurm? If you have some knowledge of R, it may be better to define a targetRunnerParallel function in your scenario.txt and use the batchtools package to implement the parallelization:

https://mllg.github.io/batchtools/reference/makeClusterFunctionsSlurm
https://mllg.github.io/batchtools/reference/btlapply.html

Or the clustermq package: https://mschubert.github.io/clustermq/

This may be more reliable than what target-evaluator is currently doing.

@sbomsdorf
Contributor Author

Hi again,

First, the original issue seems to be resolved in the updated version of irace (the development version mentioned above, to be precise). The change from training to testing in irace works now and the output is as expected, as confirmed by my colleague.

Still, my problem with irace not being able to evaluate the output files of the first run persists. Indeed, we are using target-evaluator only because of slurm, i.e., to wait for all the output files to be available. We do not have sufficient knowledge of R to use those R packages; in other words, the target-evaluator script is more accessible and clearer to us from a usability point of view.
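Since the target-evaluator's role here is to wait until slurm has produced the output files, one way to make a hang visible is to wait with a timeout rather than indefinitely. A minimal Python sketch; the polling approach, the function name and the default timeout are assumptions, not something irace prescribes:

```python
import os
import time


def wait_for_file(path, timeout=3600.0, poll_interval=5.0):
    """Poll until `path` exists and is non-empty, or until `timeout` seconds pass.

    Returns True if the file appeared in time and False otherwise, so the
    caller can fail loudly instead of blocking irace without any message.
    Note: a non-empty file may still be partially written; a job that writes
    a separate "done" marker file at the end is a more robust signal.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

The evaluator would call wait_for_file() on whatever output path its slurm job writes (that naming is up to your target-runner) and, if it returns False, print a message to stderr and exit with a non-zero status, so that the failure shows up in the logs rather than as a silent wait.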

The most recent output of my irace run is the table header and there is no iracedump.rda (stuck/running forever?).

+-+-----------+-----------+-----------+----------------+-----------+--------+-----+----+------+
| |   Instance|      Alive|       Best|       Mean best| Exp so far|  W time|  rho|KenW|  Qvar|
+-+-----------+-----------+-----------+----------------+-----------+--------+-----+----+------+

How do I use the debugInfo parameter? What are allowed input values?
(Section 11.1 General options of the user guide lacks this info)

@MLopez-Ibanez
Owner

> First, the original issue seems to be resolved in the updated version of irace (the development version mentioned above, to be precise). The change from training to testing in irace works now and the output is as expected, as confirmed by my colleague.

Great!

> How do I use the debugInfo parameter? What are allowed input values? (Section 11.1 General options of the user guide lacks this info)

You can use values 1, 2 or 3, with 3 being the most verbose (I have updated the user guide to mention this explicitly).

I'd suggest you use debugLevel=3 (or --debug-level 3 when invoking irace) and it will report what is running at that point. You may want to redirect the output of irace to a file using "irace --debug-level 3 .... &> irace-output.txt", where "..." is any other command-line parameters that you use.

If irace is stuck at that point, it is usually because the target-runner (or target-evaluator) is still running. A process can be running but consume no CPU.

@sbomsdorf
Contributor Author

Thank you very much for the prompt update of the guide!

I've used debugLevel=3 and am now able to see the output of the target-evaluator. All the instances submitted to slurm ran, and the target-evaluator also processes the results correctly, i.e., it prints the cost for irace. However, the output with debugLevel=3 just stops after the last instance is processed by the target-evaluator. Apparently, there is some kind of issue in the interface between the target-evaluator output and irace. For example, these are the last lines of the irace output with debugLevel=3:

# 2025-02-13 10:32:53 CET: /home/<user>/<project>/code/tuning/target-evaluator 33
/home/<user>/<project>/code/tuning/target-evaluator 30
/home/<user>/<project>/code/tuning/target-evaluator 1189893065
/home/<user>/<project>/code/tuning/target-evaluator /home/<user>/<project>/code/data/instances/"instance1.txt --capacity=45"
/home/<user>/<project>/code/tuning/target-evaluator 33
/home/<user>/<project>/code/tuning/target-evaluator 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
# 2025-02-13 10:33:03 CET: DONE (33) Elapsed wall-clock seconds: 10.01
55182608

Thereafter, it stops. In the output without setting the debugLevel this corresponds to only the table header of the first irace run being printed.
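The debug log lists the positional arguments irace passed to the target-evaluator in this run: configuration ID (33), instance ID (30), seed (1189893065), instance, number of configurations (33), and then the configuration IDs 1 to 33. When debugging the evaluator side, validating that layout explicitly can rule out argument-handling problems. A minimal Python sketch, assuming exactly the argument order seen in that log; the function name is hypothetical:

```python
def parse_evaluator_args(args):
    """Name the positional arguments passed to the target-evaluator.

    `args` excludes the program name (e.g. sys.argv[1:]). Assumed order:
    configuration ID, instance ID, seed, instance, number of configurations,
    then that many configuration IDs.
    """
    if len(args) < 5:
        raise ValueError(f"expected at least 5 arguments, got {len(args)}")
    # int() raises ValueError if, e.g., the instance string was split into
    # several arguments and a non-numeric token lands in this position.
    num_configurations = int(args[4])
    all_ids = [int(x) for x in args[5:]]
    if len(all_ids) != num_configurations:
        raise ValueError(
            f"expected {num_configurations} configuration IDs, got {len(all_ids)}"
        )
    return {
        "configuration_id": int(args[0]),
        "instance_id": int(args[1]),
        "seed": int(args[2]),
        "instance": args[3],
        "num_configurations": num_configurations,
        "all_configuration_ids": all_ids,
    }
```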

I have already tried different number formats for the output costs (5.5182608123e7 vs. 55182608.123 vs. 55182608), without any change in the behavior described above.
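Of the three formats, 5.5182608123e7 and 55182608.123 are the same number written differently, and 55182608 only drops the fraction, so any float parser reads them consistently. What can matter more is anything else the evaluator writes to standard output. A deliberately strict Python check, assuming the evaluator is meant to print only the cost, as described in this thread (irace itself may accept more, so treat this only as a debugging aid):

```python
def parse_cost_line(stdout):
    """Return the cost from an evaluator's stdout.

    Strict assumption: stdout holds exactly one non-empty line containing a
    single number. Stray debug prints or extra tokens raise ValueError.
    """
    lines = [line for line in stdout.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one non-empty line, got {len(lines)}")
    tokens = lines[0].split()
    if len(tokens) != 1:
        raise ValueError(f"expected a single number, got {tokens!r}")
    return float(tokens[0])
```

Running the evaluator by hand on one finished job and passing its captured output to parse_cost_line() is a quick way to spot extra output.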

@sbomsdorf
Contributor Author

I just found an error message in the iracedump.rda (only output if debugLevel=3):

> attributes(iracedump)[["error.message"]]
[1] "Error in set(target_output, j = "configuration", value = unlist_element(experiments,  :
Supplied 31 items to be assigned to 124 items of column 'configuration'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Calls: irace_cmdline ... elitist_race -> race_wrapper -> race_wrapper_helper -> set
"

Unfortunately, I do not understand what underlying problem the error message suggests. What are the items, what is the column configuration?
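For context on the message itself: data.table's set() replaces an entire column, so the supplied value must have either length 1 (which is recycled) or exactly one element per row; any other length is rejected with this error. Here the value came from unlist_element(experiments, ...) and had 31 items, while the target_output table had 124 rows. A minimal Python illustration of that rule, using a hypothetical helper on a dict-of-lists table (not data.table's or irace's code):

```python
def set_column(table, name, value):
    """Assign `value` to column `name` of a column-oriented table (dict of
    lists with at least one column), following data.table's set() rule:
    a length-1 value is recycled, otherwise its length must equal the
    number of rows."""
    n_rows = len(next(iter(table.values())))
    value = list(value)
    if len(value) == 1:
        value = value * n_rows
    elif len(value) != n_rows:
        raise ValueError(
            f"Supplied {len(value)} items to be assigned to "
            f"{n_rows} items of column '{name}'."
        )
    table[name] = value
```

So the "items" are the values extracted from the experiments list, and 'configuration' is the column of target_output being filled; the error means the two sides disagree on how many experiments there are.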

@MLopez-Ibanez
Owner

That looks like a genuine bug. Perhaps some race condition. Could you share the iracedump.rda and the full output when using debugLevel=3? If you don't want to share it in github, just send me an email. It is also strange that irace does not simply stop and report the error, but fixing the error may fix that.

@MLopez-Ibanez added the bug (Something isn't working) label Feb 13, 2025
@sbomsdorf
Contributor Author

Okay. I have shared the output via email. Please let me know if you need any other info to support the debugging process.

@sbomsdorf changed the title from "After running test instances: assertion error in execute_evaluator()" to "Error in Error elitist_race -> race_wrapper -> race_wrapper_helper -> set" Feb 13, 2025
@sbomsdorf changed the title from "Error in Error elitist_race -> race_wrapper -> race_wrapper_helper -> set" to "Error in elitist_race -> race_wrapper -> race_wrapper_helper -> set" Feb 13, 2025