-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add error details for failed remote job #370
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks a lot for having implemented this so quickly. I did not see any issues in the code, though I am not extremely familiar with this part of the codebase.
Have you been able to run an end-to-end test of this?
Codecov Report
@@ Coverage Diff @@
## master #370 +/- ##
=======================================
Coverage 97.69% 97.70%
=======================================
Files 52 52
Lines 6478 6485 +7
=======================================
+ Hits 6329 6336 +7
Misses 149 149
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
strawberryfields/engine.py
Outdated
@@ -548,7 +548,7 @@ def run(self, program: Program, *, compile_options=None, **kwargs) -> Optional[R | |||
if job.status == "failed": | |||
message = ( | |||
"The remote job %s failed due to an internal " | |||
"server error. Please try again." % job.id | |||
"server error. Please try again. %s" % (job.id, job.meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use new-style formatting here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think this is an artifact of Python's logger expecting printf
string formatting. At one point, this line must have been inside the self.log.error()
function.
Now that it's been refactored out, it's best to update this to use new-style formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
strawberryfields/engine.py
Outdated
@@ -548,7 +548,7 @@ def run(self, program: Program, *, compile_options=None, **kwargs) -> Optional[R | |||
if job.status == "failed": | |||
message = ( | |||
"The remote job %s failed due to an internal " | |||
"server error. Please try again." % job.id | |||
"server error. Please try again. %s" % (job.id, job.meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think this is an artifact of Python's logger expecting printf
string formatting. At one point, this line must have been inside the self.log.error()
function.
Now that it's been refactored out, it's best to update this to use new-style formatting.
self._status = JobStatus(self._connection.get_job_status(self.id)) | ||
job_info = self._connection.get_job(self.id) | ||
self._status = JobStatus(job_info.status) | ||
self._meta = job_info.meta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a good idea to log the retrieved metadata at the DEBUG level here? Combined with #360, this will allow retrieved job metadata to appear within the users logs (if they decide to configure it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, this has been added! Let me know if you think the message formatting needs to be changed.
Just FYI, @lneuhaus and I coordinated an end-to-end test for a deliberately failing job and didn't find any problems. |
along with the job result if the job is newly completed. | ||
""" | ||
if self._status.is_final: | ||
self.log.warning("A %s job cannot be refreshed", self._status.value) | ||
return | ||
self._status = JobStatus(self._connection.get_job_status(self.id)) | ||
job_info = self._connection.get_job(self.id) | ||
self._status = JobStatus(job_info.status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logically for me, the meta
data belongs to the status of the Job
. At least this attribute will change the similarly to the change in status
and as far as I understand it, is meant to provide more details about the status of the Job
.
However, making this change in abstraction might not be so easy (would involve restructuring JobStatus
, etc.), so I'm happy as is now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with the general idea, and also that it would be a relatively big change, so it seems best to tackle that in the future 🙂
"The remote job {} failed due to an internal " | ||
"server error. Please try again. {}".format(job.id, job.meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were previously made so such that they can be used for logging, but here the message is already pre-defined, so I guess format
should work just fine as well?
tests/api/test_remote_engine.py
Outdated
JobStatus.COMPLETED | ||
if self.request_count >= self.REQUESTS_BEFORE_COMPLETED | ||
else JobStatus.QUEUED | ||
) | ||
return Job(id_="123", status=status, connection=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small dummy meta dictionary could be added here, and then later checked similarly to the status
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, this has been added!
Thanks so much for this @pauljxtan, looks really good! 💯 Had a couple of suggestions which might be improvements to add, but I'm also happy with the current state. |
Co-Authored-By: antalszava <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! 🚀
Co-Authored-By: Theodor <[email protected]>
Context:
Currently, a failed remote job produces a generic error message that is difficult to diagnose. The job info response from the remote platform contains a
meta
field that is currently unused, but has some information that may be useful, e.g.,Description of the Change:
meta
property to theJob
class. In the event of job failure, this dictionary will contain anerror-code
anderror-detail
.meta
attribute (along withstatus
) whenJob.refresh()
is called.meta
to the error message for a failed job.Example:
"The remote job 928ee53e-677a-48c0-8ba5-a0566ba3e295 failed due to an internal server error. Please try again."
"The remote job 928ee53e-677a-48c0-8ba5-a0566ba3e295 failed due to an internal server error. Please try again. {'error_code': 'hardware-message-error', 'error_detail': 'The job failed because the message received from the hardware could not be understood.'}"
Benefits:
The added context in a copy-pastable error message is likely to make it easier to diagnose and debug job failures.
Possible Drawbacks:
The new error message is slightly verbose.
Related GitHub Issues:
N/A