Add error details for failed remote job #370

pauljxtan · 2020-04-17T19:15:12Z

Context:

Currently, a failed remote job produces a generic error message that is difficult to diagnose. The job info response from the remote platform contains a meta field that is currently unused, but has some information that may be useful, e.g.,

{
    'error_code': 'hardware-message-error',
    'error_detail': 'The job failed because the message received from the hardware could not be understood.'
}

Description of the Change:

Add a meta property to the Job class. In the event of job failure, this dictionary will contain an error_code and error_detail.
Refresh the meta attribute (along with status) when Job.refresh() is called.
Add the contents of meta to the error message for a failed job.
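
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `JobStatus` values, the `FakeConnection` class, and the `failure_message` helper are hypothetical stand-ins, and only `get_job`, `job_info.status`, and `job_info.meta` are taken from the diff under review.

```python
import enum
from types import SimpleNamespace


class JobStatus(enum.Enum):
    # Simplified stand-in; the string values here are assumptions.
    QUEUED = "queued"
    COMPLETED = "complete"
    FAILED = "failed"

    @property
    def is_final(self):
        return self in (JobStatus.COMPLETED, JobStatus.FAILED)


class Job:
    def __init__(self, id_, status, connection, meta=None):
        self._id = id_
        self._status = status
        self._connection = connection
        self._meta = meta or {}

    @property
    def id(self):
        return self._id

    @property
    def status(self):
        return self._status.value

    @property
    def meta(self):
        # Extra details from the platform, e.g. error_code and error_detail.
        return self._meta

    def refresh(self):
        if self._status.is_final:
            return
        # One request returns both the status and the metadata.
        job_info = self._connection.get_job(self.id)
        self._status = JobStatus(job_info.status)
        self._meta = job_info.meta


class FakeConnection:
    # Hypothetical connection that always reports a failed job.
    def get_job(self, job_id):
        return SimpleNamespace(
            status="failed",
            meta={
                "error_code": "hardware-message-error",
                "error_detail": "The job failed because the message received "
                "from the hardware could not be understood.",
            },
        )


def failure_message(job):
    # Hypothetical helper: append the metadata to the generic message.
    return (
        "The remote job {} failed due to an internal "
        "server error. Please try again. {}".format(job.id, job.meta)
    )


job = Job("928ee53e-677a-48c0-8ba5-a0566ba3e295", JobStatus.QUEUED, FakeConnection())
job.refresh()
message = failure_message(job)
```

Keeping meta as a plain dictionary means any keys the platform adds later show up in the error message without further changes.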

Example:

Previous error message: "The remote job 928ee53e-677a-48c0-8ba5-a0566ba3e295 failed due to an internal server error. Please try again."
New error message: "The remote job 928ee53e-677a-48c0-8ba5-a0566ba3e295 failed due to an internal server error. Please try again. {'error_code': 'hardware-message-error', 'error_detail': 'The job failed because the message received from the hardware could not be understood.'}"

Benefits:

The added context in a copy-pastable error message is likely to make it easier to diagnose and debug job failures.

Possible Drawbacks:

The new error message is slightly verbose.

Related GitHub Issues:

N/A

lneuhaus

Nice, thanks a lot for having implemented this so quickly. I did not see any issues in the code, though I am not extremely familiar with this part of the codebase.

Have you been able to run an end-to-end test of this?

codecov · 2020-04-17T21:05:37Z

Codecov Report

Merging #370 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #370   +/-   ##
=======================================
  Coverage   97.69%   97.70%           
=======================================
  Files          52       52           
  Lines        6478     6485    +7     
=======================================
+ Hits         6329     6336    +7     
  Misses        149      149

Impacted Files	Coverage Δ
strawberryfields/api/connection.py	`100.00% <ø> (ø)`
strawberryfields/engine.py	`94.90% <ø> (ø)`
strawberryfields/api/job.py	`94.11% <100.00%> (+0.93%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5917bac...4c81ff2.

jswinarton

LGTM!

jswinarton · 2020-04-17T21:53:48Z

strawberryfields/engine.py

@@ -548,7 +548,7 @@ def run(self, program: Program, *, compile_options=None, **kwargs) -> Optional[R
                if job.status == "failed":
                    message = (
                        "The remote job %s failed due to an internal "
-                        "server error. Please try again." % job.id
+                        "server error. Please try again. %s" % (job.id, job.meta)


Any reason not to use new-style formatting here?

josh146

💯

josh146 · 2020-04-18T06:09:19Z

Oh, I think this is an artifact of Python's logger expecting printf string formatting. At one point, this line must have been inside the self.log.error() function.

Now that it's been refactored out, it's best to update this to use new-style formatting.
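
For illustration, the difference between the two styles might look like the sketch below. The logger name, the `ListHandler` class, and the shortened message are hypothetical. The point is that `logging` takes a printf-style template plus separate arguments and interpolates them only when the record is emitted, while a string built up front can use `str.format`.

```python
import logging


class ListHandler(logging.Handler):
    # Hypothetical handler that collects formatted messages for inspection.
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("formatting_sketch")  # hypothetical logger name
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

job_id = "928ee53e-677a-48c0-8ba5-a0566ba3e295"
meta = {"error_code": "hardware-message-error"}

# Logging call: template and arguments are passed separately.
logger.error("The remote job %s failed. %s", job_id, meta)

# Message built before use (for example, to raise or return it).
message = "The remote job {} failed. {}".format(job_id, meta)
```

Both calls produce the same text here. The printf form only matters when the string is handed to the logger.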

josh146 · 2020-04-18T06:11:34Z

strawberryfields/api/job.py

-        self._status = JobStatus(self._connection.get_job_status(self.id))
+        job_info = self._connection.get_job(self.id)
+        self._status = JobStatus(job_info.status)
+        self._meta = job_info.meta


Would it be a good idea to log the retrieved metadata at the DEBUG level here? Combined with #360, this will allow retrieved job metadata to appear within the user's logs (if they decide to configure it).

Good idea, this has been added! Let me know if you think the message formatting needs to be changed.
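
A rough sketch of what such a debug line could look like is shown below. The message wording, the logger name, and the `refresh` and `FakeConnection` helpers are assumptions made for illustration; the actual message added in this PR is not shown in the thread.

```python
import logging
from types import SimpleNamespace


class ListHandler(logging.Handler):
    # Hypothetical handler that stores messages so they can be inspected.
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log = logging.getLogger("job_sketch")  # hypothetical logger name
log.propagate = False
log.setLevel(logging.DEBUG)  # the user opts in to DEBUG output
captured = ListHandler()
log.addHandler(captured)


class FakeConnection:
    # Hypothetical connection returning a fixed job info object.
    def get_job(self, job_id):
        return SimpleNamespace(status="failed", meta={"error_code": "hardware-message-error"})


def refresh(job, connection):
    job_info = connection.get_job(job.id)
    job.status = job_info.status
    job.meta = job_info.meta
    # Assumed wording; shown only when DEBUG logging is enabled.
    log.debug("Job %s metadata: %s", job.id, job.meta)


job = SimpleNamespace(id="123", status="queued", meta={})
refresh(job, FakeConnection())
```

With the default WARNING level the debug line stays silent, so users only see the metadata if they configure logging for it.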

pauljxtan · 2020-04-20T16:16:00Z

Just FYI, @lneuhaus and I coordinated an end-to-end test for a deliberately failing job and didn't find any problems.

antalszava · 2020-04-20T17:39:57Z

strawberryfields/api/job.py

        along with the job result if the job is newly completed.
        """
        if self._status.is_final:
            self.log.warning("A %s job cannot be refreshed", self._status.value)
            return
-        self._status = JobStatus(self._connection.get_job_status(self.id))
+        job_info = self._connection.get_job(self.id)
+        self._status = JobStatus(job_info.status)


To me, the metadata logically belongs to the status of the Job. At least, this attribute will change in step with the status and, as far as I understand it, is meant to provide more details about the status of the Job.

However, making this change in abstraction might not be so easy (would involve restructuring JobStatus, etc.), so I'm happy as is now.

Yes, I agree with the general idea, and also that it would be a relatively big change, so it seems best to tackle that in the future 🙂

antalszava · 2020-04-20T17:44:03Z

strawberryfields/engine.py

+                        "The remote job {} failed due to an internal "
+                        "server error. Please try again. {}".format(job.id, job.meta)


These were previously written this way so that they could be used for logging, but here the message is already predefined, so I guess format should work just fine as well?

antalszava · 2020-04-20T17:55:48Z

tests/api/test_remote_engine.py

            JobStatus.COMPLETED
            if self.request_count >= self.REQUESTS_BEFORE_COMPLETED
            else JobStatus.QUEUED
        )
+        return Job(id_="123", status=status, connection=None)


A small dummy meta dictionary could be added here, and then later checked similarly to the status.

Good idea, this has been added!
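
As a sketch of the suggestion, a mock could attach a small fixed dictionary to every job it returns so that a test can assert on meta the same way it asserts on status. The `Job` dataclass, the `MockServer` name, the `DUMMY_META` contents, and the `meta` keyword below are simplified assumptions, not the test code in this PR.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(Enum):
    # Simplified stand-in with assumed string values.
    QUEUED = "queued"
    COMPLETED = "complete"


@dataclass
class Job:
    # Minimal stand-in for the real Job class.
    id_: str
    status: JobStatus
    connection: object = None
    meta: dict = field(default_factory=dict)


class MockServer:
    REQUESTS_BEFORE_COMPLETED = 3
    DUMMY_META = {"foo": "bar"}  # hypothetical dummy metadata

    def __init__(self):
        self.request_count = 0

    def get_job(self, _id):
        self.request_count += 1
        status = (
            JobStatus.COMPLETED
            if self.request_count >= self.REQUESTS_BEFORE_COMPLETED
            else JobStatus.QUEUED
        )
        return Job(id_="123", status=status, connection=None, meta=dict(self.DUMMY_META))


server = MockServer()
jobs = [server.get_job("123") for _ in range(3)]
statuses = [job.status for job in jobs]
```

A test can then check `job.meta` against `DUMMY_META` right next to the existing status assertion.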

antalszava · 2020-04-20T17:57:33Z

Thanks so much for this @pauljxtan, looks really good! 💯 Had a couple of suggestions which might be improvements to add, but I'm also happy with the current state.

thisac

Looks good! 🚀

pauljxtan added 2 commits April 17, 2020 14:57

Add details to failed job error message

71e9d2b

Remove brackets

e1ec6b0

pauljxtan added the API label Apr 17, 2020

pauljxtan marked this pull request as ready for review April 17, 2020 19:16

pauljxtan marked this pull request as draft April 17, 2020 19:17

pauljxtan added the WIP label Apr 17, 2020

pauljxtan added 4 commits April 17, 2020 15:26

Add meta attr to Job, populate on refresh

8a87a98

Update Job constructor and docstrings

c83bb86

Update tests

535d2ed

Update remote engine tests

5339d06

pauljxtan requested review from josh146, thisac, antalszava, jswinarton and lneuhaus April 17, 2020 20:53

pauljxtan marked this pull request as ready for review April 17, 2020 20:53

pauljxtan removed the WIP label Apr 17, 2020

lneuhaus requested changes Apr 17, 2020

lneuhaus self-requested a review April 17, 2020 20:57

jswinarton approved these changes Apr 17, 2020

josh146 approved these changes Apr 18, 2020

pauljxtan added 2 commits April 20, 2020 09:23

Use new string formatting for error message

71bee78

Log debug message with job metadata

076d7b7

lneuhaus approved these changes Apr 20, 2020

Update changelog

bb926f6

antalszava reviewed Apr 20, 2020

antalszava reviewed Apr 20, 2020

Paul Tan and others added 2 commits April 20, 2020 14:55

Update strawberryfields/api/job.py

f33a17c

Co-Authored-By: antalszava <[email protected]>

Add check for meta attr in async job test

899229e

thisac approved these changes Apr 20, 2020

Update strawberryfields/api/job.py

4c81ff2

Co-Authored-By: Theodor <[email protected]>

pauljxtan merged commit b497f0f into master Apr 20, 2020

pauljxtan deleted the failed-job-error-details branch April 20, 2020 21:19
