dagster-gcp - Dataproc: retrieve job logs from GCS #1417

Open
natekupp opened this issue Jun 2, 2019 · 0 comments
Labels
area: integrations (Related to general integrations, including requests for a new integration)
area: logging (Related to Logging)
type: troubleshooting (Related to debugging and error messages)

Comments

natekupp (Contributor) commented Jun 2, 2019

When a Dataproc job fails, the error that Dagster surfaces looks like this:

DagsterEventType.STEP_FAILURE for step dataproc_solid.compute dagster_gcp.dataproc.types.DataprocError: Job error: {'state': 'ERROR', 'details': 'Job failed with message [Exception in thread "main" scala.reflect.internal.FatalError: Incorrect options]. Additional details can be found in 'gs://dataproc-.../driveroutput'.

That message only points at the driver output in GCS. Reading the driver output itself shows the real problem:

gsutil cat "gs://dataproc-.../driveroutput*"

Error: Unknown option --date 2019-05-28
Error: Missing option --date
Usage: EventPipeline [options]

  --s3-bucket <value>      S3 bucket to read
  --s3-prefix <value>      S3 prefix to read
  --gcs-input-bucket <value>
                           GCS input bucket to read
  --gcs-output-bucket <value>
                           GCS output bucket to write
  --local-path <value>     Local path prefix
  --date <value>
Exception in thread "main" scala.reflect.internal.FatalError: Incorrect options
	at io.dagster.events.EventPipelineConfig$$anonfun$parse$1.apply(EventPipeline.scala:150)
	at io.dagster.events.EventPipelineConfig$$anonfun$parse$1.apply(EventPipeline.scala:150)
	at scala.Option.getOrElse(Option.scala:121)
	at io.dagster.events.EventPipelineConfig$.parse(EventPipeline.scala:149)
	at io.dagster.events.EventPipeline$.run(EventPipeline.scala:73)
	at io.dagster.events.models.SparkJob.main(SparkJob.scala:17)
	at io.dagster.events.EventPipeline.main(EventPipeline.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
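
One possible shape for this, as a rough sketch rather than a proposed implementation: pull the `gs://` URI out of the job's `details` string and read every object under that prefix, which is roughly what the `gsutil cat` above does. The helper names `parse_driver_output_uri` and `fetch_driver_output` are hypothetical and not part of dagster-gcp, and the sketch assumes the `google-cloud-storage` client is installed with default credentials available.

```python
import re
from typing import Optional, Tuple

# Matches the first gs:// URI in an error string, stopping at quotes or whitespace.
_GCS_URI = re.compile(r"gs://([^/\s'\"]+)/([^\s'\"]+)")


def parse_driver_output_uri(details: str) -> Optional[Tuple[str, str]]:
    """Return (bucket, object_prefix) for the first gs:// URI in details, else None.

    Hypothetical helper: assumes the Dataproc error details embed the driver
    output location as a gs:// URI, as in the message quoted above.
    """
    match = _GCS_URI.search(details)
    if match is None:
        return None
    return match.group(1), match.group(2)


def fetch_driver_output(bucket_name: str, prefix: str) -> str:
    """Concatenate every object under the prefix, similar to `gsutil cat "<uri>*"`."""
    # Imported lazily so the parsing helper works without the GCS client installed.
    from google.cloud import storage

    client = storage.Client()
    chunks = []
    # list_blobs returns objects whose names start with the given prefix.
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        chunks.append(blob.download_as_bytes().decode("utf-8", errors="replace"))
    return "".join(chunks)
```

With something like this, the Dataproc error could carry the driver output directly, for example by calling `fetch_driver_output(*location)` when `parse_driver_output_uri(job_details)` returns a location, and appending the result to the `DataprocError` message.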
natekupp changed the title from "[Enhancement] Dataproc: retrieve job logs from GCS" to "dagster-gcp - Dataproc: retrieve job logs from GCS" on Jun 3, 2019
natekupp added the "area: logging" label on Jun 3, 2019
mgasner added the "area: integrations" label on Mar 29, 2020
natekupp added this to the Future Release milestone on Jun 1, 2020
catherinewu added the "type: troubleshooting" label on Aug 30, 2020
mgasner modified the milestone: Future Release on Oct 7, 2020
5 participants