I have searched the existing issues and did not find a match.
Who can help?
No response
What are you working on?
We are trying to run a de-identification pipeline that reads data from a MariaDB table and writes to Snowflake.
These are the steps involved in running the script from an AWS Step Functions state machine:
1) Create an EMR cluster using Step Functions
2) Install dependencies
3) Trigger the de-identification script
Current Behavior
When I run the script manually on the EMR cluster, it runs fine. But when I run the same script on the EMR cluster through the step function, it fails at the step that downloads the pretrained models. error_log.txt
The complete log is attached for reference.
Expected Behavior
We expect it to download the models and load the de-identified data into the table. Here is the log produced when we run the script manually on EMR: output.log
Since you mentioned a de-identification pipeline: this is a licensed pipeline that requires a licensed library. It's best to look for support either here or on Slack (in the healthcare channel).
Steps To Reproduce
Use the state machine definition below to create the step function and run the script.
{
  "Comment": "A description of my state machine",
  "StartAt": "StoreClusterId",
  "States": {
    "StoreClusterId": {
      "Type": "Pass",
      "Result": {
        "ClusterId": "j-3LN9LXY44F0W2"
      },
      "ResultPath": "$.Input",
      "Next": "CopyZipFile"
    },
    "CopyZipFile": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-3LN9LXY44F0W2",
        "Step": {
          "Name": "Copy ZIP File",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "bash",
              "-c",
              "aws s3 cp s3://inno-data/cldp_emr_phase2_v4.zip /home/hadoop/cldp_emr_phase2_v4.zip"
            ]
          }
        }
      },
      "ResultPath": "$.CopyInfo",
      "Next": "UnzipCode"
    },
    "UnzipCode": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-3LN9LXY44F0W2",
        "Step": {
          "Name": "Unzip Code",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "bash",
              "-c",
              "unzip /home/hadoop/cldp_emr_phase2_v4.zip -d /home/hadoop/cldp_emr_phase2_v4"
            ]
          }
        }
      },
      "ResultPath": "$.UnzipInfo",
      "Next": "SetExecutePermission"
    },
    "SetExecutePermission": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-3LN9LXY44F0W2",
        "Step": {
          "Name": "Set Execute Permission",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "bash",
              "-c",
              "chmod +x /home/hadoop/cldp_emr_phase2_v4/cldp_emr_phase2_v4/jsl/setup_env_nlp.sh && /home/hadoop/cldp_emr_phase2_v4/cldp_emr_phase2_v4/jsl/setup_env_nlp.sh"
            ]
          }
        }
      },
      "ResultPath": "$.PermissionInfo",
      "Next": "RunPythonScript"
    },
    "RunPythonScript": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-3LN9LXY44F0W2",
        "Step": {
          "Name": "Run Python Script",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "bash",
              "-c",
              "chmod +x /home/hadoop/cldp_emr_phase2_v4/cldp_emr_phase2_v4/jsl/install_nlp.py && /home/hadoop/cldp_emr_phase2_v4/cldp_emr_phase2_v4/jsl/jsl_env/bin/python /home/hadoop/cldp_emr_phase2_v4/cldp_emr_phase2_v4/jsl/install_nlp.py"
            ]
          }
        }
      },
      "ResultPath": "$.StepInfo",
      "Next": "RunPythonScript2"
    },
    "RunPythonScript2": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-3LN9LXY44F0W2",
        "Step": {
          "Name": "Run Python Script",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "bash",
              "-c",
              "cd /home/hadoop/cldp_emr_phase2_v4/cldp_emr_phase2_v4/jsl && /home/hadoop/cldp_emr_phase2_v4/cldp_emr_phase2_v4/jsl/jsl_env/bin/python /home/hadoop/cldp_emr_phase2_v4/cldp_emr_phase2_v4/jsl/jsl_convatec_deid_script.py"
            ]
          }
        }
      },
      "ResultPath": "$.StepInfo",
      "End": true
    }
  }
}
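To double-check the order in which the definition above submits commands to EMR, a small helper can follow the `StartAt`/`Next` links and list the shell command of each state. This is only a sketch: the function name `step_order` and the trimmed two-state `demo` definition are made up for illustration, and it assumes every task keeps its command as the last entry of `HadoopJarStep.Args`, as in the definition above.

```python
import json


def step_order(definition):
    """Follow StartAt/Next links and return (state_name, command) pairs.

    `command` is the last entry of HadoopJarStep.Args, or None for states
    that do not submit an EMR step (for example a Pass state).
    """
    states = definition["States"]
    name = definition["StartAt"]
    order, seen = [], set()
    while name is not None:
        if name in seen:
            raise ValueError(f"cycle detected at state {name!r}")
        seen.add(name)
        state = states[name]
        args = (
            state.get("Parameters", {})
            .get("Step", {})
            .get("HadoopJarStep", {})
            .get("Args", [])
        )
        order.append((name, args[-1] if args else None))
        # A state either ends the machine or names its successor.
        name = None if state.get("End") else state.get("Next")
    return order


# Hypothetical, trimmed-down definition used only to demonstrate the helper.
demo = json.loads("""
{
  "StartAt": "StoreClusterId",
  "States": {
    "StoreClusterId": {"Type": "Pass", "Next": "UnzipCode"},
    "UnzipCode": {
      "Type": "Task",
      "Parameters": {"Step": {"HadoopJarStep": {"Args": ["bash", "-c", "unzip a.zip"]}}},
      "End": true
    }
  }
}
""")

for state_name, command in step_order(demo):
    print(state_name, "->", command)
```

Loading the full definition with `json.load` and passing it to `step_order` should list the five EMR steps in submission order, which makes it easier to match a failing step in the EMR console to its state name.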
It fails at the state named RunPythonScript2.
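Since the same script works when started by hand, one thing that may be worth comparing is the process environment in both cases. The sketch below is a diagnostic idea, not a confirmed cause: the list of watched variables is an assumption (in particular, `SPARK_NLP_LICENSE` and the AWS key variables are guesses at what the licensed library reads), and `snapshot` and `diff` are hypothetical helper names. Secret values are masked, so the output only shows whether they are set.

```python
import json
import os

# Assumed list of variables worth comparing; adjust for your setup.
WATCHED = (
    "HOME", "USER", "PATH", "JAVA_HOME", "SPARK_HOME",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "SPARK_NLP_LICENSE",
)
# Values of these are never printed, only whether they are set.
SECRETS = {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "SPARK_NLP_LICENSE"}


def snapshot(environ=None):
    """Return the watched variables from `environ`, masking secret values."""
    environ = os.environ if environ is None else environ
    snap = {}
    for key in WATCHED:
        value = environ.get(key)
        if key in SECRETS and value is not None:
            value = "<set>"
        snap[key] = value
    return snap


def diff(manual, step):
    """Return {key: (manual_value, step_value)} for every key that differs."""
    return {
        key: (manual.get(key), step.get(key))
        for key in sorted(set(manual) | set(step))
        if manual.get(key) != step.get(key)
    }


if __name__ == "__main__":
    # Run once from the interactive shell and once as an EMR step,
    # then compare the two JSON outputs (for example with diff()).
    print(json.dumps(snapshot(), indent=2, sort_keys=True))
```

Running this with the same `jsl_env/bin/python` interpreter, once from an SSH session and once as an extra step before RunPythonScript2, would show whether variables such as `HOME` or the license keys are missing when the step function launches the script.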
Spark NLP version and Apache Spark
Spark NLP version: 5.5.0
Spark version: 3.4.0
Type of Spark Application
Python Application
Java Version
No response
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response