Added recommendation and example for job submitter node placement on ON_DEMAND nodes (aws#45)
kmsiddh authored Apr 13, 2023
1 parent 90bbbe1 commit 7f98742
Showing 2 changed files with 49 additions and 4 deletions.
52 changes: 49 additions & 3 deletions content/node-placement/docs/eks-node-placement.md
@@ -115,14 +115,60 @@ Multiple key value pairs for spark.kubernetes.node.selector.[labelKey] can be passed
## **Job submitter pod placement**

Similar to driver and executor pods, you can configure node selectors for the job submitter pod using the `emr-job-submitter` classification. With this classification, you can place the job submitter pod in a single AZ or on nodes carrying any Kubernetes label. We recommend placing job submitter pods on `ON_DEMAND` nodes rather than `SPOT` nodes, because a job will fail if its job submitter pod is interrupted by a Spot instance reclamation.

**Note: The job submitter pod is also referred to as the job-runner pod.**

StartJobRun request with `ON_DEMAND` node placement for the job submitter pod:

```
cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF
{
  "name": "spark-python-in-s3-nodeselector",
  "virtualClusterId": "<virtual-cluster-id>",
  "executionRoleArn": "<execution-role-arn>",
  "releaseLabel": "emr-6.2.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://<s3 prefix>/trip-count.py",
      "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6"
    }
  },
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.dynamicAllocation.enabled": "false"
        }
      },
      {
        "classification": "emr-job-submitter",
        "properties": {
          "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND"
        }
      }
    ],
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs",
        "logStreamNamePrefix": "demo"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
EOF
aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter.json
```
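As a rough illustration (not part of the official docs), the same classification can be attached to a request dict programmatically before calling StartJobRun. The helper below is hypothetical; it only builds the `emr-job-submitter` entry shown in the JSON above:

```python
def add_job_submitter_node_selector(request, label_key, label_value):
    """Hypothetical helper: attach a node selector for the job submitter pod
    via the emr-job-submitter classification in a StartJobRun request dict."""
    overrides = request.setdefault("configurationOverrides", {})
    configs = overrides.setdefault("applicationConfiguration", [])
    # Reuse an existing emr-job-submitter classification if one is present.
    for entry in configs:
        if entry.get("classification") == "emr-job-submitter":
            entry.setdefault("properties", {})[
                "jobsubmitter.node.selector." + label_key] = label_value
            return request
    # Otherwise add a new classification entry.
    configs.append({
        "classification": "emr-job-submitter",
        "properties": {"jobsubmitter.node.selector." + label_key: label_value},
    })
    return request

request = {"name": "spark-python-in-s3-nodeselector"}
add_job_submitter_node_selector(
    request, "eks.amazonaws.com/capacityType", "ON_DEMAND")
```

The resulting dict can then be serialized with `json.dumps` and passed to the CLI, or handed to a boto3 `emr-containers` client.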

StartJobRun request with single AZ node placement for the job submitter pod:

```
cat >spark-python-in-s3-nodeselector-job-submitter-az.json << EOF
{
  "name": "spark-python-in-s3-nodeselector",
  "virtualClusterId": "<virtual-cluster-id>",
  ...
}
EOF
aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter-az.json
```
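The body of the single-AZ request is collapsed in this diff view. As an assumption (not shown on this page), pinning the submitter pod to one AZ would use the standard Kubernetes zone label through the same `jobsubmitter.node.selector.*` prefix; the zone value below is a placeholder:

```python
# Sketch under assumptions: topology.kubernetes.io/zone is the well-known
# Kubernetes zone label; "us-west-2a" is a placeholder, not from this page.
zone = "us-west-2a"
single_az_classification = {
    "classification": "emr-job-submitter",
    "properties": {
        "jobsubmitter.node.selector.topology.kubernetes.io/zone": zone,
    },
}
```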

StartJobRun request with single AZ and EC2 instance type placement for the job submitter pod:
1 change: 0 additions & 1 deletion content/performance/docs/dra.md
@@ -60,7 +60,6 @@ aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-dra
**Observed Behavior:**
When the job starts, the driver pod is created and 10 executors are launched initially (`"spark.dynamicAllocation.initialExecutors":"10"`). The number of executors can then scale up to a maximum of 100 (`"spark.dynamicAllocation.maxExecutors":"100"`).
**Configurations to note:**
**Please note that this feature is marked as Experimental as of Spark 3.0.0**

`spark.dynamicAllocation.shuffleTracking.enabled` - **Experimental**. Enables shuffle file tracking for executors, which allows dynamic allocation without the need for an external shuffle service. This option will try to keep alive executors that are storing shuffle data for active jobs.
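For reference, a `spark-defaults` classification that enables dynamic allocation with shuffle tracking (so no external shuffle service is needed) might look like the following; the executor counts are illustrative values matching the behavior described above:

```
{
  "classification": "spark-defaults",
  "properties": {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.initialExecutors": "10",
    "spark.dynamicAllocation.maxExecutors": "100"
  }
}
```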

