
Setting spark.app.id to a more intuitive name #124

Merged
edingroot merged 3 commits into master from u/chi/spark_app_id on Aug 17, 2023

Conversation

@edingroot (Contributor) commented Aug 17, 2023

  1. Setting the Spark app id to the same value as the Spark app name
  2. Replacing the characters '-', '.', and ',' with '_' in the app id

Test: start a new Spark session and call Spark's REST API at http://<driver_ip>:<ui_port>/api/v1/applications:

[ {
  "id" : "jupyterhub_chi_test_spark_39091_1692232285",
  "name" : "jupyterhub_chi_test-spark_39091_1692232285",
  "attempts" : [ {
    "startTime" : "2023-08-17T00:31:28.283GMT",
    "endTime" : "1969-12-31T23:59:59.999GMT",
    "lastUpdated" : "2023-08-17T00:31:28.283GMT",
    "duration" : 1038136,
    "sparkUser" : "chi",
    "completed" : false,
    "appSparkVersion" : "3.2.4",
    "startTimeEpoch" : 1692232288283,
    "endTimeEpoch" : -1,
    "lastUpdatedEpoch" : 1692232288283
  } ]
} ]
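As a minimal sketch of that verification step (assuming the requests library; the driver IP and UI port values below are placeholders, not part of this change):

import requests

# Placeholders: replace with the actual driver IP and UI port of the session under test.
driver_ip = "127.0.0.1"
ui_port = 39091

# Query the Spark UI REST API for running applications and print each id/name pair.
resp = requests.get(f"http://{driver_ip}:{ui_port}/api/v1/applications", timeout=10)
resp.raise_for_status()
for app in resp.json():
    print(app["id"], app["name"])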


if aws_creds[2] is not None:
    spark_conf['spark.hadoop.fs.s3a.aws.credentials.provider'] = AWS_ENV_CREDENTIALS_PROVIDER

@edingroot (Contributor, Author) commented:

This is just to move the following lines down a bit for better code readability:

# app_name from env already has the port and time appended to make it unique
app_name = (spark_opts_from_env or {}).get('spark.app.name')
if not app_name:
    # We want to make the app name more unique so that we can
    # search for it in the history server.
    app_name = f'{app_base_name}_{ui_port}_{int(time.time())}'
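For illustration only (the base name and port below are taken from the test output above; they are examples, not part of the diff), the generated name looks like this:

import time

app_base_name = "jupyterhub_chi_test-spark"  # example base name from the test output
ui_port = 39091                              # example UI port from the test output
app_name = f"{app_base_name}_{ui_port}_{int(time.time())}"
# e.g. 'jupyterhub_chi_test-spark_39091_1692232285'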

@edingroot merged commit 457b5d1 into master on Aug 17, 2023
1 check passed
@edingroot deleted the u/chi/spark_app_id branch on August 17, 2023 17:36
# Replace '-', '.', ',' in the app id with '_' so that the id stays consistent
# in all places for metric systems:
# - since in the Prometheus metrics endpoint those will be converted to '_'
# - while the 'spark-app-selector' executor pod label will keep the original app id
app_id = re.sub(r'[\.,-]', '_', app_name)
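For illustration, running the substitution above on the app name from the test output yields the id shown in the API response:

import re

app_name = "jupyterhub_chi_test-spark_39091_1692232285"
app_id = re.sub(r'[\.,-]', '_', app_name)
print(app_id)  # jupyterhub_chi_test_spark_39091_1692232285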
A reviewer (Contributor) commented:

Sorry I missed it earlier.
There is a character limit of 63 for the app id and, I think, 253 for the app name; the app id needs to be trimmed to make sure it stays within the limit.
Secondly, it would likely be useful to have the app name formatted as service__instance_timestamp, or as service__job__id__action_name_timestamp for adhoc/tron jobs.
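A minimal sketch of the suggested trimming, assuming the 63-character limit mentioned above (the constant and helper name are hypothetical, not part of this PR):

# Hypothetical helper: trim the app id to the 63-character limit mentioned in the review comment.
APP_ID_MAX_LEN = 63

def truncate_app_id(app_id: str) -> str:
    return app_id[:APP_ID_MAX_LEN]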
