TonY Configurations
Phat Dai Tran edited this page Sep 21, 2018 · 26 revisions
Name | Default | Meaning |
---|---|---|
tony.other.namenodes | | Namenode URIs to get delegation tokens from. |
tony.yarn.queue | default | Default queue to submit to YARN. |
tony.application.name | TensorFlowApplication | Name of your YARN application. |
tony.application.node-label | | YARN partition which this application should run in. |
tony.application.single-node | false | Whether this is single node training or not. |
tony.application.enable-preprocess | false | Whether the AM should invoke the user's python script or not. |
tony.application.timeout | 0 | Maximum runtime of the application, in milliseconds, before it is killed. |
Name | Default | Meaning |
---|---|---|
tony.task.executor.jvm.opts | -Xmx1536m | JVM opts for each TaskExecutor. |
tony.task.registration-timeout-sec | 300 | Timeout, in seconds, for AM to resubmit unregistered tasks (or fail if no retries configured). |
tony.task.registration-retry-count | 0 | How many times we should resubmit unregistered tasks after the timeout interval. |
tony.task.heartbeat-interval | 1000 | Interval, in milliseconds, at which TaskExecutors send heartbeats to the AM. |
tony.task.max-missed-heartbeats | 25 | How many missed heartbeats before declaring a TaskExecutor dead. |
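
The two heartbeat settings above combine into a rough failure-detection window. The sketch below assumes the AM declares a TaskExecutor dead after roughly `heartbeat-interval × max-missed-heartbeats` milliseconds of silence; the function name `dead_after_ms` is hypothetical and the exact accounting inside TonY may differ.

```python
def dead_after_ms(heartbeat_interval_ms: int = 1000,
                  max_missed_heartbeats: int = 25) -> int:
    """Approximate silence, in milliseconds, before a TaskExecutor is declared dead.

    Assumption: detection time is simply interval * allowed misses.
    The defaults mirror the table above.
    """
    if heartbeat_interval_ms <= 0 or max_missed_heartbeats <= 0:
        raise ValueError("both settings must be positive")
    return heartbeat_interval_ms * max_missed_heartbeats


if __name__ == "__main__":
    # With the defaults, a silent TaskExecutor is given about 25 seconds.
    print(dead_after_ms())  # 25000
```

Under this assumption, lowering either value shortens the time a hung task can go unnoticed, at the cost of more false positives on a congested network.
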
Name | Default | Meaning |
---|---|---|
tony.am.retry-count | 0 | How many times a failed AM should retry. |
tony.am.memory | 2g | AM memory size, requested as a string (e.g. '2g' or '2048m'). |
tony.am.vcores | 1 | Number of AM vcores to use. |
tony.am.gpus | 0 | Number of AM GPUs to use. (In general, should only be applicable in single node mode.) |
Name | Default | Meaning |
---|---|---|
tony.ps.memory | 2g | Parameter server memory size, requested as a string (e.g. '2g' or '2048m'). |
tony.ps.vcores | 1 | Number of vcores per parameter server. |
tony.ps.instances | 1 | Number of parameter servers to request. |
tony.worker.timeout | 0 | Timeout, in milliseconds, for the user's python processes before forcibly killing them. |
tony.worker.memory | 2g | Worker memory size, requested as a string (e.g. '2g' or '2048m'). |
tony.worker.vcores | 1 | Number of vcores per worker. |
tony.worker.gpus | 0 | Number of GPUs per worker. |
tony.worker.instances | 1 | Number of workers to request. |
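
To see how the per-role settings above add up, the following sketch totals the memory, vcores, and GPUs requested for parameter servers and workers. It assumes memory strings take only the two forms shown in the table (`'2g'`, `'2048m'`) and that 1g equals 1024 MB; `memory_to_mb` and `total_request` are hypothetical helpers, not part of TonY, and the AM's own resources are not included.

```python
import re

# Assumed conversion factors to megabytes for the two suffixes shown in the table.
_UNITS_MB = {"m": 1, "g": 1024}


def memory_to_mb(value: str) -> int:
    """Convert a string such as '2g' or '2048m' to megabytes."""
    match = re.fullmatch(r"(\d+)([mg])", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported memory string: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS_MB[unit]


def total_request(conf: dict) -> dict:
    """Sum resources over all ps and worker instances, using the table's defaults."""
    ps_n = int(conf.get("tony.ps.instances", 1))
    worker_n = int(conf.get("tony.worker.instances", 1))
    memory_mb = (
        ps_n * memory_to_mb(conf.get("tony.ps.memory", "2g"))
        + worker_n * memory_to_mb(conf.get("tony.worker.memory", "2g"))
    )
    vcores = (
        ps_n * int(conf.get("tony.ps.vcores", 1))
        + worker_n * int(conf.get("tony.worker.vcores", 1))
    )
    # Only workers have a GPU setting in the table above.
    gpus = worker_n * int(conf.get("tony.worker.gpus", 0))
    return {"memory_mb": memory_mb, "vcores": vcores, "gpus": gpus}


if __name__ == "__main__":
    conf = {
        "tony.ps.instances": "2",
        "tony.worker.instances": "4",
        "tony.worker.memory": "4096m",
        "tony.worker.gpus": "1",
    }
    print(total_request(conf))
```

A total like this is useful for checking a job against the queue's capacity before submitting it.
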
Name | Default | Meaning |
---|---|---|
tony.application.security.enabled | true | Whether this application is running in a Kerberized grid. Setting this to true will fetch tokens from the cluster as well as between the client and AM. |
tony.application.hdfs-conf-path | | Path to HDFS configuration, to be passed as an environment variable to the python training scripts. |
tony.application.yarn-conf-path | | Path to YARN configuration, to be passed as an environment variable to the python training scripts. |
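
The keys listed on this page are plain name/value pairs. As a minimal sketch, the snippet below renders a set of overrides in the Hadoop-style `<configuration>`/`<property>` XML layout. That TonY reads its settings from such a file is an assumption here, since this page does not say how the values are supplied; `render_conf` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def render_conf(settings: dict) -> str:
    """Render name/value pairs as Hadoop-style configuration XML (assumed layout)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    overrides = {
        "tony.yarn.queue": "default",
        "tony.worker.instances": 4,
        "tony.worker.memory": "4g",
        "tony.application.security.enabled": "false",
    }
    print(render_conf(overrides))
```

Only keys that differ from the defaults in the tables need to appear in such a file.
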