Added command line arguments for Horovod knob environment variables, config file, and new knobs for autotuning (horovod#1345)
tgaddair authored Aug 27, 2019
1 parent 6efd5dd commit 356ff69
Showing 15 changed files with 617 additions and 47 deletions.
20 changes: 20 additions & 0 deletions docs/mpirun.rst
@@ -94,6 +94,26 @@ example below:
Other MPI RDMA implementations may or may not benefit from disabling multithreading, so please consult vendor
documentation.

Horovod Parameter Knobs
-----------------------

Many of the configurable parameters available as command line arguments to ``horovodrun`` can also be used with ``mpirun``
by setting the corresponding environment variables.

Tensor Fusion:

.. code-block:: bash

    $ mpirun -x HOROVOD_FUSION_THRESHOLD=33554432 -x HOROVOD_CYCLE_TIME=3.5 ... python train.py

Timeline:

.. code-block:: bash

    $ mpirun -x HOROVOD_TIMELINE=/path/to/timeline.json -x HOROVOD_TIMELINE_MARK_CYCLES=1 ... python train.py
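
Autotuning (an illustrative sketch, not part of this commit's documentation; the variable names are the new knobs added in ``horovod/common/common.h`` and the values here are arbitrary):

.. code-block:: bash

    $ mpirun -x HOROVOD_AUTOTUNE=1 -x HOROVOD_AUTOTUNE_WARMUP_SAMPLES=5 -x HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE=20 ... python train.py
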
Note that when using ``horovodrun``, any command line arguments will override values set in the environment.

Hangs due to non-routed network interfaces
------------------------------------------

15 changes: 6 additions & 9 deletions docs/tensor-fusion.rst
@@ -16,25 +16,22 @@ one reduction operation. The algorithm of Tensor Fusion is as follows:
5. Copy data from the fusion buffer into the output tensors.
6. Repeat until there are no more tensors to reduce in this cycle.

The fusion buffer size can be tweaked using the ``HOROVOD_FUSION_THRESHOLD`` environment variable:
The fusion buffer size can be adjusted using the ``--fusion-threshold-mb`` command line argument to ``horovodrun``:

.. code-block:: bash
$ HOROVOD_FUSION_THRESHOLD=33554432 horovodrun -np 4 python train.py
$ horovodrun -np 4 --fusion-threshold-mb 32 python train.py
Setting the ``HOROVOD_FUSION_THRESHOLD`` environment variable to zero disables Tensor Fusion:
Setting ``--fusion-threshold-mb`` to zero disables Tensor Fusion:

.. code-block:: bash
$ HOROVOD_FUSION_THRESHOLD=0 horovodrun -np 4 python train.py
$ horovodrun -np 4 --fusion-threshold-mb 0 python train.py
You can tweak time between cycles (defined in milliseconds) using the ``HOROVOD_CYCLE_TIME`` environment variable:
You can adjust the time between cycles (in milliseconds) using the ``--cycle-time-ms`` command line argument:

.. code-block:: bash
$ HOROVOD_CYCLE_TIME=3.5 horovodrun -np 4 python train.py
$ horovodrun -np 4 --cycle-time-ms 3.5 python train.py
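
For illustration (not part of this diff), the two knobs can be combined in a single ``horovodrun`` invocation:

.. code-block:: bash

    $ horovodrun -np 4 --fusion-threshold-mb 32 --cycle-time-ms 3.5 python train.py
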
.. inclusion-marker-end-do-not-remove
11 changes: 4 additions & 7 deletions docs/timeline.rst
@@ -9,12 +9,12 @@ Horovod has the ability to record the timeline of its activity, called Horovod Timeline
:alt: Horovod Timeline


To record a Horovod Timeline, set the ``HOROVOD_TIMELINE`` environment variable to the location of the timeline
To record a Horovod Timeline, set the ``--timeline-filename`` command line argument to the location of the timeline
file to be created. This file is only recorded on rank 0, but it contains information about the activity of all workers.

.. code-block:: bash
$ HOROVOD_TIMELINE=/path/to/timeline.json horovodrun -np 4 python train.py
$ horovodrun -np 4 --timeline-filename /path/to/timeline.json python train.py
You can then open the timeline file using the ``chrome://tracing`` facility of the `Chrome <https://www.google.com/chrome/browser/>`__ browser.
Expand Down Expand Up @@ -49,13 +49,10 @@ Horovod performs work in cycles. These cycles are used to aid `Tensor Fusion <h
:alt: Cycle Markers


Since this information makes timeline view very crowded, it is not enabled by default. To add cycle markers to the timeline, set the ``HOROVOD_TIMELINE_MARK_CYCLES`` environment variable to ``1``:
Since this information makes the timeline view very crowded, it is not enabled by default. To add cycle markers to the timeline, set the ``--timeline-mark-cycles`` flag:

.. code-block:: bash
$ HOROVOD_TIMELINE=/path/to/timeline.json HOROVOD_TIMELINE_MARK_CYCLES=1 \
horovodrun -np 4 python train.py
$ horovodrun -np 4 --timeline-filename /path/to/timeline.json --timeline-mark-cycles python train.py
.. inclusion-marker-end-do-not-remove
4 changes: 4 additions & 0 deletions horovod/common/common.h
@@ -63,6 +63,10 @@ namespace common {
#define HOROVOD_TIMELINE_MARK_CYCLES "HOROVOD_TIMELINE_MARK_CYCLES"
#define HOROVOD_AUTOTUNE "HOROVOD_AUTOTUNE"
#define HOROVOD_AUTOTUNE_LOG "HOROVOD_AUTOTUNE_LOG"
#define HOROVOD_AUTOTUNE_WARMUP_SAMPLES "HOROVOD_AUTOTUNE_WARMUP_SAMPLES"
#define HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE "HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE"
#define HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES "HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES"
#define HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE "HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE"
#define HOROVOD_FUSION_THRESHOLD "HOROVOD_FUSION_THRESHOLD"
#define HOROVOD_CYCLE_TIME "HOROVOD_CYCLE_TIME"
#define HOROVOD_STALL_CHECK_DISABLE "HOROVOD_STALL_CHECK_DISABLE"
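
As a usage sketch (illustrative, not part of this header), the four new knobs map to environment variables that can be set before launching training; the values shown are the defaults from ``parameter_manager.cc`` in this commit:

    $ HOROVOD_AUTOTUNE=1 \
        HOROVOD_AUTOTUNE_WARMUP_SAMPLES=3 \
        HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE=10 \
        HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES=20 \
        HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE=0.8 \
        horovodrun -np 4 python train.py
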
36 changes: 23 additions & 13 deletions horovod/common/parameter_manager.cc
@@ -20,14 +20,15 @@
#include <limits>

#include "logging.h"
#include "utils/env_parser.h"

namespace horovod {
namespace common {

#define WARMUPS 3
#define CYCLES_PER_SAMPLE 10
#define BAYES_OPT_MAX_SAMPLES 20
#define GAUSSIAN_PROCESS_NOISE 0.8
#define DEFAULT_WARMUPS 3
#define DEFAULT_STEPS_PER_SAMPLE 10
#define DEFAULT_BAYES_OPT_MAX_SAMPLES 20
#define DEFAULT_GAUSSIAN_PROCESS_NOISE 0.8

Eigen::VectorXd CreateVector(double x1, double x2) {
Eigen::VectorXd v(2);
@@ -38,23 +39,28 @@ Eigen::VectorXd CreateVector(double x1, double x2) {

// ParameterManager
ParameterManager::ParameterManager() :
warmups_(GetIntEnvOrDefault(HOROVOD_AUTOTUNE_WARMUP_SAMPLES, DEFAULT_WARMUPS)),
steps_per_sample_(GetIntEnvOrDefault(HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE, DEFAULT_STEPS_PER_SAMPLE)),
hierarchical_allreduce_(CategoricalParameter<bool>(std::vector<bool>{false, true})),
hierarchical_allgather_(CategoricalParameter<bool>(std::vector<bool>{false, true})),
cache_enabled_(CategoricalParameter<bool>(std::vector<bool>{false, true})),
joint_params_(BayesianParameter(
std::vector<BayesianVariableConfig>{
{ BayesianVariable::fusion_buffer_threshold_mb, std::pair<double, double>(0, 64) },
{ BayesianVariable::cycle_time_ms, std::pair<double, double>(1, 100) }
}, std::vector<Eigen::VectorXd>{
},
std::vector<Eigen::VectorXd>{
CreateVector(4, 5),
CreateVector(32, 50),
CreateVector(16, 25),
CreateVector(8, 10)
})),
},
GetIntEnvOrDefault(HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES, DEFAULT_BAYES_OPT_MAX_SAMPLES),
GetDoubleEnvOrDefault(HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE, DEFAULT_GAUSSIAN_PROCESS_NOISE))),
parameter_chain_(std::vector<ITunableParameter*>{&joint_params_, &hierarchical_allreduce_, &hierarchical_allgather_,
&cache_enabled_}),
active_(false),
warmup_remaining_(WARMUPS),
warmup_remaining_(warmups_),
sample_(0),
rank_(-1),
root_rank_(0),
@@ -80,7 +86,7 @@ void ParameterManager::Initialize(int32_t rank, int32_t root_rank,

void ParameterManager::SetAutoTuning(bool active) {
if (active != active_) {
warmup_remaining_ = WARMUPS;
warmup_remaining_ = warmups_;
}
active_ = active;
};
@@ -140,8 +146,8 @@ bool ParameterManager::Update(const std::vector<std::string>& tensor_names,
}

for (const std::string& tensor_name : tensor_names) {
int32_t cycle = tensor_counts_[tensor_name]++;
if (cycle >= (sample_ + 1) * CYCLES_PER_SAMPLE) {
int32_t step = tensor_counts_[tensor_name]++;
if (step >= (sample_ + 1) * steps_per_sample_) {
auto now = std::chrono::steady_clock::now();
double duration = std::chrono::duration_cast<std::chrono::microseconds>(now - last_sample_start_).count();
scores_[sample_] = total_bytes_ / duration;
@@ -391,10 +397,14 @@ void ParameterManager::CategoricalParameter<T>::ResetState() {
// BayesianParameter
ParameterManager::BayesianParameter::BayesianParameter(
std::vector<BayesianVariableConfig> variables,
std::vector<Eigen::VectorXd> test_points) :
std::vector<Eigen::VectorXd> test_points,
int max_samples,
double gaussian_process_noise) :
TunableParameter<Eigen::VectorXd>(test_points[0]),
variables_(variables),
test_points_(test_points),
max_samples_(max_samples),
gaussian_process_noise_(gaussian_process_noise),
iteration_(0) {
ResetBayes();
Reinitialize(FilterTestPoint(0));
@@ -453,7 +463,7 @@ void ParameterManager::BayesianParameter::OnTune(double score, Eigen::VectorXd&
}

bool ParameterManager::BayesianParameter::IsDoneTuning() const {
return iteration_ > BAYES_OPT_MAX_SAMPLES;
return iteration_ > max_samples_;
}

void ParameterManager::BayesianParameter::ResetState() {
@@ -474,7 +484,7 @@ void ParameterManager::BayesianParameter::ResetBayes() {
}
}

bayes_.reset(new BayesianOptimization(bounds, GAUSSIAN_PROCESS_NOISE));
bayes_.reset(new BayesianOptimization(bounds, gaussian_process_noise_));
}

Eigen::VectorXd ParameterManager::BayesianParameter::FilterTestPoint(int i) {
10 changes: 8 additions & 2 deletions horovod/common/parameter_manager.h
@@ -185,7 +185,8 @@ class ParameterManager {
// A set of numerical parameters optimized jointly using Bayesian Optimization.
class BayesianParameter : public TunableParameter<Eigen::VectorXd> {
public:
BayesianParameter(std::vector<BayesianVariableConfig> variables, std::vector<Eigen::VectorXd> test_points);
BayesianParameter(std::vector<BayesianVariableConfig> variables, std::vector<Eigen::VectorXd> test_points,
int max_samples, double gaussian_process_noise);

void SetValue(BayesianVariable variable, double value, bool fixed);
double Value(BayesianVariable variable) const;
@@ -201,6 +202,9 @@ class ParameterManager {

std::vector<BayesianVariableConfig> variables_;
std::vector<Eigen::VectorXd> test_points_;
int max_samples_;
double gaussian_process_noise_;

uint32_t iteration_;

struct EnumClassHash {
@@ -215,6 +219,9 @@ class ParameterManager {
std::unordered_map<BayesianVariable, int32_t, EnumClassHash> index_;
};

int warmups_;
int steps_per_sample_;

CategoricalParameter<bool> hierarchical_allreduce_;
CategoricalParameter<bool> hierarchical_allgather_;
CategoricalParameter<bool> cache_enabled_;
@@ -236,7 +243,6 @@
int32_t root_rank_;
std::ofstream file_;
bool writing_;

};

} // namespace common
5 changes: 5 additions & 0 deletions horovod/common/utils/env_parser.cc
@@ -154,5 +154,10 @@ int GetIntEnvOrDefault(const char* env_variable, int default_value) {
return env_value != nullptr ? std::strtol(env_value, nullptr, 10) : default_value;
}

double GetDoubleEnvOrDefault(const char* env_variable, double default_value) {
auto env_value = std::getenv(env_variable);
return env_value != nullptr ? std::strtod(env_value, nullptr) : default_value;
}

} // namespace common
}
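
A minimal usage sketch for the new helper (illustrative, not part of this diff), mirroring how ``parameter_manager.cc`` consumes it:

    // Returns the compiled-in default (0.8) when the variable is unset;
    // otherwise the string value is parsed with std::strtod.
    double noise = GetDoubleEnvOrDefault(HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE, 0.8);
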
2 changes: 2 additions & 0 deletions horovod/common/utils/env_parser.h
@@ -41,6 +41,8 @@ void SetIntFromEnv(const char* env, int& val);

int GetIntEnvOrDefault(const char* env_variable, int default_value);

double GetDoubleEnvOrDefault(const char* env_variable, double default_value);

} // namespace common
} // namespace horovod

