# Multi-node installation

This setup involves a number of steps run across multiple hosts. Brew a ☕ and take your time. You'll need to SSH into each node. Some commands are issued on all hosts; others apply only to the master or the slave(s), depending on the step. Rest assured, each snippet makes its target host explicit in either its title or its description.
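For example, connecting to each node looks something like the following. This is a minimal sketch: the key path and hostnames are placeholders, so substitute your own instance details.

```bash
# Placeholder key and hostnames -- substitute your own instance details.
ssh -i ~/.ssh/benchmark.pem ubuntu@master.example.com   # master only
ssh -i ~/.ssh/benchmark.pem ubuntu@slave1.example.com   # repeat for each slave
```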

Throughout these benchmarks we use the master/slave terminology only for legacy reasons (most of Hadoop's instructions use that jargon). Winder.ai condemns slavery in all its forms and therefore suggests using main/worker terminology in any future work.

For convenience, I'd recommend using iTerm2's broadcast input to quickly issue commands across multiple machines.
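If you're not on iTerm2, a simple loop over a host list achieves much the same thing. A minimal sketch, assuming a hypothetical `hosts.txt` with one hostname per line and the same placeholder key as above:

```bash
# hosts.txt is a hypothetical file listing one hostname per line.
while read -r host; do
  ssh -i ~/.ssh/benchmark.pem "ubuntu@${host}" 'sudo apt-get update'
done < hosts.txt
```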

Proceed by installing each of the following; a quick sanity check is sketched after the list:

  1. Hadoop
  2. Spark
  3. Dask
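
Once all three are in place, it's worth confirming each framework is on the PATH of every node. This is a minimal sketch; the exact invocations depend on how and where you installed each tool:

```bash
# Run on every node; versions should match across the cluster.
hadoop version
spark-submit --version
python3 -c "import dask; print(dask.__version__)"
```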

If you'd like to track CPU, memory, and other usage metrics, don't forget to set up the CloudWatch agent. After that, you'll be ready to run the benchmarks.
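For reference, starting the agent on an EC2 host typically looks like the sketch below. The config path is a placeholder; point it at wherever you wrote your agent configuration.

```bash
# Placeholder config path -- point -c at your own agent configuration file.
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json -s
```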