Skip to content
This repository was archived by the owner on Jun 18, 2021. It is now read-only.

sjtu-sail/ops-hadoop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Jun 4, 2019
4e71145 · Jun 4, 2019
Feb 28, 2019
May 30, 2019
Jun 4, 2019
Nov 9, 2018
Nov 19, 2018
Nov 9, 2018
Mar 22, 2019
Dec 7, 2018

Repository files navigation

OPS

Build Status GitHub

OPtimized Shuffle is a distributed shuffle management system which focuses on optimizing the shuffle phase of DAG computing frameworks. The name OPS is originated from the ancient Roman religion, she was a fertility deity and earth goddess of Sabine origin.

System Overview

Architecture

Figure 1: OPS Architecture

Internal View

Figure 2: Internal View Comparison between Legacy Hadoop and Hadoop with OPS

LifeCycle

Figure 3: Lifecyle of OPS

Structure

Figure 4: OPS Structure

ETCD File Structure

  • ops/
    • jobs/
      • job-${jobId}: JobConf
    • mapTaskFirstAlloc/
      • mapTaskFirstAlloc-${jobId}: MapTaskAlloc
    • mapTaskSecondAlloc/
      • mapTaskSecondAlloc-${jobId}: MapTaskAlloc
    • reduceTaskAlloc/
      • reduceTaskAlloc-${jobId}: ReduceTaskAlloc
    • shuffle/
      • reduceNum/
        • reduceNum-${nodeIp}-${jobId}-${reduceId}: num
      • mapCompleted/
        • mapCompleted-${nodeIp}-${jobId}-${mapId}: MapReport
      • shuffleCompleted/
        • shuffleCompleted-${dstNodeIp}-${jobId}-${num}-${mapId}: path
      • indexRecords/
        • indexRecords-${nodeIp}-${jobId}-${mapId}: CollectionConf
    • tasks/
      • reduceTasks/
        • reduceTask-${nodeIp}-${jobId}-${reduceId}: ReduceConf

Compatibility

In order to use OPS, the modification on DAG computing frameworks is inevitable. For now, we only implement OPS on Hadoop MapReduce. Our customized Hadoop is available in here. We believe that the costs of enabling OPS on other DAG computing frameworks are also very low.

Build

$ mvn clean install

Run

OPS: Optimized Shuffle

Usage:
  ops.sh [command]

Commands:
  master         OpsMaster
  worker         OpsWorker

Options:
  -h, --help     Show usage

Use ops.sh [command] --help for more information about a command.

License

Apache License 2.0