Commit

[GOBBLIN-895] Fixes Gobblin Standalone configs and scripts so that the user guide is accurate

Closes #2751 from Will-Lo/fix-gobblin-standalone-script
William Lo authored and suvasude committed Oct 11, 2019
1 parent baf2abe commit 70afd6d
Showing 16 changed files with 89 additions and 60 deletions.
2 changes: 1 addition & 1 deletion bin/gobblin-admin.sh
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin cli $@
2 changes: 1 addition & 1 deletion bin/gobblin-aws.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin service aws $@
2 changes: 1 addition & 1 deletion bin/gobblin-cluster-master.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin service cluster-master $@
2 changes: 1 addition & 1 deletion bin/gobblin-cluster-worker.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin service cluster-worker $@
2 changes: 1 addition & 1 deletion bin/gobblin-mapreduce.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

##############################################################
############### Run Gobblin Jobs on Hadoop MR ################
2 changes: 1 addition & 1 deletion bin/gobblin-service.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin service service-manager $@
2 changes: 1 addition & 1 deletion bin/gobblin-standalone.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin service standalone $@
2 changes: 1 addition & 1 deletion bin/gobblin-yarn.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin service yarn $@
20 changes: 19 additions & 1 deletion bin/gobblin.sh
@@ -238,6 +238,11 @@ if [[ "$GOBBLIN_MODE_TYPE" == "$CLI" ]]; then
fi
fi

+CHECK_ENV_VARS=false
+if [ $ACTION == "start" ] || [ $ACTION == "restart" ]; then
+CHECK_ENV_VARS=true
+fi

# derived based on input from user, $GOBBLIN_MODE
PID_FILE_NAME=".gobblin-$GOBBLIN_MODE.pid"
PID_FILE="$GOBBLIN_HOME/$PID_FILE_NAME"
@@ -263,6 +268,10 @@ if [[ -n "$USER_LOG4J_FILE" ]]; then
elif [[ -f ${GOBBLIN_CONF}/log4j2.xml ]]; then
LOG4J_FILE_PATH=file://${GOBBLIN_CONF}/log4j2.xml
LOG4J_OPTS="-Dlog4j.configuration=$LOG4J_FILE_PATH"
+#prefer log4j.xml
+elif [[ -f ${GOBBLIN_CONF}/log4j.xml ]]; then
+LOG4J_FILE_PATH=file://${GOBBLIN_CONF}/log4j.xml
+LOG4J_OPTS="-Dlog4j.configuration=$LOG4J_FILE_PATH"
#defaults to log4j.properties
elif [[ -f ${GOBBLIN_CONF}/log4j.properties ]]; then
LOG4J_FILE_PATH=file://${GOBBLIN_CONF}/log4j.properties
@@ -372,6 +381,7 @@ function start() {

LOG_OUT_FILE="${GOBBLIN_LOGS}/${GOBBLIN_MODE}.out"
LOG_ERR_FILE="${GOBBLIN_LOGS}/${GOBBLIN_MODE}.err"
+ADDITIONAL_ARGS=""

# for all gobblin commands
if [[ "$GOBBLIN_MODE_TYPE" == "$CLI" ]]; then
@@ -417,7 +427,15 @@ function start() {
CLASS_N_ARGS=''
if [[ "$GOBBLIN_MODE" = "$STANDALONE_MODE" ]]; then
CLASS_N_ARGS="$STANDALONE_CLASS $GOBBLIN_CONF/application.conf"
+ADDITIONAL_ARGS="-Dgobblin.logs.dir=${GOBBLIN_LOGS}"

+if [ -z "$GOBBLIN_WORK_DIR" ] && [ "$CHECK_ENV_VARS" == true ]; then
+die "GOBBLIN_WORK_DIR is not set!"
+fi

+if [ -z "$GOBBLIN_JOB_CONFIG_DIR" ] && [ "$CHECK_ENV_VARS" == true ]; then
+die "Environment variable GOBBLIN_JOB_CONFIG_DIR not set!"
+fi
elif [[ "$GOBBLIN_MODE" = "$AWS_MODE" ]]; then
CLASS_N_ARGS="$AWS_CLASS"

@@ -442,7 +460,7 @@ function start() {
echo "Invalid gobblin command or execution mode... [EXITING]"
exit 1
fi
-GOBBLIN_COMMAND="$JAVA_HOME/bin/java -cp $GOBBLIN_CLASSPATH $GC_OPTS $JVM_OPTS $LOG4J_OPTS $CLASS_N_ARGS"
+GOBBLIN_COMMAND="$JAVA_HOME/bin/java -cp $GOBBLIN_CLASSPATH $GC_OPTS $JVM_OPTS $LOG4J_OPTS $ADDITIONAL_ARGS $CLASS_N_ARGS"
fi

# execute the command
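
Reading aid for the `start()` hunks in this file: in standalone mode the script now aborts when `GOBBLIN_WORK_DIR` or `GOBBLIN_JOB_CONFIG_DIR` is empty, but only if the action is `start` or `restart`. The sketch below restates that logic in POSIX shell. It is not the committed code: `needs_env_check` and `require_env` are hypothetical names, and the commit itself inlines the two checks and calls the script's own `die` function.

```shell
# Hypothetical helpers, not part of bin/gobblin.sh.

# Succeeds only for the actions that the commit guards (start, restart).
needs_env_check() {
  case "$1" in
    start|restart) return 0 ;;
    *) return 1 ;;
  esac
}

# Fails with a message on stderr when the named variable is unset or empty.
# The name is expanded through eval, so pass only trusted, literal names.
require_env() {
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "Environment variable $1 is not set!" >&2
    return 1
  fi
}
```

A caller would run `require_env GOBBLIN_WORK_DIR` and `require_env GOBBLIN_JOB_CONFIG_DIR` only when `needs_env_check "$ACTION"` succeeds, so that any other action skips the check, as `CHECK_ENV_VARS` does in the diff.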
2 changes: 1 addition & 1 deletion bin/gobblin_password_encryptor.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

script_dir=$(dirname $0)
lib_dir=${script_dir}/../lib
2 changes: 1 addition & 1 deletion bin/historystore-manager.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin cli job-store-schema-manager $@
2 changes: 1 addition & 1 deletion bin/statestore-checker.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

CURRENT_DIR="$(cd `dirname $0`/..; pwd)"
$CURRENT_DIR/bin/gobblin cli job-state-to-json $@
2 changes: 1 addition & 1 deletion bin/statestore-cleaner.sh
@@ -17,7 +17,7 @@
# limitations under the License.
#

-# @depricated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh
+# @deprecated: This script is kept for backward compatibility only and will be removed in future. Use gobblin.sh

FWDIR="$(cd `dirname $0`/..; pwd)"

69 changes: 24 additions & 45 deletions conf/standalone/application.conf
@@ -15,70 +15,44 @@
# limitations under the License.
#

# Cluster configuration properties
gobblin.cluster.app.name=GobblinStandaloneCluster
gobblin.cluster.email.notification.on.shutdown=false
gobblin.cluster.helix.instance.max.retries=2
gobblin.cluster.work.dir=/tmp/gobblin-cluster

# Helix/Zookeeper configuration properties
gobblin.cluster.helix.cluster.name=GobblinStandaloneCluster
gobblin.cluster.zk.connection.string="localhost:2181"

# job config monitor interval
jobconf.monitor.interval=30000

# Sample configuration properties for the Gobblin Standalone cluster
gobblin.cluster.workDir=${gobblin.cluster.work.dir}/GobblinStandaloneCluster

# default is the JobConfigurationManager
# use this manager to accept jobs from Kafka. It requires some additional Kafka related parameters.
#gobblin.cluster.job.configuration.manager=org.apache.gobblin.cluster.StreamingJobConfigurationManager
#spec.kafka.topics=ruyang_test_kafka_gobblin
#kafka.brokers="hostname:12913/kafka-queuing"
#jobSpecMonitor.kafka.zookeeper.connect="hostname:12913/kafka-queuing"

# Cluster configuration properties
gobblin.cluster.helix.cluster.name=GobblinStandaloneClusterCli

# used by the JobConfigurationManager
gobblin.cluster.job.conf.path=${gobblin.cluster.work.dir}/jobs
gobblin.cluster.jobconf.fullyQualifiedPath=${gobblin.cluster.work.dir}/jobs
gobblin.cluster.job.catalog=org.apache.gobblin.runtime.job_catalog.FSJobCatalog
# Thread pool settings for the task executor
taskexecutor.threadpool.size=2
taskretry.threadpool.coresize=1
taskretry.threadpool.maxsize=2

# File system URIs
fs.uri="file:///"
fs.uri=file:///
writer.fs.uri=${fs.uri}
state.store.fs.uri=${fs.uri}

# Writer related configuration properties
writer.destination.type=HDFS
writer.output.format=AVRO
writer.staging.dir=${gobblin.cluster.work.dir}/task-staging
writer.output.dir=${gobblin.cluster.work.dir}/task-output
writer.staging.dir=${env:GOBBLIN_WORK_DIR}/task-staging
writer.output.dir=${env:GOBBLIN_WORK_DIR}/task-output

# Data publisher related configuration properties
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
data.publisher.final.dir=${gobblin.cluster.work.dir}/job-output
data.publisher.final.dir=${env:GOBBLIN_WORK_DIR}/job-output
data.publisher.replace.final.dir=false

# Directory where job configuration files are stored
jobconf.dir=${env:GOBBLIN_JOB_CONFIG_DIR}
jobconf.fullyQualifiedPath=file://${env:GOBBLIN_JOB_CONFIG_DIR}

# Directory where job/task state files are stored
state.store.dir=${gobblin.cluster.work.dir}/state-store
state.store.dir=${env:GOBBLIN_WORK_DIR}/state-store

# Directory where error files from the quality checkers are stored
qualitychecker.row.err.file=${gobblin.cluster.work.dir}/err
# Directory where commit sequences are stored
gobblin.runtime.commit.sequence.store.dir=${env:GOBBLIN_WORK_DIR}/commit-sequence-store

# Disable job locking for now
job.lock.enabled=false
# Directory where error files from the quality checkers are stored
qualitychecker.row.err.file=${env:GOBBLIN_WORK_DIR}/err

# Directory where job locks are stored
job.lock.dir=${gobblin.cluster.work.dir}/locks
job.lock.dir=${env:GOBBLIN_WORK_DIR}/locks

# Directory where metrics log files are stored
metrics.log.dir=${gobblin.cluster.work.dir}/metrics

# Interval of task state reporting in milliseconds
task.status.reportintervalinms=1000
metrics.log.dir=${env:GOBBLIN_WORK_DIR}/metrics

# Enable metrics / events
metrics.enabled=true
@@ -94,3 +68,8 @@ rest.server.port=9090
# job history store ( WARN [GobblinYarnAppLauncher] NOT starting the admin UI because the job execution info server is NOT enabled )
job.execinfo.server.enabled=false
job.history.store.enabled=false
task.status.reportintervalinms=5000

# The time gap for Job Detector to detect modification/deletion/creation of jobconfig.
# Unit in milliseconds, configurable.
jobconf.monitor.interval=30000
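
Reading aid for this file: after the change, the path-valued properties are all rooted at `${env:GOBBLIN_WORK_DIR}`, with job configs read from `${env:GOBBLIN_JOB_CONFIG_DIR}`. The sketch below lists the paths that the new lines derive from the work directory. `gobblin_work_subdirs` is a hypothetical helper that is not part of the commit, and the list covers only the properties visible in this diff.

```shell
# Hypothetical helper: prints the paths that the new application.conf lines
# derive from the work directory passed as $1.
gobblin_work_subdirs() {
  for sub in task-staging task-output job-output state-store \
             commit-sequence-store err locks metrics; do
    echo "$1/$sub"
  done
}
```

Calling `gobblin_work_subdirs "$GOBBLIN_WORK_DIR"` before a first start shows where staging data, output, state, locks and metrics would be written.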
32 changes: 32 additions & 0 deletions conf/standalone/log4j.xml
@@ -0,0 +1,32 @@
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">

<log4j:configuration>

<appender name="FileRoll" class="org.apache.log4j.rolling.RollingFileAppender">
<param name="file" value="${gobblin.logs.dir}/standalone.out" />
<param name="append" value="true" />
<param name="encoding" value="UTF-8" />

<rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy">
<param name="FileNamePattern" value="${gobblin.logs.dir}/archive/gobblin.%d{yyyy-MM-dd}.log"/>
</rollingPolicy>

<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss z} %-5p [%t] %C %X{tableName} %L - %m%n"/>
</layout>
</appender>

<logger name="org.apache.commons.httpclient">
<level value="DEBUG"/>
</logger>

<logger name="httpclient.wire">
<level value="ERROR"/>
</logger>

<root>
<priority value ="INFO" />
<appender-ref ref="FileRoll" />
</root>

</log4j:configuration>
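
Reading aid: `bin/gobblin.sh` picks this file up through the `elif` chain changed earlier in this commit, which tries `log4j2.xml`, then `log4j.xml`, then `log4j.properties` under `$GOBBLIN_CONF`. The sketch below restates that order in POSIX shell. `pick_log4j_config` is a hypothetical name, the user-supplied file that the script checks first (`$USER_LOG4J_FILE`) is left out, and any branches beyond the part of the hunk shown are not modeled.

```shell
# Hypothetical function, not part of bin/gobblin.sh.
# Prints a file:// URL for the first Log4j config found in directory $1,
# trying the names in the same order as the elif chain in the diff.
pick_log4j_config() {
  for name in log4j2.xml log4j.xml log4j.properties; do
    if [ -f "$1/$name" ]; then
      echo "file://$1/$name"
      return 0
    fi
  done
  return 1
}
```

The `${gobblin.logs.dir}` placeholder used in this file is supplied by the `-Dgobblin.logs.dir=${GOBBLIN_LOGS}` argument that the same commit adds to the standalone command line.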
4 changes: 2 additions & 2 deletions gobblin-docs/Getting-Started.md
@@ -89,7 +89,7 @@ Each Gobblin job minimally involves several constructs, e.g. [Source](https://gi

Some of the classes relevant to this example include [WikipediaSource](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaSource.java), [WikipediaExtractor](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaExtractor.java), [WikipediaConverter](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/java/org/apache/gobblin/example/wikipedia/WikipediaConverter.java), [AvroHdfsDataWriter](https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/AvroHdfsDataWriter.java) and [BaseDataPublisher](https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/publisher/BaseDataPublisher.java).

-To run Gobblin in standalone daemon mode we need a Gobblin configuration file (such as uses [gobblin-standalone.properties](https://github.com/apache/incubator-gobblin/blob/master/conf/gobblin-standalone-v2.properties)). And for each job we wish to run, we also need a job configuration file (such as [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull)). The Gobblin configuration file, which is passed to Gobblin as a command line argument, should contain a property `jobconf.dir` which specifies where the job configuration files are located. By default, `jobconf.dir` points to environment variable `GOBBLIN_JOB_CONFIG_DIR`. Each file in `jobconf.dir` with extension `.job` or `.pull` is considered a job configuration file, and Gobblin will launch a job for each such file. For more information on Gobblin deployment in standalone mode, refer to the [Standalone Deployment](user-guide/Gobblin-Deployment#Standalone-Deployment) page.
+To run Gobblin in standalone daemon mode we need a Gobblin configuration file (such as uses [application.conf](https://github.com/apache/incubator-gobblin/blob/master/conf/standalone/application.conf)). And for each job we wish to run, we also need a job configuration file (such as [wikipedia.pull](https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull)). The Gobblin configuration file, which is passed to Gobblin as a command line argument, should contain a property `jobconf.dir` which specifies where the job configuration files are located. By default, `jobconf.dir` points to environment variable `GOBBLIN_JOB_CONFIG_DIR`. Each file in `jobconf.dir` with extension `.job` or `.pull` is considered a job configuration file, and Gobblin will launch a job for each such file. For more information on Gobblin deployment in standalone mode, refer to the [Standalone Deployment](user-guide/Gobblin-Deployment#Standalone-Deployment) page.

A list of commonly used configuration properties can be found here: [Configuration Properties Glossary](user-guide/Configuration-Properties-Glossary).
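
Reading aid for the `jobconf.dir` rule in this hunk: files ending in `.job` or `.pull` count as job configuration files. The sketch below lists such files for a given directory. `list_job_configs` is a hypothetical helper, not a Gobblin command, and it looks only at the top level of the directory; whether Gobblin also descends into subdirectories is not stated in this diff.

```shell
# Hypothetical helper, not a Gobblin command.
# Prints every regular file directly under $1 that ends in .job or .pull.
list_job_configs() {
  for f in "$1"/*.job "$1"/*.pull; do
    # An unmatched glob stays literal, so keep only paths that really exist.
    [ -f "$f" ] && echo "$f"
  done
  return 0
}
```

Running `list_job_configs "$GOBBLIN_JOB_CONFIG_DIR"` before `gobblin service standalone start` gives a quick check of which files would be treated as job configurations.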

@@ -107,7 +107,7 @@ A list of commonly used configuration properties can be found here: [Configurati
gobblin service standalone start
```

-The job log, which contains the progress and status of the job, will be written into `logs/<execution-mode>.out` & `logs/<execution-mode>.err` (to change where the log is written, modify the Log4j configuration file `conf/log4j.properties`).
+Stdout and the job log, which contains the progress and status of the job, will be written into `logs/<execution-mode>.out` & `logs/<execution-mode>.err` (to change where the log is written, modify the Log4j configuration file `conf/log4j.xml`).

Among the job logs there should be the following information:

