Skip to content
This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

Add HA configuration #23

Closed
wants to merge 3 commits into from
Closed

Conversation

skonto
Copy link
Contributor

@skonto skonto commented May 10, 2017

Fixes #7, relates to d2iq-archive/universe#1163

The mesosphere image needs to be built and pushed. Don't have access.
Assumes hdfs is there for the default case (any proper setup should have HA enabled by default) and the storage dir (hdfs://hdfs/flink/recovery) to have been created.
In order to create the dir you need to download the appropriate hdfs distro within your dcos cluster and
execute:
dcos hdfs endpoints core-site.xml
dcos hdfs endpoints hdfs-site.xml

to download the required files, to override the defaults within the distro so that finally you can issue commands like:
hadoop fs -mkdir -p ...

In order to test it after you build the image under your own account, launch flink with HA, use the following command within a docker container of the flink docker image:

./bin/flink run -z /default ./examples/batch/WordCount.jar --input /etc/resolv.conf

Also you need to add to the flink-conf.yaml under the /flink-1.2.0 folder within the container the following settings:
high-availability: zookeeper
high-availability.zookeeper.quorum: master.mesos:2181
high-availability.zookeeper.storageDir: hdfs://hdfs/flink/recovery

@skonto skonto changed the title Add ha configuration Add HA configuration May 10, 2017
add_if_non_empty high-availability.job.delay "${FLINK_HIGH_AVAILABILITY_JOB_DELAY}"
add_if_non_empty high-availability.zookeeper.path.root "${FLINK_HIGH_AVAILABILITY_ZOOKEEPER_PATH_ROOT}"
# used for the client at ha services initiated by the mesos app master runner
add_if_non_empty ghigh-availability.zookeeper.client.session-timeout \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(typo) ghigh-availability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix thnx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -73,13 +73,37 @@ function add_kerberos_configurations() {
fi
}

function add_ha_configurations() {
if [[ "${FLINK_HA_ENABLED}" == true ]]; then
export FLINK_JAVA_OPTS="$FLINK_JAVA_OPTS -Dhigh-availability=ZOOKEEPER"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you need to set high-availability via FLINK_JAVA_OPTS as opposed to the config file?

Copy link
Contributor Author

@skonto skonto Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow the pattern there. This pre-existed just used previous work, didnt refactor it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at the add_if_non_empty function, it is basically a wrapper for appending to FLINK_JAVA_OPTS. For the sake of uniformity, please use it to set the high-availability flag.

I have a minor concern about the sheer number of settings being configured as dynamic properties, as opposed to being configured in flink-conf.yaml. Perhaps some of these HA settings can be set statically.

Copy link
Contributor Author

@skonto skonto Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure for the first option I missed that. By the way it wont work for every property, there is one that causes problems:
2017-08-02 00:00:20,277 INFO org.apache.flink.mesos.runtime.clusterframework.MesosApplicationMasterRunner - -Dhigh-availability.job.delay="10
2017-08-02 00:00:20,277 INFO org.apache.flink.mesos.runtime.clusterframework.MesosApplicationMasterRunner - s"

The property Dhigh-availability.job.delay="10 s" is not preserved when passed due to spaces.
In on order to fix this you can either pass each arg explicitly with quotes or if you want to keep a global variable you need to use eval. The first option is not handy as we have many variables and the second one is not a good choice since it opens security problems if we don't sanitize every input. It gets ugly so I will try put HA options in the file.

@skonto skonto force-pushed the ha_configuration branch from 1dc35aa to 2c0c70f Compare August 1, 2017 14:38
@skonto
Copy link
Contributor Author

skonto commented Aug 1, 2017

@EronWright I updated the PR. Will test and report back.

@skonto skonto force-pushed the ha_configuration branch from 2c0c70f to ed7bb4f Compare August 2, 2017 12:05
@skonto
Copy link
Contributor Author

skonto commented Aug 2, 2017

@EronWright fixed. Here is the guide how to run an example with the default settings:
https://gist.github.com/skonto/c8404d43142070dca860dda431459887
I will add an example to the example repos.

@skonto
Copy link
Contributor Author

skonto commented Aug 2, 2017

dcos/examples#196

@joerg84
Copy link
Contributor

joerg84 commented Aug 10, 2017

@EronWright is this good to go? I would like include it in the Flink 1.3.2 release of the package.

Copy link

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. +1 for merging.

@skonto
Copy link
Contributor Author

skonto commented Aug 20, 2017

@EronWright @joerg84 how do you want me to proceed?

@skonto
Copy link
Contributor Author

skonto commented Aug 31, 2017

@EronWright @joerg84 gentle ping...

@mqasimsarfraz
Copy link

Are we planning to merge that? this will be useful for our team as well?

@skonto
Copy link
Contributor Author

skonto commented Oct 20, 2017

@joerg84 gentle ping.

@skonto
Copy link
Contributor Author

skonto commented Feb 8, 2018

@joerg84 gentle ping...

@skonto
Copy link
Contributor Author

skonto commented Jun 9, 2019

closing this.

@skonto skonto closed this Jun 9, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make use of high-availability mode
5 participants