This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

[DCOS-39050] Added files for Hive Docker image #392

Open · wants to merge 16 commits into base: master

Conversation

@susanxhuynh (Contributor) commented Aug 14, 2018

This Hive Docker image is intended for use in Spark integration tests.

It's based on https://github.com/tilakpatidar/cdh5_hive_postgres (the one Evan found) with the following changes:

  • Removed docker-compose (not supported by Marathon)
  • Added Kerberos (see the kerberos/ directory)
    • Kerberized versions of the Hadoop and Hive XML config files
    • krb5.conf
    • Marathon config to run the container
  • Hive: changes in hive_pg/scripts/bootstrap.sh for Kerberos
  • You can also refer to my fork to see what I added.

Testing

  • I've tested by running a Spark Hive job against the kerberized image.
  • You can try the unkerberized image with: docker run -it susanxhuynh/cdh5-hive:latest /etc/hive-bootstrap.sh -bash
  • To run the kerberized image with a KDC, see the README.

@susanxhuynh (Contributor Author)

Tests are passing now. The failures yesterday seem to have been temporary connectivity problems at the sbt mirror site.

@samvantran (Contributor) left a comment

Had a few comments but overall it looks good. I'll try testing it myself tomorrow.

mkdir hive_pg/deps

#download cdh
echo "wget http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz | tar -xz -C /usr/local/"
Contributor

The echoed string ends with | tar -xz -C /usr/local/, but the actual shell command on L11 doesn't include it. Typo?
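If the pipe is indeed a typo, the actual download presumably looks more like this (a sketch; the version variables come from the surrounding script, and hive_pg/deps/ is the directory created above):

# sketch: just download the archive into hive_pg/deps/; the Dockerfile ADDs and extracts it later
wget -P hive_pg/deps/ "http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz"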

Contributor

We definitely don't want to extract to /usr/local.

ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
Contributor

These all point to the same directory... is it necessary to have all of them?
Also I don't think you use HADOOP_HDFS_HOME nor HADOOP_YARN_HOME in this file.

Contributor Author

I thought they might be used while the container is running, but let me check on that.
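One quick way to check whether the Hadoop runtime scripts actually read these variables (a sketch; the container name is a placeholder and the install path assumes the layout used in this image):

# sketch: grep the Hadoop helper scripts inside a running container for references to these vars
docker exec -it <hive-container> \
  grep -rl "HADOOP_HDFS_HOME\|HADOOP_YARN_HOME" /usr/local/hadoop/libexec /usr/local/hadoop/sbin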

ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
Contributor

Same comment as above: a few of these ENV vars are not used in this file, including this one.

fi

if [[ $1 == "-d" ]]; then
while true; do sleep 10000; done
Contributor

Clever :)

RUN apt-get install -y krb5-user

# run bootstrap script
CMD ["/etc/hive-bootstrap.sh", "-d"]
Contributor

Is this supposed to set up Hive, and then another process ssh's into / executes commands inside the container? Just trying to understand how all of these parts intersect.

Contributor Author

This command sets up a few directories and then starts the Hive servers. To create and query Hive tables, we would install Spark on the cluster and run a Spark job ... the job would point to the Hive container's IP address. Alternatively, you could go inside the container and start the Hive "beeline" program, in which you can also run Hive queries.
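For example, connecting with beeline from inside the container might look like this (a sketch, assuming HiveServer2 on its default port 10000; the container name is a placeholder):

# sketch: open a beeline session against HiveServer2 running inside the container
docker exec -it <hive-container> /usr/local/hive/bin/beeline -u jdbc:hive2://localhost:10000/default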

[
"hostname",
"IS",
"10.0.1.100"
Contributor

Is this guaranteed to be there?

<!-- property>
<name>hive.execution.engine</name>
<value>tez</value>
</property -->
Contributor

Not a big deal, but can we delete this if it's commented out?

HADOOP_HDFS_HOME=/usr/local/hadoop
HADOOP_MAPRED_HOME=/usr/local/hadoop
HADOOP_YARN_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
Contributor

Needs a newline?

HADOOP_COMMON_HOME=/usr/local/hadoop
HADOOP_HDFS_HOME=/usr/local/hadoop
HADOOP_MAPRED_HOME=/usr/local/hadoop
HADOOP_YARN_HOME=/usr/local/hadoop
Contributor

Are all of these supposed to point to the same dir?

@elezar (Contributor) previously requested changes on Aug 16, 2018 and left a comment

Thanks @susanxhuynh. Great work in showing that this is possible and that we don't need an external Cloudera cluster for testing our Hive interactions. I have made comments while reviewing this, but it could definitely be that I'm missing some context along the way, so please let me know if that is the case.

Some general comments:

  • Having 3 Docker images (4 if one counts Kerberos) seems a bit excessive.
  • The images enable SSH access as well as an Apache web server.
  • The config for Hadoop / Hive is spread through the various layers and external files.
  • I don't know if the spark-build repo is the best place for this. As it is, we have problems with our "testing" Docker images not being properly versioned, hosted in the Mesosphere org, or under version control.
  • The removal of docker-compose has complicated the building of the images and identifying which changes are really required by us -- e.g. Kerberos-related changes.

Ideally, I would like to see:

  • The number of Docker images reduced. Another search for "standalone hive" brings up other solutions such as https://github.com/jqcoffey/hive-standalone, which at first glance has the following points to note:
    • ADVANTAGE: It uses a single image based off an OS image
    • ADVANTAGE: It does not install postgres or sshd
    • DISADVANTAGE: It installs system packages
  • The Docker image moved to a separate repo so that its lifetime can be managed independently. For the Kafka and HDFS clients, we are looking at moving them to dcos-commons-ci, for example.

@@ -0,0 +1,21 @@
#!/usr/bin/env bash
Contributor

To tell you the truth, I'm not sure what advantage the download_deps.sh file really gives us. We could download the archives as part of the docker build process directly. We have in the past (e.g. with TensorFlow) had issues with certain hadoop archives no longer being present, but this is an issue with this script too.
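A minimal sketch of downloading during the build instead (assuming the version variables are passed in as build ARGs, and that extracting into /usr/local inside the image, as the existing ADD step does, is acceptable):

# sketch: fetch and extract the CDH Hadoop archive in a single build step
ARG CDH_VERSION
ARG CDH_EXACT_VERSION
ARG HADOOP_VERSION
RUN wget -qO- "http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz" \
    | tar -xz -C /usr/local/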

ADD ./deps/hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz /usr/local
RUN cd /usr/local && ln -s ./hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION} hadoop

RUN sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/lib/jvm/java-8-oracle\nexport HADOOP_PREFIX=/usr/local/hadoop\nexport HADOOP_HOME=/usr/local/hadoop\n:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
Contributor

This seems to be explicitly setting the contents of the /etc/hadoop/hadoop-env.sh file. Why is it not sufficient to rely on the environment variables in this case?

Contributor Author

That is strange; Hadoop seems to work without this step, with the exception of setting JAVA_HOME. I don't know why, but it seems that JAVA_HOME has to be set directly in this file: https://stackoverflow.com/questions/14325594/working-with-hadoop-localhost-error-java-home-is-not-set
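If JAVA_HOME is the only value that really has to live in the file, a narrower edit might be enough (a sketch, untested here; it leaves the other exports to the environment):

# sketch: only pin JAVA_HOME in hadoop-env.sh instead of rewriting the whole export block
RUN sed -i 's:^export JAVA_HOME=.*:export JAVA_HOME=/usr/lib/jvm/java-8-oracle:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh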

RUN sed -i '/^export HADOOP_CONF_DIR/ s:.*:export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh

# copy hadoop site xml files
RUN mkdir $HADOOP_PREFIX/input
Contributor

Is this just to preserve the contents of the original files?

Contributor Author

I guess, removing ...

RUN $HADOOP_PREFIX/bin/hdfs namenode -format

# fixing the libhadoop.so
RUN rm -rf /usr/local/hadoop/lib/native/*
Contributor

Why is it required to fix the native libraries?

Contributor Author

The native libraries are not part of the Cloudera distribution. OTOH, hadoop seems to work okay without the native libraries.

Contributor

I think the native libraries are only required when using Hadoop from languages such as C/C++.

@@ -0,0 +1,85 @@
<configuration>
<property>
Contributor

Similar comments as for the non-kerberized version. Switching to mustache should allow us to unify the templates a little.

@@ -0,0 +1,60 @@
FROM ubuntu:trusty
Contributor

This is a pretty old version of Ubuntu. Furthermore, all this image seems to do is add the SSH daemon (which I'm not sure we need) and set environment variables that are later overwritten.

ENV HIVE_HOME /usr/local/hive
ENV HADOOP_HOME /usr/local/hadoop

ENV PATH $PATH:$JAVA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME:$HADOOP_HOME/bin
Contributor

Here we are setting the path to folders that don't exist yet.

# install dev tools
RUN apt-get update
RUN apt-get install -y curl wget tar openssh-server openssh-client rsync python-software-properties apt-file apache2

Contributor

We're not cleaning up the cache.
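The usual pattern is to install and clean up in the same layer, roughly (a sketch using the package list from this Dockerfile):

# sketch: combine update, install, and apt cache cleanup so the cache never lands in a layer
RUN apt-get update && \
    apt-get install -y curl wget tar openssh-server openssh-client rsync python-software-properties apt-file apache2 && \
    rm -rf /var/lib/apt/lists/*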

RUN echo 'root:secretpasswd' | chpasswd
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN echo "ServerName localhost" >> /etc/apache2/apache2.conf
RUN sed -i 's/Listen 80/Listen 9999/g' /etc/apache2/ports.conf
Contributor

Is there a reason that we need an Apache webserver too?

Contributor Author

I don't think this is actually used. I will remove it.

@susanxhuynh (Contributor Author)

@elezar

@susanxhuynh (Contributor Author)

First pass at combining ubuntu + hadoop + hive into a single image (currently under the "single-image" directory). Still a WIP.

@susanxhuynh (Contributor Author)

@elezar @samvantran I think I have addressed most of your comments. The highlights are:

  • Removed about 2k LOC.
  • Based on Ubuntu 16.04.
  • Combined ubuntu + hadoop + hive into one image. There's a second image that adds Kerberos support.
  • Removed the apache2 web server, but kept sshd (used by Hadoop) and Postgres (for Hive Metastore "remote" mode).
  • Removed unnecessary properties from Hadoop / Hive xml config files and log4j files.
  • Removed unused HADOOP_ env vars.
  • No mustache, but there's a simple shell script generate_configs.sh that autogenerates the Kerberos config files.

@susanxhuynh (Contributor Author)

@samvantran @elezar Gentle ping. See comment above.

@samvantran (Contributor) left a comment

Looks good to me w/ a minor question.
Probably want Evan to 👍 this PR since his review was extensive.

*Also, you probably want to merge with master to get past the failing CI tests that look similar to the statsd jar errors.

@@ -0,0 +1,2 @@
<configuration>
</configuration>
Contributor

Is this file necessary? It's essentially empty.

Contributor Author

Strictly speaking, it's not necessary, but it serves as a placeholder for the "generate_configs.sh", so that the script can treat all config files (including yarn-site) equally.


# templating of config files
sed s/{{HOSTNAME}}/$HOSTNAME/ /usr/local/hadoop/etc/hadoop/core-site.xml.template > /usr/local/hadoop/etc/hadoop/core-site.xml
sed s/{{HOSTNAME}}/$HOSTNAME/ /usr/local/hadoop/etc/hadoop/yarn-site.xml.template > /usr/local/hadoop/etc/hadoop/yarn-site.xml
@samvantran (Contributor) commented Aug 24, 2018

Follow-up to my question below: it doesn't look like yarn-site.xml.template has a {{HOSTNAME}} to replace.

Contributor Author

I'll admit this one is a little confusing. What happens is that this script gets called in both the non-kerberized and kerberized images. The kerberized version of yarn-site.xml does have a {{HOSTNAME}} in it, and I wanted to avoid special processing to account for that.

@elezar (Contributor) left a comment

Thanks @susanxhuynh, this is looking great now.

I have made one or two comments, but none of them are blockers on getting this PR in. Depending on what your priorities are, I would say that we could merge this as is and create a follow-up ticket to get any improvements made.

The one thing that I think should be addressed is the fact that there is no -d option supported in the hadoop-bootstrap.sh script.

@@ -0,0 +1,149 @@
FROM ubuntu:16.04
Contributor

We can handle this in a follow-up, but should we consider using the 18.04 LTS image?

Contributor

I remember an email thread not long ago asking about DCOS on 18.04, and it seemed like Mesos still had to sort out some issues.

Let's hold off on this for now.

EXPOSE 22

# oracle jdk 8
RUN apt-get update && \
Contributor

We could also pull in the java archive that we use in all our applications, but this isn't a blocker.

rm -rf /var/lib/apt/lists/*

# java env setup
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
Contributor

👍 deleted the first instance

ENV PATH $PATH:$HIVE_HOME/bin

# add postgresql jdbc jar to classpath
RUN ln -s /usr/share/java/postgresql-jdbc4.jar $HIVE_HOME/lib/postgresql-jdbc4.jar
Contributor

Should this not be moved to AFTER the postgres install below?

Contributor

looks like it was copied from the parent project: https://github.com/tilakpatidar/cdh5_hive_postgres/blob/master/hive_pg/Dockerfile#L31

If I move it, docker build runs fine but I'd have to test it against the hive integration PR to know everything works

USER postgres
# initialize hive metastore db
# create metastore db, hive user and assign privileges
RUN cd $HIVE_HOME/scripts/metastore/upgrade/postgres/ &&\
Contributor

Nit &&\ => && \

Contributor

fixed

printenv | cat >> /root/.bashrc

# hadoop bootstrap
/etc/hadoop-bootstrap.sh -d
Contributor

There is no -d flag in the hadoop-bootstrap.sh script above. Is this intentional?

Contributor

removed -d

/bin/bash
fi

if [[ $1 == "-d" ]]; then
Contributor

We could use an elif here instead followed by an else block that prints something if the argument is unknown.
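Roughly the structure being suggested (a sketch based on the existing -bash and -d branches):

# sketch: handle the bootstrap argument in one if/elif/else chain
if [[ $1 == "-bash" ]]; then
    /bin/bash
elif [[ $1 == "-d" ]]; then
    while true; do sleep 10000; done
else
    echo "Unknown argument: $1" >&2
fi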

Contributor

fixed

# start hive metastore server
$HIVE_HOME/bin/hive --service metastore &

sleep 20
Contributor

Is there a better way to check for readiness?
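One option (a sketch, assuming the metastore listens on its default port 9083 and that nc is available in the image) would be to poll the port instead of sleeping a fixed 20 seconds:

# sketch: wait until the metastore port accepts connections rather than sleeping blindly
until nc -z localhost 9083; do
    echo "waiting for the Hive metastore..."
    sleep 2
done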

Contributor

Not sure. I'm not familiar enough w/ Hive.


[realms]
LOCAL = {
kdc = kdc.marathon.mesos:2500
Contributor

In our other applications, we use a different endpoint here. Should we make this configurable too?
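If we did make it configurable, it could reuse the same placeholder style as the other templates (a sketch; the placeholder names are hypothetical):

[realms]
  LOCAL = {
    # sketch: hypothetical placeholders, substituted by generate_configs.sh the same way {{HOSTNAME}} is elsewhere
    kdc = {{KDC_HOST}}:{{KDC_PORT}}
  }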

Contributor

Hm, this is just for testing. Not sure we need to make this configurable.

Contributor

My concern is that it is different to how we handle KDC in all our other applications. If we need to move forward with this though, I'm not going to block the PR, but we should consider creating a follow-up ticket to unify this.

@samvantran (Contributor) commented Sep 19, 2018

included in DCOS-42219

cd "$( dirname "${BASH_SOURCE[0]}" )"
for FILE_BASE in core-site hdfs-site hive-site yarn-site; do
COMBINED_FILE="../templates/${FILE_BASE}.xml.template"
echo "Generating config file: kerberos/templates/${FILE_BASE}.xml.template"
Contributor

Not a blocker, but if we were to use Python for this, we could combine the XML as a structured document?
(see for example: https://github.com/mesosphere/dcos-commons/blob/master/frameworks/hdfs/tests/test_overlay.py#L65)

@elezar dismissed their stale review August 29, 2018 14:27

Changes addressed. No blockers currently.

@elezar (Contributor) left a comment

The one thing that I think should be addressed is the fact that there is no -d option supported in the hadoop-bootstrap.sh script.

@samvantran (Contributor) left a comment

@elezar, please take another look at this PR. Among other fixes, I removed the -d flag which seemed most important to address.

EXPOSE 50010 50020 50070 50075 50090 8020 9000 10020 19888 8030 8031 8032 8033 8040 8042 8088

# download cdh hive
RUN curl -L http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/hive-1.1.0-cdh${CDH_EXACT_VERSION}.tar.gz \
Contributor

fixed


# disable ssl in postgres.conf
ADD conf/postgresql.conf $POSTGRESQL_MAIN
RUN echo $POSTGRESQL_MAIN
Contributor

removed

$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/start-yarn.sh

if [[ $1 == "-bash" ]]; then
Contributor

I don't think so. This is a script that gets called from hive-bootstrap.sh so it'll just continue on afterward

@elezar (Contributor) left a comment

Thanks @samvantran.

This is definitely something that we can iterate on.

Some final thoughts:


[realms]
LOCAL = {
kdc = kdc.marathon.mesos:2500
Contributor

My concern is that it is different to how we handle KDC in all our other applications. If we need to move forward with this though, I'm not going to block the PR, but we should consider creating a follow-up ticket to unify this.

@@ -0,0 +1,39 @@
{
Contributor

In other tests we generate these kinds of application definitions on the fly in python -- templating where applicable. Not a blocker, but we could create a follow-up ticket.

Contributor

created https://jira.mesosphere.com/browse/DCOS-42219 to address this and other non-blocking-but-should-still-do comments

<!-- NameNode security config -->
<property>
<name>dfs.namenode.keytab.file</name>
<value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value> <!-- path to the HDFS keytab -->
Contributor

This is also quite different to how we currently deploy kerberized HDFS. I know there isn't necessarily much overlap, but it would be good not to have to context-switch too much when debugging issues with hive / hdfs.

<configuration>
<!-- Authentication -->
<property>
<name>hive.server2.authentication</name>
Contributor

Is server2 a predefined property of some kind?

@samvantran (Contributor) left a comment

Cleaned up the postgres conf file and added back an env var I mistakenly deleted. I created ticket https://jira.mesosphere.com/browse/DCOS-42219 to address follow-ups.

Also, I tried the Jenkins job you mentioned but was unsuccessful in publishing a Docker image.

@@ -0,0 +1,630 @@
# -----------------------------
Contributor

cleaned up in fed868e
