[DCOS-39050] Added files for Hive Docker image #392
base: master
Conversation
Tests are passing now. The failures yesterday seem to have been temporary connectivity problems at the sbt mirror site.
Had a few comments but overall looks good. I'll try testing it myself tomorrow
tools/hive/download_deps.sh
Outdated
mkdir hive_pg/deps

#download cdh
echo "wget http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz | tar -xz -C /usr/local/"
The end includes | tar -xz -C /usr/local/ but not in the actual shell command on L11. Typo?
We definitely don't want to extract to /usr/local.
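As an aside on the piped form being discussed: a streaming download-and-extract only works if wget is told to write to stdout with `-O -`; otherwise it saves to a file and the pipe receives nothing. A minimal sketch (the `demo/` paths are illustrative, and the real download would target a deps directory rather than /usr/local):

```shell
# wget writes to a file by default; `-O -` sends the archive to stdout,
# so the pipe to tar actually receives data:
#   wget -qO- "$URL" | tar -xz -C ./deps
# The same streaming pattern, shown with a local archive instead of a download:
mkdir -p demo/src demo/out
echo "hello" > demo/src/file.txt
tar -cz -C demo/src . > demo/archive.tar.gz
cat demo/archive.tar.gz | tar -xz -C demo/out
cat demo/out/file.txt
```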
tools/hive/hadoop-2.6.0/Dockerfile
Outdated
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
These all point to the same directory... is it necessary to have all of them? Also, I don't think you use HADOOP_HDFS_HOME nor HADOOP_YARN_HOME in this file.
I thought they might be used while the container is running, but let me check on that.
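One quick way to check is to grep for each variable before deleting its ENV line. A hypothetical helper (the `check_vars` name and `sample.sh` file are illustrative, not part of the PR):

```shell
# List which HADOOP_*_HOME variables a given file references.
check_vars() {
  local file=$1 v
  for v in HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_MAPRED_HOME HADOOP_YARN_HOME; do
    if grep -q "$v" "$file"; then
      echo "$v: used"
    else
      echo "$v: unused"
    fi
  done
}

# Example against a throwaway file that mentions only two of them:
printf 'export HADOOP_COMMON_HOME=/usr/local/hadoop\nexport HADOOP_MAPRED_HOME=/usr/local/hadoop\n' > sample.sh
check_vars sample.sh
```

This only catches static references, of course; variables read at container runtime (the open question in this thread) would not show up.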
tools/hive/hive_pg/Dockerfile
Outdated
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
Same comment as above; a few of these ENV vars are not used in this file, including this one.
fi

if [[ $1 == "-d" ]]; then
    while true; do sleep 10000; done
Clever :)
tools/hive/kerberos/Dockerfile
Outdated
RUN apt-get install -y krb5-user

# run bootstrap script
CMD ["/etc/hive-bootstrap.sh", "-d"]
Is this supposed to set up Hive, with another process then ssh'ing in / executing commands inside the container? Just trying to understand how all of these parts intersect.
This command sets up a few directories and then starts the Hive servers. To create and query Hive tables, we would install Spark on the cluster and run a Spark job ... the job would point to the Hive container's IP address. Alternatively, you could go inside the container and start the Hive "beeline" program, in which you could also perform Hive queries.
[
"hostname",
"IS",
"10.0.1.100"
Is this guaranteed to be there?
<!-- property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property -->
Not a big deal, but can we delete this if it's commented out?
tools/hive/ubuntu/base.env
Outdated
HADOOP_HDFS_HOME=/usr/local/hadoop
HADOOP_MAPRED_HOME=/usr/local/hadoop
HADOOP_YARN_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
Needs a newline?
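For context on why a missing trailing newline can matter: some line-by-line readers silently drop an unterminated final line (whether docker's --env-file parsing is affected is not claimed here). A small demonstration of the behavior with a stand-in base.env:

```shell
# A file whose last line has no trailing newline:
printf 'HADOOP_COMMON_HOME=/usr/local/hadoop\nHADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop' > base.env

# A naive while-read loop misses the unterminated final line...
count=0
while read -r line; do count=$((count + 1)); done < base.env
echo "lines read without the guard: $count"

# ...unless the loop also checks for a non-empty remainder:
count=0
while read -r line || [ -n "$line" ]; do count=$((count + 1)); done < base.env
echo "lines read with the guard: $count"
```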
tools/hive/ubuntu/base.env
Outdated
HADOOP_COMMON_HOME=/usr/local/hadoop
HADOOP_HDFS_HOME=/usr/local/hadoop
HADOOP_MAPRED_HOME=/usr/local/hadoop
HADOOP_YARN_HOME=/usr/local/hadoop
Are all of these supposed to point to the same dir?
Thanks @susanxhuynh. Great work in showing that this is possible and that we don't need an external Cloudera cluster for testing our Hive interactions. I have made comments while reviewing this, but it could definitely be that I'm missing some context along the way, so please let me know if that is the case.
Some general comments:
- These 3 Docker images (4 if one counts Kerberos) seem a bit excessive.
- The images enable SSH access as well as an Apache web server.
- The config for Hadoop / Hive is spread through the various layers and external files.
- I don't know if the spark-build repo is the best place for this. As it is, we have problems with our "testing" docker images not being properly versioned in the Mesosphere org, or under version control.
- The removal of docker-compose has complicated the building of the images and identifying which changes are really required by us -- e.g. Kerberos-related changes.
Ideally, I would like to see:
- The number of Docker images reduced. Another search for "standalone hive" brings up other solutions such as https://github.com/jqcoffey/hive-standalone, which at first glance has the following points to note:
  ** ADVANTAGE: It uses a single image based off an OS image
  ** ADVANTAGE: It does not install postgres or sshd
  ** DISADVANTAGE: It installs system packages
- The docker image moved to a separate repo so that its lifetime can be managed independently. For the Kafka and HDFS clients, we are looking at moving them to dcos-commons-ci, for example.
tools/hive/download_deps.sh
Outdated
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
To tell you the truth, I'm not sure what advantage the download_deps.sh file really gives us. We could download the archives as part of the docker build process directly. We have in the past (e.g. with TensorFlow) had issues with certain hadoop archives no longer being present, but this is an issue with this script too.
tools/hive/hadoop-2.6.0/Dockerfile
Outdated
ADD ./deps/hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz /usr/local
RUN cd /usr/local && ln -s ./hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION} hadoop

RUN sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/lib/jvm/java-8-oracle\nexport HADOOP_PREFIX=/usr/local/hadoop\nexport HADOOP_HOME=/usr/local/hadoop\n:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
This seems to be explicitly setting the contents of the /etc/hadoop/hadoop-env.sh file. Why is it not sufficient to rely on the environment variables in this case?
That is strange; Hadoop seems to work without this step, with the exception of setting JAVA_HOME. I don't know why, but it seems that JAVA_HOME has to be set directly in this file: https://stackoverflow.com/questions/14325594/working-with-hadoop-localhost-error-java-home-is-not-set
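The substitution itself can be exercised in isolation; a minimal sketch against a stand-in file (the real target is $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh, and the Dockerfile version also injects the HADOOP_PREFIX/HADOOP_HOME exports):

```shell
# Stand-in hadoop-env.sh with the line the sed command rewrites:
printf 'export JAVA_HOME=${JAVA_HOME}\nexport OTHER=untouched\n' > hadoop-env.sh

# Same GNU sed pattern: match the `export JAVA_HOME` line and replace it wholesale.
sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/lib/jvm/java-8-oracle:' hadoop-env.sh
cat hadoop-env.sh
```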
tools/hive/hadoop-2.6.0/Dockerfile
Outdated
RUN sed -i '/^export HADOOP_CONF_DIR/ s:.*:export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh

# copy hadoop site xml files
RUN mkdir $HADOOP_PREFIX/input
Is this just to preserve the contents of the original files?
I guess, removing ...
tools/hive/hadoop-2.6.0/Dockerfile
Outdated
RUN $HADOOP_PREFIX/bin/hdfs namenode -format

# fixing the libhadoop.so
RUN rm -rf /usr/local/hadoop/lib/native/*
Why is it required to fix the native libraries?
The native libraries are not part of the Cloudera distribution. OTOH, hadoop seems to work okay without the native libraries.
I think the native libraries are only required when using Hadoop from languages such as C/C++.
@@ -0,0 +1,85 @@
<configuration>
<property>
Similar comments as for the non-kerberized version. Switching to mustache should allow us to unify the templates a little.
tools/hive/ubuntu/Dockerfile
Outdated
@@ -0,0 +1,60 @@
FROM ubuntu:trusty
This is a pretty old version of Ubuntu. Furthermore, all this image seems to do is add the SSH daemon (which I'm not sure we need) and set environment variables that are later overwritten.
tools/hive/ubuntu/Dockerfile
Outdated
ENV HIVE_HOME /usr/local/hive
ENV HADOOP_HOME /usr/local/hadoop

ENV PATH $PATH:$JAVA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME:$HADOOP_HOME/bin
Here we are setting the path to folders that don't exist yet.
tools/hive/ubuntu/Dockerfile
Outdated
# install dev tools
RUN apt-get update
RUN apt-get install -y curl wget tar openssh-server openssh-client rsync python-software-properties apt-file apache2
We're not cleaning up the cache.
tools/hive/ubuntu/Dockerfile
Outdated
RUN echo 'root:secretpasswd' | chpasswd
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN echo "ServerName localhost" >> /etc/apache2/apache2.conf
RUN sed -i 's/Listen 80/Listen 9999/g' /etc/apache2/ports.conf
Is there a reason that we need an Apache webserver too?
I don't think this is actually used. I will remove it.
…ems to be a prerequisite for installing the software-properties-common package. Removed ubuntu bootstrap script.
First pass at combining ubuntu + hadoop + hive into a single image (currently under the "single-image" directory). Still a WIP.
…v vars, (3) added "{{}}" to templated variable
@elezar @samvantran I think I have addressed most of your comments. The highlights are:
@samvantran @elezar Gentle ping. See comment above.
Looks good to me w/ a minor question
Probably want Evan to 👍 this PR since his review was extensive
*also probably want to merge with master to get over the failing CI tests that look similar to the statsd jar errors
@@ -0,0 +1,2 @@
<configuration>
</configuration>
Is this file necessary? It's essentially empty.
Strictly speaking, it's not necessary, but it serves as a placeholder for the "generate_configs.sh" script, so that the script can treat all config files (including yarn-site) equally.
# templating of config files
sed s/{{HOSTNAME}}/$HOSTNAME/ /usr/local/hadoop/etc/hadoop/core-site.xml.template > /usr/local/hadoop/etc/hadoop/core-site.xml
sed s/{{HOSTNAME}}/$HOSTNAME/ /usr/local/hadoop/etc/hadoop/yarn-site.xml.template > /usr/local/hadoop/etc/hadoop/yarn-site.xml
Follow up to my question below: it doesn't look like yarn-site.xml.template has a {{HOSTNAME}} to replace.
I'll admit this one is a little confusing. What happens is this script gets called in both the non-kerberized and kerberized images. And the kerberized version of yarn-site.xml does have a {{HOSTNAME}} in it, and I wanted to avoid special processing to account for that.
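The templating step can be tried on its own; a small sketch with stand-in file names (the property content below is illustrative, not copied from the PR's templates). A template with no {{HOSTNAME}} token simply passes through sed unchanged, which is why running the substitution over every config file is harmless:

```shell
HOSTNAME=hive-test-host

# Stand-in template containing the {{HOSTNAME}} token:
printf '<configuration><property><name>fs.defaultFS</name><value>hdfs://{{HOSTNAME}}:9000</value></property></configuration>\n' > core-site.xml.template

# Same substitution as the bootstrap script performs:
sed "s/{{HOSTNAME}}/$HOSTNAME/" core-site.xml.template > core-site.xml
cat core-site.xml
```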
Thanks @susanxhuynh, this is looking great now.
I have made one or two comments, but none of them are blockers on getting this PR in. Depending on what your priorities are, I would say that we could merge this as is and create a follow-up ticket to get any improvements made.
The one thing that I think should be addressed is the fact that there is no -d option supported in the hadoop-bootstrap.sh script.
@@ -0,0 +1,149 @@
FROM ubuntu:16.04
We can handle this in a follow-up, but should we consider using the 18.04 LTS image?
I remember this email thread not long ago asking about DCOS on 18.04 and it seemed like Mesos still had to sort out some issues.
Let's hold off on this for now.
EXPOSE 22

# oracle jdk 8
RUN apt-get update && \
We could also pull in the java archive that we use in all our applications, but this isn't a blocker.
rm -rf /var/lib/apt/lists/*

# java env setup
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
This is set on https://github.com/mesosphere/spark-build/pull/392/files#diff-0aa25f6cadbb637eae9df102b049a59dR5 as well. Rather just set it in one place.
👍 deleted the first instance
tools/hive/hadoop-hive/Dockerfile
Outdated
ENV PATH $PATH:$HIVE_HOME/bin

# add postgresql jdbc jar to classpath
RUN ln -s /usr/share/java/postgresql-jdbc4.jar $HIVE_HOME/lib/postgresql-jdbc4.jar
Should this not be moved to AFTER the postgres install below?
Looks like it was copied from the parent project: https://github.com/tilakpatidar/cdh5_hive_postgres/blob/master/hive_pg/Dockerfile#L31. If I move it, docker build runs fine but I'd have to test it against the hive integration PR to know everything works.
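One reason the ordering is easy to miss: ln -s succeeds even when its target does not exist yet, so the build passes either way and the link simply dangles until the jar is installed. A quick demonstration (the `linkdemo` directory and jar name are illustrative):

```shell
mkdir -p linkdemo && cd linkdemo
ln -s ./not-yet-installed.jar link.jar      # succeeds despite the missing target
[ -L link.jar ] && echo "symlink created"
[ -e link.jar ] || echo "but its target is missing"

# Once the target appears (e.g. after the postgres package install),
# the existing link resolves without being recreated:
touch not-yet-installed.jar
[ -e link.jar ] && echo "target now resolves"
cd ..
```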
tools/hive/hadoop-hive/Dockerfile
Outdated
USER postgres
# initialize hive metastore db
# create metastore db, hive user and assign privileges
RUN cd $HIVE_HOME/scripts/metastore/upgrade/postgres/ &&\
Nit: &&\ => && \
fixed
printenv | cat >> /root/.bashrc

# hadoop bootstrap
/etc/hadoop-bootstrap.sh -d
There is no -d flag in the hadoop-bootstrap.sh script above. Is this intentional?
removed -d
/bin/bash
fi

if [[ $1 == "-d" ]]; then
We could use an elif here instead, followed by an else block that prints something if the argument is unknown.
fixed
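The suggested shape, sketched as a function so it can be tried outside the container (the echoed messages stand in for the real `/bin/bash` exec and sleep loop, which would block here):

```shell
handle_arg() {
  if [[ $1 == "-bash" ]]; then
    echo "would exec /bin/bash"
  elif [[ $1 == "-d" ]]; then
    echo "would block: while true; do sleep 10000; done"
  else
    echo "Unknown argument: $1" >&2
    return 1
  fi
}

handle_arg -d
handle_arg -bogus || echo "rejected"
```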
# start hive metastore server
$HIVE_HOME/bin/hive --service metastore &

sleep 20
Is there a better way to check for readiness?
Not sure. I'm not familiar enough w/ Hive.
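One hedged alternative to a fixed sleep: poll until a TCP port accepts connections, with a timeout (the Hive metastore's default port is 9083, assuming the default config; the polling uses bash's /dev/tcp redirection, so this requires bash):

```shell
# Return 0 once host:port accepts a TCP connection, 1 after `tries` seconds.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30} i
  for i in $(seq "$tries"); do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# e.g. instead of `sleep 20`:
#   wait_for_port localhost 9083 30 || echo "metastore not up" >&2
```

A port accepting connections is still only an approximation of "ready to serve queries", but it at least adapts to slow starts instead of racing a fixed timer.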
[realms]
LOCAL = {
    kdc = kdc.marathon.mesos:2500
In our other applications, we use a different endpoint here. Should we make this configurable too?
Hm, this is just for testing. Not sure we need to make this configurable.
My concern is that it is different to how we handle KDC in all our other applications. If we need to move forward with this though, I'm not going to block the PR, but we should consider creating a follow-up ticket to unify this.
included in DCOS-42219
cd "$( dirname "${BASH_SOURCE[0]}" )"
for FILE_BASE in core-site hdfs-site hive-site yarn-site; do
    COMBINED_FILE="../templates/${FILE_BASE}.xml.template"
    echo "Generating config file: kerberos/templates/${FILE_BASE}.xml.template"
Not a blocker, but if we were to use Python for this, we could combine the XML as a structured document?
(see for example: https://github.com/mesosphere/dcos-commons/blob/master/frameworks/hdfs/tests/test_overlay.py#L65)
The one thing that I think should be addressed is the fact that there is no -d option supported in the hadoop-bootstrap.sh script.
@elezar, please take another look at this PR. Among other fixes, I removed the -d flag, which seemed most important to address.
tools/hive/hadoop-hive/Dockerfile
Outdated
EXPOSE 50010 50020 50070 50075 50090 8020 9000 10020 19888 8030 8031 8032 8033 8040 8042 8088

# download cdh hive
RUN curl -L http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/hive-1.1.0-cdh${CDH_EXACT_VERSION}.tar.gz \
fixed
tools/hive/hadoop-hive/Dockerfile
Outdated
# disable ssl in postgres.conf
ADD conf/postgresql.conf $POSTGRESQL_MAIN
RUN echo $POSTGRESQL_MAIN
removed
$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/start-yarn.sh

if [[ $1 == "-bash" ]]; then
I don't think so. This is a script that gets called from hive-bootstrap.sh, so it'll just continue on afterward.
Thanks @samvantran.
This is definitely something that we can iterate on.
Some final thoughts:
- There is a bit of a disconnect with how we handle Kerberos configuration for other services. I know that we're treating this as a one-off, but getting a bit more uniformity could be useful. Definitely out of scope for this PR though.
- We're depending on @susanxhuynh's private docker repository here. This means that we'll have one more docker image to migrate to the mesosphere repo. We should probably consider doing it now. For what it's worth -- we can use https://jenkins.mesosphere.com/service/jenkins/view/Infinity/job/infinity-tools/job/release-tools/job/build-docker-image/ to build arbitrary docker images.
@@ -0,0 +1,39 @@
{
In other tests we generate these kinds of application definitions on the fly in python -- templating where applicable. Not a blocker, but we could create a follow-up ticket.
created https://jira.mesosphere.com/browse/DCOS-42219 to address this and other non-blocking-but-should-still-do comments
<!-- NameNode security config -->
<property>
    <name>dfs.namenode.keytab.file</name>
    <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value> <!-- path to the HDFS keytab -->
This is also quite different to how we currently deploy kerberized HDFS. I know there isn't necessarily too much overlap, but it would be good to not have to context switch too much when debugging issues with hive / hdfs.
<configuration>
<!-- Authentication -->
<property>
    <name>hive.server2.authentication</name>
Is server2 a predefined property of some kind?
Yes, it's the improved version of hiveserver: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2
Cleaned up the postgres conf file and added back an envvar I mistakenly deleted. I created ticket https://jira.mesosphere.com/browse/DCOS-42219 to address followups
Also I tried the jenkins job you mentioned but was unsuccessful in publishing a docker image
@@ -0,0 +1,630 @@
# -----------------------------
cleaned up in fed868e
This Hive Docker image is intended for use in Spark integration tests.
It's based on https://github.com/tilakpatidar/cdh5_hive_postgres (the one Evan found) with the following changes:
- Kerberos support (kerberos/ directory)
- Changes to hive_pg/scripts/bootstrap.sh for Kerberos

Testing:
docker run -it susanxhuynh/cdh5-hive:latest /etc/hive-bootstrap.sh -bash