[DCOS-39050] Added files for Hive Docker image #392
base: master
Conversation
Tests are passing now. The failures yesterday seem to have been temporary connectivity problems at the sbt mirror site.
Had a few comments but overall looks good. I'll try testing it myself tomorrow
tools/hive/download_deps.sh
Outdated
mkdir hive_pg/deps

#download cdh
echo "wget http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz | tar -xz -C /usr/local/"
The end includes | tar -xz -C /usr/local/ but not in the actual shell command on L11. Typo?
We definitely don't want to extract to /usr/local.
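As an aside on the piped form being discussed: a streaming download-and-extract only works if wget is told to write to stdout with `-O -`; otherwise it saves to a file and the pipe receives nothing. A minimal sketch (the `demo/` paths are illustrative, and the real download would target a deps directory rather than /usr/local):

```shell
# wget writes to a file by default; `-O -` sends the archive to stdout,
# so the pipe to tar actually receives data:
#   wget -qO- "$URL" | tar -xz -C ./deps
# The same streaming pattern, shown with a local archive instead of a download:
mkdir -p demo/src demo/out
echo "hello" > demo/src/file.txt
tar -cz -C demo/src . > demo/archive.tar.gz
cat demo/archive.tar.gz | tar -xz -C demo/out
cat demo/out/file.txt
```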
tools/hive/hadoop-2.6.0/Dockerfile
Outdated
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
These all point to the same directory... is it necessary to have all of them? Also, I don't think you use HADOOP_HDFS_HOME nor HADOOP_YARN_HOME in this file.
I thought they might be used while the container is running, but let me check on that.
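One quick way to check is to grep for each variable before deleting its ENV line. A hypothetical helper (the `check_vars` name and `sample.sh` file are illustrative, not part of the PR):

```shell
# List which HADOOP_*_HOME variables a given file references.
check_vars() {
  local file=$1 v
  for v in HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_MAPRED_HOME HADOOP_YARN_HOME; do
    if grep -q "$v" "$file"; then
      echo "$v: used"
    else
      echo "$v: unused"
    fi
  done
}

# Example against a throwaway file that mentions only two of them:
printf 'export HADOOP_COMMON_HOME=/usr/local/hadoop\nexport HADOOP_MAPRED_HOME=/usr/local/hadoop\n' > sample.sh
check_vars sample.sh
```

This only catches static references, of course; variables read at container runtime (the open question in this thread) would not show up.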
tools/hive/hive_pg/Dockerfile
Outdated
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
Same comment as above; a few of these ENV vars are not used in this file, including this one.
fi

if [[ $1 == "-d" ]]; then
    while true; do sleep 10000; done
Clever :)
tools/hive/kerberos/Dockerfile
Outdated
RUN apt-get install -y krb5-user

# run bootstrap script
CMD ["/etc/hive-bootstrap.sh", "-d"]
Is this supposed to set up Hive, with another process then ssh'ing in / executing commands inside the container? Just trying to understand how all of these parts intersect.
This command sets up a few directories and then starts the Hive servers. To create and query Hive tables, we would install Spark on the cluster and run a Spark job ... the job would point to the Hive container's IP address. Alternatively, you could go inside the container and start the Hive "beeline" program, in which you could also perform Hive queries.
[
"hostname",
"IS",
"10.0.1.100"
Is this guaranteed to be there?
<!-- property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property -->
Not a big deal, but can we delete this if it's commented out?
tools/hive/ubuntu/base.env
Outdated
HADOOP_HDFS_HOME=/usr/local/hadoop
HADOOP_MAPRED_HOME=/usr/local/hadoop
HADOOP_YARN_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
Needs a newline?
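For context on why a missing trailing newline can matter: some line-by-line readers silently drop an unterminated final line (whether docker's --env-file parsing is affected is not claimed here). A small demonstration of the behavior with a stand-in base.env:

```shell
# A file whose last line has no trailing newline:
printf 'HADOOP_COMMON_HOME=/usr/local/hadoop\nHADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop' > base.env

# A naive while-read loop misses the unterminated final line...
count=0
while read -r line; do count=$((count + 1)); done < base.env
echo "lines read without the guard: $count"

# ...unless the loop also checks for a non-empty remainder:
count=0
while read -r line || [ -n "$line" ]; do count=$((count + 1)); done < base.env
echo "lines read with the guard: $count"
```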
tools/hive/ubuntu/base.env
Outdated
HADOOP_COMMON_HOME=/usr/local/hadoop
HADOOP_HDFS_HOME=/usr/local/hadoop
HADOOP_MAPRED_HOME=/usr/local/hadoop
HADOOP_YARN_HOME=/usr/local/hadoop
Are all of these supposed to point to the same dir?
Thanks @susanxhuynh. Great work in showing that this is possible and that we don't need an external Cloudera cluster for testing our Hive interactions. I have made comments while reviewing this, but it could definitely be that I'm missing some context along the way, so please let me know if that is the case.
Some general comments:
- These 3 Docker images (4 if one counts Kerberos) seem a bit excessive.
- The images enable SSH access as well as an Apache web server.
- The config for Hadoop / Hive is spread through the various layers and external files.
- I don't know if the spark-build repo is the best place for this. As it is, we have problems with our "testing" docker images not being properly versioned in the Mesosphere org, or under version control.
- The removal of docker-compose has complicated the building of the images and identifying which changes are really required by us -- e.g. Kerberos-related changes.
Ideally, I would like to see:
- The number of Docker images reduced. Another search for "standalone hive" brings up other solutions such as https://github.com/jqcoffey/hive-standalone, which at first glance has the following points to note:
  ** ADVANTAGE: It uses a single image based off an OS image
  ** ADVANTAGE: It does not install postgres or sshd
  ** DISADVANTAGE: It installs system packages
- The docker image moved to a separate repo so that its lifetime can be managed independently. For the Kafka and HDFS clients, we are looking at moving them to dcos-commons-ci, for example.
tools/hive/download_deps.sh
Outdated
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
To tell you the truth, I'm not sure what advantage the download_deps.sh file really gives us. We could download the archives as part of the docker build process directly. We have in the past (e.g. with TensorFlow) had issues with certain hadoop archives no longer being present, but this is an issue with this script too.
tools/hive/hadoop-2.6.0/Dockerfile
Outdated
ADD ./deps/hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz /usr/local
RUN cd /usr/local && ln -s ./hadoop-${HADOOP_VERSION}-cdh${CDH_EXACT_VERSION} hadoop

RUN sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/lib/jvm/java-8-oracle\nexport HADOOP_PREFIX=/usr/local/hadoop\nexport HADOOP_HOME=/usr/local/hadoop\n:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
This seems to be explicitly setting the contents of the /etc/hadoop/hadoop-env.sh file. Why is it not sufficient to rely on the environment variables in this case?
That is strange; Hadoop seems to work without this step, with the exception of setting JAVA_HOME. I don't know why, but it seems that JAVA_HOME has to be set directly in this file: https://stackoverflow.com/questions/14325594/working-with-hadoop-localhost-error-java-home-is-not-set
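The substitution itself can be exercised in isolation; a minimal sketch against a stand-in file (the real target is $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh, and the Dockerfile version also injects the HADOOP_PREFIX/HADOOP_HOME exports):

```shell
# Stand-in hadoop-env.sh with the line the sed command rewrites:
printf 'export JAVA_HOME=${JAVA_HOME}\nexport OTHER=untouched\n' > hadoop-env.sh

# Same GNU sed pattern: match the `export JAVA_HOME` line and replace it wholesale.
sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/lib/jvm/java-8-oracle:' hadoop-env.sh
cat hadoop-env.sh
```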
tools/hive/hadoop-2.6.0/Dockerfile
Outdated
RUN sed -i '/^export HADOOP_CONF_DIR/ s:.*:export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh

# copy hadoop site xml files
RUN mkdir $HADOOP_PREFIX/input
Is this just to preserve the contents of the original files?
I guess, removing ...
tools/hive/hadoop-2.6.0/Dockerfile
Outdated
RUN $HADOOP_PREFIX/bin/hdfs namenode -format

# fixing the libhadoop.so
RUN rm -rf /usr/local/hadoop/lib/native/*
Why is it required to fix the native libraries?
The native libraries are not part of the Cloudera distribution. OTOH, hadoop seems to work okay without the native libraries.
I think the native libraries are only required when using Hadoop from languages such as C/C++.
@@ -0,0 +1,85 @@
<configuration>
<property>
Similar comments as for the non-kerberized version. Switching to mustache should allow us to unify the templates a little.
tools/hive/ubuntu/Dockerfile
Outdated
@@ -0,0 +1,60 @@
FROM ubuntu:trusty
This is a pretty old version of Ubuntu. Furthermore, all this image seems to do is add the SSH daemon (which I'm not sure we need) and set environment variables that are later overwritten.
tools/hive/ubuntu/Dockerfile
Outdated
ENV HIVE_HOME /usr/local/hive
ENV HADOOP_HOME /usr/local/hadoop

ENV PATH $PATH:$JAVA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME:$HADOOP_HOME/bin
Here we are setting the path to folders that don't exist yet.
tools/hive/ubuntu/Dockerfile
Outdated
# install dev tools
RUN apt-get update
RUN apt-get install -y curl wget tar openssh-server openssh-client rsync python-software-properties apt-file apache2
We're not cleaning up the cache.
tools/hive/ubuntu/Dockerfile
Outdated
RUN echo 'root:secretpasswd' | chpasswd
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN echo "ServerName localhost" >> /etc/apache2/apache2.conf
RUN sed -i 's/Listen 80/Listen 9999/g' /etc/apache2/ports.conf
Is there a reason that we need an Apache webserver too?
I don't think this is actually used. I will remove it.
…ems to be a prerequisite for installing the software-properties-common package. Removed ubuntu bootstrap script.
First pass at combining ubuntu + hadoop + hive into a single image (currently under the "single-image" directory). Still a WIP.
…v vars, (3) added "{{}}" to templated variable
@elezar @samvantran I think I have addressed most of your comments. The highlights are:
@samvantran @elezar Gentle ping. See comment above.
Looks good to me w/ a minor question
Probably want Evan to 👍 this PR since his review was extensive
*also probably want to merge with master to get over the failing CI tests that look similar to the statsd jar errors
@@ -0,0 +1,2 @@
<configuration>
</configuration>
Is this file necessary? It's essentially empty.
Strictly speaking, it's not necessary, but it serves as a placeholder for the "generate_configs.sh" script, so that the script can treat all config files (including yarn-site) equally.
# templating of config files
sed s/{{HOSTNAME}}/$HOSTNAME/ /usr/local/hadoop/etc/hadoop/core-site.xml.template > /usr/local/hadoop/etc/hadoop/core-site.xml
sed s/{{HOSTNAME}}/$HOSTNAME/ /usr/local/hadoop/etc/hadoop/yarn-site.xml.template > /usr/local/hadoop/etc/hadoop/yarn-site.xml
Follow up to my question below: it doesn't look like yarn-site.xml.template has a {{HOSTNAME}} to replace.
I'll admit this one is a little confusing. What happens is this script gets called in both the non-kerberized and kerberized images. And the kerberized version of yarn-site.xml does have a {{HOSTNAME}} in it, and I wanted to avoid special processing to account for that.
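The templating step can be tried on its own; a small sketch with stand-in file names (the property content below is illustrative, not copied from the PR's templates). A template with no {{HOSTNAME}} token simply passes through sed unchanged, which is why running the substitution over every config file is harmless:

```shell
HOSTNAME=hive-test-host

# Stand-in template containing the {{HOSTNAME}} token:
printf '<configuration><property><name>fs.defaultFS</name><value>hdfs://{{HOSTNAME}}:9000</value></property></configuration>\n' > core-site.xml.template

# Same substitution as the bootstrap script performs:
sed "s/{{HOSTNAME}}/$HOSTNAME/" core-site.xml.template > core-site.xml
cat core-site.xml
```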
Thanks @susanxhuynh, this is looking great now.
I have made one or two comments, but none of them are blockers on getting this PR in. Depending on what your priorities are, I would say that we could merge this as is and create a follow-up ticket to get any improvements made.
The one thing that I think should be addressed is the fact that there is no -d option supported in the hadoop-bootstrap.sh script.
@@ -0,0 +1,149 @@
FROM ubuntu:16.04
We can handle this in a follow-up, but should we consider using the 18.04 LTS image?
I remember this email thread not long ago asking about DCOS on 18.04 and it seemed like Mesos still had to sort out some issues.
Let's hold off on this for now.
EXPOSE 22

# oracle jdk 8
RUN apt-get update && \
We could also pull in the java archive that we use in all our applications, but this isn't a blocker.
rm -rf /var/lib/apt/lists/*

# java env setup
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
This is set on https://github.com/mesosphere/spark-build/pull/392/files#diff-0aa25f6cadbb637eae9df102b049a59dR5 as well. Rather just set it in one place.
👍 deleted the first instance
tools/hive/hadoop-hive/Dockerfile
Outdated
ENV PATH $PATH:$HIVE_HOME/bin

# add postgresql jdbc jar to classpath
RUN ln -s /usr/share/java/postgresql-jdbc4.jar $HIVE_HOME/lib/postgresql-jdbc4.jar
Should this not be moved to AFTER the postgres install below?
Looks like it was copied from the parent project: https://github.com/tilakpatidar/cdh5_hive_postgres/blob/master/hive_pg/Dockerfile#L31. If I move it, docker build runs fine but I'd have to test it against the hive integration PR to know everything works.
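One reason the ordering is easy to miss: ln -s succeeds even when its target does not exist yet, so the build passes either way and the link simply dangles until the jar is installed. A quick demonstration (the `linkdemo` directory and jar name are illustrative):

```shell
mkdir -p linkdemo && cd linkdemo
ln -s ./not-yet-installed.jar link.jar      # succeeds despite the missing target
[ -L link.jar ] && echo "symlink created"
[ -e link.jar ] || echo "but its target is missing"

# Once the target appears (e.g. after the postgres package install),
# the existing link resolves without being recreated:
touch not-yet-installed.jar
[ -e link.jar ] && echo "target now resolves"
cd ..
```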
tools/hive/hadoop-hive/Dockerfile
Outdated
USER postgres
# initialize hive metastore db
# create metastore db, hive user and assign privileges
RUN cd $HIVE_HOME/scripts/metastore/upgrade/postgres/ &&\
Nit: &&\ => && \
fixed
printenv | cat >> /root/.bashrc

# hadoop bootstrap
/etc/hadoop-bootstrap.sh -d
There is no -d flag in the hadoop-bootstrap.sh script above. Is this intentional?
removed -d
/bin/bash
fi

if [[ $1 == "-d" ]]; then
We could use an elif here instead, followed by an else block that prints something if the argument is unknown.
fixed
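The suggested shape, sketched as a function so it can be tried outside the container (the echoed messages stand in for the real `/bin/bash` exec and sleep loop, which would block here):

```shell
handle_arg() {
  if [[ $1 == "-bash" ]]; then
    echo "would exec /bin/bash"
  elif [[ $1 == "-d" ]]; then
    echo "would block: while true; do sleep 10000; done"
  else
    echo "Unknown argument: $1" >&2
    return 1
  fi
}

handle_arg -d
handle_arg -bogus || echo "rejected"
```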
# start hive metastore server
$HIVE_HOME/bin/hive --service metastore &

sleep 20
Is there a better way to check for readiness?
Not sure. I'm not familiar enough w/ Hive.
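One hedged alternative to a fixed sleep: poll until a TCP port accepts connections, with a timeout (the Hive metastore's default port is 9083, assuming the default config; the polling uses bash's /dev/tcp redirection, so this requires bash):

```shell
# Return 0 once host:port accepts a TCP connection, 1 after `tries` seconds.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30} i
  for i in $(seq "$tries"); do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# e.g. instead of `sleep 20`:
#   wait_for_port localhost 9083 30 || echo "metastore not up" >&2
```

A port accepting connections is still only an approximation of "ready to serve queries", but it at least adapts to slow starts instead of racing a fixed timer.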
[realms]
LOCAL = {
    kdc = kdc.marathon.mesos:2500
In our other applications, we use a different endpoint here. Should we make this configurable too?
Hm, this is just for testing. Not sure we need to make this configurable.
My concern is that it is different to how we handle KDC in all our other applications. If we need to move forward with this though, I'm not going to block the PR, but we should consider creating a follow-up ticket to unify this.
included in DCOS-42219
cd "$( dirname "${BASH_SOURCE[0]}" )"
for FILE_BASE in core-site hdfs-site hive-site yarn-site; do
    COMBINED_FILE="../templates/${FILE_BASE}.xml.template"
    echo "Generating config file: kerberos/templates/${FILE_BASE}.xml.template"
Not a blocker, but if we were to use Python for this, we could combine the XML as a structured document?
(see for example: https://github.com/mesosphere/dcos-commons/blob/master/frameworks/hdfs/tests/test_overlay.py#L65)
The one thing that I think should be addressed is the fact that there is no -d option supported in the hadoop-bootstrap.sh script.
@elezar, please take another look at this PR. Among other fixes, I removed the -d flag, which seemed most important to address.
tools/hive/hadoop-hive/Dockerfile
Outdated
EXPOSE 50010 50020 50070 50075 50090 8020 9000 10020 19888 8030 8031 8032 8033 8040 8042 8088

# download cdh hive
RUN curl -L http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/hive-1.1.0-cdh${CDH_EXACT_VERSION}.tar.gz \
fixed
tools/hive/hadoop-hive/Dockerfile
Outdated
# disable ssl in postgres.conf
ADD conf/postgresql.conf $POSTGRESQL_MAIN
RUN echo $POSTGRESQL_MAIN
removed
$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/start-yarn.sh

if [[ $1 == "-bash" ]]; then
I don't think so. This is a script that gets called from hive-bootstrap.sh, so it'll just continue on afterward.
Thanks @samvantran.
This is definitely something that we can iterate on.
Some final thoughts:
- There is a bit of a disconnect with how we handle Kerberos configuration for other services. I know that we're treating this as a one-off, but getting a bit more uniformity could be useful. Definitely out of scope for this PR though.
- We're depending on @susanxhuynh's private docker repository here. This means that we'll have one more docker image to migrate to the mesosphere repo. We should probably consider doing it now. For what it's worth -- we can use https://jenkins.mesosphere.com/service/jenkins/view/Infinity/job/infinity-tools/job/release-tools/job/build-docker-image/ to build arbitrary docker images.
@@ -0,0 +1,39 @@
{
In other tests we generate these kinds of application definitions on the fly in python -- templating where applicable. Not a blocker, but we could create a follow-up ticket.
created https://jira.mesosphere.com/browse/DCOS-42219 to address this and other non-blocking-but-should-still-do comments
<!-- NameNode security config -->
<property>
    <name>dfs.namenode.keytab.file</name>
    <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value> <!-- path to the HDFS keytab -->
This is also quite different to how we currently deploy kerberized HDFS. I know there isn't necessarily too much overlap, but it would be good to not have to context switch too much when debugging issues with hive / hdfs.
<configuration>
<!-- Authentication -->
<property>
    <name>hive.server2.authentication</name>
Is server2 a predefined property of some kind?
Yes, it's the improved version of hiveserver: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2
Cleaned up the postgres conf file and added back an envvar I mistakenly deleted. I created ticket https://jira.mesosphere.com/browse/DCOS-42219 to address followups
Also I tried the jenkins job you mentioned but was unsuccessful in publishing a docker image
@@ -0,0 +1,630 @@
# -----------------------------
cleaned up in fed868e
This Hive Docker image is intended for use in Spark integration tests.
It's based on https://github.com/tilakpatidar/cdh5_hive_postgres (the one Evan found) with the following changes:
- Kerberos support (kerberos/ directory)
- Changes to hive_pg/scripts/bootstrap.sh for Kerberos

Testing:
docker run -it susanxhuynh/cdh5-hive:latest /etc/hive-bootstrap.sh -bash