This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

[DCOS-40151] Added Sentry to Kerberized Hive image #397

Open
wants to merge 11 commits into
base: sh-dcos-39050
from
7 changes: 5 additions & 2 deletions tools/hive/README.md
@@ -1,7 +1,10 @@
# Cloudera Hadoop and Hive Docker Image with Kerberos
# Cloudera Hadoop and Hive Docker Image with Kerberos, Sentry


This is a Hadoop Docker image running CDH5 versions of Hadoop and Hive, all in one container. There is a separate Kerberos image in which Hadoop and Hive use Kerberos for authentication. Adapted from https://github.com/tilakpatidar/cdh5_hive_postgres and based on Ubuntu (trusty).
This is a Hadoop Docker image running CDH5 versions of Hadoop and Hive, all in one container.
There is a separate Kerberos image in which Hadoop and Hive use Kerberos for authentication,
and Sentry for authorization.
Adapted from https://github.com/tilakpatidar/cdh5_hive_postgres and based on Ubuntu (trusty).

Postgres is also installed so that Hive can use it for its Metastore backend and run in remote mode.

22 changes: 19 additions & 3 deletions tools/hive/hadoop-hive/scripts/hive-bootstrap.sh
@@ -6,6 +6,16 @@ printenv | cat >> /root/.bashrc
# hadoop bootstrap
/etc/hadoop-bootstrap.sh

# init and start sentry
SENTRY_CONF_TEMPLATE=$SENTRY_HOME/conf/sentry-site.xml.template
SENTRY_CONF_FILE=$SENTRY_HOME/conf/sentry-site.xml
if [ -f "$SENTRY_CONF_TEMPLATE" ]; then
sed s/{{HOSTNAME}}/$HOSTNAME/ $SENTRY_HOME/conf/sentry-site.xml.template > $SENTRY_HOME/conf/sentry-site.xml
sed s/{{HOSTNAME}}/$HOSTNAME/ $HIVE_CONF/sentry-site.xml.template > $HIVE_CONF/sentry-site.xml
$SENTRY_HOME/bin/sentry --command schema-tool --conffile $SENTRY_CONF_FILE --dbType derby --initSchema

Review comment (Contributor): Should this be in the if block?

$SENTRY_HOME/bin/sentry --command service --conffile $SENTRY_CONF_FILE &
fi

# restart postgresql
/etc/init.d/postgresql restart

@@ -19,16 +29,22 @@ do
echo "waiting for hdfs to be ready"; sleep 10;
done

# create hive user
useradd hive

# create hdfs directories
$HADOOP_PREFIX/bin/hdfs dfs -mkdir -p /user/root
hdfs dfs -mkdir -p /user/root
hdfs dfs -chown -R hdfs:supergroup /user

$HADOOP_PREFIX/bin/hdfs dfs -mkdir -p /apps/hive/warehouse
hdfs dfs -mkdir -p /apps/hive/warehouse
hdfs dfs -chown -R hive:supergroup /apps/hive
hdfs dfs -chmod 777 /apps/hive/warehouse

hdfs dfs -mkdir -p /tmp/hive
hdfs dfs -chmod 777 /tmp/hive

# altering the hive-site configuration
sed s/{{HOSTNAME}}/$HOSTNAME/ /usr/local/hive/conf/hive-site.xml.template > /usr/local/hive/conf/hive-site.xml
sed s/{{HOSTNAME}}/$HOSTNAME/ $HIVE_CONF/hive-site.xml.template > $HIVE_CONF/hive-site.xml
sed s/{{HOSTNAME}}/$HOSTNAME/ /opt/files/hive-site.xml.template > /opt/files/hive-site.xml
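
The `{{HOSTNAME}}` substitution applied to each template above can be tried in isolation. The following is a minimal sketch, not the PR's code: `render_template` is a hypothetical helper, and the file names and the host `example-host` are made up for the demo. Unlike the script, it quotes its arguments and replaces every occurrence on a line (`g` flag).

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): render a template by replacing
# the {{HOSTNAME}} placeholder with a given host name.
render_template() {
  local template="$1" output="$2" host="$3"
  # '|' is used as the sed delimiter; this assumes the host name contains no '|'.
  sed "s|{{HOSTNAME}}|${host}|g" "$template" > "$output"
}

# Demo on a throwaway file instead of the real sentry-site.xml.template.
tmpdir="$(mktemp -d)"
printf '<value>sentry/{{HOSTNAME}}@LOCAL</value>\n' > "$tmpdir/sentry-site.xml.template"
render_template "$tmpdir/sentry-site.xml.template" "$tmpdir/sentry-site.xml" "example-host"

rendered="$(cat "$tmpdir/sentry-site.xml")"
echo "$rendered"
```

Quoting the paths and the sed expression keeps the call safe if a path or host name ever contains spaces or glob characters, which the unquoted form in the script does not guarantee.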

# start hive metastore server
17 changes: 17 additions & 0 deletions tools/hive/kerberos/Dockerfile
@@ -1,5 +1,14 @@
FROM cdh5-hive

ENV SENTRY_VERSION 1.5.1

Review comment (Contributor): Question: do we ever want to use the non-Kerberized Hive? If not, we could just drop all of this into a single Dockerfile.

ENV SENTRY_HOME /usr/local/sentry

# download sentry
RUN curl -L http://archive.cloudera.com/cdh${CDH_VERSION}/cdh/${CDH_VERSION}/sentry-${SENTRY_VERSION}-cdh${CDH_EXACT_VERSION}.tar.gz \
| tar -xzC /usr/local && \
cd /usr/local && \

Review comment (Contributor): Should this be ${SENTRY_HOME}?

ln -s apache-sentry-${SENTRY_VERSION}-cdh${CDH_EXACT_VERSION}-bin/ sentry
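
One way to read the review question about `${SENTRY_HOME}` is that the symlink should be created from the variable rather than from a hard-coded `sentry` name under `/usr/local`. The sketch below shows that idea outside Docker, under stated assumptions: the temporary `install_root` stands in for `/usr/local`, the `CDH_EXACT_VERSION` value is a placeholder (the real value is not shown in this diff), and an empty directory stands in for the unpacked tarball.

```shell
#!/bin/bash
# Assumed values for illustration only; CDH_EXACT_VERSION is a placeholder.
SENTRY_VERSION="1.5.1"
CDH_EXACT_VERSION="0.0.0"

install_root="$(mktemp -d)"             # stands in for /usr/local
SENTRY_HOME="${install_root}/sentry"    # the Dockerfile sets /usr/local/sentry

# Stand-in for the directory the Sentry tarball would unpack to.
extracted="${install_root}/apache-sentry-${SENTRY_VERSION}-cdh${CDH_EXACT_VERSION}-bin"
mkdir -p "${extracted}/conf"

# Create the link through SENTRY_HOME so the install path is defined once.
ln -s "${extracted}" "${SENTRY_HOME}"

link_target="$(readlink "${SENTRY_HOME}")"
echo "${SENTRY_HOME} -> ${link_target}"
```

With this shape, changing `SENTRY_HOME` moves the link without touching the `ln` line.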

# copy kerberized hadoop config files
ADD templates/core-site.xml.template $HADOOP_PREFIX/etc/hadoop/core-site.xml.template
ADD templates/hdfs-site.xml.template $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml.template
@@ -9,6 +18,14 @@ ADD templates/yarn-site.xml.template $HADOOP_PREFIX/etc/hadoop/yarn-site.xml.tem
ADD templates/hive-site.xml.template /opt/files/
ADD templates/hive-site.xml.template $HIVE_CONF/hive-site.xml.template

# sentry config files
ADD templates/sentry-site.xml.hive-client.template /usr/local/hive/conf/sentry-site.xml.template
ADD templates/sentry-site.xml.server.template /usr/local/sentry/conf/sentry-site.xml.template

# hive / sentry test script
ADD scripts/grant-hive-privileges.sh /etc/grant-hive-privileges.sh
RUN chmod 700 /etc/grant-hive-privileges.sh

# krb5.conf
ADD conf/krb5.conf /etc/

2 changes: 1 addition & 1 deletion tools/hive/kerberos/marathon/hdfs-hive-kerberos.json
@@ -33,7 +33,7 @@
[
"hostname",
"IS",
"10.0.0.114"
"1.2.3.4"

Review comment (Contributor): Is this used at all?

]
]
}
23 changes: 23 additions & 0 deletions tools/hive/kerberos/scripts/grant-hive-privileges.sh
@@ -0,0 +1,23 @@
#!/bin/bash
set -x

export HADOOP_HOME=/usr/local/hadoop

# Create a user "alice" since Sentry authorization relies on the Linux user and group information
/usr/sbin/useradd alice

# Grant permissions to user “alice”
echo "Grant permissions to user alice ..."
kdestroy
kinit -k -t /usr/local/hadoop/etc/hadoop/hdfs.keytab hive/${HOSTNAME}@LOCAL

Review comment (Contributor): Does this have to be the hive user? Could one use the hdfs user here too?

cat <<EOF >grant_alice.sql
CREATE ROLE test_role;
GRANT ROLE test_role to GROUP alice;
GRANT ROLE test_role to GROUP root;
GRANT ALL on DATABASE default to ROLE test_role WITH GRANT OPTION;
EOF
/usr/local/hive/bin/beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/${HOSTNAME}@LOCAL" -f grant_alice.sql

Review comment (Contributor): Is this port (10000) configurable elsewhere?
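
On the port question: 10000 is the default of Hive's `hive.server2.thrift.port` property, so the server side can be changed in `hive-site.xml`. On the client side, the script could read the port from the environment instead of hard-coding it. The sketch below only builds the JDBC URL; the variable names `HIVE_SERVER2_HOST`, `HIVE_SERVER2_PORT`, and `KRB_REALM` are hypothetical and do not appear in the PR.

```shell
#!/bin/bash
# Hypothetical environment overrides (not defined anywhere in this PR),
# each falling back to the value hard-coded in grant-hive-privileges.sh.
HIVE_SERVER2_HOST="${HIVE_SERVER2_HOST:-localhost}"
HIVE_SERVER2_PORT="${HIVE_SERVER2_PORT:-10000}"
KRB_REALM="${KRB_REALM:-LOCAL}"

# Fall back to "localhost" if the shell does not provide HOSTNAME.
host="${HOSTNAME:-localhost}"

JDBC_URL="jdbc:hive2://${HIVE_SERVER2_HOST}:${HIVE_SERVER2_PORT}/default;principal=hive/${host}@${KRB_REALM}"
echo "$JDBC_URL"
```

The resulting string could then be passed to `beeline -u "$JDBC_URL"` in place of the literal URL.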


# Log back in as hdfs
kdestroy
kinit -k -t /usr/local/hadoop/etc/hadoop/hdfs.keytab hdfs@LOCAL
33 changes: 30 additions & 3 deletions tools/hive/kerberos/templates/hive-site-kerberos.xml.template
@@ -38,11 +38,38 @@

<property>
<name>hive.users.in.admin.role</name>
<value>hdfs,hive</value>
<value>hive</value>
</property>

<!-- Hiveserver2, Sentry -->
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>

<property>
<name>hive.security.authorization.task.factory</name>
<value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>

<property>
<name>hive.server2.session.hook</name>
<value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>

<!-- Metastore, Sentry -->
<property>
<name>hive.metastore.filter.hook</name>
<value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
</property>

<property>
<name>hive.metastore.pre.event.listeners</name>
<value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>
</property>

<property>
<name>hive.metastore.event.listeners</name>
<value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>
</property>
</configuration>
32 changes: 32 additions & 0 deletions tools/hive/kerberos/templates/sentry-site.xml.hive-client.template
@@ -0,0 +1,32 @@
<configuration>
<property>
<name>sentry.hive.provider</name>
<value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
<name>sentry.hive.server</name>
<value>server1</value>

Review comment (Contributor): Where is this name specified?

</property>
<property>
<name>sentry.service.client.server.rpc-port</name>
<value>8038</value>

Review comment (Contributor): Is this port mapped to the Marathon JSON requirements somewhere too?

</property>
<property>
<name>sentry.service.client.server.rpc-address</name>
<value>localhost</value>
</property>

<!-- Properties required for setting the DB provider -->
<property>
<name>sentry.hive.provider.backend</name>
<value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
</property>
<property>
<name>sentry.service.server.principal</name>
<value>sentry/{{HOSTNAME}}@LOCAL</value>
</property>
<property>
<name>sentry.metastore.service.users</name>
<value>hive</value>
</property>
</configuration>
34 changes: 34 additions & 0 deletions tools/hive/kerberos/templates/sentry-site.xml.server.template
@@ -0,0 +1,34 @@
<configuration>
<property>
<name>sentry.hive.server</name>
<value>server1</value>

Review comment (Contributor): Is server1 specific to something at all?

</property>
<property>
<name>sentry.store.jdbc.url</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
<property>
<name>sentry.service.server.principal</name>
<value>sentry/{{HOSTNAME}}@LOCAL</value>
</property>
<property>
<name>sentry.service.server.keytab</name>
<value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
<name>sentry.service.admin.group</name>
<value>hive</value>
</property>
<property>
<name>sentry.service.allow.connect</name>
<value>hive</value>
</property>
<property>
<name>sentry.store.jdbc.user</name>
<value>sentry</value>
</property>
<property>
<name>sentry.store.jdbc.password</name>
<value>test</value>
</property>
</configuration>
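
Both review questions about `server1` come down to the same value appearing as `sentry.hive.server` in the Hive client template and in the server template. Assuming the two files are meant to agree on that name, a small check like the one below could catch drift. This is a sketch only: `property_value` is a hypothetical helper, it relies on `<value>` sitting on a line after its `<name>` as in these templates, and the demo runs on throwaway files rather than the real templates.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): print the <value> that follows
# <name>$1</name> in the Hadoop-style XML file $2.
property_value() {
  local name="$1" file="$2"
  awk -v n="<name>${name}</name>" '
    index($0, n) { want = 1; next }
    want && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and the 8-character </value>.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }' "$file"
}

# Demo files standing in for the client and server sentry-site.xml templates.
tmpdir="$(mktemp -d)"
for f in client server; do
  printf '<configuration>\n<property>\n<name>sentry.hive.server</name>\n<value>server1</value>\n</property>\n</configuration>\n' \
    > "$tmpdir/$f.xml"
done

client_name="$(property_value sentry.hive.server "$tmpdir/client.xml")"
server_name="$(property_value sentry.hive.server "$tmpdir/server.xml")"

if [ "$client_name" = "$server_name" ]; then
  result="match: $client_name"
else
  result="mismatch: $client_name vs $server_name"
fi
echo "$result"
```

A real XML parser would be more robust than line matching; the awk approach is chosen here only to keep the sketch dependency-free.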