This repository has been archived by the owner on Sep 10, 2022. It is now read-only.

Build spark/hadoop from source and pip install the local package #7

Open
LaVLaS wants to merge 1 commit into master from fix/thoth-station_spark_build

Conversation

@LaVLaS (Contributor) commented Sep 3, 2020

Update the python-3.6 Dockerfile to build custom Spark & Hadoop binaries from source and pip install the Python wheel so that it works with the default PYTHONPATH.

Closes #6
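
For reference, a minimal sketch of the build-and-install flow this PR describes (the clone target, profile list, and artifact path are illustrative assumptions based on the ARGs below, not the exact Dockerfile contents):

# Clone the Spark source at the requested tag and build a distribution,
# with --pip also producing the installable PySpark package.
git clone --branch v${SPARK_VERSION} https://github.com/apache/spark.git spark
cd spark
./dev/make-distribution.sh --pip -Phive -Phive-thriftserver -Pkubernetes -Dhadoop.version=${HADOOP_VERSION}
# Install the locally built PySpark package so it lands on the default
# PYTHONPATH instead of relying on SPARK_HOME/python being exported.
pip install python/dist/pyspark-*.tar.gz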

@LaVLaS force-pushed the fix/thoth-station_spark_build branch from e4087ea to e704e4e on September 4, 2020 02:56
@LaVLaS force-pushed the fix/thoth-station_spark_build branch from e704e4e to 21f9196 on September 19, 2020 18:23
@LaVLaS changed the title from "WIP: Build spark/hadoop from source and pip install the local package" to "Build spark/hadoop from source and pip install the local package" on Sep 21, 2020
ARG SPARK_SOURCE_REPO=https://github.com/apache/spark.git
ARG SPARK_SOURCE_REPO_BRANCH=v${SPARK_VERSION}
ARG SPARK_SOURCE_REPO_TARGET_DIR=spark
ARG SPARK_BUILD_ARGS="-Phive -Phive-thriftserver -Pkubernetes -Dhadoop.version=${HADOOP_VERSION}"


Why do you think -Phive-thriftserver is necessary? Will this image replace the images in Data Catalog?

@LaVLaS (Contributor, Author) replied:

The Spark build args are taken from the examples in the Apache Spark docs. If there is a reason to remove hive-thriftserver that won't break the ODH data engineering workflow, then I have no objections to removing it as a default arg. Otherwise, you can remove it at build time by overriding SPARK_BUILD_ARGS.
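
Since SPARK_BUILD_ARGS is a build ARG, it can be overridden with --build-arg. For example, a build that drops the Thrift Server profile might look like this (the image tag and Hadoop version are illustrative, not values from this repo):

# Rebuild the image without -Phive-thriftserver in the profile list.
docker build \
  --build-arg SPARK_BUILD_ARGS="-Phive -Pkubernetes -Dhadoop.version=3.2.0" \
  -t custom-spark-py36 .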

Development

Successfully merging this pull request may close these issues:

Update custom spark build to support s2i-thoth-ubi8-py36