
Merge pull request #495 from mjohns-databricks/gdal-jammy-3
Gdal jammy 3
Milos Colic authored Jan 9, 2024
2 parents 76bc69f + b579239 commit 5c245f9
Showing 105 changed files with 752 additions and 212 deletions.
3 changes: 2 additions & 1 deletion .github/actions/python_build/action.yml
@@ -10,9 +10,10 @@ runs:
   - name: Install python dependencies
     shell: bash
     run: |
       # - install pip libs
       # note: gdal requires the extra args
       cd python
-      pip install build wheel pyspark==${{ matrix.spark }} numpy==${{ matrix.numpy }}
+      pip install numpy==${{ matrix.numpy }}
+      pip install --no-build-isolation --no-cache-dir --force-reinstall gdal==${{ matrix.gdal }}
       pip install .
   - name: Test and build python package
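Why the extra pip flags: the `gdal` package builds its Python bindings against the system `libgdal`, so the pinned version must match the native library installed by the scala_build action, and `--no-build-isolation` makes the build use the numpy that was just installed rather than an isolated build environment. A post-install sanity check one might run (a sketch, not part of this PR; assumes the `osgeo` bindings installed cleanly):

# Sketch: confirm the GDAL Python bindings, the native libgdal, and the
# numpy bridge all agree (importing gdal_array fails if numpy interop is broken).
import osgeo
from osgeo import gdal, gdal_array  # noqa: F401

native = gdal.VersionInfo("RELEASE_NAME")  # e.g. "3.4.1"
assert native.startswith("3.4"), f"unexpected native libgdal: {native}"
assert osgeo.__version__.startswith("3.4"), f"unexpected pyGDAL: {osgeo.__version__}"
print(f"GDAL OK: bindings {osgeo.__version__}, native {native}")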
17 changes: 8 additions & 9 deletions .github/actions/scala_build/action.yml
@@ -25,17 +25,16 @@ runs:
       sudo apt-add-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc)-security main multiverse restricted universe"
       sudo apt-add-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc) main multiverse restricted universe"
       sudo apt-get update -y
-      # - install numpy first
-      pip install --upgrade pip
-      pip install 'numpy>=${{ matrix.numpy }}'
       # - install natives
       sudo apt-get install -y unixodbc libcurl3-gnutls libsnappy-dev libopenjp2-7
-      sudo apt-get install -y gdal-bin libgdal-dev python3-gdal
-      # - install gdal with numpy
-      pip install --no-cache-dir --force-reinstall 'GDAL[numpy]==${{ matrix.gdal }}'
-      sudo wget -P /usr/lib -nc https://github.com/databrickslabs/mosaic/raw/main/resources/gdal/jammy/libgdalalljni.so
-      sudo wget -P /usr/lib -nc https://github.com/databrickslabs/mosaic/raw/main/resources/gdal/jammy/libgdalalljni.so.30
-      #sudo wget -P /usr/lib -nc https://github.com/databrickslabs/mosaic/raw/main/resources/gdal/jammy/libgdalalljni.so.30.0.3
+      sudo apt-get install -y gdal-bin libgdal-dev python3-numpy python3-gdal
+      # - install pip libs
+      pip install --upgrade pip
+      pip install gdal==${{ matrix.gdal }}
+      # - add the so files
+      sudo wget -nv -P /usr/lib -nc https://raw.githubusercontent.com/databrickslabs/mosaic/main/resources/gdal/jammy/libgdalalljni.so
+      sudo wget -nv -P /usr/lib -nc https://raw.githubusercontent.com/databrickslabs/mosaic/main/resources/gdal/jammy/libgdalalljni.so.30
+      sudo wget -nv -P /usr/lib -nc https://raw.githubusercontent.com/databrickslabs/mosaic/main/resources/gdal/jammy/libgdalalljni.so.30.0.3
   - name: Test and build the scala JAR - skip tests is false
     if: inputs.skip_tests == 'false'
     shell: bash
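Note that the wget lines now use `raw.githubusercontent.com` URLs with `-nv` for quieter logs, and the previously commented-out `libgdalalljni.so.30.0.3` is fetched as well; the `.so`, `.so.30`, and `.so.30.0.3` names mirror the usual shared-library symlink chain the JVM side expects. A quick presence check (a sketch, not part of this PR; assumes the default `/usr/lib` target used above):

# Sketch: verify the GDAL JNI shared objects landed in /usr/lib.
from pathlib import Path

LIB_DIR = Path("/usr/lib")
EXPECTED = ["libgdalalljni.so", "libgdalalljni.so.30", "libgdalalljni.so.30.0.3"]

missing = [name for name in EXPECTED if not (LIB_DIR / name).exists()]
if missing:
    raise FileNotFoundError(f"missing GDAL JNI libraries: {missing}")
print("all GDAL JNI libraries present")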
6 changes: 3 additions & 3 deletions .github/workflows/build_main.yml
@@ -17,7 +17,7 @@ jobs:
   strategy:
     matrix:
       python: [ 3.10.12 ]
-      numpy: [ 1.21.5 ]
+      numpy: [ 1.22.4 ]
       gdal: [ 3.4.1 ]
       spark: [ 3.4.0 ]
       R: [ 4.2.2 ]
@@ -28,7 +28,7 @@
       uses: ./.github/actions/scala_build
     - name: build python
       uses: ./.github/actions/python_build
-    - name: build R
-      uses: ./.github/actions/r_build
+    # - name: build R
+    #   uses: ./.github/actions/r_build
     - name: upload artefacts
       uses: ./.github/actions/upload_artefacts
2 changes: 1 addition & 1 deletion .github/workflows/build_python.yml
@@ -13,7 +13,7 @@ jobs:
   strategy:
     matrix:
       python: [ 3.10.12 ]
-      numpy: [ 1.21.5 ]
+      numpy: [ 1.22.4 ]
       gdal: [ 3.4.1 ]
       spark: [ 3.4.0 ]
       R: [ 4.2.2 ]
2 changes: 1 addition & 1 deletion .github/workflows/build_r.yml
@@ -14,7 +14,7 @@ jobs:
   strategy:
     matrix:
       python: [ 3.10.12 ]
-      numpy: [ 1.21.5 ]
+      numpy: [ 1.22.4 ]
       gdal: [ 3.4.1 ]
       spark: [ 3.4.0 ]
       R: [ 4.2.2 ]
2 changes: 1 addition & 1 deletion .github/workflows/build_scala.yml
@@ -12,7 +12,7 @@ jobs:
   strategy:
     matrix:
       python: [ 3.10.12 ]
-      numpy: [ 1.21.5 ]
+      numpy: [ 1.22.4 ]
       gdal: [ 3.4.1 ]
       spark: [ 3.4.0 ]
       R: [ 4.2.2 ]
2 changes: 1 addition & 1 deletion .github/workflows/pypi-release.yml
@@ -10,7 +10,7 @@ jobs:
   strategy:
     matrix:
       python: [ 3.10.12 ]
-      numpy: [ 1.21.5 ]
+      numpy: [ 1.22.4 ]
       gdal: [ 3.4.1 ]
       spark: [ 3.4.0 ]
       R: [ 4.2.2 ]
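All five workflows now pin the same toolchain (Python 3.10.12, numpy 1.22.4, GDAL 3.4.1, Spark 3.4.0, R 4.2.2). A minimal sketch of the runtime check the bump implies (hypothetical, simply mirroring the matrix values):

# Sketch: assert the local environment matches the CI matrix pins.
import numpy
import pyspark

assert numpy.__version__ == "1.22.4", numpy.__version__
assert pyspark.__version__.startswith("3.4"), pyspark.__version__
print("environment matches the CI matrix")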
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,11 @@
-## v0.3.14
+## v0.4.0 [DBR 13.3 LTS]
+- First release for DBR 13.3 LTS, which is Ubuntu Jammy and Spark 3.4.1. Not backwards compatible, meaning it will not run on prior DBRs; requires either a Photon DBR or an ML Runtime (__standard, non-Photon DBRs are no longer supported__).
+- New `setup_fuse_install` function to meet various requirements arising with Unity Catalog + Shared Access clusters; removed the Scala equivalent, making artifact setup and install Python-first for Scala and Spark SQL.
+- Removed the OSS ESRI Geometry API for the 0.4 series; JTS is now the only vector geometry provider.
+- MosaicAnalyzer functions now accept Spark DataFrames instead of MosaicFrame, which has been removed.
+- Docs for 0.3.x have been archived and linked from the current docs; notebooks for 0.3.x have been separated from the current notebooks.
+
+## v0.3.14 [DBR < 13]
 - Fixes for Warning and Error messages on mosaic_enable call.
 - Performance improvements for raster functions.
 - Fix support for GDAL configuration via spark config (use 'spark.databricks.labs.mosaic.gdal.' prefix).
2 changes: 1 addition & 1 deletion pom.xml
@@ -278,7 +278,7 @@
       <scala.version>2.12.10</scala.version>
       <scala.compat.version>2.12</scala.compat.version>
       <spark.version>3.4.0</spark.version>
-      <mosaic.version>0.3.14</mosaic.version>
+      <mosaic.version>0.4.0</mosaic.version>
     </properties>
   </profile>
 </profiles>
2 changes: 1 addition & 1 deletion python/mosaic/__init__.py
@@ -4,4 +4,4 @@
 from .models import SpatialKNN
 from .readers import read

-__version__ = "0.3.14"
+__version__ = "0.4.0"
51 changes: 38 additions & 13 deletions python/mosaic/api/enable.py
@@ -10,7 +10,10 @@
 from mosaic.utils.notebook_utils import NotebookUtils


-def enable_mosaic(spark: SparkSession, dbutils=None) -> None:
+def enable_mosaic(
+    spark: SparkSession, dbutils=None, log_info: bool = False,
+    jar_path: str = None, jar_autoattach: bool = True
+) -> None:
     """
     Enable Mosaic functions.
@@ -22,9 +22,25 @@ def enable_mosaic(spark: SparkSession, dbutils=None) -> None:
     spark : pyspark.sql.SparkSession
         The active SparkSession.
     dbutils : dbruntime.dbutils.DBUtils
-        The dbutils object used for `display` and `displayHTML` functions.
-        Optional, only applicable to Databricks users.
+        Optional, specify the dbutils object used for `display` and
+        `displayHTML` functions.
+    log_info : bool
+        Logging cannot be adjusted on Unity Catalog Shared Access clusters;
+        attempting to do so throws a Py4JSecurityException.
+        - True will try to setLogLevel to 'info'
+        - False will not; default is False
+    jar_path : str
+        Convenience for when you need to change the JAR path for Unity
+        Catalog Volumes with Shared Access clusters.
+        - Default is None; if provided, sets
+          "spark.databricks.labs.mosaic.jar.path"
+    jar_autoattach : bool
+        Convenience for when you need to turn off JAR auto-attach for Unity
+        Catalog Volumes with Shared Access clusters.
+        - False will not register the JAR; sets
+          "spark.databricks.labs.mosaic.jar.autoattach" to "false"
+        - True will register the JAR; default is True

     Returns
     -------
@@ -34,7 +53,7 @@ def enable_mosaic(spark: SparkSession, dbutils=None) -> None:
     - `spark.databricks.labs.mosaic.jar.autoattach`: 'true' (default) or 'false'
         Automatically attach the Mosaic JAR to the Databricks cluster? (Optional)
-    - `spark.databricks.labs.mosaic.jar.location`
+    - `spark.databricks.labs.mosaic.jar.path`
         Explicitly specify the path to the Mosaic JAR.
         (Optional and not required at all in a standard Databricks environment).
     - `spark.databricks.labs.mosaic.geometry.api`: 'JTS'
@@ -43,8 +62,20 @@ def enable_mosaic(spark: SparkSession, dbutils=None) -> None:
         Explicitly specify the index system to use for optimized spatial joins. (Optional)
     """
+    # Set spark session, conditionally:
+    # - set conf for jar autoattach
+    # - set conf for jar path
+    # - set log level to 'info'
+    if not jar_autoattach:
+        spark.conf.set("spark.databricks.labs.mosaic.jar.autoattach", "false")
+        print("...set 'spark.databricks.labs.mosaic.jar.autoattach' to false")
+    if jar_path is not None:
+        spark.conf.set("spark.databricks.labs.mosaic.jar.path", jar_path)
+        print(f"...set 'spark.databricks.labs.mosaic.jar.path' to '{jar_path}'")
+    if log_info:
+        spark.sparkContext.setLogLevel('info')
     config.mosaic_spark = spark
-    _ = MosaicLibraryHandler(config.mosaic_spark)
+    _ = MosaicLibraryHandler(config.mosaic_spark, log_info=log_info)
     config.mosaic_context = MosaicContext(config.mosaic_spark)

     # Register SQL functions
@@ -56,14 +87,8 @@ def enable_mosaic(spark: SparkSession, dbutils=None) -> None:

     isSupported = config.mosaic_context._context.checkDBR(spark._jsparkSession)
     if not isSupported:
-        print(
-            """
-        DEPRECATION WARNING:
-        Please use a Databricks:
-        - Photon-enabled Runtime for performance benefits
-        - Runtime ML for spatial AI benefits
-        Mosaic will stop working on this cluster after v0.3.x."""
-        )
+        # unexpected - checkDBR returns True or throws an exception
+        print("""WARNING: checkDBR returned False.""")

     # Not yet added to the pyspark API
     with warnings.catch_warnings():
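Taken together, the new keyword arguments let a Unity Catalog Shared Access cluster disable JAR auto-attach and point Mosaic at a JAR staged on a Volume. A usage sketch (the Volume path is illustrative, not from this PR):

# Sketch: enabling Mosaic on a UC Shared Access cluster.
from pyspark.sql import SparkSession
import mosaic as mos

spark = SparkSession.builder.getOrCreate()
mos.enable_mosaic(
    spark,
    jar_autoattach=False,  # sets spark.databricks.labs.mosaic.jar.autoattach=false
    jar_path="/Volumes/my_catalog/my_schema/my_volume/mosaic.jar",  # illustrative path
    log_info=False,  # setLogLevel('info') would raise Py4JSecurityException here
)

Per the docstring above, setting the two `spark.databricks.labs.mosaic.jar.*` confs directly before calling `enable_mosaic` is equivalent.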