From c3cd1455530227b8710150a98c91ab82fae9bb0a Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 5 Jun 2023 23:58:00 +0000
Subject: [PATCH 01/28] Bump proj4j from 1.2.2 to 1.3.0

Bumps [proj4j](https://github.com/locationtech/proj4j) from 1.2.2 to 1.3.0.
- [Release notes](https://github.com/locationtech/proj4j/releases)
- [Changelog](https://github.com/locationtech/proj4j/blob/master/CHANGELOG.md)
- [Commits](https://github.com/locationtech/proj4j/compare/v1.2.2...v1.3.0)

---
updated-dependencies:
- dependency-name: org.locationtech.proj4j:proj4j
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot]
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 1d5344c09..a2211cd97 100644
--- a/pom.xml
+++ b/pom.xml
@@ -124,7 +124,7 @@
         org.locationtech.proj4j
         proj4j
-        1.2.2
+        1.3.0
 
         org.gdal

From 645b26dc272112039186d497b78e8e19db172a01 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 5 Jun 2023 23:58:07 +0000
Subject: [PATCH 02/28] Bump proj4j-epsg from 1.2.2 to 1.3.0

Bumps [proj4j-epsg](https://github.com/locationtech/proj4j) from 1.2.2 to 1.3.0.
- [Release notes](https://github.com/locationtech/proj4j/releases)
- [Changelog](https://github.com/locationtech/proj4j/blob/master/CHANGELOG.md)
- [Commits](https://github.com/locationtech/proj4j/compare/v1.2.2...v1.3.0)

---
updated-dependencies:
- dependency-name: org.locationtech.proj4j:proj4j-epsg
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot]
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 1d5344c09..1c0969b9f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -119,7 +119,7 @@
         org.locationtech.proj4j
         proj4j-epsg
-        1.2.2
+        1.3.0
 
         org.locationtech.proj4j

From d1cc3f9d99b59d547edda7be2482ec9fe9079139 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 12 Jun 2023 23:58:09 +0000
Subject: [PATCH 03/28] Bump maven-surefire-plugin from 3.1.0 to 3.1.2

Bumps [maven-surefire-plugin](https://github.com/apache/maven-surefire) from 3.1.0 to 3.1.2.
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.1.0...surefire-3.1.2)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot]
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 1d5344c09..3460fe057 100644
--- a/pom.xml
+++ b/pom.xml
@@ -181,7 +181,7 @@
         org.apache.maven.plugins
         maven-surefire-plugin
-        3.1.0
+        3.1.2
 
         true

From b86194461bd2bc5030900f2cc043f911d614c41a Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Tue, 13 Jun 2023 13:58:30 -0400
Subject: [PATCH 04/28] Update README.md

Updated DBR support in README
---
 README.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index ad48bb37e..29383bd78 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@ An extension to the [Apache Spark](https://spark.apache.org/) framework that all
 [![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/databrickslabs/mosaic.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/databrickslabs/mosaic/context:python)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
 ## Why Mosaic?
 
 Mosaic was created to simplify the implementation of scalable geospatial data pipelines by bounding together common Open Source geospatial libraries via Apache Spark, with a set of [examples and best practices](#examples) for common geospatial use cases.
@@ -41,11 +42,16 @@ Image1: Mosaic logical design.
 
 ## Getting started
 
-Create a Databricks cluster running __Databricks Runtime 10.0__ (or later).
+We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled; this will leverage the
+Databricks h3 expressions when using H3 grid system.
+
+:warning: **Mosaic 0.3.x series does not support DBR 13.x** (coming soon with Mosaic 0.4.x series); also, DBR 10 is no longer supported in Mosaic.
+
+As of the 0.3.11 release, Mosaic issues the following warning when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:
 
-We recommend using Databricks Runtime versions 11.2 or higher with Photon enabled, this will leverage the
-Databricks h3 expressions when using H3 grid system.
+> DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster from version v0.4.0+. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).
+If you are receiving this warning in v0.3.11, you will want to change to a supported runtime prior to updating Mosaic to run 0.4.0. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon.
+Along this direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.
 
 ### Documentation

From eb408423c3a66783f39baef9a508c200bfda53f9 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 16 Jun 2023 09:09:55 -0400
Subject: [PATCH 05/28] Update README.md - JTS

Elevating JTS over OSS Esri as default Geometry Provider
---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 29383bd78..307a2fae6 100644
--- a/README.md
+++ b/README.md
@@ -21,8 +21,8 @@ Mosaic was created to simplify the implementation of scalable geospatial data pi
 Mosaic provides geospatial tools for
   * Data ingestion (WKT, WKB, GeoJSON)
   * Data processing
-    * Geometry and geography `ST_` operations (with [ESRI](https://github.com/Esri/geometry-api-java) or [JTS](https://github.com/locationtech/jts))
-    * Indexing (with [H3](https://github.com/uber/h3) or BNG)
+    * Geometry and geography `ST_` operations (with default [JTS](https://github.com/locationtech/jts) or [ESRI](https://github.com/Esri/geometry-api-java))
+    * Indexing (with default [H3](https://github.com/uber/h3) or BNG)
     * Chipping of polygons and lines over an indexing grid [co-developed with Ordnance Survey and Microsoft](https://databricks.com/blog/2021/10/11/efficient-point-in-polygon-joins-via-pyspark-and-bng-geospatial-indexing.html)
   * Data visualization ([Kepler](https://github.com/keplergl/kepler.gl))
@@ -81,9 +81,9 @@ Then enable it with
 ```scala
 import com.databricks.labs.mosaic.functions.MosaicContext
 import com.databricks.labs.mosaic.H3
-import com.databricks.labs.mosaic.ESRI
+import com.databricks.labs.mosaic.JTS
 
-val mosaicContext = MosaicContext.build(H3, ESRI)
+val mosaicContext = MosaicContext.build(H3, JTS)
 import mosaicContext.functions._
 ```
@@ -109,9 +109,9 @@ Configure the [Automatic SQL Registration](https://databrickslabs.github.io/mosa
 %scala
 import com.databricks.labs.mosaic.functions.MosaicContext
 import com.databricks.labs.mosaic.H3
-import com.databricks.labs.mosaic.ESRI
+import com.databricks.labs.mosaic.JTS
 
-val mosaicContext = MosaicContext.build(H3, ESRI)
+val mosaicContext = MosaicContext.build(H3, JTS)
 mosaicContext.register(spark)
 ```

From 32f307820f3c3942fcbf13461dfdc932afbe1480 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 16 Jun 2023 09:22:47 -0400
Subject: [PATCH 06/28] function note JTS as default geometry

---
 python/mosaic/api/enable.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/mosaic/api/enable.py b/python/mosaic/api/enable.py
index 5d8813cbb..29d5fdf24 100644
--- a/python/mosaic/api/enable.py
+++ b/python/mosaic/api/enable.py
@@ -37,7 +37,7 @@ def enable_mosaic(spark: SparkSession, dbutils=None) -> None:
     - `spark.databricks.labs.mosaic.jar.location`
        Explicitly specify the path to the Mosaic JAR.
        (Optional and not required at all in a standard Databricks environment).
-    - `spark.databricks.labs.mosaic.geometry.api`: 'ESRI' (default) or 'JTS'
+    - `spark.databricks.labs.mosaic.geometry.api`: 'JTS' (default) or 'ESRI'
        Explicitly specify the underlying geometry library to use for spatial
        operations. (Optional)
     - `spark.databricks.labs.mosaic.index.system`: 'H3' (default)
        Explicitly specify the index system to use for optimized spatial joins. (Optional)

From c42f26349fac8de51aeb80313e6519c323837d7d Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 16 Jun 2023 14:03:51 +0000
Subject: [PATCH 07/28] More adjustments to JTS default

---
 R/sparkR-mosaic/enableMosaic.R                            | 8 ++++----
 R/sparklyr-mosaic/enableMosaic.R                          | 8 ++++----
 docs/code-example-notebooks/setup/setup-scala.scala       | 4 ++--
 docs/source/api/spatial-functions.rst                     | 2 +-
 docs/source/models/spatial-knn.rst                        | 8 ++++----
 docs/source/usage/automatic-sql-registration.rst          | 2 +-
 docs/source/usage/grid-indexes-bng.rst                    | 6 +++---
 docs/source/usage/installation.rst                        | 8 ++++----
 .../examples/python/Ship2ShipTransfers/01. Data Prep.py   | 2 +-
 .../python/Ship2ShipTransfers/02. Data Ingestion.py       | 2 +-
 .../python/Ship2ShipTransfers/03.a Overlap Detection.py   | 2 +-
 .../Ship2ShipTransfers/03.b Advanced Overlap Detection.py | 2 +-
 notebooks/examples/scala/MosaicAndSedona.scala            | 4 ++--
 notebooks/examples/scala/QuickstartNotebook.scala         | 4 ++--
 notebooks/examples/sql/MosaicAndSedona.sql                | 4 ++--
 python/mosaic/api/functions.py                            | 2 +-
 python/mosaic/core/mosaic_context.py                      | 2 +-
 .../labs/mosaic/sql/extensions/MosaicSQLDefault.scala     | 6 +++---
 .../labs/mosaic/codegen/ConvertToCodegenMockTest.scala    | 2 +-
 19 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/R/sparkR-mosaic/enableMosaic.R b/R/sparkR-mosaic/enableMosaic.R
index 0239de4b4..40a5c7b32 100644
--- a/R/sparkR-mosaic/enableMosaic.R
+++ b/R/sparkR-mosaic/enableMosaic.R
@@ -2,7 +2,7 @@
 #'
 #' @description enableMosaic activates the context dependent Databricks Mosaic functions, giving control over the geometry API and index system used.
 #' See \url{https://databrickslabs.github.io/mosaic/} for full documentation
-#' @param geometryAPI character, default="ESRI"
+#' @param geometryAPI character, default="JTS"
 #' @param indexSystem character, default="H3"
 #' @param indexSystem boolean, default=F
 #' @name enableMosaic
@@ -12,10 +12,10 @@
 #' @examples
 #' \dontrun{
 #' enableMosaic()
-#' enableMosaic("ESRI", "H3")
-#' enableMosaic("ESRI", "BNG") }
+#' enableMosaic("JTS", "H3")
+#' enableMosaic("JTS", "BNG") }
 enableMosaic <- function(
-  geometryAPI="ESRI"
+  geometryAPI="JTS"
   ,indexSystem="H3"
   ,rasterAPI="GDAL"
 ){
diff --git a/R/sparklyr-mosaic/enableMosaic.R b/R/sparklyr-mosaic/enableMosaic.R
index 788a3a2e4..c0baedecd 100644
--- a/R/sparklyr-mosaic/enableMosaic.R
+++ b/R/sparklyr-mosaic/enableMosaic.R
@@ -3,7 +3,7 @@
 #' @description enableMosaic activates the context dependent Databricks Mosaic functions, giving control over the geometry API and index system used.
 #' See \url{https://databrickslabs.github.io/mosaic/} for full documentation
 #' @param sc sparkContext
-#' @param geometryAPI character, default="ESRI"
+#' @param geometryAPI character, default="JTS"
 #' @param indexSystem character, default="H3"
 #' @name enableMosaic
 #' @rdname enableMosaic
@@ -12,12 +12,12 @@
 #' @examples
 #' \dontrun{
 #' enableMosaic()
-#' enableMosaic("ESRI", "H3")
-#' enableMosaic("ESRI", "BNG")}
+#' enableMosaic("JTS", "H3")
+#' enableMosaic("JTS", "BNG")}
 enableMosaic <- function(
   sc
-  ,geometryAPI="ESRI"
+  ,geometryAPI="JTS"
   ,indexSystem="H3"
   ,rasterAPI="GDAL"
 ){
diff --git a/docs/code-example-notebooks/setup/setup-scala.scala b/docs/code-example-notebooks/setup/setup-scala.scala
index 749873cb3..626815222 100644
--- a/docs/code-example-notebooks/setup/setup-scala.scala
+++ b/docs/code-example-notebooks/setup/setup-scala.scala
@@ -1,10 +1,10 @@
 // Databricks notebook source
 import org.apache.spark.sql.functions._
 import com.databricks.labs.mosaic.functions.MosaicContext
-import com.databricks.labs.mosaic.ESRI
+import com.databricks.labs.mosaic.JTS
 import com.databricks.labs.mosaic.H3
 
-val mosaicContext: MosaicContext = MosaicContext.build(H3, ESRI)
+val mosaicContext: MosaicContext = MosaicContext.build(H3, JTS)
 
 // COMMAND ----------
diff --git a/docs/source/api/spatial-functions.rst b/docs/source/api/spatial-functions.rst
index 0aaad1407..c4ace6c19 100644
--- a/docs/source/api/spatial-functions.rst
+++ b/docs/source/api/spatial-functions.rst
@@ -949,7 +949,7 @@ st_isvalid
     +---------------+
 
 .. note:: Validity assertions will be dependent on the chosen geometry API.
-    The assertions used in the ESRI geometry API (the default) follow the definitions in the
+    The assertions used in the ESRI geometry API (JTS is the default) follow the definitions in the
     "Simple feature access - Part 1" document (OGC 06-103r4) for each geometry type.
diff --git a/docs/source/models/spatial-knn.rst b/docs/source/models/spatial-knn.rst
index e108c3b0b..0423d9022 100644
--- a/docs/source/models/spatial-knn.rst
+++ b/docs/source/models/spatial-knn.rst
@@ -157,9 +157,9 @@ The transformer is called SpatialKNN and it is used as follows:
     import com.databricks.labs.mosaic.models.knn.SpatialKNN
     import com.databricks.labs.mosaic.functions.MosaicContext
     import com.databricks.labs.mosaic.H3
-    import com.databricks.labs.mosaic.ESRI
+    import com.databricks.labs.mosaic.JTS
     >>>
-    val mosaicContext = MosaicContext.build(H3, ESRI)
+    val mosaicContext = MosaicContext.build(H3, JTS)
     import mosaicContext.functions._
     mosaicContext.register(spark)
     >>>
@@ -328,9 +328,9 @@ These datasets are not serialised with the model, and neither are the model outp
     import com.databricks.labs.mosaic.models.knn.SpatialKNN
     import com.databricks.labs.mosaic.functions.MosaicContext
     import com.databricks.labs.mosaic.H3
-    import com.databricks.labs.mosaic.ESRI
+    import com.databricks.labs.mosaic.JTS
     >>>
-    val mosaicContext = MosaicContext.build(H3, ESRI)
+    val mosaicContext = MosaicContext.build(H3, JTS)
     import mosaicContext.functions._
     mosaicContext.register(spark)
     >>>
diff --git a/docs/source/usage/automatic-sql-registration.rst b/docs/source/usage/automatic-sql-registration.rst
index 2d353998b..4013aea71 100644
--- a/docs/source/usage/automatic-sql-registration.rst
+++ b/docs/source/usage/automatic-sql-registration.rst
@@ -60,7 +60,7 @@ To install Mosaic on your Databricks cluster, take the following steps:
     spark.databricks.labs.mosaic.index.system H3
     # JTS or ESRI
     spark.databricks.labs.mosaic.geometry.api JTS
-    # MosaicSQL or MosaicSQLDefault, MosaicSQLDefault corresponds to (H3, ESRI)
+    # MosaicSQL or MosaicSQLDefault, MosaicSQLDefault corresponds to (H3, JTS)
     spark.sql.extensions com.databricks.labs.mosaic.sql.extensions.MosaicSQL
 
 Testing
diff --git a/docs/source/usage/grid-indexes-bng.rst b/docs/source/usage/grid-indexes-bng.rst
index 52879dd18..874ae9df5 100644
--- a/docs/source/usage/grid-indexes-bng.rst
+++ b/docs/source/usage/grid-indexes-bng.rst
@@ -34,15 +34,15 @@ configurations. Spark provides an easy way to supply configuration parameters us
     .. code-tab:: scala
 
         import com.databricks.labs.mosaic.functions.MosaicContext
-        import com.databricks.labs.mosaic.{BNG, ESRI}
+        import com.databricks.labs.mosaic.{BNG, JTS}
 
-        val mosaicContext = MosaicContext.build(BNG, ESRI)
+        val mosaicContext = MosaicContext.build(BNG, JTS)
         import mosaicContext.functions._
 
     .. code-tab:: r R
 
         library(sparkrMosaic)
-        enableMosaic("ESRI", "BNG")
+        enableMosaic("JTS", "BNG")
 
     .. code-tab:: sql
diff --git a/docs/source/usage/installation.rst b/docs/source/usage/installation.rst
index ad5c832ac..8fd432e68 100644
--- a/docs/source/usage/installation.rst
+++ b/docs/source/usage/installation.rst
@@ -70,9 +70,9 @@ The mechanism for enabling the Mosaic functions varies by language:
 
         import com.databricks.labs.mosaic.functions.MosaicContext
         import com.databricks.labs.mosaic.H3
-        import com.databricks.labs.mosaic.ESRI
+        import com.databricks.labs.mosaic.JTS
 
-        val mosaicContext = MosaicContext.build(H3, ESRI)
+        val mosaicContext = MosaicContext.build(H3, JTS)
         import mosaicContext.functions._
 
     .. code-tab:: r R
@@ -90,8 +90,8 @@ register the Mosaic SQL functions in your SparkSession from a Scala notebook cel
 
     import com.databricks.labs.mosaic.functions.MosaicContext
     import com.databricks.labs.mosaic.H3
-    import com.databricks.labs.mosaic.ESRI
+    import com.databricks.labs.mosaic.JTS
 
-    val mosaicContext = MosaicContext.build(H3, ESRI)
+    val mosaicContext = MosaicContext.build(H3, JTS)
     mosaicContext.register(spark)
diff --git a/notebooks/examples/python/Ship2ShipTransfers/01. Data Prep.py b/notebooks/examples/python/Ship2ShipTransfers/01. Data Prep.py
index 95e3b385d..a0378d68b 100644
--- a/notebooks/examples/python/Ship2ShipTransfers/01. Data Prep.py
+++ b/notebooks/examples/python/Ship2ShipTransfers/01. Data Prep.py
@@ -10,7 +10,7 @@
 from pyspark.sql.functions import *
 import mosaic as mos
 
-spark.conf.set("spark.databricks.labs.mosaic.geometry.api", "ESRI")
+spark.conf.set("spark.databricks.labs.mosaic.geometry.api", "JTS")
 spark.conf.set("spark.databricks.labs.mosaic.index.system", "H3")
 mos.enable_mosaic(spark, dbutils)
diff --git a/notebooks/examples/python/Ship2ShipTransfers/02. Data Ingestion.py b/notebooks/examples/python/Ship2ShipTransfers/02. Data Ingestion.py
index 4f02bdc85..5f4f3aff7 100644
--- a/notebooks/examples/python/Ship2ShipTransfers/02. Data Ingestion.py
+++ b/notebooks/examples/python/Ship2ShipTransfers/02. Data Ingestion.py
@@ -11,7 +11,7 @@
 from pyspark.sql.functions import *
 import mosaic as mos
 
-spark.conf.set("spark.databricks.labs.mosaic.geometry.api", "ESRI")
+spark.conf.set("spark.databricks.labs.mosaic.geometry.api", "JTS")
 spark.conf.set("spark.databricks.labs.mosaic.index.system", "H3")
 mos.enable_mosaic(spark, dbutils)
diff --git a/notebooks/examples/python/Ship2ShipTransfers/03.a Overlap Detection.py b/notebooks/examples/python/Ship2ShipTransfers/03.a Overlap Detection.py
index 36726f75d..02864978a 100644
--- a/notebooks/examples/python/Ship2ShipTransfers/03.a Overlap Detection.py
+++ b/notebooks/examples/python/Ship2ShipTransfers/03.a Overlap Detection.py
@@ -12,7 +12,7 @@
 from pyspark.sql.functions import *
 import mosaic as mos
 
-spark.conf.set("spark.databricks.labs.mosaic.geometry.api", "ESRI")
+spark.conf.set("spark.databricks.labs.mosaic.geometry.api", "JTS")
 spark.conf.set("spark.databricks.labs.mosaic.index.system", "H3")
 mos.enable_mosaic(spark, dbutils)
diff --git a/notebooks/examples/python/Ship2ShipTransfers/03.b Advanced Overlap Detection.py b/notebooks/examples/python/Ship2ShipTransfers/03.b Advanced Overlap Detection.py
index b08a10709..d867ab3c0 100644
--- a/notebooks/examples/python/Ship2ShipTransfers/03.b Advanced Overlap Detection.py
+++ b/notebooks/examples/python/Ship2ShipTransfers/03.b Advanced Overlap Detection.py
@@ -12,7 +12,7 @@
 from pyspark.sql.functions import *
 import mosaic as mos
 
-spark.conf.set("spark.databricks.labs.mosaic.geometry.api", "ESRI")
+spark.conf.set("spark.databricks.labs.mosaic.geometry.api", "JTS")
 spark.conf.set("spark.databricks.labs.mosaic.index.system", "H3")
 mos.enable_mosaic(spark, dbutils)
diff --git a/notebooks/examples/scala/MosaicAndSedona.scala b/notebooks/examples/scala/MosaicAndSedona.scala
index af72a9b92..532b96fa5 100644
--- a/notebooks/examples/scala/MosaicAndSedona.scala
+++ b/notebooks/examples/scala/MosaicAndSedona.scala
@@ -23,9 +23,9 @@ SedonaSQLRegistrator.registerAll(spark)
 // Import Mosaic functions
 import com.databricks.labs.mosaic.functions.MosaicContext
 import com.databricks.labs.mosaic.H3
-import com.databricks.labs.mosaic.ESRI
+import com.databricks.labs.mosaic.JTS
 
-val mosaicContext = MosaicContext.build(H3, ESRI)
+val mosaicContext = MosaicContext.build(H3, JTS)
 import mosaicContext.functions._
 import org.apache.spark.sql.functions._
diff --git a/notebooks/examples/scala/QuickstartNotebook.scala b/notebooks/examples/scala/QuickstartNotebook.scala
index d11a6c4d8..e1b1db134 100644
--- a/notebooks/examples/scala/QuickstartNotebook.scala
+++ b/notebooks/examples/scala/QuickstartNotebook.scala
@@ -24,9 +24,9 @@ print(s"The raw data is stored in $raw_path")
 import com.databricks.labs.mosaic.functions.MosaicContext
 import com.databricks.labs.mosaic.H3
-import com.databricks.labs.mosaic.ESRI
+import com.databricks.labs.mosaic.JTS
 
-val mosaicContext = MosaicContext.build(H3, ESRI)
+val mosaicContext = MosaicContext.build(H3, JTS)
 import mosaicContext.functions._
 import org.apache.spark.sql.functions._
diff --git a/notebooks/examples/sql/MosaicAndSedona.sql b/notebooks/examples/sql/MosaicAndSedona.sql
index 71fc378b2..40b33167b 100644
--- a/notebooks/examples/sql/MosaicAndSedona.sql
+++ b/notebooks/examples/sql/MosaicAndSedona.sql
@@ -34,9 +34,9 @@
 -- MAGIC // Import Mosaic functions
 -- MAGIC import com.databricks.labs.mosaic.functions.MosaicContext
 -- MAGIC import com.databricks.labs.mosaic.H3
--- MAGIC import com.databricks.labs.mosaic.ESRI
+-- MAGIC import com.databricks.labs.mosaic.JTS
 -- MAGIC
--- MAGIC val mosaicContext = MosaicContext.build(H3, ESRI)
+-- MAGIC val mosaicContext = MosaicContext.build(H3, JTS)
 -- MAGIC import mosaicContext.functions._
 -- MAGIC import org.apache.spark.sql.functions._
diff --git a/python/mosaic/api/functions.py b/python/mosaic/api/functions.py
index 202d7c515..cc659d826 100644
--- a/python/mosaic/api/functions.py
+++ b/python/mosaic/api/functions.py
@@ -487,7 +487,7 @@ def st_isvalid(geom: ColumnOrName) -> Column:
     Notes
     -----
     Validity assertions will be dependent on the chosen geometry API.
-    The assertions used in the ESRI geometry API (the default) follow the definitions in
+    The assertions used in the ESRI geometry API (JTS is the default) follow the definitions in
     the “Simple feature access - Part 1” document (OGC 06-103r4) for each geometry
     type.
 
     """
diff --git a/python/mosaic/core/mosaic_context.py b/python/mosaic/core/mosaic_context.py
index e085993d3..10e321615 100644
--- a/python/mosaic/core/mosaic_context.py
+++ b/python/mosaic/core/mosaic_context.py
@@ -31,7 +31,7 @@ def __init__(self, spark: SparkSession):
                 "spark.databricks.labs.mosaic.geometry.api"
             )
         except Py4JJavaError as e:
-            self._geometry_api = "ESRI"
+            self._geometry_api = "JTS"
 
         try:
             self._index_system = spark.conf.get(
diff --git a/src/main/scala/com/databricks/labs/mosaic/sql/extensions/MosaicSQLDefault.scala b/src/main/scala/com/databricks/labs/mosaic/sql/extensions/MosaicSQLDefault.scala
index 583a8c4f6..d888a9d22 100644
--- a/src/main/scala/com/databricks/labs/mosaic/sql/extensions/MosaicSQLDefault.scala
+++ b/src/main/scala/com/databricks/labs/mosaic/sql/extensions/MosaicSQLDefault.scala
@@ -1,6 +1,6 @@
 package com.databricks.labs.mosaic.sql.extensions
 
-import com.databricks.labs.mosaic.core.geometry.api.ESRI
+import com.databricks.labs.mosaic.core.geometry.api.JTS
 import com.databricks.labs.mosaic.core.index.H3IndexSystem
 import com.databricks.labs.mosaic.core.raster.api.RasterAPI.GDAL
 import com.databricks.labs.mosaic.functions.MosaicContext
@@ -24,8 +24,8 @@ class MosaicSQLDefault extends (SparkSessionExtensions => Unit) with Logging {
       */
     override def apply(ext: SparkSessionExtensions): Unit = {
         ext.injectCheckRule(spark => {
-            val mosaicContext = MosaicContext.build(H3IndexSystem, ESRI, GDAL)
-            logInfo(s"Registering Mosaic SQL Extensions (H3, ESRI, GDAL).")
+            val mosaicContext = MosaicContext.build(H3IndexSystem, JTS, GDAL)
+            logInfo(s"Registering Mosaic SQL Extensions (H3, JTS, GDAL).")
             mosaicContext.register(spark)
             // NOP rule. This rule is specified only to respect syntax.
             _ => ()
diff --git a/src/test/scala/com/databricks/labs/mosaic/codegen/ConvertToCodegenMockTest.scala b/src/test/scala/com/databricks/labs/mosaic/codegen/ConvertToCodegenMockTest.scala
index 265695395..de9df4070 100644
--- a/src/test/scala/com/databricks/labs/mosaic/codegen/ConvertToCodegenMockTest.scala
+++ b/src/test/scala/com/databricks/labs/mosaic/codegen/ConvertToCodegenMockTest.scala
@@ -11,7 +11,7 @@ class ConvertToCodegenMockTest extends AnyFunSuite with MockFactory {
     test("ConvertTo Expression from GEOJSON to Unsupported format should throw an exception") {
         val ctx = stub[CodegenContext]
         val api = stub[GeometryAPI]
-        api.name _ when () returns "ESRI"
+        api.name _ when () returns "JTS"
 
         assertThrows[Error] {
             ConvertToCodeGen writeGeometryCode (ctx, "", "unsupported", api)

From 42754927b34cd3cda89d428524030a1c8cf5d31c Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 16 Jun 2023 10:21:18 -0400
Subject: [PATCH 08/28] Update CHANGELOG.md

JTS default
---
 CHANGELOG.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4af8f6839..b41bac205 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,6 @@
+## v0.3.12
+- Make JTS default Geometry Provider
+
 ## v0.3.11
 - Update the CONTRIBUTING.md to follow the standard process.
 - Fix for issue 383: grid_pointascellid fails with a Java type error when run on an already instantiated point.
@@ -172,4 +175,4 @@
 - Add Geometry validity expressions
 - Create WKT, WKB and Hex conversion expressions
 - Setup the project
-- Define GitHub templates
\ No newline at end of file
+- Define GitHub templates

From 6f0c96f53a10284d6e25ae52cdfcbc0421fff222 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 16 Jun 2023 10:39:53 -0400
Subject: [PATCH 09/28] Update installation.rst

---
 docs/source/usage/installation.rst | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/docs/source/usage/installation.rst b/docs/source/usage/installation.rst
index 8fd432e68..b307088f4 100644
--- a/docs/source/usage/installation.rst
+++ b/docs/source/usage/installation.rst
@@ -4,15 +4,26 @@ Installation guide
 Supported platforms
 ###################
 
-In order to use Mosaic, you must have access to a Databricks cluster running
-Databricks Runtime 10.0 or higher (11.2 with photon or later is recommended).
 .. warning::
     From version 0.4.x, Mosaic will require either
     * Databricks Runtime 11.2+ with Photon enabled
     * Databricks Runtime for ML 11.2+
+
+    Mosaic 0.3.x series does not support DBR 13.x (coming soon with Mosaic 0.4.x series);
+    also, DBR 10 is no longer supported in Mosaic.
 
-    Other Databricks Runtime versions will not be supported anymore.
+We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled;
+this will leverage the Databricks h3 expressions when using H3 grid system.
+As of the 0.3.11 release, Mosaic issues the following warning when initialized on a cluster
+that is neither Photon Runtime nor Databricks Runtime ML [`ADB `__ | `AWS `__ | `GCP `__]:
+
+    DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster from version v0.4.0+. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).
+
+If you are receiving this warning in v0.3.11+, you will want to change to a supported runtime prior
+to updating Mosaic to run 0.4.0. The reason we are making this change is that we are streamlining Mosaic
+internals to be more aligned with future product APIs which are powered by Photon. Along this direction
+of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.
 
 If you have cluster creation permissions in your Databricks workspace,
 you can create a cluster using the instructions

From bfa9edacf0af5e4fce4acc9d0b9c7db7c02810b0 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Wed, 21 Jun 2023 07:30:39 -0400
Subject: [PATCH 10/28] Update index.rst

Added DBR support
---
 docs/source/index.rst | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index d35c51165..fe1ec92c9 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -42,11 +42,15 @@ Mosaic is an extension to the `Apache Spark `_ framework that
 allows easy and fast processing of very large geospatial datasets.
 
 .. warning::
-    From version 0.4.x, Mosaic will require either
+    From version 0.4.0, Mosaic will require either
     * Databricks Runtime 11.2+ with Photon enabled
     * Databricks Runtime for ML 11.2+
+
+    Mosaic 0.3 series does not support DBR 13 (coming soon with Mosaic 0.4 series);
+    also, DBR 10 is no longer supported in Mosaic.
 
-    Other Databricks Runtime versions will not be supported anymore.
+We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled;
+this will leverage the Databricks H3 expressions when using H3 grid system.
 
 Mosaic provides:
    * easy conversion between common spatial data encodings (WKT, WKB and GeoJSON);

From bae51f8d11078d91f5cf1e68334be4f6da1cd3c1 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Wed, 21 Jun 2023 07:32:08 -0400
Subject: [PATCH 11/28] Update installation.rst

DBR support messaging
---
 docs/source/usage/installation.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/usage/installation.rst b/docs/source/usage/installation.rst
index b307088f4..a4b97952b 100644
--- a/docs/source/usage/installation.rst
+++ b/docs/source/usage/installation.rst
@@ -6,15 +6,15 @@ Supported platforms
 ###################
 
 .. warning::
-    From version 0.4.x, Mosaic will require either
+    From version 0.4.0, Mosaic will require either
     * Databricks Runtime 11.2+ with Photon enabled
     * Databricks Runtime for ML 11.2+
 
-    Mosaic 0.3.x series does not support DBR 13.x (coming soon with Mosaic 0.4.x series);
+    Mosaic 0.3 series does not support DBR 13 (coming soon with Mosaic 0.4 series);
     also, DBR 10 is no longer supported in Mosaic.
 
 We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled;
-this will leverage the Databricks h3 expressions when using H3 grid system.
+this will leverage the Databricks H3 expressions when using H3 grid system.
 
 As of the 0.3.11 release, Mosaic issues the following warning when initialized on a cluster
 that is neither Photon Runtime nor Databricks Runtime ML [`ADB `__ | `AWS `__ | `GCP `__]:

From 8c190481a33c6152a12c524cd0fe3435a967fac6 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Wed, 21 Jun 2023 07:33:38 -0400
Subject: [PATCH 12/28] Update README.md

DBR support message
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 307a2fae6..052ebeb38 100644
--- a/README.md
+++ b/README.md
@@ -43,9 +43,9 @@ Image1: Mosaic logical design.
 ## Getting started
 
 We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled; this will leverage the
-Databricks h3 expressions when using H3 grid system.
+Databricks H3 expressions when using H3 grid system.
 
-:warning: **Mosaic 0.3.x series does not support DBR 13.x** (coming soon with Mosaic 0.4.x series); also, DBR 10 is no longer supported in Mosaic.
+:warning: **Mosaic 0.3 series does not support DBR 13** (coming soon with Mosaic 0.4 series); also, DBR 10 is no longer supported in Mosaic.
 
 As of the 0.3.11 release, Mosaic issues the following warning when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:

From b96f96c4023501b2e30742b64a2fa0849bef224c Mon Sep 17 00:00:00 2001
From: Erni Durdevic
Date: Mon, 26 Jun 2023 06:33:47 -0700
Subject: [PATCH 13/28] Update LICENSE

This updates the license to allow for modifications to external contributions
---
 LICENSE | 62 +++++++++++++++++++++------------------------------------
 1 file changed, 23 insertions(+), 39 deletions(-)

diff --git a/LICENSE b/LICENSE
index 95fecd816..21db58bb9 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,41 +1,25 @@
+DB license
+
 Copyright (2022) Databricks, Inc.
 
-This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant
-to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). The Object Code version of the
-Software shall be deemed part of the Downloadable Services under the Agreement, or if the Agreement does not define Downloadable Services,
-Subscription Services, or if neither are defined then the term in such Agreement that refers to the applicable Databricks Platform
-Services (as defined below) shall be substituted herein for “Downloadable Services.” Licensee's use of the Software must comply at
-all times with any restrictions applicable to the Downlodable Services and Subscription Services, generally, and must be used in
-accordance with any applicable documentation. For the avoidance of doubt, the Software constitutes Databricks Confidential Information
-under the Agreement.
-
-Additionally, and notwithstanding anything in the Agreement to the contrary:
-* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-  OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
-  IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-* you may view, make limited copies of, and may compile the Source Code version of the Software into an Object Code version of the
-  Software. For the avoidance of doubt, you may not make derivative works of Software (or make any any changes to the Source Code
-  version of the unless you have agreed to separate terms with Databricks permitting such modifications (e.g., a contribution license
-  agreement)).
-
-If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software or view, copy or compile
-the Source Code of the Software.
-
-This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms. Additionally,
-Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all
-copies thereof (including the Source Code).
-
-Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with
-respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks
-Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee
-has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services.
-
-Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.
-
-Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.
-
-Object Code: is version of the Software produced when an interpreter or a compiler translates the Source Code into recognizable and
-executable machine code.
-
-Source Code: the human readable portion of the Software.
\ No newline at end of file
+Definitions.
+
+Agreement: The agreement between Databricks, Inc., and you governing the use of the Databricks Services, which shall be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless you have entered into a separate written agreement with Databricks governing the use of the applicable Databricks Services.
+
+Software: The source code and object code to which this license applies.
+
+Scope of Use. You may not use this Software except in connection with your use of the Databricks Services pursuant to the Agreement.
Your use of the Software must comply at all times with any restrictions applicable to the Databricks Services, generally, and must be used in accordance with any applicable documentation. You may view, use, copy, modify, publish, and/or distribute the Software solely for the purposes of using the code within or connecting to the Databricks Services. If you do not agree to these terms, you may not view, use, copy, modify, publish, and/or distribute the Software. + +Redistribution. You may redistribute and sublicense the Software so long as all use is in compliance with these terms. In addition: + +You must give any other recipients a copy of this License; +You must cause any modified files to carry prominent notices stating that you changed the files; +You must retain, in the source code form of any derivative works that you distribute, all copyright, patent, trademark, and attribution notices from the source code form, excluding those notices that do not pertain to any part of the derivative works; and +If the source code form includes a "NOTICE" text file as part of its distribution, then any derivative works that you distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the derivative works. +You may add your own copyright statement to your modifications and may provide additional license terms and conditions for use, reproduction, or distribution of your modifications, or for any such derivative works as a whole, provided your use, reproduction, and distribution of the Software otherwise complies with the conditions stated in this License. + +Termination. This license terminates automatically upon your breach of these terms or upon the termination of your Agreement. Additionally, Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all copies thereof. 
+ +DISCLAIMER; LIMITATION OF LIABILITY. + +THE SOFTWARE IS PROVIDED “AS-IS” AND WITH ALL FAULTS. DATABRICKS, ON BEHALF OF ITSELF AND ITS LICENSORS, SPECIFICALLY DISCLAIMS ALL WARRANTIES RELATING TO THE SOURCE CODE, EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, IMPLIED WARRANTIES, CONDITIONS AND OTHER TERMS OF MERCHANTABILITY, SATISFACTORY QUALITY OR FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. DATABRICKS AND ITS LICENSORS TOTAL AGGREGATE LIABILITY RELATING TO OR ARISING OUT OF YOUR USE OF OR DATABRICKS’ PROVISIONING OF THE SOURCE CODE SHALL BE LIMITED TO ONE THOUSAND ($1,000) DOLLARS. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. From fcdc8a99f9f0e2efbea4ce517fc72d327da86540 Mon Sep 17 00:00:00 2001 From: "milos.colic" Date: Fri, 30 Jun 2023 18:41:48 +0100 Subject: [PATCH 14/28] Fix coercion of cell geometries. 
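The regression test added in this patch for issue 382 (further down in the diff) validates the coercion fix by tessellating a polygon into chips, unioning the chips back together, and asserting that the area matches the original to within 1e-8. A minimal planar sketch of that area-reconciliation idea, using the shoelace formula as an illustrative stand-in for `MosaicGeometry.getArea` (the real test runs on H3 chips through Spark):

```python
def ring_area(coords):
    # Shoelace formula over a closed ring; planar stand-in for getArea.
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Split the unit square into two "chips" and check that their areas
# reconcile with the original polygon, mirroring the issue-382 assertion.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
left = [(0, 0), (0.5, 0), (0.5, 1), (0, 1), (0, 0)]
right = [(0.5, 0), (1, 0), (1, 1), (0.5, 1), (0.5, 0)]

assert abs(ring_area(square) - (ring_area(left) + ring_area(right))) < 1e-8
```

If the coercion dropped or duplicated boundary slivers, this kind of reconciliation check is exactly where the discrepancy would surface.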
--- .../labs/mosaic/core/index/IndexSystem.scala | 9 ++- .../index/MosaicExplodeBehaviors.scala | 79 ++++++++++++++----- .../expressions/index/MosaicExplodeTest.scala | 1 + 3 files changed, 67 insertions(+), 22 deletions(-) diff --git a/src/main/scala/com/databricks/labs/mosaic/core/index/IndexSystem.scala b/src/main/scala/com/databricks/labs/mosaic/core/index/IndexSystem.scala index 4cac86e49..0144760a0 100644 --- a/src/main/scala/com/databricks/labs/mosaic/core/index/IndexSystem.scala +++ b/src/main/scala/com/databricks/labs/mosaic/core/index/IndexSystem.scala @@ -160,7 +160,7 @@ abstract class IndexSystem(var cellIdType: DataType) extends Serializable { val intersections = for (index <- borderIndices) yield { val indexGeom = indexToGeometry(index, geometryAPI) val intersect = geometry.intersection(indexGeom) - val coerced = coerceChipGeometry(intersect, index, geometryAPI) + val coerced = coerceChipGeometry(intersect, indexGeom, geometry) val isCore = coerced.equals(indexGeom) val chipGeom = if (!isCore || keepCoreGeom) coerced else null @@ -276,12 +276,13 @@ abstract class IndexSystem(var cellIdType: DataType) extends Serializable { def area(index: String): Double = area(parse(index)) - def coerceChipGeometry(geom: MosaicGeometry, cell: Long, geometryAPI: GeometryAPI): MosaicGeometry = { + def coerceChipGeometry(geom: MosaicGeometry, indexGeom: MosaicGeometry, originGeom: MosaicGeometry): MosaicGeometry = { val geomType = GeometryTypeEnum.fromString(geom.getGeometryType) - if (geomType == GEOMETRYCOLLECTION) { + val originGeomType = GeometryTypeEnum.fromString(originGeom.getGeometryType) + if (geomType == GEOMETRYCOLLECTION || geomType != originGeomType) { // This case can occur if partial geometry is a geometry collection // or if the intersection includes a part of the boundary of the cell - geom.difference(indexToGeometry(cell, geometryAPI).getBoundary) + geom.difference(indexGeom.getBoundary) } else { geom } diff --git 
a/src/test/scala/com/databricks/labs/mosaic/expressions/index/MosaicExplodeBehaviors.scala b/src/test/scala/com/databricks/labs/mosaic/expressions/index/MosaicExplodeBehaviors.scala index 4d0166af1..dd5b71718 100644 --- a/src/test/scala/com/databricks/labs/mosaic/expressions/index/MosaicExplodeBehaviors.scala +++ b/src/test/scala/com/databricks/labs/mosaic/expressions/index/MosaicExplodeBehaviors.scala @@ -21,9 +21,9 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { mc.register(spark) val resolution = mc.getIndexSystem match { - case H3IndexSystem => 3 + case H3IndexSystem => 3 case BNGIndexSystem => 5 - case _ => 3 + case _ => 3 } val boroughs: DataFrame = getBoroughs(mc) @@ -56,7 +56,7 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { val resolution = mc.getIndexSystem match { case H3IndexSystem => 3 case BNGIndexSystem => 5 - case _ => 3 + case _ => 3 } val rdd = spark.sparkContext.makeRDD( @@ -112,7 +112,7 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { val resolution = mc.getIndexSystem match { case H3IndexSystem => 3 case BNGIndexSystem => 5 - case _ => 3 + case _ => 3 } val rdd = spark.sparkContext.makeRDD( @@ -149,7 +149,7 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { val resolution = mc.getIndexSystem match { case H3IndexSystem => 3 case BNGIndexSystem => 3 - case _ => 3 + case _ => 3 } val wktRows: DataFrame = getWKTRowsDf(mc.getIndexSystem).where(col("wkt").contains("LINESTRING")) @@ -202,7 +202,7 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { ) val res = noEmptyChips.collect() res.length should be > 0 - case _ => // do nothing + case _ => // do nothing } } @@ -216,7 +216,7 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { val resolution = mc.getIndexSystem match { case H3IndexSystem => 3 case BNGIndexSystem => 5 - case _ => 3 + case _ => 3 } val boroughs: DataFrame = getBoroughs(mc) @@ -249,7 +249,7 @@ trait MosaicExplodeBehaviors extends 
MosaicSpatialQueryTest { val resolution = mc.getIndexSystem match { case H3IndexSystem => 3 case BNGIndexSystem => 5 - case _ => 3 + case _ => 3 } val boroughs: DataFrame = getBoroughs(mc) @@ -282,7 +282,7 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { val resolution = mc.getIndexSystem match { case H3IndexSystem => 3 case BNGIndexSystem => 5 - case _ => 3 + case _ => 3 } val boroughs: DataFrame = getBoroughs(mc) @@ -313,7 +313,7 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { noException should be thrownBy funcs.grid_tessellateexplode(col("wkt"), 3, keepCoreGeometries = true) noException should be thrownBy funcs.grid_tessellateexplode(col("wkt"), 3, lit(false)) noException should be thrownBy funcs.grid_tessellateexplode(col("wkt"), lit(3), lit(false)) - //legacy APIs + // legacy APIs noException should be thrownBy funcs.mosaic_explode(col("wkt"), 3) noException should be thrownBy funcs.mosaic_explode(col("wkt"), lit(3)) noException should be thrownBy funcs.mosaic_explode(col("wkt"), 3, keepCoreGeometries = true) @@ -332,7 +332,7 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { val resExpr = mc.getIndexSystem match { case H3IndexSystem => lit(mc.getIndexSystem.resolutions.head).expr case BNGIndexSystem => lit("100m").expr - case _ => lit(3).expr + case _ => lit(3).expr } val mosaicExplodeExpr = MosaicExplode( @@ -390,14 +390,14 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { mc.getIndexSystem match { case H3IndexSystem => val rdd = spark.sparkContext.makeRDD( - Seq( - Row("LINESTRING (-85.0040681 42.2975028, -85.0073029 42.2975266)") - ) + Seq( + Row("LINESTRING (-85.0040681 42.2975028, -85.0073029 42.2975266)") + ) ) val schema = StructType( - List( - StructField("wkt", StringType) - ) + List( + StructField("wkt", StringType) + ) ) val df = spark.createDataFrame(rdd, schema) @@ -408,7 +408,50 @@ trait MosaicExplodeBehaviors extends MosaicSpatialQueryTest { 
df.select(expr(s"grid_tessellateexplode(wkt, 13, true)")) .collect() .length shouldEqual 48 - case _ => // do nothing + case _ => // do nothing } } + + def issue382(mosaicContext: MosaicContext): Unit = { + assume(mosaicContext.getIndexSystem == H3IndexSystem) + val sc = spark + import sc.implicits._ + import mosaicContext.functions._ + + val wkt = "POLYGON ((-8.522721910163417 53.40846416712235, -8.522828495418493 53.40871094834742," + + " -8.523239522405696 53.40879676331252, -8.52334611088906 53.409043543609435," + + " -8.523757142297253 53.409129356978674, -8.523863734008978 53.409376136347404," + + " -8.523559290871438 53.40953710231036, -8.523665882370468 53.40978388071435," + + " -8.523361436771772 53.40994484500841, -8.523468028058108 53.41019162244766," + + " -8.523163579998224 53.410352585072815, -8.52275254184475 53.41026676959102," + + " -8.52244809251643 53.41042772987954, -8.522037056535808 53.41034191209765," + + " -8.521732605939153 53.41050287004956, -8.52132157213149 53.41041704996761," + + " -8.521214991168797 53.410170272637956, -8.520803961782489 53.41008445096018," + + " -8.520697384048132 53.40983767270238, -8.520286359083132 53.40975184942885," + + " -8.520179784577046 53.409505070242936, -8.520484231594777 53.40934411429393," + + " -8.52037765687575 53.409097334143304, -8.520682101432444 53.40893637652535," + + " -8.520575526500501 53.408689595410024, -8.520879968596168 53.40852863612313," + + " -8.521290986816735 53.408614457283946, -8.521595427644318 53.408453495660524," + + " -8.522006448037782 53.40853931452139, -8.522310887597179 53.408378350561435," + + " -8.522721910163417 53.40846416712235))" + + val rdd = spark.sparkContext.makeRDD(Seq(Row(wkt))) + val schema = StructType(List(StructField("wkt", StringType))) + val df = spark.createDataFrame(rdd, schema) + + val result = df + .select(grid_tessellateexplode(col("wkt"), 11).alias("grid")) + .select(col("grid.wkb")) + .select(st_aswkt(col("wkb"))) + + val chips = 
result.as[String].collect() + val resultGeom = chips.map(mosaicContext.getGeometryAPI.geometry(_, "WKT")) + .reduce(_ union _) + + val expected = mosaicContext.getGeometryAPI.geometry(wkt, "WKT") + + math.abs(expected.getArea - resultGeom.getArea) should be < 1e-8 + + } + } diff --git a/src/test/scala/com/databricks/labs/mosaic/expressions/index/MosaicExplodeTest.scala b/src/test/scala/com/databricks/labs/mosaic/expressions/index/MosaicExplodeTest.scala index 67c3d0e0e..3ed35f825 100644 --- a/src/test/scala/com/databricks/labs/mosaic/expressions/index/MosaicExplodeTest.scala +++ b/src/test/scala/com/databricks/labs/mosaic/expressions/index/MosaicExplodeTest.scala @@ -16,5 +16,6 @@ class MosaicExplodeTest extends MosaicSpatialQueryTest with SharedSparkSession w testAllNoCodegen("MosaicExplode column function signatures") { columnFunctionSignatures } testAllNoCodegen("MosaicExplode auxiliary methods") { auxiliaryMethods } testAllNoCodegen("MosaicExplode Line cases identified by issue 360") { issue360 } + testAllNoCodegen("MosaicExplode Should properly handle polygons that are a union of cells") { issue382 } } From 5c86aeb59a845faf726760c3a844a27be6357bd1 Mon Sep 17 00:00:00 2001 From: Michael Johns Date: Sun, 23 Jul 2023 08:10:02 -0400 Subject: [PATCH 15/28] Update mosaic-gdal-3.4.3-filetree-init.sh Adding clarifications around DBRs supported to help customers avoid confusion with DBR 13 / Ubuntu 22.04. 
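The script change below is comment-only; it documents, but does not enforce, the DBR 11.x/12.x (Ubuntu 20.04) restriction. A hypothetical guard could check `/etc/os-release` before unpacking the tarballs — the `is_supported_ubuntu` helper here is illustrative and is not part of the patch:

```python
def is_supported_ubuntu(os_release_text):
    # The filetree tarballs target DBR 11.x/12.x, i.e. Ubuntu 20.04 (focal);
    # DBR 13.x ships Ubuntu 22.04 (jammy) and should be rejected up front.
    fields = {}
    for line in os_release_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key] = value.strip().strip('"')
    return fields.get("ID") == "ubuntu" and fields.get("VERSION_ID") == "20.04"

focal = 'ID=ubuntu\nVERSION_ID="20.04"'
jammy = 'ID=ubuntu\nVERSION_ID="22.04"'
assert is_supported_ubuntu(focal)
assert not is_supported_ubuntu(jammy)
```

Failing fast on an unsupported base image avoids half-unpacked GDAL trees on clusters where the shared objects would not link anyway.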
--- .../scripts/mosaic-gdal-3.4.3-filetree-init.sh | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh b/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh index 8d90034e3..56f8bdc11 100644 --- a/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh +++ b/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh @@ -2,12 +2,14 @@ # # File: mosaic-gdal-3.4.3-filetree-init.sh # Author: Michael Johns -# Modified: 2023-03-21 +# Modified: 2023-07-23 +# +# !!! FOR DBR 11.x and 12.x ONLY [Ubuntu 20.04] !!! +# !!! NOT for DBR 13.x [Ubuntu 22.04] !!! +# # 1. script is using custom tarballs for offline / self-contained install of GDAL # 2. This will unpack files directly into the filetree across cluster nodes (vs run apt install) -# 3. 
Note: Mosaic will be able to auto-detect and handle tarball unpacking -# without this init script on/around APR 2023 (so this is an alt to that capability) - +# # -- install databricks-mosaic-gdal on cluster # - from pypi.org (once available) pip install databricks-mosaic-gdal==3.4.3 From 8a83eed2429ce38c805e4a7b1616af4fd290d6dc Mon Sep 17 00:00:00 2001 From: Thomas Maschler Date: Tue, 1 Aug 2023 04:08:13 -0400 Subject: [PATCH 16/28] fix H3 geometry constructor (#374) * fix H3 geometry constructor * fix function names * add test for str function * lint with scalafmt * add comments --- .jvmopts | 1 + .../mosaic/core/index/H3IndexSystem.scala | 185 +++++++++++++++++- .../mosaic/core/index/H3IndexSystemTest.scala | 37 +++- 3 files changed, 212 insertions(+), 11 deletions(-) create mode 100644 .jvmopts diff --git a/.jvmopts b/.jvmopts new file mode 100644 index 000000000..d5d677f51 --- /dev/null +++ b/.jvmopts @@ -0,0 +1 @@ +-Djts.overlay=ng \ No newline at end of file diff --git a/src/main/scala/com/databricks/labs/mosaic/core/index/H3IndexSystem.scala b/src/main/scala/com/databricks/labs/mosaic/core/index/H3IndexSystem.scala index d48ce3191..79ba98390 100644 --- a/src/main/scala/com/databricks/labs/mosaic/core/index/H3IndexSystem.scala +++ b/src/main/scala/com/databricks/labs/mosaic/core/index/H3IndexSystem.scala @@ -3,7 +3,7 @@ package com.databricks.labs.mosaic.core.index import com.databricks.labs.mosaic.core.geometry.MosaicGeometry import com.databricks.labs.mosaic.core.geometry.api.GeometryAPI import com.databricks.labs.mosaic.core.types.model.Coordinates -import com.databricks.labs.mosaic.core.types.model.GeometryTypeEnum.POLYGON +import com.databricks.labs.mosaic.core.types.model.GeometryTypeEnum.{LINESTRING, POLYGON} import com.uber.h3core.H3Core import com.uber.h3core.util.GeoCoord import org.apache.spark.sql.types.LongType @@ -11,6 +11,7 @@ import org.apache.spark.unsafe.types.UTF8String import org.locationtech.jts.geom.Geometry import 
scala.collection.JavaConverters._ +import scala.collection.mutable import scala.util.{Success, Try} /** @@ -93,10 +94,9 @@ object H3IndexSystem extends IndexSystem(LongType) with Serializable { override def indexToGeometry(index: Long, geometryAPI: GeometryAPI): MosaicGeometry = { val boundary = h3.h3ToGeoBoundary(index).asScala val extended = boundary ++ List(boundary.head) - geometryAPI.geometry( - extended.map(p => geometryAPI.fromGeoCoord(Coordinates(p.lat, p.lng))), - POLYGON - ) + + if (crossesNorthPole(index) || crossesSouthPole(index)) makePoleGeometry(boundary, crossesNorthPole(index), geometryAPI) + else makeSafeGeometry(extended, geometryAPI) } /** @@ -198,10 +198,9 @@ object H3IndexSystem extends IndexSystem(LongType) with Serializable { override def indexToGeometry(index: String, geometryAPI: GeometryAPI): MosaicGeometry = { val boundary = h3.h3ToGeoBoundary(index).asScala val extended = boundary ++ List(boundary.head) - geometryAPI.geometry( - extended.map(p => geometryAPI.fromGeoCoord(Coordinates(p.lat, p.lng))), - POLYGON - ) + + if (crossesNorthPole(index) || crossesSouthPole(index)) makePoleGeometry(boundary, crossesNorthPole(index), geometryAPI) + else makeSafeGeometry(extended, geometryAPI) } override def format(id: Long): String = { @@ -227,4 +226,172 @@ object H3IndexSystem extends IndexSystem(LongType) with Serializable { override def distance(cellId: Long, cellId2: Long): Long = Try(h3.h3Distance(cellId, cellId2)).map(_.toLong).getOrElse(0) + // Find all cells that cross the north pole. There always is exactly one cell per resolution. + private lazy val northPoleCells = Range.inclusive(0, 15).map(h3.geoToH3(90, 0, _)) + + // Find all cells that cross the south pole. There always is exactly one cell per resolution. 
+ private lazy val southPoleCells = Range.inclusive(0, 15).map(h3.geoToH3(-90, 0, _)) + + private def crossesNorthPole(cell_id: Long): Boolean = northPoleCells contains cell_id + private def crossesSouthPole(cell_id: Long): Boolean = southPoleCells contains cell_id + private def crossesNorthPole(cell_id: String): Boolean = northPoleCells contains h3.stringToH3(cell_id) + private def crossesSouthPole(cell_id: String): Boolean = southPoleCells contains h3.stringToH3(cell_id) + + /** + * Check if H3 cell crosses the anti-meridian. This check is not + * generalizable for arbitrary polygons. + * @param geometry + * H3 Geometry to be checked. + * @return + * boolean True if the geometry crosses the anti-meridian, false + * otherwise. + */ + private def crossesAntiMeridian(geometry: MosaicGeometry): Boolean = { + val minX = geometry.minMaxCoord("X", "MIN") + val maxX = geometry.minMaxCoord("X", "MAX") + minX < 0 && maxX >= 0 && ((maxX - minX > 180) || !geometry.isValid) + } + + /** + * Shift point that falls into the western hemisphere by 360 degrees to the + * east. + * @param lng + * Longitude of the point to be shifted. + * @param lat + * Latitude of the point to be shifted. + * @return + * Shifted point. + */ + private def shiftEast(lng: Double, lat: Double): (Double, Double) = { + if (lng < 0) (lng + 360.0, lat) + else (lng, lat) + } + + /** + * Shift point that lie east of the eastern hemisphere by 360 degrees to + * the west. + * @param lng + * Longitude of the point to be shifted. + * @param lat + * Latitude of the point to be shifted. + * @return + * Shifted point. + */ + private def shiftWest(lng: Double, lat: Double): (Double, Double) = { + if (lng >= 180.0) (lng - 360.0, lat) + else (lng, lat) + } + + /** + * @param coordinates + * A collection of [[GeoCoord]]s to be used to create a + * [[MosaicGeometry]]. + * @param geometryAPI + * An instance of [[GeometryAPI]] to be used to create a + * [[MosaicGeometry]]. + * @return + * A [[MosaicGeometry]] instance. 
Generates a polygon using the + * cooridaates for the outer ring in the order they are provided. This + * method will not check for validity of the geometry and may return an + * invalid geometry. + */ + private def makeUnsafeGeometry(coordinates: mutable.Buffer[GeoCoord], geometryAPI: GeometryAPI): MosaicGeometry = { + geometryAPI.geometry( + coordinates.map(p => geometryAPI.fromGeoCoord(Coordinates(p.lat, p.lng))), + POLYGON + ) + } + + /** + * A BBox that covers the eastern Hemisphere + * @param geometryAPI + * An instance of [[GeometryAPI]] to be used to create the geometry. + * @return + * A [[MosaicGeometry]] instance. + */ + private def makeEastBBox(geometryAPI: GeometryAPI): MosaicGeometry = + makeUnsafeGeometry( + mutable.Buffer(new GeoCoord(-90, 0), new GeoCoord(90, 0), new GeoCoord(90, 180), new GeoCoord(-90, 180), new GeoCoord(-90, 0)), + geometryAPI: GeometryAPI + ) + + /** + * A BBox that covers the western Hemisphere shifted by 360 degrees to the + * East + * @param geometryAPI + * An instance of [[GeometryAPI]] to be used to create the geometry. + * @return + * A [[MosaicGeometry]] instance. + */ + private def makeShiftedWestBBox(geometryAPI: GeometryAPI): MosaicGeometry = + makeUnsafeGeometry( + mutable + .Buffer(new GeoCoord(-90, 180), new GeoCoord(90, 180), new GeoCoord(90, 360), new GeoCoord(-90, 360), new GeoCoord(-90, 180)), + geometryAPI: GeometryAPI + ) + + /** + * Generate a pole-safe H3 geometry. Pole geometries require two additional + * vertices where the pole touches the anti-meridian. This method is not + * generalizable for arbitrary polygons. + * + * @param coordinates + * A collection of [[GeoCoord]]s to be used to create a + * [[MosaicGeometry]]. + * @param isNorthPole + * Boolean indicating if the pole is the north or south pole. + * @param geometryAPI + * An instance of [[GeometryAPI]] to be used to create a + * [[MosaicGeometry]]. + * @return + * A [[MosaicGeometry]] instance. 
+ */ + private def makePoleGeometry(coordinates: mutable.Buffer[GeoCoord], isNorthPole: Boolean, geometryAPI: GeometryAPI): MosaicGeometry = { + + val lat = if (isNorthPole) 90 else -90 + + val coords = coordinates.map(geoCoord => shiftEast(geoCoord.lng, geoCoord.lat)).sortBy(_._1) + val lineString = geometryAPI.geometry( + coords.map(p => geometryAPI.fromGeoCoord(Coordinates(p._2, p._1))), + LINESTRING + ) + + val westernLine = lineString.intersection(makeEastBBox(geometryAPI)) + val easternLine = lineString.intersection(makeShiftedWestBBox(geometryAPI)).mapXY(shiftWest) + + val vertices = westernLine.getShellPoints.head ++ + Seq(geometryAPI.fromGeoCoord(Coordinates(lat, 180)), geometryAPI.fromGeoCoord(Coordinates(lat, -180))) ++ + easternLine.getShellPoints.head ++ Seq(westernLine.getShellPoints.head.head) + + geometryAPI.geometry(vertices, POLYGON) + + } + + /** + * Generate a pole-safe and antimeridian-safe H3 geometry. This method is + * not generalizable for arbitrary polygons. + * + * @param coordinates + * A collection of [[GeoCoord]]s to be used to create a + * [[MosaicGeometry]]. + * @param geometryAPI + * An instance of [[GeometryAPI]] to be used to create a + * [[MosaicGeometry]]. + * @return + * A [[MosaicGeometry]] instance. 
+ */ + private def makeSafeGeometry(coordinates: mutable.Buffer[GeoCoord], geometryAPI: GeometryAPI): MosaicGeometry = { + + val unsafeGeometry = makeUnsafeGeometry(coordinates, geometryAPI) + + if (crossesAntiMeridian(unsafeGeometry)) { + val shiftedGeometry = unsafeGeometry.mapXY(shiftEast) + val westGeom = shiftedGeometry.intersection(makeEastBBox(geometryAPI: GeometryAPI)) + val eastGeom = shiftedGeometry.intersection(makeShiftedWestBBox(geometryAPI: GeometryAPI)).mapXY(shiftWest) + westGeom.union(eastGeom) + } else { + unsafeGeometry + } + } + } diff --git a/src/test/scala/com/databricks/labs/mosaic/core/index/H3IndexSystemTest.scala b/src/test/scala/com/databricks/labs/mosaic/core/index/H3IndexSystemTest.scala index 6b370a277..4b013f580 100644 --- a/src/test/scala/com/databricks/labs/mosaic/core/index/H3IndexSystemTest.scala +++ b/src/test/scala/com/databricks/labs/mosaic/core/index/H3IndexSystemTest.scala @@ -1,15 +1,20 @@ package com.databricks.labs.mosaic.core.index -import com.databricks.labs.mosaic.core.geometry.api.{ESRI, JTS} +import com.databricks.labs.mosaic.core.geometry.api.{ESRI, GeometryAPI, JTS} import com.databricks.labs.mosaic.core.geometry.{MosaicGeometryESRI, MosaicGeometryJTS} +import com.databricks.labs.mosaic.core.index.H3IndexSystem.indexToGeometry import com.databricks.labs.mosaic.core.types.model.GeometryTypeEnum.{LINESTRING, MULTILINESTRING, MULTIPOINT, MULTIPOLYGON, POINT, POLYGON} import com.databricks.labs.mosaic.core.types.model.GeometryTypeEnum +import com.uber.h3core.H3Core import org.apache.spark.sql.types.{BooleanType, LongType, StringType} import org.apache.spark.unsafe.types.UTF8String +import org.scalactic.Tolerance import org.scalatest.funsuite.AnyFunSuite import org.scalatest.matchers.should.Matchers._ -class H3IndexSystemTest extends AnyFunSuite { +import scala.jdk.CollectionConverters.collectionAsScalaIterableConverter + +class H3IndexSystemTest extends AnyFunSuite with Tolerance { test("H3IndexSystem auxiliary 
methods") { val indexRes = H3IndexSystem.pointToIndex(10, 10, 10) @@ -129,4 +134,32 @@ class H3IndexSystemTest extends AnyFunSuite { H3IndexSystem.coerceChipGeometry(geomsWKTs4.map(MosaicGeometryESRI.fromWKT)).isEmpty shouldBe true } + test("indexToGeometry should return valid and correct geometries") { + val h3: H3Core = H3Core.newInstance() + + val esriGeomAPI: GeometryAPI = GeometryAPI("ESRI") + val jtsGeomAPI: GeometryAPI = GeometryAPI("JTS") + val apis = Seq(esriGeomAPI, jtsGeomAPI) + + val baseCells = h3.getRes0Indexes.asScala.toList + val lvl1Cells = baseCells.flatMap(h3.h3ToChildren(_, 1).asScala) + val testCells = Seq(baseCells, lvl1Cells) + + val baseCellsStr = baseCells.map(h3.h3ToString(_)) + val lvl1CellsStr = lvl1Cells.map(h3.h3ToString(_)) + val testCellsStr = Seq(baseCellsStr, lvl1CellsStr) + + apis.foreach(api => { + testCells.foreach(cells => { + val geoms = cells.map(indexToGeometry(_, api)) + geoms.foreach(geom => geom.isValid shouldBe true) + geoms.foldLeft(0.0)((acc, geom) => acc + geom.getArea) shouldBe ((180.0 * 360.0) +- 0.0001) + }) + testCellsStr.foreach(cells => { + val geoms = cells.map(indexToGeometry(_, api)) + geoms.foreach(geom => geom.isValid shouldBe true) + geoms.foldLeft(0.0)((acc, geom) => acc + geom.getArea) shouldBe ((180.0 * 360.0) +- 0.0001) + }) + }) + } } From 3b4a52740e050543596afbdb64d762255241997e Mon Sep 17 00:00:00 2001 From: Michael Johns Date: Fri, 15 Sep 2023 17:21:43 -0400 Subject: [PATCH 17/28] Update spatial-functions.rst st_haversine is in km (vs km^2); also, reference the value of the radius used (which is in km). 
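The corrected note can be sanity-checked with a pure-Python version of the haversine computation. This is a sketch of the formula only — Mosaic's `st_haversine` is a Spark expression, and the argument order used here is an assumption:

```python
import math

EARTH_RADIUS_KM = 6371.0088  # the radius value referenced by the doc note

def haversine_km(lat1, lng1, lat2, lng2):
    # Inputs in degrees, result in km (not km^2), matching the corrected note.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlng = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Two antipodal points on the equator are half a great circle apart: pi * R km.
assert abs(haversine_km(0, 0, 0, 180) - math.pi * EARTH_RADIUS_KM) < 1e-9
```

The result scales linearly with the radius, which is why the doc note now states the radius value alongside the km unit.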
--- docs/source/api/spatial-functions.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/api/spatial-functions.rst b/docs/source/api/spatial-functions.rst index c4ace6c19..ab698b5c3 100644 --- a/docs/source/api/spatial-functions.rst +++ b/docs/source/api/spatial-functions.rst @@ -744,7 +744,7 @@ st_haversine | 10007.55722101796| +------------------------------------+ -.. note:: Results of this function are always expressed in km^2, while the input lat/lng pairs are expected to be in degrees. +.. note:: Results of this function are always expressed in km, while the input lat/lng pairs are expected to be in degrees. The radius used (in km) is 6371.0088. st_hasvalidcoordinates From e580ccb25f296b06da8b614e42406de783e81dd8 Mon Sep 17 00:00:00 2001 From: Michael Johns Date: Fri, 15 Sep 2023 17:28:54 -0400 Subject: [PATCH 18/28] Update index.rst remove references to Mosaic 0.4 --- docs/source/index.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index fe1ec92c9..ee499822e 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -42,14 +42,14 @@ Mosaic is an extension to the `Apache Spark `_ framework that allows easy and fast processing of very large geospatial datasets. .. warning:: - From version 0.4.0, Mosaic will require either + From versions after 0.3.x, Mosaic will require either * Databricks Runtime 11.2+ with Photon enabled * Databricks Runtime for ML 11.2+ - Mosaic 0.3 series does not support DBR 13 (coming soon with Mosaic 0.4 series); + Mosaic 0.3 series does not yet support DBR 13 (coming soon); also, DBR 10 is no longer supported in Mosaic. -We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled; +We currently recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled; this will leverage the Databricks H3 expressions when using H3 grid system. 
Mosaic provides: From 08da4a7d429a16cacb5b9d9d0f9fae6226966791 Mon Sep 17 00:00:00 2001 From: Michael Johns Date: Fri, 15 Sep 2023 17:32:57 -0400 Subject: [PATCH 19/28] Update README.md Updated to remove some of the earlier plans for databricks-mosaic-gdal PyPI project. --- modules/python/gdal_package/README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/modules/python/gdal_package/README.md b/modules/python/gdal_package/README.md index a8aa68ef0..165af7dcf 100644 --- a/modules/python/gdal_package/README.md +++ b/modules/python/gdal_package/README.md @@ -2,7 +2,7 @@ > Current version is 3.4.3 (to match GDAL). -This is a filetree (vs apt based) drop-in packaging of GDAL with Java Bindings for Ubuntu 20.04 (Focal Fossa) which is used by [Databricks Runtime](https://docs.databricks.com/release-notes/runtime/releases.html) (DBR) 11+. +This is a filetree (vs apt based) drop-in packaging of GDAL with Java Bindings for Ubuntu 20.04 (Focal Fossa) which is used by [Databricks Runtime](https://docs.databricks.com/release-notes/runtime/releases.html) (DBR) 11 and 12 (not DBR 13 which is Ubuntu 22.04). 1. `gdal-3.4.3-filetree.tar.xz` is ~50MB - it is extracted with `tar -xf gdal-3.4.3-filetree.tar.xz -C /` 2. `gdal-3.4.3.-symlinks.tar.xz` is ~19MB - it is extracted with `tar -xhf gdal-3.4.3-symlinks.tar.xz -C /` @@ -14,4 +14,3 @@ An [init script](https://docs.databricks.com/clusters/init-scripts.html) is prov * This is a very specific packaging for GDAL + dependencies which removes any libraries that are already provided by DBR, so it will not be not useful outside Databricks. 
 * It additionally includes GDAL shared objects (`.so`) for Java Bindings, GDAL 3.4.3 Python bindings, and tweak for OSGEO as currently supplied by [UbuntuGIS PPA](https://launchpad.net/~ubuntugis/+archive/ubuntu/ubuntugis-unstable) based init script [install-gdal-databricks.sh](https://github.com/databrickslabs/mosaic/blob/main/src/main/resources/scripts/install-gdal-databricks.sh) provided by Mosaic. This install replaces the existing way on Mosaic, so choose one or the other.
 * The GDAL JAR for 3.4 is not included but is provided by Mosaic itself and added to your Databricks cluster as part of the [enable_gdal](https://databrickslabs.github.io/mosaic/usage/install-gdal.html#enable-gdal-for-a-notebook) called when configuring Mosaic for GDAL. Separately, the JAR could be added as a [cluster-installed library](https://docs.databricks.com/libraries/cluster-libraries.html#cluster-installed-library), e.g. through Maven coordinates `org.gdal:gdal:3.4.0` from [mvnrepository](https://mvnrepository.com/artifact/org.gdal/gdal/3.4.0).
-* Mosaic will soon be able to directly leverage this [PyPI](https://pypi.org/project/databricks-mosaic-gdal/) project and be able to altogether avoid the init script as a precursor to calling [enable_gdal](https://databrickslabs.github.io/mosaic/usage/install-gdal.html#enable-gdal-for-a-notebook). So check Mosaic [GDAL Installation Guide](https://databrickslabs.github.io/mosaic/usage/install-gdal.html#) for any changes on/around APR 2023.
\ No newline at end of file

From 97016e5b8e7445da649783b41c17ab297b26eaa6 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 17:35:17 -0400
Subject: [PATCH 20/28] Update README.md

Updating main README since we are no longer releasing Mosaic 0.4
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 052ebeb38..74ff74c9c 100644
--- a/README.md
+++ b/README.md
@@ -45,13 +45,13 @@ Image1: Mosaic logical design.
 We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled;
 this will leverage the Databricks H3 expressions when using H3 grid system.

-:warning: **Mosaic 0.3 series does not support DBR 13** (coming soon with Mosaic 0.4 series); also, DBR 10 is no longer supported in Mosaic.
+:warning: **Mosaic 0.3 series does not support DBR 13** (coming soon); also, DBR 10 is no longer supported in Mosaic.

 As of the 0.3.11 release, Mosaic issues the following warning when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:

-> DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster from version v0.4.0+. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).
+> DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster after v0.3.x. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).

-If you are receiving this warning in v0.3.11, you will want to change to a supported runtime prior to updating Mosaic to run 0.4.0. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon. Along this direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.
+If you are receiving this warning in v0.3.11, you will want to change to being to plan for a supported runtime. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon. Along this direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.

 ### Documentation

From ab941e4edabecc700e840b7f251dd684f0121d15 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 17:38:42 -0400
Subject: [PATCH 21/28] Update installation.rst

Since we are not releasing Mosaic 0.4, updating instructions
---
 docs/source/usage/installation.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/usage/installation.rst b/docs/source/usage/installation.rst
index a4b97952b..86974a136 100644
--- a/docs/source/usage/installation.rst
+++ b/docs/source/usage/installation.rst
@@ -6,11 +6,11 @@ Supported platforms
 ###################

 .. warning::
-    From version 0.4.0, Mosaic will require either
+    From versions after 0.3.x, Mosaic will require either
     * Databricks Runtime 11.2+ with Photon enabled
     * Databricks Runtime for ML 11.2+
-    Mosaic 0.3 series does not support DBR 13 (coming soon with Mosaic 0.4 series);
+    Mosaic 0.3 series does not support DBR 13 (coming soon);
     also, DBR 10 is no longer supported in Mosaic.

 We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled;
@@ -18,10 +18,10 @@ this will leverage the Databricks H3 expressions when using H3 grid system.

 As of the 0.3.11 release, Mosaic issues the following warning when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [`ADB `__ | `AWS `__ | `GCP `__]:

-    DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster from version v0.4.0+. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).
+    DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster after v0.3.x. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).

-If you are receiving this warning in v0.3.11+, you will want to change to a supported runtime prior
-to updating Mosaic to run 0.4.0. The reason we are making this change is that we are streamlining Mosaic
+If you are receiving this warning in v0.3.11+, you will want to change to being to plan for a supported runtime.
+The reason we are making this change is that we are streamlining Mosaic
 internals to be more aligned with future product APIs which are powered by Photon. Along this
 direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.

From b8e42ca302385be1e22cc7ba0a4c30e8d77d8da8 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 17:40:17 -0400
Subject: [PATCH 22/28] Update README.md

additional change
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 74ff74c9c..2fd96eed5 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ As of the 0.3.11 release, Mosaic issues the following warning when initialized o

 > DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster after v0.3.x. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).

-If you are receiving this warning in v0.3.11, you will want to change to being to plan for a supported runtime. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon. Along this direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.
+If you are receiving this warning in v0.3.11+, you will want to change to being to plan for a supported runtime. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon. Along this direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.

 ### Documentation

From 380c74419847360c04f78c0e2f48f1e609e74b0c Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 17:42:30 -0400
Subject: [PATCH 23/28] Update MosaicContext.scala

Since we are not releasing Mosaic 0.4, updating logging.
---
 .../com/databricks/labs/mosaic/functions/MosaicContext.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/main/scala/com/databricks/labs/mosaic/functions/MosaicContext.scala b/src/main/scala/com/databricks/labs/mosaic/functions/MosaicContext.scala
index 5777cef8a..8f9bf92d7 100644
--- a/src/main/scala/com/databricks/labs/mosaic/functions/MosaicContext.scala
+++ b/src/main/scala/com/databricks/labs/mosaic/functions/MosaicContext.scala
@@ -944,10 +944,10 @@ object MosaicContext extends Logging {
         if (!isML && !isPhoton) {
             // Print out the warnings both to the log and to the console
             logWarning("DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime")
-            logWarning("DEPRECATION WARNING: Mosaic will stop working on this cluster from version v0.4.0+.")
+            logWarning("DEPRECATION WARNING: Mosaic will stop working on this cluster after v0.3.x.")
             logWarning("Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).")
             println("DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime")
-            println("DEPRECATION WARNING: Mosaic will stop working on this cluster from version v0.4.0+.")
+            println("DEPRECATION WARNING: Mosaic will stop working on this cluster after v0.3.x.")
             println("Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).")
             false
         } else {

From 45414adc51c905f4b2bfc232feb39ede57067820 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 18:21:54 -0400
Subject: [PATCH 24/28] Update mosaic-gdal-3.4.3-filetree-init.sh

removed old comment
---
 .../resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh b/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh
index 56f8bdc11..9b900479a 100644
--- a/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh
+++ b/modules/python/gdal_package/databricks-mosaic-gdal/resources/scripts/mosaic-gdal-3.4.3-filetree-init.sh
@@ -11,7 +11,7 @@
 # 2. This will unpack files directly into the filetree across cluster nodes (vs run apt install)
 #
 # -- install databricks-mosaic-gdal on cluster
-# - from pypi.org (once available)
+# - use version 3.4.3 (exactly) from pypi.org
 pip install databricks-mosaic-gdal==3.4.3

 # -- find the install dir

From d3b0ea3090badccc6989975c882586a097b86819 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 18:26:55 -0400
Subject: [PATCH 25/28] Update spatial-functions.rst

specified WGS84 units are in degrees
---
 docs/source/api/spatial-functions.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/api/spatial-functions.rst b/docs/source/api/spatial-functions.rst
index ab698b5c3..846300274 100644
--- a/docs/source/api/spatial-functions.rst
+++ b/docs/source/api/spatial-functions.rst
@@ -509,7 +509,7 @@ st_distance
     | 15.652475842498529|
     +------------------------+

-.. note:: Results of this function are always expressed in the original units of the input geometries.
+.. note:: Results of this function are always expressed in the original units of the input geometries, e.g. for WGS84 (SRID 4326) units are degrees.
 st_dump
 *******

From 0eacf65e5cd0b5c48d73ee58210ab1eecda6ca1e Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 18:31:00 -0400
Subject: [PATCH 26/28] Update spatial-functions.rst

specified st_distance is euclidean.
---
 docs/source/api/spatial-functions.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/api/spatial-functions.rst b/docs/source/api/spatial-functions.rst
index 846300274..09c758292 100644
--- a/docs/source/api/spatial-functions.rst
+++ b/docs/source/api/spatial-functions.rst
@@ -459,7 +459,7 @@ st_distance

 .. function:: st_distance(geom1, geom2)

-    Compute the distance between `geom1` and `geom2`.
+    Compute the euclidean distance between `geom1` and `geom2`.

     :param geom1: Geometry
     :type geom1: Column
@@ -509,7 +509,7 @@ st_distance
     | 15.652475842498529|
     +------------------------+

-.. note:: Results of this function are always expressed in the original units of the input geometries, e.g. for WGS84 (SRID 4326) units are degrees.
+.. note:: Results of this euclidean distance function are always expressed in the original units of the input geometries, e.g. for WGS84 (SRID 4326) units are degrees.

 st_dump
 *******

From b4c035a43dff8ff7ed934b50b7670ecd0857708a Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 18:40:00 -0400
Subject: [PATCH 27/28] Update README.md

fix grammar
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2fd96eed5..e26d88267 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ As of the 0.3.11 release, Mosaic issues the following warning when initialized o

 > DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster after v0.3.x. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).
-If you are receiving this warning in v0.3.11+, you will want to change to being to plan for a supported runtime. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon. Along this direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.
+If you are receiving this warning in v0.3.11+, you will want to begin to plan for a supported runtime. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon. Along this direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.

 ### Documentation

From de04a1312b7ba202073fbe8c62da5f6e037c8873 Mon Sep 17 00:00:00 2001
From: Michael Johns
Date: Fri, 15 Sep 2023 18:43:19 -0400
Subject: [PATCH 28/28] Update installation.rst

fixed grammatical issue
---
 docs/source/usage/installation.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/usage/installation.rst b/docs/source/usage/installation.rst
index 86974a136..11263e11a 100644
--- a/docs/source/usage/installation.rst
+++ b/docs/source/usage/installation.rst
@@ -20,7 +20,7 @@ that is neither Photon Runtime nor Databricks Runtime ML [`ADB