Commit

Merge branch 'main' into feature/grid_tiles
Milos Colic authored Oct 11, 2023
2 parents e2455b1 + c179d49 commit 8e18317
Showing 31 changed files with 186 additions and 132 deletions.
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,6 @@
## v0.3.12
- Make JTS default Geometry Provider

## v0.3.11
- Update the CONTRIBUTING.md to follow the standard process.
- Fix for issue 383: grid_pointascellid fails with a Java type error when run on an already instantiated point.
@@ -172,4 +175,4 @@
- Add Geometry validity expressions
- Create WKT, WKB and Hex conversion expressions
- Setup the project
- Define GitHub templates
- Define GitHub templates
62 changes: 23 additions & 39 deletions LICENSE
@@ -1,41 +1,25 @@
DB license

Copyright (2022) Databricks, Inc.

This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant
to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). The Object Code version of the
Software shall be deemed part of the Downloadable Services under the Agreement, or if the Agreement does not define Downloadable Services,
Subscription Services, or if neither are defined then the term in such Agreement that refers to the applicable Databricks Platform
Services (as defined below) shall be substituted herein for “Downloadable Services.” Licensee's use of the Software must comply at
all times with any restrictions applicable to the Downlodable Services and Subscription Services, generally, and must be used in
accordance with any applicable documentation. For the avoidance of doubt, the Software constitutes Databricks Confidential Information
under the Agreement.

Additionally, and notwithstanding anything in the Agreement to the contrary:
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
* you may view, make limited copies of, and may compile the Source Code version of the Software into an Object Code version of the
Software. For the avoidance of doubt, you may not make derivative works of Software (or make any any changes to the Source Code
version of the unless you have agreed to separate terms with Databricks permitting such modifications (e.g., a contribution license
agreement)).

If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software or view, copy or compile
the Source Code of the Software.

This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms. Additionally,
Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all
copies thereof (including the Source Code).

Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with
respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks
Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee
has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services.

Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.

Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.

Object Code: is version of the Software produced when an interpreter or a compiler translates the Source Code into recognizable and
executable machine code.

Source Code: the human readable portion of the Software.
Definitions.

Agreement: The agreement between Databricks, Inc., and you governing the use of the Databricks Services, which shall be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless you have entered into a separate written agreement with Databricks governing the use of the applicable Databricks Services.

Software: The source code and object code to which this license applies.

Scope of Use. You may not use this Software except in connection with your use of the Databricks Services pursuant to the Agreement. Your use of the Software must comply at all times with any restrictions applicable to the Databricks Services, generally, and must be used in accordance with any applicable documentation. You may view, use, copy, modify, publish, and/or distribute the Software solely for the purposes of using the code within or connecting to the Databricks Services. If you do not agree to these terms, you may not view, use, copy, modify, publish, and/or distribute the Software.

Redistribution. You may redistribute and sublicense the Software so long as all use is in compliance with these terms. In addition:

You must give any other recipients a copy of this License;
You must cause any modified files to carry prominent notices stating that you changed the files;
You must retain, in the source code form of any derivative works that you distribute, all copyright, patent, trademark, and attribution notices from the source code form, excluding those notices that do not pertain to any part of the derivative works; and
If the source code form includes a "NOTICE" text file as part of its distribution, then any derivative works that you distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the derivative works.
You may add your own copyright statement to your modifications and may provide additional license terms and conditions for use, reproduction, or distribution of your modifications, or for any such derivative works as a whole, provided your use, reproduction, and distribution of the Software otherwise complies with the conditions stated in this License.

Termination. This license terminates automatically upon your breach of these terms or upon the termination of your Agreement. Additionally, Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all copies thereof.

DISCLAIMER; LIMITATION OF LIABILITY.

THE SOFTWARE IS PROVIDED “AS-IS” AND WITH ALL FAULTS. DATABRICKS, ON BEHALF OF ITSELF AND ITS LICENSORS, SPECIFICALLY DISCLAIMS ALL WARRANTIES RELATING TO THE SOURCE CODE, EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, IMPLIED WARRANTIES, CONDITIONS AND OTHER TERMS OF MERCHANTABILITY, SATISFACTORY QUALITY OR FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. DATABRICKS AND ITS LICENSORS TOTAL AGGREGATE LIABILITY RELATING TO OR ARISING OUT OF YOUR USE OF OR DATABRICKS’ PROVISIONING OF THE SOURCE CODE SHALL BE LIMITED TO ONE THOUSAND ($1,000) DOLLARS. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
8 changes: 4 additions & 4 deletions R/sparkR-mosaic/enableMosaic.R
@@ -2,7 +2,7 @@
#'
#' @description enableMosaic activates the context dependent Databricks Mosaic functions, giving control over the geometry API and index system used.
#' See \url{https://databrickslabs.github.io/mosaic/} for full documentation
#' @param geometryAPI character, default="ESRI"
#' @param geometryAPI character, default="JTS"
#' @param indexSystem character, default="H3"
#' @param indexSystem boolean, default=F
#' @name enableMosaic
@@ -12,10 +12,10 @@
#' @examples
#' \dontrun{
#' enableMosaic()
#' enableMosaic("ESRI", "H3")
#' enableMosaic("ESRI", "BNG") }
#' enableMosaic("JTS", "H3")
#' enableMosaic("JTS", "BNG") }
enableMosaic <- function(
geometryAPI="ESRI"
geometryAPI="JTS"
,indexSystem="H3"
,rasterAPI="GDAL"
){
8 changes: 4 additions & 4 deletions R/sparklyr-mosaic/enableMosaic.R
@@ -3,7 +3,7 @@
#' @description enableMosaic activates the context dependent Databricks Mosaic functions, giving control over the geometry API and index system used.
#' See \url{https://databrickslabs.github.io/mosaic/} for full documentation
#' @param sc sparkContext
#' @param geometryAPI character, default="ESRI"
#' @param geometryAPI character, default="JTS"
#' @param indexSystem character, default="H3"
#' @name enableMosaic
#' @rdname enableMosaic
@@ -12,12 +12,12 @@
#' @examples
#' \dontrun{
#' enableMosaic()
#' enableMosaic("ESRI", "H3")
#' enableMosaic("ESRI", "BNG")}
#' enableMosaic("JTS", "H3")
#' enableMosaic("JTS", "BNG")}

enableMosaic <- function(
sc
,geometryAPI="ESRI"
,geometryAPI="JTS"
,indexSystem="H3"
,rasterAPI="GDAL"
){
24 changes: 15 additions & 9 deletions README.md
@@ -11,6 +11,7 @@ An extension to the [Apache Spark](https://spark.apache.org/) framework that all
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/databrickslabs/mosaic.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/databrickslabs/mosaic/context:python)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)


## Why Mosaic?

Mosaic was created to simplify the implementation of scalable geospatial data pipelines by binding together common Open Source geospatial libraries via Apache Spark, with a set of [examples and best practices](#examples) for common geospatial use cases.
@@ -20,8 +21,8 @@ Mosaic was created to simplify the implementation of scalable geospatial data pi
Mosaic provides geospatial tools for
* Data ingestion (WKT, WKB, GeoJSON)
* Data processing
* Geometry and geography `ST_` operations (with [ESRI](https://github.com/Esri/geometry-api-java) or [JTS](https://github.com/locationtech/jts))
* Indexing (with [H3](https://github.com/uber/h3) or BNG)
* Geometry and geography `ST_` operations (with default [JTS](https://github.com/locationtech/jts) or [ESRI](https://github.com/Esri/geometry-api-java))
* Indexing (with default [H3](https://github.com/uber/h3) or BNG)
* Chipping of polygons and lines over an indexing grid [co-developed with Ordnance Survey and Microsoft](https://databricks.com/blog/2021/10/11/efficient-point-in-polygon-joins-via-pyspark-and-bng-geospatial-indexing.html)
* Data visualization ([Kepler](https://github.com/keplergl/kepler.gl))

@@ -41,11 +42,16 @@ Image1: Mosaic logical design.

## Getting started

Create a Databricks cluster running __Databricks Runtime 10.0__ (or later).
We recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled; this will leverage the
Databricks H3 expressions when using H3 grid system.

:warning: **Mosaic 0.3 series does not support DBR 13** (coming soon); also, DBR 10 is no longer supported in Mosaic.

As of the 0.3.11 release, Mosaic issues the following warning when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:

We recommend using Databricks Runtime versions 11.2 or higher with Photon enabled, this will leverage the
Databricks h3 expressions when using H3 grid system.
> DEPRECATION WARNING: Mosaic is not supported on the selected Databricks Runtime. Mosaic will stop working on this cluster after v0.3.x. Please use a Databricks Photon-enabled Runtime (for performance benefits) or Runtime ML (for spatial AI benefits).
If you are receiving this warning in v0.3.11+, you will want to begin to plan for a supported runtime. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon. Along this direction of change, Mosaic will be standardizing to JTS as its default and supported Vector Geometry Provider.

### Documentation

@@ -75,9 +81,9 @@ Then enable it with
```scala
import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.H3
import com.databricks.labs.mosaic.ESRI
import com.databricks.labs.mosaic.JTS

val mosaicContext = MosaicContext.build(H3, ESRI)
val mosaicContext = MosaicContext.build(H3, JTS)
import mosaicContext.functions._
```

@@ -103,9 +109,9 @@ Configure the [Automatic SQL Registration](https://databrickslabs.github.io/mosa
%scala
import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.H3
import com.databricks.labs.mosaic.ESRI
import com.databricks.labs.mosaic.JTS

val mosaicContext = MosaicContext.build(H3, ESRI)
val mosaicContext = MosaicContext.build(H3, JTS)
mosaicContext.register(spark)
```

4 changes: 2 additions & 2 deletions docs/code-example-notebooks/setup/setup-scala.scala
@@ -1,10 +1,10 @@
// Databricks notebook source
import org.apache.spark.sql.functions._
import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.ESRI
import com.databricks.labs.mosaic.JTS
import com.databricks.labs.mosaic.H3

val mosaicContext: MosaicContext = MosaicContext.build(H3, ESRI)
val mosaicContext: MosaicContext = MosaicContext.build(H3, JTS)

// COMMAND ----------

8 changes: 4 additions & 4 deletions docs/source/api/spatial-functions.rst
@@ -459,7 +459,7 @@ st_distance

.. function:: st_distance(geom1, geom2)

Compute the distance between `geom1` and `geom2`.
Compute the euclidean distance between `geom1` and `geom2`.

:param geom1: Geometry
:type geom1: Column
@@ -509,7 +509,7 @@ st_distance
| 15.652475842498529|
+------------------------+

.. note:: Results of this function are always expressed in the original units of the input geometries.
.. note:: Results of this euclidean distance function are always expressed in the original units of the input geometries, e.g. for WGS84 (SRID 4326) units are degrees.

st_dump
*******
@@ -744,7 +744,7 @@ st_haversine
| 10007.55722101796|
+------------------------------------+

.. note:: Results of this function are always expressed in km^2, while the input lat/lng pairs are expected to be in degrees.
.. note:: Results of this function are always expressed in km, while the input lat/lng pairs are expected to be in degrees. The radius used (in km) is 6371.0088.


st_hasvalidcoordinates
@@ -949,7 +949,7 @@ st_isvalid
+---------------+

.. note:: Validity assertions will be dependent on the chosen geometry API.
The assertions used in the ESRI geometry API (the default) follow the definitions in the
The assertions used in the ESRI geometry API (JTS is the default) follow the definitions in the
"Simple feature access - Part 1" document (OGC 06-103r4) for each geometry type.
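
Reviewer note on the `st_distance` and `st_haversine` hunks in this file: the sketch below illustrates the difference between a planar Euclidean distance (result in the input units, e.g. degrees for SRID 4326) and a haversine distance (result in km). It is a minimal Python illustration, not Mosaic's implementation. The function names `euclidean_distance` and `haversine_km` are hypothetical; only the radius 6371.0088 km is taken from the updated `st_haversine` note.

```python
import math

# Mean Earth radius in km, as quoted in the st_haversine note in this diff.
EARTH_RADIUS_KM = 6371.0088


def euclidean_distance(x1, y1, x2, y2):
    """Planar distance in the units of the inputs (degrees for WGS84 lon/lat).

    Hypothetical helper for illustration only.
    """
    return math.hypot(x2 - x1, y2 - y1)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two lat/lng pairs given in degrees.

    Hypothetical helper for illustration only.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    # Standard haversine formula on a sphere of radius EARTH_RADIUS_KM.
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Equator to north pole along one meridian: a quarter of a great circle.
    print(euclidean_distance(0.0, 0.0, 0.0, 90.0))  # expressed in degrees
    print(haversine_km(0.0, 0.0, 90.0, 0.0))        # expressed in km
```

For the quarter-circle case the haversine result equals `EARTH_RADIUS_KM * pi / 2`, roughly 10007.557 km, which is consistent with the sample `st_haversine` output shown in the hunk above and with the corrected unit (km rather than km^2).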


8 changes: 6 additions & 2 deletions docs/source/index.rst
@@ -42,11 +42,15 @@
Mosaic is an extension to the `Apache Spark <https://spark.apache.org/>`_ framework that allows easy and fast processing of very large geospatial datasets.

.. warning::
From version 0.4.x, Mosaic will require either
From versions after 0.3.x, Mosaic will require either
* Databricks Runtime 11.2+ with Photon enabled
* Databricks Runtime for ML 11.2+

Mosaic 0.3 series does not yet support DBR 13 (coming soon);
also, DBR 10 is no longer supported in Mosaic.

Other Databricks Runtime versions will not be supported anymore.
We currently recommend using Databricks Runtime versions 11.3 LTS or 12.2 LTS with Photon enabled;
this will leverage the Databricks H3 expressions when using H3 grid system.

Mosaic provides:
* easy conversion between common spatial data encodings (WKT, WKB and GeoJSON);
8 changes: 4 additions & 4 deletions docs/source/models/spatial-knn.rst
@@ -157,9 +157,9 @@ The transformer is called SpatialKNN and it is used as follows:
import com.databricks.labs.mosaic.models.knn.SpatialKNN
import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.H3
import com.databricks.labs.mosaic.ESRI
import com.databricks.labs.mosaic.JTS
>>>
val mosaicContext = MosaicContext.build(H3, ESRI)
val mosaicContext = MosaicContext.build(H3, JTS)
import mosaicContext.functions._
mosaicContext.register(spark)
>>>
@@ -328,9 +328,9 @@ These datasets are not serialised with the model, and neither are the model outp
import com.databricks.labs.mosaic.models.knn.SpatialKNN
import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.H3
import com.databricks.labs.mosaic.ESRI
import com.databricks.labs.mosaic.JTS
>>>
val mosaicContext = MosaicContext.build(H3, ESRI)
val mosaicContext = MosaicContext.build(H3, JTS)
import mosaicContext.functions._
mosaicContext.register(spark)
>>>
2 changes: 1 addition & 1 deletion docs/source/usage/automatic-sql-registration.rst
@@ -60,7 +60,7 @@ To install Mosaic on your Databricks cluster, take the following steps:
spark.databricks.labs.mosaic.index.system H3
# JTS or ESRI
spark.databricks.labs.mosaic.geometry.api JTS
# MosaicSQL or MosaicSQLDefault, MosaicSQLDefault corresponds to (H3, ESRI)
# MosaicSQL or MosaicSQLDefault, MosaicSQLDefault corresponds to (H3, JTS)
spark.sql.extensions com.databricks.labs.mosaic.sql.extensions.MosaicSQL
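
Reviewer note on the configuration hunk above: the three Spark settings can be generated and validated programmatically before being pasted into a cluster configuration. The Python sketch below is one possible way to do that. The helpers `mosaic_spark_conf` and `render_conf` are hypothetical and not part of Mosaic; the configuration keys and the `MosaicSQL` extension class are copied from the hunk, and treating `H3`/`BNG` and `JTS`/`ESRI` as the complete sets of accepted values is an assumption based on the options named in this commit.

```python
# Options named in this diff. Treating them as the full allowed sets is an assumption.
INDEX_SYSTEMS = ("H3", "BNG")
GEOMETRY_APIS = ("JTS", "ESRI")


def mosaic_spark_conf(index_system="H3", geometry_api="JTS"):
    """Build the Spark config entries shown in the hunk (hypothetical helper).

    Defaults mirror the new project defaults from this commit: H3 and JTS.
    """
    if index_system not in INDEX_SYSTEMS:
        raise ValueError(f"unknown index system: {index_system!r}")
    if geometry_api not in GEOMETRY_APIS:
        raise ValueError(f"unknown geometry API: {geometry_api!r}")
    return {
        "spark.databricks.labs.mosaic.index.system": index_system,
        "spark.databricks.labs.mosaic.geometry.api": geometry_api,
        "spark.sql.extensions": "com.databricks.labs.mosaic.sql.extensions.MosaicSQL",
    }


def render_conf(conf):
    """Render a config dict as 'key value' lines, one setting per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    print(render_conf(mosaic_spark_conf("BNG", "JTS")))
```

Failing fast on an unrecognised value keeps a typo such as `JST` from reaching the cluster configuration, where it would only surface at session start.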
Testing
6 changes: 3 additions & 3 deletions docs/source/usage/grid-indexes-bng.rst
@@ -34,15 +34,15 @@ configurations. Spark provides an easy way to supply configuration parameters us
.. code-tab:: scala

import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.{BNG, ESRI}
import com.databricks.labs.mosaic.{BNG, JTS}

val mosaicContext = MosaicContext.build(BNG, ESRI)
val mosaicContext = MosaicContext.build(BNG, JTS)
import mosaicContext.functions._

.. code-tab:: r R

library(sparkrMosaic)
enableMosaic("ESRI", "BNG")
enableMosaic("JTS", "BNG")

.. code-tab:: sql
