From 7e1318f8968d31634e874f23aaeff6cb756b4bd0 Mon Sep 17 00:00:00 2001 From: Michael Johns Date: Wed, 3 Apr 2024 08:44:36 -0400 Subject: [PATCH 1/7] Initial docs refresh for Mosaic 0.4.1. --- docs/source/api/raster-functions.rst | 610 +++++++++++++++++++++-- docs/source/api/spatial-aggregations.rst | 109 ++++ docs/source/api/spatial-functions.rst | 153 ++++-- docs/source/usage/install-gdal.rst | 24 +- 4 files changed, 816 insertions(+), 80 deletions(-) diff --git a/docs/source/api/raster-functions.rst b/docs/source/api/raster-functions.rst index 1a4465d4f..85c0009e6 100644 --- a/docs/source/api/raster-functions.rst +++ b/docs/source/api/raster-functions.rst @@ -13,9 +13,17 @@ Please see :doc:`Install and Enable GDAL with Mosaic ` for * Mosaic also provides a scalable retiling function that can be used to retile raster data in case of bottlenecking due to large files. * All raster functions respect the :code:`rst_` prefix naming convention. - * Mosaic is operating using raster tile objects only since 0.3.11. Tile objects are created using functions such as - :code:`rst_fromfile` or :code:`rst_fromcontent`. These functions are used as places to start when working with - initial data. If you use :code:`spark.read.format("gdal")` tiles are automatically generated for you. + * Mosaic operates using raster tile objects. Tile objects are created using functions such as + :ref:`rst_fromfile` or :ref:`rst_fromcontent`. These functions are used as places to start when working with + initial data. If you use :code:`spark.read.format("gdal")` tiles are automatically generated for you. + * **Changed in 0.4.1** Mosaic raster tile schema changed to the following: + :code:`>`. All APIs that use tiles now follow + this schema. 
Also, a new function :ref:`rst_maketiles` is available that lets a single tile schema handle + either a path (string) raster, similar to :ref:`rst_fromfile`, or a binary raster, similar to :ref:`rst_fromcontent`; + however, a key difference is that :ref:`rst_maketiles` supports optional checkpointing for improved performance. + * In 0.4.1, there is a new set of raster APIs that do not yet have Python bindings generated; however, you can still + call the functions with the PySpark function :code:`selectExpr`, e.g. :code:`selectExpr("rst_avg(...)")`, which invokes the SQL + registered expression. The functions are: :ref:`rst_avg`, :ref:`rst_max`, :ref:`rst_min`, :ref:`rst_median`, and :ref:`rst_pixelcount`. * Also, scala does not have a :code:`df.display()` method while python does. In practice you would most often call :code:`display(df)` in scala for a prettier output, but for brevity, we write :code:`df.show` in scala. @@ -23,6 +31,49 @@ Please see :doc:`Install and Enable GDAL with Mosaic ` for These functions will configure an init script in your preferred Workspace, Volume, or DBFS location to install GDAL on your cluster. See :doc:`Install and Enable GDAL with Mosaic ` for more details. +rst_avg +******* + +.. function:: rst_avg(tile) + + Returns an array containing mean values for each band. + The python bindings are available through sql, + e.g. :code:`selectExpr("rst_avg(tile)")` + + :param tile: A column containing the raster tile. + :type tile: Column (RasterTileType) + :rtype: Column: ArrayType(DoubleType) + + :example: + +.. tabs:: + .. code-tab:: python + + df.selectExpr("rst_avg(tile)").limit(1).display() + +---------------+ + | rst_avg(tile) | + +---------------+ + | [42.0] | + +---------------+ + + .. code-tab:: scala + + df.select(rst_avg(col("tile"))).limit(1).show + +---------------+ + | rst_avg(tile) | + +---------------+ + | [42.0] | + +---------------+ + + ..
code-tab:: sql + + SELECT rst_avg(tile) FROM table LIMIT 1 + +---------------+ + | rst_avg(tile) | + +---------------+ + | [42.0] | + +---------------+ + rst_bandmetadata **************** @@ -31,7 +82,7 @@ rst_bandmetadata Extract the metadata describing the raster band. Metadata is return as a map of key value pairs. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param band: The band number to extract metadata for. :type band: Column (IntegerType) @@ -95,7 +146,7 @@ rst_boundingbox Returns the bounding box of the raster as a polygon geometry. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: StructType(DoubleType, DoubleType, DoubleType, DoubleType) @@ -179,6 +230,91 @@ rst_clip | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +----------------------------------------------------------------------------------------------------------------+ +rst_convolve +************ + +.. function:: rst_convolve(tile, kernel) + + Applies a convolution filter to the raster. + The result is a Mosaic raster tile struct column containing the filtered raster. + If checkpointing is used, the result is stored in the configured checkpoint directory. + The :code:`kernel` can be an Array of Arrays of Double, Integer, or Decimal; + ultimately all values are cast to Double. Assumes the kernel is square and has an odd number + of rows and columns. Kernel uses the configured GDAL :code:`blockSize` with a stride being + :code:`kernelSize/2`. + + :param tile: A column containing the raster tile. + :type tile: Column (RasterTileType) + :param kernel: The kernel to apply to the raster.
+ :type kernel: Column (ArrayType(ArrayType(DoubleType))) + :rtype: Column: RasterTileType + + For clarity, this is how the kernel is ultimately applied at each pixel. + + .. code-block:: text + def convolveAt(x: Int, y: Int, kernel: Array[Array[Double]]): Double = { + val kernelWidth = kernel.head.length + val kernelHeight = kernel.length + val kernelCenterX = kernelWidth / 2 + val kernelCenterY = kernelHeight / 2 + var sum = 0.0 + for (i <- 0 until kernelHeight) { + for (j <- 0 until kernelWidth) { + val xIndex = x + (j - kernelCenterX) + val yIndex = y + (i - kernelCenterY) + if (xIndex >= 0 && xIndex < width && yIndex >= 0 && yIndex < height) { + val maskValue = maskAt(xIndex, yIndex) + val value = elementAt(xIndex, yIndex) + if (maskValue != 0.0 && num.toDouble(value) != noDataValue) { + sum += num.toDouble(value) * kernel(i)(j) + } + } + } + } + sum + } + + :example: + +.. tabs:: + .. code-tab:: py + + df.withColumn("convolve_arr", array( + array(lit(1.0), lit(2.0), lit(3.0)), + array(lit(3.0), lit(2.0), lit(1.0)), + array(lit(1.0), lit(3.0), lit(2.0))))\ + .select(rst_convolve("tile", "convolve_arr")).display() + +--------------------------------------------------------------------------+ + | rst_convolve(tile,convolve_arr) | + +--------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | + | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | + +--------------------------------------------------------------------------+ + + ..
code-tab:: scala + + df.withColumn("convolve_arr", array( + array(lit(1.0), lit(2.0), lit(3.0)), + array(lit(3.0), lit(2.0), lit(1.0)), + array(lit(1.0), lit(3.0), lit(2.0)))) + .select(rst_convolve(col("tile"), col("convolve_arr"))).show + +--------------------------------------------------------------------------+ + | rst_convolve(tile,convolve_arr) | + +--------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | + | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | + +--------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT rst_convolve(tile, convolve_arr) FROM table LIMIT 1 + +--------------------------------------------------------------------------+ + | rst_convolve(tile,convolve_arr) | + +--------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | + | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | + +--------------------------------------------------------------------------+ + rst_combineavg ************** @@ -232,7 +368,6 @@ rst_combineavg | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +----------------------------------------------------------------------------------------------------------------+ - rst_derivedband ************** @@ -316,6 +451,54 @@ rst_derivedband | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +----------------------------------------------------------------------------------------------------------------+ +rst_filter +********** + +.. function:: rst_filter(tile,kernel_size,operation) + + Applies a filter to the raster. + Returns a new raster tile with the filter applied. + :code:`kernel_size` is the number of pixels to compare; it must be odd.
+ :code:`operation` is the op to apply, e.g. 'avg', 'median', 'mode', 'max', 'min'. + + :param tile: Mosaic raster tile struct column. + :type tile: Column (RasterTileType) + :param kernel_size: The size of the kernel. Has to be odd. + :type kernel_size: Column (IntegerType) + :param operation: The operation to apply to the kernel. + :type operation: Column (StringType) + :rtype: Column (RasterTileType) + + :example: + +.. tabs:: + .. code-tab:: py + + df.select(rst_filter('tile', lit(3), lit("mode"))).limit(1).display() + +-----------------------------------------------------------------------------------------------------------------------------+ + | rst_filter(tile,3,mode) | + +-----------------------------------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | + +-----------------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + df.select(rst_filter(col("tile"), lit(3), lit("mode"))).limit(1).show + +-----------------------------------------------------------------------------------------------------------------------------+ + | rst_filter(tile,3,mode) | + +-----------------------------------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | + +-----------------------------------------------------------------------------------------------------------------------------+ + + + .. 
code-tab:: sql + + SELECT rst_filter(tile,3,"mode") FROM table LIMIT 1 + +-----------------------------------------------------------------------------------------------------------------------------+ + | rst_filter(tile,3,mode) | + +-----------------------------------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | + +-----------------------------------------------------------------------------------------------------------------------------+ rst_frombands ************** @@ -498,7 +681,7 @@ rst_georeference GT(4) column rotation (typically zero). GT(5) n-s pixel resolution / pixel height (negative value for a north-up image). - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: MapType(StringType, DoubleType) @@ -542,7 +725,7 @@ rest_getnodata Returns the nodata value of the raster tile bands. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: ArrayType(DoubleType) @@ -586,7 +769,7 @@ rst_getsubdataset The name is the last identifier in the subdataset path (FORMAT:PATH:NAME). The subdataset name must be a valid subdataset name for the raster. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param name: A column containing the name of the subdataset to return. :type name: Column (StringType) @@ -629,7 +812,7 @@ rst_height Returns the height of the raster tile in pixels. 
- :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: IntegerType @@ -723,7 +906,7 @@ rst_isempty Returns true if the raster tile is empty. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: BooleanType @@ -760,9 +943,79 @@ rst_isempty |false | +--------------------+ +rst_maketiles +************* + +.. function:: rst_maketiles(input, driver, size_in_mb, with_checkpoint) + + Splits the raster into tiles of the given size. + If the raster is stored on disk, :code:`input` is the path to the raster, + similar to :ref:`rst_fromfile`. + If the raster is stored in memory, :code:`input` is the bytes of the raster, + similar to :ref:`rst_fromcontent`. + If not specified, :code:`driver` is inferred from the file extension; if the + input is a byte array, the driver has to be specified. + If :code:`size_in_mb` is set to -1, the file is loaded and returned as a single + tile; if set to 0, the file is loaded and subdivided into tiles of size 64MB; if set + to a positive value, the file is loaded and subdivided into tiles of the + specified size; if the file is too big to fit in memory, it is subdivided + into tiles of size 64MB. + If :code:`with_checkpoint` is set to true, the tiles are written to the checkpoint + directory; if set to false, the tiles are returned as in-memory byte arrays. + + :param input: path (StringType) or content (BinaryType) + :type input: Column + :param driver: The driver to use for reading the raster. + :type driver: Column(StringType) + :param size_in_mb: The size of the tiles in MB. + :type size_in_mb: Column(IntegerType) + :param with_checkpoint: Whether to use the configured checkpoint location.
+ :type with_checkpoint: Column(BooleanType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. code-tab:: py + + spark.read.format("binaryFile").load(dbfs_dir)\ + .select(rst_maketiles("path")).limit(1).display() + +------------------------------------------------------------------------+ + | tile | + +------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAMAAA (truncated)","metadata":{ | + | "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} | + +------------------------------------------------------------------------+ + + .. code-tab:: scala + + spark.read.format("binaryFile").load(dbfs_dir) + .select(rst_maketiles(col("path"))).limit(1).show + +------------------------------------------------------------------------+ + | tile | + +------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAMAAA (truncated)","metadata":{ | + | "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} | + +------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT rst_maketiles(path) FROM table LIMIT 1 + +------------------------------------------------------------------------+ + | tile | + +------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAMAAA (truncated)","metadata":{ | + | "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} | + +------------------------------------------------------------------------+ + +.. note:: + If initially enabled, checkpointing will remain on for tiles originating from this function, + meaning follow-on calls will also use checkpointing once it is enabled. To switch away + from checkpointing down the line, you could call :ref:`rst_fromfile` using the checkpointed + locations as the :code:`path` input. rst_mapalgebra -******** +************** ..
function:: rst_mapalgebra(tile, json_spec) @@ -818,6 +1071,91 @@ rst_mapalgebra | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +----------------------------------------------------------------------------------------------------------------+ +rst_max +******* + +.. function:: rst_max(tile) + + Returns an array containing maximum values for each band. + The python bindings are available through sql, + e.g. :code:`selectExpr("rst_max(tile)")` + + :param tile: A column containing the raster tile. + :type tile: Column (RasterTileType) + :rtype: Column: ArrayType(DoubleType) + + :example: + +.. tabs:: + .. code-tab:: python + + df.selectExpr("rst_max(tile)").limit(1).display() + +---------------+ + | rst_max(tile) | + +---------------+ + | [42.0] | + +---------------+ + + .. code-tab:: scala + + df.select(rst_max(col("tile"))).limit(1).show + +---------------+ + | rst_max(tile) | + +---------------+ + | [42.0] | + +---------------+ + + .. code-tab:: sql + + SELECT rst_max(tile) FROM table LIMIT 1 + +---------------+ + | rst_max(tile) | + +---------------+ + | [42.0] | + +---------------+ + +rst_median +********** + +.. function:: rst_median(tile) + + Returns an array containing median values for each band. + The python bindings are available through sql, + e.g. :code:`selectExpr("rst_median(tile)")` + + :param tile: A column containing the raster tile. + :type tile: Column (RasterTileType) + :rtype: Column: ArrayType(DoubleType) + + :example: + +.. tabs:: + .. code-tab:: python + + df.selectExpr("rst_median(tile)").limit(1).display() + +------------------+ + | rst_median(tile) | + +------------------+ + | [42.0] | + +------------------+ + + .. code-tab:: scala + + df.select(rst_median(col("tile"))).limit(1).show + +------------------+ + | rst_median(tile) | + +------------------+ + | [42.0] | + +------------------+ + + ..
code-tab:: sql + + SELECT rst_median(tile) FROM table LIMIT 1 + +------------------+ + | rst_median(tile) | + +------------------+ + | [42.0] | + +------------------+ rst_memsize ************* @@ -826,7 +1164,7 @@ rst_memsize Returns size of the raster tile in bytes. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: LongType @@ -917,7 +1255,6 @@ rst_merge | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +----------------------------------------------------------------------------------------------------------------+ - rst_metadata ************* @@ -926,7 +1263,7 @@ rst_metadata Extract the metadata describing the raster tile. Metadata is return as a map of key value pairs. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: MapType(StringType, StringType) @@ -987,6 +1324,49 @@ rst_metadata | "NC_GLOBAL#cdm_data_type": "Grid"} | +--------------------------------------------------------------------------------------------------------------------+ +rst_min +******* + +.. function:: rst_min(tile) + + Returns an array containing minimum values for each band. + The python bindings are available through sql, + e.g. :code:`selectExpr("rst_min(tile)")` + + :param tile: A column containing the raster tile. + :type tile: Column (RasterTileType) + :rtype: Column: ArrayType(DoubleType) + + :example: + +.. tabs:: + .. code-tab:: python + + df.selectExpr("rst_min(tile)").limit(1).display() + +---------------+ + | rst_min(tile) | + +---------------+ + | [42.0] | + +---------------+ + + ..
code-tab:: scala + + df.select(rst_min(col("tile"))).limit(1).show + +---------------+ + | rst_min(tile) | + +---------------+ + | [42.0] | + +---------------+ + + .. code-tab:: sql + + SELECT rst_min(tile) FROM table LIMIT 1 + +---------------+ + | rst_min(tile) | + +---------------+ + | [42.0] | + +---------------+ + rst_ndvi ******** @@ -1044,7 +1424,7 @@ rst_numbands Returns number of bands in the raster tile. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: IntegerType @@ -1081,6 +1461,49 @@ rst_numbands | 1 | +---------------------+ +rst_pixelcount +*************** + +.. function:: rst_pixelcount(tile) + + Returns an array containing valid pixel count values for each band. + The python bindings are available through sql, + e.g. :code:`selectExpr("rst_pixelcount(tile)")` + + :param tile: A column containing the raster tile. + :type tile: Column (RasterTileType) + :rtype: Column: ArrayType(LongType) + + :example: + +.. tabs:: + .. code-tab:: py + + df.select(mos.rst_pixelcount('tile')).display() + +----------------------+ + | rst_pixelcount(tile) | + +----------------------+ + | [120560172] | + +----------------------+ + + .. code-tab:: scala + + df.select(rst_pixelcount(col("tile"))).show + +----------------------+ + | rst_pixelcount(tile) | + +----------------------+ + | [120560172] | + +----------------------+ + + .. code-tab:: sql + + SELECT rst_pixelcount(tile) FROM table + +----------------------+ + | rst_pixelcount(tile) | + +----------------------+ + | [120560172] | + +----------------------+ + rst_pixelheight *************** @@ -1088,7 +1511,7 @@ rst_pixelheight Returns the height of the pixel in the raster tile derived via GeoTransform. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. 
+ :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -1132,7 +1555,7 @@ rst_pixelwidth Returns the width of the pixel in the raster tile derived via GeoTransform. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -1179,7 +1602,7 @@ rst_rastertogridavg CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the average of the pixel values in the cell. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param resolution: A resolution of the grid index system. :type resolution: Column (IntegerType) @@ -1248,7 +1671,7 @@ rst_rastertogridcount CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the average of the pixel values in the cell. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param resolution: A resolution of the grid index system. :type resolution: Column (IntegerType) @@ -1317,7 +1740,7 @@ rst_rastertogridmax CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the maximum pixel value. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param resolution: A resolution of the grid index system. 
:type resolution: Column (IntegerType) @@ -1386,7 +1809,7 @@ rst_rastertogridmedian CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the median pixel value. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param resolution: A resolution of the grid index system. :type resolution: Column (IntegerType) @@ -1455,7 +1878,7 @@ rst_rastertogridmin CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the median pixel value. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param resolution: A resolution of the grid index system. :type resolution: Column (IntegerType) @@ -1523,7 +1946,7 @@ rst_rastertoworldcoord The result is a WKT point geometry. The coordinates are computed using the GeoTransform of the raster to respect the projection. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param x: x coordinate of the pixel. :type x: Column (IntegerType) @@ -1569,7 +1992,7 @@ rst_rastertoworldcoordx Computes the world coordinates of the raster tile at the given x and y pixel coordinates. The result is the X coordinate of the point after applying the GeoTransform of the raster. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param x: x coordinate of the pixel. 
:type x: Column (IntegerType) @@ -1615,7 +2038,7 @@ rst_rastertoworldcoordy Computes the world coordinates of the raster tile at the given x and y pixel coordinates. The result is the X coordinate of the point after applying the GeoTransform of the raster. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param x: x coordinate of the pixel. :type x: Column (IntegerType) @@ -1663,7 +2086,7 @@ rst_retile The results are the paths to the new rasters. The result set is automatically exploded. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param width: The width of the tiles. :type width: Column (IntegerType) @@ -1713,7 +2136,7 @@ rst_rotation The rotation is the angle between the X axis and the North axis. The rotation is computed using the GeoTransform of the raster. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -1757,7 +2180,7 @@ rst_scalex Computes the scale of the raster tile in the X direction. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -1798,7 +2221,7 @@ rst_scaley Computes the scale of the raster tile in the Y direction. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :param tile: A column containing the raster tile. 
:type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -1832,6 +2255,55 @@ rst_scaley | 1.2 | +------------------------------------------------------------------------------------------------------------------+ +rst_separatebands +***************** + +.. function:: rst_separatebands(tile) + + Returns a set of new single-band rasters, one for each band in the input raster. + Result set is automatically exploded based on how many bands exist. + Prior to the explode, you may want to maintain a column in the dataframe with a raster identifier. + + :param tile: A column containing the raster tile. + :type tile: Column (RasterTileType) + :rtype: Column: (RasterTileType) + + :example: + +.. tabs:: + .. code-tab:: py + + df.select(mos.rst_separatebands('tile')).display() + +--------------------------------------------------------------------------------------------------------------------------------+ + | tile | + +--------------------------------------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | + | "metadata":{"path":"....tif","last_error":"","all_parents":"no_path","driver":"GTiff","bandIndex":"1","parentPath":"no_path", | + | "last_command":"gdal_translate -of GTiff -b 1 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} | + +--------------------------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: scala + + df.select(rst_separatebands(col("tile"))).show + +--------------------------------------------------------------------------------------------------------------------------------+ + | tile | + +--------------------------------------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | + | "metadata":{"path":"....tif","last_error":"","all_parents":"no_path","driver":"GTiff","bandIndex":"1","parentPath":"no_path", | + | "last_command":"gdal_translate -of GTiff -b 1 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} | + +--------------------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT rst_separatebands(tile) FROM table + +--------------------------------------------------------------------------------------------------------------------------------+ + | tile | + +--------------------------------------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | + | "metadata":{"path":"....tif","last_error":"","all_parents":"no_path","driver":"GTiff","bandIndex":"1","parentPath":"no_path", | + | "last_command":"gdal_translate -of GTiff -b 1 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} | + +--------------------------------------------------------------------------------------------------------------------------------+ + rst_setnodata ********************** @@ -1842,7 +2314,7 @@ rst_setnodata The same nodata value is set for all bands of the raster if a single value is passed. If an array of values is passed, the nodata value is set for each band of the raster. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. 
+ :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param nodata: The nodata value to set. :type nodata: Column (DoubleType) / ArrayType(DoubleType) @@ -1889,7 +2361,7 @@ rst_skewx Computes the skew of the raster tile in the X direction. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -1930,7 +2402,7 @@ rst_skewy Computes the skew of the raster tile in the Y direction. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -1974,7 +2446,7 @@ rst_srid .. note:: For complex CRS definition the EPSG code may default to 0. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -2017,7 +2489,7 @@ rst_subdatasets The subdatasets are the paths to the subdatasets of the raster. The result is a map of the subdataset path to the subdatasets and the description of the subdatasets. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: MapType(StringType, StringType) @@ -2073,7 +2545,7 @@ rst_subdivide .. note:: The size of the tiles is approximate. 
Due to compressions and other effects we cannot guarantee the size of the tiles in MB. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param size_in_MB: The size of the tiles in MB. :type size_in_MB: Column (IntegerType) @@ -2122,7 +2594,7 @@ rst_summary The logic is produced by gdalinfo procedure. The result is stored as JSON. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: MapType(StringType, StringType) @@ -2271,6 +2743,58 @@ rst_tooverlappingtiles | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | +------------------------------------------------------------------------------------------------------------------+ +rst_transform +********************** + +.. function:: rst_transform(tile,srid) + + Transforms the raster to the given SRID. + The result is a Mosaic raster tile struct of the transformed raster. + If using checkpointing, the result will be stored there. + + :param tile: A column containing the raster tile. + :type tile: Column (RasterTileType) + :param srid: EPSG authority code for the file's projection. + :type srid: Column (IntegerType) + :rtype: Column: (RasterTileType) + + :example: + +.. tabs:: + .. 
code-tab:: py + + df.select(mos.rst_transform('tile', lit(4326))).display() + +----------------------------------------------------------------------------------------------------+ + | rst_transform(tile,4326) | + +----------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","last_error":"", | + | "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", | + | "last_command":"gdalwarp -t_srs EPSG:4326 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} | + +----------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + df.select(rst_transform(col("tile"), lit(4326))).show + +----------------------------------------------------------------------------------------------------+ + | rst_transform(tile,4326) | + +----------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... .tif","last_error":"", | + | "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", | + | "last_command":"gdalwarp -t_srs EPSG:4326 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} | + +----------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT rst_transform(tile,4326) FROM table + +----------------------------------------------------------------------------------------------------+ + | rst_transform(tile,4326) | + +----------------------------------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)","metadata":{"path":"... 
.tif","last_error":"", | + | "all_parents":"no_path","driver":"GTiff","parentPath":"no_path", | + | "last_command":"gdalwarp -t_srs EPSG:4326 -of GTiff -co TILED=YES -co COMPRESS=DEFLATE"}} | + +----------------------------------------------------------------------------------------------------+ + + rst_tryopen ********************** @@ -2320,7 +2844,7 @@ rst_upperleftx Computes the upper left X coordinate of the raster tile. The value is computed based on GeoTransform. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -2362,7 +2886,7 @@ rst_upperlefty Computes the upper left Y coordinate of the raster tile. The value is computed based on GeoTransform. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: DoubleType @@ -2404,7 +2928,7 @@ rst_width Computes the width of the raster tile in pixels. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: IntegerType @@ -2448,7 +2972,7 @@ rst_worldtorastercoord The world coordinates are the coordinates in the CRS of the raster. The coordinates are resolved using GeoTransform. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. 
:type tile: Column (RasterTileType) :param xworld: X world coordinate. :type xworld: Column (DoubleType) @@ -2498,7 +3022,7 @@ rst_worldtorastercoordx This method returns the X coordinate. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param xworld: X world coordinate. :type xworld: Column (DoubleType) @@ -2548,7 +3072,7 @@ rst_worldtorastercoordy This method returns the Y coordinate. - :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param xworld: X world coordinate. :type xworld: Column (DoubleType) diff --git a/docs/source/api/spatial-aggregations.rst b/docs/source/api/spatial-aggregations.rst index e1184cd2f..743d0b600 100644 --- a/docs/source/api/spatial-aggregations.rst +++ b/docs/source/api/spatial-aggregations.rst @@ -2,6 +2,115 @@ Spatial aggregation functions ============================= + +st_asgeojsontile_agg +******************** + +.. function:: st_asgeojsontile_agg(geom, attributes) + + Generates GeoJSON vector tiles from a group by statement over aggregated geometry column. + "Geom" column is WKB, WKT, or GeoJSON. + "Attributes" column is a spark struct; it requires minimally "id". + + :param geom: A grouped column containing geometries. + :type geom: Column + :param attributes: the attributes column to aggregate. + :type attributes: Column(StructType) + :rtype: Column + + :example: + +.. tabs:: + .. 
code-tab:: py + + df.groupBy()\ + .agg(mos.st_asgeojsontile_agg("geom", struct("id"))).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | st_asgeojsontile_agg(geom, struct(id)) | + +----------------------------------------------------------------------------------------------------------------+ + | {"type": "FeatureCollection", "name": "tiles", "crs": { | + | "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [ ... ] } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + df.groupBy() + .agg(st_asgeojsontile_agg(col("geom"), struct(col("id")))).limit(1).show + +----------------------------------------------------------------------------------------------------------------+ + | st_asgeojsontile_agg(geom, struct(id)) | + +----------------------------------------------------------------------------------------------------------------+ + | {"type": "FeatureCollection", "name": "tiles", "crs": { | + | "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [ ... ] } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT st_asgeojsontile_agg(geom, struct(id)) + FROM table + GROUP BY 1 + +----------------------------------------------------------------------------------------------------------------+ + | st_asgeojsontile_agg(geom, struct(id)) | + +----------------------------------------------------------------------------------------------------------------+ + | {"type": "FeatureCollection", "name": "tiles", "crs": { | + | "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [ ...
] } | + +----------------------------------------------------------------------------------------------------------------+ + + +st_asmvttile_agg +******************** + +.. function:: st_asmvttile_agg(geom, attributes, zxyID) + + Generates Mapbox Vector Tiles from a group by statement over aggregated geometry column. + "Geom" column is Mosaic Internal Geometry, e.g. using ST_GeomFrom[WKB|WKT|GeoJSON]. + The geometry that you work on requires an SRID, recommend using ST_UpdateSRID, + e.g. from 4326 to 3857 (required SRID). + "Attributes" column is a spark struct; it requires minimally "id". + "zxyID" column is a string. + + :param geom: A grouped column containing geometries. + :type geom: Column + :param attributes: the attributes column to aggregate. + :type attributes: Column(StructType) + :param zxyID: the zxyID column to aggregate. + :type zxyID: Column(StringType) + :rtype: Column + + :example: + +.. tabs:: + .. code-tab:: py + + df.groupBy()\ + .agg(mos.st_asmvttile_agg("geom_3857", struct("id"), "zxyID")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | st_asmvttile_agg(geom_3857, struct(id), zxyID) | + +----------------------------------------------------------------------------------------------------------------+ + | H4sIAAAAAAAAA5Ny5GItycxJLRZSFmJiYJBgVpLmfKXxwySIgYmZg5mJkZGRgYGRiZGFFYgZ+KWYMlOUuDQavk05e+ntl1fCGg0KFUwA... | + +----------------------------------------------------------------------------------------------------------------+ + + ..
code-tab:: scala + + df.groupBy() + .agg(st_asmvttile_agg(col("geom_3857"), struct(col("id")), col("zxyID"))).limit(1).show + +----------------------------------------------------------------------------------------------------------------+ + | st_asmvttile_agg(geom_3857, struct(id), zxyID) | + +----------------------------------------------------------------------------------------------------------------+ + | H4sIAAAAAAAAA5Ny5GItycxJLRZSFmJiYJBgVpLmfKXxwySIgYmZg5mJkZGRgYGRiZGFFYgZ+KWYMlOUuDQavk05e+ntl1fCGg0KFUwA... | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT st_asmvttile_agg(geom_3857, struct(id), zxyID) + FROM table + GROUP BY 1 + +----------------------------------------------------------------------------------------------------------------+ + | st_asmvttile_agg(geom_3857, struct(id), zxyID) | + +----------------------------------------------------------------------------------------------------------------+ + | H4sIAAAAAAAAA5Ny5GItycxJLRZSFmJiYJBgVpLmfKXxwySIgYmZg5mJkZGRgYGRiZGFFYgZ+KWYMlOUuDQavk05e+ntl1fCGg0KFUwA... | + +----------------------------------------------------------------------------------------------------------------+ + + rst_combineavg_agg ***************** diff --git a/docs/source/api/spatial-functions.rst b/docs/source/api/spatial-functions.rst index f5c3b76fd..5430e9c42 100644 --- a/docs/source/api/spatial-functions.rst +++ b/docs/source/api/spatial-functions.rst @@ -1367,7 +1367,11 @@ st_setsrid .. note:: :ref:`st_setsrid` does not transform the coordinates of :code:`geom`, rather it tells Mosaic the SRID in which the current coordinates are expressed. - :ref:`st_setsrid` can only operate on geometries encoded in GeoJSON.
+ **Changed in 0.4 series** :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on + Mosaic Internal Geometry across language bindings, so we recommend calling :ref:`st_geomfromwkt` or :ref:`st_geomfromwkb` + to convert from WKT and WKB. You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. + Alternatively, you can use :ref:`st_updatesrid` to transform WKT, WKB, GeoJSON, or Mosaic Internal Geometry + by specifying the :code:`srcSRID` and :code:`destSRID`. st_simplify *********** @@ -1443,47 +1447,51 @@ st_srid json_geom = '{"type":"MultiPoint","coordinates":[[10,40],[40,30],[20,20],[30,10]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}' df = spark.createDataFrame([{'json': json_geom}]) - df.select(st_srid(as_json('json'))).show(1) - +----------------------+ - |st_srid(as_json(json))| - +----------------------+ - | 4326| - +----------------------+ + df.select(st_srid(st_geomfromgeojson('json'))).show(1) + +--------------------------------------------+ + | st_srid(st_geomfromgeojson(as_json(json))) | + +--------------------------------------------+ + | 4326 | + +--------------------------------------------+ .. code-tab:: scala val df = List("""{"type":"MultiPoint","coordinates":[[10,40],[40,30],[20,20],[30,10]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}""") .toDF("json") - df.select(st_srid(as_json(col("json")))).show(1) - +----------------------+ - |st_srid(as_json(json))| - +----------------------+ - | 4326| - +----------------------+ - + df.select(st_srid(st_geomfromgeojson(col("json")))).show(1) + +--------------------------------------------+ + | st_srid(st_geomfromgeojson(as_json(json))) | + +--------------------------------------------+ + | 4326 | + +--------------------------------------------+ + ..
code-tab:: sql select st_srid(as_json('{"type":"MultiPoint","coordinates":[[10,40],[40,30],[20,20],[30,10]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}')) - +------------+ - |st_srid(...)| - +------------+ - |4326 | - +------------+ - + +--------------------------------------------+ + | st_srid(st_geomfromgeojson(as_json(...))) | + +--------------------------------------------+ + | 4326 | + +--------------------------------------------+ + .. code-tab:: r R json_geom <- '{"type":"MultiPoint","coordinates":[[10,40],[40,30],[20,20],[30,10]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}' df <- createDataFrame(data.frame(json=json_geom)) - showDF(select(df, st_srid(as_json(column('json'))))) - +------------+ - |st_srid(...)| - +------------+ - |4326 | - +------------+ + showDF(select(df, st_srid(st_geomfromgeojson(column('json'))))) + +--------------+ + | st_srid(...) | + +--------------+ + | 4326 | + +--------------+ .. note:: - :ref:`st_srid` can only operate on geometries encoded in GeoJSON. + **Changed in 0.4 series** :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on + Mosaic Internal Geometry across language bindings, so we recommend calling :ref:`st_geomfromwkt` or :ref:`st_geomfromwkb` + to convert from WKT and WKB. You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. + Alternatively, you can use :ref:`st_updatesrid` to transform WKT, WKB, GeoJSON, or Mosaic Internal Geometry + by specifying the :code:`srcSRID` and :code:`destSRID`. st_transform @@ -1492,8 +1500,8 @@ st_transform .. function:: st_transform(col, srid) Transforms the horizontal (XY) coordinates of :code:`geom` from the current reference system to that described by :code:`srid`. - - + Recommend use of Mosaic Internal Geometry for the transform, + then convert to desired interchange format [WKB, WKT, GeoJSON] afterwards.
:param col: Geometry :type col: Column @@ -1508,7 +1516,7 @@ st_transform df = ( spark.createDataFrame([{'wkt': 'MULTIPOINT ((10 40), (40 30), (20 20), (30 10))'}]) - .withColumn('geom', st_setsrid(st_asgeojson('wkt'), lit(4326))) + .withColumn('geom', st_setsrid(st_geomfromwkt('wkt'), lit(4326))) ) df.select(st_astext(st_transform('geom', lit(3857)))).show(1, False) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ @@ -1520,7 +1528,7 @@ st_transform .. code-tab:: scala val df = List("MULTIPOINT ((10 40), (40 30), (20 20), (30 10))").toDF("wkt") - .withColumn("geom", st_setsrid(st_asgeojson(col("wkt")), lit(4326))) + .withColumn("geom", st_setsrid(st_geomfromwkt(col("wkt")), lit(4326))) df.select(st_astext(st_transform(col("geom"), lit(3857)))).show(1, false) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |convert_to(st_transform(geom, 3857)) | @@ -1530,7 +1538,7 @@ st_transform .. code-tab:: sql - select st_astext(st_transform(st_setsrid(st_asgeojson("MULTIPOINT ((10 40), (40 30), (20 20), (30 10))"), 4326) as geom, 3857)) + select st_astext(st_transform(st_setsrid(st_geomfromwkt("MULTIPOINT ((10 40), (40 30), (20 20), (30 10))"), 4326) as geom, 3857)) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |convert_to(st_transform(geom, 3857)) | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ @@ -1540,7 +1548,7 @@ st_transform .. 
code-tab:: r R df <- createDataFrame(data.frame(wkt = "MULTIPOINT ((10 40), (40 30), (20 20), (30 10))")) - df <- withColumn(df, 'geom', st_setsrid(st_asgeojson(column('wkt')), lit(4326L))) + df <- withColumn(df, 'geom', st_setsrid(st_geomfromwkt(column('wkt')), lit(4326L))) showDF(select(df, st_astext(st_transform(column('geom'), lit(3857L)))), truncate=F) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ @@ -1551,9 +1559,11 @@ st_transform .. note:: If :code:`geom` does not have an associated SRID, use :ref:`st_setsrid` to set this before calling :ref:`st_transform`. - **Changed in 0.4 series** :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` only operate on - GeoJSON (columnar) data, so be sure to call :ref:`st_asgeojson` to convert from WKT and WKB. You can convert - back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. + **Changed in 0.4 series** :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on + Mosaic Internal Geometry across language bindings, so we recommend calling :ref:`st_geomfromwkt` or :ref:`st_geomfromwkb` + to convert from WKT and WKB. You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. + Alternatively, you can use :ref:`st_updatesrid` to transform WKT, WKB, GeoJSON, or Mosaic Internal Geometry + by specifying the :code:`srcSRID` and :code:`destSRID`. st_translate @@ -1722,6 +1732,77 @@ st_unaryunion |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ + +st_updatesrid +************* + +.. function:: st_updatesrid(geom, srcSRID, destSRID) + + Updates the SRID of the input geometry `geom` from `srcSRID` to `destSRID`. + Geometry can be any supported [WKT, WKB, GeoJSON, Mosaic Internal Geometry].
+ Transformed geometry in the provided format is returned. + + :param geom: Geometry to update the SRID + :type geom: Column + :param srcSRID: Original SRID + :type srcSRID: Column: Integer + :param destSRID: New SRID + :type destSRID: Column: Integer + :rtype: Column + + :example: + +.. tabs:: + .. code-tab:: py + + spark.createDataFrame([ + ["""POLYGON ((12.1773911 66.2559307, 12.1773712 66.2558954, 12.177202 66.2557779, 12.1770325 66.2557476, 12.1769472 66.2557593, + 12.1769162 66.2557719, 12.1769186 66.2557965, 12.1770058 66.2558191, 12.1771788 66.2559348, 12.1772692 66.2559828, + 12.1773634 66.2559793, 12.1773911 66.2559307))"""]], ["geom_wkt"])\ + .select(mos.st_updatesrid("geom_wkt", F.lit(4326), F.lit(3857))).display() + +---------------------------------------------------------------+ + | st_updatesrid(geom_wkt, CAST(4326 AS INT), CAST(3857 AS INT)) | + +---------------------------------------------------------------+ + | POLYGON ((1355580.9764425415 9947245.380472444, ... )) | + +---------------------------------------------------------------+ + + .. code-tab:: scala + + val df = List("""POLYGON ((12.1773911 66.2559307, 12.1773712 66.2558954, 12.177202 66.2557779, 12.1770325 66.2557476, + 12.1769472 66.2557593, 12.1769162 66.2557719, 12.1769186 66.2557965, 12.1770058 66.2558191, 12.1771788 66.2559348, + 12.1772692 66.2559828, 12.1773634 66.2559793, 12.1773911 66.2559307))""").toDF("geom_wkt") + df.select(st_updatesrid(col("geom_wkt"), lit(4326), lit(3857))).show + +---------------------------------------------------------------+ + | st_updatesrid(geom_wkt, CAST(4326 AS INT), CAST(3857 AS INT)) | + +---------------------------------------------------------------+ + | POLYGON ((1355580.9764425415 9947245.380472444, ... )) | + +---------------------------------------------------------------+ + + .. 
code-tab:: sql + + select st_updatesrid(geom_wkt, 4326, 3857) + from ( + select """POLYGON ((12.1773911 66.2559307, 12.1773712 66.2558954, 12.177202 66.2557779, 12.1770325 66.2557476, + 12.1769472 66.2557593, 12.1769162 66.2557719, 12.1769186 66.2557965, 12.1770058 66.2558191, 12.1771788 66.2559348, + 12.1772692 66.2559828, 12.1773634 66.2559793, 12.1773911 66.2559307))""" as geom_wkt + ) + +---------------------------------------------------------------+ + | st_updatesrid(geom_wkt, CAST(4326 AS INT), CAST(3857 AS INT)) | + +---------------------------------------------------------------+ + | POLYGON ((1355580.9764425415 9947245.380472444, ... )) | + +---------------------------------------------------------------+ + + .. code-tab:: r R + + df <- createDataFrame(data.frame(geom_wkt = "POLYGON (( ... ))")) + showDF(select(df, st_updatesrid(column("geom_wkt"), lit(4326L), lit(3857L))), truncate=F) + +---------------------------------------------------------------+ + | st_updatesrid(geom_wkt, CAST(4326 AS INT), CAST(3857 AS INT)) | + +---------------------------------------------------------------+ + | POLYGON ((1355580.9764425415 9947245.380472444, ... )) | + +---------------------------------------------------------------+ + + st_x **** diff --git a/docs/source/usage/install-gdal.rst b/docs/source/usage/install-gdal.rst index e919940aa..310f41276 100644 --- a/docs/source/usage/install-gdal.rst +++ b/docs/source/usage/install-gdal.rst @@ -108,4 +108,26 @@ code at the top of the notebook: .. note:: You can configure init script from default ubuntu GDAL (3.4.1) to ubuntugis ppa @ https://launchpad.net/~ubuntugis/+archive/ubuntu/ppa (3.4.3) - with `setup_gdal(with_ubuntugis=True)` \ No newline at end of file + with `setup_gdal(with_ubuntugis=True)` + +GDAL Configuration +#################### + +Here are spark session configs available for raster, e.g. :code:`spark.conf.set("", "")`. + +..
list-table:: Raster Spark session configs + :widths: 25 25 50 + :header-rows: 1 + + * - Config + - Default + - Comments + * - spark.databricks.labs.mosaic.raster.checkpoint + - "/dbfs/tmp/mosaic/raster/checkpoint" + - Checkpoint location, see :ref:`rst_maketiles` for more + * - spark.databricks.labs.mosaic.raster.tmp.prefix + - "" (will use "/tmp") + - Local directory for workers + * - spark.databricks.labs.mosaic.raster.blocksize + - "128" + - Blocksize in pixels, see :ref:`rst_convolve` and :ref:`rst_filter` for more From 63ffc7efd3b206b48ad79718017af0950006c1e7 Mon Sep 17 00:00:00 2001 From: Michael Johns Date: Wed, 3 Apr 2024 08:57:54 -0400 Subject: [PATCH 2/7] Consistent formatting of param references. --- docs/source/api/spatial-aggregations.rst | 10 +++++----- docs/source/api/spatial-functions.rst | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/source/api/spatial-aggregations.rst b/docs/source/api/spatial-aggregations.rst index 743d0b600..ae58dd1dc 100644 --- a/docs/source/api/spatial-aggregations.rst +++ b/docs/source/api/spatial-aggregations.rst @@ -9,8 +9,8 @@ st_asgeojsontile_agg .. function:: st_asgeojsontile_agg(geom, attributes) Generates GeoJSON vector tiles from a group by statement over aggregated geometry column. - "Geom" column is WKB, WKT, or GeoJSON. - "Attributes" column is a spark struct; it requires minimally "id". + :code:`geom` column is WKB, WKT, or GeoJSON. + :code:`attributes` column is a spark struct; it requires minimally "id". :param geom: A grouped column containing geometries. :type geom: Column @@ -62,11 +62,11 @@ st_asmvttile_agg .. function:: st_asmvttile_agg(geom, attributes, zxyID) Generates Mapbox Vector Tiles from a group by statement over aggregated geometry column. - "Geom" column is Mosaic Internal Geometry, e.g. using ST_GeomFrom[WKB|WKT|GeoJSON]. + :code:`Geom` column is Mosaic Internal Geometry, e.g. using ST_GeomFrom[WKB|WKT|GeoJSON].
The geometry that you work on requires an SRID, recommend using ST_UpdateSRID, e.g. from 4326 to 3857 (required SRID). - "Attributes" column is a spark struct; it requires minimally "id". - "zxyID" column is a string. + :code:`Attributes` column is a spark struct; it requires minimally "id". + :code:`zxyID` column is a string. :param geom: A grouped column containing geometries. :type geom: Column diff --git a/docs/source/api/spatial-functions.rst b/docs/source/api/spatial-functions.rst index 5430e9c42..19310964f 100644 --- a/docs/source/api/spatial-functions.rst +++ b/docs/source/api/spatial-functions.rst @@ -1738,7 +1738,7 @@ st_updatesrid .. function:: st_updatesrid(geom, srcSRID, destSRID) - Updates the SRID of the input geometry `geom` from `srcSRID` to `destSRID`. + Updates the SRID of the input geometry :code:`geom` from :code:`srcSRID` to :code:`destSRID`. Geometry can be any supported [WKT, WKB, GeoJSON, Mosaic Internal Geometry]. Transformed geometry in the provided format is returned. From 5b78130d2497a53ea903a0726e2af08467728568 Mon Sep 17 00:00:00 2001 From: Michael Johns Date: Wed, 3 Apr 2024 08:59:23 -0400 Subject: [PATCH 3/7] another formatting fix. --- docs/source/api/spatial-aggregations.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/api/spatial-aggregations.rst b/docs/source/api/spatial-aggregations.rst index ae58dd1dc..e14ab1fa2 100644 --- a/docs/source/api/spatial-aggregations.rst +++ b/docs/source/api/spatial-aggregations.rst @@ -62,10 +62,10 @@ st_asmvttile_agg .. function:: st_asmvttile_agg(geom, attributes, zxyID) Generates Mapbox Vector Tiles from a group by statement over aggregated geometry column. - :code:`Geom` column is Mosaic Internal Geometry, e.g. using ST_GeomFrom[WKB|WKT|GeoJSON]. + :code:`geom` column is Mosaic Internal Geometry, e.g. using ST_GeomFrom[WKB|WKT|GeoJSON]. The geometry that you work on requires an SRID, recommend using ST_UpdateSRID, e.g. from 4326 to 3857 (required SRID).
- :code:`Attributes` column is a spark struct; it requires minimally "id". + :code:`attributes` column is a spark struct; it requires minimally "id". :code:`zxyID` column is a string. :param geom: A grouped column containing geometries. From 29abe288048efc3b2da45ecf19be2a387d9d17fe Mon Sep 17 00:00:00 2001 From: Stuart Lynn Date: Fri, 5 Apr 2024 17:22:49 +0100 Subject: [PATCH 4/7] fixed convolve examples --- docs/source/api/raster-functions.rst | 209 ++++++++++++++------------- 1 file changed, 107 insertions(+), 102 deletions(-) diff --git a/docs/source/api/raster-functions.rst b/docs/source/api/raster-functions.rst index 85c0009e6..fbaa1675e 100644 --- a/docs/source/api/raster-functions.rst +++ b/docs/source/api/raster-functions.rst @@ -205,115 +205,31 @@ rst_clip .. tabs:: .. code-tab:: py - df.select(mos.rst_clip("tile", F.lit("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"))).limit(1).display() - +----------------------------------------------------------------------------------------------------------------+ - | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ + df.select(mos.rst_clip("tile", F.lit("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"))).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ .. code-tab:: scala - df.select(rst_clip(col("tile"), lit("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"))).limit(1).show - +----------------------------------------------------------------------------------------------------------------+ - | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | - +-----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +-----------------------------------------------------------------------------------------------------------------+ + df.select(rst_clip(col("tile"), lit("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"))).limit(1).show + +----------------------------------------------------------------------------------------------------------------+ + | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | + +-----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +-----------------------------------------------------------------------------------------------------------------+ .. code-tab:: sql - SELECT rst_clip(tile, "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))") FROM table LIMIT 1 - +----------------------------------------------------------------------------------------------------------------+ - | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - -rst_convolve -************ - -.. function:: rst_convolve(tile, kernel) - - Applies a convolution filter to the raster. - The result is Mosaic raster tile struct column to the filtered raster. - If used, the result is stored in the configured checkpoint directory. - The :code:`kernel` can be Array of Array of either Double, Integer, or Decimal; - ultimately all is cast to Double. Assumes the kernel is square and has an odd number - of rows and columns. Kernel uses the configured GDAL :code:`blockSize`` with a stride being - :code:`kernelSize/2`. - - :param tile: A column containing raster tile. - :type tile: Column (RasterTileType) - :param kernel: The kernel to apply to the raster. - :type kernel: Column (ArrayType(ArrayType(DoubleType))) - :rtype: Column: RasterTileType - - For clarity, this is ultimately the execution of the kernel. - - .. code-block:: text - def convolveAt(x: Int, y: Int, kernel: Array[Array[Double]]): Double = { - val kernelWidth = kernel.head.length - val kernelHeight = kernel.length - val kernelCenterX = kernelWidth / 2 - val kernelCenterY = kernelHeight / 2 - var sum = 0.0 - for (i <- 0 until kernelHeight) { - for (j <- 0 until kernelWidth) { - val xIndex = x + (j - kernelCenterX) - val yIndex = y + (i - kernelCenterY) - if (xIndex >= 0 && xIndex < width && yIndex >= 0 && yIndex < height) { - val maskValue = maskAt(xIndex, yIndex) - val value = elementAt(xIndex, yIndex) - if (maskValue != 0.0 && num.toDouble(value) != noDataValue) { - sum += num.toDouble(value) * kernel(i)(j) - } - } - } - } - sum - } - - :example: - -.. tabs:: - .. 
code-tab:: py - - df.withColumn("convolve_arr", array( - array(lit(1.0), lit(2.0), lit(3.0)), - array(lit(3.0), lit(2.0), lit(1.0)), - array(lit(1.0), lit(3.0), lit(2.0)))) - .select(rst_convolve("tile", "convolve_arr").display() - +--------------------------------------------------------------------------+ - | rst_convole(tile,convolve_arr) | - +--------------------------------------------------------------------------+ - | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | - | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | - +--------------------------------------------------------------------------+ - - .. code-tab:: scala - - df.withColumn("convolve_arr", array( - array(lit(1.0), lit(2.0), lit(3.0)), - array(lit(3.0), lit(2.0), lit(1.0)), - array(lit(1.0), lit(3.0), lit(2.0)))) - .select(rst_convolve(col("tile"), col("convolve_arr")).show - +--------------------------------------------------------------------------+ - | rst_convole(tile,convolve_arr) | - +--------------------------------------------------------------------------+ - | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | - | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | - +--------------------------------------------------------------------------+ - - .. code-tab:: sql + SELECT rst_clip(tile, "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))") FROM table LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ - SELECT rst_convolve(tile, convolve_arr) FROM table LIMIT 1 - +--------------------------------------------------------------------------+ - | rst_convolve(tile,convolve_arr) | - +--------------------------------------------------------------------------+ - | {"index_id":null,"raster":"SUkqAAg...= (truncated)", | - | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}} | - +--------------------------------------------------------------------------+ rst_combineavg ************** @@ -368,6 +284,95 @@ rst_combineavg | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +----------------------------------------------------------------------------------------------------------------+ +rst_convolve +************ + +.. function:: rst_convolve(tile, kernel) + + Applies a convolution filter to the raster. + The result is Mosaic raster tile struct column to the filtered raster. + If used, the result is stored in the configured checkpoint directory. + The :code:`kernel` can be Array of Array of either Double, Integer, or Decimal; + ultimately all is cast to Double. Assumes the kernel is square and has an odd number + of rows and columns. Kernel uses the configured GDAL :code:`blockSize`` with a stride being + :code:`kernelSize/2`. + + :param tile: A column containing raster tile. + :type tile: Column (RasterTileType) + :param kernel: The kernel to apply to the raster. + :type kernel: Column (ArrayType(ArrayType(DoubleType))) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. 
code-tab:: py + + df\ + .withColumn("convolve_arr", array( + array(lit(1.0), lit(2.0), lit(3.0)), + array(lit(3.0), lit(2.0), lit(1.0)), + array(lit(1.0), lit(3.0), lit(2.0))))\ + .select(rst_convolve("tile", "convolve_arr")).display() + +---------------------------------------------------------------------------+ + | rst_convolve(tile,convolve_arr)                                           | + +---------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)",                      | + | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}}   | + +---------------------------------------------------------------------------+ + + .. code-tab:: scala + + df + .withColumn("convolve_arr", array( + array(lit(1.0), lit(2.0), lit(3.0)), + array(lit(3.0), lit(2.0), lit(1.0)), + array(lit(1.0), lit(3.0), lit(2.0))) + ) + .select(rst_convolve(col("tile"), col("convolve_arr"))).show + +---------------------------------------------------------------------------+ + | rst_convolve(tile,convolve_arr)                                           | + +---------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)",                      | + | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}}   | + +---------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT rst_convolve(tile, convolve_arr) FROM table LIMIT 1 + +---------------------------------------------------------------------------+ + | rst_convolve(tile,convolve_arr)                                           | + +---------------------------------------------------------------------------+ + | {"index_id":null,"raster":"SUkqAAg...= (truncated)",                      | + | "metadata":{"path":"... .tif","parentPath":"no_path","driver":"GTiff"}}   | + +---------------------------------------------------------------------------+ + +For clarity, this is ultimately the execution of the kernel. + + .. 
code-block:: scala + + def convolveAt(x: Int, y: Int, kernel: Array[Array[Double]]): Double = { + val kernelWidth = kernel.head.length + val kernelHeight = kernel.length + val kernelCenterX = kernelWidth / 2 + val kernelCenterY = kernelHeight / 2 + var sum = 0.0 + for (i <- 0 until kernelHeight) { + for (j <- 0 until kernelWidth) { + val xIndex = x + (j - kernelCenterX) + val yIndex = y + (i - kernelCenterY) + if (xIndex >= 0 && xIndex < width && yIndex >= 0 && yIndex < height) { + val maskValue = maskAt(xIndex, yIndex) + val value = elementAt(xIndex, yIndex) + if (maskValue != 0.0 && num.toDouble(value) != noDataValue) { + sum += num.toDouble(value) * kernel(i)(j) + } + } + } + } + sum + } + rst_derivedband ************** From 066d500565b6f8d864b63b9b64959388ad66ad8f Mon Sep 17 00:00:00 2001 From: Stuart Lynn Date: Fri, 5 Apr 2024 21:41:35 +0100 Subject: [PATCH 5/7] updated raster function docs --- docs/source/api/raster-functions.rst | 615 +++++++++++++++++---------- 1 file changed, 386 insertions(+), 229 deletions(-) diff --git a/docs/source/api/raster-functions.rst b/docs/source/api/raster-functions.rst index fbaa1675e..94daa5efa 100644 --- a/docs/source/api/raster-functions.rst +++ b/docs/source/api/raster-functions.rst @@ -3,7 +3,7 @@ Raster functions ================= Intro -################ +##### Raster functions are available in mosaic if you have installed the optional dependency `GDAL`. Please see :doc:`Install and Enable GDAL with Mosaic ` for installation instructions. @@ -13,24 +13,38 @@ Please see :doc:`Install and Enable GDAL with Mosaic ` for * Mosaic also provides a scalable retiling function that can be used to retile raster data in case of bottlenecking due to large files. * All raster functions respect the :code:`rst_` prefix naming convention. - * Mosaic operates using raster tile objects. Tile objects are created using functions such as - :ref:`rst_fromfile` or :ref:`rst_fromcontent`. 
These functions are used as places to start when working with - initial data. If you use :code:`spark.read.format("gdal")` tiles are automatically generated for you. - * **Changed in 0.4.1** Mosaic raster tile schema changed to the following: + +Tile objects +------------ + +Mosaic raster functions perform operations on "raster tile" objects. These can be created explicitly using functions +such as :ref:`rst_fromfile` or :ref:`rst_fromcontent` or implicitly when using Mosaic's GDAL datasource reader +e.g. :code:`spark.read.format("gdal")` + +**Important changes to tile objects** + * The Mosaic raster tile schema changed in v0.4.1 to the following: :code:`>`. All APIs that use tiles now follow - this schema. Also, a new functions :ref:`rst_maketiles` is available that allows for single tile schema to handle - either a path (string) raster similar to :ref:`rst_fromfile` or a binary raster similar to :ref:`rst_fromcontent`; - however, a key difference is that :ref:`rst_maketiles` supports optional checkpointing for increased performance benefits. - * In 0.4.1, there are a new set of raster apis that have not yet had python bindings generated; however you can still - call the functions with pyspark function :code:`selectExpr`, e.g. :code:`selectExpr("rst_avg(...)")` which invokes the sql - registered expression. The calls are: :ref:`rst_avg`, :ref:`rst_max`, :ref:`rst_min`, :ref:`rst_median`, and :ref:`rst_pixelcount`. - * Also, scala does not have a :code:`df.display()` method while python does. In practice you would most often call - :code:`display(df)` in scala for a prettier output, but for brevity, we write :code:`df.show` in scala. + this schema. + * The function :ref:`rst_maketiles` allows for the raster tile schema to hold either a path pointer (string) + or a byte array representation of the source raster. It also supports optional checkpointing for increased + performance during chains of raster operations. 
+ +Updates to the raster features for 0.4.1 +---------------------------------------- + + * In 0.4.1, there is a new set of raster APIs that do not yet have Python bindings; however, you can still + call these functions with the PySpark function :code:`selectExpr`, e.g. :code:`selectExpr("rst_avg(...)")`, which invokes the SQL + registered expression. The calls are: :ref:`rst_avg`, :ref:`rst_max`, :ref:`rst_min`, :ref:`rst_median`, and :ref:`rst_pixelcount`. + * Also, Scala does not have a :code:`df.display()` method while Python does. In practice you would most often call + :code:`display(df)` in Scala for prettier output, but for brevity we write :code:`df.show` in Scala. .. note:: For mosaic versions > 0.4.0 you can use the revamped setup_gdal function or new setup_fuse_install. These functions will configure an init script in your preferred Workspace, Volume, or DBFS location to install GDAL on your cluster. See :doc:`Install and Enable GDAL with Mosaic ` for more details. +Functions +######### + rst_avg ******* @@ -185,14 +199,7 @@ rst_clip .. function:: rst_clip(tile, geometry) - Clips the raster tile to the supported geometry (WKB, WKT, GeoJSON). - The geometry is expected to be in the same coordinate reference system as the raster. - The geometry is expected to be a polygon or a multipolygon. - The output raster will have the same extent as the input geometry. - The output raster will have the same number of bands as the input raster. - The output raster will have the same pixel type as the input raster. - The output raster will have the same pixel size as the input raster. - The output raster will have the same coordinate reference system as the input raster. + Clips :code:`tile` with :code:`geometry`, provided in a supported encoding (WKB, WKT or GeoJSON). :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -200,6 +207,21 @@ rst_clip :type geometry: Column (GeometryType) :rtype: Column: RasterTileType +.. 
note:: + Notes + + :code:`geometry` is expected to be: + - in the same coordinate reference system as the raster. + - a polygon or a multipolygon. + + The output raster tiles will have: + - the same extent as the input geometry. + - the same number of bands as the input raster. + - the same pixel data type as the input raster. + - the same pixel size as the input raster. + - the same coordinate reference system as the input raster. +.. + :example: .. tabs:: @@ -237,19 +259,20 @@ rst_combineavg .. function:: rst_combineavg(tiles) Combines a collection of raster tiles by averaging the pixel values. - The rasters must have the same extent, number of bands, and pixel type. - The rasters must have the same pixel size and coordinate reference system. - The output raster will have the same extent as the input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the input rasters. - The output raster will have the same coordinate reference system as the input rasters. - Also, see :ref:`rst_combineavg_agg` function. :param tiles: A column containing an array of raster tiles. :type tiles: Column (ArrayType(RasterTileType)) :rtype: Column: RasterTileType +.. note:: + + Notes + - Each tile in :code:`tiles` must have the same extent, number of bands, pixel data type, pixel size and coordinate reference system. + - The output raster will have the same extent, number of bands, pixel data type, pixel size and coordinate reference system as the input tiles. + + Also, see :ref:`rst_combineavg_agg` function. +.. + :example: .. tabs:: @@ -289,20 +312,22 @@ rst_convolve .. function:: rst_convolve(tile, kernel) - Applies a convolution filter to the raster. - The result is Mosaic raster tile struct column to the filtered raster. - If used, the result is stored in the configured checkpoint directory. 
- The :code:`kernel` can be Array of Array of either Double, Integer, or Decimal; - ultimately all is cast to Double. Assumes the kernel is square and has an odd number - of rows and columns. Kernel uses the configured GDAL :code:`blockSize`` with a stride being - :code:`kernelSize/2`. - + Applies a convolution filter to the raster. The result is Mosaic raster tile representing the filtered input :code:`tile`. + :param tile: A column containing raster tile. :type tile: Column (RasterTileType) :param kernel: The kernel to apply to the raster. :type kernel: Column (ArrayType(ArrayType(DoubleType))) :rtype: Column: RasterTileType +.. note:: + Notes + - The :code:`kernel` can be Array of Array of either Double, Integer, or Decimal but will be cast to Double. + - This method assumes the kernel is square and has an odd number of rows and columns. + - Kernel uses the configured GDAL :code:`blockSize` with a stride being :code:`kernelSize/2`. + +.. + :example: .. tabs:: @@ -374,19 +399,11 @@ For clarity, this is ultimately the execution of the kernel. } rst_derivedband -************** +*************** .. function:: rst_derivedband(tiles, python_func, func_name) Combine an array of raster tiles using provided python function. - The rasters must have the same extent, number of bands, and pixel type. - The rasters must have the same pixel size and coordinate reference system. - The output raster will have the same extent as the input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the input rasters. - The output raster will have the same coordinate reference system as the input rasters. - Also, see :ref:`rst_derivedband_agg` function. :param tiles: A column containing an array of raster tiles. 
:type tiles: Column (ArrayType(RasterTileType)) @@ -396,6 +413,15 @@ rst_derivedband :type func_name: Column (StringType) :rtype: Column: RasterTileType +.. note:: + Notes + - Input raster tiles in :code:`tiles` must have the same extent, number of bands, pixel data type, pixel size and coordinate reference system. + - The output raster will have the same extent, number of bands, pixel data type, pixel size and coordinate reference system as the input raster tiles. + + See also: :ref:`rst_derivedband_agg` function. +.. + + :example: .. tabs:: @@ -511,18 +537,23 @@ rst_frombands .. function:: rst_frombands(tiles) Combines a collection of raster tiles of different bands into a single raster. - The rasters must have the same extent. - The rasters must have the same pixel coordinate reference system. - The output raster will have the same extent as the input rasters. - The output raster will have the same number of bands as all the input raster bands. - The output raster will have the same pixel type as the input raster bands. - The output raster will have the same pixel size as the highest resolution input rasters. - The output raster will have the same coordinate reference system as the input rasters. :param tiles: A column containing an array of raster tiles. :type tiles: Column (ArrayType(RasterTileType)) :rtype: Column: RasterTileType +.. note:: + + Notes + - All raster tiles must have the same extent. + - The tiles must have the same pixel coordinate reference system. + - The output tile will have the same extent as the input tiles. + - The output tile will have a number of bands equal to the number of input tiles. + - The output tile will have the same pixel type as the input tiles. + - The output tile will have the same pixel size as the highest resolution input tile. + - The output tile will have the same coordinate reference system as the input tiles. +.. + :example: .. 
tabs:: @@ -557,16 +588,12 @@ rst_frombands +----------------------------------------------------------------------------------------------------------------+ rst_fromcontent -************ +*************** .. function:: rst_fromcontent(raster_bin, driver, ) Returns a tile from raster data. - The raster must be a binary. - The driver must be one that GDAL can read. - If the size_in_MB parameter is specified, the raster will be split into tiles of the specified size. - If the size_in_MB parameter is not specified or if the size_in_Mb < 0, the raster will only be split if - it exceeds Integer.MAX_VALUE. The split will be at a threshold of 64MB in this case. + :param raster_bin: A column containing the raster data. :type raster_bin: Column (BinaryType) @@ -574,6 +601,17 @@ rst_fromcontent :type size_in_MB: Column (IntegerType) :rtype: Column: RasterTileType +.. note:: + + Notes + - The input raster must be a byte array in a BinaryType column. + - The driver required to read the raster must be one supplied with GDAL. + - If the size_in_MB parameter is specified, the raster will be split into tiles of the specified size. + - If the size_in_MB parameter is not specified or if size_in_MB < 0, the raster will only be split if it exceeds Integer.MAX_VALUE. The split will be at a threshold of 64MB in this case. + + +.. + :example: .. tabs:: @@ -620,12 +658,6 @@ rst_fromfile .. function:: rst_fromfile(path, ) Returns a raster tile from a file path. - The file path must be a string. - The file path must be a valid path to a raster file. - The file path must be a path to a file that GDAL can read. - If the size_in_MB parameter is specified, the raster will be split into tiles of the specified size. - If the size_in_MB parameter is not specified or if the size_in_Mb < 0, the raster will only be split if - it exceeds Integer.MAX_VALUE. The split will be at a threshold of 64MB in this case. :param path: A column containing the path to a raster file. 
:type path: Column (StringType) @@ -633,6 +665,17 @@ rst_fromfile :type size_in_MB: Column (IntegerType) :rtype: Column: RasterTileType +.. note:: + + Notes + - The file path must be a string. + - The file path must be a valid path to a raster file. + - The file path must be a path to a file that GDAL can read. + - If the size_in_MB parameter is specified, the raster will be split into tiles of the specified size. + - If the size_in_MB parameter is not specified or if size_in_MB < 0, the raster will only be split if it exceeds Integer.MAX_VALUE. The split will be at a threshold of 64MB in this case. +.. + + :example: .. tabs:: @@ -678,13 +721,14 @@ rst_georeference .. function:: rst_georeference(raster_tile) - Returns GeoTransform of the raster tile as a GT array of doubles. - GT(0) x-coordinate of the upper-left corner of the upper-left pixel. - GT(1) w-e pixel resolution / pixel width. - GT(2) row rotation (typically zero). - GT(3) y-coordinate of the upper-left corner of the upper-left pixel. - GT(4) column rotation (typically zero). - GT(5) n-s pixel resolution / pixel height (negative value for a north-up image). + Returns the GeoTransform of the raster tile as a GT array of doubles. The output takes the form of a MapType with the following keys: + + - :code:`GT(0)` x-coordinate of the upper-left corner of the upper-left pixel. + - :code:`GT(1)` w-e pixel resolution / pixel width. + - :code:`GT(2)` row rotation (typically zero). + - :code:`GT(3)` y-coordinate of the upper-left corner of the upper-left pixel. + - :code:`GT(4)` column rotation (typically zero). + - :code:`GT(5)` n-s pixel resolution / pixel height (negative value for a north-up image). :param tile: A column containing the raster tile. 
:type tile: Column (RasterTileType) @@ -723,8 +767,8 @@ rst_georeference | "upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} | +--------------------------------------------------------------------------------------------+ -rest_getnodata -************** +rst_getnodata +************* .. function:: rst_getnodata(tile) @@ -770,9 +814,6 @@ rst_getsubdataset .. function:: rst_getsubdataset(tile, name) Returns the subdataset of the raster tile with a given name. - The subdataset name must be a string. The name is not a full path. - The name is the last identifier in the subdataset path (FORMAT:PATH:NAME). - The subdataset name must be a valid subdataset name for the raster. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -780,6 +821,12 @@ rst_getsubdataset :type name: Column (StringType) :rtype: Column: RasterTileType +.. note:: + Notes + - :code:`name` should be the last identifier in the standard GDAL subdataset path: :code:`DRIVER:PATH:NAME`. + - :code:`name` must be a valid subdataset name for the raster, i.e. it must exist within the raster. +.. + :example: .. tabs:: @@ -860,20 +907,48 @@ rst_initnodata .. function:: rst_initnodata(tile) Initializes the nodata value of the raster tile bands. - The nodata value will be set to default values for the pixel type of the raster bands. - The output raster will have the same extent as the input raster. - The default nodata value for ByteType is 0. - The default nodata value for UnsignedShortType is UShort.MaxValue (65535). - The default nodata value for ShortType is Short.MinValue (-32768). - The default nodata value for UnsignedIntegerType is Int.MaxValue (4.294967294E9). - The default nodata value for IntegerType is Int.MinValue (-2147483648). - The default nodata value for FloatType is Float.MinValue (-3.4028234663852886E38). - The default nodata value for DoubleType is Double.MinValue (-1.7976931348623157E308). :param tile: A column containing the raster tile. 
:type tile: Column (RasterTileType) :rtype: Column: RasterTileType +.. note:: + + Notes + - The nodata value will be set to default sentinel values according to the pixel data type of the raster bands. + - The output raster will have the same extent as the input raster. + + .. list-table:: Default nodata values for raster data types + :widths: 25 25 50 + :header-rows: 1 + + * - Data Type + - Scala representation + - Value + * - ByteType + - + - 0 + * - UnsignedShortType + - :code:`UShort.MaxValue` + - 65535 + * - ShortType + - :code:`Short.MinValue` + - -32768 + * - UnsignedIntegerType + - :code:`Int.MaxValue` + - 4.294967294E9 + * - IntegerType + - :code:`Int.MinValue` + - -2147483648 + * - FloatType + - :code:`Float.MinValue` + - -3.4028234663852886E38 + * - DoubleType + - :code:`Double.MinValue` + - -1.7976931348623157E308 + .. +.. + :example: .. tabs:: @@ -953,20 +1028,7 @@ rst_maketiles .. function:: rst_maketiles(input, driver, size, withCheckpoint) - Tiles the raster into tiles of the given size. - If the raster is stored on disk, :code:`input` is the path to the raster, - similar to :ref:`rst_fromfile`. - If the raster is stored in memory, :code:`input` is the bytes of the raster, - similar to :ref:`rst_fromcontent`. - If not specified, :code:`driver` is inferred from the file extension; if the - input is a byte array, the driver has to be specified. - If :code:`size` is set to -1, the file is loaded and returned as a single - tile; if set to 0, the file is loaded and subdivided into tiles of size 64MB; if set - to a positive value, the file is loaded and subdivided into tiles of the - specified size; if the file is too big to fit in memory, it is subdivided - into tiles of size 64MB. - If :code:`with_checkpoint` set to true, the tiles are written to the checkpoint - directory; if set to false, the tiles are returned as a in-memory byte arrays. + Tiles the raster into tiles of the given size, optionally writing them to disk in the process. 
:param input: path (StringType) or content (BinaryType) :type input: Column @@ -978,6 +1040,33 @@ rst_maketiles :type with_checkpoint: Column(BooleanType) :rtype: Column: RasterTileType +.. note:: + + Notes: + + :code:`input` + - If the raster is stored on disk, :code:`input` should be the path to the raster, similar to :ref:`rst_fromfile`. + - If the raster is stored in memory, :code:`input` should be the byte array representation of the raster, similar to :ref:`rst_fromcontent`. + + :code:`driver` + - If not specified, :code:`driver` is inferred from the file extension. + - If the input is a byte array, the driver must be explicitly specified. + + :code:`size` + - If :code:`size` is set to -1, the file is loaded and returned as a single tile. + - If set to 0, the file is loaded and subdivided into tiles of size 64MB. + - If set to a positive value, the file is loaded and subdivided into tiles of the specified size. + - If the file is too big to fit in memory, it is subdivided into tiles of size 64MB. + + :code:`with_checkpoint` + - If :code:`with_checkpoint` is set to true, the tiles are written to the checkpoint directory. + - If set to false, the tiles are returned as in-memory byte arrays. + + Once enabled, checkpointing will remain enabled for tiles originating from this function, + meaning follow-on calls will also use checkpointing. To switch away from checkpointing down the line, + you could call :ref:`rst_fromfile` using the checkpointed locations as the :code:`path` input. +.. + :example: .. tabs:: @@ -1013,11 +1102,7 @@ rst_maketiles | "parentPath":"no_path","driver":"GTiff","path":"...","last_error":""}} | +------------------------------------------------------------------------+ -.. note:: - In initially enabled, checkpointing will remain on for tiles originating from this function, - meaning follow-on calls will also use checkpointing after first enabled. 
To switch away - from checkpointing down the line, you could call :ref:`rst_fromfile` using the checkpointed - locations as the :code:`path` input. + + rst_mapalgebra ************** @@ -1025,20 +1110,12 @@ rst_mapalgebra .. function:: rst_mapalgebra(tile, json_spec) Performs map algebra on the raster tile. - Rasters are provided as 'A' to 'Z' values. - Bands are provided as 0..n values. - Uses gdal_calc: command line raster calculator with numpy syntax. Use any basic arithmetic supported by numpy - arrays (such as +, -, *, and /) along with logical operators (such as >, <, =). For this distributed implementation, - all rasters must have the same dimensions and no projection checking is performed. - Here are examples of the json_spec': (1) shows default indexing, (2) shows reusing an index, - and (3) shows band indexing. + Employs the :code:`gdal_calc` command line raster calculator with standard NumPy syntax. + Use any basic arithmetic supported by NumPy arrays (such as \+, \-, \*, and /) along with + logical operators (such as >, <, =). - .. code-block:: text - - (1) '{"calc": "A+B/C"}' - (2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}' - (3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}' + For this distributed implementation, all rasters must have the same dimensions and no projection checking is performed. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -1046,6 +1123,29 @@ rst_mapalgebra :type json_spec: Column (StringType) :rtype: Column: RasterTileType +.. note:: + The :code:`json_spec` parameter + - Input rasters to the algebra function are referenceable as variables with names :code:`A` through :code:`Z`. + - Bands from the input :code:`tile` are referenceable using ordinal 0..n values. + + Examples of valid :code:`json_spec` + + + .. 
code-block:: text + + (1) '{"calc": "A+B/C"}' + (2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}' + (3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}' + + .. + + In these examples: + + 1. demonstrates default indexing (i.e. the first three bands in :code:`tile` are assigned A, B and C respectively); + 2. demonstrates reusing an index (B and C represent the same band); and + 3. shows band indexing. +.. + :example: .. tabs:: @@ -1212,22 +1312,32 @@ rst_merge .. function:: rst_merge(tiles) Combines a collection of raster tiles into a single raster. - The rasters do not need to have the same extent. - The rasters must have the same coordinate reference system. - The rasters are combined using gdalwarp. - The noData value needs to be initialised; if not, the non valid pixels may introduce artifacts in the output raster. - The rasters are stacked in the order they are provided. - The output raster will have the extent covering all input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the highest resolution input rasters. - The output raster will have the same coordinate reference system as the input rasters. - Also, see :ref:`rst_merge_agg` function. :param tiles: A column containing an array of raster tiles. :type tiles: Column (ArrayType(RasterTileType)) :rtype: Column: RasterTileType +.. note:: + Notes + + Input tiles supplied in :code:`tiles`: + - are not required to have the same extent. + - must have the same coordinate reference system. + - must have the same pixel data type. + - will be combined using the :code:`gdalwarp` command. + - require a :code:`noData` value to have been initialised (if this is not the case, the invalid pixels may introduce artifacts in the output raster). + - will be stacked in the order they are provided. 
+ + The resulting output raster will have: + - an extent that covers all of the input tiles; + - the same number of bands as the input tiles; + - the same pixel type as the input tiles; + - the same pixel size as the highest resolution input tiles; and + - the same coordinate reference system as the input tiles. + + See also :ref:`rst_merge_agg` function. +.. + :example: .. tabs:: @@ -1378,11 +1488,6 @@ rst_ndvi .. function:: rst_ndvi(tile, red_band_num, nir_band_num) Calculates the Normalized Difference Vegetation Index (NDVI) for a raster. - The NDVI is calculated using the formula: (NIR - RED) / (NIR + RED). - The output raster will have the same extent as the input raster. - The output raster will have a single band. - The output raster will have a pixel type of float64. - The output raster will have the same coordinate reference system as the input raster. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -1392,6 +1497,16 @@ rst_ndvi :type nir_band_num: Column (IntegerType) :rtype: Column: RasterTileType +.. note:: + NDVI is calculated using the formula: (NIR - RED) / (NIR + RED). + + The output raster tiles will have: + - the same extent as the input raster. + - a single band. + - a pixel data type of float64. + - the same coordinate reference system as the input raster. +.. + :example: .. tabs:: @@ -1602,10 +1717,10 @@ rst_rastertogridavg .. function:: rst_rastertogridavg(tile, resolution) - The result is a 2D array of cells, where each cell is a struct of (cellID, value). - For getting the output of cellID->value pairs, please use explode() function twice. - CellID can be LongType or StringType depending on the configuration of MosaicContext. - The value/measure for each cell is the average of the pixel values in the cell. + + Compute the gridwise mean of the pixel values in :code:`tile`. + + The result is a 2D array of cells, where each cell is a struct of (:code:`cellID`, :code:`value`). 
:param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -1613,6 +1728,13 @@ rst_rastertogridavg :type resolution: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) +.. note:: + Notes + - To obtain cellID->value pairs, use the Spark SQL explode() function twice. + - CellID can be LongType or StringType depending on the configuration of MosaicContext. + - The value/measure for each cell is the average of the pixel values in the cell. +.. + :example: .. tabs:: @@ -1671,10 +1793,9 @@ rst_rastertogridcount .. function:: rst_rastertogridcount(tile, resolution) - The result is a 2D array of cells, where each cell is a struct of (cellID, value). - For getting the output of cellID->value pairs, please use explode() function twice. - CellID can be LongType or StringType depending on the configuration of MosaicContext. - The value/measure for each cell is the average of the pixel values in the cell. + Compute the gridwise count of the pixels in :code:`tile`. + + The result is a 2D array of cells, where each cell is a struct of (:code:`cellID`, :code:`value`). :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -1682,6 +1803,13 @@ rst_rastertogridcount :type resolution: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) +.. note:: + Notes + - To obtain cellID->value pairs, use the Spark SQL explode() function twice. + - CellID can be LongType or StringType depending on the configuration of MosaicContext. + - The value/measure for each cell is the count of the pixel values in the cell. +.. + :example: .. tabs:: @@ -1740,10 +1868,9 @@ rst_rastertogridmax .. function:: rst_rastertogridmax(tile, resolution) - The result is a 2D array of cells, where each cell is a struct of (cellID, value). - For getting the output of cellID->value pairs, please use explode() function twice. 
- CellID can be LongType or StringType depending on the configuration of MosaicContext. - The value/measure for each cell is the maximum pixel value. + Compute the gridwise maximum of the pixels in :code:`tile`. + + The result is a 2D array of cells, where each cell is a struct of (:code:`cellID`, :code:`value`). :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -1751,6 +1878,13 @@ rst_rastertogridmax :type resolution: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) +.. note:: + Notes + - To obtain cellID->value pairs, use the Spark SQL explode() function twice. + - CellID can be LongType or StringType depending on the configuration of MosaicContext. + - The value/measure for each cell is the maximum of the pixel values in the cell. +.. + :example: .. tabs:: @@ -1809,10 +1943,9 @@ rst_rastertogridmedian .. function:: rst_rastertogridmedian(tile, resolution) - The result is a 2D array of cells, where each cell is a struct of (cellID, value). - For getting the output of cellID->value pairs, please use explode() function twice. - CellID can be LongType or StringType depending on the configuration of MosaicContext. - The value/measure for each cell is the median pixel value. + Compute the gridwise median value of the pixels in :code:`tile`. + + The result is a 2D array of cells, where each cell is a struct of (:code:`cellID`, :code:`value`). :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -1820,6 +1953,13 @@ rst_rastertogridmedian :type resolution: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) +.. note:: + Notes + - To obtain cellID->value pairs, use the Spark SQL explode() function twice. + - CellID can be LongType or StringType depending on the configuration of MosaicContext. + - The value/measure for each cell is the median of the pixel values in the cell. +.. + :example: .. 
tabs:: @@ -1855,7 +1995,7 @@ rst_rastertogridmedian .. code-tab:: sql - SELECT rst_rastertogridmax(tile, 3) FROM table + SELECT rst_rastertogridmedian(tile, 3) FROM table +--------------------------------------------------------------------------------------------------------------------+ | rst_rastertogridmedian(tile, 3) | +--------------------------------------------------------------------------------------------------------------------+ @@ -1878,10 +2018,9 @@ rst_rastertogridmin .. function:: rst_rastertogridmin(tile, resolution) - The result is a 2D array of cells, where each cell is a struct of (cellID, value). - For getting the output of cellID->value pairs, please use explode() function twice. - CellID can be LongType or StringType depending on the configuration of MosaicContext. - The value/measure for each cell is the median pixel value. + Compute the gridwise minimum of the pixel values in :code:`tile`. + + The result is a 2D array of cells, where each cell is a struct of (:code:`cellID`, :code:`value`). :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -1889,6 +2028,13 @@ rst_rastertogridmin :type resolution: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) +.. note:: + Notes + - To obtain cellID->value pairs, use the Spark SQL explode() function twice. + - CellID can be LongType or StringType depending on the configuration of MosaicContext. + - The value/measure for each cell is the minimum of the pixel values in the cell. +.. + :example: .. tabs:: @@ -1948,8 +2094,6 @@ rst_rastertoworldcoord .. function:: rst_rastertoworldcoord(tile, x, y) Computes the world coordinates of the raster tile at the given x and y pixel coordinates. - The result is a WKT point geometry. - The coordinates are computed using the GeoTransform of the raster to respect the projection. :param tile: A column containing the raster tile. 
:type tile: Column (RasterTileType) @@ -1959,6 +2103,12 @@ rst_rastertoworldcoord :type y: Column (IntegerType) :rtype: Column: StringType +.. note:: + Notes + - The result is a WKT point geometry. + - The coordinates are computed using the GeoTransform of the raster to respect the projection. +.. + :example: .. tabs:: @@ -1990,13 +2140,15 @@ rst_rastertoworldcoord +------------------------------------------------------------------------------------------------------------------+ rst_rastertoworldcoordx -********************** +*********************** -.. function:: rst_rastertoworldcoord(tile, x, y) +.. function:: rst_rastertoworldcoordx(tile, x, y) Computes the world coordinates of the raster tile at the given x and y pixel coordinates. + The result is the X coordinate of the point after applying the GeoTransform of the raster. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param x: x coordinate of the pixel. @@ -2036,12 +2188,13 @@ rst_rastertoworldcoordx +------------------------------------------------------------------------------------------------------------------+ rst_rastertoworldcoordy -********************** +*********************** .. function:: rst_rastertoworldcoordy(tile, x, y) Computes the world coordinates of the raster tile at the given x and y pixel coordinates. - The result is the X coordinate of the point after applying the GeoTransform of the raster. + + The result is the Y coordinate of the point after applying the GeoTransform of the raster. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2082,14 +2235,11 @@ rst_rastertoworldcoordy +------------------------------------------------------------------------------------------------------------------+ rst_retile -********************** +********** .. function:: rst_retile(tile, width, height) Retiles the raster tile to the given size. The result is a collection of new raster tiles. 
- The new rasters are stored in the checkpoint directory. - The results are the paths to the new rasters. - The result set is automatically exploded. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2133,13 +2283,12 @@ rst_retile +------------------------------------------------------------------------------------------------------------------+ rst_rotation -********************** +************ .. function:: rst_rotation(tile) - Computes the rotation of the raster tile in degrees. - The rotation is the angle between the X axis and the North axis. - The rotation is computed using the GeoTransform of the raster. + Computes the angle of rotation between the X axis of the raster tile and geographic North in degrees + using the GeoTransform of the raster. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2179,7 +2328,7 @@ rst_rotation +------------------------------------------------------------------------------------------------------------------+ rst_scalex -********************** +********** .. function:: rst_scalex(tile) @@ -2220,7 +2369,7 @@ rst_scalex +------------------------------------------------------------------------------------------------------------------+ rst_scaley -********************** +********** .. function:: rst_scaley(tile) @@ -2265,14 +2414,18 @@ rst_separatebands .. function:: rst_separatebands(tile) - Returns a set of new single-band rasters, one for each band in the input raster. - Result set is automatically exploded based on how many bands exist. - Prior to the explode, you may want to maintain a column in the dataframe with a raster identifier. + Returns a set of new single-band rasters, one for each band in the input raster. The result set will contain one row + per input band for each :code:`tile` provided. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :rtype: Column: (RasterTileType) +.. 
note:: + ️⚠️ Before performing this operation, you may want to add an identifier column to the dataframe to trace each band + back to its original parent raster. +.. + :example: .. tabs:: @@ -2314,10 +2467,7 @@ rst_setnodata .. function:: rst_setnodata(tile, nodata) - Sets the nodata value of the raster tile. - The result is a new raster tile with the nodata value set. - The same nodata value is set for all bands of the raster if a single value is passed. - If an array of values is passed, the nodata value is set for each band of the raster. + Returns a new raster tile with the nodata value set to :code:`nodata`. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2325,6 +2475,12 @@ rst_setnodata :type nodata: Column (DoubleType) / ArrayType(DoubleType) :rtype: Column: (RasterTileType) +.. note:: + Notes + - If a single :code:`nodata` value is passed, the same nodata value is set for all bands of :code:`tile`. + - If an array of values is passed, the respective :code:`nodata` value is set for each band of :code:`tile`. +.. + :example: .. tabs:: @@ -2360,7 +2516,7 @@ rst_setnodata +------------------------------------------------------------------------------------------------------------------+ rst_skewx -********************** +********* .. function:: rst_skewx(tile) @@ -2401,9 +2557,9 @@ rst_skewx +------------------------------------------------------------------------------------------------------------------+ rst_skewy -********************** +********* -.. function:: rst_skewx(tile) +.. function:: rst_skewy(tile) Computes the skew of the raster tile in the Y direction. @@ -2442,12 +2598,11 @@ rst_skewy +------------------------------------------------------------------------------------------------------------------+ rst_srid -********************** +******** .. function:: rst_srid(tile) - Computes the SRID of the raster tile. - The SRID is the EPSG code of the raster. 
+ Returns the SRID of the raster tile as an EPSG code. .. note:: For complex CRS definition the EPSG code may default to 0. @@ -2490,8 +2645,8 @@ rst_subdatasets .. function:: rst_subdatasets(tile) - Computes the subdatasets of the raster tile. - The subdatasets are the paths to the subdatasets of the raster. + Returns the subdatasets of the raster tile as a set of paths in the standard GDAL format. + The result is a map of the subdataset path to the subdatasets and the description of the subdatasets. :param tile: A column containing the raster tile. @@ -2538,23 +2693,26 @@ rst_subdatasets +--------------------------------------------------------------------------------------------------------------------+ rst_subdivide -********************** +************* .. function:: rst_subdivide(tile, sizeInMB) Subdivides the raster tile to the given tile size in MB. The result is a collection of new raster tiles. - The tiles are split until the expected size of a tile is < size_in_MB. - The tile is always split in 4 tiles. This ensures that the tiles are always split in the same way. - The aspect ratio of the tiles is preserved. - The result set is automatically exploded. - - .. note:: The size of the tiles is approximate. Due to compressions and other effects we cannot guarantee the size of the tiles in MB. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param size_in_MB: The size of the tiles in MB. :type size_in_MB: Column (IntegerType) +.. note:: + Notes + - Each :code:`tile` will be recursively split along two orthogonal axes until the expected size of the last child tile is < :code:`size_in_MB`. + - The aspect ratio of the tiles is preserved. + - The result set is automatically exploded. + + The size of the resulting tiles is approximate. Due to compression and other effects we cannot guarantee the size of the tiles in MB. +.. + :example: .. 
tabs:: @@ -2590,14 +2748,14 @@ rst_subdivide +------------------------------------------------------------------------------------------------------------------+ rst_summary -********************** +*********** .. function:: rst_summary(tile) - Computes the summary of the raster tile. - The summary is a map of the statistics of the raster. - The logic is produced by gdalinfo procedure. - The result is stored as JSON. + Returns a summary description of the raster tile including metadata and statistics in JSON format. + + Values returned here are produced by the :code:`gdalinfo` procedure. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2646,21 +2804,27 @@ rst_summary +------------------------------------------------------------------------------------------------------------------+ rst_tessellate -********************** +************** .. function:: rst_tessellate(tile, resolution) - Tessellates the raster tile to the given resolution of the supported grid (H3, BNG, Custom). The result is a collection of new raster tiles. - Each tile in the tile set corresponds to a cell that is a part of the tesselation of the bounding box of the raster. - The result set is automatically exploded. - If rst_merge is called on the tile set the original raster will be reconstructed. - The output tiles have same number of bands as the input rasters. + Divides the raster tile into tessellating chips for the given resolution of the supported grid (H3, BNG, Custom). + The result is a collection of new raster tiles. + + Each tile in the tile set corresponds to an index cell intersecting the bounding box of :code:`tile`. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) :param resolution: The resolution of the supported grid. :type resolution: Column (IntegerType) +.. note:: + Notes + - The result set is automatically exploded into a row-per-index-cell. 
+ - If :ref:`rst_merge` is called on the output tile set, the original raster will be reconstructed. + - Each output tile chip will have the same number of bands as its parent :code:`tile`. +.. + :example: .. tabs:: @@ -2699,12 +2863,11 @@ rst_tooverlappingtiles .. function:: rst_tooverlappingtiles(tile, width, height, overlap) - Splits the raster tile into overlapping tiles of the given width and height. - The overlap is the the percentage of the tile size that the tiles overlap. - The result is a collection of new raster files. - The result set is automatically exploded. - If rst_merge is called on the tile set the original raster will be reconstructed. - The output tiles have same number of bands as the input rasters. + Splits each :code:`tile` into a collection of new raster tiles of the given width and height, + with an overlap of :code:`overlap` percent. + + The result set is automatically exploded into a row-per-subtile. + :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2715,6 +2878,12 @@ rst_tooverlappingtiles :param overlap: The overlap of the tiles in percentage. :type overlap: Column (IntegerType) +.. note:: + Notes + - If :ref:`rst_merge` is called on the tile set, the original raster will be reconstructed. + - Each output tile chip will have the same number of bands as its parent :code:`tile`. +.. + :example: .. tabs:: @@ -2754,8 +2923,6 @@ rst_transform .. function:: rst_transform(tile,srid) Transforms the raster to the given SRID. - The result is a Mosaic raster tile struct of the transformed raster. - If using checkpointing, the result will be stored there. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2846,8 +3013,7 @@ rst_upperleftx .. function:: rst_upperleftx(tile) - Computes the upper left X coordinate of the raster tile. - The value is computed based on GeoTransform. + Computes the upper left X coordinate of :code:`tile` based on its GeoTransform.
:param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2888,8 +3054,7 @@ rst_upperlefty .. function:: rst_upperlefty(tile) - Computes the upper left Y coordinate of the raster tile. - The value is computed based on GeoTransform. + Computes the upper left Y coordinate of :code:`tile` based on its GeoTransform. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -2972,10 +3137,8 @@ rst_worldtorastercoord .. function:: rst_worldtorastercoord(tile, xworld, yworld) - Computes the raster tile coordinates of the world coordinates. - The raster coordinates are the pixel coordinates of the raster. - The world coordinates are the coordinates in the CRS of the raster. - The coordinates are resolved using GeoTransform. + Computes the (j, i) pixel coordinates of :code:`xworld` and :code:`yworld` within :code:`tile` + using the CRS of :code:`tile`. :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -3020,11 +3183,8 @@ rst_worldtorastercoordx .. function:: rst_worldtorastercoordx(tile, xworld, yworld) - Computes the raster tile coordinates of the world coordinates. - The raster coordinates are the pixel coordinates of the raster. - The world coordinates are the coordinates in the CRS of the raster. - The coordinates are resolved using GeoTransform. - This method returns the X coordinate. + Computes the j pixel coordinate of :code:`xworld` and :code:`yworld` within :code:`tile` + using the CRS of :code:`tile`. :param tile: A column containing the raster tile. @@ -3070,11 +3230,8 @@ rst_worldtorastercoordy .. function:: rst_worldtorastercoordy(tile, xworld, yworld) - Computes the raster tile coordinates of the world coordinates. - The raster coordinates are the pixel coordinates of the raster. - The world coordinates are the coordinates in the CRS of the raster. - The coordinates are resolved using GeoTransform. - This method returns the Y coordinate.
+ Computes the i pixel coordinate of :code:`xworld` and :code:`yworld` within :code:`tile` + using the CRS of :code:`tile`. :param tile: A column containing the raster tile. From e204fd4b47b8e304e6dc9e6c4456b8df492156d5 Mon Sep 17 00:00:00 2001 From: Stuart Lynn Date: Fri, 5 Apr 2024 22:03:29 +0100 Subject: [PATCH 6/7] updated aggregation func docs --- docs/source/api/spatial-aggregations.rst | 113 +++++++++++++---------- 1 file changed, 65 insertions(+), 48 deletions(-) diff --git a/docs/source/api/spatial-aggregations.rst b/docs/source/api/spatial-aggregations.rst index e14ab1fa2..d426c40c8 100644 --- a/docs/source/api/spatial-aggregations.rst +++ b/docs/source/api/spatial-aggregations.rst @@ -9,16 +9,17 @@ st_asgeojsontile_agg .. function:: st_asgeojsontile_agg(geom, attributes) Generates GeoJSON vector tiles from a group by statement over aggregated geometry column. - :code:`geom` column is WKB, WKT, or GeoJSON. - :code:`attributes` column is a spark struct; it requires minimally "id". + + - :code:`geom` column is WKB, WKT, or GeoJSON. + - :code:`attributes` column is a Spark struct; it requires minimally "id". :param geom: A grouped column containing geometries. :type geom: Column - :param attributes: the attributes column to aggregate. + :param attributes: The attributes column to aggregate. :type attributes: Column(StructType) :rtype: Column - :example: + :example: .. tabs:: .. code-tab:: py @@ -62,11 +63,6 @@ st_asmvttile_agg .. function:: st_asmvttile_agg(geom, attributes, zxyID) Generates Mapbox Vector Tiles from a group by statement over aggregated geometry column. - :code:`geom` column is Mosaic Internal Geometry, e.g. using ST_GeomFrom[WKB|WKT|GeoJSON]. - The geometry that you work on requires an SRID, recommend using ST_UpdateSRID, - e.g. from 4326 to 3857 (required SRID). - :code:`attributes` column is a spark struct; it requires minimally "id". - :code:`zxyID` column is a string. :param geom: A grouped column containing geometries. 
:type geom: Column @@ -76,7 +72,18 @@ st_asmvttile_agg :type attributes: Column(StringType) :rtype: Column - :example: +.. note:: + Notes + - :code:`geom` column must be represented using the Mosaic Internal Geometry, + e.g. using :code:`ST_GeomFrom[WKB|WKT|GeoJSON]`. + + - The geometry used in this operation must have an SRID set. + Use e.g. :code:`ST_SetSRID` or :code:`ST_UpdateSRID` to achieve this. + - MVT tiles require the SRID to be set to EPSG:3857. + - :code:`attributes` column is a Spark struct; it requires at least an "id" member. +.. + + :example: .. tabs:: .. code-tab:: py @@ -104,7 +111,7 @@ st_asmvttile_agg SELECT st_asmvttile_agg(geom_3857, struct(id), zxyID) FROM table GROUP BY 1 - +----------------------------------------------------------------------------------------------------------------+ + +----------------------------------------------------------------------------------------------------------------+ | st_asmvttile_agg(geom_3857, struct(id), zxyID) | +----------------------------------------------------------------------------------------------------------------+ | H4sIAAAAAAAAA5Ny5GItycxJLRZSFmJiYJBgVpLmfKXxwySIgYmZg5mJkZGRgYGRiZGFFYgZ+KWYMlOUuDQavk05e+ntl1fCGg0KFUwA... | @@ -112,23 +119,24 @@ st_asmvttile_agg rst_combineavg_agg -***************** +****************** .. function:: rst_combineavg_agg(tile) - Combines a group by statement over aggregated raster tiles by averaging the pixel values. - The rasters must have the same extent, number of bands, and pixel type. - The rasters must have the same pixel size and coordinate reference system. - The output raster will have the same extent as the input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the input rasters. - The output raster will have the same coordinate reference system as the input rasters.
+ Aggregates raster tiles by averaging pixel values. :param tile: A grouped column containing raster tiles. :type tile: Column (RasterTileType) :rtype: Column: RasterTileType +.. note:: + + Notes + - Each :code:`tile` must have the same extent, number of bands, pixel data type, pixel size and coordinate reference system. + - The output raster will have the same extent, number of bands, pixel data type, pixel size and coordinate reference system as the input tiles. + + See also the :ref:`rst_combineavg` function. +.. :example: .. tabs:: @@ -165,18 +173,11 @@ rst_combineavg_agg rst_derivedband_agg -***************** +******************* .. function:: rst_derivedband_agg(tile, python_func, func_name) Combines a group by statement over aggregated raster tiles by using the provided python function. - The rasters must have the same extent, number of bands, and pixel type. - The rasters must have the same pixel size and coordinate reference system. - The output raster will have the same extent as the input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the input rasters. - The output raster will have the same coordinate reference system as the input rasters. :param tile: A grouped column containing raster tile(s). :type tile: Column (RasterTileType) @@ -186,6 +187,12 @@ rst_derivedband_agg :type func_name: Column (StringType) :rtype: Column: RasterTileType +.. note:: + Notes + - Input raster tiles in :code:`tile` must have the same extent, number of bands, pixel data type, pixel size and coordinate reference system. + - The output raster will have the same extent, number of bands, pixel data type, pixel size and coordinate reference system as the input raster tiles. +.. + :example: .. tabs:: @@ -257,29 +264,39 @@ rst_derivedband_agg rst_merge_agg -************ +************* ..
function:: rst_merge_agg(tile) - Combines a grouped aggregate of raster tiles into a single raster. - The rasters do not need to have the same extent. - The rasters must have the same coordinate reference system. - The rasters are combined using gdalwarp. - The noData value needs to be initialised; if not, the non valid pixels may introduce artifacts in the output raster. - The rasters are stacked in the order they are provided. - This order is randomized since this is an aggregation function. - If the order of rasters is important please first collect rasters and sort them by metadata information and then use - rst_merge function. - The output raster will have the extent covering all input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the highest resolution input rasters. - The output raster will have the same coordinate reference system as the input rasters. + Aggregates raster tiles into a single raster. :param tile: A column containing raster tiles. :type tile: Column (RasterTileType) :rtype: Column: RasterTileType +.. note:: + Notes + + Input tiles in :code:`tile`: + - are not required to have the same extent. + - must have the same coordinate reference system. + - must have the same pixel data type. + - will be combined using the :code:`gdalwarp` command. + - require a :code:`noData` value to have been initialised (if this is not the case, the non valid pixels may introduce artifacts in the output raster). + - will be stacked in the order they are provided. + - This order is randomized since this is an aggregation function. + - If the order of rasters is important please first collect rasters and sort them by metadata information and then use rst_merge function. 
+ + The resulting output raster will have: + - an extent that covers all of the input tiles; + - the same number of bands as the input tiles; + - the same pixel type as the input tiles; + - the same pixel size as the highest resolution input tiles; and + - the same coordinate reference system as the input tiles. + + See also :ref:`rst_merge` function. +.. + :example: .. tabs:: @@ -315,8 +332,8 @@ rst_merge_agg +----------------------------------------------------------------------------------------------------------------+ -st_intersects_aggregate -*********************** +st_intersects_agg +***************** .. function:: st_intersects_agg(leftIndex, rightIndex) @@ -406,7 +423,7 @@ st_intersects_aggregate st_intersection_agg -************************* +******************* .. function:: st_intersection_agg(leftIndex, rightIndex) @@ -550,7 +567,7 @@ st_union_agg +-------------------------------------------------------------------------+ grid_cell_intersection_agg -************ +************************** .. function:: grid_cell_intersection_agg(chips) @@ -604,7 +621,7 @@ grid_cell_intersection_agg +--------------------------------------------------------+ grid_cell_union_agg -************ +******************* .. function:: grid_cell_union_agg(chips) From 4c900a8f0cc0ea540df41621273d3633d05bd4b3 Mon Sep 17 00:00:00 2001 From: Stuart Lynn Date: Fri, 5 Apr 2024 22:10:54 +0100 Subject: [PATCH 7/7] updated spatial func docs --- docs/source/api/spatial-functions.rst | 231 ++++++++++++++------------ 1 file changed, 123 insertions(+), 108 deletions(-) diff --git a/docs/source/api/spatial-functions.rst b/docs/source/api/spatial-functions.rst index 19310964f..9e2dfbc55 100644 --- a/docs/source/api/spatial-functions.rst +++ b/docs/source/api/spatial-functions.rst @@ -300,7 +300,7 @@ st_centroid st_concavehull -************* +************** .. 
function:: st_concavehull(col, concavity, ) @@ -757,8 +757,67 @@ st_geometrytype +--------------------+ +st_hasvalidcoordinates +********************** + +.. function:: st_hasvalidcoordinates(col, crs, which) + + Checks if all points in :code:`geom` are valid with respect to crs bounds. + CRS bounds can be provided either as bounds or as reprojected_bounds. + + :param col: Geometry + :type col: Column + :param crs: CRS name (EPSG ID), e.g. "EPSG:2192" + :type crs: Column + :param which: Check against geographic :code:`"bounds"` or geometric :code:`"reprojected_bounds"` bounds. + :type which: Column + :rtype: Column: IntegerType + + :example: + +.. tabs:: + .. code-tab:: py + + df = spark.createDataFrame([{'wkt': 'POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))'}]) + df.select(st_hasvalidcoordinates(col('wkt'), lit('EPSG:2192'), lit('bounds'))).show() + +----------------------------------------------+ + |st_hasvalidcoordinates(wkt, EPSG:2192, bounds)| + +----------------------------------------------+ + | true| + +----------------------------------------------+ + + .. code-tab:: scala + + val df = List(("POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))")).toDF("wkt") + df.select(st_hasvalidcoordinates(col("wkt"), lit("EPSG:2192"), lit("bounds"))).show() + +----------------------------------------------+ + |st_hasvalidcoordinates(wkt, EPSG:2192, bounds)| + +----------------------------------------------+ + | true| + +----------------------------------------------+ + + .. code-tab:: sql + + SELECT st_hasvalidcoordinates("POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))", "EPSG:2192", "bounds") + +----------------------------------------------+ + |st_hasvalidcoordinates(wkt, EPSG:2192, bounds)| + +----------------------------------------------+ + | true| + +----------------------------------------------+ + + .. 
code-tab:: r R + + df <- createDataFrame(data.frame(wkt = "POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))")) + showDF(select(df, st_hasvalidcoordinates(column("wkt"), lit("EPSG:2192"), lit("bounds"))), truncate=F) + +----------------------------------------------+ + |st_hasvalidcoordinates(wkt, EPSG:2192, bounds)| + +----------------------------------------------+ + |true | + +----------------------------------------------+ + + st_haversine -*********** +************ .. function:: st_haversine(lat1, lng1, lat2, lng2) @@ -819,65 +878,6 @@ st_haversine .. note:: Results of this function are always expressed in km, while the input lat/lng pairs are expected to be in degrees. The radius used (in km) is 6371.0088. -st_hasvalidcoordinates -********************** - -.. function:: st_hasvalidcoordinates(col, crs, which) - - Checks if all points in :code:`geom` are valid with respect to crs bounds. - CRS bounds can be provided either as bounds or as reprojected_bounds. - - :param col: Geometry - :type col: Column - :param crs: CRS name (EPSG ID), e.g. "EPSG:2192" - :type crs: Column - :param which: Check against geographic :code:`"bounds"` or geometric :code:`"reprojected_bounds"` bounds. - :type which: Column - :rtype: Column: IntegerType - - :example: - -.. tabs:: - .. code-tab:: py - - df = spark.createDataFrame([{'wkt': 'POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))'}]) - df.select(st_hasvalidcoordinates(col('wkt'), lit('EPSG:2192'), lit('bounds'))).show() - +----------------------------------------------+ - |st_hasvalidcoordinates(wkt, EPSG:2192, bounds)| - +----------------------------------------------+ - | true| - +----------------------------------------------+ - - .. 
code-tab:: scala - - val df = List(("POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))")).toDF("wkt") - df.select(st_hasvalidcoordinates(col("wkt"), lit("EPSG:2192"), lit("bounds"))).show() - +----------------------------------------------+ - |st_hasvalidcoordinates(wkt, EPSG:2192, bounds)| - +----------------------------------------------+ - | true| - +----------------------------------------------+ - - .. code-tab:: sql - - SELECT st_hasvalidcoordinates("POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))", "EPSG:2192", "bounds") - +----------------------------------------------+ - |st_hasvalidcoordinates(wkt, EPSG:2192, bounds)| - +----------------------------------------------+ - | true| - +----------------------------------------------+ - - .. code-tab:: r R - - df <- createDataFrame(data.frame(wkt = "POLYGON((5.84 45.64, 5.92 45.64, 5.89 45.81, 5.79 45.81, 5.84 45.64))")) - showDF(select(df, st_hasvalidcoordinates(column("wkt"), lit("EPSG:2192"), lit("bounds"))), truncate=F) - +----------------------------------------------+ - |st_hasvalidcoordinates(wkt, EPSG:2192, bounds)| - +----------------------------------------------+ - |true | - +----------------------------------------------+ - - st_intersection *************** @@ -1367,9 +1367,14 @@ st_setsrid .. note:: :ref:`st_setsrid` does not transform the coordinates of :code:`geom`, rather it tells Mosaic the SRID in which the current coordinates are expressed. - **Changed in 0.4 series** :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on + + **Changed in 0.4 series** + + :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on Mosaic Internal Geometry across language bindings, so recommend calling :ref:`st_geomfromwkt` or :ref:`st_geomfromwkb` - to convert from WKT and WKB. You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. + to convert from WKT and WKB. 
+ + You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. Alternatively, you can use :ref:`st_updatesrid` to transform WKB, WKB, GeoJSON, or Mosaic Internal Geometry by specifying the :code:`srcSRID` and :code:`dstSRID`. @@ -1487,9 +1492,13 @@ st_srid +--------------+ .. note:: - **Changed in 0.4 series** :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on + **Changed in 0.4 series** + + :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on Mosaic Internal Geometry across language bindings, so recommend calling :ref:`st_geomfromwkt` or :ref:`st_geomfromwkb` - to convert from WKT and WKB. You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. + to convert from WKT and WKB. + + You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. Alternatively, you can use :ref:`st_updatesrid` to transform WKB, WKB, GeoJSON, or Mosaic Internal Geometry by specifying the :code:`srcSRID` and :code:`dstSRID`. @@ -1559,10 +1568,15 @@ st_transform .. note:: If :code:`geom` does not have an associated SRID, use :ref:`st_setsrid` to set this before calling :ref:`st_transform`. - **Changed in 0.4 series** :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on + + **Changed in 0.4 series** + + :ref:`st_srid`, :ref:`st_setsrid`, and :ref:`st_transform` operate best on Mosaic Internal Geometry across language bindings, so recommend calling :ref:`st_geomfromwkt` or :ref:`st_geomfromwkb` - to convert from WKT and WKB. You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. - Alternatively, you can use :ref:`st_updatesrid` to transform WKB, WKB, GeoJSON, or Mosaic Internal Geometry + to convert from WKT and WKB. + + You can convert back after the transform, e.g. using :ref:`st_astext` or :ref:`st_asbinary`. 
+ Alternatively, you can use :ref:`st_updatesrid` to transform WKT, WKB, GeoJSON, or Mosaic Internal Geometry by specifying the :code:`srcSRID` and :code:`dstSRID`. @@ -1623,18 +1637,15 @@ st_translate |MULTIPOINT ((20 35), (50 25), (30 15), (40 5))| +----------------------------------------------+ -st_union -******** +st_unaryunion +************* -.. function:: st_union(left_geom, right_geom) +.. function:: st_unaryunion(col) - Returns the point set union of the input geometries. - Also, see :ref:`st_union_agg` function. + Returns a geometry that represents the point set union of the given geometry. - :param left_geom: Geometry - :type left_geom: Column - :param right_geom: Geometry - :type right_geom: Column + :param col: Geometry + :type col: Column :rtype: Column: Geometry :example: @@ -1642,52 +1653,56 @@ st_union .. tabs:: .. code-tab:: py - df = spark.createDataFrame([{'left': 'POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))', 'right': 'POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))'}]) - df.select(st_union(col('left'), col('right'))).show() + df = spark.createDataFrame([{'wkt': 'MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))'}]) + df.select(st_unaryunion('wkt')).show() +-------------------------------------------------------------------------+ - | st_union(left, right) | + | st_unaryunion(wkt, 2.0) | +-------------------------------------------------------------------------+ |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ ..
code-tab:: scala - val df = List(("POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))", "POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))")).toDF("left", "right") - df.select(st_union(col('left'), col('right'))).show() + val df = List(("MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))")).toDF("wkt") + df.select(st_unaryunion(col("wkt"))).show() +-------------------------------------------------------------------------+ - | st_union(left, right) | + | st_unaryunion(wkt, 2.0) | +-------------------------------------------------------------------------+ |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ .. code-tab:: sql - SELECT st_union("POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))", "POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))") + SELECT st_unaryunion("MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))") +-------------------------------------------------------------------------+ - | st_union(left, right) | + | st_unaryunion(wkt, 2.0) | +-------------------------------------------------------------------------+ |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ .. 
code-tab:: r R - df <- createDataFrame(data.frame(p1 = "POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))", p2 = "POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))")) - showDF(select(df, st_union(column("p1"), column("p2"))), truncate=F) + df <- createDataFrame(data.frame(wkt = "MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))")) + showDF(select(df, st_unaryunion(column("wkt"))), truncate=F) +-------------------------------------------------------------------------+ - | st_union(left, right) | + | st_unaryunion(wkt, 2.0) | +-------------------------------------------------------------------------+ |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ -st_unaryunion -************* -.. function:: st_unaryunion(col) +st_union +******** - Returns a geometry that represents the point set union of the given geometry +.. function:: st_union(left_geom, right_geom) - :param col: Geometry - :type col: Column + Returns the point set union of the input geometries. + Also, see :ref:`st_union_agg` function. + + :param left_geom: Geometry + :type left_geom: Column + :param right_geom: Geometry + :type right_geom: Column :rtype: Column: Geometry :example: @@ -1695,52 +1710,52 @@ st_unaryunion .. tabs:: ..
code-tab:: py - df = spark.createDataFrame([{'wkt': 'MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))'}]) - df.select(st_unaryunion('wkt')).show() + df = spark.createDataFrame([{'left': 'POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))', 'right': 'POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))'}]) + df.select(st_union(col('left'), col('right'))).show() +-------------------------------------------------------------------------+ - | st_unaryunion(wkt, 2.0) | + | st_union(left, right) | +-------------------------------------------------------------------------+ |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ .. code-tab:: scala - val df = List(("MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))")).toDF("wkt") - df.select(st_unaryunion(col("wkt"))).show() + val df = List(("POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))", "POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))")).toDF("left", "right") + df.select(st_union(col("left"), col("right"))).show() +-------------------------------------------------------------------------+ - | st_union(left, right) | + | st_union(left, right) | +-------------------------------------------------------------------------+ |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ ..
code-tab:: sql - SELECT st_unaryunion("MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))") + SELECT st_union("POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))", "POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))") +-------------------------------------------------------------------------+ - | st_unaryunion(wkt, 2.0) | + | st_union(left, right) | +-------------------------------------------------------------------------+ |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ .. code-tab:: r R - df <- createDataFrame(data.frame(wkt = "MULTIPOLYGON (((10 10, 20 10, 20 20, 10 20, 10 10)), ((15 15, 25 15, 25 25, 15 25, 15 15)))") - showDF(select(df, st_unaryunion(column("wkt"))), truncate=F) + df <- createDataFrame(data.frame(p1 = "POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))", p2 = "POLYGON ((15 15, 25 15, 25 25, 15 25, 15 15))")) + showDF(select(df, st_union(column("p1"), column("p2"))), truncate=F) +-------------------------------------------------------------------------+ - | st_unaryunion(wkt, 2.0) | + | st_union(left, right) | +-------------------------------------------------------------------------+ |POLYGON ((20 15, 20 10, 10 10, 10 20, 15 20, 15 25, 25 25, 25 15, 20 15))| +-------------------------------------------------------------------------+ - st_updatesrid ************* .. function:: st_updatesrid(geom, srcSRID, destSRID) - Updates the SRID of the input geometry :cdoe:`geom` from :code:`srcSRID` to :code:`destSRID`. + Updates the SRID of the input geometry :code:`geom` from :code:`srcSRID` to :code:`destSRID`. Geometry can be any supported [WKT, WKB, GeoJSON, Mosaic Internal Geometry]. - Transformed geometry in the provided format is returned. + + Transformed geometry is returned in the same format provided. :param geom: Geometry to update the SRID :type geom: Column