Question: Raster Tile Merging and TIF File Output #489

RickLeite · 2023-12-15T12:57:14Z

How can I merge raster tiles and write them to a TIFF file?

Is there already a way to do that, or is it planned to be introduced?

My Current Approach:

import mosaic as mos
from pyspark.sql.functions import collect_list

# Read every GeoTIFF in the folder and gather all tiles into a single row.
df = spark.read.format("gdal").option("extensions", "tif")\
           .load("dbfs:/FileStore/temp/rastersfile/extracted")\
           .groupBy().agg(collect_list("tile").alias("tile"))

merged_tile = df.select(mos.rst_merge("tile"))

# collect() pulls the whole merged raster onto the driver.
result = merged_tile.select("rst_merge(tile)").collect()[0]

raster_data_base64 = result["rst_merge(tile)"]["raster"]
binary_raster_data = bytes(raster_data_base64)

# Write the raw bytes through the local /dbfs mount.
output_path = "/dbfs/FileStore/temp/rastersfile/merged/mergedrasters.tif"
with open(output_path, "wb") as output_file:
    output_file.write(binary_raster_data)
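
The variable name in this snippet suggests the `raster` field is base64 text, while `bytes(...)` treats it as binary already. One quick way to tell which it is before writing is to look at the first four bytes: a TIFF file starts with `II*\0` or `MM\0*`, and BigTIFF uses `+` in place of `*`. The helper below is a minimal sketch; `looks_like_tiff` is a hypothetical name, not part of Mosaic or rasterio.

```python
# Hypothetical helper: compares the first four bytes with the TIFF signatures.
TIFF_SIGNATURES = (
    b"II*\x00",  # classic TIFF, little-endian
    b"MM\x00*",  # classic TIFF, big-endian
    b"II+\x00",  # BigTIFF, little-endian
    b"MM\x00+",  # BigTIFF, big-endian
)


def looks_like_tiff(payload) -> bool:
    """Return True if the payload starts with a TIFF or BigTIFF signature."""
    # bytes(...) also accepts a bytearray, which is what Spark returns
    # for binary columns in collected rows.
    return bytes(payload[:4]) in TIFF_SIGNATURES
```

If `looks_like_tiff(binary_raster_data)` returns `False`, the payload probably still needs decoding; base64 text of a little-endian TIFF, for instance, starts with `SUkq`.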


RickLeite · 2023-12-16T03:06:49Z

Clearly, my current approach loses all of the file metadata. Additionally, handling a large number of rasters causes kernel issues due to memory constraints. I've tried the latest rasterio UDFs, but I'm unsure how to proceed after merging the tiles.

RickLeite · 2023-12-16T17:23:30Z

Using a rasterio UDF:

df = spark.read.format("gdal").option("extensions", "tif")\
           .load('/FileStore/temp/esri')\
           .groupBy().agg(collect_list("tile").alias("tile"))

merged_tile = df.select(mos.rst_merge("tile").alias('merged'))

import numpy as np
import rasterio
from rasterio.io import MemoryFile
from io import BytesIO
from pyspark.sql.functions import lit, udf
from pathlib import Path

@udf("string")
def write_raster(raster, parent_dir):
  with MemoryFile(BytesIO(raster)) as memfile:
    with memfile.open() as dataset:
      Path(parent_dir).mkdir(parents=True, exist_ok=True)
      extensions_map = rasterio.drivers.raster_driver_extensions()
      driver_map = {v: k for k, v in extensions_map.items()}
      extension = driver_map[dataset.driver]
      file_id = 5234476790949929865   # Manually set UUID
      path = f"{parent_dir}/{file_id}.{extension}"
      print(f" parent_dir: {parent_dir}, file_id: {file_id}, extension: {extension}")

      with rasterio.open(path, "w", **dataset.profile) as dst:
        dst.write(dataset.read())
        print(f"written to: {path}")
      return path

Since the returned merged tiles only provide index_id, raster, parentPath, and driver, I set the file ID manually.

merged_tile.select(write_raster("merged.raster", lit("dbfs:/FileStore/temp/esri/rastermerged"))).show(truncate=False)

Apparently it is a little buggy: it wrote to 'dbfs:' as if that were a local folder named 'dbfs:', and surprisingly I can't access the file by browsing DBFS from the Databricks catalog. Anyway, I was able to copy the file to the desired location with shutil.

import shutil
shutil.copy('dbfs:/FileStore/temp/esri/rastermerged/5234476790949929865.tiff', '/dbfs/FileStore/temp/esri/rastermerged/5234476790949929865.tiff')
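
The stray 'dbfs:' folder is consistent with how local file APIs such as `open`, `pathlib`, and rasterio treat paths: they do not understand the `dbfs:/` URI scheme, so the string is taken as a relative local path. A possible way to avoid the extra copy, assuming the cluster exposes the usual `/dbfs` mount, is to translate the URI before handing it to the UDF. The helper name `to_local_dbfs_path` is hypothetical.

```python
def to_local_dbfs_path(path: str) -> str:
    """Map a 'dbfs:/...' URI to the '/dbfs/...' path used by local file APIs.

    Assumes DBFS is mounted at /dbfs on the driver and the workers.
    Paths without the 'dbfs:' prefix are returned unchanged.
    """
    prefix = "dbfs:"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path
```

With this, the UDF call could pass `lit(to_local_dbfs_path("dbfs:/FileStore/temp/esri/rastermerged"))` instead of the raw `dbfs:` string.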

But when I downloaded the merged file, it corresponded to only one of the rasters in the directory (the first one). This is strange because I merged them, and my first approach (converting the `raster` field to bytes and writing it directly) does give me the merged rasters.

milos-colic · 2023-12-20T15:23:11Z

@RickLeite thank you for your question.

The parent behavior you describe is the current behavior, which we plan to adjust.
At the moment only one parent is reported even though there may be many parents.
In upcoming versions we will update the schema to capture a list of parents instead of the single string parent path we have now.

So your output file is a merged raster, but it only records the first parent from the collected set at runtime (which won't be the same value between reruns).

This is currently planned for version 0.4.1.

Kind regards
Milos
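
One way to confirm that the output really is a merged raster, independent of the reported parent path, is to compare its extent with the union of the input extents. The sketch below assumes all rasters share one CRS and that bounds are given as `(left, bottom, right, top)` tuples, which is the order rasterio's `dataset.bounds` uses. `union_bounds` and `covers` are hypothetical helper names.

```python
from typing import Iterable, Tuple

Bounds = Tuple[float, float, float, float]  # (left, bottom, right, top)


def union_bounds(bounds: Iterable[Bounds]) -> Bounds:
    """Return the smallest box that contains every input box."""
    items = list(bounds)
    if not items:
        raise ValueError("no bounds given")
    lefts, bottoms, rights, tops = zip(*items)
    return (min(lefts), min(bottoms), max(rights), max(tops))


def covers(merged: Bounds, expected: Bounds, tol: float = 1e-6) -> bool:
    """Return True if the merged box contains the expected box, within tol."""
    return (
        merged[0] <= expected[0] + tol
        and merged[1] <= expected[1] + tol
        and merged[2] >= expected[2] - tol
        and merged[3] >= expected[3] - tol
    )
```

If `covers(merged_bounds, union_bounds(input_bounds))` is `False`, the written file is likely a single input tile rather than the merge.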

RickLeite · 2023-12-21T03:28:25Z

Hi @milos-colic,

Appreciate your response! Excited for what's ahead!

RickLeite changed the title ~~Question: Raster Tile Merging and TIFF File Output~~ Question: Raster Tile Merging and TIF File Output Dec 15, 2023
