Merge pull request #562 from databrickslabs/checkpoint_0.4.2
Global Checkpoint
mjohns-databricks authored May 28, 2024
2 parents 66a0bc1 + 4b80084 commit 2643882
Showing 202 changed files with 2,237 additions and 1,149 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_main.yml
@@ -19,7 +19,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
-        spark: [ 3.4.0 ]
+        spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
2 changes: 1 addition & 1 deletion .github/workflows/build_python.yml
@@ -15,7 +15,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
-        spark: [ 3.4.0 ]
+        spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
2 changes: 1 addition & 1 deletion .github/workflows/build_r.yml
@@ -16,7 +16,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
-        spark: [ 3.4.0 ]
+        spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
2 changes: 1 addition & 1 deletion .github/workflows/build_scala.yml
@@ -14,7 +14,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
-        spark: [ 3.4.0 ]
+        spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
2 changes: 1 addition & 1 deletion .github/workflows/pypi-release.yml
@@ -12,7 +12,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
-        spark: [ 3.4.0 ]
+        spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
6 changes: 6 additions & 0 deletions .gitignore
@@ -159,3 +159,9 @@ spark-warehouse
.DS_Store
.Rproj.user
docker/.m2/
+/python/notebooks/
+/scripts/m2/
+/python/mosaic_test/
+/python/checkpoint/
+/python/checkpoint-new/
+/scripts/docker/docker-build/ubuntu-22-spark-3.4/Dockerfile
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,12 @@
+## v0.4.3 [DBR 13.3 LTS]
+- Pyspark requirement removed from python setup.cfg as it is supplied by DBR
+- Python version limited to "<3.11,>=3.10" for DBR
+- iPython dependency limited to "<8.11,>=7.4.2" for both DBR and keplergl-jupyter
+- Expanded support for fuse-based checkpointing (persisted raster storage), managed through:
+  - spark config 'spark.databricks.labs.mosaic.raster.use.checkpoint' in addition to 'spark.databricks.labs.mosaic.raster.checkpoint'.
+  - python: `mos.enable_gdal(spark, with_checkpoint_path=path)`.
+  - scala: `MosaicGDAL.enableGDALWithCheckpoint(spark, path)`.
+
## v0.4.2 [DBR 13.3 LTS]
- Geopandas now fixed to "<0.14.4,>=0.14" due to conflict with minimum numpy version in geopandas 0.14.4.
- H3 python changed from "==3.7.0" to "<4.0,>=3.7" to pick up patches.
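
Note: a minimal sketch of the fuse checkpointing flow these changelog entries describe, assuming a Databricks notebook where `spark` and `dbutils` exist; the volume path is illustrative, not from this commit:

    import mosaic as mos

    mos.enable_mosaic(spark, dbutils)
    # enable GDAL with persisted raster storage at a fuse location (placeholder path)
    mos.enable_gdal(spark, with_checkpoint_path="/Volumes/main/default/mosaic_checkpoint")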
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -84,7 +84,7 @@ The repository is structured as follows:
## Test & build Mosaic

Given that DBR 13.3 is Ubuntu 22.04, we recommend using docker,
-see [mosaic-docker.sh](https://github.com/databrickslabs/mosaic/blob/main/scripts/mosaic-docker.sh).
+see [mosaic-docker.sh](https://github.com/databrickslabs/mosaic/blob/main/scripts/docker/mosaic-docker.sh).

### Scala JAR

2 changes: 1 addition & 1 deletion R/sparkR-mosaic/sparkrMosaic/DESCRIPTION
@@ -1,6 +1,6 @@
Package: sparkrMosaic
Title: SparkR bindings for Databricks Mosaic
-Version: 0.4.2
+Version: 0.4.3
Authors@R:
person("Robert", "Whiffin", , "[email protected]", role = c("aut", "cre")
)
2 changes: 1 addition & 1 deletion R/sparklyr-mosaic/sparklyrMosaic/DESCRIPTION
@@ -1,6 +1,6 @@
Package: sparklyrMosaic
Title: sparklyr bindings for Databricks Mosaic
-Version: 0.4.2
+Version: 0.4.3
Authors@R:
person("Robert", "Whiffin", , "[email protected]", role = c("aut", "cre")
)
2 changes: 1 addition & 1 deletion R/sparklyr-mosaic/tests.R
@@ -9,7 +9,7 @@ library(sparklyr.nested)
spark_home <- Sys.getenv("SPARK_HOME")
spark_home_set(spark_home)

install.packages("sparklyrMosaic_0.4.2.tar.gz", repos = NULL)
install.packages("sparklyrMosaic_0.4.3.tar.gz", repos = NULL)
library(sparklyrMosaic)

# find the mosaic jar in staging
1 change: 0 additions & 1 deletion docs/source/api/rasterio-udfs.rst
@@ -248,7 +248,6 @@ depending on your needs.
def write_raster(raster, driver, file_id, fuse_dir):
from io import BytesIO
from pathlib import Path
-    from pyspark.sql.functions import udf
from rasterio.io import MemoryFile
import numpy as np
import rasterio
2 changes: 1 addition & 1 deletion docs/source/usage/install-gdal.rst
@@ -112,7 +112,7 @@ Here are spark session configs available for raster, e.g. :code:`spark.conf.set(
- Checkpoint location, e.g. :ref:`rst_maketiles`
* - spark.databricks.labs.mosaic.raster.use.checkpoint
- "false"
-     - Checkpoint for session, in 0.4.2+
+     - Checkpoint for session, in 0.4.3+
* - spark.databricks.labs.mosaic.raster.tmp.prefix
- "" (will use "/tmp")
- Local directory for workers
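
Note: a hedged example of wiring these session configs before GDAL is enabled; the checkpoint path and tmp prefix values are illustrative placeholders, not from this commit:

    # set per-session raster configs up front
    spark.conf.set("spark.databricks.labs.mosaic.raster.use.checkpoint", "true")
    spark.conf.set("spark.databricks.labs.mosaic.raster.checkpoint", "/Volumes/main/default/mosaic_checkpoint")
    spark.conf.set("spark.databricks.labs.mosaic.raster.tmp.prefix", "/tmp")  # "" defaults to "/tmp"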
62 changes: 39 additions & 23 deletions pom.xml
@@ -146,27 +146,6 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.scoverage</groupId>
<artifactId>scoverage-maven-plugin</artifactId>
<version>2.0.2</version>
<executions>
<execution>
<id>scoverage-report</id>
<phase>package</phase>
<goals>
<goal>check</goal>
<goal>report-only</goal>
</goals>
</execution>
</executions>
<configuration>
<minimumCoverage>${minimum.coverage}</minimumCoverage>
<failOnMinimumCoverage>true</failOnMinimumCoverage>
<scalaVersion>${scala.version}</scalaVersion>
<additionalForkedProjectProperties>skipTests=false</additionalForkedProjectProperties>
</configuration>
</plugin>
<plugin>
<!-- see http://davidb.github.com/scala-maven-plugin -->
<groupId>net.alchim31.maven</groupId>
@@ -277,8 +256,45 @@
<properties>
<scala.version>2.12.10</scala.version>
<scala.compat.version>2.12</scala.compat.version>
<spark.version>3.4.0</spark.version>
<mosaic.version>0.4.2</mosaic.version>
<spark.version>3.4.1</spark.version>
<mosaic.version>0.4.3</mosaic.version>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.scoverage</groupId>
<artifactId>scoverage-maven-plugin</artifactId>
<version>2.0.2</version>
<executions>
<execution>
<id>scoverage-report</id>
<phase>package</phase>
<goals>
<goal>check</goal>
<goal>report-only</goal>
</goals>
</execution>
</executions>
<configuration>
<minimumCoverage>${minimum.coverage}</minimumCoverage>
<failOnMinimumCoverage>true</failOnMinimumCoverage>
<scalaVersion>${scala.version}</scalaVersion>
<additionalForkedProjectProperties>skipTests=false</additionalForkedProjectProperties>
</configuration>
</plugin>
</plugins>
</build>
</profile>
<profile>
<!-- local testing `mvn test -PskipScoverage -DskipTests=false -Dsuite=...` -->
<id>skipScoverage</id>
<properties>
<scala.version>2.12.10</scala.version>
<scala.compat.version>2.12</scala.compat.version>
<spark.version>3.4.1</spark.version>
<mosaic.version>0.4.3</mosaic.version>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
</profile>
</profiles>
2 changes: 1 addition & 1 deletion python/mosaic/__init__.py
@@ -4,4 +4,4 @@
from .models import SpatialKNN
from .readers import read

__version__ = "0.4.2"
__version__ = "0.4.3"
2 changes: 1 addition & 1 deletion python/mosaic/api/__init__.py
@@ -1,7 +1,7 @@
from .accessors import *
from .aggregators import *
from .constructors import *
-from .enable import enable_mosaic
+from .enable import enable_mosaic, get_install_version, get_install_lib_dir
from .functions import *
from .fuse import *
from .predicates import *
55 changes: 43 additions & 12 deletions python/mosaic/api/enable.py
@@ -1,3 +1,5 @@
+import importlib.metadata
+import importlib.resources
import warnings

from IPython.core.getipython import get_ipython
@@ -72,24 +74,25 @@ def enable_mosaic(
     if not jar_autoattach:
         spark.conf.set("spark.databricks.labs.mosaic.jar.autoattach", "false")
         print("...set 'spark.databricks.labs.mosaic.jar.autoattach' to false")
+        config.jar_autoattach=False
     if jar_path is not None:
         spark.conf.set("spark.databricks.labs.mosaic.jar.path", jar_path)
         print(f"...set 'spark.databricks.labs.mosaic.jar.path' to '{jar_path}'")
+        config.jar_path=jar_path
     if log_info:
         spark.sparkContext.setLogLevel("info")
+        config.log_info=True

     # Config global objects
+    # - add MosaicContext after MosaicLibraryHandler
-    config.mosaic_spark = spark
-    _ = MosaicLibraryHandler(config.mosaic_spark, log_info=log_info)
-    config.mosaic_context = MosaicContext(config.mosaic_spark)
-
-    # Register SQL functions
-    optionClass = getattr(spark._sc._jvm.scala, "Option$")
-    optionModule = getattr(optionClass, "MODULE$")
-    config.mosaic_context._context.register(
-        spark._jsparkSession, optionModule.apply(None)
-    )
-
-    isSupported = config.mosaic_context._context.checkDBR(spark._jsparkSession)
-    if not isSupported:
+    _ = MosaicLibraryHandler(spark, log_info=log_info)
+    config.mosaic_context = MosaicContext(spark)
+    config.mosaic_context.jRegister(spark)
+
+    _jcontext = config.mosaic_context.jContext()
+    is_supported = _jcontext.checkDBR(spark._jsparkSession)
+    if not is_supported:
+        # unexpected - checkDBR returns true or throws exception
+        print("""WARNING: checkDBR returned False.""")

@@ -104,3 +107,31 @@ def enable_mosaic(
from mosaic.utils.kepler_magic import MosaicKepler

config.ipython_hook.register_magics(MosaicKepler)


+def get_install_version() -> str:
+    """
+    :return: mosaic version installed
+    """
+    return importlib.metadata.version("databricks-mosaic")
+
+
+def get_install_lib_dir(override_jar_filename=None) -> str:
+    """
+    This is looking for the library dir under site packages using the jar name.
+    :return: located library dir.
+    """
+    v = get_install_version()
+    jar_filename = f"mosaic-{v}-jar-with-dependencies.jar"
+    if override_jar_filename:
+        jar_filename = override_jar_filename
+    with importlib.resources.path("mosaic.lib", jar_filename) as p:
+        return p.parent.as_posix()
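
Note: a quick usage sketch for the two helpers above, both exported at package level per the api/__init__.py change; the printed values are illustrative:

    import mosaic as mos

    mos.get_install_version()   # e.g. "0.4.3"
    mos.get_install_lib_dir()   # site-packages dir holding mosaic-<version>-jar-with-dependencies.jar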


+def refresh_context():
+    """
+    Refresh mosaic context, using previously configured information.
+    - This is needed when spark configs change, such as for checkpointing.
+    """
+    config.mosaic_context.jContextReset()
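
Note: per its docstring, refresh_context() rebuilds the JVM-side context after session configs change; a hedged sketch around checkpointing, assuming mosaic was already enabled in the session:

    from mosaic.api.enable import refresh_context

    spark.conf.set("spark.databricks.labs.mosaic.raster.use.checkpoint", "true")
    refresh_context()  # reset the context so the new checkpoint config takes effect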