Commit def609d

Refactor Zarr store writing (#18)
* Add reproduction scripts and tests for Zarr consolidation issue #3289
  - Implemented `reproduce_consolidation_issue.py` to demonstrate the behavior of `zarr.consolidate_metadata()` at different hierarchical levels.
  - Created `test_consolidation_scenarios.py` to test various scenarios related to consolidation behavior and verify the issue described in the GitHub issue.
  - Added `test_tom_scenario.py` to replicate the test case provided by TomAugspurger and analyze consolidation results.
  - Developed `test_xarray_to_zarr_consolidation.py` to investigate how xarray's `to_zarr()` function interacts with consolidated metadata.
  - Introduced `open_sample.py` for accessing Zarr data from an S3 bucket using xarray and obstore.
  - Enhanced tests to check for consolidated metadata presence before and after consolidation operations.
* Consolidate metadata at root level for consistent Zarr access and update group handling in GeoZarr dataset creation
* Refactor recursive_copy to iterative_copy for improved performance and clarity; update references in GeoZarr conversion and utility functions.
* Enhance CRS handling in prepare_dataset_with_crs_info by adding grid_mapping attributes and ensuring proper metadata for data variables.
* Implement code changes to enhance functionality and improve performance
* Update launch.json to modify GeoZarr conversion arguments and enhance spatial chunk handling
* Add storage options to dataset opening in write_dataset_band_by_band_with_validation
* Update AWS endpoint variable name in launch.json for consistency
* Remove obsolete test scripts related to zarr consolidation scenarios
* Remove zarr consolidation issue analysis document as it is no longer relevant
* Refactor geozarr.py and utils.py for improved readability and consistency
  - Cleaned up whitespace and formatting in geozarr.py to enhance code readability.
  - Updated the validation logic in utils.py to ensure proper checks for existing band data.
  - Modified test_cli_e2e.py to improve assertions and formatting for better clarity.
  - Enhanced test_conversion.py by refining the structure and ensuring consistent formatting.
  - Removed the unused open_sample.py file to declutter the repository.
* Refactor code for improved readability and consistency
  - Updated print statements for better formatting and clarity in cli.py, geozarr.py, and other modules.
  - Enhanced S3 storage options handling in fs_utils.py for better readability.
  - Improved argument parsing in create_parser function for better organization.
  - Refactored various functions to maintain consistent formatting and style across the codebase.
  - Added tests to ensure CLI commands and conversion functionalities work as expected.
  - Cleaned up whitespace and indentation for better code structure.
* Update AWS endpoint variable name for consistency across documentation and code
* Refactor type hints for improved clarity and consistency across modules
* Refactor test cases for prepare_dataset_with_crs_info to improve clarity and accuracy
* Refactor prepare_dataset_with_crs_info calls for improved readability
* Refactor test for prepare_dataset_with_crs_info to enhance clarity and reduce nesting
* Refactor pyproject.toml for improved formatting and dependency management
* Remove code quality workflow configuration
* Add proj-bin installation to CI and documentation workflows
* Fix sudo usage in documentation workflow for dependency installation
* Consolidate dependency installation steps and remove unused documentation job from CI workflow
* Add pyproj as a dependency for geospatial functionality
* Add gdal-bin installation to CI and documentation workflows
* Add libgdal-dev installation to CI and documentation workflows
* Update dependency installation to use --only-binary option for pip
* Update validate-pyproject version to v0.24.1 in pre-commit configuration
1 parent 698dc8a commit def609d

17 files changed: +1329 -1257 lines changed

.github/workflows/ci.yml

Lines changed: 3 additions & 2 deletions
@@ -83,13 +83,14 @@ jobs:
 
       - name: Install dependencies
         run: |
+          sudo apt-get update
+          sudo apt-get install -y proj-bin gdal-bin libgdal-dev
           python -m pip install --upgrade pip
-          pip install -e ".[dev,test]"
+          pip install -e ".[dev,test]" --only-binary=:all:
 
       - name: Run network tests
         run: |
           python -m pytest eopf_geozarr/tests/ -v --tb=short -m "network"
-
   security:
     runs-on: ubuntu-latest
     steps:

.github/workflows/code-quality.yml

Lines changed: 0 additions & 89 deletions
This file was deleted.

.github/workflows/docs.yml

Lines changed: 3 additions & 1 deletion
@@ -19,8 +19,10 @@ jobs:
 
       - name: Install dependencies
         run: |
+          sudo apt-get update
+          sudo apt-get install -y proj-bin gdal-bin libgdal-dev
           python -m pip install --upgrade pip
-          pip install -e ".[docs]"
+          pip install -e ".[docs]" --only-binary=:all:
 
       - name: Build documentation
         run: |

.pre-commit-config.yaml

Lines changed: 17 additions & 32 deletions
@@ -1,44 +1,29 @@
 repos:
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
+  - repo: https://github.com/abravalheri/validate-pyproject
+    rev: v0.24.1
     hooks:
-      - id: trailing-whitespace
-      - id: end-of-file-fixer
-      - id: check-yaml
-      - id: check-added-large-files
-      - id: check-merge-conflict
-      - id: check-toml
-      - id: debug-statements
+      - id: validate-pyproject
 
-  - repo: https://github.com/psf/black
-    rev: 23.7.0
-    hooks:
-      - id: black
-        language_version: python3
-
-  - repo: https://github.com/pycqa/isort
+  - repo: https://github.com/PyCQA/isort
     rev: 5.12.0
     hooks:
       - id: isort
-        args: ["--profile", "black"]
+        language_version: python
 
-  - repo: https://github.com/pycqa/flake8
-    rev: 6.0.0
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.8.4
     hooks:
-      - id: flake8
-        additional_dependencies: [flake8-docstrings]
-        args: [--max-line-length=150, --extend-ignore=E203]
+      - id: ruff
+        args: ["--fix"]
+      - id: ruff-format
 
   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.5.1
+    rev: v1.11.2
     hooks:
       - id: mypy
-        additional_dependencies: [types-requests, types-setuptools]
-        args: [--ignore-missing-imports]
-
-  - repo: https://github.com/pycqa/bandit
-    rev: 1.7.5
-    hooks:
-      - id: bandit
-        args: ["-c", "pyproject.toml"]
-        additional_dependencies: ["bandit[toml]"]
+        language_version: python
+        exclude: tests/.*
+        additional_dependencies:
+          - types-simplejson
+          - types-attrs
+          - pydantic~=2.0

.vscode/launch.json

Lines changed: 14 additions & 7 deletions
@@ -55,7 +55,7 @@
                 "PYTHONPATH": "${workspaceFolder}/.venv/bin",
                 "AWS_PROFILE": "eopf-explorer",
                 "AWS_DEFAULT_REGION": "de",
-                "AWS_S3_ENDPOINT": "https://s3.de.io.cloud.ovh.net/"
+                "AWS_ENDPOINT_URL": "https://s3.de.io.cloud.ovh.net/"
             },
 
         },
@@ -84,7 +84,7 @@
                 "PYTHONPATH": "${workspaceFolder}/.venv/bin",
                 "AWS_PROFILE": "eopf-explorer",
                 "AWS_DEFAULT_REGION": "de",
-                "AWS_S3_ENDPOINT": "https://s3.de.io.cloud.ovh.net/"
+                "AWS_ENDPOINT_URL": "https://s3.de.io.cloud.ovh.net/"
             },
 
         },
@@ -96,10 +96,17 @@
             "module": "eopf_geozarr",
             "args": [
                 "convert",
-                "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:202507-s02msil2a/04/products/cpm_v256/S2A_MSIL2A_20250704T094051_N0511_R036_T33SWB_20250704T115824.zarr",
-                "s3://esa-zarr-sentinel-explorer-fra/tests-output/eopf_geozarr/S2A_MSIL2A_20250704T094051_N0511_R036_T33SWB_20250704T115824.zarr",
+                // "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:202507-s02msil2a/04/products/cpm_v256/S2A_MSIL2A_20250704T094051_N0511_R036_T33SWB_20250704T115824.zarr",
+                // "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:202508-s02msil2a/04/products/cpm_v256/S2B_MSIL2A_20250804T103629_N0511_R008_T31TDH_20250804T130722.zarr",
+                // "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:202508-s02msil2a/07/products/cpm_v256/S2B_MSIL2A_20250807T104619_N0511_R051_T31TDH_20250807T131144.zarr",
+                "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:202508-s02msil2a/11/products/cpm_v256/S2C_MSIL2A_20250811T112131_N0511_R037_T29TPF_20250811T152216.zarr",
+                // "s3://esa-zarr-sentinel-explorer-fra/tests-output/eopf_geozarr/S2A_MSIL2A_20250704T094051_N0511_R036_T33SWB_20250704T115824.zarr",
+                // "s3://esa-zarr-sentinel-explorer-fra/tests-output/eopf_geozarr/S2B_MSIL2A_20250804T103629_N0511_R008_T31TDH_20250804T130722.zarr",
+                // "s3://esa-zarr-sentinel-explorer-fra/tests-output/eopf_geozarr/S2B_MSIL2A_20250807T104619_N0511_R051_T31TDH_20250807T131144.zarr",
+                "s3://esa-zarr-sentinel-explorer-fra/tests-output/eopf_geozarr/S2C_MSIL2A_20250811T112131_N0511_R037_T29TPF_20250811T152216.zarr",
                 "--groups", "/measurements/reflectance/r10m", "/measurements/reflectance/r20m", "/measurements/reflectance/r60m", "/quality/l2a_quicklook/r10m",
-                "--spatial-chunk", "1024",
+                "--crs-groups", "/conditions/geometry",
+                "--spatial-chunk", "512",
                 "--min-dimension", "256",
                 "--tile-width", "256",
                 "--max-retries", "2",
@@ -113,7 +120,7 @@
                 "PYTHONPATH": "${workspaceFolder}/.venv/bin",
                 "AWS_PROFILE": "eopf-explorer",
                 "AWS_DEFAULT_REGION": "de",
-                "AWS_S3_ENDPOINT": "https://s3.de.io.cloud.ovh.net/"
+                "AWS_ENDPOINT_URL": "https://s3.de.io.cloud.ovh.net/"
             },
 
         },
@@ -154,7 +161,7 @@
                 "PYTHONPATH": "${workspaceFolder}/.venv/bin",
                 "AWS_PROFILE": "eopf-explorer",
                 "AWS_DEFAULT_REGION": "de",
-                "AWS_S3_ENDPOINT": "https://s3.de.io.cloud.ovh.net/"
+                "AWS_ENDPOINT_URL": "https://s3.de.io.cloud.ovh.net/"
             },
 
         }
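
For readers who want to run the same conversion outside VS Code, the debug arguments in the third hunk correspond roughly to a `python -m eopf_geozarr convert ...` invocation. The sketch below only assembles that argument list; `build_convert_command` is a hypothetical helper (not part of `eopf_geozarr`), the placeholder paths are assumptions, and the launch configuration may pass further arguments that fall outside the hunk shown.

```python
import shlex
import sys
from typing import List


def build_convert_command(source: str, destination: str) -> List[str]:
    """Assemble an argv list mirroring the launch.json arguments in this diff.

    Hypothetical helper for illustration; it is not part of eopf_geozarr.
    """
    return [
        sys.executable, "-m", "eopf_geozarr",
        "convert",
        source,
        destination,
        "--groups",
        "/measurements/reflectance/r10m",
        "/measurements/reflectance/r20m",
        "/measurements/reflectance/r60m",
        "/quality/l2a_quicklook/r10m",
        "--crs-groups", "/conditions/geometry",
        "--spatial-chunk", "512",
        "--min-dimension", "256",
        "--tile-width", "256",
        "--max-retries", "2",
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute a real input product and output store.
    cmd = build_convert_command("path/to/input.zarr", "path/to/output.zarr")
    # Print a shell-quoted command line instead of executing it.
    print(shlex.join(cmd))
```

The list could be handed to `subprocess.run(cmd, check=True)` once the environment (for example `AWS_ENDPOINT_URL`) is set as in the configuration.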

README.md

Lines changed: 4 additions & 4 deletions
@@ -94,21 +94,21 @@ export AWS_DEFAULT_REGION=us-east-1
 export AWS_ACCESS_KEY_ID=your_ovh_access_key
 export AWS_SECRET_ACCESS_KEY=your_ovh_secret_key
 export AWS_DEFAULT_REGION=gra # or other OVH region
-export AWS_S3_ENDPOINT=https://s3.gra.cloud.ovh.net # OVH endpoint
+export AWS_ENDPOINT_URL=https://s3.gra.cloud.ovh.net # OVH endpoint
 ```
 
 **For other S3-compatible providers:**
 ```bash
 export AWS_ACCESS_KEY_ID=your_access_key
 export AWS_SECRET_ACCESS_KEY=your_secret_key
 export AWS_DEFAULT_REGION=your_region
-export AWS_S3_ENDPOINT=https://your-s3-endpoint.com
+export AWS_ENDPOINT_URL=https://your-s3-endpoint.com
 ```
 
 **Alternative: AWS CLI Configuration**
 ```bash
 aws configure
-# Note: For custom endpoints, you'll still need to set AWS_S3_ENDPOINT
+# Note: For custom endpoints, you'll still need to set AWS_ENDPOINT_URL
 ```
 
 #### S3 Features
@@ -166,7 +166,7 @@ from eopf_geozarr import create_geozarr_dataset
 os.environ['AWS_ACCESS_KEY_ID'] = 'your_ovh_access_key'
 os.environ['AWS_SECRET_ACCESS_KEY'] = 'your_ovh_secret_key'
 os.environ['AWS_DEFAULT_REGION'] = 'gra'
-os.environ['AWS_S3_ENDPOINT'] = 'https://s3.gra.cloud.ovh.net'
+os.environ['AWS_ENDPOINT_URL'] = 'https://s3.gra.cloud.ovh.net'
 
 # Load your EOPF DataTree
 dt = xr.open_datatree("path/to/eopf/dataset.zarr", engine="zarr")
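
Existing environments may still export the old `AWS_S3_ENDPOINT` name after this rename. One way to ease the transition is a small lookup that prefers the new variable and falls back to the old one. This is a sketch under assumptions: `resolve_s3_endpoint` is a hypothetical helper, and nothing in this commit indicates that the project itself implements such a fallback.

```python
import os
from typing import Mapping, Optional


def resolve_s3_endpoint(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the custom S3 endpoint, preferring the new variable name.

    Hypothetical migration shim, not taken from eopf_geozarr.
    """
    env = os.environ if env is None else env
    # Check the new name first, then the legacy name used before this commit.
    for name in ("AWS_ENDPOINT_URL", "AWS_S3_ENDPOINT"):
        value = env.get(name)
        if value:
            return value
    # No custom endpoint configured.
    return None


if __name__ == "__main__":
    print(resolve_s3_endpoint())
```

Passing an explicit mapping instead of relying on `os.environ` keeps the helper easy to test.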

eopf_geozarr/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@
     create_geozarr_dataset,
     downsample_2d_array,
     is_grid_mapping_variable,
-    recursive_copy,
+    iterative_copy,
     setup_datatree_metadata_geozarr_spec_compliant,
     validate_existing_band_data,
 )
@@ -20,7 +20,7 @@
     "__version__",
     "create_geozarr_dataset",
     "setup_datatree_metadata_geozarr_spec_compliant",
-    "recursive_copy",
+    "iterative_copy",
     "consolidate_metadata",
     "async_consolidate_metadata",
     "downsample_2d_array",

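The body of `iterative_copy` is not part of the hunks shown here, so the following is only a generic illustration of the recursive-to-iterative change named in the commit message: a nested structure is walked with an explicit stack instead of recursive calls. The function name `iterative_copy_sketch` and the use of plain dictionaries as a stand-in for Zarr groups are assumptions, not the project's implementation.

```python
from typing import Any, Dict, List, Tuple


def iterative_copy_sketch(tree: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a nested dict hierarchy using an explicit stack, without recursion.

    Illustrative stand-in only; nested dicts play the role of groups and
    non-dict values play the role of arrays or attributes.
    """
    root: Dict[str, Any] = {}
    # Each stack entry pairs a source node with the destination node to fill.
    stack: List[Tuple[Dict[str, Any], Dict[str, Any]]] = [(tree, root)]
    while stack:
        src, dst = stack.pop()
        for key, value in src.items():
            if isinstance(value, dict):
                # Create the child container now and fill it in a later iteration.
                child: Dict[str, Any] = {}
                dst[key] = child
                stack.append((value, child))
            else:
                # Leaves are copied by reference in this sketch.
                dst[key] = value
    return root


if __name__ == "__main__":
    source = {"measurements": {"r10m": {"b02": [1, 2]}, "r20m": {}}, "attrs": "x"}
    print(iterative_copy_sketch(source) == source)
```

An explicit stack avoids Python's recursion limit on deep hierarchies and keeps the traversal state visible in one place.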