Unpacking of Sentinel-1 .zip files not working as expected #324

Open
caitlinadams opened this issue Dec 2, 2024 · 5 comments
@caitlinadams

Hi, thank you for this repository! My colleague @abradley60 has been using the library for IW processing with SNAP, and we are now looking to do EW processing with a GAMMA installation.

I have found a couple of stumbling points for working with the geocode function for GAMMA. The first of these relates to unpacking Sentinel-1 zip files.

Any assistance you can provide would be greatly appreciated!

  • which operating system are you using?
    Rocky Linux release 8.10 (Green Obsidian)

  • which environment is pyroSAR running in?
    Micromamba environment

  • which version of pyroSAR are you using?
    conda forge (pyrosar=0.23.0 in environment.yml file)

  • which function of pyroSAR did you call with which parameters?
    pyroSAR.drivers.SAFE.unpack(directory="/path/to/scene/zip/tmpdir")

Issue

The unpack function fails to unpack the zip file: it extracts the PDF and then fails with a FileNotFoundError (see the traceback below).

Reviewing the code, I'm wondering whether it would be easier to try using extractall() first, then fall back to more manual unpacking? See
https://github.com/johntruckenbrodt/pyroSAR/blob/b55b43259e684d202f882d9c1b8655425f88078f/pyroSAR/drivers.py#L758C17-L760C33

At the end of the unpacking step, assuming it runs correctly, there is also code to update the pyroSAR.drivers.SAFE.scene attribute and .file attribute:
https://github.com/johntruckenbrodt/pyroSAR/blob/b55b43259e684d202f882d9c1b8655425f88078f/pyroSAR/drivers.py#L765C9-L767C65

As is, when this code runs for SAFE directories, the .scene and .file attributes both become the temporary directory.
Without unpacking,

print(scene.scene)
print(scene.file)

gives

/g/data/yp75/projects/pyrosar_processing/data/scenes/ASF/S1A_EW_GRDM_1SDH_20240129T091735_20240129T091828_052319_065379_0F1E.zip
/g/data/yp75/projects/pyrosar_processing/data/scenes/ASF/S1A_EW_GRDM_1SDH_20240129T091735_20240129T091828_052319_065379_0F1E.zip/S1A_EW_GRDM_1SDH_20240129T091735_20240129T091828_052319_065379_0F1E.SAFE

Running the following code (taken from the unpacking function, with variables substituted):

scene.scene = temp_dir
main = os.path.join(scene.scene, os.path.basename(scene.file))
scene.file = main if os.path.isfile(main) else scene.scene

print(scene.scene)
print(scene.file)

gives

/g/data/yp75/projects/pyrosar_processing/data/scenes/ASF/tmpdir
/g/data/yp75/projects/pyrosar_processing/data/scenes/ASF/tmpdir

I'm not sure what these attributes should actually be pointing to, so it would be great if you could confirm that this is the expected behaviour.
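For comparison, the snippet above sets scene.scene to temp_dir directly, whereas the traceback further down shows SAFE.unpack first joining the target directory with the basename of self.file. The sketch below combines that join with the same update logic. The function name attributes_after_unpack is hypothetical, and this is an illustration of the two quoted code fragments rather than pyroSAR's actual implementation.

```python
import os


def attributes_after_unpack(directory, old_file):
    """Return the (scene, file) pair that the quoted update logic would produce.

    Hypothetical helper: it only mirrors the lines quoted in this issue.
    """
    # Output directory as computed in SAFE.unpack (see the traceback).
    outdir = os.path.join(directory, os.path.basename(old_file))
    scene = outdir
    # The update logic looks for an entry named like the old file inside the scene.
    main = os.path.join(scene, os.path.basename(old_file))
    file = main if os.path.isfile(main) else scene
    return scene, file
```

With a SAFE product, main is not a regular file, so both attributes end up pointing at the .SAFE folder inside the target directory rather than at the target directory itself.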

Minimum reproducible example:

Note that I use a credential file with my Earthdata login and password, of the form:

url: https://urs.earthdata.nasa.gov/users/new
login: MYLOGIN
password: MYPASS
email: MYEMAIL

Example

import asf_search as asf
import os
import yaml
from pyroSAR import identify

# project directories
proj_dir = "/g/data/yp75/projects/pyrosar_processing"
repo_dir = os.path.join(proj_dir, "s1-rtc-pyrosar-notebook")
data_dir = os.path.join(proj_dir, "data")
cred_dir = os.path.join(repo_dir, "credentials")
earthdata_credentials_path = os.path.join(cred_dir, "credentials_earthdata.yaml")
scene_dir = os.path.join(data_dir,'scenes/ASF')
results_dir = os.path.join(data_dir,'results') 

scene_id = "S1A_EW_GRDM_1SDH_20240129T091735_20240129T091828_052319_065379_0F1E"
mode = "EW"
prod = "GRD_MD"

with open(earthdata_credentials_path, "r", encoding='utf8') as f:
    earthdata_credentials = yaml.safe_load(f.read())

session = asf.ASFSession()
session.auth_with_creds(
    earthdata_credentials['login'],
    earthdata_credentials['password']
)

results = asf.granule_search(
    [scene_id], 
    asf.ASFSearchOptions(beamMode=mode, processingLevel=prod)
)

scene_paths = []
scene_names = []
for s in results:
    name = s.properties['sceneName']
    scene_names.append(name)
    print(name)
    path = os.path.join(scene_dir, name)
    s.download(path=scene_dir, session=session)
    scene_paths.append(path)

scene_file = "S1A_EW_GRDM_1SDH_20240129T091735_20240129T091828_052319_065379_0F1E.zip"
scene_path = os.path.join(scene_dir, scene_file)
scene = identify(scene_path)

temp_dir = os.path.join(scene_dir, "tmpdir")

scene.unpack(directory=temp_dir)

Error

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[20], line 1
----> 1 scene.unpack(directory=temp_dir)

File /g/data/yp75/projects/pyrosar_processing/micromamba/envs/pyrosar_rtc/lib/python3.10/site-packages/pyroSAR/drivers.py:1978, in SAFE.unpack(self, directory, overwrite, exist_ok)
   1976 def unpack(self, directory, overwrite=False, exist_ok=False):
   1977     outdir = os.path.join(directory, os.path.basename(self.file))
-> 1978     self._unpack(outdir, overwrite=overwrite, exist_ok=exist_ok)

File /g/data/yp75/projects/pyrosar_processing/micromamba/envs/pyrosar_rtc/lib/python3.10/site-packages/pyroSAR/drivers.py:742, in ID._unpack(self, directory, offset, overwrite, exist_ok)
    740 else:
    741     try:
--> 742         with open(outname, 'wb') as outfile:
    743             outfile.write(archive.read(item))
    744     except zf.BadZipfile:

FileNotFoundError: [Errno 2] No such file or directory: '/g/data/yp75/projects/pyrosar_processing/data/scenes/ASF/tmpdir/S1A_EW_GRDM_1SDH_20240129T091735_20240129T091828_052319_065379_0F1E.SAFE/annotation/calibration/calibration-s1a-ew-grd-hh-20240129t091735-20240129t091828-052319-065379-001.xml'
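The traceback shows open() failing for a file inside annotation/calibration. One possible explanation, stated here as an assumption and not confirmed by the traceback, is that the archive contains no explicit entries for its directories, so the parent folder was never created before the write. A minimal sketch for checking this with the standard zipfile module follows; the function name missing_directory_entries is hypothetical.

```python
import zipfile


def missing_directory_entries(zip_path):
    """List directories implied by member paths that have no entry of their own."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    # Directory entries in a zip archive are stored with a trailing slash.
    explicit = {name for name in names if name.endswith("/")}
    implied = set()
    for name in names:
        parts = name.rstrip("/").split("/")[:-1]
        # Every prefix of a member path is a directory that must exist on disk.
        for depth in range(1, len(parts) + 1):
            implied.add("/".join(parts[:depth]) + "/")
    return sorted(implied - explicit)
```

An empty result means every directory has its own entry. A non-empty result would be consistent with the assumption above, since an extraction loop that relies on directory entries to create folders would then try to write into folders that do not exist.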

Working example:

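import zipfile as zf

# Confirm that the scene is a valid zip archive before extracting
assert zf.is_zipfile(scene.scene)

# The context manager closes the archive even if extraction fails
with zf.ZipFile(scene.scene, 'r') as archive:
    archive.extractall(temp_dir)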
@caitlinadams
Author

caitlinadams commented Dec 3, 2024

After chatting with @abradley60 today, we think it could be related to the change in zip file format referenced here: https://forum.step.esa.int/t/sentinel-1-latest-datasets-are-corrupted-incomplete/40512/6

I've tried the above minimal working example with scene_id = S1B_EW_GRDM_1SDH_20171009T104448_20171009T104545_007751_00DB09_14DB, which is an older scene, and the unpacking does occur successfully.

scene.scene after unpacking: /g/data/yp75/projects/pyrosar_processing/data/scenes/ASF/tmpdir/S1B_EW_GRDM_1SDH_20171009T104448_20171009T104545_007751_00DB09_14DB.SAFE

scene.file after unpacking: /g/data/yp75/projects/pyrosar_processing/data/scenes/ASF/tmpdir/S1B_EW_GRDM_1SDH_20171009T104448_20171009T104545_007751_00DB09_14DB.SAFE

Are you able to confirm whether it's the new format of the zipfiles? It would be interesting to work out an approach that can handle new or old formats.
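One way to handle both archive layouts is sketched below, under the assumption (not confirmed in this thread) that the failure comes from parent directories missing at write time: create the parent directory of every member before writing it. The function name extract_members is hypothetical, and this is not pyroSAR's _unpack implementation.

```python
import os
import shutil
import zipfile


def extract_members(zip_path, target_dir):
    """Extract every member, creating parent directories on demand."""
    target_root = os.path.realpath(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            outname = os.path.realpath(os.path.join(target_root, info.filename))
            # Refuse members that would land outside the target directory.
            if os.path.commonpath([target_root, outname]) != target_root:
                raise ValueError(f"unsafe member path: {info.filename}")
            if info.is_dir():
                # Archives with explicit directory entries are handled here.
                os.makedirs(outname, exist_ok=True)
                continue
            # Archives without directory entries rely on this call instead.
            os.makedirs(os.path.dirname(outname), exist_ok=True)
            with archive.open(info) as src, open(outname, "wb") as dst:
                shutil.copyfileobj(src, dst)
```

Because os.makedirs is called with exist_ok=True for every file, the loop does not depend on whether the archive lists its directories, which is the property a format-independent fix would need.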

@johntruckenbrodt
Owner

Hi @caitlinadams,
thanks a lot for looking into this and sharing your findings. I have not looked into the unpacking mechanism in quite a while. On the infrastructure I use, all scenes are already unpacked.
I am quite busy at the moment and unfortunately did not find the time to respond earlier.
Here are some first answers:

  • The unpacking mechanism is as complex as it is so that it works for all product formats from the various missions alike. It's been a while since I wrote it, but I think just calling extractall() was not easily possible.
  • The scene and file attributes should be the same for Sentinel-1 (a folder with the .SAFE extension) but might differ for other product formats. scene should point to the directory where the (unpacked) scene is stored, and file should point to a file or folder by which the product can be uniquely identified. The regular expressions for unique identification are stored in pyroSAR.patterns.

So, I am pretty sure it is an issue with the new zipfile format. Would you be willing to try to fix it? I'd really appreciate it.

@caitlinadams
Author

Hi @johntruckenbrodt -- that all makes sense, thanks for clarifying!

I can certainly have a go -- I'm curious, what do you find is the best approach for editing and testing the code in this repo? Usually with something like this, I would install the package using pip with editable mode, but wasn't sure whether that would work here given the gdal dependencies (so far I've only installed from conda-forge). Any guidance you have for developing the code would be really helpful.

@johntruckenbrodt
Owner

Thanks Caitlin! So what I usually do is clone the repository, create a mamba environment with all dependencies using the environment.yml file, and then install pyroSAR in editable mode using pip. To edit the code on our server I either run a Visual Studio Code instance on it and connect via the browser or use the auto-deploy functionality of my local IDE.

@caitlinadams
Author

Perfect, thanks John. I will have a go and put up a PR if I can come up with a solution.
