Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement pyzgy for ZGY files as segyio drop in replacement. #105

Closed
trhallam opened this issue Feb 4, 2022 · 19 comments
Closed

Implement pyzgy for ZGY files as segyio drop in replacement. #105

trhallam opened this issue Feb 4, 2022 · 19 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@trhallam
Copy link
Owner

trhallam commented Feb 4, 2022

Moving issue from tutorial page to actual repository.

trhallam/segysak-t21-tutorial#3 (comment)

Hi,

We noticed that there is a TODO item in _openzgy.py to create a converter which does just a few inlines at a time for large files. Do > we have a estimation when this can be done?

Best,
Zhao

@trhallam
Copy link
Owner Author

trhallam commented Feb 4, 2022

@musiczhzhao I'm sorry but this isn't a priority for me right now. I wasn't actually aware anyone was using the ZGY functionality in Segysak.

I am happy to accept pull requests if you would like to add the functionality. The idea is to minimise memory usage by streaming data from one format to the other.

Does the current functionality not work for your large files?

@da-wad
Copy link

da-wad commented Feb 4, 2022

@musiczhzhao If you're looking for a pip-installable Python package to look at a small subset of data from ZGY files may I recommend taking a look at pyZGY: https://github.com/equinor/pyzgy

Actually, @trhallam, this package would probably make ZGY support easier to implement for SEGY-SAK as the installation is simple and the syntax mirrors segyio :-)

@trhallam
Copy link
Owner Author

trhallam commented Feb 4, 2022

@da-wad , that's cool! I didn't know about this package. Definitely happy to devolve zgy handling to pyzgy at some point. Probably just needs some drop in changes to the main part of segysak.segy.

@da-wad
Copy link

da-wad commented Feb 4, 2022

With a brief glance at _segy_loader.py it looks like the only thing missing from the pyZGY segyio-mimicing interface which you use there is mmap() which could be skipped for ZGY, or implemented in pyZGY as a function returning False if you wanted to keep it strictly drop-in...

The implementation of mirroring segyio.trace does include some caching so that you're not re-reading the same bricks over and over, but it's not designed for sequential reading of the whole file, so to make it performant there would also need to be an option to increase this cache size: https://github.com/equinor/pyzgy/blob/master/pyzgy/loader.py#L30

@trhallam
Copy link
Owner Author

trhallam commented Feb 4, 2022

Sounds good.

Just a thought, mmap is meant to improve performance in segyio when reading large datasets. Maybe this function could be used to set the cache size with a sensible default keyword argument? Perhaps a bit obscure in terms of the name but it would then be consistent with segyio.

@musiczhzhao
Copy link

@musiczhzhao I'm sorry but this isn't a priority for me right now. I wasn't actually aware anyone was using the ZGY functionality in Segysak.

I am happy to accept pull requests if you would like to add the functionality. The idea is to minimise memory usage by streaming data from one format to the other.

Does the current functionality not work for your large files?

Hi @trhallam! Sorry for late response. The current functionality works fine when the whole segy can fit in the memory, but will encounter issue when the size became large.

I am happy to implement this feature, if I can have more information regarding the overall design of the package. Currently as I can see the functionalities of the segy and zgy are separated and data were converted to an internal format as a middle step. If we implement this new function in the _openzgy.py, is there a preferred approach?

Hi @da-wad - I haven't tried the pyZGY package yet but from the "About" looks like it is designed to only do the reading instead of both reading and writing? I will take a closer look.

Zhao

@da-wad
Copy link

da-wad commented Feb 14, 2022

Hi @musiczhzhao ... that is correct, although the openzgy package which pyZGY adapts to segyio-like syntax does support writing. So it would be possible to build this functionality too, and I'm open to PRs :-)

@trhallam
Copy link
Owner Author

Hi @musiczhzhao

ZGY was a bit of an after thought so it is in it's own area at the moment. The fact that pyzgy mirrors segyio though is an opportunity to leverage much of the existing segysak.segy code. We'll want to preserve the current structure but see if we can create some shared code under the hood.

The current model has some dispatch functions at the top level like segy_loader which call the underlying functions to perform different tasks, like loading 3D or 2D data, or doing file conversion. It is the underlying functions that we will need to modify to call either segyio or pyzgy. Then a zgy_loader function would need to be written that models segy_loader. I think preserving the old module structure is preferable, because it would maintain backwards compatibility.

For your work where the ZGY is too big for memory, porting to the segy_loader will offer a few benefits:

  • The ability to load subsets of the data.
  • Ability to convert (stream) the data to HDF5 which facilitates lazy loading.
  • @da-wad has ZFP made it fully into numcodecs yet? I know it was on a dev branch some time back. But it might be worth adding ZARR support as well if all this is happening.

I think the way forward is to do the following.

  • Try to drop pyzgy for segyio in the segysak.segy part of the software and see what breaks.
  • Address breakages either on segysak side or in pyzgy.
  • If OK, refactor the src so that zgy and segy can share the same core machinery but preserve the zgy and segy import paths for backwards compatibility. (This will likely be the biggest job).
  • Port segy tests to zgy.
  • Add ZGY functionality to the CLI tools. (Lower priority)

Are you happy to take this on? I can help with questions if you have any.

@trhallam trhallam changed the title Create a converter which does just a few inlines at a time, for large files. Implement pyzgy for ZGY files as segyio drop in replacement. Feb 15, 2022
@da-wad
Copy link

da-wad commented Feb 15, 2022

ZFP is still nearly in numcodecs. The blocker is lack of macOS wheels, which I can understand their need for. However, the old Travis CI/CD setup for zfpy with multibuild made it difficult and fragile to implement. Now https://github.com/LLNL/zfpy-wheels is using cibuildwheel through Github Actions it should be trivial™ to start building those and numcodecs can get on with their magic! (See: zarr-developers/numcodecs#303 (comment))

@musiczhzhao
Copy link

Thanks @trhallam and @da-wad for the details! Sorry didn't respond earlier as I was on something different.

Is there any suggestions on the easiest approach to drop pyzgy for segyio in the segysak.segy part of the software and see what breaks? I saw that segyio has been imported & used in many places and not sure a straight replacement of segyio with pysegy is a proper way. Also I am assuming no API changes?

Aside from this. I noticed that in segysak there is a requirement that the segy data loaded must be "regular" to be treated as 3D and thus can be further convert to zgy. This is preventing us to convert irregular segy data to zgy. May I quickly ask if this check is required to operate the zgy conversion? Is there a preferred way to improve this or if switching to use pysegy will have this issue resolved? I'm interest in having this part improved.

Some snapshots of the code:

segy/_segy_headers.py
image

segy/_segy_loader.py
image

openzgy/_openzgy.py
image

Best,
Zhao

@trhallam
Copy link
Owner Author

trhallam commented Mar 24, 2022

Hi Zhao,

I would simply start by replacing statements that say

import segyio

with

import pyzgy as segyio

and see what works and what breaks, are things missing from pyzgy that mirror the segyio behaviour.

You can load irregular data from SEGY using the segy_freeloader method. This gives you more flexibility. The limitation on the zgy writer is because only methods for 3D have been implemented. I think, most people working in Petrel use SEGY for 2D data anyway.

Can you explain what is irregular about your data? If it just has missing trace segysak can handle this, the general principle is that the byte locations describing the trace grid must be unique and orthogonal, e.g cdp or xline/iline or xline/iline/offset for example.

I'm also a little confused by your last comment. It sounds like you are trying to load SEGY and convert it to ZGY. If you are using ZGY then I'd assume you are using Schlumberger software. Why not just convert to ZGY using that? segysak is primarily designed to read SEGY into Python, it is not particularly performant for file conversion.

If this is your aim, you'll need to write quite a bit of code, because the segy_converter code which streams data from SEGY only outputs to Netcdf at the moment.

@trhallam
Copy link
Owner Author

@da-wad does pyzgy support writing to ZGY at this time? I couldn't see any write methods in the source.

@musiczhzhao
Copy link

Thanks @trhallam for the brilliant idea! Will definitely give a shot.

By irregular I mean different subline have different number of crosslines. "Missing traces" is a better way to understand probably. I do try to use this tool to do some experimental data conversion and I personally don't have convenient access to Schlumberger software unfortunately.

Can you help me to better understand the current limitation of segysak in converting data with missing traces from segy to zgy? Dose the intermediate NetCDF4 impose any constrains? Or we just need to add the ability to convert non 3d data in zgy_writter() method? (I assume the limitation is not from the OSDU python code.)

Best,
Zhao

@trhallam
Copy link
Owner Author

trhallam commented Mar 25, 2022

Ok, I think there was some confusion here with the original question, the comment in the code is to write a converter from zgy->xarray/netcdf, not to go from segy->zgy.

If you are having trouble loading data with missing traces it is likely you don't have the right byte locations specified on the loader.

Zgy as I understand it doesn't support dead traces because the regular geometry is fundamental to the underlying storage implementation. There may be some internal optimisations for size but the file is still defined by the ranges of iline and xline present in your data.

I think the best way forward here is to first get your data or a subset of it loading with the normal segy_loader, then try to write that subset with the current zgy_writer.

Once that is working, you can make a simpler version of the segy_converter function which loads your segy in smaller pieces before writing to a single Zgy.

The key here, is that you will need to pre-scrape the segy headers to build the full geometry for your volume. Most file formats don't allow dynamic expansion and Zgy is no different. The steps in your converter function should be:

  • scrape the segy headers using segy_header_scrape
  • define the full cube geometry for your ZGY file
  • use segy_loader to iterate loading sub-sets of your data into memory before writing them into your ZGY file

Long term if the package is to support conversion between many file formats it probably needs an interchange API between readers and writers but for the moment I think you could just write a function with the above steps that will meet your needs.

@da-wad
Copy link

da-wad commented Mar 25, 2022

@trhallam correct, pyzgy does not currently support writing ZGY files.

Regarding irregular 3D SEG-Y files, these could be converted to ZGY format if padded with zeroed traces so that they become regular.

I think it would be easier to give good advice to @musiczhzhao if he explains what he wants to achieve and why?

@trhallam
Copy link
Owner Author

@trhallam correct, pyzgy does not currently support writing ZGY files.

Regarding irregular 3D SEG-Y files, these could be converted to ZGY format if padded with zeroed traces so that they become regular.

I think it would be easier to give good advice to @musiczhzhao if he explains what he wants to achieve and why?

segysak already does this padding with zeros as long as the iline/xline byte locations specified can be used to create the full 3D volume grid.

@trhallam
Copy link
Owner Author

trhallam commented Oct 7, 2022

@musiczhzhao

Not sure if you are still interested in this, but I got around to incorporating pyzgy. It's written in using the new xarray backend API which means lazy loading of ZGY is support for large files.

You can try it out by installing the development branch
https://github.com/trhallam/segysak/tree/xarray_backend

Load ZGY files using the open_dataset method.

import xarray as xr
zgy = xr.open_dataset('file.zgy')

@musiczhzhao
Copy link

musiczhzhao commented Oct 21, 2022 via email

@trhallam
Copy link
Owner Author

zgy support is to be dropped from segysak in dev0.5.

Instead, ZGY support has been moved to Xarray native handling via a PR to pyzgy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants