Add support for GOES and Himawari Satellite Imagery #222
Comments
Hey @jacobbieker, for my understanding: the task is to download the GOES/Himawari data as NetCDF and upload it as a dataset on Hugging Face, similar to what you are trying to accomplish by creating the |
Hi, Somewhat, we want to add the ability to convert the native files from Himawari and GOES to Zarr with a similar format as the Google Public Dataset version of the EUMETSAT imagery, so it can all be accessed in the same way. Ideally it would also work for the GOES 13 to 15 imagery as well, which is not available on AWS and has to be accessed from the NOAA CLASS archive. |
Hi @jacobbieker, according to my understanding, the task is to add the capability for Satip to process and convert native GOES/Himawari files from NetCDF to Zarr format, similar to how it currently handles EUMETSAT data. Additions to be made are:
As someone very new to the field of satellite image processing, I'd like to begin with a small task to gain a comprehensive understanding of the codebase. I would greatly appreciate any help you can offer in this regard. |
Hi, yes, that is all correct! The smallest first task would probably be to use goes2go to add a download manager for GOES, or alternatively to download from the AWS bucket directly. The conversion to Zarr should also be fairly straightforward. Of the two, I would probably start with getting goes2go to download the data, as that might be the most straightforward. |
@jacobbieker Thanks for the steps. I tried taking a stab at it. The code snippet below can download the files from GOES.
This works for me locally. Would the next step be to convert this data to Zarr format? Let me know if my understanding is correct. |
Hi, that is a good start, but we want to be able to give a datetime or a range of dates and have the downloader fetch all the images in that period, not just the latest ones. Once the downloader can select and download a date range, the next step would be to convert the data to Zarr. For this, you should be able to open the NetCDF files that are downloaded with |
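A minimal sketch of the date-range selection described above, assuming the public NOAA GOES buckets on AWS lay files out as `<product>/<year>/<day-of-year>/<hour>/` and that ABI filenames carry a `_sYYYYJJJHHMMSSt_` start-time field. The bucket name `noaa-goes16`, the product `ABI-L1b-RadF`, and both helper names are illustrative assumptions, not part of Satip or goes2go. The sketch only builds the prefixes to list and parses timestamps; it does no network access.

```python
from datetime import datetime, timedelta
from typing import Iterator


def hourly_prefixes(
    start: datetime,
    end: datetime,
    bucket: str = "noaa-goes16",    # assumed bucket name
    product: str = "ABI-L1b-RadF",  # assumed product (full-disk L1b radiances)
) -> Iterator[str]:
    """Yield one S3 prefix for every hour that overlaps [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    # Snap to the top of the hour so every hour touching the range is included.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current <= end:
        # %j is the zero-padded day of year used in the assumed bucket layout.
        yield f"s3://{bucket}/{product}/{current:%Y/%j/%H}/"
        current += timedelta(hours=1)


def scan_start_time(filename: str) -> datetime:
    """Parse the scan start time from an ABI-style filename.

    Assumes a `_sYYYYJJJHHMMSSt_` field; the trailing tenths-of-a-second
    digit is ignored.
    """
    field = next(
        part for part in filename.split("_")
        if part.startswith("s") and part[1:].isdigit()
    )
    return datetime.strptime(field[1:14], "%Y%j%H%M%S")
```

A downloader could list each prefix, keep only the files whose `scan_start_time` falls inside the requested range, and hand those to the Zarr conversion step.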
Hey @jacobbieker, Thanks for your reply. I have raised a PR that adds the GOES Data Download Manager Script. I also have a couple of questions regarding its integration:
|
Thanks for the PR! I'll look over it soon. For this architecture, we want it to be the same interface for getting all the different satellite imagery, so integrating it with the current |
Hi @jacobbieker, is this issue still open? From my understanding, we have to merge EUMETSAT and GOES in the DownloadManager file itself. One way I thought of doing this is to create a common base class that the EUMETSAT and GOES classes each inherit from, plus the actual DownloadManager, which acts as the main entry point. Please let me know if I'm heading in the right direction. |
Hi @jacobbieker, @suleman1412, while you are working on the common download class for different satellites, The best source I could find is the WorldScienceDataBank run by NICT https://sc-web.nict.go.jp/himawari/himawari-archive.html. |
Hi, sorry for the delayed response, but yes, this would be the way to go. |
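The base-class idea above could be sketched as follows. Every name here (`BaseSatelliteDownloader`, `GoesDownloader`, `EumetsatDownloader`, and this stand-in `DownloadManager`) is hypothetical and does not reflect Satip's existing classes; the backends are stubs that return made-up paths instead of downloading anything.

```python
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Dict, List, Type


class BaseSatelliteDownloader(ABC):
    """Hypothetical interface shared by every satellite backend."""

    @abstractmethod
    def list_files(self, start: datetime, end: datetime) -> List[str]:
        """Return the names of all files with scans inside [start, end]."""

    @abstractmethod
    def fetch(self, name: str) -> str:
        """Fetch one file and return its local path."""

    def download(self, start: datetime, end: datetime) -> List[str]:
        """Shared driver: list the files, then fetch each one."""
        return [self.fetch(name) for name in self.list_files(start, end)]


class GoesDownloader(BaseSatelliteDownloader):
    """Stub GOES backend; real code would list and read the AWS bucket."""

    def list_files(self, start: datetime, end: datetime) -> List[str]:
        return [f"goes_{start:%Y%m%d%H}.nc"]

    def fetch(self, name: str) -> str:
        return f"/tmp/{name}"  # stub: pretend the file was downloaded


class EumetsatDownloader(BaseSatelliteDownloader):
    """Stub EUMETSAT backend; real code would call the EUMETSAT API."""

    def list_files(self, start: datetime, end: datetime) -> List[str]:
        return [f"msg_{start:%Y%m%d%H}.nat"]

    def fetch(self, name: str) -> str:
        return f"/tmp/{name}"  # stub: pretend the file was downloaded


class DownloadManager:
    """Hypothetical single entry point that dispatches to a backend."""

    _backends: Dict[str, Type[BaseSatelliteDownloader]] = {
        "goes": GoesDownloader,
        "eumetsat": EumetsatDownloader,
    }

    def __init__(self, source: str) -> None:
        if source not in self._backends:
            raise ValueError(f"unknown satellite source: {source!r}")
        self._backend = self._backends[source]()

    def download(self, start: datetime, end: datetime) -> List[str]:
        return self._backend.download(start, end)
```

With this shape, adding Himawari would mean writing one more subclass and registering it in `_backends`, while callers keep using the same `download(start, end)` call.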
Yeah, that would be great, that data source is the same one that I found for it too. So yeah, if you want to go ahead and start adding that downloader, we can integrate it with the above later. |
I've made (or adapted from Satip) a simple downloader for the Himawari satellite. I noticed that NOAA's AWS-based data provision has better flexibility, so I changed my plan and downloaded data from there instead of from NICT. The currently available time range is rather limited, but if I am not misinformed, AWS will be the mainstream channel for Himawari and other satellite data in the near future, so it is better to invest time there than in NICT. @jacobbieker |
Hey, yeah, AWS seems to be the future for Himawari satellite data, but being able to access data from NICT will also be helpful in the future, as it allows for getting a longer archive of data, back before the start date of the AWS datasets. But for a first pass, just the AWS data is still a good thing to add! For EUMETSAT, we calculated the min and max across ~1000 randomly sampled raw images, and then used that to scale between 0 and 1ish (there are and will be data outside of those ranges in the final dataset, but most will fall within that range). I'd prefer not to throw away any of the data or information, just rescale it. If it's not perfectly between 0 and 1, that is also okay though. |
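The sampled min/max rescaling described above could look roughly like this. It is a sketch under stated assumptions, not the code used for the EUMETSAT dataset: images are plain nested lists, the helper names are made up, and the tiny inputs stand in for the ~1000 sampled images. Values outside the sampled range are deliberately not clipped, so nothing is thrown away.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]  # one band as rows of pixel values


def estimate_range(
    images: Sequence[Image], n_samples: int, seed: int = 0
) -> Tuple[float, float]:
    """Estimate (min, max) from a random sample of images."""
    rng = random.Random(seed)
    sample = rng.sample(list(images), min(n_samples, len(images)))
    values = [v for image in sample for row in image for v in row]
    return min(values), max(values)


def rescale(image: Image, lo: float, hi: float) -> Image:
    """Linearly map lo -> 0 and hi -> 1 without clipping.

    Pixels outside [lo, hi] land outside [0, 1] and are kept as they are.
    """
    if hi == lo:
        raise ValueError("hi and lo must differ")
    span = hi - lo
    return [[(v - lo) / span for v in row] for row in image]
```

In practice the range would be estimated once per band and stored alongside the dataset so the original values can be recovered.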
Thank you for the explanation. Integrating both NICT and AWS data sounds ideal, and I'll consider adding NICT data after completing my current implementation. Regarding data compression, I'm not familiar with how rescaling to a 0-1 range reduces data size. Could you provide a link or more details on this method? Is it related to using fewer exponent bits in the floating-point representation? I understand the priority is to keep the original data as unchanged as possible. I'm currently compressing data into uint8 by normalizing values from min to max (0 to 255), which reduces the size to a quarter of that of 32-bit floats. However, this may lose precision for small value differences, as the smallest representable difference is (max - min)/255. Using the 0.3 percentile was sufficient since differences smaller than this disappear with compression anyway. However, to cover rare extreme values, this method may not be enough. If storage allows, I can use a more generous compression method to better preserve data quality. |
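To make the precision trade-off concrete, here is a small sketch of uint8 quantization over a fixed range. The function names are illustrative, and the rounding and clipping choices are assumptions rather than the commenter's actual code. With 256 codes spanning `lo` to `hi`, neighbouring codes are `(hi - lo) / 255` apart, so smaller differences collapse into the same code and out-of-range values are clipped.

```python
from typing import List, Sequence


def quantization_step(lo: float, hi: float) -> float:
    """Spacing between neighbouring uint8 codes when lo -> 0 and hi -> 255."""
    return (hi - lo) / 255.0


def quantize_uint8(values: Sequence[float], lo: float, hi: float) -> List[int]:
    """Map lo..hi onto the integer codes 0..255, clipping anything outside."""
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((v - lo) * scale))) for v in values]


def dequantize_uint8(codes: Sequence[int], lo: float, hi: float) -> List[float]:
    """Recover approximate values; detail finer than one step is lost."""
    step = quantization_step(lo, hi)
    return [lo + code * step for code in codes]
```

For a range of 0 to 255 the step is exactly 1, so 0.0 and 0.4 both map to code 0, and 300.0 is clipped to code 255.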
We rescale the data to 0-1 for a few reasons. 1. We aren't allowed to redistribute EUMETSAT data in the original data values, so rescaling fixes that. 2. It puts the different bands in the same range of values, already normalized for ML usage, where we usually want the data between 0 and 1. It doesn't particularly help with compression, but it helps with downstream tasks. As for converting to ints, that's fine, but in the code we'd like to keep it as float and keep all the information that we can. Since we use lossless compression, we won't be throwing any of the data away anyway, and it can be helpful to have that extra information. If storage space is a concern, we could always set you up to push Zarrs to our Hugging Face or Source Cooperative, so the data can be stored there and be easily accessible to everyone. |
Now I understand the purpose of rescaling. I'll make an option to normalize data by maybe 4 sigma or bigger to achieve an equivalent result with less computational cost. However, redistribution in the original format is not restricted for Himawari, so I will keep the data as is by default. I'm currently working on my laptop and will have access to a research computing system in October. After I have tested my code there, I will send a pull request. |
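One possible reading of the sigma-based option is to map `mean - k*sigma` to 0 and `mean + k*sigma` to 1, with `k = 4` by default. The sketch below assumes that interpretation; the helper names and the use of the population standard deviation are assumptions, not the commenter's implementation. As with min/max rescaling, nothing is clipped.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def sigma_bounds(values: Sequence[float], n_sigma: float = 4.0) -> Tuple[float, float]:
    """Return (mean - n_sigma * std, mean + n_sigma * std) for the sample."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (assumed choice)
    return mean - n_sigma * std, mean + n_sigma * std


def normalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Linearly map lo -> 0 and hi -> 1; values outside are not clipped."""
    if hi == lo:
        raise ValueError("hi and lo must differ")
    span = hi - lo
    return [(v - lo) / span for v in values]
```

The mean and standard deviation can be accumulated in a single pass over the data, which is where the lower computational cost would come from compared with sorting for percentiles.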
It would be great if we could support GOES and Himawari satellite imagery. Between the two GOES satellites and Himawari, this would then support global geostationary coverage of satellite imagery. The idea is essentially to make Satip not EUMETSAT-specific, but more akin to NWP-Consumer, only for satellite imagery.
Detailed Description
GOES-16, -17, and -18 support could be fairly easy through Microsoft's Planetary Computer, which has the NetCDF or GeoTIFF files readily available; these can be opened with rioxarray, which is already part of this repo. There are NetCDF Himawari live images on AWS as well (GOES is also there), so data access should not be a problem. The public data is available from 2017 for GOES, and from 2020 for Himawari.
Ideally, it would also support older GOES and Himawari imagery. Satpy supports GOES-13, -14, and -15 imagery, which goes back to around 2010, and Himawari is available in archives from JAXA. This could give a global, decade-ish archive of imagery, which could be quite useful for a lot of studies and for training models nearly anywhere in the world.
Context
I would use it in either Dagster or planetary-datasets for processing archival imagery and making it available on Hugging Face, which, in addition to the public, near-realtime archives, would be quite an impactful project. Already, having the entire EUMETSAT MSG RSS dataset available (for just about a year) has resulted in one paper we know of that uses the entire archive for solar forecasting. Extending that to global coverage is a natural next step.
Possible Implementation
It could unify several different ways of accessing the satellite imagery. For example, for Himawari data, I am creating a kerchunk dataset of the public archival NetCDF files for Himawari-8 and -9 here: https://huggingface.co/datasets/jacobbieker/himawari8-kerchunk so OCF could copy that and keep extending it. GOES-17, -18, and -19 could be pulled from Planetary Computer, or we could do something similar as for Himawari and pull the data straight from the AWS NetCDF archive.
For older GOES imagery, the archive is available at NOAA's CLASS archive, which is free to download and redistribute. The data actually goes back at least 2 generations of GOES satellites, so the archive could go back to the early 2000s, or earlier. But I would propose just going back one generation of GOES satellites, so 2011ish, which would mostly match with the EUMETSAT archive (~2008).
For older Himawari imagery, it is available through DIAS, although there are more licensing restrictions, and I'm not sure if we could redistribute the data. But we could at least include the more recent Himawari imagery that is publicly available.