Downloaded tif files are black #8
Comments
Hi @blumenstiel - thanks for bringing this up! I had a look too and it does seem like these two cells are indeed corrupted. We made no changes to the original values, so like in the original Sentinel-2 data, 0 should represent no data (as far as I'm aware). It is somewhat unlikely that the corruption occurred during the upload, so we will investigate soon. If needed we can update the corresponding parquet file. Are there more files that are completely black that you found?
Hi @mikonvergence, thanks for looking into it! I checked another 100 random samples and got 14 corrupted files:
So I assume that this potentially affects 10-20% of the grid cells. I did not manually check the samples, but based on my code, each of these grid cells should have only NaN values in either S1 or S2. Maybe add a quick check to your processing scripts after downloading/before uploading?
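The quick check suggested above could look roughly like the following. This is only a sketch under assumptions: the image is already decoded into a NumPy array, 0 is the no-data value (as discussed for S2), and the helper names `nodata_fraction` and `is_fully_nodata` are hypothetical, not functions from this repo. It does not cover NaN-filled S1 images.

```python
import numpy as np


def nodata_fraction(pixels, nodata_value=0):
    """Return the fraction of pixels equal to nodata_value (0.0 to 1.0)."""
    arr = np.asarray(pixels)
    if arr.size == 0:
        raise ValueError("cannot compute a no-data fraction for an empty image")
    # Count the pixels that match the no-data value and normalise by the total.
    return float(np.count_nonzero(arr == nodata_value)) / arr.size


def is_fully_nodata(pixels, nodata_value=0):
    """True if every pixel equals nodata_value, i.e. the image is all black."""
    return nodata_fraction(pixels, nodata_value) == 1.0


# Synthetic stand-ins for decoded tiles (not real data from the dataset).
black = np.zeros((4, 4), dtype=np.uint16)
partial = black.copy()
partial[0, :2] = 1200  # two valid pixels out of sixteen
```

Running `is_fully_nodata` on each decoded band right after download would flag the completely black tiles without opening them by hand.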
Hi, we're looking into this! Thanks for bringing it to our attention. After some digging, we found that a small percentage of S2 tiles (1.3%) have 100% no-data (==0). I guess you got very unlucky, or something about your search made them more likely? Regardless, I'm not sure why this happened in the first place or why it got past our checks. It seems that all the IDs you list here have nodata==1.0 in the metadata (except the last grid tile, which I manually verified: it has an image over the sea, albeit a dark one). So, for now, I recommend explicitly filtering out tiles with a 100% nodata percentage (the value is a ratio between 0 and 1, as sometimes we get images that are partially nodata). As I say, thanks for bringing this to our attention, we will look into correcting/removing these!
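A minimal sketch of that filtering step, assuming the metadata has been loaded as a list of row dictionaries and that the ratio lives in a column called `nodata` (the name is inferred from the comment above, so check it against the actual metadata schema). The function `drop_fully_nodata` and the example rows are hypothetical.

```python
def drop_fully_nodata(rows, nodata_key="nodata", threshold=1.0):
    """Keep only rows whose no-data ratio is strictly below threshold.

    The ratio is expected to lie between 0 and 1, where 1.0 means the
    whole tile is no-data.
    """
    return [row for row in rows if float(row[nodata_key]) < threshold]


# Made-up metadata rows for illustration; the IDs are placeholders.
metadata = [
    {"grid_cell": "example_a", "nodata": 0.0},
    {"grid_cell": "example_b", "nodata": 0.35},
    {"grid_cell": "example_c", "nodata": 1.0},
]

usable = drop_fully_nodata(metadata)
usable_ids = [row["grid_cell"] for row in usable]
```

Lowering `threshold` (for example to 0.5) would also exclude tiles that are mostly, but not entirely, no-data. If the metadata is held in a pandas DataFrame instead, the equivalent is a boolean mask such as `df[df["nodata"] < 1.0]`.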
Thank you @aliFrancis! I forgot to look at the no-data column, this explains a lot.
I downloaded some data and noticed that some S2 data is completely black, e.g., grid cell 207D_1378R or 438U_1009R. The S1 data looks fine.
I used the filter_download function that is provided in this repo; I tested with and without by_row. I also tested Image.open(BytesIO(table[col][0].as_py())).show() with the same result.
The tif files do not include a FillValue. I assume 0 is used for NaN values?
Is it possible that some data got corrupted during the download or upload to HF?