
Automate interpretation of _Unsigned attribute #1453

Merged
merged 16 commits on Jul 28, 2017

Conversation

@deeplycloudy (Contributor) commented Jun 14, 2017

@deeplycloudy (Contributor, Author)

In addition to the included (basic) test I've also tested this with the real-world data that motivated the PR and #1444. While it's a working draft, I'd welcome comments on the basic approach and appropriateness of the test coverage.

@shoyer (Member) commented Jun 15, 2017

Instead of putting this alongside the mask_and_scale logic, can you make a separate class to do the dtype fixing in decode_cf_variable? Take a look at BoolTypeArray for an example.
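As a rough illustration of that suggestion, here is a minimal sketch of a lazy unsigned-view wrapper in the spirit of BoolTypeArray. This is not necessarily the class that was merged; the class name, the unsigned_dtype attribute, and the example values are assumptions for illustration only.

```python
import numpy as np


class UnsignedIntTypeArray(object):
    # Illustrative sketch in the spirit of BoolTypeArray: report an unsigned
    # dtype, and reinterpret the signed on-disk values only when indexed.
    def __init__(self, array):
        self.array = np.asarray(array)
        self.unsigned_dtype = np.dtype('u%s' % self.array.dtype.itemsize)

    @property
    def dtype(self):
        return self.unsigned_dtype

    def __getitem__(self, key):
        return self.array[key].astype(self.unsigned_dtype)


# e.g. a signed byte of -1 on disk should decode to 255 as uint8
signed = np.array([-1, 0, 1], dtype='i1')
print(UnsignedIntTypeArray(signed)[:])  # [255 0 1]
```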

@deeplycloudy (Contributor, Author)

I've created a new UnsignedIntTypeArray and have separated the logic from mask_and_scale. Lint warnings have been fixed and the docs updated.

@deeplycloudy (Contributor, Author)

The CI failure is for 2.7/cdat/pynio in a couple of my new tests. In one, the fill value is not being applied; in the other, the unsigned conversion isn't happening. Are there any known differences in that cdat/pynio stack that would cause these to fail while others pass?

@deeplycloudy (Contributor, Author)

Tests now pass after I realized I wasn't converting the _FillValue to unsigned.

I also turned off PyNIO's internal support for masking, in keeping with the philosophy that xarray should only use the backends to retrieve the bytes as represented on disk.

Note that some of the CI builds are skipping most of their tests (e.g., py=3.4; you can tell by the run time). This is a problem in other PRs as well.

@jhamman (Member) left a comment

This is looking good. Can you run flake8 again to make sure there aren't any PEP8 violations? I just had one comment of substance.

We intentionally skip some I/O and plotting tests on some of the test matrix so we can test xarray with/without some optional dependencies.

@@ -637,6 +665,13 @@ def maybe_encode_dtype(var, name=None):
'any _FillValue to use for NaNs' % name,
RuntimeWarning, stacklevel=3)
data = duck_array_ops.around(data)[...]
if '_Unsigned' in encoding:
Member

Is _Unsigned = False a valid attribute?

If so, we need this to be if encoding.get('_Unsigned', False).

@deeplycloudy (Contributor, Author) commented Jul 17, 2017

From the documentation I assembled as part of the same issue in NetCDF4-python, it looks like _Unsigned = "true" is present if the data are unsigned, with no mention of _Unsigned = "false". I haven't seen an example of False in the wild, but your suggestion is consistent with the patch to NetCDF4-python for this issue.

I made the change you recommended on the encode side.

On decode there is a corresponding check of the attribute, so I made a change there, too.
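For context, here is a minimal sketch of how a file carrying this convention might be written with netCDF4-python; the file name, variable name, and values are hypothetical and not taken from this PR.

```python
import netCDF4
import numpy as np

# Sketch only: write a signed byte variable and mark it with the NUG-style
# _Unsigned = "true" attribute so CF-aware readers reinterpret it as uint8.
nc = netCDF4.Dataset("unsigned_example.nc", "w")    # hypothetical file name
nc.createDimension("x", 4)
counts = nc.createVariable("counts", "i1", ("x",))  # stored as signed bytes on disk
counts.setncattr("_Unsigned", "true")
counts[:] = np.array([0, 1, 127, -128], dtype="i1")  # -128 should decode to 128
nc.close()
```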

self.assertDatasetIdentical(encoded,
create_encoded_unsigned_masked_scaled_data(
)
)
Member

let's pep8 this:

        self.assertDatasetIdentical(
            encoded, create_encoded_unsigned_masked_scaled_data())

Contributor Author

Done. That's much better.

self.assertDatasetAllClose(encoded, actual)
# make sure roundtrip encoding didn't change the
# original dataset.
self.assertDatasetIdentical(encoded,
create_encoded_masked_and_scaled_data())
create_encoded_masked_and_scaled_data()
)
Member

pep8 / move last ). This whole call should actually fit on one line.

Contributor Author

Done.

@jhamman (Member) left a comment

I think we're good to go here. @shoyer - I'll let you have the final review/merge.

@shoyer (Member) commented Jul 18, 2017 via email

@@ -637,6 +665,13 @@ def maybe_encode_dtype(var, name=None):
'any _FillValue to use for NaNs' % name,
RuntimeWarning, stacklevel=3)
data = duck_array_ops.around(data)[...]
if encoding.get('_Unsigned', False):
unsigned_dtype = 'i%s' % dtype.itemsize
old_fill = np.asarray(attrs['_FillValue'])
Member

This block should be guarded by a check that _FillValue is actually defined as an attribute.
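A minimal sketch of what that guard could look like, written as a standalone helper for clarity; the function name and the exact handling of data and attrs are assumptions, not the code merged here.

```python
import numpy as np


def encode_unsigned(data, attrs, encoding):
    # Hypothetical helper: cast unsigned in-memory data back to the signed
    # dtype used on disk, touching _FillValue only when it is actually present.
    if encoding.get('_Unsigned', False):
        signed_dtype = np.dtype('i%s' % data.dtype.itemsize)
        if '_FillValue' in attrs:
            attrs['_FillValue'] = np.asarray(attrs['_FillValue']).astype(signed_dtype)
        data = data.astype(signed_dtype)
    return data, attrs
```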

@@ -786,6 +822,16 @@ def decode_cf_variable(var, concat_characters=True, mask_and_scale=True,
dimensions = dimensions[:-1]
data = CharToStringArray(data)

pop_to(attributes, encoding, '_Unsigned')
is_unsigned = encoding.get('_Unsigned', False)
if (is_unsigned) and (mask_and_scale is True):
Member

You don't need the extra parentheses here, and use implicit boolean checks instead of is True. So this should be: if is_unsigned and mask_and_scale

Contributor Author

Done.

# Need to convert the fill_value to unsigned, too
# According to the CF spec, the fill value is of the same
# type as its variable, i.e. its storage format on disk
fill_value = np.asarray(fill_value, dtype=data.unsigned_dtype)
Member

Can we simply cast fill_value to the dtype of data unilaterally here? e.g., fill_value = np.asarray(fill_value, dtype=data.dtype) without the if is_unsigned and has_fill check?

That seems a little more robust to me.

Contributor Author

Done. I had to leave the has_fill check because fill_value will be None in cases where there is no fill_value attribute.
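A hedged sketch of the resulting decode-side logic; the helper name is made up, and in the PR the equivalent code lives inline in decode_cf_variable.

```python
import numpy as np


def cast_fill_value(fill_value, data):
    # Sketch: unconditionally cast the fill value to the dtype of the decoded
    # data (unsigned when _Unsigned was set), but only when a _FillValue
    # attribute was present at all.
    has_fill = fill_value is not None
    if has_fill:
        fill_value = np.asarray(fill_value, dtype=data.dtype)
    return fill_value
```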

self.assertDatasetAllClose(decoded, actual)
with self.roundtrip(decoded,
open_kwargs=dict(decode_cf=False)) as actual:
# TODO: this assumes that all roundtrips will first
Member

please remove this redundant TODO note

Contributor Author

Done.

open_kwargs=dict(decode_cf=False)) as actual:
# TODO: this assumes that all roundtrips will first
# encode. Is that something we want to test for?
self.assertDatasetAllClose(encoded, actual)
Member

I'm not sure assertDatasetAllClose checks dtypes. It would be good to check explicitly here.
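One way such an explicit check might look, sketched with plain asserts; the variable name and expected dtype are hypothetical, and the actual test would use the project's own helpers.

```python
import numpy as np


def assert_unsigned_dtype(decoded, name='x', expected=np.uint8):
    # Sketch only: beyond the value comparison, assert that the decoded
    # variable carries the unsigned dtype rather than the signed on-disk one.
    assert decoded[name].dtype == expected, (name, decoded[name].dtype)
```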

Contributor Author

Done.

@shoyer shoyer merged commit e3e6db5 into pydata:master Jul 28, 2017
@shoyer (Member) commented Jul 28, 2017

Thanks @deeplycloudy !

@jhamman jhamman modified the milestone: 0.10 Aug 4, 2017