Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallout of _Unsigned = "true": variable.dtype != variable[:].dtype #675

Closed
5tefan opened this issue Jun 6, 2017 · 3 comments
Closed

Fallout of _Unsigned = "true": variable.dtype != variable[:].dtype #675

5tefan opened this issue Jun 6, 2017 · 3 comments

Comments

@5tefan
Copy link

5tefan commented Jun 6, 2017

Checking out #674, related to _Unsigned = "true" from #656 and #671, I suppose the following is a question of semantics.

Should Variable.dtype return the data type on disk, or the datatype that the user will receive from __getitem__? Right now this test fails. I think it should pass, reasoning that:

  • The real datatype on disk should be completely abstracted, similar to endianness.
  • User can expect type of data returned to be consistent with result of file.variables["var"].dtype

If this test should indeed pass, I would be happy to initiate a PR with this test and a solution. The solution that came to mind would be to change references to self.dtype inside the Variable class to self._dtype and then expose dtype as a property, thoughts?

import unittest
import netCDF4
import tempfile
import numpy as np
import os


class TestUnsignedMasking(unittest.TestCase):
    def setUp(self):
        _, self.filename = tempfile.mkstemp()
        with netCDF4.Dataset(self.filename, "w") as nc_file:
            nc_file.createDimension("time", None)

            nc_x = nc_file.createVariable("x", np.int8, dimensions=("time"), fill_value=np.int8(-1))
            nc_x._Unsigned = "true"

            x = np.arange(10, dtype=np.int8)
            nc_file.variables["x"][:] = np.ma.masked_where(x > 5, x)

    def tearDown(self):
        os.remove(self.filename)

    def test_unsigned(self):
        with netCDF4.Dataset(self.filename, "r") as nc_file:
            self.assertTrue(nc_file.variables["x"].dtype == nc_file.variables["x"][:].dtype)

if __name__ == '__main__':
    unittest.main()

Results:

FAIL: test_unsigned (__main__.TestUnsignedMasking)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tst_Unsigned_masking.py", line 25, in test_unsigned
    self.assertTrue(nc_file.variables["x"].dtype == nc_file.variables["x"][:].dtype)
AssertionError: False is not true
@jswhit
Copy link
Collaborator

jswhit commented Jun 6, 2017

Right now Variable.dtype has to be the datatype on disk - that assumption is made internally in the module. Could be changed to _dtype, as you suggest. However, I'm not sold on this - the case could be made that the dtype should reflect the netcdf data type. Variable.dtype has always been different than the numpy dtype when scale_factor/add_offset is set. I view _Unsigned as another sort of rescaling.

@jswhit
Copy link
Collaborator

jswhit commented Jun 6, 2017

Note: Variable.dtype == numpy dtype if Variable.set_auto_scale(False)

@5tefan
Copy link
Author

5tefan commented Jun 7, 2017

Ah, the picture is definitely more consistent thinking about _Unsigned as a scaling.

@5tefan 5tefan closed this as completed Jun 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants