-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some data providers encode var names and units in a format that Nappy doesn't parse #45
Comments
I have the same problem... For example, my VNAMEs look like name; unit description; unit, or However the default pattern is myfile.var_and_units_pattern
Out[32]: re.compile(r'^\s*(.*)\((.+?)\)(.*)\s*$', re.UNICODE) function _attemptVarAndUnitsMatch seems to do the pattern matching (name, unit, description) - but how to set the pattern correctly? Could we extend the NACore class by a setter method for the |
@FObersteiner, I have a branch that allows you to supply your own Python function to separate the names from the units: https://github.com/LukeJones123/nappy/tree/feature-var_units_parsing. Hopefully Ag will release a new version with this merged in soon. |
@LukeJones123 out of curiosity, why did you implement it as a callback? Wouldn't it be more straight-forward to add a setter method for the class attribute, so the user can provide a custom regex before actually reading the file? |
A custom regex was the obvious first choice but unfortunately I find the strings are so heterogeneously formatted that I need to implement some logic to catch all cases, hence a callback function. A custom regex would be a good additional option to have, for those with simpler needs. |
@LukeJones123 let me propose a way that does not require to modify the API but satisfies your needs nevertheless! 1- initialize var_and_units_pattern as an underscored class attribute: def __init__(self):
# ...
self._var_and_units_pattern = re.compile(r"^\s*(.*)\((.+?)\)(.*)\s*$") 2- make it a property with a setter: @property
def var_and_units_pattern(self):
"""
a regex or function to parse vnames to (name & description, unit)
"""
return self._var_and_units_pattern
@var_and_units_pattern.setter
def var_and_units_pattern(self, new_pattern):
if not isinstance(new_pattern, (re.Pattern, types.FunctionType)):
raise TypeError(f"new pattern must be re.Pattern or function, "
f"not {type(new_pattern)}")
self._var_and_units_pattern = new_pattern 3- include the callback function option in _attemptVarAndUnitsMatch: def _attemptVarAndUnitsMatch(self, item):
if isinstance(self._var_and_units_pattern, types.FunctionType):
return self._var_and_units_pattern(item)
match = self._var_and_units_pattern.match(item)
if match:
# ... ---> now you can easily set your custom parser or regex after loading the file: f.var_and_units_pattern = custom_parser --> A working implementation in my fork |
Hi Florian, A problem with your suggestion is that a callback function is not the same thing as a regular expression. Having an attribute called var_and_units_pattern that is sometimes set to a regular expression and sometimes set to a function is a bit confusing. Since they are such different things I think a callback function should have a separate attribute. Your suggestion would be good as a way to set the pattern to a non-default value, and being able to provide a non-default pattern would also be a good thing, but that should be a separate development I think. Luke. |
@LukeJones123 you're right, the attribute name would be misleading if it can be a function.
|
Am I correct that what you're suggesting is, instead of passing the custom parser function as an input to the NAFile init() method, you first create the NAFile object and then use a setter to set the attribute? The trouble with that is it means the init() method will never be able to make any attempt to read the file, because it must allow for the caller to change attributes relevant to file parsing even after object initialisation. I think, at the moment, the file header is actually read by init(). Even if it wasn't, under the current API it could be changed to do so in future. Allowing the setting of attributes required for file parsing after initialisation will prevent that forever (unless you implement a backwards incompatible API change). In that respect, this attribute is similar to the ignore_header_lines argument, which is also passed to init. I know the way I've done it looks a bit clunky but I think it's the best way. |
@LukeJones123 why shouldn't you be able to set an attribute (with a setter) upon initialization? Simple example: class myClass:
def __init__(self, **kwargs):
self.a_string = kwargs['the_string'] if "the_string" in kwargs else None
@property
def a_string(self):
return self._a_string
@a_string.setter
def a_string(self, value):
self._a_string = value
c = myClass()
print(c.a_string)
# None
c_with_string = myClass(the_string="hello")
print(c_with_string.a_string)
# hello I think the The advantage would be that we could also add deleter; meaning that you can change or remove the custom parser func after initialization of the na file class instance. cheers, Florian |
In your example the setter serves no purpose if modifying self._a_string after initialisation is too late to have the desired effect, which is the case for this callback function, so this does not match with the use case in question. There is also no sensible use case for deleting the callback function at a later time. Can we please just agree on the current solution? |
Again, I don't think this is true. Anyways, what about Ag merges your PR and I'll have a look if I can make it look a bit less chunky afterwards - without breaking the functionality you need of course. |
@LukeJones123 @agstephens I think this can be closed, no? |
For me the issue will be closed when the changes are released in the next version - 2.0.2. |
Description
Some of the files in the NDACC database do not have variable name+units strings that conform to the expectations of Nappy. For example, some providers separate the two with a semicolon, e.g.
This issue could be addressed by a more complicate regular expression, but others are more convoluted and regular expressions aren't enough. Thus would be good if a custom user callback function could be allowed to do the parsing.
What I Did
The text was updated successfully, but these errors were encountered: