Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Timeseries can have timestamps + data that don't match in length #1536

Open
3 tasks done
jonahpearl opened this issue Aug 11, 2022 · 4 comments · May be fixed by #1538
Open
3 tasks done

[Bug]: Timeseries can have timestamps + data that don't match in length #1536

jonahpearl opened this issue Aug 11, 2022 · 4 comments · May be fixed by #1538
Labels
category: bug errors in the code or code behavior category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users

Comments

@jonahpearl
Copy link

What happened?

I use the convenient timeseries pointer feature, where you can just store timestamps in one timeseries and have many others point to them. This saves a lot of memory! However, I just fixed a bug that arose because some data I stored in a timeseries didn't have the same length as the timeseries (due to a processing mistake). Then I went and checked, and it turns out that timeseries' timestamps don't need to have the same length as the data at all! I consider this a bug, given that timeseries are meant as a core unit of data in NWB, but let me know if I'm just using them wrong.

Steps to Reproduce

# normal case
t = nwb.TimeSeries(name='test', data=np.arange(10), timestamps=np.arange(10), unit='sec')

# I expect a value error here
nwb.TimeSeries(name='test', data=np.arange(10), timestamps=np.arange(5), unit='sec')

# This similarly should raise an error imo
s = nwb.TimeSeries(name='test2', data=np.arange(2), timestamps=t, unit='sec')

Traceback

n/a

Operating System

macOS

Python Executable

Conda

Python Version

3.9

Package Versions

nwb 2.0.0

Code of Conduct

@CodyCBakerPhD
Copy link
Collaborator

Hey there,

When running the latest version of PyNWB, (v2.1.0 - try pip install -U pynwb to update), you should see a warning along the lines of

UserWarning: <Object type> '<Object name>': Length of data does not match length of timestamps. Your data may be transposed. Time should be on the 0th dimension.

whenever such a size mismatch occurs.

As for why it's only a warning instead of an error, I believe the reasoning was to not break back-compatibility (is that right @lry @bendichter @oruebel?) or otherwise be too strict, as it can be a commonly observed mistake to unintentionally transpose the dimensions of a data field.

This is also a CRITICAL (highest-level) check implemented in the companion tool, the NWB Inspector (check implemented here). Though again I think the philosophy here is for something like this to not be an error that prevents the creation of the NWBFile, but rather a post-production quality check on the written file.

@oruebel
Copy link
Contributor

oruebel commented Aug 11, 2022

As for why it's only a warning instead of an error, I believe the reasoning was to not break back-compatibility (is that right

Yes, that is correct. If we had made it an error then read of files that have this issue would not work due to the ValueError being triggered on read.

However, as of the most recent HDMF release, we now have the ability to detect in __init__ of Container classe (e.g,. TimeSeries) whether we are in construct mode (i.e., automatic construction during read) or not. This was implemented in hdmf-dev/hdmf#751 . #1516 shows how this can be used in PynWB to implement checks in a way that you warn when reading from file and raise an error when constructing new data. Essentially, what it boils down to in most cases is that you simply call self._error_on_new_warn_on_construct(error_msg="my message") in __init__, which will either: 1) raise a ValueError when in user mode, 2) raise a warning when in construct mode (i.e., read from file), or 3) do nothing if error_msg=None is passed. In this way you can implement check function to either return a string or None and then simply call self._error_on_new_warn_on_construct(error_msg=self._check_timestamps_valid()) in __init__. If you need more complex behavior then you can also check if self._in_construct_mode: directly to distinguis between behavior during constuction on read and new creation by the user.

Long story short, with this, the check whether timestamps have the correct length could be updated to raise an error in user mode and still preserve backward compatibility by only raising a warning when reading an invalid file.

CC @weiglszonja since she implement this behavior for ImageSeries

@oruebel oruebel added category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users category: bug errors in the code or code behavior labels Aug 11, 2022
@jonahpearl
Copy link
Author

jonahpearl commented Aug 11, 2022 via email

@weiglszonja
Copy link
Collaborator

Thank you for raising this, I started working on a PR that should fix this behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: bug errors in the code or code behavior category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users
Projects
None yet
4 participants