[Bug]: Timeseries can have timestamps + data that don't match in length #1536

jonahpearl · 2022-08-11T18:45:56Z

What happened?

I use the convenient timeseries pointer feature, where you can just store timestamps in one timeseries and have many others point to them. This saves a lot of memory! However, I just fixed a bug that arose because some data I stored in a timeseries didn't have the same length as the timeseries (due to a processing mistake). Then I went and checked, and it turns out that timeseries' timestamps don't need to have the same length as the data at all! I consider this a bug, given that timeseries are meant as a core unit of data in NWB, but let me know if I'm just using them wrong.

Steps to Reproduce

# normal case
t = nwb.TimeSeries(name='test', data=np.arange(10), timestamps=np.arange(10), unit='sec')

# I expect a value error here
nwb.TimeSeries(name='test', data=np.arange(10), timestamps=np.arange(5), unit='sec')

# This similarly should raise an error imo
s = nwb.TimeSeries(name='test2', data=np.arange(2), timestamps=t, unit='sec')

Traceback

n/a

Operating System

macOS

Python Executable

Conda

Python Version

3.9

Package Versions

nwb 2.0.0

Code of Conduct

I agree to follow this project's Code of Conduct
Have you checked the Contributing document?
Have you ensured this bug was not already reported?

CodyCBakerPhD · 2022-08-11T18:58:23Z

Hey there,

When running the latest version of PyNWB, (v2.1.0 - try pip install -U pynwb to update), you should see a warning along the lines of

UserWarning: <Object type> '<Object name>': Length of data does not match length of timestamps. Your data may be transposed. Time should be on the 0th dimension.

whenever such a size mismatch occurs.

As for why it's only a warning instead of an error, I believe the reasoning was to not break back-compatibility (is that right @lry @bendichter @oruebel?) or otherwise be too strict, as it can be a commonly observed mistake to unintentionally transpose the dimensions of a data field.

This is also a CRITICAL (highest-level) check implemented in the companion tool, the NWB Inspector (check implemented here). Though again I think the philosophy here is for something like this to not be an error that prevents the creation of the NWBFile, but rather a post-production quality check on the written file.

oruebel · 2022-08-11T19:27:55Z

As for why it's only a warning instead of an error, I believe the reasoning was to not break back-compatibility (is that right

Yes, that is correct. If we had made it an error then read of files that have this issue would not work due to the ValueError being triggered on read.

However, as of the most recent HDMF release, we now have the ability to detect in __init__ of Container classe (e.g,. TimeSeries) whether we are in construct mode (i.e., automatic construction during read) or not. This was implemented in hdmf-dev/hdmf#751 . #1516 shows how this can be used in PynWB to implement checks in a way that you warn when reading from file and raise an error when constructing new data. Essentially, what it boils down to in most cases is that you simply call self._error_on_new_warn_on_construct(error_msg="my message") in __init__, which will either: 1) raise a ValueError when in user mode, 2) raise a warning when in construct mode (i.e., read from file), or 3) do nothing if error_msg=None is passed. In this way you can implement check function to either return a string or None and then simply call self._error_on_new_warn_on_construct(error_msg=self._check_timestamps_valid()) in __init__. If you need more complex behavior then you can also check if self._in_construct_mode: directly to distinguis between behavior during constuction on read and new creation by the user.

Long story short, with this, the check whether timestamps have the correct length could be updated to raise an error in user mode and still preserve backward compatibility by only raising a warning when reading an invalid file.

CC @weiglszonja since she implement this behavior for ImageSeries

jonahpearl · 2022-08-11T20:34:31Z

Got it, thank you both! I will update my version to get my desired behavior as a warning for now, which will at least alert me to it.

…

On Aug 11, 2022, at 3:28 PM, Oliver Ruebel ***@***.***> wrote: As for why it's only a warning instead of an error, I believe the reasoning was to not break back-compatibility (is that right Yes, that is correct. If we had made it an error then read of files that have this issue would not work due to the ValueError being triggered on read. However, as of the most recent HDFM release, we now have the ability to detect in __init__ of Container classe (e.g,. TimeSeries) whether we are in construct mode (i.e., automatic construction during read) or not. This was implemented in hdmf-dev/hdmf#751 <hdmf-dev/hdmf#751> . The following #1516 <#1516> shows how this can be used to implement checks in a way that you warn when reading from file and raise an error when constructing new data. Essentially, what it boils down in most cases is that you simply call self._error_on_new_warn_on_construct(error_msg="my message") in __init__, which will either: 1) raise a ValueError when in user mode, 2) raise a warning when in construct mode (i.e., read from file), or 3) do nothing if error_msg=None is passed. In this way you can implement check function to either return a string or None and then simpy call self._error_on_new_warn_on_construct(error_msg=self._check_timestamps_valid()) in __init__. If you need more complex behavior then you can also check if self._in_construct_mode: directly to distinguis between behavior during constuction on read and new creation by the user. Long story short, with this, the check whether timestamps have the correct length could be updated to raise an error in user mode and still preserve backward compatibility by only raising a warning when reading an invalid file. — Reply to this email directly, view it on GitHub <#1536 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AQKOLZCCDVG3E5FXHSWYPGDVYVH4NANCNFSM56JF4MRA>. You are receiving this because you authored the thread.

weiglszonja · 2022-08-11T21:16:59Z

Thank you for raising this, I started working on a PR that should fix this behavior.

oruebel added category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users category: bug errors in the code or code behavior labels Aug 11, 2022

weiglszonja linked a pull request Aug 11, 2022 that will close this issue

[BUG] Fix TimeSeries data does not match length of timestamps should raise an error when created #1538

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Timeseries can have timestamps + data that don't match in length #1536

[Bug]: Timeseries can have timestamps + data that don't match in length #1536

jonahpearl commented Aug 11, 2022

CodyCBakerPhD commented Aug 11, 2022

oruebel commented Aug 11, 2022 •

edited

Loading

jonahpearl commented Aug 11, 2022 via email

weiglszonja commented Aug 11, 2022

[Bug]: Timeseries can have timestamps + data that don't match in length #1536

[Bug]: Timeseries can have timestamps + data that don't match in length #1536

Comments

jonahpearl commented Aug 11, 2022

What happened?

Steps to Reproduce

Traceback

Operating System

Python Executable

Python Version

Package Versions

Code of Conduct

CodyCBakerPhD commented Aug 11, 2022

oruebel commented Aug 11, 2022 • edited Loading

jonahpearl commented Aug 11, 2022 via email

weiglszonja commented Aug 11, 2022

oruebel commented Aug 11, 2022 •

edited

Loading