Make a wrapped lookup table to time-of-flight #180

Merged · 47 commits into main · Feb 11, 2025
Conversation

@nvaytet (Member) commented Feb 6, 2025

For computing time-of-flight, instead of having a complicated graph where we first unwrap the event time of arrival, then fold it with the pulse period, etc.:
[screenshot]
we make a lookup table from which the time-of-flight can be looked up directly, using the raw event_time_offset.
The graph now looks like:
[screenshot]

The lookup table before:
[screenshot]

After (with wrap-around):
[screenshot]

To handle pulse skipping, we now add an extra pulse dimension to the lookup table.

Before:
[screenshot]

After:
[screenshot]

This makes tof cheaper to compute and makes it easier to incorporate the computation into existing reduction workflows, including workflow control via widgets.
Another bonus is that the resolution in the event_time_offset dimension is now much more predictable and controllable, as the range is always [0, 71 ms] (before, it depended on the choppers and the detector distances).
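To illustrate the idea, here is a minimal numpy sketch of the direct lookup (all names, sizes, and table contents are hypothetical stand-ins; the real table is built from a tof simulation and stored as a scipp DataArray): the time-of-flight is read straight out of a 2D table indexed by detector distance and the raw event_time_offset, wrapped into [0, pulse_period).

```python
import numpy as np

pulse_period = 71.0e-3  # s, the ESS pulse period
n_dist, n_eto = 50, 1000
dist_edges = np.linspace(10.0, 40.0, n_dist + 1)       # m (hypothetical range)
eto_edges = np.linspace(0.0, pulse_period, n_eto + 1)  # s, always [0, pulse_period]

# Placeholder contents; in the workflow each bin holds a mean tof from the simulation.
rng = np.random.default_rng(0)
table = rng.uniform(1.0e-3, 70.0e-3, size=(n_dist, n_eto))

def lookup_tof(distance, event_time_offset):
    """Return tof for one event by direct table lookup, with wrap-around."""
    i = np.clip(np.searchsorted(dist_edges, distance) - 1, 0, n_dist - 1)
    j = np.clip(
        np.searchsorted(eto_edges, event_time_offset % pulse_period) - 1,
        0,
        n_eto - 1,
    )
    return table[i, j]
```

Because the event_time_offset axis is wrapped, an offset shifted by a whole pulse period lands in the same bin, which is what makes the raw coordinate usable without unwrapping first.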

Finally, I also changed the WFM tests to use a tof simulation instead of a list of 6 manually chosen neutrons, as this is more consistent with the other unwrapping tests and catches more potential errors (such as neutrons randomly being lost at the edges of the frames).

@nvaytet marked this pull request as ready for review, February 7, 2025 10:28
Comment on lines 168 to 184
wavs = sc.broadcast(
    simulation.wavelength.to(unit="m"), sizes=toas.sizes
).flatten(to="event")
dist = sc.broadcast(distances + simulation_distance, sizes=toas.sizes).flatten(
    to="event"
)
tofs = dist * (sc.constants.m_n / sc.constants.h)
tofs *= wavs

data = sc.DataArray(
    data=sc.broadcast(simulation.weight, sizes=toas.sizes).flatten(to="event"),
    coords={
        "toa": toas.flatten(to="event"),
        "tof": tofs.to(unit=time_unit, copy=False),
        "distance": dist,
    },
)
Member:
Can we just flatten data, instead of all the pieces?

Comment on lines 225 to 235
# First, extend the table to the right by 1, and set the coordinate to pulse_period.
slab = sc.empty_like(table['event_time_offset', 0])
slab.coords['event_time_offset'] = pulse_period
table = sc.concat([table, slab], dim='event_time_offset')
# Then, copy over the values. Instead of using pulse_stride, we use the number of
# pulses in the table, as it could be that there were no events in the first pulse.
npulses = table.sizes['pulse']
for i in range(npulses):
pulse = (i + 1) % npulses
left_edge = table.data['pulse', pulse]['event_time_offset', 0]
table.data['pulse', i]['event_time_offset', -1] = left_edge
Member:

While this does not look wrong, I am a bit confused as to why we need to handle pulse skipping as done above. Why group by pulse early and then deal with the mess here? Can't we just process based on the actual frame length (say 2 × 71 ms), and then fold to obtain a pulse dim at the end?

Not saying it has to change, just wondering if it is simpler, without having thought through every detail.

Member Author:

After looking at this for too long, I get lost in the details and don't always manage to take a step back and think about it more on the higher conceptual level.

This is a great suggestion, I implemented it and it works 👍
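Conceptually, the suggested approach can be sketched in plain numpy (hypothetical sizes, not the actual scipp code): bin the table over one full frame period (pulse_stride * pulse_period), then fold the event_time_offset axis at the end to obtain a pulse dimension.

```python
import numpy as np

pulse_stride = 2
n_dist, n_eto = 3, 8  # distance bins, event_time_offset bins per pulse (tiny, for illustration)

# Table covering the whole frame: shape (distance, pulse_stride * n_eto).
frame_table = np.arange(n_dist * pulse_stride * n_eto, dtype=float).reshape(
    n_dist, pulse_stride * n_eto
)

# Fold: split the long event_time_offset axis into (pulse, event_time_offset),
# analogous to scipp's fold with sizes={'pulse': pulse_stride, 'event_time_offset': -1}.
folded = frame_table.reshape(n_dist, pulse_stride, n_eto)
```

The point of the suggestion is that all the wrap-around bookkeeping happens once on the long axis, and the pulse dimension falls out of a single reshape at the end.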

Comment on lines +354 to +360
pulse_index = (
    (
        (da.bins.coords['event_time_zero'] - tmin).to(unit=eto_unit)
        + 0.5 * pulse_period
    )
    % frame_period
) // pulse_period
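A numeric sketch of the formula above, with plain numpy floats in place of scipp variables (values hypothetical):

```python
import numpy as np

pulse_period = 71.0e-3  # s
pulse_stride = 2
frame_period = pulse_stride * pulse_period

tmin = 0.0
# event_time_zero for events from four consecutive pulses.
event_time_zero = tmin + np.arange(4) * pulse_period

# The 0.5 * pulse_period shift guards against jitter of event_time_zero
# around the exact pulse boundaries.
pulse_index = (
    ((event_time_zero - tmin) + 0.5 * pulse_period) % frame_period
) // pulse_period
```

With pulse_stride = 2, consecutive pulses alternate between index 0 and 1, which is what indexes the extra pulse dimension of the lookup table.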
Member:

Hmm, if we process data in chunks, without being pulse-skipping aware, won't we get inconsistent pulse_index for various chunks?

Member Author:

I'm not sure I understood which use case you were referring to. Do you mean if we are e.g. processing the data with the StreamProcessor?
Is it because tmin would be different every time new data comes in?
Does that mean we would need a reference time, like the run start time, and read it from NeXus?

Member:

Yes I was referring to something like StreamProcessor. I don't know if it has to come from NeXus, maybe it can be from the first chunk, or something, but simply looking at every chunk seems wrong?

Member Author:

I can add a comment and defer this to another PR?

-Resolution of the time of arrival axis in the lookup table.
-Can be an integer (number of bins) or a sc.Variable (bin width).
+Number of bins to use for the time of arrival (event_time_offset) axis in the lookup
+table. Should be around 1000.
Member:

I presume this should be related to the instrument resolution? So instruments with a 1% resolution may need different values than ones with 5%?

Member Author:

Do you think it's better to ask for a resolution in e.g. microseconds instead of a number of bins?

The reason I did not do that is that the range needs to be exactly [0, 71 ms], and if 71 ms is not a multiple of the bin width given by the user, we have to (silently?) modify the resolution they actually requested.
If they give a number of bins, we can guarantee that what they put in is reflected in the output.

That said, we could always go with the physical resolution if we properly document that we will guarantee at least what they asked for, but that the resolution in the table could actually be a little finer?
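That trade-off can be sketched with a hypothetical helper (not the PR's actual code): snap a requested bin width so that a whole number of bins covers exactly [0, pulse_period], rounding the bin count up so the table is at least as fine as requested, never coarser.

```python
import numpy as np

pulse_period = 71.0e-3  # s

def snapped_bin_width(requested_width):
    # Round the bin count up: the actual width is <= the requested width.
    nbins = int(np.ceil(pulse_period / requested_width))
    return pulse_period / nbins

width = snapped_bin_width(0.3e-3)  # e.g. the user asks for 0.3 ms bins
```

This is the "guarantee at least what they asked for" option: the adjustment is deterministic and always errs on the side of a finer table.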

Member Author:

Update: I changed it to be a bin width with a unit, instead of an integer.

Comment on lines 50 to 58
    simulation: SimulationResults,
    ltotal_range: LtotalRange,
    distance_resolution: DistanceResolution,
    toa_resolution: TimeOfArrivalResolution,
    time_resolution: TimeResolution,
    pulse_period: PulsePeriod,
    pulse_stride: PulseStride,
    pulse_stride_offset: PulseStrideOffset,
    error_threshold: LookupTableRelativeErrorThreshold,
) -> TimeOfFlightLookupTable:
Member:

This function is so long that the probability that all bugs are caught in code review approaches zero. Can you split it up so that components can be reasoned about and tested individually? Maybe this should turn into a class, but free functions are also an option.

Member Author:

Done.

@@ -6,7 +6,7 @@ pooch
 pytest
 scipy>=1.7.0
 scippnexus @ git+https://github.com/scipp/scippnexus@main
-scipp @ https://github.com/scipp/scipp/releases/download/nightly/scipp-nightly-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+scipp @ https://github.com/scipp/scipp/releases/download/nightly/scipp-nightly-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Member:

Why cp312? Did you lock dependencies in a 3.12 env?

Member Author:

I did :-(

Member (@SimonHeybrock) left a review:

I am generally happy with the changes. It is still hard to truly verify correctness by looking at this, I presume bugs can be weeded out by more extensive testing in the wild?

Comment on lines 163 to 188
# Now fold the pulses
table = table.fold(
    dim='event_time_offset', sizes={'pulse': pulse_stride, 'event_time_offset': -1}
)
# The event_time_offset does not need to be 2d, it's the same for all pulses.
table.coords['event_time_offset'] = table.coords['event_time_offset']['pulse', 0]

# We are still missing the upper edge of the table in the event_time_offset axis
# (at pulse_period). Because the event_time_offset is periodic, we can simply copy
# the left edge over to the right edge.
# Note that this needs to be done pulse by pulse, as the left edge of the second
# pulse is the same as the right edge of the first pulse, and so on (in the case
# of pulse_stride > 1).

# First, extend the table to the right by 1, and set the coordinate to pulse_period.
left = table['event_time_offset', 0]
slab = sc.empty_like(left)
slab.coords['event_time_offset'] = pulse_period
table = sc.concat([table, slab], dim='event_time_offset')
# Copy the values. We roll the values along the pulse dimension so that the left
# edge of the second pulse is the same as the right edge of the first pulse, and so
# on (in the case of pulse_stride > 1).
right = table['event_time_offset', -1]
right.values = np.roll(left.values, -1, axis=1)
right.variances = np.roll(left.variances, -1, axis=1)
return table
Member:

Is this something like:

table = sc.concat([table, table['event_time_offset', 0]], dim='event_time_offset')
table.coords['event_time_offset'][-1] = pulse_period
return sc.concat(
    [table['event_time_offset', i * size:(i + 1) * size + 1] for i in range(pulse_stride)],
    dim='pulse',
)

# Compute a pulse index for every event: it is the index of the pulse within a
# frame period. When there is no pulse skipping, those are all zero. When there is
# pulse skipping, the index ranges from zero to pulse_stride - 1.
tmin = da.bins.coords['event_time_zero'].min()
Member:

Maybe one could simply use epoch? Or some other fixed datetime?

Member Author (@nvaytet) commented Feb 10, 2025:

The more I think about this, the more it seems there's something wrong with the approach.

I tried with just using epoch, and for the examples we have, it also works.
But if I offset my event time zeros by a pulse period, the results are all wrong until I correct it using PulseStrideOffset=1.

This is reproducing the situation where one would start recording the file one pulse period later.
In practice, we can never really know when this was.
This would make auto-reduction impossible, because right now one needs to look at the data, and if it looks wrong, change the value of PulseStrideOffset.

I'm thinking we can only know by looking at the open and close times of the pulse skipping chopper?

Member Author:

I have an ugly hack in https://github.com/scipp/essreduce/tree/pulse-index which basically tries all possible PulseStrideOffsets and then picks the result with the least number of NaNs in it.

It works for the examples I've tried, but it doesn't feel safe, and it also doubles or triples the cost of computing tof.
We could optimize by using just a small number of events at the start of the event buffer to decide which offset to go with, but that doesn't change the fact that the approach doesn't feel very robust (it might actually make robustness worse: how do you decide when you have enough neutrons to perform the test?).
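As a toy sketch of the heuristic described above (everything here is hypothetical, including the stand-in lookup): try each candidate PulseStrideOffset on a small sample of events and keep the one producing the fewest NaNs in the computed tof.

```python
import numpy as np

pulse_stride = 2

def tof_with_offset(pulse_parity, offset):
    # Stand-in for the real table lookup: events assigned to the "closed"
    # pulse of the frame fall outside the table and come out as NaN.
    tof = np.full(pulse_parity.size, 0.05)
    tof[(pulse_parity + offset) % pulse_stride != 0] = np.nan
    return tof

def best_offset(sample_parity):
    # Count NaNs for each candidate offset and pick the smallest.
    nan_counts = [
        np.isnan(tof_with_offset(sample_parity, off)).sum()
        for off in range(pulse_stride)
    ]
    return int(np.argmin(nan_counts))
```

The robustness concern raised above applies directly: with a small or unlucky sample, the NaN counts for different offsets may not differ enough to discriminate reliably.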

@nvaytet (Member Author) commented Feb 11, 2025:

There is still the issue with the reference time for the pulse_index, but this is blocking other work.
I opened #184 .
I will merge so we can move forward.

@nvaytet merged commit 7c2bcb1 into main, Feb 11, 2025 (4 checks passed).
@nvaytet deleted the wrapped-tof-lookup branch, February 11, 2025 13:29.