
Implemented audio extraction, adding audio streams, displaying audio stream #4

Open · wants to merge 10 commits into base: main
Conversation

ashtondoane (Collaborator):

My notes on this commit:

  1. I decided not to implement glob/regex handling in the audio extraction function. To keep the function's effect consistent (one file created per call), I believe any glob/regex expansion should be done externally, with the function then called in a for loop. The function currently raises errors on failure, which may need to be adjusted for bulk extraction, since a single failure would halt the entire process. I'm open to notes and improvements here.

  2. We discussed modifying the event tables to reflect true timing relative to the MEG; the exact formatting of this is important for me to continue implementing this class.
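A minimal sketch of the external-loop idea from point 1, with the single-file extractor passed in as a callable (`extract_one` is a placeholder for whatever the extraction function ends up being called): failures are collected rather than allowed to halt the batch.

```python
def extract_all(video_paths, extract_one):
    """Run a single-file extractor over many files, collecting failures
    so that one bad file does not halt the whole batch."""
    failures = []
    for video in video_paths:
        try:
            extract_one(video)
        except (OSError, ValueError) as err:
            failures.append((video, err))
    return failures
```

The glob/regex expansion would happen before this call, producing `video_paths`; the caller can then inspect `failures` and decide whether to retry or report.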

src/ilabs_streamsync/streamsync.py (outdated)
Comment on lines 38 to 39
Events associated with the stream. TODO: should they be integer sample
numbers? Timestamps? Do we support both?
Member:

As far as the input format for events, I don't really know what researchers/annotators are going to want to do. We have seen cases where the data was in the form of timestamps, probably something like HH:MM:SS.123456. So unless @NeuroLaunch has opinions about what (other) format(s) we should target, I'd say start with parsing HH:MM:SS.123456-formatted data, and we can expand to other formats later.

As far as the output format of events: MNE-Python has two ways of representing events (event arrays and Annotations objects). We should decide which one (or both?) we want to use when converting/syncing camera timestamps to the Raw file's time domain. @NeuroLaunch do you have an opinion here? @ashtondoane are you familiar with the two kinds of MNE event representations?

If I had to put a stake in the ground I'd probably say "use Annotations" but I haven't thought very hard about it yet... maybe implement that first, and if we find that we need to also implement event array support, we can add that later.
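A sketch of parsing HH:MM:SS.123456-formatted timestamps into seconds, which could then feed `mne.Annotations`. The helper name, the `(timestamp, description)` event shape, and the zero-duration choice are all assumptions, not settled API.

```python
from datetime import datetime

def timestamp_to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS.ffffff' timestamp into seconds since midnight."""
    t = datetime.strptime(ts, "%H:%M:%S.%f").time()
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6

# Hypothetical use with MNE, assuming events arrive as (timestamp, description) pairs:
# onsets = [timestamp_to_seconds(ts) for ts, _ in events]
# annots = mne.Annotations(onset=onsets, duration=0.0,
#                          description=[desc for _, desc in events])
# raw.set_annotations(annots)
```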

ashtondoane (Collaborator, author):

I am not familiar with the MNE representations; I will have to read the documentation. I'll begin with Annotations, as @NeuroLaunch also mentioned this as a possibility, and we can adjust later if necessary.

Member:

Not clear to me that this has actually been addressed, as nothing is done with events in the code; unresolving.

ashtondoane and others added 5 commits September 6, 2024 23:49
Co-authored-by: Daniel McCloy <[email protected]>
…ference MEG. Implemented display of all pulse channels (could be updated to be more user-friendly).
ashtondoane (Collaborator, author):

Let me know if anything else needs to change here.

drammock (Member) left a comment:

See comments below regarding the StreamSync API.

self.streams = []
"""Initialize StreamSync object with 'Raw' MEG associated with it.

reference_object: str TODO: is str the best method for this, or should this be pathlib obj?
Member:

Initially I thought reference_object should be an object in memory, not a path to a file (whether str or Path). I still lean that way, on the assumption that the user is very likely to also be at least a bit familiar with MNE-Python (and thus know how to load a Raw file).

But I guess there's a case to be made that if the add_stream method takes a file path, then maybe the StreamSync constructor should also take in a file path. After reflection I'd say let's support both. The way to annotate that is mne.io.Raw | path-like, and we write the code so that Raw and str and Path will all work:

if isinstance(reference_obj, str):
    reference_obj = Path(reference_obj)
if isinstance(reference_obj, Path):
    reference_obj = mne.io.read_raw(reference_obj, ...)
# from now on we can safely assume it's a Raw object

Comment on lines +49 to +55

    # Check type and value of pulse_channel, and ensure reference object has such a channel.
    if not pulse_channel:
        raise TypeError("pulse_channel is None. Please provide a channel name of type str.")
    if type(pulse_channel) is not str:
        raise TypeError("pulse_channel parameter must be of type str.")
    if raw[pulse_channel] is None:
        raise ValueError("pulse_channel does not exist in reference_object.")
Member:

Several points of feedback here:

  1. MNE-Python lets you pick channels by index (integer) or by name (string); in principle we don't need to restrict users to just strings.
  2. Even if we did want to restrict to strings, the preferred way to type check would be if not isinstance(pulse_channel, str) rather than if type(pulse_channel) is not str
  3. all of these failure modes will be handled by MNE-Python already if the user passes an invalid channel selector. It should be enough to just do self.ref_stream = reference_object.get_data(picks=[pulse_channel]). If you want you could try/except that line, and provide a nicer error message than what MNE-Python provides when the channel is not found (if you think it would help the user substantially)
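A sketch of the try/except wrapper suggested in point 3; the function name and the error-message wording are illustrative, not part of the PR.

```python
def pick_pulse_channel(raw, pulse_channel):
    """Fetch pulse-channel data, re-raising MNE's lookup error with a hint."""
    try:
        return raw.get_data(picks=[pulse_channel])
    except ValueError as err:
        raise ValueError(
            f"Channel {pulse_channel!r} was not found in reference_object; "
            "check raw.ch_names for valid channel names."
        ) from err
```

This keeps MNE's own validation (including integer picks) while attaching a message tailored to StreamSync users.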

    raise ValueError("pulse_channel does not exist in reference_object.")


    self.raw = mne.io.read_raw_fif(reference_object, preload=False, allow_maxshield=True)
Member:

here you are loading in (a second time) the same file you already loaded into the variable raw. Assuming that's a mistake?

setting that aside: what is the motivation for keeping a reference to the Raw object as part of the StreamSync object? I think all we need is sfreq and a numpy array of the pulse channel data (or am I forgetting something)?
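One way to keep only what's needed, sketched with the attribute names from this PR. Whether `sfreq` plus the pulse-channel array really is sufficient is exactly the open question above; `reference_object` is assumed here to be an already-loaded `mne.io.Raw`.

```python
class StreamSync:
    """Sketch: store the sampling rate and pulse data, not the Raw itself."""
    def __init__(self, reference_object, pulse_channel):
        # reference_object is assumed to be an already-loaded mne.io.Raw
        self.sfreq = reference_object.info["sfreq"]
        self.ref_stream = reference_object.get_data(picks=[pulse_channel])[0]
        self.streams = []
```

Dropping the Raw reference avoids holding the whole recording in the object when only the pulse channel and its sampling rate are used for syncing.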

Comment on lines +35 to +44

    # Check provided reference_object for type and existence.
    if not reference_object:
        raise TypeError("reference_object is None. Please provide a path.")
    if type(reference_object) is not str:
        raise TypeError("reference_object must be a file path of type str.")
    ref_path_obj = pathlib.Path(reference_object)
    if not ref_path_obj.exists():
        raise OSError("reference_object file path does not exist.")
    if not ref_path_obj.suffix == ".fif":
        raise ValueError("Provided reference object does not point to a .fif file.")
Member:

I think this is probably overkill. MNE-Python will already provide informative error messages if the path doesn't point to a valid Raw file, or if the thing that is passed as a filename isn't in fact path-like.



    self.raw = mne.io.read_raw_fif(reference_object, preload=False, allow_maxshield=True)
    self.ref_stream = raw[pulse_channel]
Member:

raw[pulse_channel] returns a tuple of two arrays: (data, times). do we need the times? If not we can do raw.get_data(picks=[pulse_channel])
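The difference between the two access patterns, illustrated with a minimal stand-in for `mne.io.Raw` (`FakeRaw` is purely illustrative):

```python
class FakeRaw:
    """Minimal stand-in mimicking the two mne.io.Raw access patterns."""
    _data = [[0.0, 1.0, 0.0]]
    _times = [0.0, 0.001, 0.002]

    def __getitem__(self, pick):
        return self._data, self._times  # indexing yields a (data, times) tuple

    def get_data(self, picks):
        return self._data  # get_data yields just the data

raw = FakeRaw()
data, times = raw["pulse"]                 # must unpack two values
data_only = raw.get_data(picks=["pulse"])  # data only, no times array
```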

    audio_file = p.with_stem(f"{p.stem}_16_bit").with_suffix(".wav").name
    if not p.exists():
        raise ValueError("Path provided cannot be found.")
    if not overwrite and pathlib.PurePath.joinpath(pathlib.Path(output_dir), pathlib.Path(audio_file)).exists():
Member:

Suggested change
if not overwrite and pathlib.PurePath.joinpath(pathlib.Path(output_dir), pathlib.Path(audio_file)).exists():
if not overwrite and (pathlib.Path(output_dir) / audio_file).exists():
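The two spellings build the same path object; the `/` operator is simply easier to read (`"output"` and the file name below are illustrative):

```python
from pathlib import Path, PurePath

audio_file = "clip_16_bit.wav"  # illustrative name
verbose = PurePath.joinpath(Path("output"), Path(audio_file))
concise = Path("output") / audio_file
assert verbose == concise
```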

'-i', path_to_video,
'-map', '0:a', # audio only (per DM)
# '-af', 'highpass=f=0.1',
'-acodec', 'pcm_s16le',
Member:

-acodec is being passed twice, with different values. Is that intended?
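A sketch of the argument list with a single `-acodec`, assuming 16-bit PCM WAV is the intended output; building the list in a helper (name is illustrative) makes duplicated flags easy to catch.

```python
def build_ffmpeg_cmd(path_to_video, output_path):
    """Assemble the ffmpeg audio-extraction command with one codec flag."""
    return [
        "ffmpeg",
        "-i", str(path_to_video),
        "-map", "0:a",           # audio streams only
        "-acodec", "pcm_s16le",  # single codec flag: 16-bit little-endian PCM
        str(output_path),
    ]

# e.g. subprocess.run(build_ffmpeg_cmd("session.mp4", "session_16_bit.wav"), check=True)
```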
