I'm trying to merge multiple waveform data (.dat) files into a single file using the wfdb.wrsamp() function. There are approximately 10,000 files, each with 3 channels. Every attempt results in a memory exception, requiring more than 40GB of memory, and I'm unsure whether I'm doing something incorrect.
I've been unable to find a method to write the files incrementally. My current approach is to read each sample, combine all signals into an array, and write them. While this works fine with a small number of files, I'm having difficulties when it comes to larger datasets. Each file contains over 6 minutes of data.
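Since each individual file is small (about 6 minutes of data), one workaround is to stream each record's raw samples straight into a single output file instead of concatenating everything in memory first. This is only a hedged sketch: it assumes every .dat file is raw little-endian 16-bit (WFDB format "16") with the same interleaved channels and no byte offset, and the function name `append_dat` is mine, not part of the wfdb library.

```python
import numpy as np

def append_dat(in_path, out_file):
    """Append one raw fmt-16 .dat file's samples to an open output file.

    Hypothetical helper: assumes the input is raw little-endian int16
    with no header bytes. memmap avoids loading the file eagerly.
    """
    samples = np.memmap(in_path, dtype="<i2", mode="r")
    out_file.write(samples.tobytes())

# Usage sketch: only one record's worth of data is in memory at a time.
# with open("merged.dat", "wb") as out:
#     for path in dat_paths:
#         append_dat(path, out)
```

A matching header for the merged record would still need to be written separately (e.g. with the correct total signal length), which this sketch does not cover.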
Any assistance, insights, or suggestions on this matter would be highly appreciated.
I have modified the library to process the data in chunks instead of concatenating everything at once, which reduced memory usage to roughly a third of the original.
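The chunking idea can be illustrated with a short sketch. This is not the actual patch: the function name `write_fmt16_chunked` and the chunk size are my own, and it assumes the digital signal is a 2-D integer array being written as little-endian 16-bit samples. The point is that only one block's worth of temporary copies exists at any moment, rather than several copies of the full array.

```python
import numpy as np

def write_fmt16_chunked(d_signal, fh, chunk_rows=1_000_000):
    """Convert and write a digital signal in row blocks.

    Hypothetical illustration of chunked writing: each block is
    converted to little-endian int16 and written immediately, so the
    peak temporary allocation is bounded by chunk_rows frames.
    """
    for start in range(0, len(d_signal), chunk_rows):
        block = d_signal[start:start + chunk_rows]
        fh.write(block.astype("<i2").tobytes())
```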
Thanks! Just to be clear, I assume you're talking about the function wr_dat_file, and that your code would replace lines 2381 to 2392 (following elif fmt == "16").
The existing code looks to me like it's a lot more complicated than it needs to be. I'm sure that your replacement code is more efficient, but I also suspect that the entire thing could be replaced with just one or two numpy function calls - there's no need to make so many copies of the data.
Compare this with how format 80 is handled (see the code under if fmt == "80"). Format 16 could probably be handled in a very similar way - we don't need to add an offset in that case, but we do need to convert to little-endian 16-bit integers and then reinterpret as an array of bytes.
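To make the suggestion concrete, here is a hedged sketch of what the format-16 branch could look like if it followed the format-80 pattern: convert to little-endian 16-bit integers, then reinterpret the buffer as bytes. The function name `samples_to_fmt16_bytes` is hypothetical; this is not the code currently in wr_dat_file.

```python
import numpy as np

def samples_to_fmt16_bytes(d_signal):
    """Flatten a 2-D digital signal (frames x channels) into fmt-16 bytes.

    One astype copy to little-endian int16, then a zero-copy
    reinterpretation of that buffer as uint8 -- no further copies.
    """
    return np.asarray(d_signal).astype("<i2").reshape(-1).view(np.uint8)
```

For example, the samples [0, 1, -1, 32767] become the byte pairs 00 00, 01 00, ff ff, ff 7f.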
Please consider opening a pull request with your changes.