
Memory Exception When Merging Large Volumes of Waveform Data Files Using wrdb.wrsamp() #464

DishanH opened this issue Aug 11, 2023 · 2 comments



DishanH commented Aug 11, 2023

I'm trying to merge multiple waveform data (.dat) files into a single file using the wrdb.wrsamp() function. There are approximately 10,000 files, each with 3 channels. I've tried several times, but every attempt results in a memory exception, requiring more than 40 GB of memory. I'm unsure whether I'm doing something incorrectly.

I've been unable to find a way to write the files incrementally. My current approach is to read the samples from each file, combine all the signals into one array, and write them out. This works fine with a small number of files, but I'm running into difficulties with larger datasets. Each file contains over 6 minutes of data.
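A minimal sketch of that approach (the record names and the fmt choice here are just placeholders):

import numpy as np
import wfdb

# Sketch of the merge approach described above; record names and the
# format choice are made up for illustration.
record_names = [f"rec_{i:05d}" for i in range(10000)]

signals = []
for name in record_names:
    record = wfdb.rdrecord(name)      # each record: ~6 minutes, 3 channels
    signals.append(record.p_signal)   # float array of shape (n_samples, 3)

# Everything is held in memory at once before writing; this concatenation
# is where the ~40 GB peak comes from.
combined = np.concatenate(signals, axis=0)

wfdb.wrsamp(
    "merged",
    fs=record.fs,
    units=record.units,
    sig_name=record.sig_name,
    p_signal=combined,
    fmt=["16"] * 3,
)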

Any assistance, insights, or suggestions on this matter would be highly appreciated.


DishanH commented Sep 14, 2023

I have modified the library to use chunking instead of concatenating everything at once, cutting memory usage to roughly a third of what the original code required.

chunk_size = 1000000
b_write = np.zeros((0,), dtype=np.uint8)
p = 0
for i in range(0, len(d_signal), chunk_size):
    # Report progress as "chunk p of total chunks"
    print(f"{p} of {len(d_signal) // chunk_size}")
    p += 1
    chunk = d_signal[i:i + chunk_size]

    # Split each sample into its low and high bytes using binary masks
    b1 = chunk & [255] * tsamps_per_frame
    b2 = (chunk & [65280] * tsamps_per_frame) >> 8

    # Interweave the bytes so that each sample's bytes are consecutive
    b1 = b1.reshape((-1, 1))
    b2 = b2.reshape((-1, 1))
    chunk_bytes = np.concatenate((b1, b2), axis=1)
    chunk_bytes = chunk_bytes.reshape((1, -1))[0]

    # Convert to unsigned 8-bit dtype to write
    chunk_bytes = chunk_bytes.astype("uint8")
    b_write = np.concatenate((b_write, chunk_bytes))


bemoody commented Sep 14, 2023

Thanks! Just to be clear, I assume you're talking about the function wr_dat_file, and that your code is meant to replace the code at lines 2381 to 2392 (following elif fmt == "16").

The existing code looks to me like it's a lot more complicated than it needs to be. I'm sure that your replacement code is more efficient, but I also suspect that the entire thing could be replaced with just one or two numpy function calls - there's no need to make so many copies of the data.

Compare this with how format 80 is handled (see the code under if fmt == "80"). Format 16 could probably be handled in a very similar way - we don't need to add an offset in that case, but we do need to convert to little-endian 16-bit integers and then reinterpret as an array of bytes.
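Roughly, a sketch of that idea (not the actual wfdb code):

import numpy as np

# Cast to little-endian 16-bit integers, then reinterpret the buffer as
# bytes in one step. astype("<i2") also handles the two's-complement
# conversion, since negative values wrap to the same 16-bit pattern.
d_signal = np.array([1, -2, 300], dtype=np.int64)

b_write = d_signal.astype("<i2").view(np.uint8)
print(b_write)  # [  1   0 254 255  44   1], low byte first for each sample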

Please consider opening a pull request with your changes.
