Add support for memory mapped operation #51

Open
bvacaliuc opened this issue Jan 30, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@bvacaliuc

The current processing API requires a std::vector<char>& as input to the TPX3-protocol parser. This means that any file must be read in completely via the function readTPX3RawToCharVec(). It also creates challenges when attempting to stream from a network port or to integrate with DAS processing systems that operate in a zero-copy mode by referencing physical memory spaces.

A solution is to use the mmap() system call to obtain a pointer to the contents of the file (or to the memory space containing the TPX3-protocol data), so that the software can process hits without first copying the input into a buffer.
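As a rough illustration of the mmap() approach, the sketch below maps a file read-only and exposes a pointer and a length. The MappedFile class and its member names are hypothetical and not part of the existing code base; only the POSIX calls (open, fstat, mmap, munmap, close) are real.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical RAII wrapper (an assumption for illustration): maps a file
// read-only and unmaps it when the object goes out of scope.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        assert(path != nullptr);
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;  // ok() stays false
        struct stat st {};
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            const std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = len;
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_ != nullptr) ::munmap(const_cast<char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return data_ != nullptr; }
    const char* data() const { return data_; }  // start of the memory space
    std::size_t size() const { return size_; }  // extent of the memory space

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The pages are faulted in by the kernel on demand, so a parser that walks data() never needs the whole file copied into a std::vector<char> first.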

In order for this to integrate into the existing API, it is necessary to provide 3 things:

  1. A char * input parameter for the start of the memory space
  2. A std::size_t input parameter for the extent of the memory space
  3. A std::size_t output parameter to report back to the caller how much of the memory space was consumed in this iteration

The caller will then need to call the function repeatedly until all of the input data is consumed (in the memory-mapped case), or keep looping as new input data is received (in the streaming case).
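The calling pattern described above could look roughly like the following sketch. Here parseSome() is a hypothetical stand-in for the real parser, and both the 8-byte packet size and the tiny per-call limit are assumptions chosen only to make the loop visible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the parser (not the real findTPX3H()): it
// "consumes" whole packets, at most kMaxPackets per call, returns the number
// of packets seen, and reports the bytes consumed through `consumed`.
std::size_t parseSome(const char* data, std::size_t size, std::size_t& consumed) {
    (void)data;                             // contents are ignored in this sketch
    constexpr std::size_t kPacketBytes = 8; // assumption: 64-bit packets
    constexpr std::size_t kMaxPackets = 4;  // deliberately small for illustration
    std::size_t n = size / kPacketBytes;
    if (n > kMaxPackets) n = kMaxPackets;
    consumed = n * kPacketBytes;
    return n;
}

// Caller-side loop: keep advancing by `consumed` until nothing more can be
// parsed. Totals are accumulated outside the loop so they survive iterations.
std::size_t processAll(const char* data, std::size_t size) {
    std::size_t offset = 0;
    std::size_t total = 0;
    while (offset < size) {
        std::size_t consumed = 0;
        total += parseSome(data + offset, size - offset, consumed);
        assert(consumed <= size - offset);
        if (consumed == 0) break;  // trailing partial packet: wait for more data
        offset += consumed;
    }
    return total;
}
```

In the streaming case the same loop would be re-entered whenever new bytes arrive, with any unconsumed tail carried over to the next call.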

An alternative is to keep the std::vector<char> API, but that requires the use of a custom allocator. While this can be made to work, it still requires an API change to std::vector<char, custom_allocator> and is far more intrusive to the code than simply providing alternate calling patterns for the TPX3 stream parsing function findTPX3H().

The other thing that will be needed is for findTPX3H() to limit its consumption of the input stream to a manageable chunk, if the caller does not do so themselves. If this is not done, a very large file could exhaust the computer's memory as the intermediate data structures used during clustering expand. When this is implemented, the API should change to indicate how much of the original vector was consumed, even if std::vector<char> is used to provide the input.
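One way the chunk limit and the "bytes consumed" report might fit together is sketched below. The function name parseChunk(), the 1 MiB cap, and the 8-byte packet size are all assumptions for illustration; they do not describe the actual findTPX3H() signature.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kPacketBytes = 8;           // assumption: 64-bit packets
constexpr std::size_t kMaxChunkBytes = 1u << 20;  // hypothetical 1 MiB cap

// Hypothetical pointer-based entry point: never looks at more than
// kMaxChunkBytes per call, rounds down to whole packets, and reports the
// bytes actually consumed. Returns the number of packets in the chunk.
std::size_t parseChunk(const char* data, std::size_t size, std::size_t& consumed) {
    (void)data;  // real parsing is omitted in this sketch
    const std::size_t capped = std::min(size, kMaxChunkBytes);
    consumed = capped - capped % kPacketBytes;
    return consumed / kPacketBytes;
}

// Vector-based overload: keeps the existing container type for callers that
// already hold a std::vector<char>, but still reports consumption.
std::size_t parseChunk(const std::vector<char>& raw, std::size_t offset,
                       std::size_t& consumed) {
    if (offset >= raw.size()) {
        consumed = 0;
        return 0;
    }
    return parseChunk(raw.data() + offset, raw.size() - offset, consumed);
}
```

Because the cap bounds how many packets enter each call, the intermediate structures built per call stay bounded too, regardless of the total input size.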

I am preparing a PR for the above and will reference this issue.

@bvacaliuc bvacaliuc added the enhancement New feature or request label Jan 30, 2024
bvacaliuc pushed a commit to bvacaliuc/mcpevent2hist that referenced this issue Jan 30, 2024
@bvacaliuc
Author

Thank you for merging #52. I apologize, but there are two bugs in the TestExtractHits_large() test function:

  1. the referenced number of hits is incorrect (the test file contains 5303344 hits)
  2. the tdc/gdc and timer variables are improperly re-initialized to 0 inside the loop, which leads to incorrect hit accumulation

I will prepare another PR to resolve this.
