157 chunk stuck orphan queue #162

Merged
merged 3 commits into from
May 6, 2024

ibrodkin
Collaborator

@ibrodkin ibrodkin commented May 2, 2024

This PR contains changes for several issues that were uncovered while investigating the stuck StoryChunks:

  1. The original issue, an ingested StoryChunk staying stuck forever on the ChronoGrapher's orphanQueue instead of being collected by the appropriate StoryPipeline, turned out to be a timing issue. When the story-producing client is short-lived and sends the Story Release request as soon as it is done generating events, the StoryGrapher retires the appropriate StoryPipeline before the partial StoryChunk is received by the ChronoGrapher. Extending the ChronoGrapher acceptance window fixes this. The default acceptance window for the ChronoGrapher is 300 secs, hardcoded for now; it will be made configurable as part of issue #155 (ConfigurationManager Audit and Improvements).
  2. Once (1) was handled, a StoryChunk memory corruption issue was exposed. GrapherRecordingServiceRDMA was creating the deserialized StoryChunk object on the stack (as a local variable in the recording function) and then passing a pointer to that local StoryChunk to the IngestionQueue for further processing. The local StoryChunk was not guaranteed to still be alive by the time the DataStore operated on this pointer. The StoryChunk in the RecordingService should instead be created on the heap; ownership of the partial StoryChunk pointer is then released to the IngestionQueue and, from there, to the StoryPipeline, which merges the partial StoryChunk into the pipeline and frees the memory.
  3. I've added uniform debug messages throughout the code to track the accumulation, merging, and progress of individual Stories and StoryChunks through the ChronoGrapher DataStore.
  4. The StoryChunk merging logic needed plenty of changes. They are in the PR for issue #125 and are also included in this PR.
  5. CSVFileExtractor was mangling the CSV filename, so this needed a tweak as well.
  6. The last piece in this PR is a fix for the uint64_t to uint16_t truncation of the acceptance time in StoryPipeline.h.
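
To illustrate the kind of narrowing bug described in item 6, here is a minimal sketch. The function names and the sample value are hypothetical and are not taken from StoryPipeline.h; the actual fix in this PR may simply keep the value as uint64_t throughout.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helpers for illustration only; not code from StoryPipeline.h.

// Unchecked narrowing keeps only the low 16 bits of the value,
// so anything above 65535 silently wraps (70000 becomes 4464).
inline uint16_t narrow_unchecked(uint64_t value) {
    return static_cast<uint16_t>(value);
}

// Checked narrowing refuses values that do not fit into 16 bits.
inline uint16_t narrow_checked(uint64_t value) {
    if (value > std::numeric_limits<uint16_t>::max()) {
        throw std::out_of_range("value does not fit into uint16_t");
    }
    return static_cast<uint16_t>(value);
}
```

Storing the acceptance time in a uint64_t from end to end avoids the problem entirely; a checked conversion like the one above is only useful where a narrower type is truly required.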

With all these changes, I can run multiple chrono_keepers and chrono_grapher in my local environment and observe the story accumulation through the chrono_keepers and chrono_grapher as expected.
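
The ownership hand-off described in item 2 can be sketched roughly as follows. SketchChunk, SketchQueue, and record_chunk are hypothetical stand-ins, not the real StoryChunk, IngestionQueue, or recording function, and the sketch uses std::unique_ptr to express the transfer, whereas the PR describes releasing a raw heap pointer that the StoryPipeline later frees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for a deserialized partial StoryChunk.
struct SketchChunk {
    uint64_t story_id;
    uint64_t start_time;
};

// Hypothetical stand-in for the ingestion queue: it owns every chunk it holds.
class SketchQueue {
public:
    void push(std::unique_ptr<SketchChunk> chunk) {
        chunks_.push_back(std::move(chunk));
    }

    // Hands ownership to the consumer; returns nullptr when the queue is empty.
    std::unique_ptr<SketchChunk> pop() {
        if (chunks_.empty()) {
            return nullptr;
        }
        std::unique_ptr<SketchChunk> chunk = std::move(chunks_.front());
        chunks_.pop_front();
        return chunk;
    }

    std::size_t size() const { return chunks_.size(); }

private:
    std::deque<std::unique_ptr<SketchChunk>> chunks_;
};

// Recording side: the chunk is allocated on the heap, so it outlives this
// function, and ownership moves into the queue instead of a pointer to a local.
inline void record_chunk(SketchQueue& queue, uint64_t story_id, uint64_t start_time) {
    auto chunk = std::make_unique<SketchChunk>(SketchChunk{story_id, start_time});
    queue.push(std::move(chunk));
}
```

A consumer that calls pop() becomes the sole owner of the chunk, and the memory is freed when that owner goes out of scope, which mirrors the pipeline freeing the chunk after merging it.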

@ibrodkin ibrodkin added this to the 2024-05-10 milestone May 2, 2024
@ibrodkin ibrodkin requested review from fkengun and EnekoGonzalez3 May 2, 2024 21:10
@ibrodkin ibrodkin merged commit fb62dc4 into develop May 6, 2024
@ibrodkin ibrodkin linked an issue May 29, 2024 that may be closed by this pull request
@fkengun fkengun deleted the 157-chunk-stuck-orphan-queue branch November 21, 2024 20:14
Development

Successfully merging this pull request may close these issues.

StoryChunk stuck on Grapher's OrphanQueue
3 participants