Memory leak due to recent performance improvement #217

Merged
benlangfeld merged 1 commit into develop from feature/dtmf_recognizer_leak on Mar 24, 2014

Conversation

benlangfeld
Member

Overview

  • Occurs with either the :asterisk or :native_or_unimrcp output renderer.
  • Requires that you are using the suggested adhearsion-asr gem.
  • Requires an ask; a say will not produce the error.
  • Does not require playing any TTS or WAV in the #ask in order for the memory leak to grow. Our ask command was: ask :limit => 1 (a minimal controller sketch follows this list).
  • Does not require that the SIPp load test send any DTMF input whatsoever in order for the memory leak to grow.
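For context, a hedged sketch of the sort of controller involved; this is not the exact test app (that lives in the sfgeorge/punchblock-issue-memleak repository referenced later in this thread), and the controller name is made up:

```ruby
# Hypothetical minimal controller along the lines of the test app; names are
# illustrative, not taken from the repro repository.
class LeakTestController < Adhearsion::CallController
  def run
    answer
    # No TTS or WAV prompt is needed for the leak to grow; a bare #ask from
    # adhearsion-asr is enough.
    ask :limit => 1
    hangup
  end
end
```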

Defect details

Contributors

sfgeorge referenced this pull request in sfgeorge/punchblock-issue-memleak on Mar 18, 2014
@benlangfeld
Member

Please give that commit a try and let me know how your testing goes.

@sfgeorge
Member Author

Yes, the situation is better by more than an order of magnitude! Under JRuby, I was able to run almost 40,000 calls (as opposed to just 1,700). This is a huge leap, but I'm hoping that we can do even better.

  • Using afb3c6b, and the same test setup, I warmed up to 8 cps and things stayed stable for over an hour.
  • At first, Full GCs were reclaiming a lot of memory, occurring every ~70 seconds.
  • But after almost 40,000 calls, Full GCs occurred at least once per second.
  • After ~40,000 calls, failures started to occur.
  • The complete gc log is here (a sketch for sampling GC activity during such a run follows this list).
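Not part of the original report, but for anyone trying to correlate Full GC frequency with call volume: one way is to poll the JVM's garbage collector MXBeans from inside the JRuby process. The polling thread and 60-second interval below are illustrative.

```ruby
# Sketch: periodically log JVM GC counts and times from inside a JRuby process.
# Assumes JRuby with Java integration; interval and output are illustrative.
require 'java'

java_import 'java.lang.management.ManagementFactory'

Thread.new do
  loop do
    ManagementFactory.getGarbageCollectorMXBeans.each do |bean|
      puts format('%s: %d collections, %d ms total',
                  bean.name, bean.collection_count, bean.collection_time)
    end
    sleep 60
  end
end
```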

@sfgeorge
Member Author

FYI, I'm going back to testing Punchblock 2.1.1 to see if there is a slower leak there as well. I've found that I need to be a bit gentler on 2.1.1 (a max of 4 cps on my system) because it is not as performant as 2.5.0, so results will be slower to arrive, but I'll let you know what we see.

@sfgeorge
Member Author

Results using Punchblock 2.1.1 were inconclusive for tracking down the introduction of the slower memory leak, because results were much poorer moving backwards. I got through one run of 1,352 calls before hitting a "Celluloid: Punchblock::Translator::Asterisk crashed!". I followed it with several other runs of 2.1.1; the most recent, at 0.5 cps, crashed after only 11 calls. It seems like a bad apple to test on. I could move to older versions to compare against 2.5.0 if you suggest it, but I'll hold off to hear what you'd like to test next. Thanks.

@sfgeorge
Member Author

If the aforementioned crash issue with 2.1.1 requires cbe2fed, then I suppose my minimum version would be 2.3.0. The rub is that anything in the range 2.1.1 <= version < feature/dtmf_recognizer_leak (non-inclusive) has a much larger memory leak that would mask the smaller one. Thoughts welcome!

@bklang
Member

bklang commented Mar 21, 2014

@sfgeorge I have a suggestion to help track down this leak: I know that your testing was hampered by the fact that older versions exhibit the larger memory leak, making it impossible to see the smaller one. The fix for the large memory leak was actually very small, just a single line:

afb3c6b#diff-d24010d0ef2a8c58b8c443a082a78a07R125

It would help us if you could continue to bisect Punchblock and apply that fix to further narrow down the leak.
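For what it's worth, one way to drive that bisection from the test app is to point its Gemfile at a local Punchblock checkout holding the candidate revision plus the cherry-picked fix; this is a sketch rather than a prescribed workflow, and the path is a placeholder.

```ruby
# Sketch of a Gemfile for bisecting: punchblock comes from a local clone
# checked out at the revision under test, with the one-line fix from afb3c6b
# cherry-picked onto it. The path is illustrative.
source 'https://rubygems.org'

gem 'adhearsion'
gem 'adhearsion-asr'
gem 'punchblock', :path => '../punchblock' # local clone at the revision under test
```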

Other things that would be helpful:

  • A dump of ObjectSpace data right after Adhearsion boots and again after running a large number of calls (a rough sketch of such a dump follows this list)
  • Letting us know how the leak grows - for example, does it happen only when calls request input? Only when playing output? Does it happen simply by going through the Router even if the CallController's only action is to immediately hang up?
  • As always, a minimal reproduction is really the best thing - if you can exhibit the leak in a "hello world" Adhearsion app, that greatly reduces the amount of code that could be at fault.
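To illustrate the kind of ObjectSpace data being asked for, here is a rough sketch (not an exact procedure from this thread): count live objects by class once after boot and again after the load test, then diff the two snapshots. Under JRuby, ObjectSpace has to be enabled explicitly (e.g. by starting with -X+O), and the file names below are illustrative.

```ruby
# Sketch: dump counts of live objects by class so two snapshots can be diffed.
# Under JRuby, run with -X+O so ObjectSpace is enabled; paths are illustrative.
def dump_object_counts(label)
  GC.start
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  File.open("objspace-#{label}.txt", 'w') do |f|
    counts.sort_by { |_, n| -n }.each { |klass, n| f.puts "#{n}\t#{klass}" }
  end
end

dump_object_counts('after-boot')
# ... run the SIPp load test ...
dump_object_counts('after-load-test')
```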

@sfgeorge
Member Author

Hey Ben, thanks for the info. I can certainly get more context such as ObjectSpace dumps. But as for the minimal reproduction: please see the ahn directory of https://github.com/sfgeorge/punchblock-issue-memleak. I've tried very hard to detail exactly what the application, testing, and environment are for my tests.

@bklang
Member

bklang commented Mar 21, 2014

@sfgeorge Ah, thanks, I didn't realize your minimal repro had been updated since @benlangfeld solved the first issue. I'll take a look over there.

@sfgeorge
Member Author

np. Actually, I haven't updated it. The same test shows the slower memory leak causing trouble at 40,000 calls instead of at 1,700 calls.

@sfgeorge
Member Author

I can bump Gemfile[.lock] on the repo to reflect that I've been testing Punchblock's new feature/dtmf_recognizer_leak branch; would that be helpful?

@bklang
Member

bklang commented Mar 21, 2014

I can bump Gemfile[.lock] on the repo to reflect that I've been testing Punchblock's new feature/dtmf_recognizer_leak, would that be helpful?

No, that's not necessary. Just the ObjectSpace information and any results you find by bisecting between the working and broken version (with the backported fix for the big leak) are the most important things at this point.

benlangfeld merged commit afb3c6b into develop on Mar 24, 2014
bklang mentioned this pull request on Mar 24, 2014
@bklang
Member

bklang commented Mar 24, 2014

Since this was merged, I have opened #218 to handle the remaining issue.

bklang deleted the feature/dtmf_recognizer_leak branch on March 24, 2014 at 17:04