-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raw profiler hook doesn't work with a profiler that adds methods #370
Comments
So, I think that you are running up against a problem that has been an issue since the beginning of ICorProfiler, and is difficult to work around. Maybe we should supply a better API for it. The problem is that all functions that are added to a module need to have a valid RVA. You can't set it to null when you create it. That won't work. The CLR will barf. There is a bit of a chicken/egg problem. You are supposed to get an RVA for a new method using IMethodMalloc::Alloc, which requires that you know the size of the method, which means that you need to have already generated all of the IL for the method. In order to do that, you typically need a method token to associate with the IL. To get a new method token for the method, you use IMetaDataEmit::DefineMethod. But, it requires an RVA, which can't be null. Using IMethodMalloc naturally leads to memory leaks because the you can't deallocate the memory for the original function body, and it will likely be replaced later anyway. To avoid this, CLRIE doesn't give you a real copy of IMethodMalloc when you ask for it. Instead, it give you the ability to allocate memory, but it will not have a valid RVA inside the CLR. So, if you are using IMethodMalloc, that might be the root of the problem here. The IMethodMalloc that is being returned isn't giving a valid RVA, and that could lead to the bad image format or invalid program. The way we typically get around this is to lie IMetaDataEmit::DefineMethod, and give it an RVA for a random method in the module. Then, when we get the instrumentation callback, we just generate the method body and replace the method. This is fine, because the original RVA will never be used to actually execute any code. The RVA of the method will be replaced after the JIT with the newly allocated memory. It may be necessary to rethink this design, and actually give clients a real copy of IMethodMalloc when asked for since it can be a source of bugs. It won't paint anyone into any corners because it is still possible to not use IMethodMalloc before calling IMetaDataEmit::DefineMethod. |
What you're saying is a bit disappointing to me, I spent time creating an instrumentation method because of the raw profiler hook. It's really interesting to be able to create application instrumentation that runs alongside an existing profiler, that is created by a third party company and that you have no control over. From what you're saying, this doesn't look realistic, a profiler that works correctly independently can become buggy when run under the raw profiler. If you don't control the code base of that profiler, then it's going to be a real headache. For reasons beyond my control, this isn't particularly an issue for me anymore, but I'm trying to be a good citizen and report this issue for people that might be in the same position in the future :) So, I've checked the Datadog code base that I've been testing against they do cal
This seems to work just fine when the profiler used independently, but I guess because of the issue you describe above it does not work when used via the raw profiler hook. I guess this can help us dig into this specific issue, and as Datadog profiler is open source they might accept a PR to help make them more compatible with CLRIE. |
This indicates to me that we should be returning a real instance of IMethodMalloc, if we can. That should solve the issue with datadog. I'm a little surprised that DefineMethod is accepting a null rva, and not causing problems, though. |
For what it is worth, we really want to make sure that we can be compatible with as many profilers as possible. So, if this is causing trouble with Datadog, it is definitely something we want to address. |
Maybe RVA being zero is a red herring, I wasn't sure what RVA was, so I checked the specs:
I guess that means the IMethodMalloc::Alloc wrapper is mostly like to be causing the issue. It's difficult to see how I could provide a repo, since DD don't distribute their profiler as nuget package, making adding dependency tricky. I'm happy to test locally if you provide a branch with a fix, not sure I have enough knowledge of CLRIE to make the suggested fix myself. |
@wiktork @jander-msft @WilliamXieMSFT I would like at least one other person to chime in on this because I remember having a discussion about IMethodMalloc a couple of years ago, and there were varying opinions. |
@robertpi if I pushed a branch with the fix in it, would you be able to build and test that locally, before we merged it? |
Sure, no problem at all. |
cc @davmason Some initial notes from our discussions so far:
|
@robertpi As we work through this issue, is the root of your specific problem that you are attempting to generate a new function during a class or module load callback, and the SetILFunctionBody call is failing? For that specific problem, we might want to actually add a new API for generating new methods. However, I still suspect that IMethodMalloc is an issue. |
@delmyers @wiktork the RVA issue shouldn't be causing issues. The runtime profiler tests call IMetadataEmit::DefineMethod with 0 for RVA and they work consistently, and I've seen multiple other profilers (CLRProfiler, DataDog, etc) use a dummy RVA. Either one should work. The only thing I can remember off the top of my head the RVA is used for by the runtime is to look up IL in the assembly, but methods you are adding should always have IL defined with SetILFunctionBody so we should never go down that path. Inlining has a quirk that it will ignore IL set by SetILFunctionBody, so it is possible if you define a method with a 0 RVA and then let it be inlined somewhere you could see issues. I don't necessarily think that's what actually is happening here, just thinking through all the possibilities. In reasonably recent runtimes on desktop and core IMethodMalloc will just new up some memory and return it, you are free to new it up yourself if you want. There is a gotcha between ICorProfilerInfo::SetILFunctionBody and ICorProfilerFunctionControl::SetILFunctionBody though. The ReJIT ICorProfilerFunctionControl one will allocate its own buffer and copy your IL, so you can delete the IL buffer afterwards. The old style ICorProfilerInfo one will not and store the pointer passed in. If you are freeing the IL buffer after calling ICorProfilerInfo::SetILFunctionBody that could be the issue. |
Thanks for the clarifications David. The changes we are looking at right now:
|
@delmyers To clarify the issue I'm having:
From my interpretation of the logs generated by CLRIE, the calls to SetILFunctionBody from Datadog profiler doesn't return an error code, but the body doesn't get set either. This branch contains some attempts to fix this, but they don't solve the issue: robertpi@22b96b5 My test application crashes before my Instrumentation Method attempts any rewrites, so don't know if that would work in conjunction with Datadog profiler. |
One more thought, although it's tricky for me to give you my the code / scripts for how I reproduce this issue, I think it should be simple to create your own:
I don't think an Instrumentation method would be needed. |
@robertpi I was able to repro the issue and I have a tentative fix. Would you be able to try it out? I've tried it with the data dog tracer, but I'm not sure it covers all the scenarios. The fix is at https://github.com/wiktork/CLRInstrumentationEngine/tree/dev/wiktork/customBuffer |
Hi @wiktork , I think what you've done helps, but my test web server still crashes, before it starts. However, I do see it now loads some of the DD manged dlls, so it's getting considerable further than it did without this change. I've had a look though the app logs and there's not clear reason why it crashes, I just see DD was in the middle of a rewrite. I don't think my instrumentation method preformed any rewrites, as I don't see any logs from it. There are the logs: dotnet-tracer-loader-dotnet-19300.log The instrumentation method logs are prefixed "[IM:B98771FE-2159-4143-A96C-95715667AA0D]" I'm going to strip back what I'm do to try and create a minimal reproduction of this issue, then push that to a new repo. |
@robertpi Please let us know if you are still seeing the failure with your instrumentation method + data dog + my current fixes. Thanks! |
any update? |
I placed the DataDog .NET tracer 1.28.8 in the raw profiler hook on .NET framework and observed a bug similar to what the OP wrote at the top (the RVA issue). The CLRIE version was 1.0.42. |
@sanikolov Appreciate the test confirmation. Given that this change is a bit large and impactful, we'll have to take some time to review it fully. Could you provide us your timeline/when you would need this change? We are planning to guard the changes behind an environment flag to minimize introducing issues and help expedite the release. |
see email I just sent. |
Blocked by DataDog/dd-trace-dotnet#2042 |
I'm testing my Instrumentation Method by running it along side a profiler provided by a popular APM vendor. The program (a web server) fails to start with the following error:
I think this is because
MicrosoftInstrumentationEngine::CCorProfilerInfoWrapper::SetILFunctionBody
fails to find aCMethodInfo
for the newly added method. I've tried to add fix this by callingm_pRealCorProfilerInfo->SetILFunctionBody
, under the assumption instrumentation methods won't need see the newly added method, but this does not seem to work. Here is the fix I was testing: robertpi@22b96b5#diff-4ddd686c604ac947d1ae58bc2b3cdedd0b15be84d9432579d6b1f6abc1acd4bcR474However, the IL seems to get corrupted in some way when I do this, because now when I run the program I see one of these two errors:
or
The text was updated successfully, but these errors were encountered: