-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-37906: [Integration][C#] Implement C Data Interface integration testing for C# #37904
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
f0b7413
to
4f1653a
Compare
} | ||
|
||
public class JsonSchema | ||
{ | ||
public List<JsonField> Fields { get; set; } | ||
public JsonMetadata Metadata { get; set; } | ||
|
||
/// <summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that all the code below is moved from IntegrationCommand.cs
in order for it to be available to the integration testing harness.
GC.Collect(); | ||
// XXX this doesn't seem to give stable and reliable measurements | ||
var gcInfo = GC.GetGCMemoryInfo(); | ||
return gcInfo.PromotedBytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to find a formula that produced reliable numbers. Inevitably, the numbers are fluctuating (going up and down). I don't know if the .Net GC is just giving us approximate numbers, or if some internal caches are interfering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that all the tests are using the TestMemoryAllocator (which uses the managed heap)? I suspect some arrays may be allocated using the NativeMemoryAllocator which does not used the managed heap.
Whether or not it's strictly required, I've developed a habit of something like
for (int i = 0; i < 3; i++) {
GC.Collect();
GC.WaitForPendingFinalizers();
}
and this seems to sometimes make a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't know what kind of effect pythonnet would have on this. There may be Python objects waiting for a Python GC that are keeping .NET objects alive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that all the tests are using the TestMemoryAllocator (which uses the managed heap)? I suspect some arrays may be allocated using the NativeMemoryAllocator which does not used the managed heap.
Hmm, I'm discovering the C# codebase, but this is what I get:
- the JSON reader in Apache.Arrow.IntegrationTest just uses the default memory allocator when calling
ArrowBuffer.Builder.Build()
- the TestMemoryAllocator doesn't implement
IOwnableAllocation
, which is required byCArrowArrayExporter
for all exported buffers
In any case, however, the problem I'm mentioning in the comment above (unstable measurements) can be observed even on the Schema tests, which shouldn't invoke an external allocator.
I also don't know what kind of effect pythonnet would have on this. There may be Python objects waiting for a Python GC that are keeping .NET objects alive.
That would seem surprising, but I can try a gc.collect()
on the Python side as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, none of your suggestions works unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My knowledge of the GC is less like science and more like voodoo. I'm also not familiar with this API. That said, PromotedBytes seems like it's probably not the right choice here and I would expect HeapSizeBytes to be the value that's actually wanted.
EDIT: Or for the total managed heap, GC.GetTotalMemory()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but I also tried GC.GetTotalMemory()
, HeapSizeBytes
and also HeapSizeBytes - FragmentedBytes
. None gave stable values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good to me.
But look at the test logs, I'm unsure of where we are setting these tests to skip: https://github.com/apache/arrow/actions/runs/6327155294/job/17182356143?pr=37904#step:8:14519
Some of them I would have thought would run.
These are existing skips that also propagate to the C Data Interface testing: arrow/dev/archery/archery/integration/datagen.py Lines 1787 to 1875 in 2864870
|
Neat! I'd actually started looking at implementing this yesterday and didn't see that there was a route through Python to test in this fashion so I'd started working on a separate project to use AOT compilation and generate a native DLL for this purpose. |
@eerhardt Would you like to review this PR soon? Otherwise I'm inclined to merge as it has already been approved by @CurtHagenlocher and @wjones127 . |
78c5c4f
to
d70236a
Compare
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 7667b81. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…ion testing for C# (apache#37904) ### Rationale for this change All Arrow implementations supporting the C Data Interface should also implement integration testing, to ensure proper interoperability. ### What changes are included in this PR? * Implement C Data Interface integration testing for C# * Fix bugs in the C Data Interface implementation for C# ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? No. * Closes: apache#37906 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…ion testing for C# (apache#37904) ### Rationale for this change All Arrow implementations supporting the C Data Interface should also implement integration testing, to ensure proper interoperability. ### What changes are included in this PR? * Implement C Data Interface integration testing for C# * Fix bugs in the C Data Interface implementation for C# ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? No. * Closes: apache#37906 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…ion testing for C# (apache#37904) ### Rationale for this change All Arrow implementations supporting the C Data Interface should also implement integration testing, to ensure proper interoperability. ### What changes are included in this PR? * Implement C Data Interface integration testing for C# * Fix bugs in the C Data Interface implementation for C# ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? No. * Closes: apache#37906 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…ion testing for C# (apache#37904) ### Rationale for this change All Arrow implementations supporting the C Data Interface should also implement integration testing, to ensure proper interoperability. ### What changes are included in this PR? * Implement C Data Interface integration testing for C# * Fix bugs in the C Data Interface implementation for C# ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? No. * Closes: apache#37906 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Rationale for this change
All Arrow implementations supporting the C Data Interface should also implement integration testing, to ensure proper interoperability.
What changes are included in this PR?
Are these changes tested?
Yes, by construction.
Are there any user-facing changes?
No.