Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address perf+correctness of StrLen and StrChr #480

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

neon-sunset
Copy link

No description provided.

Cesium.Runtime/StringFunctions.cs Show resolved Hide resolved
Cesium.Runtime/StringFunctions.cs Show resolved Hide resolved
Cesium.Runtime/StringFunctions.cs Outdated Show resolved Hide resolved
@ForNeVeR ForNeVeR self-requested a review November 17, 2023 14:42
@ForNeVeR ForNeVeR self-assigned this Nov 17, 2023
@neon-sunset
Copy link
Author

neon-sunset commented Nov 17, 2023

Aha, we get AVE. Let me check. Probably need to account for the fact that .IndexOf thinks it owns the memory when it doesn't causing it to dereference a memory range past null byte on vector load at a memory page boundary where we don't have read access to the next page, causing segfault.

Copy link
Owner

@ForNeVeR ForNeVeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave a comment while there's still ongoing work.

I agree that the new approach is more correct (your points on UTF-8 and UTF-16 are all valid), and so yeah, I'd like to see the tests fixed and this merged.

I am also open for discussion on leaving the netstandard and only supporting .NET 7 (or even 8) in the stdlib, if it becomes too painful for us to properly support the older target frameworks.

@ForNeVeR ForNeVeR assigned neon-sunset and unassigned ForNeVeR Nov 17, 2023
@neon-sunset
Copy link
Author

neon-sunset commented Nov 17, 2023

@ForNeVeR In this particular case it's not an issue at all, to support NS2.0. It might, however, become an issue in the future.

A few thoughts: I don't think trying to reimplement stdlib is a scalable approach - it will either take too much effort or the quality of the resulting code may be insufficient.

Ideally, we should be mixing and matching the following options on a case-by-case basis:

  • Forwarding calls to .NET CoreLib
  • P/Invoking host-provided stdlib, like .NET itself does (e.g. it will call memcpy above certain threshold, even though today its own implementation literally does a better job than most due to conservative compilation flags of your average glibc)
  • Handling happy-path in C# and then deferring to either of the above

P/Invokes are already very cheap/almost free in .NET (Жова and Go could never) but in the future, if it ever becomes a concern, they can be made as cheap as calls in an actual compiled C code by applying [SuppressGCTransition], generating DirectPInvokes for NativeAOT and caching function pointers to a readonly static for JIT (if it doesn't do it already).

UPD: After further discussion with Andrii I might have been too hasty on this, the point stands but things like C locales are scary 😅

@ForNeVeR ForNeVeR assigned ForNeVeR and unassigned neon-sunset Nov 18, 2023
Cesium.Runtime/StringFunctions.cs Outdated Show resolved Hide resolved
@ForNeVeR ForNeVeR assigned neon-sunset and unassigned ForNeVeR Nov 21, 2023
@neon-sunset
Copy link
Author

@ForNeVeR The issue with SEGFAULT should be addressed now - I opted into a simpler approach with aligning the pointer at 16B boundary first and then using 128b vectors for searching the matches directly. This way it should never cross a page boundary (which was causing issues).

@ForNeVeR ForNeVeR assigned ForNeVeR and unassigned neon-sunset Dec 30, 2023
@ForNeVeR ForNeVeR self-requested a review December 30, 2023 14:36
@neon-sunset
Copy link
Author

neon-sunset commented Dec 30, 2023

I had to bump up .Runtime to net7.0 because cross-platform Vector API is not available on older targets. Let me know if it's an issue.

Copy link
Owner

@ForNeVeR ForNeVeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with the implementation, but would insist on having a fallback for older runtimes. Do you want to implement it yourself, or will entrust it to me?

@@ -2,7 +2,7 @@

<PropertyGroup>
<TargetFrameworks>net7.0</TargetFrameworks>
<TargetFrameworks Condition=" $([MSBuild]::IsOsPlatform('Windows')) ">$(TargetFrameworks);net48</TargetFrameworks>
<!-- <TargetFrameworks Condition=" $([MSBuild]::IsOsPlatform('Windows')) ">$(TargetFrameworks);net48</TargetFrameworks> -->
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some thought, I still want us to support .NET Framework on Windows. Let's re-enable that here, and add a fallback implementation for .NET Standard.


namespace Cesium.Runtime.Tests;

public unsafe class StringFunctionTests
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some Unicode tests, shall we?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do

@@ -1,7 +1,7 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<TargetFrameworks>netstandard2.0;net6.0</TargetFrameworks>
<TargetFramework>net7.0</TargetFramework>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still insist on having netstandard2.0 support via #if.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is runtime. Is it planned to be referenced as a package by .NET Framework/pre-.NET 7 targets or is it packaged as a standalone, well, runtime?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is planned to be referenced by projects targeting older .NET versions.

return 0;
}
var start = str;
while ((nuint)str % 16 != 0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest introducing a constant for this 16, like MinimalPageBoundary or something, to make the size choice more clear.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's just Vector128<byte>.Count which I omitted because surrounding code is compact too, let me fix it. (to which we align - this is both for performance and for never crossing page boundary, whatever size it is, it occured to me we don't need actual page size here since it's always bigger than 16B and 16B alignment will never cross it)

@neon-sunset
Copy link
Author

I am ok with the implementation, but would insist on having a fallback for older runtimes. Do you want to implement it yourself, or will entrust it to me?

Alright, I will if-def it. "Proper" way to fix it is by copying impl. from Glibc with its bitwise tricks (which we, in .NET, do not need) but I guess "works on NETFW, if you care about performance that's on you" is also fine.

@ForNeVeR ForNeVeR assigned neon-sunset and unassigned ForNeVeR Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants