-
Notifications
You must be signed in to change notification settings - Fork 696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Color parsing and formatting improvements #3170
Comments
I was also experimenting with having Color store its values in a Vector3 field, instead of the individual ints (the properties R, G, B, and A still return ints pulled from the Vector). That has some excellent performance benefits, too, though I wasn't ready to include any of it here since I didn't finish with that before wanting to fire off this issue first. It also has built-in highly-optimized facilities for things like the Euclidian distance calculation, and internally can use SIMD instructions when relevant with no effort on the code side. Even with the addition of casts that necessitates before returning values via the properties (since it stores floats), it still ends up blowing the ints out of the water in initial partial testing I did. Also, even without that modification, some use of passing the Color struct by reference resulted in some nice speedups for things like the FindClosestColor, which otherwise ends up causing a ton of copies. That stuff will be in a separate issue, though. Just wanted to mention while it was on my mind. |
Big fan of what you're on to here, @dodexahedron. My focus on the |
Oh, and this is gonna be even more important in the future because:
In #2729 I'm starting to take notes on how Etc... |
That probably will for sure improving the performance. |
I like going over things with fresh eyes to see where easy benefits can be realized with relatively low effort. It's one of the reasons I jumped in on tackling the unit tests for TGD. :) Also, so y'all can keep on with the things you're working on, I'll go ahead and finish this up and put in a PR when it's ready. For this issue, the scope will be limited to the TryParse method, unless something else is just too good to pass up. I forked from v2_develop as of earlier today, so I was working on code from commit 7fe95cb. But, there are several PRs ahead of that, and some did touch Color.cs, so unless anyone has any concerns or other suggestions on which code to use as base, I think I'll branch from a6dbfe6 on @BDisp's v2_timefield-textchanging-fix_3166 branch, which appears to be the latest as of this moment. |
But that doesn't include the fix of the |
Yeah those two should be independent at least. |
Got back to working on this, and here's an update on what I've got: I couldn't help myself and did go beyond the initial intention of just improving parsing, for performance enhancements, some formality for the Color type (various interface implementations), and for some feature expansion (mostly related to those interfaces). First, the Color struct had an easy performance and memory benefit to be gained by making it the C# equivalent of a union, via explicit layout of fields, so it is now internally just 32 bits, accessible via properties that expose those 32 bits as an unsigned integer, an integer, and 4 individual bytes for the color components, meaning no unchecked or unsafe blocks required, and saving 12 bytes. The existing properties are all there, though the individual color components themselves are bytes instead of ints. I also added implicit casts to and from Vector3, which enables hardware-accelerated math, such as for I also turned it into a I then implemented several interfaces, including I also expanded its ability to handle some extra text formats, both for parsing and formatting via ToString(), and implemented These interfaces of course add the ability to use them, but, more importantly, they are minimal-allocation means of doing these things, even when a Parameterless All existing tests continue to pass, as do the thousands of new test cases I've already written. I have more to add for the various ToString cases, as I'm not yet finished with that. The end goal, for formatting and parsing, is for any string representation explicitly defined for parsing to be supported as an output format, as well. Something to note about Color, that was already true: it's not binary-portable between big endian and little endian machines, as it assumes the byte order of the integer representation is little endian. That's fine in the majority of cases, though ARM machines can be big endian if configured that way. I'd say we continue to ignore that and let that be something such consumers have to deal with on their own. Perhaps we should document that Terminal.Gui assumes little endian byte order, though, as a general note, somewhere? |
I rebased on the current v2_develop branch as of a little while ago, since these changes would otherwise have been conflicts at merge time. I hadn't touched the moved code anyway, so it didn't affect me. I also have a new |
Whoah. Rad. So much to learn from here. Very grateful! Will dive deep asap and review. |
I haven't pushed the commits yet. I will when it's closer to completion. Plus I didn't do my usual micro-commits, due to how coupled a lot of the changes are. I'll put in a PR at that time. |
Another side note... I'm trying to be as accommodating of the alpha channel as I can, without changing existing behavior, to ease any potential future attempt to respect the alpha channel, if someone ever decides to do so. |
One cool thing about implementing IFormattable formally is that it makes color formatting extensible, if someone wants to pass in a custom IFormatProvider, to handle formatting in whatever way they wish. It's also a globalization-related change since, in any potentially relevant cases, it allows culture-specific formatting to be used and respected, at least to the degree that whatever I end up with can do, if a custom IFormatProvider is given. The weird upshot of that is that, if someone passes non-latin numeric characters, it would in theory be able to handle it out of the box. I'm sure that's supremely unlikely, but just a kinda nifty side note about what supporting those interfaces enables, for honestly not much actual effort on my part aside from being sure the right overloads are invoked in the right situations. |
Ok So After what looks like a mostly-complete implementation of the new stuff, I have one pre-existing test that is failing: I began to debug it but abandoned quickly when I saw that it looks like it may not be complete. That test makes a call to But the property it calls is named in a concerning way: I pretty much stopped analyzing at that point because I didn't want to get too distracted from what I'm doing. So, two requests:
There may be an additional case necessary in the new implementation of As a general note, tests that are that unrelated causing failures in each other points to a problem either with the code that is under test or the test itself. Usually, it means that the test itself is not actually testing a self-contained "unit of work" and is instead more of an integration test, which is fine in the general sense, as long as it's understood by the test-writer. But, it also quite often reveals a potential bug or code quality issue in the form of code from distinct units that is coupled in a way that may not be intended. Both of those situations are things which should ideally be avoided, when possible. In short, any "unit" test that incidentally results in a Color being parsed or formatted should "trust" that the code owned by Color is "correct" (which is why mocking is a common practice). This is of course a general statement and isn't restricted or only relevant to the Color class itself. Each class should have a test harness that exhaustively or at least adequately proves that that class, in a vacuum, behaves as expected and documented, in all reasonably conceivable cases. When tests start making assumptions about the output of other classes, it means that the code under test is now coupled to a specific behavior of some other code, which is fragile and subject to breakage at future points, even with seemingly benign changes in that other code or in the code that depends on it. De-coupling things appropriately, where there is tight coupling at present, can and probably will represent additional initial effort, but results in significant savings due to avoidance of what is some pretty high-interest technical debt. |
I did fix that failure by adding a case that was missing that I haven't yet written explicit tests to cover (I'm getting to that shortly), but I still think that test and potentially at least parts of the RuneCell class should be re-examined. |
I'm not a fan of some of the operators on Color. For example: public static bool operator == (Color left, ColorName right) => left.ColorName == right; What does that do, exactly? Well, if I were a consumer, I would assume it is value equality between a given Color and a named color. Well... Nope... It's fuzzy equality, because ColorName returns the result of I really don't think that's appropriate for an If I'm a consumer and I ask if a given color is equal to a ColorName, my expectation is that the operator tells me that the color value I have is exactly equal in value to the known/configured value for that named color. If I have "#F00000", I don't expect a value equality operator to ever tell me that it is And then the inequality operators that should be the inverse of those are even weirder and make less sense. F00000 isn't FF0000, so why should I'm a fan of what that code does, because it's a useful piece of functionality, if wanting to try to alias to EGA colors or whatever named color pallet, but that really belongs in dedicated methods and not the operators. Now... As far as Terminal.Gui itself goes, there is no reference to that operator (or other similarly odd fuzzy operators) except in the unit tests, so I'm strongly inclined to change it and the other similar operators to named instance methods, which also makes them a bit more discoverable, anyway. I just am not quite sure what to call them. The dotnet standard for boolean methods is to use an active verb from the perspective of In general, equality operators for value equality should have these properties, as laid out in the C# Programming Guide:
Additional problems can arise when a color value is equidistant from two named colors. In that case, we have to make an arbitrary decision of how to break the tie, but doing so requires additional code to guarantee the transitive property is guaranteed (which isn't easy or is at least highly arbitrary). |
Oh, another thought about the fuzzy equality operators. They have a high potential to break the use of Colors in any kind of HashTable or derived type, such as Dictionary, because two Colors that are "equal" can have a wide range of different results from GetHashCode, which is a biiiiig no-no. If any Equals method or |
@dodexahedron please see te reason of the use of the |
Yep. I'm familiar with the attribute. Thanks for linking that issue, for context. It is a long-standing bug of Visual Studio that the debugger doesn't always show ToString exactly as intended, and it's annoying that it's still an issue. 😅 But the attribute isn't what I'm interested in. The code in question is: public override string ToString ()
{
var colorSchemeStr = ColorSchemeDebuggerDisplay ();
return DebuggerDisplay;
} The first line does nothing, as its return value is not used, and the method is pure, so that line can just be removed. The second line is just odd, for a few reasons.
Below is a version of it all that is much shorter, gives you the debug output you want with the escaped characters, and lets ColorScheme and Attribute handle their own ToString duties, so they aren't as coupled and output is consistent with anywhere else Attribute.ToString is called. Here's the entirety of the modified RuneCell class: /// <summary>
/// Represents a single row/column within the <see cref="TextView"/>. Includes the glyph and the foreground/background
/// colors.
/// </summary>
[DebuggerDisplay ( "{nameof(ColorSchemeDebuggerDisplay)}" )]
public class RuneCell : IEquatable<RuneCell> {
/// <summary>
/// The glyph to draw.
/// </summary>
[JsonConverter ( typeof ( RuneJsonConverter ) )]
public Rune Rune { get; set; }
/// <summary>
/// The <see cref="Terminal.Gui.ColorScheme"/> color sets to draw the glyph with.
/// </summary>
[JsonConverter ( typeof ( ColorSchemeJsonConverter ) )]
public ColorScheme? ColorScheme { get; set; }
/// <summary>Indicates whether the current object is equal to another object of the same type.</summary>
/// <param name="other">An object to compare with this object.</param>
/// <returns>
/// <see langword="true"/> if the current object is equal to the <paramref name="other"/> parameter;
/// otherwise, <see langword="false"/>.
/// </returns>
public bool Equals ( RuneCell? other ) => other != null &&
Rune.Equals ( other.Rune ) &&
ColorScheme == other.ColorScheme;
/// <summary>Returns a string that represents the current object.</summary>
/// <returns>A string that represents the current object.</returns>
public override string ToString ( )
{
var colorSchemeStr = ColorScheme?.ToString ( ) ?? "null";
return $"U+{Rune.Value:X4} '{Rune.ToString ( )}'; {colorSchemeStr}";
}
string ColorSchemeDebuggerDisplay => ToString ( ); // Yes, this really does it. Visual Studio is just broken.
} Here's the only addition that needs to be made to the ColorScheme class for that: /// <inheritdoc />
public override string ToString ( ) => $"Normal: {Normal}; Focus: {Focus}; HotNormal: {HotNormal}; HotFocus: {HotFocus}; Disabled: {Disabled}"; And the only change to the test to match the output is: Assert.Equal ("U+0061 'a'; Normal: [Red,Red]; Focus: [White,Black]; HotNormal: [White,Black]; HotFocus: [White,Black]; Disabled: [White,Black]", rc2.ToString ()); Attribute brackets its output, so that's all that changed. It's 16 lines shorter, overall, including the addition to ColorScheme, and performs a lot fewer string allocations (there was a bunch of concatenation before, and the "null" string literal allocation on every call, which usually just gets thrown away). But it also now isn't inverted, as the property intended to make Visual Studio work properly is now slaved to ToString, and not the other way around. It still outputs the escaped version for ToString, which I'm not a fan of, but I just didn't bother with that. |
@tig I went ahead and pushed the current state of the code (still work in progress) so you can have a look at where I'm going with it: https://github.com/dodexahedron/Terminal.Gui/tree/v2_Color_TryParse_improvements_3170 I tried to break up a bunch of commits to show changes a little more progressively, but they're not great and I doubt many of the intermediate commits even build - it's just for review purposes. The current commit efed19d builds and passes all unit tests, which includes a fair amount of new ones (but still plenty need to be made, which I'll get to). There are over 2000 test cases in ColorTests, but they should complete in less than a second (I'm seeing all 2042 of them complete in 0.1-0.2 seconds). While this does do everything the old one did and more, it is still a work in progress, so I'm not ready to make a PR for it just yet. |
Oh and a note about the unit test project. I found a really handy extension library for XUnit that gives it some similar combinatorial test case generation capabilities as NUnit has, and have been making use of that for tests I wrote from that point forward. |
I 1000% agree. When I took over this project there were almost no unit tests. I pushed hard for all contributors to build them and a ton of folks stepped up. Sadly, none of us had really built unit tests ourselves at scale. Additionally a lot of core code was much more coupled previously. This meant the only way to test something was to use As a result, while we got good code coverage, many of the tests exhibit the bad behaviors you describe. I've been working really hard on cleaning this up peace meal and have made some progress. I've also been much stricter in code reviews. Thank you for highlighting it. Please feel free to open issues for any unit tests you think are egregious in this regard. |
I'm aligned with your analysis of No offense taken if you nuke it in this PR. I appreciate your clear articulation of the issues. |
@dodexahedron, without adding burden to an already epic endeavor, can I suggest you consider this: I love what they're doing there w.r.t. supporting markup. When we get to these:
We will want similar capabilites. At the minimum, as you think through your work her can you:
|
Glad my intention is taken as...well...intended! 😅 I certainly never mean to disparage or anything and my analyses are intended to just by dry, technical, and impersonal (though I know sometimes it's hard not to have some personal attachment to something we wrote or designed or whatever!). You should see some reviews I've done of some of my own old code in the past! Man, that guy had some silly ideas. This PR likely will end up being essentially unreadable as a pure merge, which is why I tried to at least do some post-hoc commits to try to show things in a more digestible way. I was making rapid significant changes during the peak of the work, so making commits in real time would have resulted in a lot of noise. I think from here I should probably be able to go back to a more normal (for me, anyway, haha) commit strategy, since the bulk of the implementation should be mostly set now. I'll run the format rules now, though, before I resume work, because things are kinda scattered. Oh! That reminds me of another handy little thing I added in one commit that's in the code as currently pushed to that branch.... The UnitTests/.filenesting.json file I added in bc5218 should be automatically picked up by VS in the UnitTests project. What that does is makes any .cs file in the same folder that has the same base name nest underneath the file with the base name, like how in a winforms or WPF project the codebehind files nest under the designer files. I did that because I'm trying to keep things organized as I add a bunch of test methods, and breaking them into partial classes in files like ColorTests.Constructors.cs, ColorTests.Operators.cs, etc, which all nest in VS under ColorTests.cs, makes them much more manageable, and also helps keep noise down with changes such as reorganizing code. It also makes the UI less sluggish since there are smaller files loaded up at a time. I'm happy to re-consolidate them after I'm done, if you like, but I tend to find the file nesting rules make a nice compromise that keeps things tidy while still allowing physical separation of logical units. 🤷♂️ |
I'm glad that there is what there is and that you've done what you've done! Miguel created one hell of a nifty and useful library, and I'm glad it lives on and that the current stewards are as willing to take a wrecking ball to things and modernize it as y'all are. :)
Yeah. Sometimes I'm pretty sure I can tell when something is an inherited legacy that's just being tolerated, because things just look different, ya know? And then it's also clear when the goal was to "get it done," which I toooootally feel your pain with haha. But hey - they're better than the 0 that was there before! One big thing I know I probably keep repeating way too much is that, for most tests, smaller but more exhaustive tests are generally more helpful than longer tests that do a lot. After all, unit tests are supposed to be tests of units of work. :) To that end, the use of parameterized tests (which have to be marked Theory for XUnit) that prove the given unit actually behaves as expected for a range of inputs is so valuable I can't repeat it enough. 😅 As I'm working on Color, I'm also re-working most of the existing tests. Some are just getting unrolled into parameterized tests, some are getting expanded, some are getting re-worked, some are getting removed, and some are getting replaced, as appropriate - a lot of the same sort of work I've been doing for TGD, just XUnit-flavored over here. That said, a couple questions for ya:
Yeah I'm loving a lot of this kind of restructuring, as there is definitely plenty of old technical debt that needs to be paid down, and this is the perfect time to do it! Part of the reason I'm putting my effort in over here, right now, is that some of this stuff being in flux directly impacts what I was helping out with over at TGD as well, so might as well work on the core now instead of having to refactor multiple times.
For now, I'm still keeping this as restricted to Color, though I have been making note of other things I'd like to address once this is finished. I see plenty of other places that could benefit from some similar work, and should be majority non-breaking. As you can probably see from what I've done with Color so far, I'm a big fan of flexibility through implementation of common and relevant interfaces, for multiple reasons. For one, the additional functionality is certainly nice to have. But, more importantly, it helps enforce a degree of standardization with other code and libraries - especially the BCL - as well as within this project itself. A pair of biggies that I think should be implemented in a lot of places are the IFormattable and IParsable interfaces and their Span-based derivatives. After all, this is a library for working with textual interfaces, so formal and consistent text parsing and formatting seems like a good baseline to promise to consumers. |
I dig the nesting idea. Keep it in and we'll see how it works! |
I've been thinking that using a limited form of XAML as markup would be wonderful, from a consumer perspective. Doesn't have to be nearly as comprehensive, given that this is a TUI, among other things, but it's a well-established markup for UI design and having something familiar could help adoption too. As for the two mentioned issues, I've been noticing places where such things would be easier with some relatively minor but still somewhat breaking changes, so it's encouraging to hear that I'm not alone in wanting that sort of thing. I think, though, that some things which can be layered on might do us well to eventually say "ok, that's for VNext," so we can get the already wonderfully improved V2 out and start getting feedback on it and learning whatever lessons we learn from whatever form it is in when we call it shippable. 😅 Things like more complete ANSI support and such probably fit into that category, in my opinion, but doing things in a way that makes those later additions easier is for sure an ideal I am completely on board with. |
Nope. Sorry. |
See |
Ok, getting back to work on this after taking a break over the weekend. :) There's some reorganization around |
Alrighty PR coming in shortly. The functionality is implemented, but I haven't fleshed out all the tests, yet. I merged current v2_develop into it to check for conflicts or new test failures, and that came back all good, so it's "ready," if you want, but more tests will be coming, so I'd suggest holding off for now. |
PR #3204 is in All that's really left is additional test writing, so, unless new tests smoke out any issues, changes to Color itself should be basically done for this. |
Just out of curiosity, it looks really cool what you've come up with! Why is try-parse called so often to call for such advanced code compared to the initial regex? I'm sure you have some use cases. Have you contemplated caching already parsed values as an alternative? That could boost performance even more. |
I'm working on a bunch of stuff, so I'm not immediately sure of what you're referring to. Can you point out a line and I'll explain? Totally possible that there's opportunity for more gains. |
the strings that are color parsed |
As for caching.... Well, as far as actual strings go, the runtime does that already, as it will intern strings if it finds it advantageous to do so. But, for the actual work of things like when a color is matched against names (by the distance check or otherwise) I could see a potential opportunity to squeeze out a little more, and actually had considered it when I was writing it up. However, the effort required wasn't worth it to me in the moment, especially since the span stuff is already so darn fast. BUT! Just speaking purely in terms of theory: Colors are value types. Therefore, once a color exists, any other identical values of it are technically redundant. So, upon creation of a new color from a string, a dictionary entry mapping that string to a boxed The thing is that tracking them via a cache introduces reference semantics for them, so you'd have to be careful. And the cost of boxing/unboxing is pretty high and probably more than a couple of byte matches in a Span (that collection pattern matching stuff in the big switch statement is powerful stuff that dotnet does). An alternative idea that might be a better compromise could be to have a dictionary of likely common values outside of the named colors, which can be checked first. The thing is, any cache keyed on a string is going to result in checking the string for equality anyway, which is most of the actual time-consuming work the parser does. So, my prediction is it wouldn't actually make a tangible difference, or could actually result in worse performance. |
Gotcha. You posted as I was writing the above response, so I think that question is addressed there? Key thing is that string comparisons are always going to happen, whether there's a cache or parsing is done in real-time, so there's already a sunk cost. But, the parser uses spans, which are significantly more performant than strings (it's kinda crazy how good they are in comparison), so any alternative is going to have to overcome that on top of anything else. |
As for string interning... We could explicitly TELL the runtime to intern color strings, as well, to guarantee that it happens once a unique color string is encountered. For any of these, though, I think we might be getting into micro-optimizations that are not as valuable as spending the effort elsewhere, at least until RTM. 🤷♂️ As much as I like to optimize things, exercising the self-control to not take it to the extreme is sometimes hard. 😅 |
Performance is also relative to numbers. How many color parsing executions would you recon there be in a single application execution? |
It's a good question and something I've been thinking of when working on this and when looking at other value types like Pos. It almost made me want to turn Color into a class, but it really is a good one to be a value type. As far as actually parsing a raw string goes? I would think that the majority of that is done at startup, when reading configuration. Otherwise, in a compiled application, the compiler is going to work its magic on a lot of stuff, with the value types, where it can. At runtime, we're mostly actually passing around an actual It would probably be a better use of time and effort to track down any place a color is created (including by implicit copy) and either fix that or be sure to stick to reference passing. In this current PR, I did already add a bunch of That said, I did make various constructors and methods in |
With nothing specific in mind, I would say that, if there are places where we pass around a |
A There are currently 4 We know the current This is relevant for app load, and for use-cases where the theme is changed. |
Thanks for that. Might be worth some benchmarking once we get finished with more stuff, for sure. |
Just a wild thought: So, in theory, configuration of colors (or configuration in general) isn't something that is terribly likely to change between executions of an application. With that basic assumption in mind, if we wanted to go for maximum startup performance, pre-computation and caching of things could be beneficial, for subsequent executions of the application. This is a common strategy employed by applications when there's an expensive operation that fits that criterion of only really needing to be done once, at startup. The most common example is graphics drivers and 3D games, which often will pre-compile shaders on first launch after a configuration or binary change, to front-load the work to the first launch, benefiting all future launches. This would be pretty easy to do with colors. We could parse the configuration and cache the computed color values in a binary file that can just be loaded up on future executions. The method for knowing if the cache is stale could be something as simple and quick as a hash of the configuration file. If it matches the last known hash, use the cache. If not, parse and create a new cache. BUT It's also important to realize that this is already pretty darn fast. In some quick benchmarking I just did, Parse handles hex strings in RRGGBB and AARRGGBB format extremely quickly. This is what I ran, which tests 255 different color strings in [Fact]
public void BenchmarkParse ( )
{
long totalTicks = 0;
Stopwatch sw = new ( );
List<Color> colors = new ( );
for ( int i = 0; i < 1000; i++ ) {
for ( byte b = 0; b < 255; b++ ) {
sw.Start ( );
Color c = Color.Parse ( $"#FF1234{b:x2}" );
sw.Stop ( );
totalTicks += sw.ElapsedTicks;
sw.Reset ( );
colors.Add ( c ); // Just doing this so it doesn't optimize it away. Not timed.
}
}
TimeSpan s = TimeSpan.FromTicks ( totalTicks );
} Average time with a debug build was 8 ticks, which is 800 nanoseconds. And note that includes the time to format the string before input, which is unfair. Average time in a release build was 5 ticks (500 nanoseconds). So, in the case of having 400 to operate on, at startup, we're looking at 320 microseconds in a debug build or 200 microseconds for release, on this machine's 5-year-old CPU. The disk IO activity to read the values will dominate that by an order of magnitude for HDD and will be on-par for SSD, so, in my opinion, further optimization of the parser for hex formats isn't worth the time. Dealing in Spans really is a major boost to things, and the more I use them, the more I absolutely love what the dotnet team has been doing in the last couple versions. I had them sprinkled here and there in dotnet 7 projects, but really dove in head first with dotnet 8 and....Damn... I'm hooked. 😂 |
Continuing along that train of thought... Considering how quick it already is, computing a hash of a config file to determine cache freshness would, by itself, already take just as long, and then the load of the cache file from disk would incur an additional IO penalty, so my assumption is that even a binary cache will be slower every time. I'm curious how this would perform on a Raspberry Pi 5 that Amazon just delivered today. I imagine it'll be slower, due to lower clock speed and slower memory, but I'd wager the performance difference will be proportional to those factors entirely, since it's a simple byte-matching algorithm. |
Here's an even more fair test, using 100000 random numbers generated by the crypto random number generator, and pre-formatting the string before parse, so now it's really only timing the parse: [Fact]
public void BenchmarkParse ( )
{
long totalTicks = 0;
Stopwatch sw = new ( );
List<Color> colors = new ( );
RandomNumberGenerator rng = RandomNumberGenerator.Create ( );
Span<byte> intBytes = stackalloc byte [4];
for ( int i = 0; i < 100000; i++ ) {
rng.GetBytes ( intBytes );
int colorInt = BinaryPrimitives.ReadInt32LittleEndian ( intBytes );
string hexString = $"#{colorInt:X8}";
sw.Start ( );
Color c = Color.Parse ( hexString );
sw.Stop ( );
totalTicks += sw.ElapsedTicks;
sw.Reset ( );
colors.Add ( c ); // Just doing this so it doesn't optimize it away. Not timed.
}
TimeSpan s = TimeSpan.FromTicks ( totalTicks );
} Average time in release build was 4 ticks on the same machine as before. So, 400 input strings, assuming all are random and unique (which is worst case, considering processor caches and whatnot), read at startup, would be 160 microseconds. :) |
All this is to say that yes, it was a good thought you had, and I shared similar curiosity. But this data shows we probably shouldn't bother. Just as a side note about the whole thing, before I even got started writing new code: I tried implementations using slightly tighter regular expressions, as well as tweaks to the creation of the regexes, such as making them forward-only (which has a significant impact on that kind of regex) and pre-compiling them. We are now running so much faster that we can complete one hundred thousand parses in a total time that is almost 100 times LESS than the time the best regex I could come up with was able to parse a single value (though the regex benchmarks speed up after the first use, which cuts the advantage by a factor of about 20 over 100k runs). All thanks to spans, since it's not like the code I wrote is terribly profound. 😅 |
Oh hey.... Some of the conditional compilation stuff in Responder breaks Release builds, in the unit test project. I didn't track them down when trying these benchmarks, and just defined the DEBUG_IDISPOSABLE symbol to make it go away since I didn't care about that in the context of that benchmark. But yeah - Definitely be sure code dependent on conditional stuff is handled everywhere. |
I love the use case of having a theme designer that dynamically updates the app as it's running. So config is not just a startup scenario. |
Noted. And yeah, that's a neat case and certainly relevant for the code we're talking about. It's also relevant to TGD, though TGD is a special case pretty much all around haha. But, as mentioned, we're talking less than a millisecond on middle-aged hardware on a laptop (which, as a matter of fact, was on battery at the time, as I was outside on my patio then, so it wasn't even performing at its full capability due to corporate power policy). So I'm still pretty confident we can move on for now, that optimization efforts by me or anyone else are probably better spent elsewhere, and, if we want to revisit it for a future enhancement, there's no reason we can't come back to it later! :) |
One thing I wanted to mention as an example of just how simple this particular operation is: The internet! Any web page you visit typically has, at minimum, dozens, but most likely anywhere from hundreds to thousands of text declarations of 32-bit colors (even respecting alpha for rendering), but you don't feel that enough to care or notice. Sure, I'll grant it's non-zero, but the point is that so many other parts of the application, be it a web browser or something like Terminal.Gui occupy a much larger proportion of the time to process things than parsing of color values from a few bytes of text to a 4-byte integer that anything beyond the low hanging fruit is in the realm of micro-optimizations. 🤷♂️ The biggest win for this code, honestly, was just removing the regexes, since those were heavy. Even plain old string parsing would have been fine with the same basic strategy I used. The use of spans was just opportunistic since it's at least mostly similar to working with strings and therefore was a mostly-free win for performance that I was (well... am, since I'm not done with tests yet!) more than happy to contribute. 😊 And then the other enhancements vis-a-vis interface implementations were just because I'm a nerd haha. And again I'm not saying there are no further possible gains - just that a couple hundred nanoseconds to parse text to a value that can be used from that point on has basically exhausted its opportunities for order-of-magnitude scale optimizations, and that similar or possibly less effort can be spent elsewhere for a better return on the time investment. |
Oops. Didn't drop a mention there to link it over here.... As of a few minutes ago, #3204 is now complete and ready for review/merge at your discretion. |
* Reduces indentation by remove the namespace braces. * Reduces indentation and removes unused using. * Ensures clear selection if it isn't selecting. * Turn on nullability context for TryParse and update usages as needed. * Use IsNullOrWhiteSpace, which includes IsNullOrEmpty All-whitespace values are also illegal, so may as well handle that here too * Respect the nullable here * It's a struct * Use byte.MaxValue and add remark * Just use the bytes directly * Must respect endianness * Add uint constructor so consumers don't have to do unchecked math * Completely re-work parsing and implement ISpanParsable<Color> All parsing is now almost-0 allocation, and is significantly faster than before * Extension methods required by new code * Use standard Math.Clamp method here * Add some new unit tests for TryParse * De-duplicate code and handle more cases * Enable nullability context for the file * Go ahead and enable the language features and analysis * Implicit usings remove a lot of boilerplate usings. * Add these to the dictionary to shut spell check up * Make this thing a record struct and a union, and update constructors This commit won't build. I'm just breaking out changes a little bit. * Some additional XmlDoc standardizing * Make FindClosestColor and CalculateColorDistance use the vector for SIMD * Add a TryFormat method for support of I*Formattable * Add an interface for support of custom formatting of Colors * Pass by in reference * Parse string delegates to Parse span * Parse now does all the work * Remove the old new code from TryParse * Some new cast operators * Add IFormattable.ToString implementation * Add the rest of the code for Color in its current (unfinished) state * Move that interface to its own file * Add ColorParseException class * Move Attribute to its own file, too. * Re-implement these operators as explicit methods * Get rid of fuzzy equality operators and update tests to use the named methods that replace them * Add an explicit test case for ToString with null format string and explicitly specified Invariant culture * Fix byte orders for hex format to be standard ARGB * Prove that ToString and Parse can round-trip values * Unroll this test into parameterized cases * Fix a couple of comments to match byte order * Update R# dictionary to match correct byte orders * Remove stray comment * Separate all types in this to their own files * Convert this one to use the handy extension * Add test for Argb property * Add a file nesting rule to make some incoming file changes display nicely * Move constructor tests to their own file and add some new tests * Add implicit cast from uint * More constructor tests * Since this is now a record, the equality operators are compiler generated Still spot-checking a few arbitrary values for completeness * Override ToString to delegate to Attribute * Simplify and clean up ToString. Delegate to ColorScheme * Update the test to match new output * These should be fields, really. It's a value type. * Add some type checks for change control * Allow unsafe and turn on implicit usings * Make this one better * Rename tests and remove redundant checks (the type checks already guarantee field consistency) * Reorganize a bit * Make these test 16 random values * Existing operator tests converted * That one is now redundant with both of the other tests that check all the named colors * Move this to type checks and simplify a little bit * These lambdas can be static * Move operators to another file. * Add global using for System.Text because it's EVERYWHERE * Reorganization of Color and some related types. Updated usages to reflect changes * Update tests to reflect changes in Terminal.Gui * Add missing keyword * Add entry to dictionary * Add dotsettings for Terminal.Gui Only specifies language level * Commit unsaved changes for usings here * Implement last remaining TryFormat method * A little cleanup/formality * Sorting rules Sort methods by name and interface they implement * Sort code * Match namespace for tests None! * Unroll ordinal check and reorganize. * Sort before writing new tests * These got reversed... * Add test to prove explicit cast to Vector3 works properly * Sort these too * Add test for uint->Color implicit operator * Add test for Color->uint implicit operator * Correct this test name and re-order * Add test for implicit conversion from Vector3 to Color * Add test for implicit conversion from Vector4 to Color * These also got reversed, like with Vector3 * Add test for implicit conversion from Color to Vector4 * Add test for GetHashCode * Make sure these are all under the same namespace * Remove a now-redundant test * Reorganize formatting and parsing tests to another type part * Tests moved back to Terminal.Gui.DrawingTests namespace as before. * Add tests for the constructor taking 3 or 4 integers and sort * Cleanup - Renamed some tests - Make a test even clearer - Removed redundant code - Got rid of unused parameter in Constructor_WithColorName_AllChannelsCorrect * That needs to be from the reverse map Not broken - just was sub-optimal due to my error. * Enable nullability context in this file Not sure how it got removed but whatev * Respect nullability context in this file now that it's on 🤦♂️ * Add tests for expected exceptions with whitespace or null values * Add test for parameterless constructor * A couple more places for reference passing and some SkipLocalsInit attributes. * Some XmlDoc corrections to reflect the final implementation * Remove namespace qualifier * Can't use these because of lambdas :( * Removed a collection that never ended up being needed. * Add bracing, newline, and modifier style rules * Add spacing rules inside parens/brackets * This was still under the Terminal.Gui.Drawing namespace. Revert that. * Applied updated formatting settings and addressed XmlDoc reviews in #3204 * More places where spaces got added in dependent code. Also a couple of null checks fixed to not use the equality operator * More dependent code format fixes * Finished re-formatting modified code that got spaces added everywhere * Visual studio didn't actually write this file to disk til I closed out of VS... Grr * Delete the ReSharper settings files from this branch. --------- Co-authored-by: BDisp <[email protected]>
Closing this as it has been merged. |
I don't even fully remember what brought me to it, but I noticed the regexes being used in Color parsing, and that raised performance concerns.
Color.TryParse does what it says, mostly, though it does miss some edge cases and allows out-of-bounds values, since it uses all signed integers.
I wrote up a quick XUnit test harness to show a simpler and more precise implementation of the rgb portion of it that is several times faster, and executes in constant time, via a combination of simple string manipulation and the C# 12 collection expression feature, which enables additional optimizations by the compiler over older ways of manipulating certain types of collections.
Here's the code. The relevant portion is what's between the stopwatch starts and stops.
First, it tries out the new method, asserting the expected values were properly parsed. Then it does it the old way with the regexes (but with one tweak, to enable it to handle rgba as well).
The new code would replace the two regex blocks at the bottom of the TryParse method.
The new code also implicitly gains bounds restrictions by using bytes, which will limit the input range to [0,255] or the appropriate overflow exception will be thrown by the framework (to be caught in TryParse).
This code is between 2 and 12 times faster on the machine I ran it on than the regex code.
Other portions of the TryParse method can be similarly modified to consolidate things into a single switch statement, for the other formats (such as hex), which would be nice, and which would also implicitly gain some free potential performance from use of collection expressions and the reduced allocations made possible with those language features.
At minimum, some of the hex format code can be simplified and improved like so (this is just in one case, but the method for each is nearly identical):
That gets it done in one allocation of a byte array, instead of all the individual substring calls, each of which is a new string allocation.
The text was updated successfully, but these errors were encountered: