Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce String/StringBuilder allocations #984

Merged
merged 5 commits into from
Aug 30, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -88,14 +88,62 @@ public static String[] split(String text) {
return list.toArray(NO_STRINGS);
}

private static boolean isAsciiLetterOrDigit(char c) {
return 'a' <= c && c <= 'z' ||
'A' <= c && c <= 'Z' ||

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'A' <= c && c <= 'Z' ||
isAsciiUpperCase(c) ||

Or would a method call here make things too slow? Though, it might get inlined, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favour of reusing what we got; in theory the size (in term of bytecode size) of both methods is so small that will likely being inlined, if hot, unless we are unlucky and the depth of the call fall on the max inlining depth (which is 15 from JDK>=11 IIRC).
In theory these methods will be called for each character of each property names, which means they will likely (to be verified, really!) C2 compiled, and inlining, as intrinsics, won't kick in with previous compilation levels AFAK, so, If lucky, everything will work as expected.
To play safe, we should manually inline everything if this is supposed to happen at startup and we won't expect C2 compilation to kick-in.

TBH in a later iteration of this code, due to #984 (comment) I will likely turn this code into some branchless variant, which can work on 4 chars at time, which will give 3 important benefits:

  • most comparisons are on common prefixes, hence batching will grant to speedup moving to the different part of strings to compare
  • most names have a mix of alphabetical and not chars, very unpredictable; a branchless approach will pay less in such cases
  • the batchy and branchless variant will amortize the cost of unlucky inlining decisions or lower compilation level than C2 by performing more ops in a row

'0' <= c && c <= '9';
}

private static boolean isAsciiUpperCase(char c) {
return c >= 'A' && c <= 'Z';
}

private static char toAsciiLowerCase(char c) {
return isAsciiUpperCase(c) ? (char) (c + 32) : c;
}

public static boolean equalsIgnoreCaseReplacingNonAlphanumericByUnderscores(final String property,
CharSequence mappedProperty) {
int length = mappedProperty.length();
if (property.length() != mappedProperty.length()) {
// special-case/slow-path
if (length == 0 || property.length() != mappedProperty.length() + 1) {
return false;
}
if (mappedProperty.charAt(length - 1) == '"' &&
property.charAt(length - 1) == '_' && property.charAt(length) == '_') {
length = mappedProperty.length() - 1;
} else {
return false;
}
}
for (int i = 0; i < length; i++) {
char ch = mappedProperty.charAt(i);
if (!isAsciiLetterOrDigit(ch)) {
if (property.charAt(i) != '_') {
return false;
}
continue;
}
final char pCh = property.charAt(i);
// in theory property should be ascii too, but better play safe
if (pCh < 128) {
if (toAsciiLowerCase(pCh) != toAsciiLowerCase(ch)) {
return false;
}
} else if (Character.toLowerCase(property.charAt(i)) != Character.toLowerCase(ch)) {
return false;
}
}
return true;
}

public static String replaceNonAlphanumericByUnderscores(final String name) {
int length = name.length();
StringBuilder sb = new StringBuilder(length);
for (int i = 0; i < length; i++) {
char c = name.charAt(i);
if ('a' <= c && c <= 'z' ||
'A' <= c && c <= 'Z' ||
'0' <= c && c <= '9') {
if (isAsciiLetterOrDigit(c)) {
sb.append(c);
} else {
sb.append('_');
Expand All @@ -117,51 +165,48 @@ public static String toLowerCaseAndDotted(final String name) {
if ('_' == c) {
if (i == 0) {
// leading _ can only mean a profile
sb.append("%");
sb.append('%');
continue;
}

// Do not convert to index if the first segment is a number
if (beginSegment > 0) {
try {
String segment = sb.substring(beginSegment, i);
Integer.parseInt(segment);
sb.replace(beginSegment - 1, beginSegment, "[").append("]");
if (isNumeric(sb, beginSegment, i)) {
sb.setCharAt(beginSegment - 1, '[');
sb.append(']');

int j = i + 1;
if (j < length) {
if ('_' == name.charAt(j)) {
sb.append(".");
sb.append('.');
i = j;
}
}

continue;
} catch (NumberFormatException e) {
// Ignore, it is not an indexed number
}
}

int j = i + 1;
if (j < length) {
if ('_' == name.charAt(j) && !quotesOpen) {
sb.append(".");
sb.append("\"");
sb.append('.');
sb.append('\"');
i = j;
quotesOpen = true;
} else if ('_' == name.charAt(j) && quotesOpen) {
sb.append("\"");
sb.append('\"');
// Ending
if (j + 1 < length) {
sb.append(".");
sb.append('.');
}
i = j;
quotesOpen = false;
} else {
sb.append(".");
sb.append('.');
}
} else {
sb.append(".");
sb.append('.');
}
beginSegment = j;
} else {
Expand All @@ -171,6 +216,23 @@ public static String toLowerCaseAndDotted(final String name) {
return sb.toString();
}

public static boolean isNumeric(CharSequence digits) {
return isNumeric(digits, 0, digits.length());
}

public static boolean isNumeric(CharSequence digits, int start, int end) {
if (digits.length() == 0) {
return false;
}

for (int i = start; i < end; i++) {
if (!Character.isDigit(digits.charAt(i))) {
return false;
}
}
return true;
}

public static String skewer(String camelHumps) {
return skewer(camelHumps, '-');
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,9 @@
package io.smallrye.config.common.utils;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

Expand Down Expand Up @@ -122,4 +124,12 @@ void replaceNonAlphanumericByUnderscores() {
void toLowerCaseAndDotted() {
assertEquals("test.language.\"de.etr\"", StringUtil.toLowerCaseAndDotted("TEST_LANGUAGE__DE_ETR__"));
}

@Test
void isNumeric() {
assertTrue(StringUtil.isNumeric("0"));
assertFalse(StringUtil.isNumeric("false"));
assertTrue(StringUtil.isNumeric("foo[0]", 4, 5));
assertTrue(StringUtil.isNumeric(new StringBuilder("foo[0]"), 4, 5));
}
}
Loading
Loading