Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove Ascii class #89

Conversation

sclassen
Copy link
Contributor

@sclassen sclassen commented Aug 22, 2019

Most of the code in the class Ascii was never used.

  • Remove unused code
  • Simplify implementation of only remaining method
  • Move all methods from Ascii -> Strings
  • Add support for comparing characters outside of A-Z in a case insensitive way (for example äöü)

@sclassen sclassen mentioned this pull request Aug 22, 2019
Copy link
Contributor

@martinfrancois martinfrancois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your implementation of containsIgnoreCase() is indeed simpler, but it's slower than the current implementation.
To verify this, I benchmarked the current implementation, vs your implementation and vs s1.toLowerCase().contains(s2.toLowerCase()).
The current implementation was the fastest, with your implementation being 3x slower (up to 5.3x) and the last option being 3.4x slower (up to 7.4x) on average.

Since this method is being called quite a lot during search, I don't think it would be a good tradeoff in my opinion, since it may slow down the search considerably, especially when there are a lot of settings.

I do however agree that we can remove the unused methods, since we don't really need them. We originally left the method in the Ascii class intentionally because we were waiting for google/guava#3023 to be merged, and it would've made the refactoring a lot easier, since we only needed to remove the class itself from here and the imports, and wouldn't need to refactor the rest. But since it takes a long time for it to get merged and we don't have the Guava dependency anymore anyways I don't think that aspect is really important anymore.

My suggestion would be to revert the simplification to the original implementation and keep all of the other changes. This would mean the Ascii class will still be gone, but the implementation of containsIgnoreCase will be present in Strings. What do you think?

@sclassen
Copy link
Contributor Author

Maybe the difference in performance is explainable by a difference in feature.
I added the following line to AsciiTest in the master branch

assertTrue(Ascii.containsIgnoreCase("Übung", "übung"));

This causes the test to fail as the current implementation only supports case insensitive lookup for the 26 letters in the English alphabet. Other letters such as the German umlaut or French, Norwegian, Spanish special characters are not compared in a case insensitive manner.

The new implementation uses the Java built in functionality for comparing strings in a case insensitive manner an thus also supports characters other than the A-Z.

If you weight the performance higher than the support of non English characters than I wil restore the algorithm from the Ascii class.

@sclassen sclassen changed the title simplified Ascii class remove Ascii class Aug 25, 2019
@martinfrancois
Copy link
Contributor

I see.. This changes the perspective of course. Let's keep this implementation then, and in case there are reports about slow performance, we would probably need to introduce some sort of caching to make the search quicker. I added an issue #94.

@martinfrancois martinfrancois merged commit f46dd57 into dlsc-software-consulting-gmbh:master Aug 25, 2019
@sclassen sclassen deleted the simplifyAsciiClass branch August 25, 2019 18:23
@martinfrancois martinfrancois added cleanup enhancement New feature or request labels Sep 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants