don't try to pluralize plural words #6

Open
SingingBush opened this issue Nov 26, 2014 · 3 comments

Comments

@SingingBush

This library is almost what I need, although I wish it could also return the singular versions of plural words; that shouldn't be too hard. I did find an issue though. When passed a word that is already plural, it tries to pluralize it instead of just returning the same string. This is a problem because I'm handling user input and have no way of knowing whether users typed the singular or the plural to start with.

Ideally I'd want a method that returns a plural if a singular has been input and a singular if a plural has been input.

By the way, the Maven XML block on the atteo.org page needs editing: groupid should have an uppercase 'I' (groupId). As written, it will cause Maven to fail.

@sentinelt
Member

I changed it to an uppercase 'I' on the atteo.org page. Thanks for that.

When passed a word that is already plural, it tries to pluralize it instead of just returning the same string.

Sometimes the expected behavior is to pluralize the word even if it is already plural. So instead of changing the semantics, I think it would be better to add a more general method, one that checks whether a word is plural or singular:

English.isPlural(String)

But this is non-trivial functionality. If you know of any material that describes how to programmatically determine whether a word is plural, or of any libraries in other languages that do this, please let me know.
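To make the difficulty concrete, here is a minimal sketch of a suffix-based heuristic. It is not part of the library: the class name PluralGuess, the method looksPlural and the word lists are all hypothetical, chosen only for illustration. It treats a word as plural when it ends in 's' but not in '-ss', '-us' or '-is', with two small exception lists.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; the library has no such class.
final class PluralGuess {

    // Words whose singular and plural forms are identical: never reported as plural.
    private static final List<String> SAME_IN_BOTH =
            Arrays.asList("sheep", "fish", "series", "species");

    // A few irregular plurals that do not end in 's' (far from complete).
    private static final List<String> IRREGULAR_PLURALS =
            Arrays.asList("men", "women", "children", "mice", "oxen", "people");

    // Endings that include a final 's' but usually mark a singular word.
    private static final Pattern SINGULAR_S = Pattern.compile("(ss|us|is)$");

    private PluralGuess() {
    }

    // Returns true if the word is probably plural; this is a guess, not a proof.
    static boolean looksPlural(final String word) {
        if (word == null) return false;
        final String w = word.trim().toLowerCase(Locale.ROOT);
        if (w.isEmpty() || SAME_IN_BOTH.contains(w)) return false;
        if (IRREGULAR_PLURALS.contains(w)) return true;
        return w.endsWith("s") && !SINGULAR_S.matcher(w).find();
    }
}
```

Under these assumptions looksPlural("preferences") is true and looksPlural("status") is false, but singular words such as "news" and "lens" are wrongly reported as plural. That kind of false positive is why a reliable check needs either a dictionary or a much larger rule set.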

For plural-to-singular mapping, please open a separate feature request.

@SingingBush
Author

To do what I needed, I used some code from here combined with your own English.plural() method:

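import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The original post did not show its imports; the two below are assumptions.
import org.apache.commons.lang3.StringUtils; // Apache Commons Lang (lang3 assumed)
import org.atteo.evo.inflector.English;      // this library

public class WordMagic {

    // Words with identical singular and plural forms; returned unchanged.
    private static final List<String> UNCOUNTABLES = Arrays.asList("equipment", "information", "rice", "money", "species", "series", "fish", "sheep");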

    private LinkedList<Rule> _singulars = new LinkedList<Rule>();

    public WordMagic() {
        addSingularizeRules();
    }

    /**
     * For a given word, return either the singular or the plural version
     * @param word the term that needs checking
     * @return either the singular or the plural version
     */
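    public String calculateSingularOrPlural(final String word) {
        // Null or empty input has nothing to inflect; uncountable words have one form only.
        if (StringUtils.isEmpty(word) || isUncountable(word)) return word;

        // Trim once up front so trailing whitespace cannot defeat the end-anchored ($) rules.
        final String trimmed = word.trim();
        for (final Rule rule : _singulars) {
            final String result = rule.apply(trimmed);
            if (result != null) return result;
        }
        // if no singular was found we'll assume that the word is already singular and can be safely pluralised.
        // English.plural() will always pluralise, even if it's already plural!!!
        return English.plural(trimmed);
    }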


    private void addSingularize(final String rule, final String replacement) {
        final Rule singularizeRule = new Rule(rule, replacement);
        _singulars.addFirst(singularizeRule);
    }

    private void addSingularizeRules() {
        addSingularize("s$", "");
        addSingularize("(s|si|u)s$", "$1s"); // '-us' and '-ss' are already singular
        addSingularize("(n)ews$", "$1ews");
        addSingularize("([ti])a$", "$1um");
        addSingularize("((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$", "$1sis"); // "$1$2sis" turned e.g. "psychoanalyses" into "psychoanalyasis"
        addSingularize("(^analy)ses$", "$1sis");
        addSingularize("(^analy)sis$", "$1sis"); // already singular, but ends in 's'
        addSingularize("([^f])ves$", "$1fe");
        addSingularize("(hive)s$", "$1");
        addSingularize("(tive)s$", "$1");
        addSingularize("([lr])ves$", "$1f");
        addSingularize("([^aeiouy]|qu)ies$", "$1y");
        addSingularize("(s)eries$", "$1eries");
        addSingularize("(m)ovies$", "$1ovie");
        addSingularize("(x|ch|ss|sh)es$", "$1");
        addSingularize("([ml])ice$", "$1ouse"); // inside [...] a '|' is a literal character, not alternation
        addSingularize("(bus)es$", "$1");
        addSingularize("(o)es$", "$1");
        addSingularize("(shoe)s$", "$1");
        addSingularize("(cris|ax|test)is$", "$1is"); // already singular, but ends in 's'
        addSingularize("(cris|ax|test)es$", "$1is");
        addSingularize("(octop|vir)i$", "$1us");
        addSingularize("(octop|vir)us$", "$1us"); // already singular, but ends in 's'
        addSingularize("(alias|status)es$", "$1");
        addSingularize("(alias|status)$", "$1"); // already singular, but ends in 's'
        addSingularize("^(ox)en$", "$1"); // anchored at both ends so only the whole word "oxen" matches
        addSingularize("(vert|ind)ices$", "$1ex");
        addSingularize("(matr)ices$", "$1ix");
        addSingularize("(quiz)zes$", "$1");
    }

    private boolean isUncountable(final String word) {
        return !StringUtils.isEmpty(word) && UNCOUNTABLES.contains(word.trim().toLowerCase());
    }

    private static class Rule {
        private final String _expression;
        private final Pattern _expressionPattern;
        private final String _replacement;

        protected Rule(final String expression, final String replacement) {
            _expression = expression;
            _replacement = replacement != null ? replacement : "";
            _expressionPattern = Pattern.compile(_expression, Pattern.CASE_INSENSITIVE);
        }

        /**
         * Apply the rule against the input string, returning the modified string or null if the rule didn't apply (and no
         * modifications were made)
         *
         * @param input the input string
         * @return the modified string if this rule applied, or null if the input was not modified by this rule
         */
        protected String apply(final String input) {
            final Matcher matcher = _expressionPattern.matcher(input);
            return matcher.find() ? matcher.replaceAll(_replacement) : null;
        }
    }
}

Not sure if that's any help. I'm just hacking something together for a prototype.
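One detail in the code above that is easy to miss: rules are stored with addFirst, so the rule added last is tried first, and the catch-all "s$" rule added at the top of addSingularizeRules() is only a last resort. The following is a cut-down sketch of that ordering with just three of the rules. RuleOrderDemo and singularize are hypothetical names used for illustration, and English.plural() is left out so that the snippet runs without the library.

```java
import java.util.LinkedList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: three of the rules above, same first-match-wins lookup.
final class RuleOrderDemo {

    private static final class Rule {
        private final Pattern pattern;
        private final String replacement;

        Rule(final String regex, final String replacement) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.replacement = replacement;
        }

        // Returns the rewritten word, or null if this rule does not match.
        String apply(final String input) {
            final Matcher matcher = pattern.matcher(input);
            return matcher.find() ? matcher.replaceAll(replacement) : null;
        }
    }

    private final LinkedList<Rule> rules = new LinkedList<>();

    RuleOrderDemo() {
        add("s$", "");                   // most general rule, added first, tried last
        add("(alias|status)es$", "$1");
        add("(alias|status)$", "$1");    // most specific rule, added last, tried first
    }

    private void add(final String regex, final String replacement) {
        rules.addFirst(new Rule(regex, replacement));
    }

    // Returns the result of the first matching rule, or null if no rule matches.
    String singularize(final String word) {
        for (final Rule rule : rules) {
            final String result = rule.apply(word);
            if (result != null) return result;
        }
        return null;
    }

    public static void main(final String[] args) {
        final RuleOrderDemo demo = new RuleOrderDemo();
        System.out.println(demo.singularize("cats"));     // cat
        System.out.println(demo.singularize("statuses")); // status
        System.out.println(demo.singularize("status"));   // status
        System.out.println(demo.singularize("cat"));      // null
    }
}
```

If the rules were appended with addLast instead, "s$" would match first and "status" would come back as "statu". The "already singular" rules also mean a word like "status" is returned unchanged rather than pluralised, which may or may not be what a caller expects.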

@eepstein

In line with the issue's title (but not its thrust), the library incorrectly "pluralizes" preferences to preferenceses. hmmm


3 participants