This is a package for tagging sentences with their concreteness. The measure is the average concreteness of the words in a sentence. Words are matched to their root form, and person, place, or organization entities are assigned the maximum concreteness of 5. The word concreteness ratings that this package relies on were provided by Brysbaert, Warriner & Kuperman (2013).
This method has been empirically validated in our paper. If you find it helpful, please consider using the following citation:
Aubin Le Quéré, M., Matias, J.N. When curiosity gaps backfire: effects of headline concreteness on information selection decisions. Sci Rep 15, 994 (2025). https://doi.org/10.1038/s41598-024-81575-9
```
pip install sentence_concreteness
```
You will also need to download the spaCy model:

```
python -m spacy download en_core_web_sm
```
This package relies on the following dependencies:
- csv
- string
- inflect
- spacy
- truecase
- nltk
See `demo.py` for an example of how to run `sentence_concreteness`.
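As a minimal sketch (assuming the package exposes `get_sentence_concreteness` at the top level, consistent with the API documentation below):

```python
from sentence_concreteness import get_sentence_concreteness

# Average word concreteness for a single sentence, on the 1-5 Brysbaert scale.
score = get_sentence_concreteness("The elephant lounged by the river")
print(score)
```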
Note: The Python package is still experimental; please contact Marianne if you encounter any issues.
Returns the matched concreteness rating for an individual word. This method will try to match the word to a root form if no direct match is found.
Name | Type | Description |
---|---|---|
`word` | string | Word that you wish to retrieve the concreteness for. |
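A hedged example (assuming the function is exposed as `get_word_concreteness`, matching the parameter table above):

```python
from sentence_concreteness import get_word_concreteness

# Direct match against the Brysbaert et al. ratings.
print(get_word_concreteness("elephant"))

# Falls back to the singular root form ("elephants" -> "elephant").
print(get_word_concreteness("elephants"))
```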
Returns the matched concreteness for a sentence. For each word, this method tries to calculate a concreteness rating, then takes the average of all retrieved ratings. If a word is recognized as an entity, it is automatically assigned a concreteness of 5.
Name | Type | Description |
---|---|---|
`sentence` | string | Sentence that you wish to retrieve the concreteness for. |
`verbose` | boolean | Whether to output additional information about how words were matched. |
`num_unmatched_words_allowed` | int | Number of unmatched words allowed before an error is returned. |
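A usage sketch (again assuming a top-level `get_sentence_concreteness` export; see `demo.py` for the authoritative example):

```python
from sentence_concreteness import get_sentence_concreteness

score = get_sentence_concreteness(
    "Scientists discover ancient shipwreck off the coast of Greece",
    verbose=True,                   # output details on how each word was matched
    num_unmatched_words_allowed=2,  # tolerate up to two unmatched words
)
print(score)
```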
To calculate concreteness ratings, we first identify any person, place, or organization entities in a headline using the spaCy package and encode these entities with the highest concreteness score of 5. We then split the headline into a list of tokens and remove standardized stopwords. We ignore punctuation and cardinal numbers. From the remaining tokens, we take an iterative approach to mapping each token to its concreteness rating, checking after each step whether the token now maps to a rating. At each step, if we cannot yet retrieve a concreteness rating for a token, we attempt, in order, to retrieve a singular version of the token (e.g. "elephants" → "elephant"), a present tense version (e.g. "lounged" → "lounge"), or a base adjective (e.g. "greatest" → "great"). If these steps all fail and the word is hyphenated, we take the average of both parts (e.g. "super-spectacular" → "super", "spectacular").
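The sketch below illustrates this pipeline under stated assumptions: `ratings` stands in for the Brysbaert et al. lexicon as a dict of word → rating, the function names are hypothetical, and the real package's internals (including its use of truecase before NER) may differ.

```python
# Requires: python -m spacy download en_core_web_sm,
# plus nltk's "stopwords" and "wordnet" corpora.
import inflect
import spacy
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nlp = spacy.load("en_core_web_sm")
engine = inflect.engine()
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def sentence_concreteness(sentence, ratings):
    doc = nlp(sentence)
    # Tokens inside person/place/organization entities score the maximum of 5.
    entity_tokens = {t.i for ent in doc.ents
                     if ent.label_ in {"PERSON", "GPE", "LOC", "ORG"}
                     for t in ent}
    scores = []
    for token in doc:
        # Ignore punctuation and cardinal numbers (approximated via like_num).
        if token.is_punct or token.like_num:
            continue
        word = token.text.lower()
        if word in stop_words:                    # drop standardized stopwords
            continue
        if token.i in entity_tokens:
            scores.append(5.0)
            continue
        score = lookup(word, ratings)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None

def lookup(word, ratings):
    if word in ratings:
        return ratings[word]
    singular = engine.singular_noun(word)         # "elephants" -> "elephant"
    if singular and singular in ratings:
        return ratings[singular]
    verb = lemmatizer.lemmatize(word, pos="v")    # "lounged" -> "lounge"
    if verb in ratings:
        return ratings[verb]
    adj = lemmatizer.lemmatize(word, pos="a")     # "greatest" -> "great"
    if adj in ratings:
        return ratings[adj]
    if "-" in word:                               # average hyphenated parts
        parts = [lookup(p, ratings) for p in word.split("-")]
        parts = [p for p in parts if p is not None]
        if parts:
            return sum(parts) / len(parts)
    return None
```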
This scale was validated in the context of news headlines from the publisher Upworthy. Additionally, the validation used headlines that were between 14 and 16 words long. While the measure can be applied more generally to sentences of other lengths, scholars may want to conduct additional validation to ensure the scale works for their specific context.
In very rare instances, `truecase` behaves non-deterministically, which can affect the NER results and therefore make the final concreteness score non-deterministic. In my experience, this happens only about once every 10,000 sentences, but it is something to be aware of nonetheless. This issue can be solved by removing `truecase`, which may or may not be appropriate for your use case.
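For context, `truecase` restores capitalization before NER, so a different casing guess can flip whether a token is tagged as an entity. A hypothetical illustration (not the package's exact pipeline):

```python
import spacy
import truecase

nlp = spacy.load("en_core_web_sm")

headline = "apple hires a new head of design"
restored = truecase.get_true_case(headline)  # e.g. "Apple hires a new head of design"

# Whether "apple" gets capitalized determines whether NER tags it as an entity,
# which in turn decides whether it receives the maximum concreteness of 5.
print([(ent.text, ent.label_) for ent in nlp(restored).ents])
```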
Helpful resources for publishing pip packages:
- https://maria-antoniak.github.io/2020/03/25/pip.html
- https://realpython.com/pypi-publish-python-package/#prepare-your-package-for-publication