uv, python 3.12, only ._.blob access, remove textblob-de #27

Merged
merged 7 commits into from
Oct 12, 2024
22 changes: 6 additions & 16 deletions .github/workflows/pytest.yml
Expand Up @@ -7,37 +7,27 @@ on:
pull_request:
branches: [ main ]
push:
branches:
branches:
- main
workflow_dispatch:

jobs:
build:

runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.7", "3.8", "3.9"]
python-version: ["3.9", "3.10", "3.11", "3.12"]

steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install python dependencies
- name: Setup uv
run: |
curl -sSL https://install.python-poetry.org | python3 -
poetry export --without-hashes --output requirements.txt
python -m pip install --upgrade pip
pip install wheel
pip install -r requirements.txt
python -m textblob.download_corpora
python -m spacy download en_core_web_sm
pip install textblob-de
python -m spacy download de_core_news_sm
pip install textblob-fr
python -m spacy download fr_core_news_sm
pip install pytest
curl -LsSf https://astral.sh/uv/install.sh | sh
- name: Test with pytest
run: |
pytest
uv run --python ${{ matrix.python-version }} --all-extras pytest
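
The new test step can be approximated locally by issuing the same `uv run` invocation once per matrix entry. The sketch below is only an illustration: `build_test_commands` and `run_matrix` are hypothetical helper names, not part of this repository, and it assumes `uv` is already on the `PATH` when `run_matrix` is called.

```python
import shlex
import subprocess

# Versions copied from the workflow matrix above.
PYTHON_VERSIONS = ["3.9", "3.10", "3.11", "3.12"]


def build_test_commands(versions):
    """Return one `uv run` argv list per Python version, mirroring the CI step."""
    return [
        ["uv", "run", "--python", version, "--all-extras", "pytest"]
        for version in versions
    ]


def run_matrix(versions=PYTHON_VERSIONS):
    """Run pytest for each version in turn; raises on the first failing run."""
    for argv in build_test_commands(versions):
        print("$", shlex.join(argv))
        subprocess.run(argv, check=True)


# Print the commands without executing them (no uv installation required).
for argv in build_test_commands(PYTHON_VERSIONS):
    print(shlex.join(argv))
```

Calling `run_matrix()` would execute the commands one after another, which is roughly what the CI matrix does in parallel jobs.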
19 changes: 10 additions & 9 deletions CONTRIBUTING.md
Expand Up @@ -4,13 +4,12 @@

## Development environment

### poetry
### uv

`poetry` is used to manage Python dependencies. See the docs for how to install it: [https://python-poetry.org/](https://python-poetry.org/). To activate the poetry virtual environment, run the following commands:
`uv` is used to manage python dependencies. Run the following to install `uv`:

```bash
poetry install
poetry shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### just
Expand All @@ -19,24 +18,26 @@ poetry shell

## Code formatting

Please use [black](https://black.readthedocs.io/en/stable/) to format code before submitting a PR.

```bash
black spacytextblob
just format
```

## Testing

Please validate that all tests pass before submitting a PR by running:

```bash
pytest
# Test against the latest supported version of Python
just test

# Test against all supported versions of Python
just test-matrix
```

## Docs

To build the docs and visually inspect the docs please run:

```bash
just docs
just preview-docs
```
4 changes: 1 addition & 3 deletions docs/api.md
Expand Up @@ -34,7 +34,6 @@ When adding *spacytextblob* to your spaCy pipeline you can optionally pass addit

| Name | Type | Description |
|------|------|-------------|
| `blob_only` | `bool` | If True, *spacytextblob* will only expose `._.blob` and not attempt to expose `._.polarity`, `._.subjectivity`, or `._.assessments`. This should always be set to True when using TextBlob extensions. By default `False`. |
| `custom_blob` | `Dict[str, str]` | The `"custom_blob"` key should be assigned to a dictionary that tells spaCy what function to replace `textblob.TextBlob` with. In this case, we want to replace it with `TextBlobDE`. The key of the dictionary is `"@misc"`. This tells spaCy to look into the misc section of the spaCy register. The value should be the string name of a function that you have registered with spaCy. See the [TextBlob extensions](tutorial/textblob_extensions.md) section for more details. |


Expand All @@ -45,7 +44,6 @@ from spacytextblob.spacytextblob import SpacyTextBlob
nlp = spacy.load("de_core_news_sm")

nlp.add_pipe( "spacytextblob", config={
"blob_only": ..., # bool
"custom_blob": ... # Dict[str, str]
})
```
Expand All @@ -61,5 +59,5 @@ Using *spacytextblob* without an extension:
Using *spacytextblob* with an extension:

```python
{! docs/static/reference_code/textblob_de_example.py !}
{! docs/static/reference_code/textblob_fr_example.py !}
```
12 changes: 12 additions & 0 deletions docs/changelog.md
@@ -1,5 +1,17 @@
# Changelog

## 5.0.0 (2024-10-12)

**Breaking changes**

- Supported Python versions are now 3.9 through 3.12.
- Removed support for the `textblob-de` extension. See [#25](https://github.com/SamEdwardes/spacytextblob/issues/25) for more details.
- Removed support for accessing `._.polarity`, `._.sentiment`, `._.subjectivity`, and `._.assessments`. Now, only the `._.blob` attribute is exposed. All other textblob attributes should be accessed through it. For example: `._.blob.polarity`, `._.blob.sentiment`, `._.blob.subjectivity`, and `._.blob.sentiment_assessments.assessments`. This simplifies the code base and makes it easier to maintain. Lastly, this means that the config option `{"blob_only": bool}` was removed.
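
For code that relied on the removed attributes, the mapping described above might look like the following sketch. `legacy_attributes` and `demo` are hypothetical names introduced only for illustration, and the `SimpleNamespace` object is a stand-in so the helper can be exercised without spaCy installed; `demo` assumes `spacy`, `spacytextblob`, and the `en_core_web_sm` model are available.

```python
from types import SimpleNamespace


def legacy_attributes(blob):
    """Collect the values formerly exposed directly on `._.` from a blob."""
    return {
        "polarity": blob.polarity,
        "subjectivity": blob.subjectivity,
        "sentiment": blob.sentiment,
        "assessments": blob.sentiment_assessments.assessments,
    }


def demo():
    """Real-pipeline usage; needs spaCy and the model, so it is not called here."""
    import spacy
    from spacytextblob.spacytextblob import SpacyTextBlob  # noqa: F401

    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("spacytextblob")
    doc = nlp("I had a really horrible day.")
    return legacy_attributes(doc._.blob)


# Stand-in object with the same attribute layout as a TextBlob (assumed values).
fake_blob = SimpleNamespace(
    polarity=-1.0,
    subjectivity=1.0,
    sentiment=(-1.0, 1.0),
    sentiment_assessments=SimpleNamespace(
        assessments=[(["really", "horrible"], -1.0, 1.0, None)]
    ),
)
print(legacy_attributes(fake_blob))
```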

**Other changes**

- Use `uv` instead of `poetry`.

## 4.0.0 (2022-02-19)

- New custom attribute `doc._.blob`, `span._.blob`, `token._.blob`.
7 changes: 4 additions & 3 deletions docs/static/reference_code/spacytextblob_example.py
@@ -1,7 +1,8 @@
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load('en_core_web_sm')
from spacytextblob.spacytextblob import SpacyTextBlob # noqa: F401

nlp = spacy.load("en_core_web_sm")
text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
nlp.add_pipe("spacytextblob")
doc = nlp(text)
Expand All @@ -16,4 +17,4 @@
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]

print(doc._.blob.ngrams())
# [WordList(['I', 'had', 'a']), WordList(['had', 'a', 'really']), WordList(['a', 'really', 'horrible']), WordList(['really', 'horrible', 'day']), WordList(['horrible', 'day', 'It']), WordList(['day', 'It', 'was']), WordList(['It', 'was', 'the']), WordList(['was', 'the', 'worst']), WordList(['the', 'worst', 'day']), WordList(['worst', 'day', 'ever']), WordList(['day', 'ever', 'But']), WordList(['ever', 'But', 'every']), WordList(['But', 'every', 'now']), WordList(['every', 'now', 'and']), WordList(['now', 'and', 'then']), WordList(['and', 'then', 'I']), WordList(['then', 'I', 'have']), WordList(['I', 'have', 'a']), WordList(['have', 'a', 'really']), WordList(['a', 'really', 'good']), WordList(['really', 'good', 'day']), WordList(['good', 'day', 'that']), WordList(['day', 'that', 'makes']), WordList(['that', 'makes', 'me']), WordList(['makes', 'me', 'happy'])]
# [WordList(['I', 'had', 'a']), WordList(['had', 'a', 'really']), WordList(['a', 'really', 'horrible']), WordList(['really', 'horrible', 'day']), WordList(['horrible', 'day', 'It']), WordList(['day', 'It', 'was']), WordList(['It', 'was', 'the']), WordList(['was', 'the', 'worst']), WordList(['the', 'worst', 'day']), WordList(['worst', 'day', 'ever']), WordList(['day', 'ever', 'But']), WordList(['ever', 'But', 'every']), WordList(['But', 'every', 'now']), WordList(['every', 'now', 'and']), WordList(['now', 'and', 'then']), WordList(['and', 'then', 'I']), WordList(['then', 'I', 'have']), WordList(['I', 'have', 'a']), WordList(['have', 'a', 'really']), WordList(['a', 'really', 'good']), WordList(['really', 'good', 'day']), WordList(['good', 'day', 'that']), WordList(['day', 'that', 'makes']), WordList(['that', 'makes', 'me']), WordList(['makes', 'me', 'happy'])]
17 changes: 8 additions & 9 deletions docs/static/reference_code/textblob_de_example.py
@@ -1,22 +1,21 @@
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from textblob_de import TextBlobDE

from spacytextblob.spacytextblob import SpacyTextBlob # noqa: F401

text = '''
Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag. Ich muss
unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
text = """
Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag. Ich muss
unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
habe ich nur noch EUR 3.50 in meiner Brieftasche.
'''
"""


@spacy.registry.misc("spacytextblob.de_blob")
def create_de_blob():
return TextBlobDE

config = {
"blob_only": True,
"custom_blob": {"@misc": "spacytextblob.de_blob"}
}

config = {"custom_blob": {"@misc": "spacytextblob.de_blob"}}

nlp = spacy.load("de_core_news_sm")
nlp.add_pipe("spacytextblob", config=config)
2 changes: 1 addition & 1 deletion docs/static/reference_code/textblob_example.py
Expand Up @@ -13,4 +13,4 @@
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]

print(blob.ngrams())
# [WordList(['I', 'had', 'a']), WordList(['had', 'a', 'really']), WordList(['a', 'really', 'horrible']), WordList(['really', 'horrible', 'day']), WordList(['horrible', 'day', 'It']), WordList(['day', 'It', 'was']), WordList(['It', 'was', 'the']), WordList(['was', 'the', 'worst']), WordList(['the', 'worst', 'day']), WordList(['worst', 'day', 'ever']), WordList(['day', 'ever', 'But']), WordList(['ever', 'But', 'every']), WordList(['But', 'every', 'now']), WordList(['every', 'now', 'and']), WordList(['now', 'and', 'then']), WordList(['and', 'then', 'I']), WordList(['then', 'I', 'have']), WordList(['I', 'have', 'a']), WordList(['have', 'a', 'really']), WordList(['a', 'really', 'good']), WordList(['really', 'good', 'day']), WordList(['good', 'day', 'that']), WordList(['day', 'that', 'makes']), WordList(['that', 'makes', 'me']), WordList(['makes', 'me', 'happy'])]
# [WordList(['I', 'had', 'a']), WordList(['had', 'a', 'really']), WordList(['a', 'really', 'horrible']), WordList(['really', 'horrible', 'day']), WordList(['horrible', 'day', 'It']), WordList(['day', 'It', 'was']), WordList(['It', 'was', 'the']), WordList(['was', 'the', 'worst']), WordList(['the', 'worst', 'day']), WordList(['worst', 'day', 'ever']), WordList(['day', 'ever', 'But']), WordList(['ever', 'But', 'every']), WordList(['But', 'every', 'now']), WordList(['every', 'now', 'and']), WordList(['now', 'and', 'then']), WordList(['and', 'then', 'I']), WordList(['then', 'I', 'have']), WordList(['I', 'have', 'a']), WordList(['have', 'a', 'really']), WordList(['a', 'really', 'good']), WordList(['really', 'good', 'day']), WordList(['good', 'day', 'that']), WordList(['day', 'that', 'makes']), WordList(['that', 'makes', 'me']), WordList(['makes', 'me', 'happy'])]
16 changes: 8 additions & 8 deletions docs/static/reference_code/textblob_fr_example.py
@@ -1,26 +1,26 @@
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from textblob import Blobber
from textblob_fr import PatternTagger, PatternAnalyzer
from textblob_fr import PatternAnalyzer, PatternTagger

from spacytextblob.spacytextblob import SpacyTextBlob # noqa: F401


@spacy.registry.misc("spacytextblob.fr_blob")
def create_fr_blob():
tb = Blobber(pos_tagger=PatternTagger(), analyzer=PatternAnalyzer())
return tb

config = {
"blob_only": True,
"custom_blob": {"@misc": "spacytextblob.fr_blob"}
}

config = {"custom_blob": {"@misc": "spacytextblob.fr_blob"}}

nlp_fr = spacy.load("fr_core_news_sm")
nlp_fr.add_pipe("spacytextblob", config=config)
text = u"Quelle belle matinée"
text = "Quelle belle matinée"
doc = nlp_fr(text)

print(doc)
# Quelle belle matinée
print(doc._.blob)
# Quelle belle matinée
print(doc._.blob.sentiment)
# Quelle belle matinée
# (0.8, 0.8)
76 changes: 27 additions & 49 deletions docs/tutorial/textblob_extensions.md
Expand Up @@ -9,76 +9,49 @@ TextBlob supports adding custom models and new languages through “extensions
```python linenums="1"
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from textblob_de import TextBlobDE # (1)
from textblob import Blobber
from textblob_fr import PatternTagger, PatternAnalyzer # (1)

text = u"Quelle belle matinée"

text = '''
Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag. Ich muss
unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
habe ich nur noch EUR 3.50 in meiner Brieftasche.
'''
@spacy.registry.misc("spacytextblob.fr_blob") # (2)
def create_fr_blob():
tb = Blobber(pos_tagger=PatternTagger(), analyzer=PatternAnalyzer())
return tb # (3)

@spacy.registry.misc("spacytextblob.de_blob") # (2)
def create_de_blob():
return TextBlobDE # (3)
nlp_fr = spacy.load("fr_core_news_sm")


nlp = spacy.load("de_core_news_sm")

nlp.add_pipe(
"spacytextblob", # (4)
nlp_fr.add_pipe(
"spacytextblob", # (4)
config={ # (5)
"blob_only": True, # (6)
"custom_blob": {"@misc": "spacytextblob.de_blob"} # (7)
"custom_blob": {"@misc": "spacytextblob.fr_blob"} # (6)
}
)
doc = nlp(text)

print(doc._.blob.sentences)
# [Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."), Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."), Sentence("Aber leider habe ich nur noch EUR 3.50 in meiner Brieftasche.")]
doc = nlp_fr(text)

print(doc)
# Quelle belle matinée
print(doc._.blob)
# Quelle belle matinée
print(doc._.blob.sentiment)
# Sentiment(polarity=0.0, subjectivity=0.0)

print(doc._.blob.tags)
# [('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'), ('2014', 'CD'), ('und', 'CC'), ('Dr.', 'NN'), ('Meier', 'NN'), ('feiert', 'NN'), ('seinen', 'PRP$'), ('43.', 'CD'), ('Geburtstag', 'NN'), ('Ich', 'PRP'), ('muss', 'VB'), ('unbedingt', 'RB'), ('daran', 'RB'), ('denken', 'VB'), ('Mehl', 'NN'), ('usw.', 'IN'), ('für', 'IN'), ('einen', 'DT'), ('Kuchen', 'JJ'), ('einzukaufen', 'NN'), ('Aber', 'CC'), ('leider', 'VBN'), ('habe', 'VB'), ('ich', 'PRP'), ('nur', 'RB'), ('noch', 'IN'), ('EUR', 'NN'), ('3.50', 'CD'), ('in', 'IN'), ('meiner', 'JJ'), ('Brieftasche', 'NN')]
# (0.8, 0.8)
```

1. Load the TextBlob extension package.
2. For a function to be used inside the NLP pipeline you must register the function with spaCy using `@spacy.registry.misc()`. You can name the function whatever you like. For the example I have registered the function with the name `"spacytextblob.de_blob"`.
3. *spacytextblob* is able to support TextBlob extensions by replacing the default `textblob.TextBlob` with an alternative. In the case of the [textblob-de](https://github.com/markuskiller/textblob-de) extension they provide an alternative blob that you can import (`from textblob_de import TextBlobDE`).
2. For a function to be used inside the NLP pipeline you must register the function with spaCy using `@spacy.registry.misc()`. You can name the function whatever you like. For the example I have registered the function with the name `"spacytextblob.fr_blob"`.
3. *spacytextblob* is able to support TextBlob extensions by replacing the default `textblob.TextBlob` with an alternative.
4. Add *spacytextblob* to your spaCy pipeline as you normally would.
5. The `config` parameter allows you to pass additional configuration options to the *spacytextblob* pipeline.
6. When using a TextBlob extension you should always set `"blob_only": True`. The extension may modify the textblob.TextBlob object. By setting `"blob_only": True` *spacytextblob* will only expose `._.blob` and not attempt to expose `._.polarity`, `._.subjectivity`, or `._.assessments`.
7. The `"custom_blob"` key should be assigned to a dictionary that tells spaCy what function to replace `textblob.TextBlob` with. In this case, we want to replace it with `TextBlobDE`. The key of the dictionary is `"@misc"`. This tells spaCy to look into the misc section of the spaCy register. The value should be the string name of the function that we registered above in line 12.
6. The `"custom_blob"` key should be assigned to a dictionary that tells spaCy what function to replace `textblob.TextBlob` with. In this case, we want to replace it with the `Blobber` returned by `create_fr_blob`. The key of the dictionary is `"@misc"`. This tells spaCy to look into the misc section of the spaCy registry. The value should be the string name of the function that we registered above.
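
To make the `"@misc"` lookup described in the notes above more concrete, here is a toy sketch of the idea in plain Python. It is not spaCy's implementation: `MISC_REGISTRY`, `register_misc`, `resolve_config`, and `FakeFrenchBlob` are all hypothetical stand-ins, used only to show how a string name in the config can be turned into the object returned by a registered function.

```python
# Toy stand-in for spaCy's misc registry; the real implementation differs.
MISC_REGISTRY = {}


def register_misc(name):
    """Decorator that stores a factory function under a string name."""
    def decorator(func):
        MISC_REGISTRY[name] = func
        return func
    return decorator


def resolve_config(config):
    """Replace every {"@misc": name} value with the registered function's result."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, dict) and "@misc" in value:
            resolved[key] = MISC_REGISTRY[value["@misc"]]()
        else:
            resolved[key] = value
    return resolved


class FakeFrenchBlob:
    """Hypothetical placeholder for the Blobber built from textblob-fr."""


@register_misc("spacytextblob.fr_blob")
def create_fr_blob():
    return FakeFrenchBlob


config = {"custom_blob": {"@misc": "spacytextblob.fr_blob"}}
resolved = resolve_config(config)
print(resolved["custom_blob"] is FakeFrenchBlob)
```

The point of the indirection is that the config stays a plain, serializable dictionary of strings, while the pipeline component still receives a real Python object.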

## Extensions

The following extensions have been tested and are supported. Other extensions may work, but have not been tested.

### textblob-de

textblob-de is a TextBlob extension that enables German language support for TextBlob by Steven Loria ([https://github.com/markuskiller/textblob-de](https://github.com/markuskiller/textblob-de)).

```bash
pip install textblob-de
```

To use it with *spacytextblob*, first install a spaCy model that supports German ([https://spacy.io/models/de](https://spacy.io/models/de)):

```bash
python -m spacy download de_core_news_sm
```

The code below demonstrates how you can then use and access textblob-de within *spacytextblob*.

```python linenums="1"
{! docs/static/reference_code/textblob_de_example.py !}
```

### textblob-fr

textblob-fr is a TextBlob extension that enables French language support for TextBlob ([https://github.com/sloria/textblob-fr](https://github.com/sloria/textblob-fr)).
textblob-fr is a TextBlob extension that enables French language support for TextBlob ([https://github.com/sloria/textblob-fr](https://github.com/sloria/textblob-fr)).

```bash
pip install textblob-fr
Expand All @@ -96,9 +69,14 @@ The code below demonstrates how you can then use and access textblob-fr within *
{! docs/static/reference_code/textblob_fr_example.py !}
```

### textblob-de

!!! warning

textblob-de is **not** supported as of spacytextblob 4.1.0. The textblob-de library depends on a Google Translate feature that no longer works. More details can be found in this issue: [https://github.com/markuskiller/textblob-de/issues/24](https://github.com/markuskiller/textblob-de/issues/24).

### textblob-aptagger

!!! warning

textblob-aptagger is **not** supported. As of TextBlob 0.11.0, TextBlob uses NLTK's averaged perceptron tagger by default. This package is no longer necessary ([https://github.com/sloria/textblob-aptagger](https://github.com/sloria/textblob-aptagger)).
